michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
iive has quit [Quit: They came for me...]
<steven-netint>
hi everyone, does anyone know if HW accelerators like qsv/nvenc/amf are tested in FATE? and, are they tested on every patch, nightly, or only during releases?
<jamrial>
no, they are not
<jamrial>
fate tests don't cover hw or external modules
minimal has quit [Quit: Leaving]
<frankplow>
averne: Is there some syntax element which can be used to indicate more than 1 coefficient is insignificant? HEVC and VVC have similar scan orders, and there it is because the TU is divided into CGs aka subblocks
<BtbN>
steven-netint: how would a fate test for that even work? The output can change on any driver update.
<BtbN>
you can test hwdecs though, but we don't do that either
<steven-netint>
I'm curious how other HW accel vendors ensure their code is not broken after an FFmpeg framework patch, etc.
<frankplow>
averne: I don't know if I would say it's a hybrid of Morton and zig-zag scans, for 2x2 scans (intra-subblock and inter-subblock scans in the top-left 4x4 region) the zig-zag scan and Morton scan are equivalent so you can view the whole thing as a two-level hierarchy of zig-zag scans of different sizes
<BtbN>
not sure what you mean
<BtbN>
FFmpeg has little to no influence on the output of the hwencs
<BtbN>
it can set parameters, sure. But what's done with those is out of its control
<steven-netint>
I was thinking I could probably set up a nightly regression pulling from FFmpeg master, compile with the hw accel, and run internal tests against it. If a test failure is found, check in a patch to fix it.
<BtbN>
And how do you determine if the breakage is in FFmpeg or the driver?
<BtbN>
There is very little FFmpeg can or will do to break the hwenc wrappers
<steven-netint>
Look through the FFmpeg master commit history, analyze the code change which could've caused the breakage, and likely fix the vendor code as necessary.
<BtbN>
seems like largely pointless effort to me
<BtbN>
I can't recall any of the hwenc ever having been just randomly broken
<BtbN>
They're mostly just thin wrappers with not all that much actual work going on
<steven-netint>
BtbN: my hwcontext was affected by a change in fftools after n7.1 :'(
<BtbN>
in fftools?
<BtbN>
your hwcontext? what?
<steven-netint>
It's the netint HW accel codecs/filters that aren't in FFmpeg master
<steven-netint>
Though, I'm trying to upstream it now on the mailing-list :)
<BtbN>
so you mean some internal API got broken? Cause that can happen at any time.
<BtbN>
There is no guarantee on anything that's not public API, it can change at any moment
<steven-netint>
I fixed the issue in my hwcontext. Basically there was a bug in my hwcontext which would not allow HW frames to be freed after the enc instance is closed. It was revealed by a commit on the master branch (64f3feb) which closed the enc instance before the filter session. I fixed it now, but I was just thinking about how I can catch these kinds of issues in the future.
<BtbN>
that's hard to say for out of tree internal code
<BtbN>
Cause it can break in near unlimited ways
<steven-netint>
Yes, that is so. But I'd think other large HW vendors would want to test the FFmpeg master branch frequently to make sure their codecs/filters aren't broken when it comes time to make an FFmpeg release, no?
<steven-netint>
Curious what they're doing about it if anyone knows
<BtbN>
I don't think any hw vendors are directly involved like that
<BtbN>
AMD and Nvidia both have engineers working on FFmpeg, but that's limited to occasional patches for new features and consultation for problems we run into
<BtbN>
Intel probably has the largest team and effort going on, with their ffmpeg cartwheels and everything
System_Error has quit [Ping timeout: 244 seconds]
System_Error has joined #ffmpeg-devel
cone-459 has quit [Quit: transmission timeout]
rvalue has quit [Read error: Connection reset by peer]
rvalue has joined #ffmpeg-devel
<CounterPillow>
I don't think a lot of places pay for mailing list toddler tantrums (the only one I know of is the bcachefs Patreon) so I'd say corporate involvement in ffmpeg has a fairly bleak outlook.
Martchus has joined #ffmpeg-devel
Martchus_ has quit [Ping timeout: 252 seconds]
jamrial has quit []
Mirarora has joined #ffmpeg-devel
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
mkver has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
System_Error has quit [Remote host closed the connection]
TheVibeCoder has joined #ffmpeg-devel
TheVibeCoder has quit [Changing host]
TheVibeCoder has joined #ffmpeg-devel
<kasper93>
BtbN: I don't agree that testing hwdec is completely useless; if anything you can monitor its status and get in front of user reports that something "doesn't work"
<kasper93>
also "thin wrapper" is maybe true for some hwdec
<kasper93>
but for example Vulkan is huge and over the years has had a multitude of fixes in FFmpeg itself
<Lynne>
at least 2 rewrites
<beastd>
steven-netint, BtbN, kasper93: I would say in general it would be possible to at least test some basic scenarios and check if it runs as expected with no errors (or with errors if that is expected) and no crashes. Maybe some more superficial stuff could be checked.
<steven-netint>
thanks for the input. I'm definitely interested in keeping issues from becoming user reports, so I'll likely set up internal nightly tests of the FFmpeg master branch with my hw codecs/filters once it's upstreamed
<mkver>
Lynne: ff_opus_rc_enc_end() can call ff_opus_rc_put_raw() to write 32 bits, which leads to an abort in av_zero_extend() if assert_level >= 2 (and is also not supported otherwise).
microchip_ has quit [Quit: There is no spoon!]
kasper93 has joined #ffmpeg-devel
microchip_ has joined #ffmpeg-devel
averne has joined #ffmpeg-devel
Anthony_ZO has quit [Remote host closed the connection]
<kierank>
04:04:46 <CounterPillow> I don't think a lot of places pay for mailing list toddler tantrums (the only one I know of is the bcachefs Patreon) so I'd say corporate involvement in ffmpeg has a fairly bleak outlook.
<kierank>
LOOOL
averne has quit [Quit: quit]
averne has joined #ffmpeg-devel
<averne>
frankplow: "Is there some syntax element which can be used to indicate more than 1 coefficient is insignificant" -> yeah, AC coefficients are run-level encoded, with the runs being all zeroes except for the last element. There is no block subdivision, the transform always operates on 8x8 blocks
<frankplow>
averne: In HEVC/VVC you have these subblock significance flags which sit between the usual run-level significance coding and individual coefficient flags. The transform itself still operates on the entire block, the subblocks are only relevant when coding the coefficients.
<frankplow>
averne: But it doesn't sound like that's the case here. As I understand it the subblocks have two roles: bitrate reduction for large blocks via the significance flags, and they're easier to implement in hardware for varying block sizes (particularly in the case of VVC). For a codec with only 8x8 transform block sizes of course neither apply so I'm not sure why they'd lay out the scan like that.
<averne>
frankplow: ah I see, but yeah it doesn't sound like wwhat prores does. In general it's low-complexity and focuses strongly on efficient decoding rather than compression
<BtbN>
AV_PIX_FMT_YUV444P16 is also a weird case. It uses that for both 10 and 12 bit, cause there is no equivalent format in FFmpeg
<BtbN>
AV_PIX_FMT_YUV444P10 and AV_PIX_FMT_YUV444P12 expect the data in the LSB iirc, but nvidia puts it into the MSB, like with the Pxxx formats
secondcreek has quit [Remote host closed the connection]
secondcreek has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
<ePirat>
BtbN, I think it should be fine at least with a major bump?
<BtbN>
Yeah, gonna have to cook up version guards so it bumps automatically on the next major bump
<ePirat>
mkver, do you have an opinion on my tee refactor?
<ePirat>
I really want to make av_dict_get const and it's the only thing in the way
<fflogger>
[editedticket] nyanmisaka: Ticket #11655 ([avcodec] Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video) updated https://trac.ffmpeg.org/ticket/11655#comment:9
<BtbN>
Is there some magic somewhere, that if you enter just "p010" as a format= filter, that it appends the native endianness?
<BtbN>
There sure is, right in av_get_pix_fmt
<kasper93>
averne: I was responding to "but I wouldn't know how to test hwencs"
<Lynne>
TheVibeCoder: neither version 0 nor 1 uses each tile's qscale value?
<Lynne>
why do they write that in the bitstream?
mkver has quit [Ping timeout: 265 seconds]
<TheVibeCoder>
it's always a fixed value?
<TheVibeCoder>
so not actually qscale?
mkver has joined #ffmpeg-devel
ngaullier has quit [Remote host closed the connection]
<mkver>
ePirat: I don't consider the macro to be unreadable and think that your patch does not make it more readable; but I agree that abusing the AVDictionary API should stop.
<Lynne>
TheVibeCoder: I'm seeing a difference here, between RAW and RAW HQ
<Lynne>
16429 (raw) vs 16399 (raw hq)
<Lynne>
the raw (not hq) image also looks washed out, which leads me to believe that qscale is used somehow
<ePirat>
mkver, I don't mind keeping the macro stuff if people prefer that, but I really want to get rid of it fiddling with the dict internals, especially for nearly no gain here
<Lynne>
(qscale - 16384) >> 1 as a constant qmat seems to fit both, but this is just guesswork on my part
<mkver>
kasper93: Do you think it would be good to use plain malloc instead of av_malloc() for allocations in lavu with a dedicated deallocator (like AVBufferRef, AVBuffer, AVFrame, AVDictionary) that don't need to be overaligned?
<TheVibeCoder>
also add cached AVFrame allocations
<TheVibeCoder>
these are performance killer
<TheVibeCoder>
very old bitcoin wallets reactivated?
<TheVibeCoder>
someone cracked them?
<kasper93>
mkver: possibly, can't tell if there would be tangible gain
E81l7HT8T7sF9JdA has quit [Quit: Leaving]
<BtbN>
oh god... my addition of a "Data in MSB" pixel format completely confuses swscale
<BtbN>
It works if I disable ASM, which makes it infinitely worse
<BtbN>
I don't understand how the x86 assembly path can seemingly set one general yuv2yuvX to yuv2planeX and yuv2plane1, but the C path has a million different functions
<BtbN>
Where does it get all the needed info from
<TheVibeCoder>
haasn: what happened with swscale2?
minimal has joined #ffmpeg-devel
<BtbN>
This is quite baffling. Do the assembly parts of swscale just assume they can handle all formats?
<BtbN>
I see very little format checks there, it just blindly sets c->yuv2planeX and c->yuv2plane1 if the CPU supports the extensions
<BtbN>
Also, I added debug prints to my C conversion function. And despite it not being used cause of the ASM, the debug print still happened? Does it just scale twice, one pointless C run?
<jamrial>
maybe an autoinserted scaler?
<BtbN>
No, I'm adding a new pixfmt to swscale
<BtbN>
oh, you mean it's scaling twice?
<BtbN>
But why would one instance use the asm function, and the other the c one?
<BtbN>
I'm a bit baffled that this is such a problem for swscale, given stuff like P010 exists, which already has stuff in the MSB
bsFFFFFF has joined #ffmpeg-devel
<jamrial>
BtbN: for 10bit, the ASSIGN_VSCALEX_FUNC macro checks that isSemiPlanarYUV(dstfmt) is false before setting anything
<jamrial>
in x86/swscale.c
<jamrial>
so p010 is not covered
<jamrial>
for this new fmt, you may need to add a isDataInHighBits() check
<BtbN>
But ff_sws_init_swscale_x86 near instantly sets c->yuv2planeX, without ANY conditions
<BtbN>
except the CPU supporting the instructions
<BtbN>
Which I think is the relevant function when converting _to_ yuv444p10msb, given I also had to set it in output.c for the C variant?
<jamrial>
only if use_mmx_vfilter is set it seems
<BtbN>
This is such a mess, my god
<compnn>
can you set the testsrc2 output pixfmt ?
* compnn
runs
<TheVibeCoder>
yes, but testsrc2 supports only some formats
<BtbN>
jamrial: adding !isDataInHighBits into there seems to have done it
<BtbN>
but this seems insanely brittle
<TheVibeCoder>
rm -rf libswscale/
witchymary has joined #ffmpeg-devel
<jamrial>
BtbN: the whole logic for setting those function pointers is madness, yeah
<BtbN>
I think it works now
<BtbN>
but my god
kasper93 has quit [Quit: kasper93]
kasper93 has joined #ffmpeg-devel
bsFFFFFF has quit [Quit: bsFFFFFF]
kurosu has quit [Quit: Connection closed for inactivity]
<kierank>
what is yuv444p10msb
<kierank>
surely that's yuv444p16?
<kierank>
(kinda)
<BBB>
it shows that ffmpeg is just a wrapper of other things nowadays...
* BBB
runs
<BtbN>
Well, it is yuv444p16, but 10 bit.
<BtbN>
Same as P010 is P016 but 10 bit.
Mirarora has joined #ffmpeg-devel
<BtbN>
I don't think that format has an actual name anywhere. Better ideas for its name are welcome.
<jkqxz>
Depends on range - it's yuv444p16 * ((2^16-2^6) / (2^16-1)) as full range, so can only be used interchangeably if you don't mind a bit of error.
<jkqxz>
(Also things might care about ensuring that the low bits don't contain anything funny.)
<BtbN>
It's been causing issues that nvdec/nvenc use yuv444p16, where it's actually only 10 or 12 bit, with the lowest bits just zeroed out
<BtbN>
So I'd kinda like to get away from that
<BtbN>
AV_PIX_FMT_P012 and AV_PIX_FMT_P212 are the semi-planar equivalent
<jkqxz>
Won't nvenc use a 2:10:10:10 4-byte container anyway? Microsoft mandates that for hardware 4:4:4, so it would mess with all use on windows if nvenc didn't match.
<jkqxz>
For 12, yes, the layout will be the same as 16.
<BtbN>
nvenc only supports 8 or 10 bit, no 12 bit so far
<BtbN>
And for 10 bit, the input format is either P010 or this new YUV444P10MSB
<BtbN>
The bigger problem is how nvdec outputs 10 and 12 bit 444 content exclusively in YUV444P10MSB, which right now is simply pretended to be AV_PIX_FMT_YUV444P16
<jkqxz>
Lol. So it can't do interop with D3D and all microsoft stuff? Great plan.
<BtbN>
hm?
<BtbN>
For interop with D3D stuff, they added AV_PIX_FMT_X2RGB10 support
<jkqxz>
Then just always use that and ignore this new format which nothing else cares about?
<BtbN>
How would I use an RGB format when decoding YUV video?
<BtbN>
And again, nvdec _exclusively_ decodes 10 and 12 bit 4:4:4 content to YUV444P10MSB
mkver has quit [Ping timeout: 265 seconds]
<jkqxz>
Microsoft mandates that YUV 4:4:4 is decoded to 2:10:10:10, so nvidia must implement that to do D3D. Do they not expose it in nvidialand?
<BtbN>
for decoding? no
<BtbN>
4:4:4 ends up as AV_PIX_FMT_YUV444P, AV_PIX_FMT_YUV444P10MSB or AV_PIX_FMT_YUV444P12MSB
<jkqxz>
Can you ask them to expose the hardware which gives you Y410 in that API, since the hardware certainly does it for D3D?
<jkqxz>
Yes, because we don't want people to think there is an empty alpha channel there.
<BtbN>
Actually, cuviddec seems to have grown a cudaVideoSurfaceFormat_P216 at some point
<BtbN>
So for 4:2:2 it's a non-issue
<BtbN>
It's specifically 4:4:4 where they invented a new format
<BtbN>
And adding anything to that API will have a round trip time of multiple years at best
<jkqxz>
The P216 is still going to mess with you on the range (do you need to multiply by (2^16-2^4)/(2^16-1) or not?).
<BtbN>
multiply?
<BtbN>
It's documented to have the LSB zeroed out
<BtbN>
which matches how our pix_fmts are defined
<jkqxz>
Yes. So if you read a P016 value which was 12-bit at source then you need to correct for the fact that the low bits are zero but you want it to map to 1.
<jkqxz>
(In a GPU sampler, most notably.)
<BtbN>
I don't understand why
<BtbN>
our AV_PIX_FMT_P010LE/BE says "zeros in the low bits". And nvdec delivers zeros in the low bits.
<BtbN>
So mapping their P016 to P010 and P012 seems like it works perfectly
<jkqxz>
If you sample a 16-bit value as UNORM then it maps 16'b1111_1111_1111_1111 -> 1.0. But that's wrong, because you want 16'b1111_1111_1111_0000 -> 1.0.
<jkqxz>
This is well-understood on P010 as well, but since the pixfmt tells you the origin depth you can apply the correction.
<jkqxz>
But if you are pretending all of these are P016 then it goes wrong.
<BtbN>
The nvdec API is explicitly documented to return zeros in the unused low bits when decoding 10 or 12 bit content and selecting P016/P216/YUV444P_16BIT
<BtbN>
So I just set nvdec to return cudaVideoSurfaceFormat_P016, and tell the ffmpeg side it's P010 or P012 respectively. And it's a perfect match.
<jkqxz>
Yes, so sample 16'b1111_1111_1111_0000 as UNORM and you get .99977 rather than the 1.0 which you wanted
<BtbN>
I don't know what you mean
<BtbN>
it's never accessed as 16 bit value. All sides agree on the format.
<jkqxz>
Ok, so you are telling the consumer that it is P010 or P012 so they can sample correctly?
<jkqxz>
If you return a AV_PIX_FMT_P016 then it is wrong because the consumer needs to know the source format.
<BtbN>
Like I said, the cuvid/nvdec API only has P016, P216 and YUV444_16Bit as possible output settings.
<BtbN>
But it's documented to zero out the low bits in each, depending on the contents bit depth
<BtbN>
So when cuvid is set to P016 for 10 bit content, P010 comes out, or P012 for 12 bit content.
<BtbN>
Guess it saves them a few enum values or something.
<BtbN>
The odd one out is cudaVideoSurfaceFormat_YUV444_16Bit...
<BtbN>
Cause they just followed the pattern of all the Pxxx formats, and stuffed the data in the MSB, and zeroed out the unused LSB.
<BtbN>
Which is not a format that exists anywhere else but in nvidia land as far as I can tell
<iive>
if you extend from e.g. 12 to 16 bits, you copy some of the valid bits into the "empty" part.
<iive>
e.g. 0x55a could become 0x55a5 or 0x55aa
<jkqxz>
It sounds like you can't avoid adding a new pixfmt for that in each bit depth if the API is like that.
<iive>
this way 0xfff becomes 0xffff
<iive>
and 0x000 stays 0x0000
<BtbN>
Yeah, which is how yuv444p10msb and yuv444p12msb were born :D
<jkqxz>
iive: Yes, if you have software to do the extension then you copy the high bits into the low bits (xyz0 -> xyzx), but in this case it's all in GPU surfaces which can't be easily modified like that.
<BtbN>
it's yuv444p16 with data in the high bits and zeroed out low bits
lemourin has joined #ffmpeg-devel
<BtbN>
It's not hard to modify it, but even to modify it I need to first get it out of the decoder in that format
<BtbN>
that's kinda the whole idea, I need to get it out of the hwdec in the correct format, so scale_cuda can turn it into something normal with a cuda kernel
<BtbN>
Which is exactly what I'm trying to fix with the new formats.
<BtbN>
I just needed an equivalent to P010/P210 and friends for YUV444P16
<BtbN>
At least I think I can't sensibly switch those pixel formats outside of a major bump... Cause it's very much an API break if the decoder suddenly returns a different pixel format.
lemourin has quit [Client Quit]
lemourin has joined #ffmpeg-devel
<iive>
BtbN, can't you have both pixel formats? The API is used to probe multiple formats until one is accepted.
<BtbN>
well, the API for that negotiates the CUDA pix_fmts
<BtbN>
I don't think there is anything to negotiate the sw_format inside of the CUDA one
<BtbN>
I also think it's not really applicable here, since the currently returned formats are just flat out wrong
<iive>
:}
<BtbN>
There just was nothing better when it was originally implemented
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
<BtbN>
There is some code in scale_cuda that just assumed YUV444P16 is 10 bit
<BtbN>
cause for the longest time, that'd always hold true cause nothing else was supported
<BtbN>
but now it could be 12 bit as well. Or someone could upload actual 16 bit content from elsewhere
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
<jamrial>
BtbN: can we ask nvidia to support outputting p416?
<jamrial>
instead of adding these msb planar formats
<BtbN>
I do plan to ask them to support more sane formats
<BtbN>
but like I said, the turnaround times for that are LONG, and it would also mean people with not even that old hardware will never be able to use it
<BtbN>
Cause legacy drivers won't ever gain those new features
<jamrial>
cards that only support legacy drivers probably can't decode 10 and 12bit 4:4:4 :p
bwu25 has joined #ffmpeg-devel
<jamrial>
kinda weird that they output semiplanar for everything but 4:4:4
<BtbN>
Well, by the time a feature addition like that would ever see the light of day, they will
<BtbN>
1000 series are legacy now
Mirarora has joined #ffmpeg-devel
<BtbN>
yeah, that choice of format is SUPER weird
MisterMinister has joined #ffmpeg-devel
Mirarora has quit [Quit: Mirarora encountered a fatal error and needs to close]
kasper93_ has joined #ffmpeg-devel
kasper93 is now known as Guest4536
Guest4536 has quit [Killed (calcium.libera.chat (Nickname regained by services))]