#ffmpeg-devel on 2025-05-08 — irc logs at libera.catirclogs.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:20 thilo has quit [Ping timeout: 260 seconds]

00:22 thilo has joined #ffmpeg-devel

00:28 Kei_N_ has joined #ffmpeg-devel

00:30 Kei_N has quit [Ping timeout: 244 seconds]

00:36 <jamrial> jkqxz: openapv just accepts any kind of input and will always report profile_idc 33, lol

00:37 <jamrial> maybe we should remove the yuv444p10 support until the library is a bit more mature

01:02 <fflogger> [editedticket] Balling: Ticket #11578 ([ffmpeg] Waveform discontinuity in decoded E-AC-3) updated https://trac.ffmpeg.org/ticket/11578#comment:1

01:09 <fflogger> [editedticket] j7n: Ticket #11578 ([ffmpeg] Waveform discontinuity in decoded E-AC-3) updated https://trac.ffmpeg.org/ticket/11578#comment:2

01:20 mkver has quit [Ping timeout: 244 seconds]

01:50 System_Error has quit [Ping timeout: 264 seconds]

03:16 Martchus has joined #ffmpeg-devel

03:18 Martchus_ has quit [Ping timeout: 276 seconds]

03:21 jamrial has quit []

05:23 <fflogger> [editedticket] MasterQuestionable: Ticket #11557 ([avcodec] Abnormally choked loading for "non-existing PPS 0 referenced"?) updated https://trac.ffmpeg.org/ticket/11557#comment:19

05:52 <fflogger> [newticket] SYamaguchi: Ticket #11579 ([ffplay] The tints appear to be different when the same image with different resolutions is played with ffplay.) created https://trac.ffmpeg.org/ticket/11579

06:04 <fflogger> [editedticket] wazer: Ticket #11557 ([avcodec] Abnormally choked loading for "non-existing PPS 0 referenced"?) updated https://trac.ffmpeg.org/ticket/11557#comment:20

06:17 _whitelogger has joined #ffmpeg-devel

06:28 _whitelogger has joined #ffmpeg-devel

06:57 _whitelogger has joined #ffmpeg-devel

07:30 TheVibeCoder has joined #ffmpeg-devel

07:49 <TheVibeCoder> Now, make it a Killer-Feature!

08:42 hbbs has quit [Quit: bye]

08:50 <fflogger> [editedticket] Balling: Ticket #11578 ([ffmpeg] Waveform discontinuity in decoded E-AC-3) updated https://trac.ffmpeg.org/ticket/11578#comment:3

08:54 hbbs has joined #ffmpeg-devel

08:54 hbbs has quit [Changing host]

08:54 hbbs has joined #ffmpeg-devel

09:13 rvalue- has joined #ffmpeg-devel

09:13 rvalue has quit [Ping timeout: 248 seconds]

09:15 System_Error has joined #ffmpeg-devel

09:19 rvalue- is now known as rvalue

10:10 <fflogger> [editedticket] MasterQuestionable: Ticket #11557 ([avcodec] Abnormally choked loading for "non-existing PPS 0 referenced"?) updated https://trac.ffmpeg.org/ticket/11557#comment:21

10:11 <TheVibeCoder> Master & Balling

10:20 mkver has joined #ffmpeg-devel

10:22 <fflogger> [editedticket] MasterQuestionable: Ticket #11579 ([ffplay] The tints appear to be different when the same image with different resolutions is played with ffplay.) updated https://trac.ffmpeg.org/ticket/11579#comment:2

10:34 kunkku has joined #ffmpeg-devel

10:38 TheVibeCoder has quit [Quit: Client closed]

10:38 TheVibeCoder has joined #ffmpeg-devel

10:41 minimal has joined #ffmpeg-devel

11:20 <Lynne> I think it was too early to give softworkz push access

11:30 \\Mr_C\\ has joined #ffmpeg-devel

11:49 secondcreek has quit [Ping timeout: 260 seconds]

12:10 TheVibeCoder has quit [Quit: Client closed]

12:22 jamrial has joined #ffmpeg-devel

12:27 <kasper93> what happened?

12:58 <fflogger> [editedticket] wazer: Ticket #11557 ([avcodec] Abnormally choked loading for "non-existing PPS 0 referenced"?) updated https://trac.ffmpeg.org/ticket/11557#comment:22

13:15 av500 has quit [Remote host closed the connection]

13:15 av500 has joined #ffmpeg-devel

13:31 <fflogger> [newticket] juanitotc: Ticket #11580 ([build system] ffmpeg-7.1.1 build fails with nasm) created https://trac.ffmpeg.org/ticket/11580

13:55 <haasn> wow

13:56 <haasn> 2 bit gray is almost indistinguishable from 10 bit gray on my display, with temporal dithering on a 64x64 blue noise texture

13:56 <haasn> at 1 bit depth you can definitely tell the dither pattern but the 240 Hz temporal dithering smooths it out so well that even 2 bits is basically visually transparent

13:57 <haasn> that's kinda insane, you definitely don't get that level of smoothness at 60 Hz

13:57 <haasn> but the eye just fuses the temporal dither pattern into a single dither pattern with a much higher resolution

13:57 <haasn> anyway, I've determined experimentally that 64x64 blue noise dither provides equivalent or better quality compared to error diffusion

13:58 <haasn> especially in motion

13:59 <haasn> with temporal dither, it's even better than ED
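
A minimal sketch of the kind of thresholded, temporally shifted dithering discussed above. The helper name, the 8-bit threshold texture, and the frame-dependent tile offset are all assumptions made for illustration; the log does not say how haasn's temporal dither is implemented, and any size x size threshold table (blue noise or otherwise) can be passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: quantize one 10-bit gray sample to `bits` bits using
 * a tiled size x size threshold texture with 8-bit entries. */
static unsigned dither_sample(unsigned v10, unsigned bits,
                              const uint8_t *tex, unsigned size,
                              unsigned x, unsigned y, unsigned frame)
{
    assert(v10 <= 1023 && bits >= 1 && bits <= 10 && size > 0);

    unsigned levels = (1u << bits) - 1;

    /* Assumed temporal scheme: slide the tile by a frame-dependent offset
     * so each pixel sees a different threshold on every frame. */
    unsigned tx = (x + 13u * frame) % size;
    unsigned ty = (y + 7u * frame) % size;
    float threshold = (tex[ty * size + tx] + 0.5f) / 256.0f;

    /* Scale to the output range, then round up when the fractional part
     * exceeds the local threshold. */
    float scaled = (float)(v10 * levels) / 1023.0f;
    unsigned base = (unsigned)scaled;
    return base + ((scaled - (float)base) > threshold ? 1u : 0u);
}
```

Averaged over a full tile, or over enough frames at one pixel, the output approaches the scaled input value, which is the effect the eye is said to integrate at high refresh rates.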

14:01 <jkqxz> jamrial: Yeah, I think only 422-10 profile is correct.

14:02 <jkqxz> They also need to sort out the stability, they can't have regular ABI breaks if it is being packaged. (E.g. see top PR now.)

14:03 <jamrial> yeah, saw that

14:06 <jamrial> jkqxz: their headers also don't even report a version, so...

14:06 <jamrial> could be a good opportunity for them to clean them up and introduce a version define once it's stable

14:09 <haasn> well, with non-temporal dithering even 64x64 blue noise is a bit "rougher" than ED at 1 bit depth, particularly for midtones

14:09 <haasn> though this is a flawed comparison anyway because we are not doing gamma aware dithering (I'm working on that)

14:20 cone-227 has joined #ffmpeg-devel

14:20 <cone-227> ffmpeg James Almer master:244ad944e947: avcodec/liboapvenc: remove 4:4:4 support until it's properly handled

14:54 <ramiro> haasn: when inputting yuv444p12le, the max values are set to 65535. shouldn't it be 4095, or are we accepting that the input might be invalid?

14:55 <haasn> I’m undecided on the issue

14:56 <haasn> I think I may split it into max possible and max legal

14:56 <haasn> Well, no, that wouldn’t really help

14:58 <ramiro> because, currently, -src yuv444p12le -dst yuv444p16le goes through f32 and scale, but it could be done with no converting and a simple left shift.
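
A minimal sketch of the "simple left shift" ramiro describes for widening 12-bit samples to 16 bits. The function name is hypothetical and this is not swscale code; it assumes samples are already in host byte order and that the input respects its nominal 12-bit range, which is exactly the open question in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function: widen one plane of 12-bit
 * samples (stored in uint16_t) to 16 bits with a left shift. */
static void widen_plane_12_to_16(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Only valid if the input stays within 0..4095. */
        assert(src[i] <= 4095);
        dst[i] = (uint16_t)(src[i] << 4);
    }
}
```

With out-of-range input the shift would overflow 16 bits, which is why the legal maximum (4095) rather than the storage maximum (65535) matters here.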

15:09 <ramiro> haasn: I have 3 classes of converters that are still slower: 1) simple shuffles (but I'm almost done with this one using your shuffle_solver), 2) the issue I mentioned above, converting from a smaller yuv444p to one with more bits (where a simple left shift would be faster), and 3) conversions that only add or remove alpha planes (such as yuv444p -> yuva444p), where a wrapper with memcpy and memset would be faster.

15:20 jamrial has quit [Ping timeout: 244 seconds]

15:23 jamrial has joined #ffmpeg-devel

15:33 <haasn> I'll revive the dedicated memcpy backend

15:34 <haasn> the only case when it's not faster is when a plane needs to be duplicated, e.g. gray -> yuvj

15:34 <haasn> gray -> gbrp rather

15:36 <haasn> ramiro: I think the best way around the unnecessary clamp issue is to require SWS_OP_LUT to clamp its own input if it may exceed the LUT range

15:36 <haasn> e.g. doing a 10-bit LUT lookup on a uint16_t input
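
A rough illustration of a lookup that clamps its own input, as in the 10-bit LUT on uint16_t example. This is not the SWS_OP_LUT implementation; the function name and the table layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: look up a 10-bit table with a 16-bit index,
 * clamping the index so out-of-range input cannot read past the table. */
static uint16_t lut10_lookup(const uint16_t *lut, uint16_t x)
{
    assert(lut != NULL);
    if (x > 1023)
        x = 1023;
    return lut[x];
}
```

If the optimizer can prove the input never exceeds 1023, the clamp is redundant and can be dropped, which is the saving being discussed.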

15:38 <haasn> now that we have access to the SwsFormat in the SwsOpList I can actually cleanly infer the expected signal range inside the optimizer

15:38 <haasn> that's something we simply didn't have access to before, which is why I made it based on the pixel range instead of the legal range

15:49 <haasn> ramiro: https://0x0.st/8JgZ.diff this solves yuv444p10 -> yuv444p16

15:49 <haasn> for the shuffle solver, should we lift some portions of it to the common code?

15:50 <haasn> since I assume you're copy/pasting it atm

16:03 <haasn> ramiro: something we also don't handle completely atm is the alpha_blend_mode, e.g. blending to checkerboard

16:03 <haasn> currently we just completely drop the alpha channel when converting e.g. rgba -> rgb24; which is obviously not desirable in practice

16:17 <haasn> ramiro: pushed ff_sws_solve_shuffe() and the above fix to haasn/swscale6

16:37 th3synth4x has joined #ffmpeg-devel

16:38 th3synth4x has quit [Client Quit]

16:41 minimal has quit [Quit: Leaving]

16:57 <haasn> ramiro: implemented a memcpy backend, 25% faster for yuv444p -> yuva444p (now matches reference)

17:00 <haasn> and 32% faster for gray -> yuvj444p (memset chroma)
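
A minimal sketch of what a memcpy/memset conversion for yuv444p -> yuva444p could look like. This is not the backend haasn pushed; the function name and signature are hypothetical, samples are assumed to be 8-bit, and the alpha plane is assumed to be filled with 0xFF (opaque).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: add an opaque alpha plane to 8-bit planar 4:4:4
 * data using only memcpy and memset, one row at a time. */
static void yuv444p_to_yuva444p(uint8_t *const dst[4], const ptrdiff_t dst_stride[4],
                                const uint8_t *const src[3], const ptrdiff_t src_stride[3],
                                int width, int height)
{
    assert(width > 0 && height > 0);

    /* Y, U and V are copied unchanged. */
    for (int p = 0; p < 3; p++)
        for (int row = 0; row < height; row++)
            memcpy(dst[p] + row * dst_stride[p],
                   src[p] + row * src_stride[p], (size_t)width);

    /* The new alpha plane is a constant fill. */
    for (int row = 0; row < height; row++)
        memset(dst[3] + row * dst_stride[3], 0xFF, (size_t)width);
}
```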

17:01 <haasn> I wonder how we can handle e.g. gray -> yuv444p, which still wants memset on the chroma; one of the things I'm thinking in the back of my mind is that we want some mechanism for trying to separate planes from each other

17:02 <haasn> we want this anyway for e.g. turning gray -> yuvj into a refcopy

17:06 <haasn> pushed it to swscale6 as well, give it a try

17:08 <ramiro> haasn: thanks for "swscale/optimizer: use legal value range for determining clamp requirements", now yuv444p10 -> yuv444p16 is much faster, but it's still not as fast as legacy. I believe it's because legacy does it one plane at a time. do we already have in place a mechanism to detect when the planes don't depend on each other?

17:08 <haasn> not yet, that's what I was just talking about :)

17:08 <haasn> how is doing it one plane at a time faster?

17:08 <ramiro> haasn: oh, I started writing that message before I read your last messages :P

17:09 <haasn> also how much faster are we talking about?

17:10 <ramiro> haasn: I guess it's faster to do one plane at a time because then you don't have to access 3 memory regions at once. do the entire first plane, then entire second plane, and then entire third plane...

17:10 <ramiro> but that's just a guess, I haven't written code to test that yet

17:11 <ramiro> haasn: asmjit code is currently 0.855x slower compared to planarCopyWrapper (which is pure c and does one pixel at a time, so I suspect we could be much faster)

17:11 <haasn> I will add a 4x4 dependency matrix for starters

17:11 <haasn> that way we can try to split planes in general, maybe it's always faster to process one plane at a time?
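
One way to picture the 4x4 dependency matrix: each output plane records which input planes it reads, and the planes can be processed separately when no output reads a foreign input. The representation below (one bitmask row per output plane) and the function name are assumptions for illustration, not the data structure haasn added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: dep[o] is a bitmask row of a 4x4 boolean matrix,
 * where bit i set means output plane o reads input plane i.
 * Returns true if every output plane depends on at most its own input. */
static bool planes_independent(const uint8_t dep[4])
{
    assert(dep != NULL);
    for (unsigned o = 0; o < 4; o++) {
        if (dep[o] & ~(1u << o) & 0xFu)
            return false;
    }
    return true;
}
```

A plain bit-depth change such as yuv444p10 -> yuv444p16 would give a diagonal matrix (rows 1, 2, 4, 8), while an RGB to YUV conversion would have every luma and chroma row reading all three color inputs.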

17:12 <ramiro> perhaps people more knowledgeable in the inner workings of many different CPUs can give us a better answer

17:12 <ramiro> Lynne: ^^

17:14 <haasn> fun, memcpy backend doesn't pass checkasm because it "over-writes" into the stride area

17:14 <haasn> I guess we actually _don't_ want to check for that

17:16 <haasn> or rather, we should check for over-write only after the last line

17:16 <BtbN> Isn't that exactly what that area is there for? :D

17:17 <BtbN> including after the last line

17:17 <ramiro> haasn: "backend_murder" :P

17:17 <BtbN> if I see a frame with a linesize/stride of 1024, I'd expect to be able to write up to the full 1024 bytes each line without any adverse effects
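
To illustrate why a memcpy-style copy can end up writing into the stride area: when both strides match, the whole plane can be copied in one call, which also overwrites the padding on every row, including the last. This is a hypothetical sketch, not the checkasm or swscale code; it assumes positive strides and buffers that hold stride * height bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy one 8-bit plane, taking a single-memcpy fast
 * path when source and destination strides are equal. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width <= dst_stride && width <= src_stride);

    if (dst_stride == src_stride) {
        /* Fast path: also writes the padding bytes between `width` and
         * the stride on every row. */
        memcpy(dst, src, (size_t)dst_stride * (size_t)height);
        return;
    }

    /* Slow path: touches only the visible `width` bytes of each row. */
    for (int row = 0; row < height; row++)
        memcpy(dst + row * dst_stride, src + row * src_stride, (size_t)width);
}
```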

17:20 cone-227 has quit [Quit: transmission timeout]

17:21 <haasn> right

18:00 <ramiro> haasn: https://github.com/ramiropolla/ffmpeg/commit/b392afe5d6e0a00daae6d6d0aaff313d6d7f4618

18:01 <Lynne> averne: grats

18:02 <Lynne> ramiro: what's the question?

18:04 <haasn> Lynne: tl;dr ramiro suspects that for (i < size) { y[i] <<= 6; u[i] <<= 6; v[i] <<= 6; } is slower than for (i < size) y[i] <<= 6; for (i < size) u[i] <<= 6; for (i < size) v[i] <<= 6;

18:04 <ramiro> Lynne: is it always faster to process one plane at a time, when they're independent, than processing them all at once? for example ld1/lsl/st1 per plane, or ld1/ld1/ld1/lsl/lsl/lsl/st1/st1/st1
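
The two loop shapes being compared can be written out as follows. The function names are hypothetical and this says nothing about which variant is faster, which is the open question here; it only shows that both compute the same result when the planes are independent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift one plane in place (10-bit samples to the 16-bit range). */
static void shift_plane(uint16_t *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        p[i] = (uint16_t)(p[i] << 6);
}

/* Variant 1: all three planes in a single loop. */
static void shift_fused(uint16_t *y, uint16_t *u, uint16_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = (uint16_t)(y[i] << 6);
        u[i] = (uint16_t)(u[i] << 6);
        v[i] = (uint16_t)(v[i] << 6);
    }
}

/* Variant 2: one complete plane at a time. */
static void shift_split(uint16_t *y, uint16_t *u, uint16_t *v, size_t n)
{
    shift_plane(y, n);
    shift_plane(u, n);
    shift_plane(v, n);
}
```

Timing both variants on the target CPU, with realistic plane sizes and alignments, would be the way to settle the question rather than reasoning from the source alone.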

18:13 <Lynne> for arm, the mantra is "unused registers are wasted; instruction decoding and binary is cheap"

18:13 <Lynne> can you do both?

18:16 <ramiro> Lynne: sure, but what about memory access? reading from 3 planes and writing to 3 planes at once, or reading from and writing to 1 plane at a time?

18:16 <Lynne> as for the if+shifts, the former is faster imho unless you're in branch-heavy code (predictor has limited res) and somehow the compiler can cmov all

18:16 <ramiro> because currently planarCopyWrapper is faster than a very tight neon loop that does 3 planes at once.

18:17 <Lynne> cpu should be able to pipeline that, shouldn't it

18:18 <Lynne> also if on an in-order cpu, you can just manually move the loads and spread them out

18:18 IndecisiveTurtle has joined #ffmpeg-devel

18:21 <jkqxz> I would guess the two approaches are equal on a big-core CPU, but the all-in-one-loop might have pathological edge cases because of memory aliasing being caught out.

18:22 <haasn> ramiro: what is that solving?

18:22 <jkqxz> Each step is short and independent, so the CPU can happily fill up its rename capacity with however many of them fit regardless of whether they are together or not.

18:23 <jkqxz> But you could fall over in the together loop if it ever accidentally thinks that some of the paths alias (because most of the address bits are the same or something), and that will have huge negative consequences.

18:25 Teukka has quit [Read error: Connection reset by peer]

18:29 Teukka has joined #ffmpeg-devel

18:29 Teukka has quit [Changing host]

18:29 Teukka has joined #ffmpeg-devel

18:29 <ramiro> haasn: for a clear_val of 0x80, the end result would be 0x81, 0x82, 0x83... with this patch it always sets 0x80 as clear_val.

18:30 <haasn> ah gotcha

18:30 <haasn> that case was ignored in the x86 backend because only the high bit mattered

18:30 <ramiro> haasn: not much of an issue since they're all treated the same by pshufb and tbl, but still. it makes the assembly cleaner.

18:36 iive has joined #ffmpeg-devel

19:53 rvalue- has joined #ffmpeg-devel

19:54 rvalue has quit [Ping timeout: 272 seconds]

19:59 Guest71 has joined #ffmpeg-devel

20:00 rvalue- is now known as rvalue

20:25 mkver has quit [Ping timeout: 252 seconds]

20:28 Traneptora has joined #ffmpeg-devel

20:49 Guest71 has quit [Quit: Client closed]

21:01 mkver has joined #ffmpeg-devel

21:04 novaphoenix has quit [Quit: i quit]

21:04 novaphoenix has joined #ffmpeg-devel

21:22 lemourin has quit [Ping timeout: 245 seconds]

21:23 IndecisiveTurtle has quit [Ping timeout: 265 seconds]

21:29 IndecisiveTurtle has joined #ffmpeg-devel

21:33 IndecisiveTurtle has quit [Ping timeout: 265 seconds]

21:38 novaphoenix has quit [Quit: i quit]

21:39 lemourin has joined #ffmpeg-devel

21:48 novaphoenix has joined #ffmpeg-devel

22:15 <fflogger> [newticket] cus: Ticket #11581 ([avformat] WAV demuxer codec probe misdetects PCM data as MP3) created https://trac.ffmpeg.org/ticket/11581

23:03 <fflogger> [newticket] Anton1699: Ticket #11582 ([ffmpeg] Please add an option to make the new "elapsed" stat optional) created https://trac.ffmpeg.org/ticket/11582

23:08 IndecisiveTurtle has joined #ffmpeg-devel