michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
halloy5771 has quit [Read error: Connection reset by peer]
<cone-867>
ffmpeg Emma Worley master:854b8690a628: Add myself to MAINTAINERS for dxv/dxvenc
<fflogger>
[editedticket] CosmicSkye: Ticket #9996 ([ffmpeg] Write joc_complexity_index to dec3 (EAC3SpecificBox), Windows and Android need it to play atmos) updated https://trac.ffmpeg.org/ticket/9996#comment:14
<Lynne>
michaelni: err, you've added a person with 0 commits to maintainers?
<Lynne>
oh, nevermind
mkver has quit [Ping timeout: 248 seconds]
usagi_mimi has quit [Quit: WeeChat 4.6.3]
jamrial has quit []
Martchus has joined #ffmpeg-devel
usagi_mimi has joined #ffmpeg-devel
Martchus_ has quit [Ping timeout: 252 seconds]
System_Error has quit [Ping timeout: 264 seconds]
<cone-867>
ffmpeg Emma Worley master:6fdb54ddee69: lavc/hashtable: create generic robin hood hash table
<cone-867>
ffmpeg Emma Worley master:2de0d095b84f: lavc/dxvenc: migrate DXT1 encoder to lavc hashtable
<cone-867>
ffmpeg Emma Worley master:d4556c98f02e: lavc/dxvenc: improve compatibility with Resolume products
System_Error has joined #ffmpeg-devel
halloy5771 has joined #ffmpeg-devel
halloy5771 has quit [Quit: halloy5771]
System_Error has quit [Remote host closed the connection]
<ramiro>
haasn: on swscale6_clean, by just setting "sws[1]->flags = mode.flags | SWS_UNSTABLE;", I get a failure in "./libswscale/tests/swscale -unscaled 1 -src yuv444p -dst argb": SSIM {Y=0.925251 U=0.975193 V=0.962438 A=1.000000}, loss 0.0660362 is WORSE by 0.0660214, expected loss 1.48416e-05
<ramiro>
also with -cpuflags 0. the error oscillates between SSIM {Y=0.788072 U=0.857437 V=0.786510 A=1.000000} and SSIM {Y=0.660499 U=0.932519 V=0.900159 A=1.000000} with the same command line on different runs
<ramiro>
valgrind doesn't complain. this is... odd. I haven't investigated further
<ramiro>
for neon I have no issues with cross-lane shuffles. I have a shuffle mask of up to 128 bytes, and the num_groups calculation is a little bit different
<haasn>
ramiro: can't you just always set size = 128 and then only use the subset that you care about? (i.e. the largest gcd of *read_bytes and *write_bytes that is a multiple of the lane size)
<haasn>
not a huge fan of leaking the internal representation from inside ff_sws_solve_shuffle to the caller, but we can find a different solution
<haasn>
e.g. we could have a minimum_size and a maximum_size
<haasn>
or a lane_size and vector_size
<haasn>
where lane_size = vector_size if cross-lane shuffles are not supported
<haasn>
(or lane_size and max_lanes)
<fflogger>
[newticket] Wallboy: Ticket #11621 ([ffmpeg] Add datetime/time prefix support for FFREPORT as well) created https://trac.ffmpeg.org/ticket/11621
<thardin>
haasn: I told the local spotify guy about your scaling work. he sounded mighty impressed
<ramiro>
haasn: setting size=128 and recalculating block_size/read_bytes/write_bytes after the call to ff_sws_solve_shuffle() works. it's not pretty, but it works. thanks for the suggestion!
<haasn>
thardin: I don't suppose they would be interested in funding it?
<haasn>
ramiro: I was thinking that we should be able to output a separate or_mask to handle clearing to nonzero values
<haasn>
or does NEON have a magic value that clears to 0xff?
<ramiro>
haasn: no, I use an or mask. outputting a separate or_mask that supports all consts is a superior solution.
<haasn>
okay, I will implement that (in a bit)
<haasn>
useful for x86 as well
<haasn>
I am a bit wary about growing the shuffle solver too much, since the idea was to _avoid_ having so many bespoke fast paths, in favor of having a fast general solution
<haasn>
but we just can't beat existing asm without _some_ level of fast shuffle
<haasn>
the alternative idea I had floating in my mind was to have a dedicated SWS_OP_SHUFFLE and optimize packed reads etc down to shuffle instructions
<ramiro>
btw I think there was no non-0xff const in the conversions that used the shuffle solver. I think I ended up getting rid of that TODO from my commit
<haasn>
(this could be a generalization of byte swapping)
<ramiro>
hadn't you already tried that and decided to do it otherwise? it would require some generalized packed/planar byte representation or something
<haasn>
I don't remember exactly why I abandoned that idea
<ramiro>
thardin: also if they're interested in funding neon optimizations also let me know :P
<ramiro>
haasn: time to dig irclogs :)
<haasn>
I was thinking gray -> vuyx could need a non-1/0 clear but gray is always full range
<haasn>
anyways, I guess it's low importance
<haasn>
but I think given that the implementation is basically the same anyways, I might as well write the code to support all clear values
<haasn>
some food for thought is that if we end up splitting components into separate chains like I proposed several times, we may end up with some chains that actually have no SWS_OP_READ
<haasn>
consisting of just SWS_OP_CLEAR and SWS_OP_WRITE
<haasn>
you could revive your memset fast path for those
mkver has joined #ffmpeg-devel
jamrial has joined #ffmpeg-devel
<thardin>
haasn: good question
<thardin>
I think they have enough compute on the backend. biggest problem seems to be parallelizing decode
<ramiro>
haasn: yes, I kind of already do that with asmjit. if it's clear+(optional bswap)+write in a planar format, the clear is done only once in setup
<fflogger>
[editedticket] francoisk: Ticket #11620 ([avutil] av_malloc_array() and av_realloc_array(): nmemb and size arguments transposed) updated https://trac.ffmpeg.org/ticket/11620#comment:2
<ramiro>
haasn: another thing, one benefit of the memops neon backend I had written is that it aligns the writes. I haven't checked the impact vs just having a tight loop and unaligned writes, but if this is how memset/memcpy are implemented in libc, I guess it must be worth it.
<haasn>
ramiro: what if you define av_memset16 and av_memset32 in libavutil?
<ramiro>
haasn: that would be cleaner and make more sense
<ramiro>
I had also added a left shift operator, but that's too specific. it might be better to just drop it and have a normal loop in the asmjit backend.
<ramiro>
oh, and bswap16/32. that one might be useful as well.
<haasn>
we could allow the packed shuffle solver to handle those
<haasn>
I guess there's no real reason it currently forbids single plane outputs
<ramiro>
haasn: I tried accepting either input or output as planar, but I quickly gave up. I just didn't try hard enough. but yes, that would help. especially with independent planes
<haasn>
that + plane splitting is my preferred solution here
<haasn>
rather than trying to modify the solver to support planar
<ramiro>
haasn: that helped with gray/yuvj444p/ya8/ya16be -> gray16[bl]e
<ramiro>
I'm looking forward to seeing plane splitting :)
<ramiro>
I think I'll finally be able to have all asmjit conversions being faster than legacy swscale
<ramiro>
currently there are only a dozen or two that are slower, down to 0.5 or 0.7 iirc.
Traneptora has quit [Quit: Quit]
minimal has joined #ffmpeg-devel
<haasn>
I kinda wanted to tackle scaling before plane splitting :p
<haasn>
but I may procrastinate from that just a bit longer...
<haasn>
Lynne: have you ever tried reducing the number of queues you allocate per VkDevice?
<Lynne>
no, not really, do you think this would help with OOMs?
<haasn>
nvidia in particular seems to take significantly longer to create devices the more queues you request, and I strongly doubt there is any practical performance benefit to allocating more than, say, 2 graphics queues
<fflogger>
[newticket] lelegard: Ticket #11622 ([undetermined] Low bitrate data PID in MPEGTS disrupts live output rate) created https://trac.ffmpeg.org/ticket/11622
<haasn>
at 16 graphics queues startup overhead is around 500 ms (!)
<Lynne>
wow
<Lynne>
that's pretty bad, yeah
<haasn>
there also seems to be a static limit of 64 queues per.. process, I think
<haasn>
actually I use only a single graphics queue as I never found any benefit to multiple
<haasn>
(in libplacebo)
<haasn>
if I try to create more than 7 VkDevices the 8th fails with VK_ERROR_INITIALIZATION_FAILED
<Lynne>
there's a performance increase from using multiple video queues, beyond what just using multiple submissions gets you
<haasn>
with 1 queue per device I can create up to 63
<Lynne>
I don't mind sending a patch to allocate 1 graphics queue unless it's the only queue
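For reference, the per-family queue count is fixed at device creation through `VkDeviceQueueCreateInfo`; a minimal fragment requesting a single graphics queue, assuming `phys_dev` and `graphics_family_index` were obtained earlier via `vkEnumeratePhysicalDevices` / `vkGetPhysicalDeviceQueueFamilyProperties` (sketch only, not the actual hwcontext_vulkan code):

```c
#include <vulkan/vulkan.h>

/* Fragment: create a VkDevice with exactly one graphics queue
 * instead of requesting the family's full queue count. */
VkDevice create_single_queue_device(VkPhysicalDevice phys_dev,
                                    uint32_t graphics_family_index)
{
    const float priority = 1.0f;
    VkDeviceQueueCreateInfo queue_info = {
        .sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .queueFamilyIndex = graphics_family_index,
        .queueCount       = 1, /* one queue, not the family maximum */
        .pQueuePriorities = &priority,
    };
    VkDeviceCreateInfo dev_info = {
        .sType                = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .queueCreateInfoCount = 1,
        .pQueueCreateInfos    = &queue_info,
    };
    VkDevice dev = VK_NULL_HANDLE;
    if (vkCreateDevice(phys_dev, &dev_info, NULL, &dev) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    return dev;
}
```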
englishm has quit [Ping timeout: 268 seconds]
DauntlessOne496 has joined #ffmpeg-devel
DauntlessOne49 has quit [Ping timeout: 272 seconds]
DauntlessOne496 is now known as DauntlessOne49
englishm has joined #ffmpeg-devel
kylophone has quit [Ping timeout: 268 seconds]
Son_Goku has quit [Ping timeout: 268 seconds]
mindfreeze has quit [Ping timeout: 252 seconds]
termos__ has quit [Ping timeout: 252 seconds]
kylophone has joined #ffmpeg-devel
zulleyy3 has quit [Ping timeout: 252 seconds]
<haasn>
Lynne: https://github.com/haasn/vulkan_limits I don't suppose you mind seeing if it affects your machine as well, and is not something weird about my environment / docker container setup?
<fflogger>
[editedticket] Balling: Ticket #9996 ([ffmpeg] Write joc_complexity_index to dec3 (EAC3SpecificBox), Windows and Android need it to play atmos) updated https://trac.ffmpeg.org/ticket/9996#comment:15
<fflogger>
[editedticket] Noki0100: Ticket #11618 ([ffmpeg] hwupload filter fails with "Cannot allocate memory" for VA-API on AMD RX 7900 XT (Navi 31) preventing H.264/HEVC hardware encoding initialization.) updated https://trac.ffmpeg.org/ticket/11618#comment:4
<fflogger>
[editedticket] Noki0100: Ticket #11618 ([ffmpeg] hwupload filter fails with "Cannot allocate memory" for VA-API on AMD RX 7900 XT (Navi 31) preventing H.264/HEVC hardware encoding initialization.) updated https://trac.ffmpeg.org/ticket/11618#comment:5
linkmauve has left #ffmpeg-devel [Error from remote client]
<fflogger>
[newticket] giuseppeM99: Ticket #11623 ([ffplay] FFplay crashes when seeking in .ogg file with images) created https://trac.ffmpeg.org/ticket/11623