michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 8.0 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
System_Error has quit [Remote host closed the connection]
_whitelogger has joined #ffmpeg-devel
System_Error has joined #ffmpeg-devel
mkver has quit [Ping timeout: 255 seconds]
<Lynne>
kierank: could you do some light trolling on twitter?
<kierank>
Sure
<kierank>
As long as it's not the FFmpreg joke
<Lynne>
no, Blender refuses to implement a Vulkan compute renderer because they think it's a graphics-only thing, but they're perfectly happy to do so for Metal
<Lynne>
let me draft something and link their statement
<Lynne>
whilst we implemented encoders and decoders
<kierank>
Ok, you have 140 chars
<Lynne>
I thought X allowed up to 280, plus some exceptions
<kierank>
Yeah but I think it doesn't show
mkver has joined #ffmpeg-devel
<Lynne>
"We implemented whole codecs in Vulkan Compute, but @Blender refuse to implement it for Cycles rendering, relying on CUDA and HIP" == 120 chars minus "@Blender", since IIRC @s don't count
<Lynne>
you can cut after the last comma if you need it shorter
<Lynne>
I'm irked, since Vulkan *compute* even supports accelerated raytracing, but HIP and Metal don't
<kierank>
Hmm might need some rewording
<Lynne>
feel free to, you're a master at this
<kierank>
Also a bit unfair to link to something from 2023
<Lynne>
it's not that long ago
<kierank>
I'm good at trolling the compiler lovers
<kierank>
I think that's why Chris fettner followed us and unfollowed a week later lol
<Lynne>
lol
<kierank>
Too much trolling about how compilers suck as we need hand written asm
<Lynne>
the worst part is that you can't really work in their Vulkan GUI renderer (eevee) and expect the same code and materials to work in Cycles when you decide to render the project for real
<Lynne>
so you need to have CUDA, HIP or Metal installed and constantly compare if you ever expect to properly render your project in a realistic way
<Lynne>
or have your desktop be a $15k Xeon render monster
<galad>
I would say that Cycles is a bit more complex than one of the codecs implemented in FFmpeg
indecisiveturtle has quit [Ping timeout: 244 seconds]
funkylab_ has joined #ffmpeg-devel
<funkylab_>
hi! Assuming I were to improve libavfilter/ebur128's performance significantly, mostly by vectorizing loops that do things like s = \sum_{i=1}^{N_frames} samples[i]², or that are convolutional filters:
<funkylab_>
avoiding reinventing wheels, is there already a header/impl within the tree that offers things like "calculate the energy" or "convolve this with that"?
<funkylab_>
(just for reference, on a large corpus of MP3 and OPUS encoded music, the EBUR128 gating's squaring functionality takes about 10× the CPU that decoding MP3 and OPUS takes, which is not great, especially because the next biggest CPU consumer is ebur128_filter_short.)
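For readers following along, the kind of energy loop described above can be sketched in plain C. This is only an illustration under stated assumptions: `sum_of_squares` is a hypothetical helper name, not a function from FFmpeg or libebur128, and the four-way split is just one common way to give the compiler independent accumulation chains it can map onto SIMD lanes. Note that splitting the sum this way changes the order of the additions, so the result can differ from a strictly sequential sum in the last bits.

```c
#include <stddef.h>

/* Hypothetical helper (not an FFmpeg or libebur128 API):
 * returns the sum of samples[i]^2 for i in [0, n). */
static double sum_of_squares(const double *samples, size_t n)
{
    /* Four independent accumulators kept in locals: no stores through
     * pointers inside the loop, so nothing can alias the running sums. */
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    /* Main loop: process four samples per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += samples[i + 0] * samples[i + 0];
        acc1 += samples[i + 1] * samples[i + 1];
        acc2 += samples[i + 2] * samples[i + 2];
        acc3 += samples[i + 3] * samples[i + 3];
    }

    /* Combine the partial sums, then handle the 0..3 leftover samples. */
    double sum = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++)
        sum += samples[i] * samples[i];

    return sum;
}
```

Whether a given compiler actually vectorizes this depends on the optimization level and target flags; checking the generated assembly is the only way to be sure.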
<Compnn>
ehe
<Compnn>
funkylab_, there might be, but it's possible that the people who would know are afk right now. have to stick around to find out
<funkylab_>
I was also pretty surprised to find `perf top -a` to be so clear about ebur128_filter_short and ebur128_calc_gating_block
<funkylab_>
but that's where we are; I can see how the way the code is written is maximally hard for a compiler to automatically vectorize: enough redirections to have a hard time precluding aliasing, hence leading to very granular loads and stores,
<funkylab_>
and the math being all in floating point (double), which precludes a compiler from reordering summation, because a += b ; a += c is not the same as a += (b+c)
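The reordering point can be shown with a tiny C sketch. The function names `sum_left` and `sum_right` are made up for illustration, and the example assumes IEEE 754 doubles evaluated in double precision (as on typical x86-64 builds). Because the two groupings can give different results, a compiler may only reassociate such sums when explicitly allowed to, for example with GCC's `-fassociative-math` (implied by `-ffast-math`).

```c
/* Hypothetical helpers for illustration only. */

/* Adds in the order a sequential loop would: (a + b) + c. */
static double sum_left(double a, double b, double c)
{
    double s = a;
    s += b;
    s += c;
    return s;
}

/* Adds the last two terms first: a + (b + c). */
static double sum_right(double a, double b, double c)
{
    double t = b + c;
    return a + t;
}
```

With a = 1e16, b = -1e16 and c = 1.0, `sum_left` yields 1.0, while `sum_right` yields 0.0, because -1e16 + 1.0 rounds back to -1e16 before the large terms cancel.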
<Compnn>
funkylab_, previously haasn has done some ebur128 work. maybe he can enlighten the simd energy routines
<Compnn>
are you looking at the c routine or the simd route ?
<funkylab_>
oh there's a SIMD routine? I'm looking at the C routine, because that's what gets executed when my audio player calculates normalization factors!
<funkylab_>
ah found it
<funkylab_>
nice
<funkylab_>
see, that's why I came here :) big code base
<Compnn>
maybe we should detail the internal benchmark routine somewhere in the developer docs
<Compnn>
jamrial can also hint at ebur128 stuff
* Compnn
afk
<jamrial>
not really
<Compnn>
oh ok :D
<Compnn>
just saw you in some commits and patch reviews , my mistake
<haasn>
funkylab_: there are two ebu r128 implementations
<funkylab_>
haasn: hi :) yeah, I'm browsing through your work in libavfilter/x86/f_ebur*, and that seems to be the one that's not using up all my CPU
<haasn>
you are using the af_ebur128 filter or af_loudnorm?
<funkylab_>
haasn: good question, I came from the other end: strawberry full library rescan (that's an audio player) takes a long time, using perf to find ebur128_{filter_short,calc_gating_block} consuming about a 100% of a CPU core
<funkylab_>
let me quickly check what strawberry actually invokes
<funkylab_>
hahaha oh noes
<Compnn>
haasn, shh, we almost got him to speed up our code. instead you're telling him he's not using that code at all... /s
<funkylab_>
Compnn: you jinxed it ^^ kind of.
<haasn>
Compnn: I rather we merge the two impls
<Compnn>
agree
<Compnn>
before it turns into prores fiasco again
<haasn>
The truth is, the f_ebur128 filter got love only because there was a public bug bounty to improve it
<haasn>
But in an ideal world it wouldn’t exist
<funkylab_>
haasn, worse: local definition lookup (in my /usr/include) led me to wrongly believe it's using libavfilter's ebur128_calc_gating_block etc, but in fact, it's using the seemingly included libebur128 from https://github.com/jiixyj/libebur128
<funkylab_>
so, technically, I'm using neither
<Compnn>
lol
<funkylab_>
rof lol
<haasn>
Fun
<funkylab_>
I kinda still owe you an improvement, don't I, now?
<haasn>
Now would be a good time to try af_ebur128 then
<funkylab_>
oh, true
<haasn>
Iirc it’s close to memory bound
<haasn>
No, that was a different filter
<funkylab_>
It honestly kind of should be close to membound (in an ideal world, spherical normalizer in frictionless vacuum and all that)
<haasn>
But anyway the slowest part is audio resampling for the 192khz true peak
<Compnn>
funkylab_, i was so happy. someone is coming in and they know how to perf and optimize code... :D