michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct
<mkver>
jamrial, Gramner: The hevc_idct_transpose_dxd functions in lavc/x86/hevc/idct.asm are only called from the same assembly file, yet they are marked as cglobal and therefore exported from said file. What would need to be used instead of cglobal to change this?
<cone-982>
ffmpeg Andreas Rheinhardt master:7684243fbe6e: fftools/textformat/avtextformat: Remove unused variable
minimal has quit [Quit: Leaving]
<Gramner>
i usually put helper functions as a sublabel of another function, assuming you don't make use of any generated prologue/epilogue code
iive has quit [Quit: They came for me...]
thilo has quit [Ping timeout: 252 seconds]
thilo has joined #ffmpeg-devel
<haasn>
ramiro: I am thinking that all asm implementations will compile this to some variant of pshufb
<haasn>
So no need to support variants
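For reference, a minimal scalar sketch of what a pshufb-style byte shuffle does on one 16-byte register, and how a fixed mask turns it into a swizzle of packed 4x8 pixels. The function names `shuffle16` and `swizzle_rgba_to_bgra` and the RGBA to BGRA mask are illustrative assumptions, not code from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the SSSE3 pshufb instruction on one 16-byte register:
 * output byte i is src[mask[i] & 15], or 0 when bit 7 of mask[i] is set. */
static void shuffle16(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t mask[16])
{
    uint8_t tmp[16]; /* temporary so that dst may alias src */

    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Hypothetical helper: swap the first and third component of packed
 * 4x8 pixels (e.g. RGBA -> BGRA), four pixels per shuffle. */
static void swizzle_rgba_to_bgra(uint8_t *pixels, size_t npixels)
{
    static const uint8_t mask[16] = {
         2,  1,  0,  3,
         6,  5,  4,  7,
        10,  9,  8, 11,
        14, 13, 12, 15,
    };

    assert(npixels % 4 == 0); /* sketch only: no tail handling */
    for (size_t i = 0; i < npixels; i += 4)
        shuffle16(pixels + 4 * i, pixels + 4 * i, mask);
}
```

Any fixed byte permutation within a pixel, including a byte swap, is just a different mask here, which is why a single shuffle instruction can cover all of these cases.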
<haasn>
I am more worried about GPU implementations
<haasn>
They want to do something like a subgroup shuffle
<haasn>
But that can only shuffle across adjacent pixels
<Lynne>
what about simple shared memory + lookup table?
<haasn>
I guess one possible implementation strategy would be to broadcast into clusters of the same size as the pixel
IndecisiveTurtle has quit [Ping timeout: 268 seconds]
compn has quit [Read error: Connection reset by peer]
LainIwakura has quit [Quit: Client closed]
compn has joined #ffmpeg-devel
derpydoo has quit [Read error: Connection reset by peer]
compn has quit [Read error: Connection reset by peer]
compn has joined #ffmpeg-devel
linkmauve has joined #ffmpeg-devel
Marth64[m] has joined #ffmpeg-devel
Marth64 has quit [Ping timeout: 272 seconds]
witchymary has quit [Remote host closed the connection]
jamrial has joined #ffmpeg-devel
witchymary has joined #ffmpeg-devel
ngaullier has quit [Ping timeout: 252 seconds]
blb has quit [Ping timeout: 245 seconds]
blb has joined #ffmpeg-devel
ngaullier has joined #ffmpeg-devel
minimal has joined #ffmpeg-devel
<haasn>
ramiro: I wrote code to optimize read{packed 4x8} + swizzle + write{packed 4x8} into a single read{planar 1x32} + shuffle + write{planar 1x32} but it turned out to actually make it _slower_
<haasn>
I think the problem there is that the planar 1x32 read code only reads a single register group, so it processes 64 pixels at a time instead of the 256 pixels processed by the packed 4x8 read/write code
<haasn>
I _could_ just go ahead and define special variants for exactly those functions that can expand to use all available registers
<haasn>
but I really want to know if there's a better way to solve this on a fundamental level without resorting to jit
blb has quit [Quit: brb]
blb has joined #ffmpeg-devel
Traneptora has joined #ffmpeg-devel
Marth64[m] has quit [Remote host closed the connection]
Sean_McG has quit [Ping timeout: 268 seconds]
Sean_McG has joined #ffmpeg-devel
<haasn>
ramiro: I think the best way forward is to give the x86 ops a prepass that promotes an exclusively 1/2-component pipeline to a "4 component" pipeline on 4x the block size
<haasn>
with a special read/write function for this case
<haasn>
I'll have a think about it
<haasn>
that would also allow us to have basically 4x the throughput on other 1 component pipelines like scaling gray->gray
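One possible reading of the promotion idea, as a scalar sketch: when every op in a pipeline treats all components identically, a 1-component buffer of n pixels can be handed to a 4-component kernel as n/4 "pixels". The names `lut_4comp` and `lut_gray_promoted` and the choice of a lookup table as the op are assumptions made for illustration only; the real ops and their read/write functions are not shown in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-component kernel: applies the same 8-bit lookup table
 * to each component of `npixels` packed 4x8 pixels. */
static void lut_4comp(uint8_t *data, size_t npixels, const uint8_t lut[256])
{
    for (size_t i = 0; i < npixels; i++) {
        uint8_t *p = data + 4 * i;
        p[0] = lut[p[0]];
        p[1] = lut[p[1]];
        p[2] = lut[p[2]];
        p[3] = lut[p[3]];
    }
}

/* Promotion: view n gray pixels as n/4 four-component pixels, so that
 * one kernel invocation covers four times as many gray pixels. */
static void lut_gray_promoted(uint8_t *gray, size_t n, const uint8_t lut[256])
{
    assert(n % 4 == 0); /* sketch only: a real prepass needs tail handling */
    lut_4comp(gray, n / 4, lut);
}
```

This only works for ops that are uniform across components; anything that mixes or distinguishes components (such as a swizzle) would see the reinterpreted data differently.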
Teukka has quit [Read error: Connection reset by peer]
Teukka has joined #ffmpeg-devel
Teukka has quit [Changing host]
Teukka has joined #ffmpeg-devel
mkver has quit [Ping timeout: 265 seconds]
Anthony_ZO has quit [Ping timeout: 260 seconds]
psykose has quit [Remote host closed the connection]
<haasn>
in retrospect I'm not sure I even see a need for a generic SHUFFLE op, I think this is better handled by the individual implementations
<haasn>
I think I will revert it and go back to just having a swap_bytes op
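For context, a minimal scalar sketch of what a byte-swap op amounts to for 16-bit samples. The name `swap_bytes16` and its signature are assumptions for illustration; the actual op definition is not shown in the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit sample in place
 * (e.g. to convert between big- and little-endian storage). */
static void swap_bytes16(uint16_t *samples, size_t n)
{
    assert(samples || n == 0);
    for (size_t i = 0; i < n; i++)
        samples[i] = (uint16_t)((samples[i] << 8) | (samples[i] >> 8));
}
```

Applying the function twice restores the original data, which makes it easy to sanity-check an optimized version against this reference.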
psykose has joined #ffmpeg-devel
ngaullier has quit [Remote host closed the connection]
mkver has joined #ffmpeg-devel
minimal has quit [Quit: Leaving]
IndecisiveTurtle has joined #ffmpeg-devel
kasper93 has quit [Remote host closed the connection]
kasper93 has joined #ffmpeg-devel
k777 has joined #ffmpeg-devel
<jkqxz>
Is there any documentation on how to write checkasm things?
<jkqxz>
Or the simplest example.
<jkqxz>
I have my asm which passes a load of randomised testing against the reference already, really I am looking for the benchmarking part.
System_Error has quit [Remote host closed the connection]
Sean_McG has quit [Quit: leaving]
k777 has quit [Remote host closed the connection]
System_Error has joined #ffmpeg-devel
jamrial has quit []
<ramiro>
haasn: could you save the packed->planar optimization somewhere so that I can have a look later? this could also include swap_bytes and expand
jamrial has joined #ffmpeg-devel
<jamrial>
jkqxz: a very simple one is checkasm/jpeg2000dsp.c
<jkqxz>
Thank you, that one is simpler.
<jkqxz>
Is there a normal way to get more consistent results? (I'm assuming frequency scaling stuff is messing with me.)
<jamrial>
make sure to call bench_new with the same amount of data to process every time, use big buffers, and yeah, disable power saving features so cpu frequency doesn't fluctuate
<jkqxz>
I'm doing one 8x8 block at a time. Should I make lots of them and do them all together?
<jamrial>
probably
<jamrial>
oh, maybe also run with the same seed every time, so it's the same data
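The advice above (same amount of data on every call, big buffers, fixed seed) can be sketched outside checkasm as a small standalone harness. This is not checkasm's API: in checkasm the timing is done by bench_new, while here `fill_blocks`, `bench_blocks`, the xorshift generator, the restricted value range, and the stand-in kernel `negate_block` are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BLOCK_COEFFS 64 /* one 8x8 block */

typedef void (*block_fn)(int16_t *dst, const int16_t *src);

/* Small deterministic PRNG (xorshift32); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill nblocks 8x8 blocks with values in [-512, 511]. The same seed
 * always yields the same data, so runs are comparable. The narrow range
 * is an assumption meant to avoid mostly saturated outputs. */
static void fill_blocks(int16_t *buf, size_t nblocks, uint32_t seed)
{
    uint32_t state = seed ? seed : 1;

    for (size_t i = 0; i < nblocks * BLOCK_COEFFS; i++)
        buf[i] = (int16_t)((int)(xorshift32(&state) & 0x3FF) - 512);
}

/* Stand-in kernel for the function under test (hypothetical). */
static void negate_block(int16_t *dst, const int16_t *src)
{
    for (int i = 0; i < BLOCK_COEFFS; i++)
        dst[i] = (int16_t)-src[i];
}

/* Process all blocks `runs` times and return the fastest run in clock()
 * ticks. The source is read-only, so every run sees identical input and
 * does the same amount of work. */
static clock_t bench_blocks(block_fn fn, int16_t *dst, const int16_t *src,
                            size_t nblocks, int runs)
{
    clock_t best = 0;

    assert(runs > 0);
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        for (size_t b = 0; b < nblocks; b++)
            fn(dst + b * BLOCK_COEFFS, src + b * BLOCK_COEFFS);
        clock_t elapsed = clock() - start;
        if (r == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Taking the minimum over several runs is one common way to reduce noise from frequency scaling and scheduling; it does not replace disabling power saving features.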
<jkqxz>
My test data is currently pretty terrible. Picking values uniformly over the whole range means the outputs are mostly saturated, but I don't think much of what I am doing should be data-dependent.
kasper93 has quit [Quit: kasper93]
kasper93 has joined #ffmpeg-devel
kasper93_ has joined #ffmpeg-devel
kasper93__ has joined #ffmpeg-devel
kasper93 has quit [Ping timeout: 252 seconds]
kasper93_ has quit [Ping timeout: 252 seconds]
kasper93__ is now known as kasper93
kasper93_ has joined #ffmpeg-devel
kasper93__ has joined #ffmpeg-devel
kasper93 is now known as Guest2094
Guest2094 has quit [Killed (copper.libera.chat (Nickname regained by services))]
kasper93__ is now known as kasper93
kasper93_ has quit [Ping timeout: 248 seconds]
kasper93 has quit [Quit: kasper93]
kasper93 has joined #ffmpeg-devel
kasper93 has joined #ffmpeg-devel
kasper93 is now known as Guest8026
Guest8026 has quit [Killed (lead.libera.chat (Nickname regained by services))]
kasper93 has quit [Client Quit]
kasper93 has joined #ffmpeg-devel
kasper93 has joined #ffmpeg-devel
kasper93 is now known as Guest6365
Guest6365 has quit [Killed (platinum.libera.chat (Nickname regained by services))]
mkver has quit [Ping timeout: 244 seconds]
witchymary has quit [Remote host closed the connection]
witchymary has joined #ffmpeg-devel
rvalue has quit [Read error: Connection reset by peer]