#ffmpeg-devel on 2025-05-09 — irc logs at libera.catirclogs.org

2025-03-03 01:04 michaelni changed the topic of #ffmpeg-devel to: Welcome to the FFmpeg development channel | Questions about using FFmpeg or developing with libav* libs should be asked in #ffmpeg | This channel is publicly logged | FFmpeg 7.1.1 has been released! | Please read ffmpeg.org/developer.html#Code-of-conduct

00:19 thilo has quit [Ping timeout: 248 seconds]

00:21 thilo has joined #ffmpeg-devel

01:12 iive has quit [Quit: They came for me...]

01:12 \\Mr_C\\ has quit [Remote host closed the connection]

01:23 kasper93 has quit [Remote host closed the connection]

01:27 kasper93 has joined #ffmpeg-devel

02:06 realies9 has joined #ffmpeg-devel

02:08 realies has quit [Ping timeout: 276 seconds]

02:08 realies9 is now known as realies

03:00 mkver has quit [Ping timeout: 248 seconds]

03:15 Martchus_ has joined #ffmpeg-devel

03:16 Martchus has quit [Ping timeout: 252 seconds]

03:18 jamrial has quit []

03:37 _whitelogger has joined #ffmpeg-devel

03:45 secondcreek has joined #ffmpeg-devel

03:55 MisterMinister has joined #ffmpeg-devel

03:56 MisterMinister has quit [Remote host closed the connection]

03:57 MisterMinister has joined #ffmpeg-devel

04:18 secondcreek1 has joined #ffmpeg-devel

04:19 kepstin has quit [Ping timeout: 252 seconds]

04:19 kepstin has joined #ffmpeg-devel

04:20 secondcreek has quit [Ping timeout: 276 seconds]

04:20 secondcreek1 is now known as secondcreek

04:31 bcheng has quit [Ping timeout: 252 seconds]

04:31 bcheng has joined #ffmpeg-devel

05:00 System_Error has quit [Remote host closed the connection]

05:01 Kei_N_ has quit [Remote host closed the connection]

05:01 Kei_N has joined #ffmpeg-devel

05:05 Kei_N has quit [Ping timeout: 248 seconds]

05:07 Kei_N has joined #ffmpeg-devel

05:07 System_Error has joined #ffmpeg-devel

05:26 ___nick___ has joined #ffmpeg-devel

05:31 cone-078 has joined #ffmpeg-devel

05:31 <cone-078> ffmpeg Manuel Lauss master:9369ebf23806: avcodec/sanm: recognize common FOBJ sizes

05:31 <cone-078> ffmpeg Manuel Lauss master:7f0b7b049696: avcodec/sanm: ignore codec48 compression type 6

05:31 <cone-078> ffmpeg Manuel Lauss master:37064b2d1661: avcodec/sanm: support "StarWars - Making Magic" video

05:37 ___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

05:47 lka has joined #ffmpeg-devel

05:48 lka has quit [Client Quit]

06:34 secondcreek has quit [Ping timeout: 244 seconds]

06:36 bcheng has quit [Ping timeout: 272 seconds]

06:37 bcheng has joined #ffmpeg-devel

06:37 blb has quit [Ping timeout: 248 seconds]

06:39 blb has joined #ffmpeg-devel

07:26 lemourin has quit [Ping timeout: 260 seconds]

07:45 lemourin has joined #ffmpeg-devel

07:50 lemourin has quit [Client Quit]

07:52 lemourin has joined #ffmpeg-devel

07:55 secondcreek has joined #ffmpeg-devel

08:11 <fflogger> [newticket] Size4658: Ticket #11583 ([undetermined] ffmpeg with certain files will output getting a 1kB output file) created https://trac.ffmpeg.org/ticket/11583

08:31 cone-078 has quit [Quit: transmission timeout]

09:47 j45 has quit [Ping timeout: 276 seconds]

09:47 j45 has joined #ffmpeg-devel

09:47 j45 has quit [Changing host]

09:47 j45 has joined #ffmpeg-devel

09:49 aaabbb has quit [Changing host]

09:49 aaabbb has joined #ffmpeg-devel

10:12 \\Mr_C\\ has joined #ffmpeg-devel

10:19 DodoGTA has quit [Quit: DodoGTA]

10:21 DodoGTA has joined #ffmpeg-devel

10:22 <haasn> if we were to implement temporal dithering in swscale, should the dither offset index be based on the frame PTS or the output frame count? (or configurable as an expression?)

10:23 <haasn> basing it on the frame PTS would lead to a reproducible result when seeking

10:23 <haasn> (most likely in terms of the swscale API it would just be an arbitrary 8-bit integer, so this is more of a question of how vf_scale should set this integer)

10:24 <nevcairiel> is there a real difference ultimately?

10:24 <haasn> probably not, I guess you wouldn't enable temporal dither when you need reproducible results

10:25 <haasn> then maybe it should not even be a configurable expression and libswscale should just directly use the frame PTS internally, saves us from even having to add API for it

10:35 \\Mr_C\\ has quit [Remote host closed the connection]

10:37 <Lynne> haasn: I'd prefer based on PTS, since its technically more correct, and you would have bitexact result between chunks extracted from streams

10:38 <haasn> input frame or output frame?

10:39 <haasn> probably input frame, since you're more likely to cut off the input than the output stream

10:40 <haasn> for vf_scale it doesn't matter since output pts = input pts

10:44 <Lynne> yeah, input frame

11:05 <BtbN> haasn: would you say what you're doing with swscale can be used to also make it work on CUDA frames, by generating PTX code or something? Not sure how it all works yet.

11:05 <haasn> BtbN: yes

11:05 <haasn> that's a stated design goal

11:06 <haasn> well, not 100% sure about CUDA internals and how easy it is to autogenerate code at runtime there

11:06 <haasn> but on vulkan you can pretty straightforwardly generate SPIR-V kernels

11:06 <haasn> and I'm sure you can probably ingest SPIR-V kernels on CUDA if you try hard enough

11:06 <BtbN> Auto-Generating actual C++ CUDA code is in theory possible, but needs a closed source compiler library (or interfacing with libllvm I guess?)

11:06 <BtbN> But the pure driver API absolutely can consume PTX "Assembly" at runtime and build it

11:08 <haasn> at worst you can probably stitch together individual kernels for each op

11:08 <haasn> like runtime linking of precompiled modules / kernels

11:08 <Lynne> I'd prefer to interface with LLVM and generate LLVM-IR directly

11:08 <BtbN> rcombs has build something like this for Plex before, so it's 100% possible

11:08 <Lynne> that can then be used to compile into ptx and spirv

11:09 <BtbN> i.e. scale_cuda in Plex-Ffmpeg works very different compared to ours

11:09 <BtbN> The problem with interacing with LLVM is that it's a royal mess

11:09 <Lynne> it's just a C++ API?

11:09 <BtbN> I'm dealing with this at work via numba/llvmlite

11:09 <BtbN> and the llvm folks break ABI and API every release

11:10 <BtbN> And not just a little bit

11:10 <Lynne> ah, opaque pointers?

11:10 <BtbN> not sure, since I only see that the numba folks need over a year to implement support for each new llvm major version

11:10 <BtbN> But they almost have to start over from scratch each time

11:11 <BtbN> In theory it's a great idea though, but very maintenance intensive

11:11 <BtbN> I thought about that for scale_cuda before, but discarded it due to being massively overkill

11:13 <haasn> damn SSIM really hates blue noise

11:13 <Lynne> I hate glsl with the force of 10000 suns, and my plan has been to convert all glsl code to opencl, which can be partially compiled and linked during runtime with llvm

11:13 <haasn> since it's higher frequency than ordered dither

11:13 <haasn> and I guess a bit less mathematically precise?

11:14 <haasn> my eyes prefer blue noise though

11:14 <BtbN> it might unironically be easier to invoke clang via cli than to interface with libllvm. The cli interface is more stable.

11:14 <haasn> Lynne: I still have high hopes for directly generating SPIR-V in swscale2

11:16 <Lynne> the thing is that this involves going through spir-v assemblers if you do it textually

11:16 <Lynne> ...which is khronos code

11:16 <haasn> maybe will have to steal put_bits.h from avcodec

11:16 <JEEB> BtbN: yea that sounds quite possible

11:16 <haasn> no, I mean obviously as binary

11:16 <JEEB> or libclang I guess?

11:17 <BtbN> Does that exist?

11:17 <Lynne> yeah

11:17 <Lynne> that's what I meant by LLVM

11:17 <BtbN> How is it different from libllvm?

11:17 <Lynne> it runs a frontend

11:17 <BtbN> ah, fair

11:18 <BtbN> maybe it's so bad for the numba folks since they are effectively building a frontend for Python there, out-of-tree of llvm

11:18 <JEEB> yea so you're not directly interfacing with LLVM, but instead I think the interface is on the C / clang layer.

11:18 <JEEB> at least for editor stuff it seems stable enough, since I've used it both for sublime text as well as vscode

11:20 <BtbN> If sws can then generate something llvm can compile, it should be trivially possible to make it output nvptx

11:20 <BtbN> well, maybe not trivially

11:30 <Lynne> I'll write the vulkan backend with that in mind

11:32 <BtbN> if it can output spir-v via llvm, it can output nvptx

11:32 <BtbN> All that's needed then is some scaffolding that generated ptx code can be plugged into

11:34 minimal has joined #ffmpeg-devel

11:40 microlappy has joined #ffmpeg-devel

11:43 kasper93_ has joined #ffmpeg-devel

11:43 kasper93 is now known as Guest8861

11:43 Guest8861 has quit [Killed (osmium.libera.chat (Nickname regained by services))]

11:43 kasper93_ is now known as kasper93

11:46 <fflogger> [editedticket] cus: Ticket #11583 ([undetermined] ffmpeg with certain files will output getting a 1kB output file) updated https://trac.ffmpeg.org/ticket/11583#comment:1

11:50 microlappy has quit [Quit: Konversation terminated!]

11:53 mkver has joined #ffmpeg-devel

11:55 <IndecisiveTurtle> There are lightweight libraries for emitting spirv in directly in binary, but for c++

11:56 <kierank> •haasn> if we were to implement temporal dithering in swscale, should the dither offset index be based on the frame PTS or the output frame count? (or configurable as an expression?)

11:56 <kierank> surely this is more of an avfilter thing

11:56 <kierank> sws doesn't care about pts right now

11:57 <haasn> s/swscale/vf_scale/

12:05 <Lynne> IndecisiveTurtle: grats to you too

12:05 <Lynne> do you know when you might have time to fix the issues from the last rounds of review for the vc2 encoder?

12:08 <IndecisiveTurtle> Thx xD, I've been chipping away at those the past week

12:08 <IndecisiveTurtle> I finished most of the cosmetic ones, fixed the leak and implemented the LUT optimizations in the shader. Now I'm looking for interlacing support

12:14 <IndecisiveTurtle> I believe I also found a bug in cpu encoder, for some interlaced videos it crashes with "Assertion n <= s->buf_end - s->buf_ptr failed at libavcodec/put_bits.h:390"

12:17 <Lynne> I think you can skip on interlacing for now

12:24 <IndecisiveTurtle> I felt kinda bad gutting all that code for something seemgly simple to add, but okay

12:27 <IndecisiveTurtle> Other than that there is the const casting comment which I can't really solve without a separate patch, as ff_vk functions require a mutable pointer

12:29 <Lynne> which functions?

12:29 jamrial has joined #ffmpeg-devel

12:30 <IndecisiveTurtle> ff_vk_exec_add_dep_frame, ff_vk_create_imageviews, ff_vk_frame_barrier etc all of them receive a non-const AVFrame*

12:32 <Lynne> just cast it

12:32 <Lynne> we do that in the ffv1enc_vulkan.c code too

12:33 <Lynne> it's unavoidable

12:34 jamrial has quit [Ping timeout: 252 seconds]

12:36 <IndecisiveTurtle> Andres suggested to rewrite the vulkan code to avoid it, so I thought of going over them and seeing which ones can be changed to make them const correct and simply cast on those cant need mutable. For a lot of them it seems possible from a first look

12:37 <IndecisiveTurtle> But I'll leave it you if that should be done in the end or not

12:37 <IndecisiveTurtle> (If I need to do it I mean)

12:39 <Lynne> mkver: accessing vulkan frames requires locking a mutex and incrementing a semaphore, so it's not really possible to make the avframe truly const

12:40 <Lynne> we could make the frame const, but access the payload buffer as non-const, but I think it's fine leaving it as-is

12:40 jamrial has joined #ffmpeg-devel

12:49 jamrial has quit [Ping timeout: 244 seconds]

13:25 minimal has quit [Quit: Leaving]

13:47 rvalue has quit [Read error: Connection reset by peer]

13:48 rvalue has joined #ffmpeg-devel

14:33 jamrial has joined #ffmpeg-devel

15:20 <fflogger> [newticket] jr_clifton: Ticket #11584 ([ffprobe] ffprobe returns "n/a" for bitrate in opus audio file) created https://trac.ffmpeg.org/ticket/11584

15:48 <BtbN> In what state actually is the sws work? Like, could I start helping testing/devloping it for CUDA, or is that too early?

15:55 zsoltiv has quit [Ping timeout: 260 seconds]

15:56 zsoltiv_ has quit [Ping timeout: 248 seconds]

16:02 Traneptora has quit [Quit: Quit]

16:07 <BtbN> What I'm slightly afraid of is llvm becoming a dependency, since it's HUGE and would quadruple the size of the static builds

16:13 mkver has quit [Ping timeout: 248 seconds]

16:16 Traneptora has joined #ffmpeg-devel

16:17 cone-126 has joined #ffmpeg-devel

16:17 <cone-126> ffmpeg Marvin Scholz master:aeea2defe4fb: avformat/apvdec: remove unused variable

16:31 <fflogger> [newticket] paulpacifico: Ticket #11585 ([ffmpeg] Error converting 10bit to 8bit and vice-versa (Operation not supported)) created https://trac.ffmpeg.org/ticket/11585

17:59 <rcombs> BtbN: is there any reason the codegen would need to happen at runtime instead of compile-time? the clang integration in the main ffmpeg build system is perfectly fine

17:59 <BtbN> It's how I understand new-swscale is working

17:59 <BtbN> it generates code for the desired conversion, builds it for the desired target (local CPU, SPIR-V, NVPTX, ...), and runs it

18:00 <BtbN> With the code apparently being LLVM IL?

18:00 <BtbN> Pre-Generating all possible conversion would be a GIGANTIC ball of code

18:01 <BtbN> scale_cuda is also to a degree suffering from that, with its relatively small subset of conversion

18:05 <rcombs> so the problem there is combinatorials, right?

18:06 <rcombs> there's a pretty straightforward trick I used for that in CUDA tonemapping, which reduces the problem massively

18:07 <BtbN> I think the goal for swscale is maximum performance

18:07 <BtbN> so it can build a conversion kernel optimized for the local CPU

18:07 <rcombs> in the CUDA code, I declare some `extern const __constant__` variables, and branch on them

18:07 <BtbN> and as a side effect, it can also build those kernels for GPUs

18:08 <rcombs> and then I generate a PTX assembly string defining those variables' values

18:08 <Lynne> BtbN: llvm is a hard dep on mesa, so I'm not too worried about it on linux

18:08 <BtbN> If the OS provides it, sure

18:08 <rcombs> the compiler then sees that the values will be constant at runtime and inlines them

18:08 <BtbN> but for static builds, I think the binary sice would crack 1GB, up from 100MB or so

18:09 <rcombs> so you get the same performance as if doing the branching in C++ with templates (or with generated code)

18:09 <rcombs> but without needing to generate any actual CUDA source (or LLVM IR or whatever)

18:10 <BtbN> Yeah, for a pure CUDA implementation, something like that is fine

18:10 <BtbN> but I think the goals for new swscale are bigger

18:10 <rcombs> similar things are straightforwardly possible for vulkan and such, afaik?

18:11 <rcombs> for the local CPU, uhhhhhhhhh

18:11 <rcombs> you have a number of problems there, including "the system won't always allow you to JIT anyway"

18:11 <BtbN> you can jit all you like

18:11 <BtbN> every emulator under the sun would be defunct if you couldn't

18:11 <Lynne> its GPU code, so you can JIT

18:12 <BtbN> It's about the CPU "kernels" swscale would generate

18:12 <Lynne> its not generating anything now

18:12 <BtbN> Well, but it's the goal, isn't it?

18:12 <Lynne> no, x86 is regular SIMD

18:12 <BtbN> ah

18:13 <rcombs> JIT on macOS and iOS requires an entitlement; you can opt into it on macOS (it's a bit annoying but it's not too bad) but on iOS it requires special permission

18:13 <BtbN> Apple can just get lost then

18:13 <rcombs> look I don't like it either but it's a major target platform

18:14 <BtbN> They get dumb C implementations then, nothing else to be done

18:15 <rcombs> and the macOS approach here is pretty reasonable (needing to explicitly opt in to enable the JIT API)

18:15 <rcombs> I mean, there are entirely reasonable assembly implementations today

18:15 <Lynne> ramiro went on to experiment with runtime JIT for aarch64 SIMD but that's just a side project; I wouldn't be inclined to accept it

18:15 <BtbN> On both Linux and Windows you just need to explicitly allocate executable memory

18:16 <haasn> BtbN: bit late to the party; let me clarify some things

18:17 <haasn> 1) now would be as good a time as any to attempt a gpu backend, yes; especially as we will probably need to change at least some parts of the API for it

18:17 <rcombs> the entitlement means that it's harder to exploit a compromised process, since just calling mmap/mprotect with the relevant flags would fail unless the executable itself was flagged as "yes I actually use that feature"

18:17 <haasn> best get that out of the way

18:17 <haasn> but I already committed and wrote a full x86 SIMD backend so I don't expect major changes at this point

18:18 <BtbN> I'm curious though if that SIMD would be beaten by llvm compiled native code for the current platform

18:18 <haasn> BtbN: it generates an SwsOpList; the level of abstraction we're dealing with can be inferred readily from https://github.com/haasn/FFmpeg/blob/swscale6/libswscale/ops.h#L44

18:19 <haasn> it's not really comparable to LLVM IL and we don't have any LLVM or JIT code in my branch currently

18:19 <haasn> an LLVM backend would be a fun project but not something I'm relying on

18:19 mkver has joined #ffmpeg-devel

18:19 <BtbN> I think manually generating NVPTX from that is more realistic

18:19 <haasn> (the ideal case here would be if we could use the same backend to also generate CUDA and SPIR-V code)

18:19 <BtbN> It's not that hard of an assembly language

18:19 <haasn> I think manually generated SPIR-V binary is also more realistic than linking against LLVM

18:20 <rcombs> can confirm nvptx is pretty easy

18:20 <haasn> FFmpeg loves to NIH C format generators and SPIR-V looks like a particularly friendly binary format

18:20 <rcombs> helps that it targets an abstract machine with, like, hundreds of registers

18:20 <rcombs> and that branching on constants is literally free

18:21 <haasn> 3) we don't need to JIT for SIMD backends, we could also just stitch together kernels "by hand" - they have a pretty regular / obvious shape; just look for the JMP instruction and instead put the next kernel there

18:21 <haasn> though I'm not convinced it will give any performance benefit on modern CPUs with working branch predictors

18:21 <BtbN> Lynne: something you might be interested in is new stuff in CUDA 12.9 regarding Vulkan interop. Apparently one can now create a CUDA context from a Vulkan one, so they are effectively the same context, making stuff more efficient.

18:21 <haasn> since the jumps are all non-dependent and point to the same destination 100% of the time

18:22 <Lynne> I'd still rather link against LLVM, since the compiler can optimize the code further and we'd get both spirv and ptx out of it, and I'm intending to switch the existing vulkan code anyway

18:23 <Lynne> BtbN: but what about the underlying objects?

18:23 <haasn> 4) even if you can't gen the binary at runtime, you don't need necessarily need to pre-generate kernels for every possible format conversion at compile time, if you can generate a kernel for every possible operation (a few dozen) and stitch/link them together

18:23 <BtbN> Lynne: what do you mean?

18:23 <Lynne> can you export buffers and images in between APIs freely?

18:24 <haasn> rcombs: you don't even need to branch at all, everything is SSA, every operation is strictly linear with only one input and one output

18:24 <rcombs> in theory a jump has a slight cost even if it's perfectly predicted, but in practice unless your EU utilization is *crazy* good, it's fine

18:24 <BtbN> To a degree, yeah

18:24 <rcombs> haasn: I mean using branching to "stitch together" different operations

18:24 <haasn> ah, sure

18:25 <haasn> we can also quite easily provide a guarantee about the maximum number of ops

18:25 <rcombs> like, in tonemap_cuda there are a bunch of different tonemap algorithms, and they all get inlined into a single function with branching on a constant, which is free at runtime

18:25 <haasn> so you could unroll every kernel into a fixed size loop of switch/case statements and plug in the constants later

18:26 <rcombs> Lynne: if we wanted, we could probably build out a small collection of C++ templates that would let us write shader files that target both NVPTX and SPIR-V in a single shader

18:27 <haasn> BtbN: we can make the C implementation "fast enough" if we want to, in my original design for swscale3 I had _only_ the C code

18:27 <haasn> just requires compiling the template with -mfpu=neon or w/e

18:27 <Lynne> rcombs: it's nothing that we couldn't do with C

18:27 <haasn> and it was still faster than the old x86 code (with -mavx2)

18:27 <haasn> obviously the new hand written AVX2 is way faster

18:27 <rcombs> ¯\_(ツ)_/¯ the compiler's C++ anyway, no reason not to make use of templates

18:28 <Lynne> what compiler?

18:28 <rcombs> clang targeting NVPTX or SPIR-V

18:29 <haasn> BtbN: I still think we should write a NEON backend though that doesn't rely on JIT, but uses the same approach as the current x86 backend

18:30 <haasn> (If STF accepts my proposal I would get contracted to do that anyways)

18:30 <BtbN> We absolutely should have a NEON backend. What'd be the alternative for fast ARM code?

18:32 <Lynne> rcombs: if the API is C++, sure

18:34 <rcombs> I mean, for NVPTX the underlying API is assembly, but using templates was still very valuable

18:34 <rcombs> think of it like macros but better

18:36 <Lynne> though I foresee a libclang implementation accepting opencl code strings rather than C

18:37 <Lynne> but that's if they fix it in time

18:38 <Lynne> I'd like to get STF to fund a Vulkan backend, and right now it segfaults when I ask it to output vulkan-flavoured spirv out of opencl

18:38 <Lynne> so runtime SPIRV gen looks more likely

18:40 <Lynne> unlike other vulkan code we have, it's less likely to change, as it builds up everything from small primitives

19:17 cone-126 has quit [Quit: transmission timeout]

19:50 <haasn> ramiro: how do you handle dithering in your backend atm?

19:50 <haasn> I am thinking about how to handle sizes larger than 16x16, which requires access to the current x coordinate, something I don't currently track at all

19:50 <ramiro> haasn: can't you use exec.x and exec.y?

19:51 <haasn> I made exec immutable by the backend

19:51 <haasn> I guess the obvious answer is to change that

19:51 <ramiro> why would you make it immutable?

19:54 <linkmauve> > 20:08:24 Lynne> BtbN: llvm is a hard dep on mesa, so I'm not too worried about it on linux

19:54 <linkmauve> Only for radeonsi (and even there it can work with ACO nowadays) and llvmpipe, and the rusticl frontend. You can compile Mesa without.

20:02 <ramiro> haasn: I modified ff_sws_solve_shuffle() a bit: https://github.com/ramiropolla/ffmpeg/commit/5b3ef87b

20:03 <haasn> not sure I follow the point

20:04 <haasn> why not just use vector_size as `size`?

20:06 ramiro has quit [Ping timeout: 252 seconds]

20:08 ramiro has joined #ffmpeg-devel

20:09 <ramiro> haasn: sorry, internet went down. I saw your replies on some irclogs online

20:10 <ramiro> this way I can do any combination of input/output format by always writing full vectors, even if it means working on chunks of 96 bytes (input or output).

20:10 <ramiro> 240 combinations use the shuffle solver. some are much much faster, most are faster, a few are the same speed, and a couple are a tiny little bit slower.

20:10 <ramiro> there is no over-reading or over-writing, no unused chunks of vector

20:12 <Lynne> linkmauve: I know, but for most purposes, its a hard dep as most distros compile with lavapipe and such

20:31 cone-710 has joined #ffmpeg-devel

20:31 <cone-710> ffmpeg Michael Niedermayer master:9230c93cc9fd: avcodec/rv60dec: inter also fails with qp >= 32

20:31 <cone-710> ffmpeg Michael Niedermayer master:43926e026dd8: avcodec/mmvideo: fix palette index

20:31 <cone-710> ffmpeg Michael Niedermayer master:4e5523c98597: avcodec/hevc/ps: Fix dependant layer id check

20:31 <cone-710> ffmpeg Michael Niedermayer master:ce1fd73d637a: avformat/iff: Check nb_channels == 0 in MHDR

20:32 <fflogger> [editedticket] fedegratti: Ticket #9408 ([avdevice] WASAPI Audio Input/Output Support) updated https://trac.ffmpeg.org/ticket/9408#comment:5

20:36 <fflogger> [editedticket] tibaldo: Ticket #9408 ([avdevice] WASAPI Audio Input/Output Support) updated https://trac.ffmpeg.org/ticket/9408#comment:6

20:58 <averne> Lynne: Hey, thanks :) I've been away for a bit and haven't been able to work on stuff, but things are finally aligning for me to get some free time

21:00 <averne> By the way, do you know of any projects doing shader-based/assisted media decoding (aside from ffmpeg and nvidia's jpeg thing)? As I'm supposed to study existing art during the first 3 weeks.

21:12 <Lynne> not really, the ffv1 code is representative enough

21:15 <averne> Yeah I was also planning to look at the VC2 code from last year's gsoc

21:15 mkver has quit [Remote host closed the connection]

21:16 mkver has joined #ffmpeg-devel

21:17 <averne> Ideally I'd also like to reverse nvidia's jpeg shaders, to get acquainted with low-level shader stuff. I dumped them from the cuvid library a while ago, but that might be a lot of effort

21:17 <Lynne> most of the other projects I've seen (jpeg) mostly implemented codecs with floats, which is fine if you're presenting on screen once, but we want the output from the software and compute decoders to match

21:18 <averne> nvidia's thing does NV12 decode iirc

21:22 <Lynne> not a great idea tbh

21:23 <Lynne> the issue is that unless you decode components simultaneously, you would have to load, insert the value into the vector, store

21:24 <Lynne> so for ffv1 decoding we use yuv420p instead

21:27 <averne> Oh, they do decode to yuv420p, then launch a separate shader to merge the chroma planes into the final output

21:27 <averne> https://files.catbox.moe/ilc7o1.png <- nsight trace I've captured of some random jpg

21:28 realies has quit [Quit: ~]

21:29 <Lynne> yeah, we don't need to do that thankfully

21:31 Anthony_ZO has joined #ffmpeg-devel

21:33 realies9 has joined #ffmpeg-devel

21:36 <Lynne> btw when its time to optimize I highly recommend using mesa+rgp

21:37 <Lynne> nvidia's tools do not support pure compute programs

21:37 <Lynne> only mesa can debug compute-only code bu dumping rgp which can then be read

21:38 <averne> Thanks for the tip, I admit I have little experience in that way

21:39 <averne> Does Geo Ster hang around here? I see that he was accepted into the prores encoder project, might be worth exchanging early to see if we can commonize some stuff

21:41 <Lynne> yeah, IndecisiveTurtle

21:42 <Lynne> there's also pmozil from last year who wrote a vc2 decoder, but he hasn't had time to work on it (due to being in a literal warzone)

21:43 <Lynne> he did finish it, you can find the code here - https://github.com/pmozil/FFmpeg

21:43 <averne> Nice, and yeah I've seen the VC2 stuff when browsing prior gsoc editions

21:46 user23 has joined #ffmpeg-devel

21:46 user23 has quit [Excess Flood]

21:46 user23 has joined #ffmpeg-devel

21:46 <IndecisiveTurtle> averne: Hi xd Yes I did vc2 encoder last year as well

21:48 <averne> Oh right I knew I saw your name from the previous edition

21:49 <averne> Anyway I was thinking there might be some code the decoder and encoder could share, eg. (i)DCT

21:50 <IndecisiveTurtle> Is it the exact same operation, cause on vc2 the encoder applies wavelet transform while decoder reverses it

21:51 <IndecisiveTurtle> Or at least that is what I understood, haven't looked at the diracdec.c file much

21:52 <averne> I think? The matrices will change but in essence it's the same operation. So you could upload different matrices as UBOs or use a #define

21:53 MetaNova has quit [Ping timeout: 276 seconds]

21:53 <Lynne> no, they're completely different

21:54 <Lynne> we don't use matrices for wavelets

21:55 <IndecisiveTurtle> Ideally I want to use subgroups again for this DCT, need to study the code and see if its possible

21:55 <Lynne> prores uses regular 8x8 DCTs which you can get away with a matrix mult (but you shouldn't, full matrix mults are almost always slower than a traditional 2D DCT)

21:56 <averne> Yeah I wasn't talking about wavelets or VC2, I don't know anything about it. But I thought DCT was representable by malmuts?

21:56 <IndecisiveTurtle> Not sure about nvidia, but amd gpus translate matrix ops into a dozen fma instructions

21:56 <Lynne> I guess you could try to use the coop matrix extension, which would make it probably worth it

21:57 <averne> Yeah that would be very nice to use

21:57 <Lynne> using it is very much a total pain

21:57 <IndecisiveTurtle> I'm not sure if my GTX 1650 even supports that

21:57 <averne> Heh why? Is it because it runs in accelerators separate from the main shader cores?

21:58 <IndecisiveTurtle> The vk extension mentions SPV_NV_tensor_addressing being a dependency so yeah it probably needs tensor cores

21:59 <averne> IndecisiveTurtle: I have a 3050 so I should be able to do it. One of the issues I identified is precision, nvidia only supports up to 16b floats iirc

22:00 <IndecisiveTurtle> I think we need to use integers for the transforms to maintain bit accuracy

22:00 <Lynne> averne: no, the interface is utter pain to use

22:00 <Lynne> the glsl syntax

22:01 <IndecisiveTurtle> slang switch wen /s

22:01 <averne> IndecisiveTurtle: Oh that shouldn't be a problem then, I think? At least they have u8 variants here https://github.com/jeffbolznv/vk_cooperative_matrix_perf/tree/master/shaders

22:01 MetaNova has joined #ffmpeg-devel

22:01 <Marth64> Hello all. I am very sorry all for disappearing due to moving which was then followed by some IRL problems, I was underwater. I plan to be active again.

22:02 <IndecisiveTurtle> We probably need full 32-bit integers, but I'm not super sure

22:03 <Lynne> IndecisiveTurtle: never, they expect you to static link the libraries

22:03 <Lynne> because the language changes every 5 minutes

22:04 <Lynne> you can get away with 8*8->16bit DCTs

22:08 <IndecisiveTurtle> I see

22:09 <Lynne> but really, whatever's fastest

22:12 <IndecisiveTurtle> On amd gpus a whole 8x8 block can fit inside a wave hmm

22:16 <IndecisiveTurtle> I think it could work with 4 threads per row too, so it fills an nvidia subgroup. But it will be easiest to implement it first as a thread per row and then optimize it

22:25 kasper93_ has joined #ffmpeg-devel

22:25 kasper93 has quit [Killed (tantalum.libera.chat (Nickname regained by services))]

22:25 kasper93_ is now known as kasper93

22:36 <kierank> Marth64: hi

22:44 <Marth64> Hello!

22:44 <kasper93> o/

22:44 <kasper93> don't mind me, I'm just passing by

22:46 <Marth64> \o

23:15 kode54 has quit [Ping timeout: 252 seconds]

23:31 cone-710 has quit [Quit: transmission timeout]