#linux-rockchip on 2025-07-22 — irc logs at libera.catirclogs.org

2023-07-10 07:36 mmind00 changed the topic of #linux-rockchip to: Rockchip development discussion | public log at https://libera.irclog.whitequark.org/linux-rockchip

00:23 wens has quit [Ping timeout: 240 seconds]

00:35 wens has joined #linux-rockchip

00:46 ungeskriptet has quit [Ping timeout: 248 seconds]

00:47 ungeskriptet has joined #linux-rockchip

00:58 ungeskriptet has quit [Ping timeout: 268 seconds]

01:04 ungeskriptet has joined #linux-rockchip

02:02 ungeskriptet has quit [Ping timeout: 252 seconds]

02:04 ungeskriptet has joined #linux-rockchip

02:35 ungeskriptet has quit [Ping timeout: 252 seconds]

02:36 ungeskriptet has joined #linux-rockchip

02:48 Daanct12 has joined #linux-rockchip

02:51 ungeskriptet has quit [Ping timeout: 252 seconds]

02:56 ungeskriptet has joined #linux-rockchip

03:04 ungeskriptet has quit [Ping timeout: 260 seconds]

03:07 ungeskriptet has joined #linux-rockchip

03:11 lucaceresoli has quit [Quit: WeeChat 2.8]

03:29 System_Error has quit [Remote host closed the connection]

03:34 ungeskriptet has quit [Ping timeout: 260 seconds]

03:34 ungeskriptet has joined #linux-rockchip

03:36 System_Error has joined #linux-rockchip

04:01 ungeskriptet has quit [Ping timeout: 248 seconds]

04:02 ungeskriptet has joined #linux-rockchip

04:03 digetx has quit [Quit: No Ping reply in 180 seconds.]

04:04 digetx has joined #linux-rockchip

04:10 ungeskriptet has quit [Ping timeout: 276 seconds]

04:37 tlwoerner has quit [Remote host closed the connection]

04:37 tlwoerner has joined #linux-rockchip

05:06 ungeskriptet has joined #linux-rockchip

05:16 ungeskriptet has quit [Ping timeout: 252 seconds]

05:50 franoosh has joined #linux-rockchip

06:35 warpme has joined #linux-rockchip

07:01 ldevulder has joined #linux-rockchip

07:15 franoosh has quit [Ping timeout: 240 seconds]

07:26 franoosh has joined #linux-rockchip

07:32 chewitt has joined #linux-rockchip

07:41 stikonas has joined #linux-rockchip

07:54 xha has quit [Ping timeout: 252 seconds]

08:37 raster has joined #linux-rockchip

08:57 franoosh has quit [Read error: Connection reset by peer]

08:57 erg_ has joined #linux-rockchip

09:02 fleg has quit [Remote host closed the connection]

09:03 digetx has quit [Quit: No Ping reply in 180 seconds.]

09:04 sfo has quit [Remote host closed the connection]

09:09 digetx has joined #linux-rockchip

09:22 naoki has joined #linux-rockchip

09:25 naoki has quit [Client Quit]

09:32 stikonas has quit [Remote host closed the connection]

09:50 cbeznea has joined #linux-rockchip

09:53 digetx has quit [Remote host closed the connection]

09:54 digetx has joined #linux-rockchip

09:59 digetx has quit [Remote host closed the connection]

10:00 digetx has joined #linux-rockchip

10:04 digetx has quit [Client Quit]

10:08 digetx has joined #linux-rockchip

10:10 sfo has joined #linux-rockchip

10:10 fleg has joined #linux-rockchip

10:37 <diederik> mmind00: Can we still send a revert of 1631cbdb8089 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S") to Linus before 6.16 is released? And how would that work?

10:38 <mmind00> diederik: what is the issue with those LEDs?

10:38 <diederik> I've now done 20 warm reboots and 5 cold boots with 6.16-rc7 with that commit reverted and I haven't seen the hung-task problem yet

10:39 <diederik> mmind00: https://paste.sr.ht/~diederik/ea49022c6e8d371d69a8666262729197c0e5f740#NanoPi%20R5S:%20hung%20task%20(blocked%20on%20a%20mutex%20likely%20owned%20by%20task%20dhcpcd)-L1003 which in turn makes the WAN port not come up

10:39 <diederik> As that's the only port I'm using, that's a problem ... and a regression

10:41 <diederik> See also the discussion from 2025-07-18 20:20:33 (CEST)

10:42 <mmind00> I guess when you drop the "linux,default-trigger = "netdev";" line, the issue goes away?

10:42 ungeskriptet has joined #linux-rockchip

10:44 <diederik> could be and sounds logical, but so far I've only tried a revert of that commit to make sure that is indeed the culprit.

10:44 <diederik> but then again, without robmur01's hint I would have never thought it could be related to that commit

10:45 <mmind00> also, while that DT change triggers the issue, the error seems to be in the trigger routine

10:46 <mmind00> diederik: personally I'd like things minimal ... so if you could check if we could just drop the netdev trigger, that would be helpful

10:46 <mmind00> ah ... just realized those are all netdev triggers :-D

10:47 <diederik> yeah, the real problem is probably/possibly somewhere else and just triggered by my commit. So I figured that a proper investigation would be needed, but I'd prefer not to bring a regression into 6.16 before that is done

10:48 <diederik> If there was an issue with the netdev triggers, I'd have more expected them with the LAN ports

10:50 <mmind00> diederik: ok, so I guess I'll just do a revert, set you as "Reported-by" with your paste and send that to the armsoc people ... sounds ok?

10:50 <diederik> Yep

10:55 <diederik> I'll now do tests with the WAN trigger removed.

10:57 <diederik> If you prefer you could wait for that, but I don't/didn't know if it's possible and how long it would take to get it to Linus (before 6.16 is released)

10:59 <qschulz> diederik: worst case scenario you can ask for a backport to 6.16 and it would make it to 6.16.1 for example

11:03 <diederik> dropping the trigger on the WAN port isn't enough. Now I'll put that back and drop the triggers on the LAN ports

11:15 <robmur01> Got it: lock inversion between pid 615 and 758 - dev_change_flags() holds rtnl_lock and ends up waiting for triggers_lock; meanwhile netdev_trig_activate() is trying to take rtnl_lock while led_trigger_register() holds triggers_lock

11:16 <robmur01> diederik: any chance you could rebuild with lockdep enabled, confirm the splat and report it?

11:18 <diederik> robmur01: I don't really understand what that means, but if you have a patch I'd be happy to try that (and report about it)

11:20 <diederik> Dropping the LAN ports triggers wasn't enough either, so now testing with no netdev triggers

11:22 <robmur01> I mean can you try enabling PROVE_LOCKING in your kernel config, and boot with the triggers enabled - that should spit out a report of the deadlock condition, which you can then give to the netdev/LED maintainers to fix
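
The lock inversion described at 11:15, and the kind of report a lock-order checker produces for it, can be sketched with a toy model. This is only an illustration in Python, not how the kernel's lockdep is implemented: the class `LockOrderTracker` and the task names `task_a` and `task_b` are hypothetical, and only the two lock names come from the discussion above. The idea is to record every "held lock, then acquired lock" pair and to warn when a new pair closes a cycle, even if the two tasks never actually run at the same time.

```python
from collections import defaultdict


class LockOrderTracker:
    """Toy lock-order checker: records 'held -> acquired' edges and flags cycles."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks taken while holding it
        self.held = defaultdict(list)  # task -> locks currently held, in order

    def _reachable(self, src, dst):
        # Depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, task, lock):
        """Record that task takes lock; return a warning string on inversion."""
        warning = None
        for held_lock in self.held[task]:
            # The new edge held_lock -> lock closes a cycle if an ordering
            # lock -> ... -> held_lock was already seen earlier.
            if self._reachable(lock, held_lock):
                warning = (f"possible circular locking: {task} takes {lock} "
                           f"while holding {held_lock}")
            self.edges[held_lock].add(lock)
        self.held[task].append(lock)
        return warning

    def release(self, task, lock):
        self.held[task].remove(lock)


tracker = LockOrderTracker()
results = []

# Hypothetical task_a: rtnl_lock first, then triggers_lock.
results.append(tracker.acquire("task_a", "rtnl_lock"))
results.append(tracker.acquire("task_a", "triggers_lock"))
tracker.release("task_a", "triggers_lock")
tracker.release("task_a", "rtnl_lock")

# Hypothetical task_b: triggers_lock first, then rtnl_lock (the inverse order).
results.append(tracker.acquire("task_b", "triggers_lock"))
results.append(tracker.acquire("task_b", "rtnl_lock"))

for result in results:
    print(result)
```

In this sketch task_a has already released both locks before task_b starts, so nothing deadlocks, yet the last acquire is still flagged. That is the point of asking for a lockdep run: the conflicting order can be reported on a boot where the hang itself does not happen.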

11:25 <diederik> robmur01: I can/will do that :)

11:30 ungeskriptet has quit [Remote host closed the connection]

11:31 ungeskriptet has joined #linux-rockchip

11:35 <diederik> I've now warm rebooted 10 times in a row with no netdev triggers and that all went fine

11:37 System_Error has quit [Ping timeout: 244 seconds]

11:50 ungeskriptet has quit [Remote host closed the connection]

11:51 ungeskriptet has joined #linux-rockchip

11:56 System_Error has joined #linux-rockchip

11:59 <chewitt> now that I have HEVC working nicely on 3588/3576 I'm thinking ... I wonder what's needed for HDR to work?

11:59 <chewitt> so this was timely https://patchwork.kernel.org/project/linux-rockchip/cover/20250721-rk3588-10bpc-v1-0-e95a4abcf482@collabora.com/

12:00 <chewitt> however when I pick the commits, I end up with no DRM device for Kodi to render to

12:01 <chewitt> not sure if Cristian lurks here or not, but thought I'd pass that info along :)

12:17 <diederik> mmind00: I've now warm rebooted 20 times with the netdev triggers dropped, so I would be fine with a fixup instead of a full revert

12:17 <diederik> Currently building new kernel with PROVE_LOCKING for further investigation ...

12:17 <mmind00> diederik: nice ... but this one you should send to me :-)

12:18 <diederik> ok, I can do that :)

12:18 <mmind00> "mechanical changes" I can do myself, but when it comes to making stuff work, having seen things working on the actual hw is more helpful :-)

12:38 <diederik> done :)

12:38 <Daanct12> diederik: have you looked into rkvop2 module issue?

12:39 <diederik> Daanct12: not sure what issue you're referring to, but likely not

12:40 <Daanct12> so if you turn vop2 into a module (not builtin) your display would not work

12:40 <diederik> I have built a kernel where the order of the 10-bit and 8-bit formats was reversed, but haven't gotten around to actually testing that

12:41 <diederik> oh that one :) Piotr fixed that. Let me look up the ML post ...

12:41 <diederik> https://lore.kernel.org/linux-rockchip/20250706083629.140332-2-pZ010001011111@proton.me/

12:50 <chewitt> diederik better fix than simply swapping the format order https://github.com/chewitt/linux/commit/9b0cff1056afbf8679a16b0c52231c6805a31bc5

12:52 <diederik> chewitt: awesome :) I was pretty sure swapping wasn't a/the proper fix, but it would 'prove' where the problem lies

12:53 <diederik> and I was also curious what effect that would have on 8-bit media

12:53 <chewitt> none that I could see

12:53 <chewitt> it was rendering 8-bit as NV12 and 10-bit as NV15, which was correct

12:54 <diederik> great :)

12:54 <chewitt> but the reorder was only ever a workaround until someone that actually reads/authors code eyeballed the real problem

12:54 <diederik> Yeah, my tests would just be for a '+1' on your findings :)

13:16 System_Error has quit [Remote host closed the connection]

13:17 <detlevc> chewitt: nice find! It was supposed to keep 420/8 and 420/10 indeed; the decoder (as far as I can tell) doesn't support HEVC 422

13:17 <detlevc> I will change that in the next version of the series

13:18 <chewitt> Alex Bee found/spotted the real problem, but good to see it will be fixed up

13:20 Daanct12 has quit [Quit: WeeChat 4.6.3]

13:23 System_Error has joined #linux-rockchip

14:03 xha has joined #linux-rockchip

14:07 mripard has joined #linux-rockchip

15:01 System_Error has quit [Remote host closed the connection]

15:08 System_Error has joined #linux-rockchip

15:12 lucaceresoli has joined #linux-rockchip

15:18 digetx has quit [Remote host closed the connection]

15:20 digetx has joined #linux-rockchip

15:32 dsimic has quit [Ping timeout: 240 seconds]

15:34 dsimic has joined #linux-rockchip

15:39 ldevulder has quit [Ping timeout: 240 seconds]

15:41 warpme has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

16:00 <diederik> Interesting. Booted into the kernel with PROVE_LOCKING enabled ... and 'memtest' as kernel parameter and all 3 boot attempts resulted in OOPS :-O

16:00 <diederik> Boot 1: https://paste.sr.ht/~diederik/75ea1cec45eedb52b9ed29da431620012b77d19f ; Boot 2: https://paste.sr.ht/~diederik/5a524626bc7e354902a11a4b19db65e9b6e0df46

16:01 <diederik> I'm going to assume each one will now OOPS; will boot into a different kernel and then drop the 'memtest' parameter

16:27 <robmur01> oof, yeah, all the extra lockdep work can indeed change the timing of things and make certain conditions more or less likely

16:29 <robmur01> that oops is now looking so suspiciously consistent that I'm less convinced it's random, just completely unrelated to the LED thing

16:29 <robmur01> definitely smells like whatever driver is failing probe there has corrupted a clock item in its devres list

16:32 <diederik> Didn't save OOPS 3, but did save OOPS 4, this time without memtest (and 'confusingly' named that paste OOPS 3): https://paste.sr.ht/~diederik/67cbf2b3daa9936a7ae039d208fd5db2974fdf86

16:33 <diederik> I just noticed that OOPS 1 was "error -EEXIST: failed to register extcon device" (again), but OOPS 2 & 3 are not

16:38 <robmur01> the "corruption" itself is also puzzling: not a bad pointer, or NULL, or some numeric value, nor even ASCII... just a load of random-looking bytes written over where a pointer should be... what does that?

16:40 <diederik> I have no idea how to interpret the printed out data, but if you click on line 1045 in OOPS 2 and line 447 in OOPS 3 and then switch between tabs ... there is a LOT of data the same

16:42 <diederik> or f.e. in the x0..x29 fields, they have either the same values, or there is/seems to be a consistent pattern/difference between the ones that do differ

16:44 <robmur01> yeah, the register state is likely to be pretty consistent for the same call stack - that "SUBSYSTEM=" in x16/17 is intriguing but I think unrelated :)

16:44 <diederik> ok :)

16:45 <robmur01> don't suppose you have DYNAMIC_DEBUG enabled so you can quickly boot 'dyndbg="file dd.c +p"' (or something to that effect) to signpost the driver probing?

16:45 <diederik> AFAIK I do have that enabled. I have used it a couple of times

16:46 <robmur01> if only the standard "failed to probe" message wasn't later than devres_release_all()...

16:46 <diederik> do you want me to literally use "file dd.c +p" ? Or was that just an example

16:49 <robmur01> just the ones in really_probe() should suffice to tell which driver is the culprit here

16:50 raster has quit [Quit: Gettin' stinky!]

16:54 <robmur01> my hunch is the USB phy, since that does have a clk_bulk_get_all() and is already sometimes implicated by the extcon errors...

16:58 stikonas has joined #linux-rockchip

16:58 <diederik> https://paste.sr.ht/~diederik/f9c72d9333d6107e5b799695af9dc443f41c12ad

17:07 <robmur01> SDHCI? :O

17:09 <diederik> I can also boot from SD card if that helps (probably need to install one/some extra kernels, but that's not a problem).

17:10 <diederik> I started using the eMMC as that was needed for MASKROM

17:10 <diederik> or is sdhci the (likely) culprit?

17:20 <robmur01> that's the last driver it started to probe before crashing, but it's possible they were on different threads

17:22 ungeskriptet has quit [Remote host closed the connection]

17:23 ungeskriptet has joined #linux-rockchip

17:23 <robmur01> trying the same with "maxcpus=1" is probably easier than rebuilding again to enable the printk CPU thing

17:27 vagrantc has joined #linux-rockchip

17:31 chewitt has quit [Quit: Zzz..]

17:31 <diederik> will try "maxcpus=1" first. What Kconfig option do I need for the printk CPU thing?

17:34 <robmur01> I think it's PRINTK_CALLER

17:34 <robmur01> often useful, often just irritatingly verbose :)

17:36 <diederik> it's indeed disabled, but I can start a new kernel build in the mean time :)

17:40 <diederik> 6.16-rc7 + dyndbg + maxcpus=1 : https://paste.sr.ht/~diederik/bd781503d4dc26ee768cb7d4c640038fede75419

17:41 <diederik> "platform fe310000.mmc: bus: 'platform': really_probe: probing driver sdhci-dwcmshc with device" I guess that confirms your suspicion?

17:51 <robmur01> oh FFS, there it is: sdhci_pltfm_free() right at the end of dwcmshc_probe()... guess where that "priv" area is that various devres things are still pointing to when it defers because the regulator isn't ready?

17:57 <robmur01> that's been broken nearly a year :(
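
The ordering problem described at 17:51 can be illustrated with a small simulation. This is a sketch under assumptions, not the driver's code: `Allocation`, `Device`, `probe` and `run` are hypothetical stand-ins, and the clock names are made up. It models a probe that registers a managed release callback pointing into the host's private area, then frees that area itself on the deferral path before the managed resources are released.

```python
class UseAfterFree(RuntimeError):
    pass


class Allocation:
    """Stand-in for the host structure and its private area."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self.freed = False

    def read(self, key):
        if self.freed:
            raise UseAfterFree(f"read of '{key}' after free")
        return self._fields[key]

    def free(self):
        self.freed = True


class Device:
    """Stand-in for a device with a list of managed (devres-style) callbacks."""

    def __init__(self):
        self._devres = []

    def devm_add(self, release):
        self._devres.append(release)

    def devres_release_all(self):
        # Managed resources are released in reverse order of registration.
        while self._devres:
            self._devres.pop()()


def probe(dev, regulator_ready, free_host_early):
    host = Allocation(clks=["core", "bus"])  # hypothetical clock names
    released = []
    # The managed callback keeps a reference into the host's private data.
    dev.devm_add(lambda: released.extend(host.read("clks")))
    if not regulator_ready:
        if free_host_early:
            host.free()  # freed while the callback above still points at it
        return "EPROBE_DEFER", host, released
    return "OK", host, released


def run(free_host_early):
    dev = Device()
    status, host, released = probe(dev, False, free_host_early)
    try:
        dev.devres_release_all()  # what happens after a failed probe
    except UseAfterFree as exc:
        return status, f"use-after-free: {exc}"
    host.free()
    return status, f"released {released}"


buggy = run(free_host_early=True)
safe = run(free_host_early=False)
print(buggy)
print(safe)
```

In the simulation the stale access raises an exception. In real memory nothing stops the read, so the release code simply sees whatever bytes now occupy the freed area, which would be consistent with the random-looking data noted at 16:38.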

18:28 erg_ has quit [Ping timeout: 248 seconds]

19:22 a3f has quit [Ping timeout: 248 seconds]

19:22 a3f has joined #linux-rockchip

19:23 helene has quit [Ping timeout: 248 seconds]

19:27 helene has joined #linux-rockchip

19:45 cbeznea has quit [Ping timeout: 240 seconds]

20:02 necessarypinch has quit [Quit: The Lounge - https://thelounge.chat]

20:04 necessarypinch has joined #linux-rockchip

20:11 ldevulder has joined #linux-rockchip

21:08 <diederik> robmur01: Thanks :) Would an output with PRINTK_CALLER still be useful?

21:13 ldevulder has quit [Ping timeout: 240 seconds]

21:22 <diederik> Will install it anyway; curious what it does :)

21:52 raster has joined #linux-rockchip

22:56 raster has quit [Quit: Gettin' stinky!]

23:41 digetx has quit [Remote host closed the connection]

23:42 digetx has joined #linux-rockchip