mmind00 changed the topic of #linux-rockchip to: Rockchip development discussion | public log at https://libera.irclog.whitequark.org/linux-rockchip
wens has quit [Ping timeout: 240 seconds]
wens has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 248 seconds]
ungeskriptet has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 268 seconds]
ungeskriptet has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 252 seconds]
ungeskriptet has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 252 seconds]
ungeskriptet has joined #linux-rockchip
Daanct12 has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 252 seconds]
ungeskriptet has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 260 seconds]
ungeskriptet has joined #linux-rockchip
lucaceresoli has quit [Quit: WeeChat 2.8]
System_Error has quit [Remote host closed the connection]
ungeskriptet has quit [Ping timeout: 260 seconds]
ungeskriptet has joined #linux-rockchip
System_Error has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 248 seconds]
ungeskriptet has joined #linux-rockchip
digetx has quit [Quit: No Ping reply in 180 seconds.]
digetx has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 276 seconds]
tlwoerner has quit [Remote host closed the connection]
tlwoerner has joined #linux-rockchip
ungeskriptet has joined #linux-rockchip
ungeskriptet has quit [Ping timeout: 252 seconds]
franoosh has joined #linux-rockchip
warpme has joined #linux-rockchip
ldevulder has joined #linux-rockchip
franoosh has quit [Ping timeout: 240 seconds]
franoosh has joined #linux-rockchip
chewitt has joined #linux-rockchip
stikonas has joined #linux-rockchip
xha has quit [Ping timeout: 252 seconds]
raster has joined #linux-rockchip
franoosh has quit [Read error: Connection reset by peer]
erg_ has joined #linux-rockchip
fleg has quit [Remote host closed the connection]
digetx has quit [Quit: No Ping reply in 180 seconds.]
sfo has quit [Remote host closed the connection]
digetx has joined #linux-rockchip
naoki has joined #linux-rockchip
naoki has quit [Client Quit]
stikonas has quit [Remote host closed the connection]
cbeznea has joined #linux-rockchip
digetx has quit [Remote host closed the connection]
digetx has joined #linux-rockchip
digetx has quit [Remote host closed the connection]
digetx has joined #linux-rockchip
digetx has quit [Client Quit]
digetx has joined #linux-rockchip
sfo has joined #linux-rockchip
fleg has joined #linux-rockchip
<diederik> mmind00: Can we still send a revert of 1631cbdb8089 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S") to Linus before 6.16 is released? And how would that work?
<mmind00> diederik: what is the issue with those LEDs?
<diederik> I've now done 20 warm reboots and 5 cold boots with 6.16-rc7 with that commit reverted and I haven't seen the hung-task problem yet
<diederik> mmind00: https://paste.sr.ht/~diederik/ea49022c6e8d371d69a8666262729197c0e5f740#NanoPi%20R5S:%20hung%20task%20(blocked%20on%20a%20mutex%20likely%20owned%20by%20task%20dhcpcd)-L1003 which in turn makes the WAN port not come up
<diederik> As that's the only port I'm using, that's a problem ... and a regression
<diederik> See also the discussion from 2025-07-18 20:20:33 (CEST)
<mmind00> I guess when you drop the "linux,default-trigger = "netdev";" line, the issue goes away?
ungeskriptet has joined #linux-rockchip
<diederik> could be and sounds logical, but so far I've only tried a revert of that commit to make sure that is indeed the culprit.
<diederik> but then again, without robmur01's hint I would have never thought it could be related to that commit
<mmind00> also, while that DT change triggers the issue, the error seems to be in the trigger routine
<mmind00> diederik: personally I'd like things minimal ... so if you could check if we could just drop the netdev trigger, that would be helpful
<mmind00> ah ... just realized those are all netdev triggers :-D
<diederik> yeah, the real problem is probably/possibly somewhere else and just triggered by my commit. So I figured that a proper investigation would be needed, but I'd prefer not to bring a regression into 6.16 before that is done
<diederik> If there was an issue with the netdev triggers, I'd have expected it more with the LAN ports
<mmind00> diederik: ok, so I guess I'll just do a revert, set you as "Reported-by" with your paste and send that to the armsoc people ... sounds ok?
<diederik> Yep
<diederik> I'll now do tests with the WAN trigger removed.
<diederik> If you prefer you could wait for that, but I don't/didn't know if it's possible and how long it would take to get it to Linus (before 6.16 is released)
<qschulz> diederik: worst case scenario you can ask for a backport to 6.16 and it would make it to 6.16.1 for example
<diederik> dropping the trigger on the WAN port isn't enough. Now I'll put that back and drop the triggers on the LAN ports
<robmur01> Got it: lock inversion between pid 615 and 758 - dev_change_flags holds rtnl_lock and ends up waiting for triggers_lock; meanwhile netdev_trig_activate() is trying to take rtnl_lock while led_trigger_register() holds triggers_lock
<robmur01> diederik: any chance you could rebuild with lockdep enabled, confirm the splat and report it?
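The lock inversion described above can be illustrated with a toy lock-order checker. This is only a minimal sketch of the general idea behind lock-order validation, not how the kernel's lockdep is implemented; the class, the function names and the task labels `task_a`/`task_b` are hypothetical, and only the lock names `rtnl_lock` and `triggers_lock` come from the discussion.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy validator: records 'held -> acquired' edges between locks and
    reports when a new acquisition would close a cycle in that graph."""

    def __init__(self):
        self.edges = defaultdict(set)   # lock -> locks taken while holding it
        self.held = defaultdict(list)   # task -> locks currently held

    def _reachable(self, src, dst):
        # Depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, task, lock):
        """Record that `task` takes `lock`; return any inversion reports."""
        reports = []
        for held in self.held[task]:
            # If lock -> ... -> held was seen before, held -> lock is an inversion.
            if self._reachable(lock, held):
                reports.append(
                    f"{task}: takes {lock} while holding {held}, "
                    f"but {lock} -> {held} was seen earlier"
                )
            self.edges[held].add(lock)
        self.held[task].append(lock)
        return reports

    def release(self, task, lock):
        self.held[task].remove(lock)


def replay_reported_inversion():
    """Replay the two orderings from the discussion, one after the other."""
    checker = LockOrderChecker()
    reports = []
    # One path holds rtnl_lock and then wants triggers_lock.
    reports += checker.acquire("task_a", "rtnl_lock")
    reports += checker.acquire("task_a", "triggers_lock")
    checker.release("task_a", "triggers_lock")
    checker.release("task_a", "rtnl_lock")
    # The other path holds triggers_lock and then wants rtnl_lock.
    reports += checker.acquire("task_b", "triggers_lock")
    reports += checker.acquire("task_b", "rtnl_lock")
    return reports
```

Note that the two paths never run concurrently in this replay: the conflicting order alone is enough to produce a report, which is why a lock-order validator can flag a potential deadlock even on a boot where the hang does not occur.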
<diederik> robmur01: I don't really understand what that means, but if you have a patch I'd be happy to try that (and report about it)
<diederik> Dropping the LAN port triggers wasn't enough either, so now testing with no netdev triggers
<robmur01> I mean can you try enabling PROVE_LOCKING in your kernel config, and boot with the triggers enabled - that should spit out a report of the deadlock condition, which you can then give to the netdev/LED maintainers to fix
<diederik> robmur01: I can/will do that :)
ungeskriptet has quit [Remote host closed the connection]
ungeskriptet has joined #linux-rockchip
<diederik> I've now warm rebooted 10 times in a row with no netdev triggers and that all went fine
System_Error has quit [Ping timeout: 244 seconds]
ungeskriptet has quit [Remote host closed the connection]
ungeskriptet has joined #linux-rockchip
System_Error has joined #linux-rockchip
<chewitt> now that I have HEVC working nicely on 3588/3576 I'm thinking .. I wonder what's needed for HDR to work?
<chewitt> however when I pick the commits, I end up with no DRM device for Kodi to render to
<chewitt> not sure if Cristian lurks here or not, but thought I'd pass that info along :)
<diederik> mmind00: I've now warm rebooted 20 times with the netdev triggers dropped, so I would be fine with a fixup instead of a full revert
<diederik> Currently building new kernel with PROVE_LOCKING for further investigation ...
<mmind00> diederik: nice ... but this one you should send to me :-)
<diederik> ok, I can do that :)
<mmind00> "mechanical changes" I can do myself, but when it comes to making stuff work, having seen things working on the actual hw is more helpful :-)
<diederik> done :)
<Daanct12> diederik: have you looked into rkvop2 module issue?
<diederik> Daanct12: not sure what issue you're referring to, but likely not
<Daanct12> so if you turn vop2 into a module (not builtin) your display would not work
<diederik> I have built a kernel where the order of the 10-bit and 8-bit formats was reversed, but haven't gotten around to actually testing that
<diederik> oh that one :) Piotr fixed that. Let me look up the ML post ...
<chewitt> diederik better fix than simply swapping the format order https://github.com/chewitt/linux/commit/9b0cff1056afbf8679a16b0c52231c6805a31bc5
<diederik> chewitt: awesome :) I was pretty sure swapping wasn't a/the proper fix, but it would 'prove' where the problem lies
<diederik> and I was also curious what effect that would have on 8-bit media
<chewitt> none that I could see
<chewitt> it was rendering 8-bit as NV12 and 10-bit as NV15, which was correct
<diederik> great :)
<chewitt> but the reorder was only ever a workaround until someone that actually reads/authors code eyeballed the real problem
<diederik> Yeah, my tests would just be for a '+1' on your findings :)
System_Error has quit [Remote host closed the connection]
<detlevc> chewitt: nice find! It was supposed to keep 420/8 and 420/10 indeed; the decoder (as far as I can tell) doesn't support HEVC 422
<detlevc> I will change that in the next version of the series
<chewitt> Alex Bee found/spotted the real problem, but good to see it will be fixed up
Daanct12 has quit [Quit: WeeChat 4.6.3]
System_Error has joined #linux-rockchip
xha has joined #linux-rockchip
mripard has joined #linux-rockchip
System_Error has quit [Remote host closed the connection]
System_Error has joined #linux-rockchip
lucaceresoli has joined #linux-rockchip
digetx has quit [Remote host closed the connection]
digetx has joined #linux-rockchip
dsimic has quit [Ping timeout: 240 seconds]
dsimic has joined #linux-rockchip
ldevulder has quit [Ping timeout: 240 seconds]
warpme has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<diederik> Interesting. Booted into the kernel with PROVE_LOCKING enabled ... and 'memtest' as kernel parameter and all 3 boot attempts resulted in OOPS :-O
<diederik> I'm going to assume each one will now OOPS; will boot into a different kernel and then drop the 'memtest' parameter
<robmur01> oof, yeah, all the extra lockdep work can indeed change the timing of things and make certain conditions more or less likely
<robmur01> that oops is now looking so suspiciously consistent that I'm less convinced it's random; it's just completely unrelated to the LED thing
<robmur01> definitely smells like whatever driver is failing probe there has corrupted a clock item in its devres list
<diederik> Didn't save OOPS 3, but did save OOPS 4 but now without memtest (but 'confusingly' named that paste OOPS 3): https://paste.sr.ht/~diederik/67cbf2b3daa9936a7ae039d208fd5db2974fdf86
<diederik> I just noticed that OOPS 1 was "error -EEXIST: failed to register extcon device" (again), but OOPS 2 & 3 are not
<robmur01> the "corruption" itself is also puzzling: not a bad pointer, or NULL, or some numeric value, nor even ASCII... just a load of random-looking bytes written over where a pointer should be... what does that?
<diederik> I have no idea how to interpret the printed out data, but if you click on line 1045 in OOPS 2 and line 447 in OOPS 3 and then switch between tabs ... there is a LOT of data the same
<diederik> or f.e. in the x0..x29 fields, they have either the same values, or there is/seems to be a consistent pattern/difference between the ones that do differ
<robmur01> yeah, the register state is likely to be pretty consistent for the same call stack - that "SUBSYSTEM=" in x16/17 is intriguing but I think unrelated :)
<diederik> ok :)
<robmur01> don't suppose you have DYNAMIC_DEBUG enabled so you can quickly boot 'dyndbg="file dd.c +p"' (or something to that effect) to signpost the driver probing?
<diederik> AFAIK I do have that enabled. I have used it a couple of times
<robmur01> if only the standard "failed to probe" message wasn't later than devres_release_all()...
<diederik> do you want me to literally use "file dd.c +p" ? Or was that just an example
<robmur01> just the ones in really_probe() should suffice to tell which driver is the culprit here
raster has quit [Quit: Gettin' stinky!]
<robmur01> my hunch is the USB phy, since that does have a clk_bulk_get_all() and is already sometimes implicated by the extcon errors...
stikonas has joined #linux-rockchip
<robmur01> SDHCI? :O
<diederik> I can also boot from SD card if that helps (probably need to install one/some extra kernels, but that's not a problem).
<diederik> I started using the eMMC as that was needed for MASKROM
<diederik> or is sdhci the (likely) culprit?
<robmur01> that's the last driver it started to probe before crashing, but it's possible they were on different threads
ungeskriptet has quit [Remote host closed the connection]
ungeskriptet has joined #linux-rockchip
<robmur01> trying the same with "maxcpus=1" is probably easier than rebuilding again to enable the printk CPU thing
vagrantc has joined #linux-rockchip
chewitt has quit [Quit: Zzz..]
<diederik> will try "maxcpus=1" first. What Kconfig option do I need for the printk CPU thing?
<robmur01> I think it's PRINTK_CALLER
<robmur01> often useful, often just irritatingly verbose :)
<diederik> it's indeed disabled, but I can start a new kernel build in the meantime :)
<diederik> "platform fe310000.mmc: bus: 'platform': really_probe: probing driver sdhci-dwcmshc with device" I guess that confirms your suspicion?
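Picking the last driver that started probing out of a long boot log can be scripted. A minimal sketch, assuming the debug lines look like the single one quoted above; the regular expression and the helper name `last_probed` are hypothetical, and other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the one quoted log line (assumption: other lines
# emitted by really_probe() follow the same layout).
PROBE_RE = re.compile(
    r"(?P<device>\S+): bus: '[^']+': really_probe: "
    r"probing driver (?P<driver>\S+) with device"
)


def last_probed(log_lines):
    """Return (device, driver) from the last matching line, or None."""
    last = None
    for line in log_lines:
        match = PROBE_RE.search(line)
        if match:
            last = (match.group("device"), match.group("driver"))
    return last
```

As noted earlier in the discussion, the last driver to start probing is only a hint when probes run on different threads, which is why booting with "maxcpus=1" makes the result easier to trust.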
<robmur01> oh FFS, there it is: sdhci_pltfm_free() right at the end of dwcmshc_probe()... guess where that "priv" area is that various devres things are still pointing to when it defers because the regulator isn't ready?
<robmur01> that's been broken nearly a year :(
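The failure mode described above (an explicit free of the host allocation while managed resources still point into its "priv" area) can be modelled with a toy allocator. This is a rough sketch of one reading of that description, not the driver code; `ToyHeap`, `deferred_probe` and the string contents are all hypothetical stand-ins.

```python
class ToyHeap:
    """Hands out numbered slots; a freed slot is reused by the next alloc."""

    def __init__(self):
        self.slots = {}
        self.free_list = []
        self.next_id = 0

    def alloc(self, contents):
        if self.free_list:
            slot = self.free_list.pop()
        else:
            slot = self.next_id
            self.next_id += 1
        self.slots[slot] = contents
        return slot

    def free(self, slot):
        # Contents linger until some later allocation reuses the slot.
        self.free_list.append(slot)


def deferred_probe(heap, free_host_early):
    """Model a probe that defers; return what the managed cleanup finds."""
    host = heap.alloc({"clk": "valid clk pointer"})    # host plus priv area
    devres = [host]                                    # managed entry pointing into priv
    if free_host_early:
        heap.free(host)                                # explicit free on the error path
        heap.alloc({"clk": "random-looking bytes"})    # unrelated user reuses the slot
    # Later, managed resources for the failed probe are released.
    seen = [heap.slots[slot]["clk"] for slot in devres]
    if not free_host_early:
        heap.free(host)
    return seen
```

In this model the cleanup only reads sensible data when the host allocation outlives the managed entries that refer to it; with the early free, whatever reused the memory shows up where a pointer was expected, which matches the "random-looking bytes" seen in the oops.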
erg_ has quit [Ping timeout: 248 seconds]
a3f has quit [Ping timeout: 248 seconds]
a3f has joined #linux-rockchip
helene has quit [Ping timeout: 248 seconds]
helene has joined #linux-rockchip
cbeznea has quit [Ping timeout: 240 seconds]
necessarypinch has quit [Quit: The Lounge - https://thelounge.chat]
necessarypinch has joined #linux-rockchip
ldevulder has joined #linux-rockchip
<diederik> robmur01: Thanks :) Would an output with PRINTK_CALLER still be useful?
ldevulder has quit [Ping timeout: 240 seconds]
<diederik> Will install it anyway; curious what it does :)
raster has joined #linux-rockchip
raster has quit [Quit: Gettin' stinky!]
digetx has quit [Remote host closed the connection]
digetx has joined #linux-rockchip