<bslsk05>
bsky.app: @chordbug.bsky.social on Bluesky
<GeDaMo>
Yikes :|
<zid>
was already fixed internally 2 weeks ago apparently, but it's sort of crazy that a broken pow (or maybe a constant folding bug in the msvc version used?) made it so far up the chain
<heat>
yeah but who has tests anyway?
<zid>
osu, apparently ;)
<zid>
Also that is my nightmare about using random high level language shite
<nikolar>
zid: how the heck did that happen
<zid>
either a bug in pow itself, or a bug in msvc
<heat>
someone changed pow + no tests
<zid>
osu -> .net -> msvcrt.dll
<nikolar>
heat: yeah but why
<heat>
various reasons
<nikolar>
why did anyone touch pow lol
<heat>
speed, correctness
<nikolar>
after probably 2 decades
<zid>
might have been fucking with ifdefs
<zid>
for newer cpus
<zid>
and accidentally the wrong lines
<zid>
pow's important to optimize
<GeDaMo>
Somebody suggested they might have asked an AI :P
<nikolar>
GeDaMo: that i can believe
<nikolar>
zid: sure, but, isn't it almost always just a polynomial approximation
<nikolar>
so, no particular thing to optimize for arch
<zid>
if msvc were that good
<zid>
you might
<zid>
but msvc is shit, so I bet it's all inline assembly
<nikolar>
kek i guess that's also true
<zid>
gcc is too shit to use avx inside str* and mem*
<heat>
glibc i386 pow is fully asm
<zid>
fuz rewrote all freebsd's the other day
<heat>
the other day aka like 2 years ago
<nikolar>
zid: it would use avx on -O3, but no one compiles libc with -O3
<nikolar>
i guess you could do -ftree-vectorize
<nikolar>
but eh
<zid>
it's shit at it
<zid>
and worse
<zid>
you need to do some fucky ass shit to make avx work with str* and mem*
<nikolar>
sure, but it's *ok* and you don't need platform specific asm :)
<zid>
except
<heat>
yes you do
<zid>
this is the fundamental function
<heat>
no one cares about *ok*
<zid>
that everything ELSE depends on for speed
<heat>
it's the libc
<nikolar>
yeah yeah
<nikolar>
but pow isn't str* or mem*
<nikolar>
and it's already going to be slow
<heat>
ftree-vectorize will almost always be worse than whatever you can crap out manually
<zid>
it's identical, but for -lm
<zid>
instead of -lc
<zid>
pow is *crucial*
<nikolar>
exp is crucial
<nikolar>
pow is just there
<heat>
zid: do you have visual studio installed?
<zid>
god no
<heat>
they ship the ucrt lib source code
<heat>
oh ok
<nikolar>
heat: oh they ship the source
<heat>
yes
<nikolar>
that would've been nice to see
<heat>
install it
<nikolar>
i guess no one is allowed to host it *legally*
<heat>
on SLES!
<nikolar>
heat: hell no
<GeDaMo>
Maybe it's on github?
<zid>
msvc needs multiple reboots to install
<nikolar>
why
<zid>
microsoft reasons
<zid>
it probably hooks a bunch of explorer.exe to add right click menus, but uses illegal 3rd party ways to do it that need a reboot
<zid>
because you can't reload files at runtime on windows
<nikolar>
but why multiple reboots lol
<zid>
on linux, once you start a program you can rm it, on windows the file is locked
<nikolar>
ah yes
<nikolar>
windows sucks
<nikolar>
i almost forgot
<heat>
yeah
<heat>
versus on linux where the package manager can silently crash or corrupt processes
<nikolar>
you should reboot on linux after updates, for sure
<nikolar>
but you don't have to reboot a billion times to install them
<zid>
if you have to update your package manager in the middle
<zid>
you'd need two reboots under "you should reboot between updates"
<zid>
pow is actually a really cool function to optimize btw
<nikolar>
why's that
<zid>
There's a very clever but sort of naive impl. that gets you almost all of the speedup, and it's conceptually clever
<zid>
you can binary-decompose integer exponents
<heat>
musl pow is actually readable because it was written by ARM
<heat>
thanks ARM
<zid>
so if you have idk, ^135
<zid>
you don't need to do n*n*n*n*n*n*n*n*n...
<zid>
after the first multiply you already have n*n
<zid>
so you can do m = n*n, then m*m*m*m*m.. 135/2 = 67 times (plus one leftover n)
<zid>
but after the first m*m, you now have n*n*n*n, so you can do that 33 times
<zid>
etc
<zid>
which ends up just being.. the binary expansion of 135
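The binary decomposition zid describes can be sketched roughly as below. This is a minimal illustration for non-negative integer exponents only, not how any particular libc implements pow; the name `ipow` and its signature are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from any libc: raise base to a
 * non-negative integer power by walking the bits of the exponent. */
double ipow(double base, uint64_t exp)
{
    double result = 1.0;

    while (exp != 0) {
        if (exp & 1)        /* this bit is set in the exponent:      */
            result *= base; /* fold the current power into the result */
        base *= base;       /* base, base^2, base^4, base^8, ...      */
        exp >>= 1;
    }
    return result;
}
```

For an exponent of 135 (binary 10000111) the loop runs eight times, rather than the 134 multiplies of the naive n*n*n*... chain.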
<Ameisen>
I was able to update my MIPS emulator and toolchain before my shoulder surgery: https://github.com/ameisen/vemips I wasn't able to get around to fixing portability issues yet so it's still stuck on Windows. Not hard to fix but I'm stuck with one arm.
<bslsk05>
github.com: vemips/src/mips/instructions/instructions_table.cpp at master · ameisen/vemips · GitHub
<heat>
that looks slow
<Ameisen>
mips instructions can be annoying to parse
<zid>
I hope you checked the codegen for that
<Ameisen>
it is. MIPS is annoying.
<zid>
cus compilers tend to SUCK at optimizing out function pointers
<Ameisen>
the instructions have weird masked formats
<Ameisen>
the interpreter isnt the normal way it's used
<zid>
yea but that's fine, you just do what you did, but each nested switch is a small array
<heat>
computed goto or no-go
<Ameisen>
the dynamic recompiler only fetches these once or so
<heat>
ok
<Ameisen>
unless you invalidate it
<Ameisen>
but yeah, array at end would probably be faster.
<zid>
MIPS should be orthogonal enough that you could probably just mask 16 bits off and index it into an array of functions, and pass the other 16 bits in :P
<Ameisen>
there are, IIRC, 3 or 4 instruction formats
<Ameisen>
r6 added more masked instructions as well
<Ameisen>
i have a todo to weight the table by usage
<zid>
that's what gprof is for ;)
<Ameisen>
it's naively greedy right now
<Ameisen>
vemips can already dump that data
<zid>
I tested it on my gbz80 emulator, but it did precisely nothing, because it just compiles to a computed goto anyway
<Ameisen>
i just havent integrated it into this.
<zid>
I should test case-ranges though (GNU extension, not actually in C23)
<zid>
and do a mix of switch and masking
<Ameisen>
ideally, you shouldnt be running interpreted, so i havent been very motivated to optimize the table.
<zid>
so if F0 was mov A, F1 was mov B, etc: case 0xF0 ... 0xFF: mov_reg_reg(op & 0xF);
<zid>
rather than case 0xF0: movA(); case 0xF1: movB();
<zid>
I imagine it also does precisely nothing though, because codegen is already just.. doing a computed goto, so it'd just be MORE work to do it this way
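A rough sketch of the "mix of switch and masking" idea, written with a plain mask so it does not rely on case ranges. Everything here is an assumption for illustration: the opcode layout (0xF0 to 0xFF meaning "move register N into the accumulator") follows zid's hypothetical, and `struct cpu`, `mov_reg` and `step` are invented names, not taken from either emulator.

```c
#include <stdint.h>

/* Hypothetical CPU state; the layout is invented for this example. */
struct cpu {
    uint8_t acc;
    uint8_t reg[16];
};

/* One handler for the whole group, parameterised by register index. */
static void mov_reg(struct cpu *c, unsigned idx)
{
    c->acc = c->reg[idx];
}

/* Returns 1 if the opcode was handled, 0 otherwise. */
int step(struct cpu *c, uint8_t op)
{
    switch (op & 0xF0) {       /* high nibble selects the group   */
    case 0xF0:
        mov_reg(c, op & 0x0F); /* low nibble selects the register */
        return 1;
    default:
        return 0;
    }
}
```

As zid suspects, a compiler may well lower a dense per-opcode switch to a jump table already, so whether this is any faster would need checking against the generated code.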
<Ameisen>
not that the dynamic recompiler is much better in design. really need to rearch when I have my arm back.
<bslsk05>
github.com: vemips/tools/vevcbridge at master · ameisen/vemips · GitHub
<Ameisen>
that was used to get the gdb server in it to talk to Visual Studio 2015's debugger, so you could line-by-line debug mips programs in the IDE.