<Ameisen>
heat: iirc, most chips prefer negative polarity - that is, they assume that a branch that decreases the current program counter is taken at first? (Except for chips that just choose randomly)
<heat>
not sure about backwards jumps, i guess it makes sense that they would, cuz of loops
<Ameisen>
the instruction count branch is only generated if it's needed, but it is often needed since iterative/pausing execution is one of the features.
<heat>
but for forwards branches they definitely assume you're not taking the branch
<Ameisen>
the main issue generally would be the myriad branches for potential exceptions
<Ameisen>
hard to eliminate the forward branches in regards to exceptions, at least
<Ameisen>
usually ends up getting written as: operation, check exception, goto no_exception, throw exception, no_exception: store result
<Ameisen>
at some point a forward jump is required
<heat>
well that's wrong
<nikolar>
i think they assume that backwards are taken and forwards are not taken
<nikolar>
at least when encountered for the first time
<heat>
unlikely stuff should go towards the end of the code
<heat>
dont jump over things in likely paths
<Ameisen>
then it'd need to be: operation, check exception, goto exception if exception, store result, goto no_exception, exception: throw exception, no_exception:
<Ameisen>
so a forward jump over unlikely code instead
<Ameisen>
I can certainly change it to generate that sort of sequence instead
<heat>
that still sucks
<heat>
don't jump over code
<Ameisen>
well, I have to jump at some point
<heat>
you really don't
<Ameisen>
how do I handle exceptions, then?
<heat>
i mean
<heat>
don't jump over code in the hotpath
<heat>
you only need those exception: thunks once every ~2GB on x86
<heat>
(because of jmp imm32)
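The layout heat is pushing for can be sketched in C++ (the names, the `unlikely` macro, and the handler are illustrative, not from the actual emulator): the hot path falls through with no taken jump, and the overflow case takes a rarely-taken forward branch to cold, shared code.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hint the compiler that the branch is almost never taken, so the
// cold path gets laid out away from the fall-through hot path.
#define unlikely(x) __builtin_expect(!!(x), 0)

// Out-of-line cold handler, shared by many emitted check sites.
[[noreturn]] static void overflow_thunk() {
    std::fputs("overflow exception\n", stderr);
    std::exit(1);
}

static int32_t emulated_add(int32_t a, int32_t b) {
    int32_t r;
    if (unlikely(__builtin_add_overflow(a, b, &r)))
        overflow_thunk();  // rarely-taken forward branch to cold code
    return r;              // hot path: straight-line fall-through, no jump
}
```

The point is that the common case never jumps over anything; only the exceptional case leaves the straight-line sequence.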
<Ameisen>
yes, those jumps for exceptions go to those thunks.
<Ameisen>
I put them every generated chunk, but meh
<heat>
that's just crap
<heat>
you're adding extra jmps and filling your icache with stuff that's almost never going to run
<heat>
mjg would say it is PESSIMAL but the guy's sleeping
<Ameisen>
written out, since this isn't what the patches actually look like in source :|
<zid>
why do you need the overflow case btw?
<Ameisen>
there are some instructions - especially load/stores - which are more complex though, and sometimes involve calls back into the interpreter. They also usually need to pass an operand to the thunk. Right now their order is opposite of that.
<zid>
flags act differently?
<Ameisen>
zid: because the MIPS32r6 spec mandates it.
<zid>
yes but *why*
<zid>
what do you have to implement there
<Ameisen>
because MIPS doesn't have flags.
<zid>
oh it causes an exception interrupt?
<Ameisen>
yes
<zid>
neat
<Ameisen>
it doesn't have an exception interrupt for divide-by-zero, interestingly.
<Ameisen>
it's just 'undefined'
<Ameisen>
so I end up having to turn x86 flags into exceptions, thankfully there are jumps specifically for those flags
<Ameisen>
there are just... a lot of jumps.
<Ameisen>
every `ADD` needs one simply because the flag must be checked, as an example
<Ameisen>
well, almost every `ADD`, some of the conditions generate different patches that cannot overflow.
<zid>
Yea sounds like it would benefit heavily from some optimization
<Ameisen>
yeah, I'm just unsure how. I need to handle the overflow exception, I'm just not sure what the best approach is. As said, certain instructions are more complex and have more jumps - load/stores can be... weird.
<Ameisen>
since I'm also checking for valid address ranges and such
<zid>
well, the 'best' way would be to prove it can be elided
<Ameisen>
patched jumps are probably the weirdest.
<zid>
otherwise you're just emulating it
<Ameisen>
presently, I only perform static analysis on the registers themselves - it doesn't try to introspect on the values.
<Ameisen>
I have an idea on how to do that that won't break things (I cannot do tracing, but I can generate short 'hot paths' that execute as a single unit instead)
<zid>
You need to write an optimizing compiler, effectively
<zid>
where the equivalent C source you're compiling is if(a + (long)b > UINT_MAX) except_overflow();
<zid>
so you can either prove a + b can't be big enough, or that except_overflow has no effects
<zid>
but that's obviously hard
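zid's "equivalent C" can be written without relying on signed-overflow UB (a sketch; MIPS32 ADD traps on signed 32-bit overflow, which is what the emitted check has to detect):

```cpp
#include <cstdint>

// Widen to 64 bits, where the sum cannot overflow, then compare
// against the representable 32-bit range.
static bool add_would_trap(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}
```

Proving this returns false for a given site (e.g. one operand is known to be `$0`) is exactly the elision zid is describing.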
<Ameisen>
right, that requires some level of introspection into what the values can potentially be at that point.
<Ameisen>
which is possible to a point in a tracing optimizer
<Ameisen>
I can know, at least, sometimes if they're zero (since $0 is always zero)
<Ameisen>
surprisingly, the compiler generates that more often than I'd expect... basically just making moves
<Ameisen>
not sure why.
<zid>
It may literally be aliased from 'mov'
<zid>
on mips 1 I did 'ori' for my mov
<Ameisen>
it's not - the table generator masks the instructions out if they're aliased.
<zid>
what?
<Ameisen>
mips32r6 has some instructions that only differ by a bit, and they're often defined as the same instruction. The table generator masks those out when generating lookups.
<Ameisen>
so they will get resolved as different instructions
<zid>
what's a table generator, what table generator, what is 'aliased', how is this a response to what I said?
<zid>
I literally don't understand any of it
<Ameisen>
I define the instructions by mask and masked bits. Something like, say, EHB and SLL which are the same instruction are distinguished by their mask (zero registers and a specific shift size in that case), so they show up to the system as distinct instructions.
<Ameisen>
the table generator just generates the lookup table from that
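A miniature of the mask/bits scheme being described (the SLL/EHB/NOP encodings are real MIPS32, but the table and linear lookup here are a toy stand-in for the generated table): fully-specified aliases are listed before the generic pattern so they win.

```cpp
#include <cstdint>

struct InsnDef { uint32_t mask, bits; const char *name; };

// SLL is opcode 0 / funct 0. The all-zero word is NOP (sll $0,$0,0),
// and EHB is the specific form sll $0,$0,3 (word 0x000000C0).
static const InsnDef defs[] = {
    { 0xFFFFFFFFu, 0x00000000u, "NOP" },  // fully-specified aliases first
    { 0xFFFFFFFFu, 0x000000C0u, "EHB" },
    { 0xFFE0003Fu, 0x00000000u, "SLL" },  // generic: opcode, rs, funct all 0
};

static const char *decode(uint32_t word) {
    for (const InsnDef &d : defs)
        if ((word & d.mask) == d.bits) return d.name;
    return "UNKNOWN";
}
```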
<zid>
I didn't ask you what you defined things as
<zid>
you said the compiler generated a lot of add n, r, 0
<zid>
I said yes, that's a common way to implement the instruction 'mov'
<zid>
I either use that or 'ori' for mov
<zid>
or addiu I guess, for mips
<Ameisen>
Well, the funny thing is that I see _both_
<Ameisen>
multiple instructions used like that
<Ameisen>
that's what's weird to me.
<zid>
might be different peepholes?
<zid>
idk what compiler it is
<Ameisen>
Clang, so LLVM backend.
<zid>
and smips has 'li' meaning addiu apparently
<Ameisen>
I even saw a few shift instructions with shifts of zero.
<zid>
padding? filling delay slots?
<zid>
you said no flags so probably not that
<Ameisen>
it's possible, though any instruction would do then. They were operating as moves still, not identity-writes.
<Ameisen>
There are internal flags that are 'defined' but not quite physical, like delay branches
<zid>
I can only assume they just get picked by different codepaths in their optimizer
<Ameisen>
they're not user-visible though
<zid>
unless something clever pops into my face
<Ameisen>
I do have the compiler for the toolchain set to avoid delay branches, though - they're slower than compact branches because there's more logic associated with them.
<Ameisen>
that's my guess, I just wasn't expecting it.
<Ameisen>
I was thinking "do I really need to write the optimal patches for things like shifting by zero? Why would they do that?"
<zid>
mark mips down as "not amenable to JIT" and run it in an interp :P
<Ameisen>
I mean, I do have an interpreter as well, they interplay (can and do switch between them)
<Ameisen>
even the current dynamic recompiler is vastly faster than it, though.
<Ameisen>
with it, performance test runs in 33.5 seconds right now (i have some test logic in, normally closer to 30). Fully interpreted, it takes... well, it's still running.
<Ameisen>
Natively, it takes around 4-5 seconds to run.
<Ameisen>
interpreted: 819s
<zid>
your interp is bad and it should feel bad
<zid>
you should be getting a few cycles per cycle, not whatever that is
<Ameisen>
though I'm not sure how you can possibly get a few cycles per cycle emulating MIPS in an interpreter. The overhead should at least be an order of magnitude worse.
<zid>
why would it be, it should be like, a mask, a jump, then two more masks, then a single instruction, in 99% of cases
<zid>
and x86 is magic and will run half of that before you even asked it to
<zid>
while it waits for the previous write to settle
<heat>
computed goto
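heat's suggestion, sketched as a toy bytecode loop using the GNU labels-as-values extension (works in GCC/Clang; the opcodes and accumulator machine are invented for illustration). Each handler ends in its own indirect jump, which predicts better than one central switch:

```cpp
#include <cstdint>

enum Op : uint8_t { OP_ADD, OP_SUB, OP_HALT };

static int64_t run(const uint8_t *code) {
    // One dispatch table entry per opcode; &&label is the GNU extension.
    static void *dispatch[] = { &&do_add, &&do_sub, &&do_halt };
    int64_t acc = 0;
    #define NEXT() goto *dispatch[*code++]
    NEXT();
do_add:  acc += *code++; NEXT();  // each handler has its own jump site
do_sub:  acc -= *code++; NEXT();
do_halt: return acc;
    #undef NEXT
}
```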
<geist>
getting near 10:1 is about as good as you
<geist>
you'll get with an interpreter, from what i understand
<geist>
an emulator i wrote years ago transcoded into a more expanded, kinda VLIW looking instruction set that ends up being basically a big switch statement and everything computed
<geist>
got fairly close to 10:1 average
<geist>
well more like 10:1 overhead of the loop
<nikolar>
so you got the original code and transpiled it into your own representation?
<heat>
transmeta but geist
<nikolar>
kek
<nikolar>
and in software
<mjg>
OY netbsd landed O_CLOFORK
<mjg>
This is Ricardo Branco's implementation of O_CLOFORK (and
<mjg>
associated fcntl, etc) for NetBSD (with a few minor changes
<zid>
I wish my gameboy was 10:1 but it has to update all its devices every subcycle because I was too lazy to write the version with scheduling :P
<Ameisen>
geist - see, what I call an interpreter itself would be 'it looks up each instruction, and executes it, each time'. It doesn't do anything else.
<Ameisen>
So, performance is poor. It's interpreting things. Inbetween, you can start doing more work like precomputing function calls, dynamic recompilation, etc...
<zid>
I bet mips is pretty amenable to avx tricks too
<Ameisen>
my interpreter is a very dumb interpreter since it's not really intended to be used except as a fallback or for specific tasks, but it's pretty simple. It has to look up each instruction, execute it, check state, etc.
<Ameisen>
the dynamic recompiler is not optimal and I don't think that I can make it such within the constraints it has
<Ameisen>
I think it's... 'closer' to what you're calling an interpreter though.
<Ameisen>
I don't like calling mine a JIT simply because most people seem to use that just to refer to things like tracing JITs
<Ameisen>
I need instruction-level accuracy, so I can't start merging instructions together (except in certain cases that I'm looking into)
<heat>
mjg: lol lol lol lol lol lol lol
<heat>
lol
<heat>
xd xd xd xd xd xd xd
<mjg>
chill dawg
<nikolar>
i think you said that already heat
<heat>
Good
<zid>
nikolar: Hey don't make fun of heat for his low mental clockspeed
<nikolar>
lol
<mjg>
heat is a arschloch
<mjg>
an
<zid>
UMA MUSUME, THEY WERE BORN TO RUN
<zid>
That's the clockspeed limit of my brain ^
<Ameisen>
zid: yeah, if you're rebuilding the binary as a whole into something new, you can do a lot of tricks. My goal has just had specific constraints and I keep butting my head into the issues those constraints cause.
<zid>
Ameisen: that's a dynarec/jit, not an interp
<Ameisen>
though if those issues weren't there it'd be boring.
<zid>
I'm saying I bet you can avx the *interp*
<Ameisen>
I'm not sure how
<heat>
mjg: same as you hunny bun <3
<zid>
decode multiple instructions at the same time
<zid>
as one example
<Ameisen>
but how are you executing them, following the mandated specification requirements during them, and also allowing the user to interrupt execution after 2 instructions with the state maintained?
<zid>
mips is fixed width right
<mjg>
heat: true
<zid>
???
<zid>
what's that got to do with anything
<zid>
you still *retire* them in-order
<Ameisen>
you're just talking about instruction lookup?
<Ameisen>
yeah, my approach for that is awful, I just haven't bothered to improve it because the interpreter isn't intended to be used in that way.
<zid>
but when you do op = INSTR&OPMASK; reg_dst = (INSTR & REG_DST_MASK) >> REG_DST_MASK_CRAP; ...
<Ameisen>
ah, so full decoding you mean
<zid>
you could probably just fetch 8 instructions and do all 8 at once, generating reg_dst[8] and then just looping over those
<Ameisen>
I'd have to write a new interpreter to do things like that; the current one is intended to be fully portable and relatively simple.
<Ameisen>
it wouldn't be impossible to do
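zid's batch idea as a sketch: the field offsets are the real MIPS R-type bit positions, but the batch size and struct-of-arrays layout around it are hypothetical. Because the fields are fixed-position, the loop is straight-line shift/mask work the compiler can unroll or vectorize:

```cpp
#include <cstdint>

struct DecodedBatch {
    uint8_t op[8], rs[8], rt[8], rd[8];
};

static void decode8(const uint32_t insn[8], DecodedBatch &d) {
    for (int i = 0; i < 8; ++i) {          // trivially vectorizable
        d.op[i] = (insn[i] >> 26) & 0x3F;  // opcode, bits 31-26
        d.rs[i] = (insn[i] >> 21) & 0x1F;  // source reg, bits 25-21
        d.rt[i] = (insn[i] >> 16) & 0x1F;  // target reg, bits 20-16
        d.rd[i] = (insn[i] >> 11) & 0x1F;  // dest reg, bits 15-11
    }
}
```

Execution still retires one instruction at a time; only the decode is batched.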
<zid>
nikolar: I have downloaded the horseime.
<zid>
It is 80GB for S01
<nikolar>
why is it 80 gigs
<Ameisen>
I care more about the performance of the dynamic recompiled code itself, since that's where the time is really being spent (unless the interpreter is getting used for some reason, heavily)
<zid>
cus it's old so it has blurays instead of webrips :P
<nikolar>
kek
<nikolar>
i thought someone would've reencoded or something
<zid>
yea the main 'speedup' of a JIT, even if you're non-optimizing, is that you delete all of the decoder code
<zid>
nikolar: but every umapixel is important
<nikolar>
good counter point
<zid>
the decodes stay decoded
<zid>
because you're writing them to a buffer
<Ameisen>
right now, during the test run, the dynamic recompiler doesn't drop to the interpreter fully at all (until exit). It processes a handful of emulated instructions - like 30 out of 45 billion.
<Ameisen>
the decoder isn't hit at all when the dynamic recompiled code is running.
<Ameisen>
it only gets hit when chunks are being generated
<Ameisen>
I don't think it's ever come up in a profile (though profiling this is a pain).
<zid>
exactly
<zid>
Hence
<zid>
> the main 'speedup' of a JIT, even if you're non-optimizing, is that you delete all of the decoder code
<Ameisen>
oh, I thought you meant completely unloading it in a literal sense.
<nikolar>
well, if you're going really fancy, you can dynamically optimize
<nikolar>
but yeah, yeeting the decoder code is the first obvious speed up
<Ameisen>
There are other gains to, mainly in that I can control state sharing between instructions more precisely in generated code than I can from even the flattest C++.
<Ameisen>
too*
<nikolar>
eww c++
<zid>
The fuck is a flattest C++ and what state
<zid>
???
<Ameisen>
the registers, the state of various things (like the IP, DBP, etc), and not needing things to get pushed back into memory until necessary.
<zid>
"jits are faster than interps because of the huge gain of not using flat C++"
<zid>
what you just said
<Ameisen>
I'm not sure how to explain what flattened code means at 6:30 AM
<Ameisen>
:|
<zid>
then don't pretend it's a thing we'll understand
<zid>
but yea, mips doesn't get to benefit from pinned regs, rip
<Ameisen>
It's a term we used in game development. Recursive inlining so your call tree is flat.
<Ameisen>
along with intraprocedural optimizations, you can get some very nice code generation that way. Or very bad.
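The flattening Ameisen describes, in miniature (toy functions; `flatten` is a real GCC/Clang attribute that inlines the annotated function's entire call tree into it, rather than inlining the function into its callers):

```cpp
static int step_a(int x) { return x * 2; }
static int step_b(int x) { return step_a(x) + 1; }

__attribute__((flatten))
static int hot_loop(int n) {
    // With flatten, step_b (and, recursively, step_a) are inlined here,
    // leaving one flat body the optimizer can treat as a single unit.
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += step_b(i);
    return acc;
}
```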
<zid>
yea I'm not sure C++ workarounds are super relevant to describing a jit
<nikolar>
lol
<Ameisen>
They're common to C or C++, but obviously not.
<Ameisen>
I'm just saying I cannot get the compiler to do a lot of things regardless of what I do when using C or C++, that you can when generating code.
<zid>
It doesn't need a name in C, it's just called "The compiler will do it"
<Ameisen>
the compiler is surprisingly bad at doing it a lot :)
<zid>
it really isn't lol
<zid>
my entire gameboy emulator is in a single god damn function
<Ameisen>
especially if you need portability and have to target compilers that optimize poorly in these regards.
<zid>
with labels like lto.38493
<zid>
thanks gcc
<Ameisen>
I'm surprised that it did that without `__attribute__((__flatten__))` everywhere.
<Ameisen>
whenever I test these things with GCC, Clang, or MSVC... they really don't want to inline in a lot of cases.
<nikolar>
yeah, because it was written in c
<nikolar>
not c++
<Ameisen>
the backends don't care if it's C or C++.
<zid>
That's just how C works Ameisen, I get to define my interfaces properly
<nikolar>
the backends don't care no
<nikolar>
but the frontends differ, widely
<Ameisen>
the optimizer doesn't either.
<nikolar>
in the code they emit
<Ameisen>
The Clang frontends for both are identical
<Ameisen>
they're the same code
<nikolar>
sure
<zid>
C++ frontend is in a huge battle against de-virtualization and blah blah blah
<Ameisen>
GCC is different, no idea about MSVC.
<zid>
that C just gets to entirely skip
<nikolar>
^
<nikolar>
i said they emit different code
<Ameisen>
devirtualization is an optimization pass
<nikolar>
not that they are different code
<Ameisen>
it's not a part of the frontend.
<nikolar>
either way
<nikolar>
c is easier to optimize
<Ameisen>
the only thing the frontend does is generate the IL, which, if you're not using those features, doesn't really differ between C and C++.
<zid>
C++ has to make way way more shit public
<zid>
which just ruins a lot of optimizations
<Ameisen>
eh? Everything is public in C++...
<Ameisen>
err, C
<zid>
no, basically nothing is
<nikolar>
that's not the kind of "public" he's talking about i imagine
<nikolar>
it's the same in c++ anyway
<Ameisen>
I'm not sure what he means by 'public' in that context. Do you mean 'C++ has more features available that the compiler has to take into account'?
<zid>
no
<zid>
what do the C++ weenies call it
<zid>
pimpl
<Ameisen>
I mean, I don't see any private implementations in my code.
<Ameisen>
So I'm not sure how they're relevant...
<Ameisen>
I can implement the same paradigm in C anyways
<Ameisen>
though it'll be worse.
<zid>
right, but then they're literally the same
<Ameisen>
Correct, that's my point.
<zid>
if you're talking C++, that means C++ features
<zid>
not C
<Ameisen>
Just because you're using C++ doesn't mean that you're using all of C++'s features at all times everywhere.
<zid>
C++ is x means nothing if you're just writing .c files but compiling them with -x cpp-with-preprocessor
<Ameisen>
That would be stupid.
<Ameisen>
I heavily use C++ features, usually templates and constexpr.
<Ameisen>
I don't use the particular ones you're talking about very often because they're not really relevant to my use-cases.
<zid>
great, but that doesn't change what we were talking about really
<zid>
C compilers are very good at inlining C
<Ameisen>
GCC is an odd one in that the frontends are different (though I haven't looked at the IL for it). Clang generates basically the same thing for comparable C and C++.
<zid>
C++ (actual C++ code, not C in a disguise) is much harder to inline, because of things like having to *also* pull off devirtualization.
<zid>
before the flat C re-eappears
<Ameisen>
To put it another way: I'm not using any features in my code that are particularly hard for the compiler to process in that sense.
<zid>
re-appears*
<Ameisen>
and when I do, I'd be doing something worse in C anyways.
<nikolar>
optimizers are limited in the number of things they can do
<nikolar>
the more things it needs to see through and optimize, the worse it gets
<Ameisen>
the optimizer is literally the same in this case, though.
<nikolar>
so c, being simpler, is far easier to optimize
<zid>
like if I write T<x<<"bob"::f(j &)> crap, it has to turn that into flat C, *then* apply optimizations you'd consider for C
<Ameisen>
The optimizer doesn't know if you're optimizing C or C++
<Ameisen>
it's optimizing IL
<zid>
like inlining
<nikolar>
Ameisen: i am not saying it knows
<nikolar>
i am saying it needs to wade through more shit to get to the core if it's optimizing what was emitted for c++
<zid>
> if I write T<x<<"bob"::f(j &)> crap, it has to turn that into flat C, *then* apply optimizations you'd consider for C
<nikolar>
it doesn't need to know or care for that to be a fact
<zid>
nikolar ^
<Ameisen>
I mean, that's wholly untrue unless you're talking about weird contexts.
<Ameisen>
Anyways, this argument is dumb and religious, and I really don't want to have it since it won't go anywhere, so I'm going to go to bed.
<nikolar>
i mean you said it yourself
<zid>
wholly untrue? lol. It's just a basic fact of language
<nikolar>
you need to convince compiler to inline things for you
<nikolar>
i am just explaining why you don't have to do that when you're working in c
<nikolar>
at least not as much
<zid>
nikolar: And given C++ is largely a superset, they suffer from the same things that they *can't* optimize well, generally. C++ obviously just has points of failure *on top*. Because it has more language.
<Ameisen>
I should note that I have a _lot_ of familiarity with C++ and compiler optimization passes in these senses - I did a lot of work on more constrained targets like AVR specifically with C++ to understand what the compiler and optimization passes struggled with and what they didn't. The things they struggled with were there, but they were generally limited to things like `virtual` (and that _specifically_), but the equivalent constructs in C generally
<Ameisen>
resulted in worse code.
<zid>
Ameisen: Were any of those passes only relevant to the C++ language?
<Ameisen>
exceptions also were problematic, ESPECIALLY on AVR.
<zid>
Such as, for example, devirtualization
<Ameisen>
I already answered your question.
<Ameisen>
I specifically mentioned `virtual`, which is the only reason that devirtualization exists.
<zid>
No it isn't
<zid>
It just means turning a class call into a function call
<zid>
bypassing the vtable
<Ameisen>
and... what causes a vtable to exist?
<zid>
by recognizing that the class hasn't been inherited etc
<zid>
vtables exist because that's the way you implement C++ classes, because of inheritance and stuff meaning the pointers might change at runtime
<Ameisen>
That's... not correct.
<zid>
you *need to do an optimization pass* to prove they *won't* change, called devirtualization, which turns them back into flat calls
<Ameisen>
vtables exist (though not mandated by the spec) simply because it's a convenient way to implement virtual dispatch.
<zid>
That's literally what I just said
<Ameisen>
No class has a vtable if it isn't inheriting virtually.
<zid>
That's literally what I just said
<Ameisen>
and it's literally what I said first.
<Ameisen>
if you aren't using `virtual`, devirtualization isn't relevant.
<Ameisen>
It literally is a pass with nothing to do.
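For concreteness, a minimal case of the pass being argued over (types invented for illustration): when the compiler can see the dynamic type, here because the class is `final`, the virtual call needs no vtable lookup and can be inlined like any direct call.

```cpp
struct Cpu {
    virtual ~Cpu() = default;
    virtual int step() = 0;
};

struct Mips final : Cpu {     // 'final': no further overrides are possible
    int pc = 0;
    int step() override { return pc += 4; }
};

static int run_known(Mips &m) {
    // Static type is the final class, so this call is devirtualizable:
    // the compiler may emit a direct (and then inlined) call to Mips::step.
    return m.step();
}
```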
<zid>
You're agreeing with me 100% then telling me I am wrong
<Ameisen>
read what I wrote.
<Ameisen>
then what you asked.
<zid>
I did read it
<zid>
You agree that it can elide all this crap when certain things aren't happening
<zid>
You agree that these things are part of C++ and not C
<Ameisen>
> but they were generally limited to things like `virtual` (and that _specifically_)
<Ameisen>
which implies devirtualization
<zid>
but you disagree that.. C++ has to elide it when certain things aren't happening
<zid>
as an optimization
<Ameisen>
that optimization pass runs whether it's C or C++.
<Ameisen>
it is performed on IL.
<zid>
good for it
<zid>
That's a weird implementation detail
<zid>
and 100% irrelevant
<Ameisen>
if your C++ doesn't contain virtual dispatch, then that optimization pass has nothing to do.
<Ameisen>
and `virtual` is actually pretty rare in most contexts.
<zid>
so now the optimization DOES exist, but it's "rare"
<zid>
so does it exist or not? You've had it both ways now
<Ameisen>
*sigh* I'm going to sleep, this is stupid.
<Ameisen>
I can't tell if you're being obstinate or if there's a language barrier.
<heat>
Oh nice, the not-c++ people are explaining c++
<Ameisen>
heat: I had an argument on the C programming subreddit once. it was... fun. They were arguing things like 'all objects in C++ are dynamically allocated', and other things. Then they started saying things that sounded suspiciously like C#... and they linked to a page that was clearly AI copied from C# to C++.
<Ameisen>
my brain was being very loud and making it hard to sleep
<Ameisen>
zid: if you'd like, when my shoulder is in better shape I can write up a report about C and C++ optimization issues in this regard, since I've done a _lot_ of work into it. It might be more productive than bickering on an IRC channel.
<Ameisen>
otherwise, I'm thinking of ways I can portably use things like AVX (there are generic ways to do similar) to prefetch instructions as ye suggested.
<nikolar>
Depends on what you mean by probably
<nikolar>
Since requiring avx limits portability
<nikolar>
*portability
<heat>
*portability
<zid>
*portability*
<Ameisen>
*portabello
<Ameisen>
AVX does, though there are similar-ish extensions on other archs, and there are generic SIMD libraries that can expand to it
<nikolar>
simde is nice, but I don't know if it does what you want
<zid>
bonus points: Disregard all the mov [ebp+reg0] stuff, keep all the mips regs in avx regs :P
<zid>
There's enough regs on amd64 that I had considered making my z80 emulator just pin all its regs to real regs with register asm("r9"); type stuff, but never got around to bothering to test it
<zid>
(I'd have to write thunks around the sdl code to conform to the C abi again, so I couldn't just do it as a one-liner)
<nikolar>
How many registers does z80 have
<zid>
real z80 has more cus of IX/IY and some dram refresh reg and things, but gbz80 has AF, BC, DE, HL, SP, PC
<Ameisen>
zid: most things I've seen have suggested that if you can keep your register file in a cache line well-enough, it will be faster than trying to insert/extract things from SIMD registers.
<zid>
yea for sure
<zid>
It is however, hilarious
<nikolar>
That's a good argument
<heat>
you're making me want to work on eBPF again :/
<nikolar>
Because it's hilarious?
<heat>
because i get to work on a jitter
<heat>
i have a cBPF implementation and x86 jitter i'm yet to integrate in the kernel
<nikolar>
cbpf was the earlier one right
<heat>
yeah
<heat>
eBPF has lots more regs, and shit like atomics
<heat>
and if you want to be real correct, you need a verifier as well (though, lol, i disagree that you even need it)
<nikolar>
Lol
<heat>
as far as I understand the eBPF verifier is kind of a leftover from the times where they thought unprivileged eBPF would be fine
<heat>
and now only exists because ring 0 doesn't imply code loading privileges as well
<heat>
besides helping prove correctness, but...
<nikolar>
Interesting
<Ermine>
"jitter"
<Ermine>
that scares real time streaming people...
<kof673>
:D double meaning
* kof673
awards meta points
<kof673>
a jit is liable to create jitter
<kof673>
as well as jitterish
<Ameisen>
in a store, for instance, most of the time is spent just validating things and checking if an exception needs to be thrown, which sucks.
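A sketch of the per-store validation being lamented (the names, error taxonomy, and flat-RAM model are hypothetical): the actual write is one instruction, but it sits behind alignment and range checks that each feed an exception path.

```cpp
#include <cstdint>
#include <cstring>

enum class StoreResult { Ok, Unaligned, OutOfRange };

static StoreResult emulated_sw(uint8_t *ram, uint32_t ram_size,
                               uint32_t addr, uint32_t value) {
    if (addr & 3)
        return StoreResult::Unaligned;      // -> address-error exception path
    if (addr >= ram_size || ram_size - addr < 4)
        return StoreResult::OutOfRange;     // -> bus-error exception path
    std::memcpy(ram + addr, &value, 4);     // the actual work
    return StoreResult::Ok;
}
```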
<heat>
i would try to move those thunks to a designated place
<Ameisen>
though I really need to break out of getting stuck in the weeds, since there are actual features I need to add and bugs I need to fix, and the little performance work is just very addictive but not very useful.
<Ameisen>
the thunks themselves are, those are just the jumps to them
<heat>
you're jumping over the thunks though?
<heat>
ah ok i see
<heat>
well that's still a thunk :p
<heat>
also, proper register allocation would be great -- though, yes, hard
<Ameisen>
it's sorta a thunk. The thing they're jumping to is a thunk outright, though
<Ameisen>
yeah, it's more difficult because of the fact that the instructions need to stay discrete. I have an idea for how to do it, but it's going to take a lot of work.
<Ameisen>
and I'm not 100% sure it will always be beneficial.
<heat>
you gotta test
<heat>
sometimes with performance work you spend a lot of timing doing something just to figure out it isn't worth it
<heat>
such is life
<Ameisen>
yeah. I know that my idea, if there's a loop that jumps across chunk boundaries or even just jumps into a weird place in the same chunk, and it does so a -lot-, will make it worse.
<Ameisen>
I can easily reserve/maintain register-cached values across the chunk itself, assuming linear execution. I just have to push/pop that state on a jump.
<Ameisen>
I suspect that allowing for hot paths in code (allowing optimized cross-instruction paths where there's no instruction count or jump hazard) will be more beneficial
<Ameisen>
sorta-almost tracing
<Ameisen>
but the performance of it right now is acceptable, I really need to work on getting a few things implemented.
<Ameisen>
the biggest one is getting the ability to update chunks, and thus invalidate patches, so that self-modifying code will work right.
<Ameisen>
there's also two edge cases associated with that (needing to add/remove delay branch flags from the start of the _next_ chunk if relevant, and also handling when the chunk triggered its own update, which means that I have to `ret` to somewhere else)
<Ameisen>
then I need to fix a performance issue regarding jumping to invalid memory, and implementing LL/SC properly.
<Ameisen>
then it should be stable
<heat>
LL/SC? yikes
<Ameisen>
yeah. They don't behave correctly right now.
<Ameisen>
they just act like normal load/stores. I am probably going to do the most relaxed version the spec allows me to do.
<Ameisen>
that's basically 'if there was a store anywhere, the linked operation fails'
<Ameisen>
the spec does in fact allow me to do that
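The maximally-relaxed LL/SC scheme described above, sketched with invented names: one global store counter, snapshotted by LL; SC fails if *any* store happened in between.

```cpp
#include <cstdint>

struct EmuState {
    uint64_t store_count = 0;   // bumped by every emulated store
    uint64_t link_count  = 0;   // snapshot taken by LL
    bool     link_valid  = false;
    uint32_t mem[64]     = {};  // toy guest memory, word-indexed
};

static uint32_t ll(EmuState &s, unsigned addr) {         // load-linked
    s.link_count = s.store_count;
    s.link_valid = true;
    return s.mem[addr];
}

static bool sc(EmuState &s, unsigned addr, uint32_t v) { // store-conditional
    if (!s.link_valid || s.link_count != s.store_count)
        return false;            // some store intervened: SC fails
    s.mem[addr] = v;
    s.store_count++;
    s.link_valid = false;
    return true;
}

static void store(EmuState &s, unsigned addr, uint32_t v) { // ordinary store
    s.mem[addr] = v;
    s.store_count++;             // invalidates any outstanding link
}
```

This is deliberately pessimistic (unrelated stores also break the link), which is the conservative behavior the spec permits.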
<Ameisen>
regarding that jmp to the thunk - in equivalentish code, clang keeps it in the middle, gcc puts it at the end. They both generate very similar code to me, at least.