<antilisp-dev>
What are the limits of the RPython JIT? If I give a hashtable to every object of my language (for attributes), will the JIT optimize what can be optimized depending on the keys used (like V8 with hidden classes)?
<antilisp-dev>
(i'm trying to work out the minimal amount of code i need to implement an object system without it being too slow)
<korvo>
"It depends" is the standard answer. In your case, a hash might be the wrong data structure for speed. Access to a hash is amortized constant but with a large startup factor; the input has to be hashed and possibly dereferenced if it's some sort of generic object.
<korvo>
For example, PySOM has multiple different object layouts. An object without any fields is specially allocated. An object with five or fewer fields is specially allocated too. An object with exactly one field is special-cased; its method behaves like a function.
<korvo>
Typhon has a bit of this too, but it also has one single high-speed JIT-friendly lookup path: methods/attributes are in a *list* and a linear scan is used for lookup. This ends up being much faster than the hash for typical objects *and* the JIT has a better time with it.
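The list-plus-linear-scan layout korvo describes might look like this in plain Python (class and method names are hypothetical; in RPython the layout object would additionally be promote()d so the JIT can fold the scan away):

```python
class Layout(object):
    """Maps attribute names to slot indices via a linear scan.

    In RPython this object would be promote()d so the JIT treats the
    scan as constant-foldable; here it is plain Python for illustration.
    """
    def __init__(self, names):
        self.names = names  # a short list of interned selector strings

    def find(self, name):
        # Linear scan: for small layouts this beats hashing the key.
        # The identity check assumes selectors are interned (see atoms).
        for i in range(len(self.names)):
            if self.names[i] is name:
                return i
        return -1

class Obj(object):
    """An object is a shared layout plus a flat list of slot values."""
    def __init__(self, layout, values):
        self.layout = layout
        self.values = values

    def getattr(self, name):
        i = self.layout.find(name)
        if i < 0:
            raise AttributeError(name)
        return self.values[i]
```

Objects with the same attribute names share one Layout, so the JIT sees one constant per "shape" rather than one hash lookup per access.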
<korvo>
Note that, in all cases and for all interpreters, you have to be able to promote() your code objects to constants, including your method/attribute wrappers if they have executable methods. The JIT doesn't know that a method/attribute has a constant behavior; it assumes that anything can happen.
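A minimal sketch of where promote() would sit in a method-lookup path, with a hypothetical Klass/Method pair; the import falls back to a no-op so the sketch also runs outside RPython translation:

```python
try:
    from rpython.rlib.jit import promote
except ImportError:
    def promote(x):  # no-op stand-in outside RPython translation
        return x

class Method(object):
    def __init__(self, code):
        self.code = code  # stand-in for a code object

class Klass(object):
    def __init__(self, methods):
        self.methods = methods  # name -> Method, fixed at class creation

def call_method(obj_klass, name):
    # Promote the class so the trace specializes on it; once the class
    # is a trace constant, the method found through it is constant too,
    # and the JIT no longer assumes "anything can happen".
    klass = promote(obj_klass)
    method = klass.methods[name]
    return method.code  # stand-in for executing the method's code
```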
<korvo>
Oh, one more thing. In order to make `is` work for super-fast comparisons, the selector/atom for the lookup must be internable and unique. You'll end up building something like https://github.com/monte-language/typhon/blob/master/typhon/atoms.py just to provide that O(1) comparison. The @elidable_promote() annotation tells the JIT that you're not going to be nefarious with that global dict; it's literally a cons-hashing cache.
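A cons-hashing intern cache along those lines might look like this (names are hypothetical; @elidable stands in for the @elidable_promote() used in typhon/atoms.py, and falls back to a no-op outside RPython):

```python
try:
    from rpython.rlib.jit import elidable
except ImportError:
    def elidable(f):  # no-op stand-in outside RPython translation
        return f

class Atom(object):
    """A unique, internable selector: name plus arity."""
    def __init__(self, name, arity):
        self.name = name
        self.arity = arity

_atom_cache = {}

@elidable
def get_atom(name, arity):
    # Cons-hashing: the same (name, arity) always yields the same Atom
    # object, so later comparisons can use `is` instead of structural
    # equality. The elidable annotation promises the JIT that this
    # global dict is only ever used as a cache.
    key = (name, arity)
    atom = _atom_cache.get(key)
    if atom is None:
        atom = Atom(name, arity)
        _atom_cache[key] = atom
    return atom
```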
<antilisp-dev>
thanks. it seems i really need to study the JIT more before i can make things fast with it
<antilisp-dev>
i think i won't need too much performance on interpreted code because it's beyond the scope of this experiment
<antilisp-dev>
(one of the things i am interested in with this language is opt-in static analysis and self-compilation, but this would be done by the hosted environment rather than the rpython jit)
<antilisp-dev>
(if i ever reach the point where i can do it)
<korvo>
(You'll need to think about how your fast JIT-friendly interpreter loads code; it'll have to be able to either load or extend code at runtime.)
<antilisp-dev>
yes, that's the plan. i heard mmap() can do something like that, and i confirmed libffi works from rpython. No idea how to approach codegen yet, but i'll see
<antilisp-dev>
i'm still stuck at deciding how to mix lisp and smalltalk without breaking both
<korvo>
In my most recent high-performance language, Cammy, I compile to a basic stack machine (the CAM, as used in OCaml) and so I have a list of bytecodes. In order to dynamically load code, I have the compiler set up so that it *appends* bytecodes to a list. Then I lie to the JIT; I promise never to change the list, but sometimes I extend it without changing the existing contents.
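An append-only code space like that could be sketched as follows (names hypothetical):

```python
class CodeSpace(object):
    """An append-only list of bytecodes. The interpreter loop is told
    the list never changes; in reality we only ever extend it, so every
    prefix an older trace saw remains valid."""
    def __init__(self):
        self.bytecodes = []

    def load(self, new_bytecodes):
        # Append freshly compiled code and return its entry point.
        # Existing contents are never touched.
        entry = len(self.bytecodes)
        self.bytecodes.extend(new_bytecodes)
        return entry
```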
<antilisp-dev>
why does the JIT not break? shouldn't it get errors (like going out of bounds) when you try to access the appended part?
<korvo>
The compiler also has to be kept around at runtime; it's got jump tables and a few other bits of internal state that are outside the list of bytecodes.
<korvo>
Older code can't directly jump into newer code. Each time I change the list, I'm only adding newer code at the end. The older code at the beginning only points to itself.
<antilisp-dev>
so it's done like the Caml declaration system, where new declarations shadow older ones?
<korvo>
A Forth might do something similar. Forth dictionaries are usually append-only and the stack only grows "down" over time, so it's usually safe to generate fresh code by appending it to the "end" of the allocated memory.
<korvo>
Oh, Cammy doesn't have declarations. It doesn't have variable names either. A lot of those concerns just don't exist. The main reason that code might get loaded at runtime is that Cammy has builtin support for two-tiered staging and evaluation; Cammy can do high-level codegen.
<antilisp-dev>
but forth has a dictionary, so old words can call new words even if they don't exist yet
<korvo>
But yes, if you were compiling ML or SML or OCaml then you'd just shadow the global dictionary with each new declaration. Same for Forth, usually.
<korvo>
Not direct calls. Let's say that a call is *direct* if we compile its literal address into the bytecode and directly jump to it, and *indirect* if there's a handle for the address and we have an indirect jump. Older code can't directly call newer code.
<korvo>
Like, let's say our code is a list `bc`. We might have two versions of it, v1 and v2, so that `bc[:v1]` and `bc[:v2]` are two slices. If v1 comes before v2, then `v1 < v2`, and `bc[:v1] == bc[:v2][:v1]`. The trick here is that no direct call in v1 accesses code outside v1, even if there's code for v2 too.
<korvo>
Any code compiled during v1 only knows about other v1 code. It might have the ability to make a generic indirect call that can go anywhere, and that has to be compiled to something that takes the indirect address as a runtime value.
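The slice invariant can be checked with a toy list (the opcode values are made up):

```python
bc = []                    # the single, append-only list of bytecodes
bc.extend([10, 11, 12])    # compile v1
v1 = len(bc)
bc.extend([20, 21])        # later: compile v2 by appending only
v2 = len(bc)
# No direct call compiled during v1 targets an index >= v1, so every
# trace taken against the v1 prefix stays valid after v2 is loaded.
```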
<antilisp-dev>
is there a performance benefit to using a bytecode compiler? I thought the RPython JIT made AST-walking interpreters just as performant
<antilisp-dev>
i think i understand some parts, but deceiving the JIT to optimize a stack machine is far beyond my level
<korvo>
The JIT does generate excellent code for ASTs, as long as the AST is constant. Bytecode is often faster because it is *packed*, while ASTs involve pointer-chasing. A packed AST is possible too; I've done that and it performed okay.
<korvo>
Fundamentally, the JIT generator doesn't operate on an interpreter. It operates on a while-loop with some local variables. For example, this interpreter for a FizzBuzz-solving language, DIVSPL, generates extremely good machine code without an AST or bytecode. https://github.com/rpypkgs/rpypkgs/blob/main/divspl/divspl.py
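A minimal while-loop interpreter in that shape might look like this (the opcode set is made up; JitDriver falls back to a stub outside RPython so the sketch runs on CPython):

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):  # stand-in outside RPython translation
        def __init__(self, greens, reds):
            pass
        def jit_merge_point(self, **kwargs):
            pass

# pc and the program string are green (trace constants); the
# accumulator is red (a runtime variable).
driver = JitDriver(greens=["pc", "program"], reds=["acc"])

def interpret(program):
    """program is a str of opcodes: 'i' increments, 'd' doubles."""
    pc = 0
    acc = 0
    while pc < len(program):
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == "i":
            acc += 1
        elif op == "d":
            acc *= 2
        pc += 1
    return acc
```

Because the program is a green string, the JIT unrolls the dispatch entirely and is left compiling only the arithmetic on acc.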
<korvo>
Unrelated: Is there a good way to turn an rffi ptr into a const ptr? I have a callback that used to take a `struct whatever_t*` but now it takes a `const struct whatever_t*`. It looks like the render_as_const hint only applies to arrays, not ptrs?
<antilisp-dev>
i don't even know why something would need to be const in C. Does the code run in Python?
<antilisp-dev>
i know rffi mixes pointers and values
<antilisp-dev>
(Ptr(Struct()).c_attr == Struct().c_attr in rpython)
<antilisp-dev>
so i wouldn't be surprised if const only applies to values
<antilisp-dev>
btw, would red-black trees (binary trees) be more efficient than hashtables for the JIT? or is it still too dynamic?
<korvo>
It's a GCC error from -Wincompatible-pointer-types. note: expected 'uv_read_cb' {aka 'void (*)(struct uv_stream_s *, long int, const uv_buf_t *)'} but argument is of type 'void (*)(uv_stream_t *, Signed, uv_buf_t *)' {aka 'void (*)(struct uv_stream_s *, long int, uv_buf_t *)'}
<korvo>
No, the builtin dict is *very* efficient and you probably can't beat it. You might consider implementing RB trees if you plan to handle untrusted keys and you need to ensure worst-case performance.
<antilisp-dev>
thanks, i already have RB trees implemented from antilisp, where i used trees for immutable lists (instead of lisp cons cells), so i could have copy-pasted them if that was more efficient
<korvo>
antilisp-dev: The main issue with dicts is that the JIT won't ever look inside the guts of the dict lookup. Either the key is "green" or "red"; greens are constant to the JIT during the trace, while reds are unknown variables. If the dict and key are both green then the JIT can sometimes remove the lookup entirely, without looking inside.
<korvo>
In contrast, lists and strs are builtin low-level objects and the JIT knows when they are immutable. If a list and index are both green then the JIT *definitely* removes the lookup; it's just constant propagation.
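The list case can be sketched like this (promote() falls back to a no-op outside RPython; names hypothetical):

```python
try:
    from rpython.rlib.jit import promote
except ImportError:
    def promote(x):  # no-op stand-in outside RPython translation
        return x

def load_slot(values, index):
    # If `index` is green after the promote, the JIT compiles this to
    # a guard plus a single read; if `values` is a known-immutable
    # green list as well, the read constant-folds away entirely.
    index = promote(index)
    return values[index]
```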
<korvo>
This will make more sense if/when you start to read JIT traces. The trace explains what operations the JIT is compiling, and this is how we learn which operations we want to remove.
<korvo>
...Maybe I figured out rffi? I used hints={"nolength":True,"render_as_const":True} and added a [0] to the lone spot where I actually dereference the pointer. Now I'm getting totally different errors from a different module.
<antilisp-dev>
"average rffi hackery". when in doubt, remember that VOIDP is actually char*
<antilisp-dev>
i think rffi is one of the parts that would really need a refactor, at least to get a single definition of C types (instead of lltype.Void, rffi.VOIDP being char*, jit_libffi having its own definitions, and rpython having r_* types)