Re: How does LuaJIT's trace compiler work?

From: Mike Pall <mike-1311@xxxxxxxxxx>

To: luajit@xxxxxxxxxxxxx

Date: Fri, 29 Nov 2013 14:50:32 +0100

Robin Heggelund Hansen wrote: > From what I understand, LuaJIT's JIT doesn't compile hot methods like > Java's HotSpot does, it compiles hot paths originating from loops. Does > this mean that if something doesn't originate from a loop (say, I call Lua > functions from the C-api) then the code will never be jitted? Function entries can be trace start points, too. But the hotspot detection and the region selection prefers loops. > And what happens when you hit another loop? Will the path to the > second loop be JIT'ed, and then a new path from that loop jitted > as well, or will the second loop be a part of the same path? Depends. If the inner loop has a low iteration count, it'll be unrolled and inlined. For most loop nestings there'll be a single trace for the outer loop that goes 'around' the inner loop and joins its exit with its entry. Otherwise traces link to each other in linear pieces. > How does the interpreter choose the most optimal hot path? The interpreter doesn't choose a path. Region selection uses probabilistic algorithms and lots of intertwined heuristics to select when and where to start a trace, whether to follow it and where to end it. It's really, really hard to describe this adequately in prose and not in code. IMHO LuaJIT has pretty good facilities for looking at the generated traces (-jv and -jdump). Try this with some carefully crafted examples and check the source code for details. There are a few surprises on your way. > Let's say I have a hash-table of ints -> strings. Now imagine > that I've called table[x] with x being 3 and 5 enough times that > they've become hot paths and jitted, how does the interpreter > decide which jitted code to call for table[x] where x is 4? Umm, no. Indexing with non-constant integers is not specialized directly (the pay-off is too low). There's specialization for tons of other things, e.g. function calls. This may indirectly lead to specialization of other instructions, of course. Specialization works by adding checked assertions to the IR. A new side trace is spawned if an assertion turns out to be wrong often enough. That one either specializes to other pre-conditions or generalizes, depending on the circumstances. This leads to a graph of traces that probabilistically approach a local optimum. Similar to how a C switch statement is turned into a decision tree. Except that a JIT compiler has access to runtime path statistics (indirectly, via probabilistic hotspot detection) and a trace compiler can ignore paths that are never taken. This often makes a decision tree a better choice than a jump table (which is mainly beneficial for a large equi-weighted fan-out or in absence of known weights). [ A recent example illustrates the power of this approach: Cloudflare's WAF (web application firewall) basically generates Lua code for the (highly non-linear) maze of firewall rules. An incoming attack triggers certain rules and the corresponding paths are turned into linearized traces. These can be heavily optimized by LuaJIT, much more so than you could ever hope to do with a static compiler. When a different attack wave is started, it'll spawn different traces -- i.e. the generated code dynamically adapts to the specific attacks. The result is a much higher firewall throughput than you might be able to achieve with a data-driven or a static compilation approach. This directly equates into money saved on machines needed to handle the load. ] > Another thing that has been racking my brain. Since paths are compiled, not > functions, won't a trace compiler require more memory? Nope, a trace compiler generates much, much less code than a method-at-a-time compiler. Only a fraction of the entire program is compiled and traces are usually rather short, too. > Since you can't really re-use compiled code of another path I > mean, and since paths will probably be larger than single > functions in the general case... There's some amount of trace overlap and code duplication. But that's often beneficial due to better optimization opportunities. The effect on code density is rather minor in practice. --Mike

References : How does LuaJIT's trace compiler work? From: Robin Heggelund Hansen

:

Other related posts: