comefrom

After reading about a function decorator that rewrites the bytecode to enable goto in Python , I realized a very similar technique could be used to manipulate the bytecode in luajit. Of course, being the superior language, Lua already has goto, so for this example we need to add even more advanced control flow,

The interface is much the same. Wrap the function with a rewriting function (decorator) and add keywords by using Lua’s shortcut syntax for string function calls.

local comefromulate = require "comefrom" local fn = comefromulate ( function ( x ) comefrom "end" if not x then return end label "start" print ( "where'd it go?" ) comefrom "start" print ( x ) x = nil label "end" end ) fn ( "hello" )

This will print “hello”.



notes

The Lua syntax/grammar is both easy and hard to hack on. The shortcut that allows calling a function makes it look pretty clean. On the other hand, Lua’s rules to prevent errors often get in the way.

A return statement is only valid at the end of a block, which means if the control flow introduces new blocks, they may not behave as such without if true return end wrapping.

Lua draws a hard line between statements and expressions. New keywords had best look like function calls in order to qualify as statements.

The jit.bc module has some useful functions in it, particularly dump. The bytecode documentation exists, but it’s pretty spartan. I found it best to use the “official” dump function and write my own and compare results.



implementation

We need to turn the function into a string, then parse the output, and return our newly rebuilt function.

local function comefromulate ( fn ) local d = string . dump ( fn ) local r = sx . reader ( d ) local bcpos , numbc = readhdr ( r ) local newbc = patch ( fn , r , numbc ) d = d : sub ( 1 , bcpos ) .. newbc .. d : sub ( 1 + bcpos + numbc * 4 ) return loadstring ( d ) end

The header has lots of fields in it, but we aren’t interested in most of them. It’s slightly wasteful, but we’re going to make the new bytecode exactly the same length, so we can weave it back in without patching up everything. All the offsets will remain as they were.

The key fact that makes everything work is that comefrom "there" is always turned into the same three bytecode sequence. A load of a global, comefrom, a load of a string, “there”, and then a call.

The first pass of patch just reads the bytecode stream and turns it into an array for later modification. We watch for comefrom as we go.

for i = 1 , num do local op = byte ( r ) local a = byte ( r ) local c = byte ( r ) local b = byte ( r ) local d = b * 256 + c if op == 52 then local s = jutil . funck ( fn , - d - 1 ) if s == "label" then waslabel = i end if s == "comefrom" then wascomefrom = i end end if op == 37 then local s = jutil . funck ( fn , - d - 1 ) if waslabel == i - 1 or wascomefrom == i - 1 then labels [ i - 1 ] = s end if wascomefrom == i - 1 then if targets [ s ] then error ( "already defined comefrom " .. s ) end targets [ s ] = i + 1 end end newbc [ i ] = { op , a , c , b } end

We can’t do anything about a label or comefrom until we see the string, so we use a kind of backwards looking peephole whenever we see a string load. Then we check if the previous instruction was also of interest. (Could also peek ahead, but it’s a little easier this way.)

For labels, we record where they are so they can be patched. For comefrom, we map from name to location. Then we patch.

for i , l in pairs ( labels ) do local targ = targets [ l ] or i + 2 local d = targ - i + 0x8000 local b = math . floor ( d / 256 ) local c = bit . band ( d , 0xff ) newbc [ i ] = { 84 , 99 , c , b } end for _ , l in pairs ( labels ) do targets [ l ] = nil end for l , _ in pairs ( targets ) do error ( "can't comefrom " .. l ) end

After fixing up all the labels, we even check to make sure there’s no attempt to comefrom nowhere. You are allowed to have extraneous labels, though. Note that a fair bit of the original bytecode (string load and call) is left behind. Only the first instruction gets changed to a jump.

JMP takes a free slot argument, but I’ve no idea what to use here. I picked 99.



todo

Lua’s goto is only half a keyword. In statements like goto "there" it’s actually compiled as a function, meaning it would be possible to use the label syntax above. One could even goto a label, and then immediately comefrom it.

Computed comefrom. The argument to label should support variables, which will then jump to the right comefrom.

label ( x ) comefrom "good" rv = 1 comefrom "bad" rv = - 1 comefrom "indifferent" rv = 0

It’s not as ridiculous if it’s spelled switch(x) case "val" .

It may be possible to permit cross function comefrom using the debug library’s introspection to match up lonely labels with eligible comefroms.

Support for variable arguments to comefrom could also be added. Useful for implementing the devil’s control flow.



code

Don’t feed the animals.