Lua internals



Alright, lets start from some info about Lua internals.



As for many other script languages, each mention of some variable in some Lua expression is actually taking time to get its value. For example, if I write following code:

Language: Lua for i = 0 , 10 do

print ( i )

end

It takes some time to get value of i and pass it into print. More important that here print is also variable. It's so called global variable . Here it's storing API function provided by environment. So, getting print function is also taking time. Again, more important, that taking here value of print is much slower than taking value of i. Because i here is local variable .



Local variables is working faster than global variables, mostly because local variables doesn't have name in runtime, instead of that, they have indexes. And all actions "take local variable value" is just getting value by index, which is fast. On the other hand, global variables are stored in _G table. It's global table for all global variables. And getting some global variable is essentially taking value from table by its name.



So following code is same:

Language: Lua _G [ "print" ] ( "qwe" )

print ( "qwe" )

And because indexing by string is more harder task, including hash calculations if it's using hashtables as implementation, or other stuff - it works much slower than just getting value by index. This means that this stupid-looking code is actually works faster:

Language: Lua local print = print -- store global value into local

for i = 0 , 10 do print ( i ) end -- use local variable

Now, a bit about print itself. As I said, it's API function provided by environment.

Each time you call environment function, Lua doing following:

Lua->API->Environment

1) Lua calls API function provided by environment

2) API function pulling some info about: arguments count, their types

3) API function does some sanity checks about call correctness

4) API function does actual work with environment internals

Lua<-API<-Environment

5) API function is taking results from environment

6) API function is pushing results to Lua

7) Lua continues execution of Lua code

So, basically each environment API call consists of two translations: from Lua to Environment and backwards. Even though you can't avoid calling API, you may try to reduce count of calls, to reduce unneccessary calls to improve speed.



Case



We want to draw collision data in 2D platformer. In this post I'll talk mostly about this case.

What do I mean by "collision data"? Simple: if some object is changing behavior when it's touched <- it's "collision data". As for now, I'll talk only about tile based 16x16 pixel collision data.



Alright. We have 16x16 collision blocks, they have pixels, each value of pixel may have different meaning, like: ground, ceil, wall, ladder, and so on. It's very handy to draw overlay over game screen where each collision pixel is colored by its color. But how code would look like? It would be something like this:

Language: Lua for x = ? , ? do -- here ? is some expression calculating bounds for x

for y = ? , ? do -- here ? is some expression calculating bounds for y

local tile = get_collision_tile ( x , y ) -- to avoid finding corresponding tile 16*16=256 times

for i = 0 , 15 do -- 16 by y

for j = 0 , 15 do -- 16 by x

local v = get_collision_data ( tile , i , j ) -- get collision pixel

if colors [ v ] ~= nil then -- if it has color, then you want to draw here

gui . drawPixel ( x + i - scrollx , y + j - scrolly , colors [ v ] ) -- scrollx, scrolly is camera position

end

end

end

end

end

If colors[v] is nil this means that v value doesn't affect behavior at all, in other words: nothing to draw. For example you should have v for air, and you'll set colors[air] = nil, to avoid draw every space where you can walk freely.





(Demolition Man for Sega Genesis)



This code above is actually already optimized. It'll only translate values to colors and position relative to tile into global position. Nothing to optimize, right? No!



First optimization that is often done is: process only tiles that is on screen. It's done by more carefully cooking ranges of first two for loops. Example is something like that:

Language: Lua local left , right , top , bottom = scrollx , scrollx + 320 , scrolly , scrolly + 224

for y = max ( 0 , floor ( top / 256 ) ) , min ( hblocks - 1 , floor ( ( bottom - 1 ) / 256 ) ) do

for x = max ( 0 , floor ( left / 512 ) ) , min ( wblocks - 1 , floor ( ( right - 1 ) / 512 ) ) do

I don't want to explain it in details, but idea is simple. x and y should take values of block if and only if some part of it is visible on the screen. For 320x224 screen as in my case it will process at most 21x15=315 blocks. Also, for each tile we can calculate sx, sy = x-scrollx, y-scrolly and use them in gui.drawPixel call reducing few operations: get value of scrollx, scrolly, and two subtractions.



Now, what is next? Next is what is all lua internals part was about. What API environment function do we see here in code above? It's gui.drawPixel. It should be something about it. Nope. (actually yes but... later)



We are calling get_collision_tile and get_collision_data. But wait, it's not API environment functions. It's our custom made functions for reading game data. Yep. But what it has inside? It has multiple calls memory.read_something calls, each of them is environment API call. First, you should make local variable for them, to avoid looking for them in global table _G every time you want to read something from memory.



Time to remind our goal: we want to reduce count of API calls. Here API calls are memory.read_something. How can we get rid of them? Answer is simple: read level only once, and store its whole structure on Lua side! But how can we do that? Just each time when you want to draw collision overlay, check for current level. If it changed, forget all data about previous level, and load whole level into Lua again. It'll require time only when you change level. After that, code of get_collision_tile and get_collision_data should not have any API calls .



After you did that, you may think: wait, why do we always convert collision pixel value(v) into its color? (colors[v]) Yes, now we can "bake in" collision color instead of collision value straight into collision tile data!



What's next? You may notice that many of collision tiles are just empty. Full of air. (wait, you said empty!) So, you don't want to spend time on those tiles. You may make some table of empty tiles and before processing 16x16 pixels check is it in list of empty. If it is, then skip processing. Note: when I say "check in list" - I don't actually mean that you should have it as some kind of list. Best way to implement that is just one big table, and if value at index of tile is true then it is empty. So, code is just

Language: Lua if empty_tiles [ tile_id ] then continue end

You may came up with some similar ideas, like store table of full_somevalue tiles, and so on...



Well, it's way long introduction to actual new approach.



Baking time!



What if we would be able to make list of calls for each collision tile? For example, if it has 16 non-empty pixels, why not to store function with its arguments into list, and then just execute it? Fortunately, we able to do that! Like this:

Language: Lua calls_list = {

{ gui . drawPixel , 5 , 7 , 0xFFFF00FF } ,

{ gui . drawPixel , 10 , 10 , 0xFF00FFFF }

}

for i = 1 , # calls_list do

local b = calls_list [ i ] -- single list entry

b [ 1 ] ( x + b [ 2 ] , y + b [ 3 ] , b [ 4 ] ) -- where x, y is shift to align with camera position.

end

Using this technique, you may bake all collision tiles into lists of calls, during level loading on Lua side.



Now we have solution for making and running call lists. How about optimizing call lists?



What else do we have besides gui.drawPixel? We have gui.drawLine! Can we utilize it? Of course we can!



You may think of gui.drawLine as batch of gui.drawPixel with same color. So, if we able to split collision pixels into bunch of lines, then we can draw them instead. It will reduce count of API calls and thus, it'll improve performance of script.



How can we do that? For example like this. If color of previous line is same as next pixel, and there is no gap in between - just increase length of line. Here is a picture:

https://i.imgur.com/k7Xcw9e.gifv



Also, good implementations of drawLine method is doing sanity checks only in the beginning, and drawing all pixels straight ahead. This is done using so called "clipping". Original line reduced to segment of it that is on screen, and only pixels of that segment is efficiently enumerated, without checks for leaving bounds. On the other hand, each call of drawPixel does check weather position is on screen or out of bounds. But drawLine does it once for the whole line.



Next, you may think of gui.drawBox as batch run of gui.drawLine. Can we utilize it? Of course we can! Similarly to lines, we will look for previous line, and if it has appropriate starting and ending position, then we will convert it into box. Here is a picture:

https://i.imgur.com/PxeTKei.gifv



Similarly, good implementation of drawBox does coordinates check only once. It's done by "clipping" - transforming coords passed to the method into new coords - part of box that is on screen. And no additional checks required during fill.



Now we have a small trouble. gui.drawPixel, gui.drawLine, gui.drawBox have different arguments. To handle variable count of arguments for functions, you would need some workarounds. You may want to use unpack function, but you need to add camera shift to coordinates, so it's not an elegant way. You may check vs function that you want to call, and then call it as intended. You may make supplementary functions that use always four coordinates and then calling API function, but this will lead to unnecessary addition for coords that is not used. gui.drawPixel for example. Also, gui.drawBox has two colors: fill and outline.



Checks like that, intermediate functions, and other stuff may reduce gain. Perhaps we can "bake" even more?



We need to bake more!



Even though Lua is interpreter, each time you include some file using require, it's actually compiling code that is loaded in some kind of intermediate language code. This step is required to have decent performance. Otherwise you'll end up spending most of the time parsing each line over and over again.



This means, in theory we can generate .lua files and then load them using require, with all tiles recompiled into functions, and then simply call that functions. But in practice, there is better function. There is loadstring. It is loading code pretty much like require does, but is loading code straight from the string! Also, it doesn't check "was it loaded already?" which is annoying part if you would use require because you would have to always generate new name for generated file.



That's it! Just generate Lua code for each collision tile, and then simply call it like this:

Language: Lua cdata [ idx1 ] ( sx , sy ) -- shift x, shift y to align with camera



And, to make functions work fast, you should make local variables for functions, like this:

Language: Lua local gui_box , gui_line , gui_pixel = gui . drawBox , gui . drawLine , gui . drawPixel



So, for the following picture:

https://i.imgur.com/4kmJbBK.gifv



Code is following:

Language: Lua local gui_box , gui_line , gui_pixel = gui . drawBox , gui . drawLine , gui . drawPixel

return function ( x , y )

gui_pixel ( x + 0 , y + 0 , 16777113 )

gui_box ( x + 1 , y + 0 , x + 2 , y + 1 , 16777113 , 16777113 )

gui_pixel ( x + 2 , y + 2 , 16777113 )

gui_box ( x + 3 , y + 0 , x + 4 , y + 3 , 16777113 , 16777113 )

gui_pixel ( x + 4 , y + 4 , 16777113 )

gui_box ( x + 5 , y + 0 , x + 6 , y + 5 , 16777113 , 16777113 )

gui_pixel ( x + 6 , y + 6 , 16777113 )

gui_box ( x + 7 , y + 0 , x + 8 , y + 7 , 16777113 , 16777113 )

gui_pixel ( x + 8 , y + 8 , 16777113 )

gui_box ( x + 9 , y + 0 , x + 10 , y + 9 , 16777113 , 16777113 )

gui_pixel ( x + 10 , y + 10 , 16777113 )

gui_box ( x + 11 , y + 0 , x + 12 , y + 11 , 16777113 , 16777113 )

gui_pixel ( x + 12 , y + 12 , 16777113 )

gui_box ( x + 13 , y + 0 , x + 14 , y + 13 , 16777113 , 16777113 )

gui_pixel ( x + 14 , y + 14 , 16777113 )

gui_line ( x + 15 , y + 0 , x + 15 , y + 15 , 16777113 )

end



So, whole problem is reduced to following task: how to split picture into minimum count of axis aligned rectangular areas of same color?



Depending on your solution of this task, you may gain more speed.



In the end, if you have way to draw pictures, you may "bake" each collision tile into a picture, and then draw it instead. But solution discussed above works even if you can't draw picture, or if it is slow.



Ahhh. Here is part of code that is actually implementation of algorithm above:

And, according to pictures above, atm it has some bug with building blocks not as intended. Anyway, it works. Alright, lets start from some info about Lua internals.As for many other script languages, each mention of some variable in some Lua expression is actually taking time to get its value. For example, if I write following code:It takes some time to get value ofand pass it into print. More important that hereis also variable. It's so called. Here it's storing API function provided by environment. So, gettingfunction is also taking time. Again, more important, that taking here value ofis much slower than taking value of. Becausehere isLocal variables is working faster than global variables, mostly because local variables doesn't have name in runtime, instead of that, they have indexes. And all actions "take local variable value" is just getting value by index, which is fast. On the other hand, global variables are stored intable. It's global table for all global variables. And getting some global variable is essentially taking value from table by its name.So following code is same:And because indexing by string is more harder task, including hash calculations if it's using hashtables as implementation, or other stuff - it works much slower than just getting value by index. This means that this stupid-looking code is actually works faster:Now, a bit aboutitself. As I said, it's API function provided by environment.Each time you call environment function, Lua doing following:Lua->API->Environment1) Lua calls API function provided by environment2) API function pulling some info about: arguments count, their types3) API function does some sanity checks about call correctness4) API function does actual work with environment internalsLuahttps://gist.github.com/realmonster/93090d4ef0b40c1b88d62f8840fad675/5310f3bf8463ae6e0a766e4608f268fd28762184#file-demolition-man-lua-L107 And, according to pictures above, atm it has some bug with building blocks not as intended. Anyway, it works.



Last edited by r57shell on 2018-12-16 09:27:43; edited 1 time in total