Stack-checking a program that will execute in orbit

(September 2011, posted on Reddit and Hacker News)

Things change.

For those of us who were raised with Z80 CPUs, it is abundantly clear that we are living in "times of plenty"; most developers no longer have to think about things like "fitting into memory" or "having enough stack space". Moore's Law provided orders of magnitude of improvements in both CPU speeds and memory sizes; so big in fact, that resource-wasting virtual machines dominate our everyday software, and we don't even feel it.

There are always exceptions, however: platforms where you still need to be careful about memory and CPU usage. Especially when the device operates in a critical environment (e.g. in orbit), you need to be very careful.

For the last 5 years I've been developing code for the European Space Agency, in the context of projects like TASTE; the target platform is an embedded Leon processor, an open-source SPARC derivative specifically designed for operating in space. You really don't want to run out of memory there - "blue screens of death" are too expensive when the nearest ground station operator... is some 36,000 km below :‑)

Heap and Stack

First of all, you can't use heap - not when you're in orbit. Why? Because quite simply, if you do use heap, and malloc returns null... what do you do? You can't ask the user for help - there are no human operators aboard the satellite, and telecommands sent remotely need a functioning program that listens to what is sent; if the code runs out of memory while handling the incoming telecommands, well, you have a dead satellite.

And what about stack? Stack usage also grows and diminishes as functions call other functions and return back... A "stack overflow" means a crashed program. If you use recursive functions, it's impossible to predict how much stack space you'll actually need at runtime; and even without recursive functions, when spawning the application, you need to know the total stack the code will use, and reserve it up-front - being absolutely sure that you won't run out of stack.

So... how do you enforce the restrictions? How do you cope with the stack requirements?

Space companies use custom-made tools and procedures that address these challenges - but just for the fun of it, we can do some hacking of our own :‑)

Meet objdump

objdump displays information from object files. It is a very useful tool, capable of showing many kinds of information from your executables.

In our case, we will use its ability to disassemble the ELF binary of our program:

    bash$ cat > hello.c
    main()
    {
        int i=314159;
        printf("Hello, 10K times pi is %d\n", i);
    }
    (Ctrl-D)
    bash$ gcc hello.c
    bash$ objdump -d ./a.out
    ...
    Disassembly of section .text:
    ...
    08048384 <main>:
     8048384:  55                     push   %ebp
     8048385:  89 e5                  mov    %esp,%ebp
     804838a:  83 ec 20               sub    $0x20,%esp
     804838d:  c7 44 24 1c 2f cb 04   movl   $0x4cb2f,0x1c(%esp)
     8048394:  00
     8048395:  8b 44 24 1c            mov    0x1c(%esp),%eax
     8048399:  89 44 24 04            mov    %eax,0x4(%esp)
     804839d:  c7 04 24 70 84 04 08   movl   $0x8048470,(%esp)
     80483a4:  e8 0f ff ff ff         call   80482b8 <printf@plt>
     80483a9:  c9                     leave
     80483aa:  c3                     ret
     80483ab:  90                     nop

So, what do we see here?

Apparently, there's a lot of information we can gather, by simple text-processing of this output.

Functions start in lines of the form:

    ADDRESS <FUNCTIONNAME>:

Calls to other functions follow this pattern:

    ADDRESS: OPCODES call ADDRESS <FUNCTIONNAME>

Each function starts by reserving the space it needs on the stack: it lowers the stack pointer by a specific amount, which the compiler calculated to provide enough space for local variables and anything else that is to be placed on the stack, e.g.

    ADDRESS OPCODES sub $0x20,%esp

In fact, GCC is more creative, sometimes:

    ADDRESS OPCODES add $0xFFFFFF84,%esp

...so the stack pointer is decreased via an 'add' instruction, which adds a negative value. Update: Fabien Sanglard pointed me to the reason behind this; it's a size (opcode-wise) optimization.
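As an aside on that last pattern: the "add a negative value" form can be decoded by reading the immediate as a 32-bit two's complement number. Here is a minimal sketch of that arithmetic, assuming 32-bit x86 and objdump's AT&T syntax; the helper name stack_bytes is made up for this illustration.

```python
import re

# Matches e.g. 'sub $0x20,%esp' or 'add $0xFFFFFF84,%esp' (AT&T syntax, 32-bit x86 assumed).
STACK_ADJUST = re.compile(r'\b(add|sub)\s+\$(0x[0-9a-fA-F]+),%esp')

def stack_bytes(line):
    """Return the bytes of stack reserved by this line, or None if it reserves nothing."""
    m = STACK_ADJUST.search(line)
    if not m:
        return None
    op, value = m.group(1), int(m.group(2), 16)
    if op == 'sub':
        return value
    # 'add' only reserves stack when the immediate is negative as a 32-bit value
    if value >= 0x80000000:
        return 0x100000000 - value
    return None  # a positive 'add' releases stack instead

if __name__ == "__main__":
    print(stack_bytes("804838a: 83 ec 20  sub $0x20,%esp"))
    print(stack_bytes("ADDRESS OPCODES add $0xFFFFFF84,%esp"))
```

With the two example instructions shown earlier, this gives 32 bytes for the 'sub' and 124 bytes for the 'add' (0x100000000 - 0xFFFFFF84 = 0x7C).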

This means that by catching these patterns in the output of objdump -d, we can...

learn what functions are called (and check if any of the "forbidden" ones are called!)

learn how much stack is used by each function (how much it lowers the stack pointer upon entry) [1]

learn what functions are called by each function - and therefore, form a call graph!

Note: there are better ways to obtain some of this information; for example, objdump offers a "-t" option, that provides only symbols and their offsets. However, by using "objdump -d", we can obtain all the information we want in one pass.

Scripting it in Python

We will be using regular expressions to match our three targets. Here they are:

    ...
    import os
    import re
    import sys
    ...
    functionNamePattern = re.compile(r'^(\S+) <([a-zA-Z0-9_]+?)>:')
    callPattern = re.compile(r'^.*call\s+\S+\s+<([a-zA-Z0-9_]+)>')
    # (?:add|sub) is an alternation; [add|sub] would be a character class
    stackUsagePattern = re.compile(r'^.*(?:add|sub)\s+\$(0x\S+),%esp')

The regular expressions mark their "targets" (ADDRESS, FUNCTIONNAME, AMOUNTOFSTACK, etc) within parentheses, so we can extract the matched portion easily in Python, with:

    ...
    match = functionNamePattern.match(line)
    if match:
        # group(1) is the address, group(2) is the function name
        functionName = match.group(2)

We apply these pattern matches to the lines generated by objdump:

    ...
    callGraph = {}
    stackUsagePerFunction = {}
    functionName = ""
    foundStackUsage = False
    for line in os.popen("objdump -d \"" + sys.argv[1] + "\"").readlines():
        # A new function starts here: begin collecting its callees
        match = functionNamePattern.match(line)
        if match:
            functionName = match.group(2)
            callGraph[functionName] = set()
            foundStackUsage = False
        # Record every call made from the current function
        call = callPattern.match(line)
        if functionName != "" and call:
            calledFunction = call.group(1)
            callGraph[functionName].add(calledFunction)
        # The first stack pointer adjustment is taken as the function's own stack usage
        if not foundStackUsage and functionName != "":
            stackMatch = stackUsagePattern.match(line)
            if stackMatch:
                foundStackUsage = True
                value = int(stackMatch.group(1), 16)
                # 'add' of a negative amount: convert from 32-bit two's complement
                if value > 2147483647:
                    value = 4294967296 - value
                stackUsagePerFunction[functionName] = value

And we now have two things:

a preliminary use of stack space per function, in the stackUsagePerFunction dictionary

a call graph, showing us which functions are called by each function (callGraph[functionName] is a set of unique function names)

At this point, detection of use of forbidden functions is very easy - we just look in the call graph, to see if some function is calling malloc, free, etc - and report it.

Measuring stack usage (and detecting cycles)

What about the actual stack usage of a function? It's not just how much space it reserves itself - we have to trace all the call chains that start from it, to see the "longest" chain in terms of stack usage.
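To make the "longest chain" idea concrete, here is a minimal sketch on a toy, cycle-free call graph. The function names and byte counts are invented for illustration (they don't come from any real binary), and the helper names call_chains and worst_chain are hypothetical. It enumerates every call chain that starts from a function and reports the most expensive one.

```python
# Hypothetical per-function stack usage (bytes) and call graph, for illustration only.
frame_size = {"main": 32, "parse": 64, "render": 128, "log": 16}
call_graph = {
    "main": ["parse", "render"],
    "parse": ["log"],
    "render": ["log"],
    "log": [],
}

def call_chains(func):
    """Yield every call chain (a list of names) from func down to a leaf function."""
    callees = call_graph.get(func, [])
    if not callees:
        yield [func]
    for callee in callees:
        for chain in call_chains(callee):
            yield [func] + chain

def worst_chain(func):
    """Return (total_bytes, chain) for the most stack-hungry chain starting at func."""
    return max((sum(frame_size[f] for f in chain), chain)
               for chain in call_chains(func))

if __name__ == "__main__":
    total, chain = worst_chain("main")
    print(total, " -> ".join(chain))
```

On this toy graph, main needs 176 bytes in the worst case (main -> render -> log), even though it only reserves 32 bytes itself. Enumerating chains like this only terminates if the graph has no cycles.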

But before we do that, we have to make sure that there are no "cycles" in the call graph - that there is no recursion whatsoever. Here's a standalone example of an algorithm that detects cycles:

    import sys

    callGraph = {
        'a': ['b', 'c'],
        'b': ['d'],
        'c': ['e'],
        'd': [],
        'e': ['a']
    }

    def SearchForCycles(path):
        print(path)
        node = path[-1]
        neighbours = callGraph[node]
        for neighbour in neighbours:
            # A neighbour that is already on the path closes a cycle
            if neighbour in path:
                print("Detected cycle", path, "->", neighbour)
                sys.exit(1)
            SearchForCycles(path + [neighbour])

    for node in callGraph.keys():
        SearchForCycles([node])

This is relatively simple code: we want to detect recursion in the call chains, so we... write a recursive function :‑)

SearchForCycles checks a path for cycles, by:

taking the last node of the path formed so far

fetching the neighbours of this last node (from the call graph)

checking to see if any of these neighbours were already in the path - if so, a cycle has been found

otherwise, it recurses further down in the chain, by calling itself.

And its results:

    bash$ ./detectCycles.py
    ['a']
    ['a', 'b']
    ['a', 'b', 'd']
    ['a', 'c']
    ['a', 'c', 'e']
    Detected cycle ['a', 'c', 'e'] -> a

Once we've made sure that there are no cycles in the call graph, we can now safely accumulate stack usage per path: For each function, we report the max of the accumulated step costs in all call chains that start from it.

    def findStackUsage(foo, stackUsagePerFunction, callGraph, cache={}):
        # The mutable default argument is shared across calls, so it works as a memo table
        if foo in cache:
            return cache[foo]
        # Functions we have no stack data for count as zero
        if foo not in stackUsagePerFunction:
            return 0
        # Leaf function: only its own stack usage
        if foo not in callGraph or not callGraph[foo]:
            cache[foo] = stackUsagePerFunction[foo]
            return stackUsagePerFunction[foo]
        # Own stack usage plus the most expensive chain among the callees
        totalStackUsage = max(
            stackUsagePerFunction[foo] + findStackUsage(x, stackUsagePerFunction, callGraph)
            for x in callGraph[foo])
        cache[foo] = totalStackUsage
        return totalStackUsage

And now, we can finally see some results: invoked on my renderer, I get...

    bash$ stackUsage.py renderer-2.x/src/renderer | c++filt | tail -15
    788: Scene::renderAmbient(Camera const&, Screen&)
    872: Scene::renderRaytracer(Camera&, Screen&, bool)
    1012: Scene::renderGouraud(Camera const&, Screen&)
    1040: Scene::renderPhongAndShadowed(Camera const&, Screen&)
    1048: lib3ds_chunk_dump_info
    1088: Scene::renderPhong(Camera const&, Screen&)
    1108: lib3ds_node_read
    1124: texture_map_read
    1184: lib3ds_material_read
    1340: mdata_read
    1416: lib3ds_file_read
    1460: lib3ds_file_load
    1652: lib3ds_mesh_calculate_normals
    4080: Scene::load(char const*)
    4096: main

...so I can see exactly how much stack space is used by each function. More importantly, I can see whether cycles (recursion) exist:

    bash$ stackUsage.py renderer-2.x/src/renderer | c++filt
    ...
    Detected cycle: ['lib3ds_file_insert_node', 'lib3ds_file_insert_node']
    Detected cycle: ['CountDepth(BVHNode*, int, int&)', 'CountDepth(BVHNode*, int, int&)']
    Detected cycle: ['Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)', 'Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)']
    Detected cycle: ['CountBoxes(BVHNode*)', 'CountBoxes(BVHNode*)']
    Detected cycle: ['CountTriangles(BVHNode*)', 'CountTriangles(BVHNode*)']
    ...

There - the script detects recursive functions. It also detects "deeper" cycles: here's cycle detection in a SPARC/Leon Ada binary:

    bash$ stackUsage.py /path/to/leonBinaryMadeWithAda
    ...
    Detected cycle: [ '__gnat_raise_nodefer_with_msg', '__gnat_last_chance_handler', '__gnat_rcheck_19', '__gnat_raise_program_error', '__gnat_raise_nodefer_with_msg' ]
    ...

Mission accomplished :‑)

Final comments

This kind of functionality is usually part of far more mature tools than... quickly-hacked Python scripts. Don't use this to send your spaceship in orbit :‑)

[1] In fact, the latest versions of GCC support the -fstack-usage option, which reports stack usage per function (individually) upon compiling the code. Unfortunately, compilers for embedded systems usually take a while to... catch up, in terms of these features. Parsing the stack "decrease" at each function startup is, for now, a good-enough workaround. Note that for SPARC/Leon compilers, the corresponding stack search pattern is... ADDRESS OPCODES save %sp, -AMOUNT, %sp , so the final version of the script uses separate regexps for each architecture.
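For illustration, a SPARC counterpart of the x86 stack pattern might look like the sketch below. The regular expression is my own guess based on the "save %sp, -AMOUNT, %sp" form described above, with AMOUNT assumed to be printed in decimal; it is not the one used in the downloadable script, and the helper name sparc_stack_bytes is hypothetical.

```python
import re

# Assumed disassembly form: '... save  %sp, -104, %sp' (decimal, negative amount).
SPARC_SAVE = re.compile(r'\bsave\s+%sp,\s*-(\d+),\s*%sp')

def sparc_stack_bytes(line):
    """Return the bytes reserved by a SPARC 'save' instruction, or None if no match."""
    m = SPARC_SAVE.search(line)
    return int(m.group(1)) if m else None

if __name__ == "__main__":
    print(sparc_stack_bytes("ADDRESS OPCODES save %sp, -104, %sp"))
```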

Note that the calculated stack usage per function assumes a single thread running; if there are N threads spawned in our binary, then of course the stack usage scales correspondingly.

Finally, if you compile with debug info enabled (-g) you will get much better output - many symbols are not visible without it.

Download

You can download the script for yourself from here - it currently supports x86 and SPARC/Leon binaries.

Update: Dan McGee added support for x86-64.



Last update on: Sat Mar 8 22:58:16 2014