Instrumenting the AST

The following is a rough retelling of my presentation for the Testing in Python BoF at PyCon 2010, including removing some in-jokes relevant only for that session. On the other hand, I expanded it to include working code. This means you, yes you, could work on this. It's not for the faint of heart. Have fun! And let me know what you think.

The AST module

Code coverage is a good thing. I want to do branch coverage. Last year Ned Batchelder added branch coverage support to coverage.py, which works by analyzing the byte code. I want to see if there's a better solution through an entirely different approach.

Let me introduce you to Python's "ast" module.

>>> import ast

>>> ast.parse("for i in range(10): print i") <_ast.Module object at 0x1004d06d0> >>>

>>> ast.dump(ast.parse("for i in range(10): print i")) "Module(body=[For(target=Name(id='i', ctx=Store()), iter=Call(func=Name(id='range', ctx=Load()), args=[Num(n=10)], keywords=[], starargs=None, kwargs=None), body=[Print(dest=None, values=[Name(id='i', ctx=Load())], nl=True)], orelse=[])])"

>>> ast.dump(ast.parse("for i in range(10): print i"), True, True) "Module(body=[For(target=Name(id='i', ctx=Store(), lineno=1, col_offset=4), iter=Call(func=Name( id='range', ctx=Load(), lineno=1, col_offset=9), args=[Num(n=10, lineno=1, col_offset=15)], keywords=[], starargs=None, kwargs=None, lineno=1, col_offset=9), body=[Print(dest=None, values=[Name(id='i', ctx=Load(), lineno=1, col_offset=26)], nl=True, lineno=1, col_offset=20)], orelse=[], lineno=1, col_offset=0)])"

Programmatically building an AST

It's an interface to Python's internal Python parser so program can convert string containing Python code into an abstract syntax tree (AST).The ast module contains some code to display the contents of the AST as a string.I've reformatted it to fit my slides as otherwise it's a long string. I can also ask it to display the position information, which is the second True in the following.

You can use the ast module to build a tree directly, without parsing a string, then compile and execute that code.

>>> from ast import * >>> tree = Module([Print(None, [Str("PyCon2010!")], True)]) >>> tree.lineno = 1 >>> tree.col_offset = 1 >>> fix_missing_locations(tree) <_ast.Module object at 0x1004dff50> >>> tree = fix_missing_locations(tree) >>> compile(tree, "<TiP>", "exec") <code object <module> at 0x1004d38a0, file "<TiP>", line 1> >>> exec compile(tree, "<TiP>", "exec") PyCon2010! >>>

The mystery of the wrong TypeError

"ast.fix_missing_locations" is a helper function to assign missing position information the compiler needs when generating byte code. I end up using it and "ast.copy_location" a lot, which copies the location information from one node to another.

What's the bug with the following?

try: raise TypeError("blah: %d" % "I said 'PyCon2010'!") except TypeError: pass

The code correctly raises a TypeError, but it's the wrong TypeError. I've made this mistake a few times, which is why I try to remember to check the contents of the exception during my tests. Notice that unittest doesn't help here, since assertRaises only checks the exception type, and not the content.

It is possible to check all of these manually. You could defer the calculation to a "check_mod()" function

try: raise TypeError( check_mod("blah: %d", "I said 'PyCon2010'!")) except TypeError: pass

def check_mod(left, right): try: return left % right except Exception, err: print "Could not interpolate: %s" % (err,) traceback.print_stack() raise

Rewriting the AST for fun (and profit?)

A check_mod function might look like

The ast module has some support code for creating a new parse tree based on transforming another parse tree. I can transform all "%" binary operations to call a function for the left and right sides. Here's a non-working but mostly complete version of how that might look.

from ast import * class RewriteInterpolation(NodeTransformer): def visit_BinOp(self, node): if isinstance(node.op, Mod): new_node = Call(func=Name(id='check_string', ctx=Load()), args=[node.left, node.right, Num(n=node.lineno), Num(n=node.col_offset)], keywords = [], starargs=None, kwargs=None ) copy_location(new_node, node) fix_missing_locations(new_node) return new_node return node code = open(filename).read() tree = parse(code, filename) tree = RewriteInterpolation().visit(tree)

What's missing is the code to define or import check_string. I'll leave that for later. For now, just get the idea that you can parse Python code to an AST, rewrite it in order to instrument certain parts, then execute the result.

I ran something similar to this against the Python standard library and tests, in the hopes that I could find an bug. It took a lot of hands on fiddling, since some essential Python modules cannot be instrumented because the full path wasn't fully defined. I got it to work, and found no bugs. The closest was this code from difflib.py

try: linenum = '%d' % linenum id = ' id="%s%s"' % (self._prefix[side],linenum) except TypeError: # handle blank lines where linenum is '>' or '' id = ''

(Well-known) Limitations in coverage.py

where the comment is needed because otherwise the reason isn't immediately obvious. While not a bug, it perhaps does show you that the test revealed something. (Oh, and it also showed the several hundred tests that the standard library does to test string interpolation failures.)

Take a look at this program. I've used coverage.py to run the program and annotate the code to display the coverage and highlight the lines which weren't executed.

You can see there are a other few problems which coverage did not test. Line 9 never executes "x=9" and the "raise TypeError" in line 17 is never reached, because of the string interpolation error in the parameter list. I've hacked together something over the last 30 hours to show that something better is possible.

A different approach

x = 1

from ast_report import register_module ast_enter, ast_leave, ast_reached = \ register_module('spam/testing.py', {0: (1, 0)}, {} ) # unique identifer -> (lineno, col_offset) if 1: ast_enter[0] += 1 x = 3 ast_leave[0] += 1

I want to generate coverage for this statement:I'll do that by parsing the code into the AST then rewriting the AST so it's equivalent to:

The "register_module" function is something I'll write in a bit. It will take a filename and two location dictionaries. The first dictionary is for statements like assignment which are supposed to go to completion. That is, code before it and after it are supposed to execute. (Compare this to 'return', which will never allow code after it to run. That's what the second dictionary is used for.)

The key is a unique identifier associated with each statement with a coverage test, and the value is the lineno and col_offset pair which come from the AST.

The ast_enter and ast_leave dictionaries here are default dicts (though that's an implementation point). The "0" is same unique identifier in the location dictionary, and can be used to say that the statment at line 1, column 1 (col_offset starts with 0), was reached and left.

At this point someone in the audience astutely asked why I used an "if 1:" in the above. That's a limitation of how the ast.NodeTransformer works. It lets derived classes tranform a single term to a single other term, which means I need to transform a single statement (the assignment here) into a single other statement, and not three statements. I chose the "if 1:" because it's easy to write, it can contain an arbitrary number of sub-statements, and because Python's byte compiler knows how to optimize away the "if 1:" test.

If you think about this approach, the run-time overhead is pretty low, but it's a lot more than simple assignment. I don't know how it affects real-world code. Remember, it's been 30 hours since I started this, and I'm at conference as well.

Instrumenting the AST for code coverage

The next step is to automate all of this: convert a .py file into an AST, instrument the code to add these coverage checks, implement the reporting mechanism as an atexit hook, and for good measure, add the "%" TypeError check. To see if this is effective, convert the AST to byte code and save it to a .pyc file.

This calls for a horrible hack around a call to compileall.py. I've named the result "ast_compileall.py"

# ast_compileall.py import __builtin__ from ast import * import compileall import sys import traceback import itertools class RewriteInterpolation(NodeTransformer): def __init__(self, filename): self.filename = filename self.enter_linenos = {} # id -> (lineno, col_offset) self.reach_linenos = {} # id -> (lineno, col_offset) self.counter = itertools.count() def visit_Module(self, module_node): # Need to import and call ast_report.register_module(). # These must occur after the "from __future__ import ..." statements. # Find where I can insert them. body_future = [] body_rest = [] for node in module_node.body: node = self.visit(node) if (not body_rest and isinstance(node, ImportFrom) and node.module == "__future__"): body_future.append(node) else: body_rest.append(node) # It's easier to let Python convert the code to an AST import_line = parse("from ast_report import register_module, check_string").body[0] print ("ast_enter, ast_leave, ast_reached = register_module(%r, %r, %r)" % (self.filename, self.enter_linenos, self.reach_linenos)) register_line = parse( "ast_enter, ast_leave, ast_reached = register_module(%r, %r, %r)" % (self.filename, self.enter_linenos, self.reach_linenos)).body[0] # Assign a reasonable seeming line number. lineno = 1 if body_future: lineno = body_future[0].lineno for new_node in (import_line, register_line): new_node.col_offset = 1 new_node.lineno = lineno new_body = body_future + [import_line, register_line] + body_rest return Module(body=new_body) # These are statements which should have an enter and leave # (In retrospect, this isn't always true, eg, for 'if') def track_enter_leave_lineno(self, node): node = self.generic_visit(node) id = next(self.counter) enter = parse("ast_enter[%d] += 1" % id).body[0] leave = parse("ast_leave[%d] += 1" % id).body[0] self.enter_linenos[id] = (node.lineno, node.col_offset) for new_node in (enter, leave): copy_location(new_node, node) # This is the code for "if 1: ..." n = Num(n=1) copy_location(n, node) if_node = If(test=n, body=[enter, node, leave], orelse=[]) copy_location(if_node, node) return if_node visit_FunctionDef = track_enter_leave_lineno visit_ClassDef = track_enter_leave_lineno visit_Assign = track_enter_leave_lineno visit_AugAssign = track_enter_leave_lineno visit_Delete = track_enter_leave_lineno visit_Print = track_enter_leave_lineno visit_For = track_enter_leave_lineno visit_While = track_enter_leave_lineno visit_If = track_enter_leave_lineno visit_With = track_enter_leave_lineno visit_TryExcept = track_enter_leave_lineno visit_TryFinally = track_enter_leave_lineno visit_Assert = track_enter_leave_lineno visit_Import = track_enter_leave_lineno visit_ImportFrom = track_enter_leave_lineno visit_Exec = track_enter_leave_lineno #Global visit_Expr = track_enter_leave_lineno visit_Pass = track_enter_leave_lineno # These statements can be reached, but they change # control flow and are never exited. def track_reached_lineno(self, node): node = self.generic_visit(node) id = next(self.counter) reach = parse("ast_reached[%d] += 1" % id).body[0] self.reach_linenos[id] = (node.lineno, node.col_offset) copy_location(reach, node) n = Num(n=1) copy_location(n, node) if_node = If(test=n, body=[reach, node], orelse=[]) copy_location(if_node, node) return if_node visit_Return = track_reached_lineno visit_Raise = track_reached_lineno visit_Break = track_reached_lineno visit_Continue = track_reached_lineno # Some code to instrument the run-time and check for '%' failures. def visit_BinOp(self, node): if isinstance(node.op, Mod): new_node = Call(func=Name(id='check_string', ctx=Load()), args=[node.left, node.right, Num(n=node.lineno), Num(n=node.col_offset)], keywords = [], starargs=None, kwargs=None ) copy_location(new_node, node) fix_missing_locations(new_node) return new_node return node old_compile = __builtin__.compile def compile(source, filename, mode, flags=0): # skipping a few parameters # My rewrite code uses ast.parse, which ends up calling this # function with this argument, so pass it back to the real compile. if flags == PyCF_ONLY_AST: return old_compile(source, filename, mode, flags) assert mode == "exec" #traceback.print_stack() code = open(filename).read() tree = parse(code, filename) tree = RewriteInterpolation(filename).visit(tree) code = old_compile(tree, filename, "exec") return code # Ugly hack so I can force compileall to use my compile function. __builtin__.compile = compile exit_status = int(not compileall.main()) sys.exit(exit_status)

def main(): def f(x): if x > 0: return x*x 1/0 for i in range(4, 9): if f(i) < 0: x=9 if i == 8: continue print "Here" if i == 10: continue try: raise TypeError("Hi! %d" % "sdfa") except TypeError: pass main()

python ast_compileall.py spam

# ast_report.py from collections import defaultdict import traceback import atexit import linecache loaded_modules = [] class FileInfo(object): def __init__(self, filename, enter_linenos, reach_linenos): self.filename = filename self.enter_linenos = enter_linenos self.reach_linenos = reach_linenos self.ast_enter = defaultdict(int) self.ast_leave = defaultdict(int) self.ast_reach = defaultdict(int) def register_module(filename, enter_linenos, reach_linenos): #print filename, enter_linenos, reach_linenos info = FileInfo(filename, enter_linenos, reach_linenos) loaded_modules.append(info) return info.ast_enter, info.ast_leave, info.ast_reach def check_string(left, right, lineno, col_offset): if not isinstance(left, basestring): return left % right try: return left % right except Exception, err: print "Could not interpolate: %s" % (err,) traceback.print_stack() raise # Basic coverage report def report_coverage(): for fileinfo in loaded_modules: # This will contain a list of all results as a 3-ple of # lineno, col_offset, "text message" report = [] # These should have both 'enter' and 'leave' counts. for id, (lineno, col_offset) in fileinfo.enter_linenos.items(): if id not in fileinfo.ast_enter: report.append( (lineno, col_offset, "not entered") ) elif id not in fileinfo.ast_leave: report.append( (lineno, col_offset, "enter %d but never left" % fileinfo.ast_enter[id]) ) else: delta = fileinfo.ast_leave[id] - fileinfo.ast_enter[id] report.append( (lineno, col_offset, "enter %d leave %d (diff %d)" % (fileinfo.ast_enter[id], fileinfo.ast_leave[id], delta)) ) # These only need to be 'reach'ed for id, (lineno, col_offset) in fileinfo.reach_linenos.items(): if id not in fileinfo.ast_reach: report.append( (lineno, col_offset, "not reached") ) else: report.append( (lineno, col_offset, "reach %d" % (fileinfo.ast_reach[id],)) ) # sort by line number, breaking ties by column offset report.sort() print "Coverage results for file", fileinfo.filename for lineno, col_offset, msg in report: print "%d:%d %s" % (lineno, col_offset+1, msg) print linecache.getline(fileinfo.filename, lineno).rstrip() # Dump the coverage results when Python exist. atexit.register(report_coverage)

Try it out!

I placed this file in "spam/testing.py"I then compiled all of the .py files in the 'spam' directory withand I made sure the following was on my PYTHONPATH as "ast_report.py"(While I used an atexit hook here, I did that because it was the fastest way to get to a proof-of-concept solution. Really I think this should be more like how coverage.py works, with a command-line script which sets up the run environment and reports the results at the end.)

This coverage code will only work on modules which were imported, where the .pyc file is used instead of the .py file. (But perhaps an import hook would be useful or at least interesting here?) What I do is import the module via the command-line

% cd spam/ % python -c 'import testing' Could not interpolate: %d format: a number is required, not str File "<string>", line 1, in <module> File "spam/testing.py", line 21, in <module> main() File "spam/testing.py", line 785, in main File "ast_report.py", line 30, in check_string traceback.print_stack() Coverage results for file spam/testing.py 1:1 enter 1 leave 1 (diff 0) def main(): 3:3 enter 1 leave 1 (diff 0) def f(x): 4:5 enter 5 but never left if x > 0: 5:7 reach 5 return x*x 6:5 not entered 1/0 8:3 enter 1 leave 1 (diff 0) for i in range(4, 9): 9:5 enter 5 leave 5 (diff 0) if f(i) < 0: x=9 9:18 not entered if f(i) < 0: x=9 10:5 enter 5 leave 4 (diff -1) if i == 8: 11:8 reach 1 continue 12:8 not entered print "Here" 13:5 enter 4 leave 4 (diff 0) if i == 10: 14:8 not reached continue 16:3 enter 1 leave 1 (diff 0) try: 17:7 reach 1 raise TypeError("Hi! %d" % "sdfa") 19:7 enter 1 leave 1 (diff 0) pass 21:1 enter 1 leave 1 (diff 0) main()

You can see that it reports the string interpolation without a problem, and if you look closely you'll see that it catches that the "if" on line 9 is executed while the "x=9" also on line 9 is never executed.

There's also some problems. Line 4 reports that the code was entered 5 times and never left, but that's a bit of a false positive since it left through a return statement. I think now, after additional thought, that the better solution is to put the "leave" test on the first line of each possible branch.

Pluses and minuses

There are some great advantages to this approach.

I don't need to look at the stack frame to figure out where I am, or even use the sys.settrace() hook.

I get coverage testing of every statement on a line.

I can instrument a specific and limited set of Python files

Full branch coverage is possible.

I can add tests which are almost impossible to add otherwise (like "%d" % "asdf"; or what about checking if the RHS of an assert will actually work?)

What about instrumenting all "d.keys()" calls in Python 2.x code to check and report if a dict keys() result is ever used as something other than the iterator, like it would be in Python 3.x?

Some very complex things are possible. Some very evil things are also possible.

There are some difficult problems as well. Consider:

x = arg or default_arg or die(_("missing arg"))

Branch reporting should say that 'arg' tested both True and False, that default_arg tested True and False and ... that the result of die() tested both True and False?

And just how should someone visualize all this extra data?

"See also" and ruminations

I talked with Ned some after my presentation. He pointed out that the complex part of coverage.py, which he's worked on a lot during the last year, is to make the system configurable so it can be told which coverage to ignore. I know what he means. In the late 1990s I added the "#pragma: no cover" option to the early form of coverage.py, which exists (although not my actual code) to this day.

If coverage works on a more fine-grained level, how do you suppress the false warnings so the true issues aren't hidden in the noise?

Ned also pointed out Matthew J. Desmarais' work with Canopy

instrument python code to generate robust coverage information. the goal is to provide modified condition/decision coverage metrics.

I'm not the only one who has thought about instrumenting the AST, even in Python. (The Lisp community likely thought of this before I was born.) What I've hoped to do here is explain it well enough so that you can figure out how this approach works and come up with ways to extend it for the future.... or figure out why it fails.

If you are doing that, do bear in mind my python4ply package. It contains a full grammar definition for Python using PLY, with support for the decrepit AST from the compiler module. Potentially you could use it to have Python 3 generate an AST for Python 2, or even vice versa, with a lot more work.

Or, if you have both money and interest, perhaps you'll fund me? I am a consultant, after all. I mostly work in computational chemistry and my clients aren't interested in this sort of deep language analysis, so I only work on this during rare intervals. It's not only money, but access to people who want these sorts of capabilities and can give me feedback on what they want and how effective a solution is.

Or, if you want to work on it yourself - feel free! I hereby release all of this code to the public domain, and disavow any copyright interest in the code expressed in this article. You don't even have to mention my name. Just develop good testing tools.

I know there are a number of tools in the greater world of computing which can work on ASTs. I have no experience with them. Perhaps it's best to convert the Python AST to some other tree grammar where there is a tree manipulation language? When I'm feeling crazy I think "just convert the AST to XML then use XSLT to add the instrumentation, and convert the resulting XML back to an AST." How sane is that? And it would mean I would have to learn a lot more about XSLT. Or what about ANTLR's tree grammars? But then there's Manual Tree Walking Is Better Than Tree Grammars. It's a Brave New World.

Thanks!

I thank Armin Rigo, Brett Cannon, Grant Edwards, John Ehresman, Jeremy Hylton, Kurt Kaiser, Neal Norwitz, Neil Schemenauer, Nick Coghlan, Tim Peters, Martin von Löwis and everyone else who worked on the ast module. Without them this would be a much harder problem.

Any comments?

Leave them here.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me