Starting with Python 2.5, the Python compiler (the part that takes your source code and translates it into bytecode for the Python VM to execute) works as follows:

1. Parse source code into a parse tree (Parser/pgen.c)
2. Transform the parse tree into an Abstract Syntax Tree (Python/ast.c)
3. Transform the AST into a Control Flow Graph (Python/compile.c)
4. Emit bytecode based on the Control Flow Graph (Python/compile.c)
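To make these stages a bit more tangible, here is a minimal sketch of how they surface in Python code. It assumes a current Python 3 interpreter rather than the 2.5/2.6 versions discussed in this post, and the file name "<example>" is just a placeholder. Only the AST and the final code object are visible from Python; the parse tree and the Control Flow Graph stay internal to the compiler.

```python
import ast
import dis

source = "x = 6 + 8"

# Source -> (parse tree) -> AST: both steps are performed by ast.parse.
tree = ast.parse(source, filename="<example>", mode="exec")

# AST -> (control flow graph) -> bytecode: both steps happen inside compile().
# The control flow graph itself is not exposed to Python code.
code = compile(tree, "<example>", "exec")

print(type(tree).__name__)   # class of the root AST node
print(type(code).__name__)   # class of the compiled object
dis.dis(code)                # human-readable listing of the emitted bytecode

# Run the bytecode and look at the result.
namespace = {}
exec(code, namespace)
print(namespace["x"])
```

The exact bytecode printed by dis differs between Python versions, so treat the listing as illustrative only.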

Previously, the only place one could tap into the compilation process was to obtain the parse tree with the parser module. But parse trees are much less convenient to use than ASTs for code transformation and generation. This is why the addition of the _ast module in Python 2.5 was welcome - it became much simpler to play with ASTs created by Python and even modify them. Also, Python's built-in compile function can now accept an AST object in addition to source code.

Python 2.6 then took another step forward, including the higher-level ast module in its standard library. ast is a convenient toolbox, written in Python, that helps when working with _ast. All in all, we now have a very convenient framework for processing Python source code. A full Python-to-AST parser is included with the standard distribution - what more could we ask? This makes all kinds of language transformation tasks with Python very simple.

What follows are a few examples of cool things that can be done with the new _ast and ast modules.

Manually building ASTs

    import ast

    node = ast.Expression(ast.BinOp(
                    ast.Str('xy'),
                    ast.Mult(),
                    ast.Num(3)))

    fixed = ast.fix_missing_locations(node)

    codeobj = compile(fixed, '<string>', 'eval')
    print eval(codeobj)

Let's see what is going on here. First we manually create an AST node, using the AST node classes exported by ast. Then the convenient fix_missing_locations function is called to patch the lineno and col_offset attributes of the node and its children.

Another useful function that can help is ast.dump. Here's a formatted dump of the node we've created:

    Expression(
      body=BinOp(
             left=Str(s='xy'),
             op=Mult(),
             right=Num(n=3)))

The most useful single-place reference for the various AST nodes and their structure is Parser/Python.asdl in the source distribution.
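For readers on a recent interpreter, here is a sketch of the same manual construction. It assumes Python 3.8 or later, where ast.Constant takes the place of ast.Str and ast.Num, and it passes the fields by keyword to make the node structure explicit. This is an adaptation for newer versions, not the code the post was written against.

```python
import ast

# Build the expression 'xy' * 3 by hand.
# Assumption: Python 3.8+, where ast.Constant replaces ast.Str / ast.Num.
node = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value="xy"),
        op=ast.Mult(),
        right=ast.Constant(value=3),
    )
)

# Fill in the lineno / col_offset attributes that compile() expects.
fixed = ast.fix_missing_locations(node)

codeobj = compile(fixed, "<string>", "eval")
result = eval(codeobj)
print(result)
```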

Breaking compilation into pieces

Given some source code, we first parse it into an AST, and then compile this AST into a code object that can be evaluated:

    import ast

    source = '6 + 8'
    node = ast.parse(source, mode='eval')

    print eval(compile(node, '<string>', mode='eval'))

Again, ast.dump can be helpful to show the AST that was created:

    Expression(
      body=BinOp(
             left=Num(n=6),
             op=Add(),
             right=Num(n=8)))
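One thing the split buys us is a chance to look at the tree before anything is compiled. The sketch below assumes Python 3.8 or later; the function name eval_arithmetic and the ALLOWED_NODES whitelist are hypothetical names made up for this illustration. It parses an expression, rejects any node type that is not plain arithmetic, and only then compiles and evaluates.

```python
import ast

# Hypothetical whitelist: the only node types we accept in an expression.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)


def eval_arithmetic(source):
    """Parse, check every node against the whitelist, then compile and evaluate."""
    tree = ast.parse(source, mode="eval")
    for child in ast.walk(tree):
        if not isinstance(child, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(child).__name__)
        if isinstance(child, ast.Constant) and not isinstance(child.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
    return eval(compile(tree, "<string>", mode="eval"))


print(eval_arithmetic("6 + 8"))
```

An expression such as "__import__('os')" contains a Call node and is rejected with ValueError before compile is ever reached. This is only meant to show the inspection step between parsing and compiling, not to serve as a hardened sandbox.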

Simple visiting and transformation of ASTs

    import ast

    class MyVisitor(ast.NodeVisitor):
        def visit_Str(self, node):
            print 'Found string "%s"' % node.s

    class MyTransformer(ast.NodeTransformer):
        def visit_Str(self, node):
            return ast.Str('str: ' + node.s)

    node = ast.parse('''
    favs = ['berry', 'apple']
    name = 'peter'

    for item in favs:
        print '%s likes %s' % (name, item)
    ''')

    MyTransformer().visit(node)
    MyVisitor().visit(node)

This prints:

    Found string "str: berry"
    Found string "str: apple"
    Found string "str: peter"
    Found string "str: %s likes %s"

The visitor class implements methods that are called for relevant AST nodes (for example, visit_Str is called for Str nodes). The transformer is a bit more complex. It calls relevant methods for AST nodes and then replaces them with the returned value of the methods.

To prove that the transformed code is perfectly valid, we can just compile and execute it:

    node = ast.fix_missing_locations(node)
    exec compile(node, '<string>', 'exec')

As expected, this prints:

    str: str: peter likes str: berry
    str: str: peter likes str: apple
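Here is a sketch of the same visitor and transformer pair adapted for Python 3.8 or later, which is an assumption on my part since the code above targets 2.6. On those versions string literals arrive as ast.Constant nodes, so the hook is visit_Constant plus a type check. The class names StringCollector and StringPrefixer are made up for this example.

```python
import ast


class StringCollector(ast.NodeVisitor):
    """Records every string constant it visits."""

    def __init__(self):
        self.found = []

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            self.found.append(node.value)


class StringPrefixer(ast.NodeTransformer):
    """Replaces every string constant with a prefixed copy."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            new_node = ast.Constant(value="str: " + node.value)
            # Keep the original position so compile() accepts the new node.
            return ast.copy_location(new_node, node)
        return node


source = """
favs = ['berry', 'apple']
name = 'peter'

for item in favs:
    print('%s likes %s' % (name, item))
"""

tree = ast.parse(source)
tree = StringPrefixer().visit(tree)
ast.fix_missing_locations(tree)

collector = StringCollector()
collector.visit(tree)
print(collector.found)

# The transformed tree is still valid code, so it can be compiled and run.
exec(compile(tree, "<string>", "exec"))
```

Collecting the strings into a list instead of printing them directly is just a design choice that makes the visitor easier to reuse and check.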

Reproducing Python source from AST nodes

Armin Ronacher wrote a module named codegen that uses the facilities of ast to print back Python source from an AST. Here's how to show the source for the node we transformed in the previous example:

    import codegen
    print codegen.to_source(node)

And the result:

    favs = ['str: berry', 'str: apple']
    name = 'str: peter'
    for item in favs:
        print 'str: %s likes %s' % (name, item)

Yep, looks right. codegen is very useful for debugging or tools that transform Python code and want to save the results. Unfortunately, the version you get from Armin's website isn't suitable for the ast that made it into the standard library. A slightly patched version of codegen that works with the standard 2.6 library can be downloaded here.
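On Python 3.9 and later the standard library offers ast.unparse, which does a similar job without a third-party module. The sketch below assumes such an interpreter and uses a small hypothetical StringPrefixer transformer of its own, so it does not depend on codegen or on the earlier examples.

```python
import ast


class StringPrefixer(ast.NodeTransformer):
    """Hypothetical transformer: prefixes every string constant with 'str: '."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value="str: " + node.value), node)
        return node


tree = ast.parse("favs = ['berry', 'apple']\nname = 'peter'\n")
tree = ast.fix_missing_locations(StringPrefixer().visit(tree))

# Turn the transformed tree back into source text (Python 3.9+ only).
regenerated = ast.unparse(tree)
print(regenerated)

# The regenerated text is ordinary source code and can be compiled again.
namespace = {}
exec(compile(regenerated, "<regenerated>", "exec"), namespace)
print(namespace["name"])
```

Note that the regenerated source reflects the tree, not the original text: comments and the original formatting are not preserved.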