This is the second part of the Protecting a Python codebase series. This time we will be playing with Python Abstract Syntax Trees in order to protect the original code of a Python based project.

Python Abstract Syntax Trees

As the Python Documentation says:

The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.

The built-in module provides the tools to navigate, create or modify a tree. A Python AST is tree-like representation of the python code that’s closer to the bytecode implementation than the source code itself.

Take for instance the code in a little python module named hello.py :

1 2 def hello_world ( name = 'World' ): print ( 'Hello {0}' . format ( name ))

We can get the tree representation using the ast module:

1 2 3 4 5 import ast with open ( 'hello.py' ) as f : code = '' . join ( f . readlines ()) root = ast . parse ( code )

We can also write a node visitor to inspect the tree structure:

1 2 3 4 5 6 7 8 9 10 11 import ast class Visitor ( ast . NodeVisitor ): def __init__ ( self ): self . stack = 0 def visit ( self , node ): print " " * self . stack , node self . stack += 1 super ( Visitor , self ). visit ( node ) self . stack -= 1

Applying the visitor to the parsed code will result in something like:

>>> Visitor().visit(node) <_ast.Module object at 0x7fb5200282d0> <_ast.FunctionDef object at 0x7fb5200281d0> <_ast.arguments object at 0x7fb520028210> <_ast.Name object at 0x7fb520028d90> <_ast.Param object at 0x7fb521c46810> <_ast.Str object at 0x7fb520028550> <_ast.Print object at 0x7fb520028dd0> <_ast.Call object at 0x7fb520028ed0> <_ast.Attribute object at 0x7fb520028f10> <_ast.Str object at 0x7fb520028f50> <_ast.Load object at 0x7fb521c46590> <_ast.Name object at 0x7fb520028f90> <_ast.Load object at 0x7fb521c46590>

Each element has its own set of attributes to fulfill the operation, Module objects have a body attribute which is a list with the different elements defined in it. Call have the function name to call (in the example above it refers to the format method in String ). There’s a _field property with the list of attributes supported by the different types.

Python Abstract Syntax Trees are a really powerful tool, we can build really neat tools with them, tools to gather code statistics, tools to translate Python to other languages. We will use it to write a code scrambling and unscrambling tool, that combined with an import hook will serve as a protection mechanism.

For that we need a NodeTransformer class:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 from ast import NodeTransformer class Scrambler ( NodeTransformer ): HEADER = '# scrambled' def __init__ ( self , scramble = True ): self . do_scramble = scramble def visit ( self , node ): node_out = super ( Scrambler , self ). visit ( node ) if hasattr ( node_out , 'body' ) and isinstance ( node_out . body , list ): if self . do_scramble : node_out . body = self . scramble ( node_out . body ) else : node_out . body = self . unscramble ( node_out . body ) return node_out def scramble ( self , items ): return self . _step2 ( self . _step1 ( items [:])) def unscramble ( self , items ): return self . _step1 ( self . _step2 ( items [:])) def _step1 ( self , items ): i = 0 length = len ( items ) while ( i + 1 ) < length : items [ i ], items [ i + 1 ] = items [ i + 1 ], items [ i ] i += 2 return items def _step2 ( self , items ): length = len ( items ) if length % 2 == 0 : items [: length / 2 ], items [ length / 2 :] = \ items [ length / 2 :], items [: length / 2 ] else : items [:( length - 1 ) / 2 ], items [( length + 1 ) / 2 :] = \ items [( length + 1 ) / 2 :], items [:( length - 1 ) / 2 ] return items

This code will take a Python AST, and scramble it by swapping them items in those with a body attribute that’s of type list . The transformation applied will be similar to:

>>> original = [1, 2, 3, 4, 5, 6] >>> after_step_1 = step1(original) [2, 1, 4, 3, 6, 5] >>> after_step_2 = step2(after_step_1) [3, 6, 5, 2, 1, 4]

Which is easy to transform back, it’s enough to apply the transformations in the reverse order to get the original object.

If we apply it to the following code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 import os import pprint def files_tree ( path = None ): out = {} path = path or os . getcwd () files = os . listdir ( path ) for file in files : if file . startswith ( '.' ): continue sub_path = os . path . join ( path , file ) if os . path . isdir ( sub_path ): value = files_tree ( sub_path ) else : value = sub_path out [ file ] = value return out tree = files_tree () print 'Items in the top level: {0}' . format ( len ( tree . keys ())) print 'Items in the second level: {0}' . format ( sum ( len ( val . keys ()) for val in tree . values () if isinstance ( val , dict )) ) pprint . pprint ( tree )

It will come out as:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 # scrambled print 'Items in the second level: {0}' . format ( sum (( len ( val . keys ()) for val in tree . values () if isinstance ( val , dict )))) print 'Items in the top level: {0}' . format ( len ( tree . keys ())) pprint . pprint ( tree ) def files_tree ( path = None ): files = os . listdir ( path ) return out for file in files : out [ file ] = value if os . path . isdir ( sub_path ): value = files_tree ( sub_path ) else : value = sub_path sub_path = os . path . join ( path , file ) if file . startswith ( '.' ): continue path = ( path or os . getcwd ()) out = {} import pprint import os tree = files_tree ()

And this is just a simple transformation, more changes can be implemented in order to make things a bit more interesting, some options to take into account are:

Change parameters order in functions

Change decorators order in functions

Change arguments order in function calls

Change arguments order in class creations

Rename variable names

Encode strings with rot13 or similar

This scrambler with the corresponding import hook are published in the companion examples repository.

Conclusion

Python Abstract Syntax Trees is a very versatile tool that allows us to create a solution that better fits the project where it’s used.

But, as it’s mentioned in the previous article, the code can be inspected from a shell once it was imported.

If you plan digging more in Python AST, I recommend Green Tree Snakes - the missing Python AST docs docs.