Python Source Obfuscation using ASTs

Introduction

For one of the challenges of the Hack in the Box Capture the Flag
game last week, I decided to release an obfuscated and compiled Python class.
After doing some research on the internet about this particular topic, it
appeared that there is no real up-to-date tool for this. I mostly found paid
software and/or software that has not been updated for several years, or at
least looks that way.

Well, that’s great. This allows me to make something new-ish.

The Actual Challenge

The challenge itself was rather easy: given a team name and a flag on the
command line, the Python script verifies whether the flag is correct. With a
few well-placed print statements, the challenge can be solved within minutes,
which is why we obfuscate it! We obviously want the teams playing our CTF to
get some headaches.

Abstract Syntax Trees

According to Wikipedia, an Abstract Syntax Tree is a tree representation of
the abstract syntactic structure of source code written in a programming
language. In other words, an AST represents the original source code as a
tree.

Fortunately for us, Python provides a built-in ast module which is able to
parse Python source into an AST (actually using the built-in compile()
function). Besides that, the ast module gives us access to all available AST
node types (e.g., Call, BinOp, etc.).
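To make the parsing step concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter rather than the Python 2 used elsewhere in this post, and the one-line source string is just an example.

```python
import ast

source = "print('Hello AST')"

# ast.parse() is a thin wrapper around compile() with the PyCF_ONLY_AST flag.
tree = ast.parse(source)
same_tree = compile(source, "<example>", "exec", ast.PyCF_ONLY_AST)

# Every node in the tree is an instance of a class from the ast module.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# ast.dump() gives a readable text form of the whole tree.
print(ast.dump(tree))
```

Dumping the tree like this is a handy way to find out which node types and fields a given piece of source produces before writing a transformer for it.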

With this knowledge, we’re able to rewrite the original AST using the
ast.NodeTransformer class. For a brief example of what this looks like,
please refer to the official documentation.

Finally, after rewriting the AST, we can do two things. We can generate a
compiled Python object directly from the AST. Unfortunately I did not find a
way to do this (or a library, for that matter); if you know of one, feel free
to point me to it. The other option is to generate Python code from the AST
again and to compile it from there. For this step, one would use the
codegen.py module. (Note that I submitted a pull request, as the current
version gave me an error with regards to the omission of parentheses for
binary operations.)
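For what it's worth, the built-in compile() function also accepts an AST object in place of source text, so the first option can be sketched as follows. The second half of the sketch assumes Python 3.9 or later, where ast.unparse() is available in the standard library as an alternative to codegen.py; it is not what was used for the challenge.

```python
import ast

source = "result = 6 * 7"
tree = ast.parse(source)

# Nodes created by a transformer may lack line numbers, which compile()
# requires; fix_missing_locations() fills them in.
ast.fix_missing_locations(tree)

# Option 1: compile the AST straight into a code object and run it.
code = compile(tree, "<ast>", "exec")
namespace = {}
exec(code, namespace)

# Option 2 (assumes Python 3.9+): turn the AST back into source text.
regenerated = ast.unparse(tree)
print(regenerated)
```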

We’re now at the point where we can parse Python code into an AST, rewrite it,
and write a new Python source from the rewritten AST. The final step for my
challenge was to compile the created Python source into an object, which can
be done by executing the following command on the command line. (I’m sure it’s
also possible using a Python function, but this works just fine for the
moment.)

$ python -mcompileall .
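The same step can be done from within Python using the standard py_compile module (compileall.compile_dir() is the counterpart for a whole directory). A small sketch, assuming Python 3; the file name challenge.py and the temporary directory are made up for the example.

```python
import os
import py_compile
import tempfile

# Write a throwaway module to a temporary directory (hypothetical file name).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "challenge.py")
with open(source_path, "w") as handle:
    handle.write("print('Hello AST')\n")

# py_compile.compile() returns the path of the byte-compiled file it wrote.
# doraise=True turns compile errors into exceptions instead of printing them.
compiled_path = py_compile.compile(
    source_path, cfile=source_path + "c", doraise=True
)
print(compiled_path)
```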

Decompilation

Before diving into the obfuscations I performed on the AST, I’d like to note
that the compiled Python object can be decompiled using Mysterie’s Python
decompiler, uncompyle2.

Obfuscation through ASTs

So basically I only did a few simple obfuscations, which already proved to be
painful enough, but it’s a nice start for anyone who’s looking into doing
something similar.

The ast.NodeTransformer class, which was mentioned earlier in this blogpost,
allows one to visit each AST node in the tree, with the possibility to
modify or delete them. We can do this by implementing visit_ functions. For
example, in order to analyze/modify/delete certain Name nodes, which are used
for variable lookups etc., we implement a visit_Name function in our
obfuscation class (which, btw, extends ast.NodeTransformer, itself a subclass
of ast.NodeVisitor).
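As an illustration of a visit_Name function, the sketch below renames variables according to a lookup table. It assumes Python 3 and uses ast.NodeTransformer (a subclass of ast.NodeVisitor), because the return value of a visit_ method only replaces the node in a transformer. The class name NameRenamer and the mapping are hypothetical, not taken from the challenge's obfuscator.

```python
import ast


class NameRenamer(ast.NodeTransformer):
    """Rename selected variables; `mapping` is a hypothetical old -> new table."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Called once for every Name node, i.e. every variable read or write.
        new_id = self.mapping.get(node.id, node.id)
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


tree = ast.parse("flag = 'secret'\ncopy = flag")
tree = NameRenamer({"flag": "O0O0"}).visit(tree)
ast.fix_missing_locations(tree)

# Run the rewritten tree to check that it still behaves the same.
namespace = {}
exec(compile(tree, "<renamed>", "exec"), namespace)
```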

Modifying AST nodes using the NodeTransformer

The NodeTransformer can modify an existing AST in a fairly simple way. By
returning the original node, the AST remains untouched, as shown in the
following example code.

from ast import NodeTransformer

class Example01(NodeTransformer):
    def visit_Str(self, node):
        # Returning the node unchanged leaves the AST as it was.
        return node

One can modify an AST node by returning a new node. For example, to replace
all strings with an empty string, see the following snippet.

from ast import NodeTransformer, Str

class Example02(NodeTransformer):
    def visit_Str(self, node):
        # Replace every string constant with an empty string.
        return Str(s='')

And, finally, to delete a node, simply return None in the visit_ function,
although this can lead to weird situations in which the new AST is no longer
valid.
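The snippets above target the Python 2 era ast module. On Python 3.8 and later, string literals are parsed as ast.Constant nodes and Str is deprecated, so a present-day version of the replace and delete cases might look like the sketch below. The class names and the sample source are made up for the example.

```python
import ast


class BlankStrings(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Python 3.8+ parses every literal as Constant; only touch strings.
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value=""), node)
        return node


class DropAsserts(ast.NodeTransformer):
    def visit_Assert(self, node):
        # Returning None removes the statement from its parent's body.
        return None


tree = ast.parse("greeting = 'hi'\nassert greeting == 'nope'\ncount = 3")
tree = DropAsserts().visit(tree)
tree = BlankStrings().visit(tree)
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<example>", "exec"), namespace)
```

Deleting a statement is safe here because the module body still has other statements left. Removing the only statement of a block is one way to end up with a tree that no longer compiles.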

Example Obfuscation – Strings

In the AST, constant strings are represented by Str nodes. These Str nodes
have one interesting field, namely the s field, which contains the actual
string. For example, in the AST of the following Python source, there will be
exactly one Str node with the s field set to “Hello AST”.

print 'Hello AST'

For the challenge, I implemented a handful of simple string obfuscations. Take
for example the following code (rewritten a bit, but similar to the code in
the obfuscator).

from ast import NodeTransformer, BinOp, Str, Add

class StringObfuscator(NodeTransformer):
    def visit_Str(self, node):
        # Split the string in half and join the two halves with '+'.
        middle = len(node.s) // 2
        return BinOp(left=Str(s=node.s[:middle]),
                     op=Add(),
                     right=Str(s=node.s[middle:]))

Noteworthy in this example code is that BinOp is a node representing a
binary operation, in this case addition (because of the Add node). A binary
operation takes a left operand and a right one. On the left we put the first
half of the actual string, and on the right we put the second half of the
string. When running this “obfuscator” on our example once, we get the
following code. (Note that you can run such an obfuscator multiple times to
achieve extra painful code. This is what I did for the challenge :p)

print ('Hell' + 'o AST')
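For readers on a current interpreter, here is a rough Python 3.9+ equivalent of the string splitter, including the "run it several times" part. It is a sketch rather than the challenge's actual obfuscator: the obfuscate() helper and its rounds parameter are made up, it uses ast.unparse() instead of codegen.py, and it does not special-case things like f-strings or docstrings.

```python
import ast


class StringSplitter(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Leave non-strings and strings too short to split untouched.
        if not isinstance(node.value, str) or len(node.value) < 2:
            return node
        middle = len(node.value) // 2
        split = ast.BinOp(
            left=ast.Constant(value=node.value[:middle]),
            op=ast.Add(),
            right=ast.Constant(value=node.value[middle:]),
        )
        return ast.copy_location(split, node)


def obfuscate(source, rounds=1):
    """Apply the splitter `rounds` times (hypothetical helper)."""
    tree = ast.parse(source)
    for _ in range(rounds):
        # Freshly returned nodes are not revisited within a pass, so each
        # round splits every string exactly once.
        tree = ast.fix_missing_locations(StringSplitter().visit(tree))
    return ast.unparse(tree)


once = obfuscate("print('Hello AST')")
twice = obfuscate("print('Hello AST')", rounds=2)
print(once)
print(twice)
```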

Other string obfuscations included reversing a string, i.e., “abc” ->
“cba”[::-1], and converting single-character strings (which you’ll get soon
enough when recursively running the obfuscator a few times) into a chr()
call (i.e., “a” -> chr(0x61)).
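These two rewrites could be sketched along the following lines on Python 3.9+. The class name is hypothetical and this is not the code used for the challenge. One visible difference: the AST stores only the integer value, so the regenerated source reads chr(97) rather than chr(0x61).

```python
import ast


class StringReverser(ast.NodeTransformer):
    def visit_Constant(self, node):
        if not isinstance(node.value, str) or node.value == "":
            return node
        if len(node.value) == 1:
            # "a" -> chr(97)
            new = ast.Call(
                func=ast.Name(id="chr", ctx=ast.Load()),
                args=[ast.Constant(value=ord(node.value))],
                keywords=[],
            )
        else:
            # "abc" -> "cba"[::-1]
            new = ast.Subscript(
                value=ast.Constant(value=node.value[::-1]),
                slice=ast.Slice(lower=None, upper=None,
                                step=ast.Constant(value=-1)),
                ctx=ast.Load(),
            )
        return ast.copy_location(new, node)


tree = ast.parse("word = 'abc'\nletter = 'a'")
tree = ast.fix_missing_locations(StringReverser().visit(tree))

# Execute the rewritten tree to confirm the values are unchanged.
namespace = {}
exec(compile(tree, "<reversed>", "exec"), namespace)
print(ast.unparse(tree))
```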

The Obfuscated Challenge

After running the original challenge a few times through the obfuscator,
which, in addition to obfuscating strings, also obfuscates integers, import
statements, and global variable names, we get our actual challenge.
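This post does not go into how the integers were obfuscated, so the following is only one possible approach, assuming Python 3.9+: replace each integer literal n with the expression (n + k) - k. The class name and the offset are arbitrary choices for the example.

```python
import ast


class IntegerSplitter(ast.NodeTransformer):
    OFFSET = 1337  # arbitrary constant chosen for this example

    def visit_Constant(self, node):
        # bool is a subclass of int, so check the exact type to skip True/False.
        if type(node.value) is not int:
            return node
        new = ast.BinOp(
            left=ast.Constant(value=node.value + self.OFFSET),
            op=ast.Sub(),
            right=ast.Constant(value=self.OFFSET),
        )
        return ast.copy_location(new, node)


tree = ast.parse("answer = 42\nenabled = True")
tree = ast.fix_missing_locations(IntegerSplitter().visit(tree))

# The rewritten program still computes the same values.
namespace = {}
exec(compile(tree, "<ints>", "exec"), namespace)
obfuscated = ast.unparse(tree)
print(obfuscated)
```

As with the string splitter, running such a pass repeatedly nests the expressions further.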

And, yes, running the obfuscator several times does indeed look like the
following.

$ python hitbctfobf.py hitbctforig.py|python hitbctfobf.py -|...

Outro

Having pasted the original challenge in the blogpost, there’s not much left of
the challenge itself. However, I found the methods behind the obfuscation
fairly interesting, and perhaps so does somebody else.