March 14 2014

At Hacker School, I've been building an alternate universe Python by overwriting builtin functions and statements with Harry Potter spells. This is a thing you can do at Hacker School!

Although this project started as a joke, I've quickly descended so deeply into Python internals that I've, with the guidance of the fabulous Hacker School facilitator Allison Kaptur, made edits to the CPython source code, and compiled a Python to compile a Python. All to replace the import statement with accio .

But before we get into compiling the Harry Potter Python I lovingly call Nagini, let's first talk about some Python internals basics, with spells as examples, of course.

Overwriting Builtin Functions

Python builtin functions are stored in a module called __builtins__ that's automatically imported on startup.

>>> dir ( __builtins__ ) ['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs', 'all', 'any', 'apply', 'basestring', 'bin', 'bool', 'buffer', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'execfile', 'exit', 'file', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']

Overwriting Python builtins is surprisingly easy!

>>> wingardium_leviosa = __builtins__ . float >>> del __builtins__ . float >>> float ( 3 ) Traceback (most recent call last): File "<stdin>" , line 1 , in <module> NameError : name 'float' is not defined >>> wingardium_leviosa ( 3 ) 3.0

However, overwriting import is not so easy. Let's try:

>>> accio = import File "<stdin>" , line 1 accio = import ^ SyntaxError : invalid syntax

Python is expecting the name of a module after import , and thus it throws a SyntaxError . This is an effect of import x being a statement, rather than an expression.

Hm. I remember seeing the function __import__ listed when we ran dir(__builtins__) . Maybe we can overwrite that instead:

>>> accio = __builtins__ . __import__ >>> accio sys File "<stdin>" , line 1 accio sys ^ SyntaxError : invalid syntax # :(

What if we tried calling accio like a function?

>>> accio ( sys ) Traceback (most recent call last): File "<stdin>" , line 1 , in <module> NameError : name 'sys' is not defined

Maybe we need to pass 'sys' as a string?

>>> accio ( 'sys' ) <module 'sys' (built-in)> # Ooh! >>> sys = accio ( 'sys' ) >>> sys <module 'sys' (built-in)>

Aha. So the statement import x probably does something like: 1. call the __import__ function on x : __builtins__.__import__('x') 2. assign the name x to the module returned by __import__

And import sys is like shorthand for the command:

>>> sys = __builtins__ . __import__ ( 'sys' )

(Here I'm only describing simple import statements, but more complex statements like from x import y.w, y.z work similarly.)

So we have a way to add accio as a function, but not as a statement. I'm unsatisfied.

For fun, can we delete import?

>>> del import File "<stdin>" , line 1 del import ^ SyntaxError : invalid syntax >>> del __builtins__ . __import__ >>> import os Traceback (most recent call last): File "<stdin>" , line 1 , in <module> ImportError : __import__ not found

Kind of! Although I want import os to be a SyntaxError rather than an ImportError because clearly import is the wrong thing to type and the user should know to type accio instead.

So, to completely overwrite import with accio , we'll need to learn where Python defines statements.

Grammar

Eli Bendersky wrote a great blog post about adding an until statement to Python. Since we want to replace a statement, rather than add one, our method will be a bit different.

Regardless, it looks like the place to start for changing Python's statements is in the Grammar file in the Python source code. Python source code! Isn't this fun?!

Python's source code is stored in a Mercurial repository, so first we'll have to install Mercurial.

$ brew install mercurial

Then we can clone CPython (like git clone ):

$ hg clone http://hg.python.org/cpython

This will take a whole minute. Grab a coffee.

In the Python Mercurial repo, different versions of Python have different branches. By default we're on a Python3 branch. I'm still running Python2 on my machine, so let's checkout version 2.7:

$ cd cpython $ hg checkout 2.7

Now let's compile CPython and see if it works!

$ ./configure --with-pydebug $ make -s -j2

I get a warning message saying some modules were unable to be built, but I am unstoppable. We are unstoppable. Let's continue.

It seems like the place to start is in the file Grammar/Grammar , so let's start poking around there. This is what it looks like. Searching for 'import' brings us to lines 52-60:

import_stmt : import_name | import_from import_name : 'import' dotted_as_names import_from : ( 'from' ( '.' * dotted_name | '.' + ) 'import' ( '*' | '(' import_as_names ')' | import_as_names )) import_as_name : NAME [ 'as' NAME ] dotted_as_name : dotted_name [ 'as' NAME ] import_as_names : import_as_name ( ',' import_as_name ) * [ ',' ] dotted_as_names : dotted_as_name ( ',' dotted_as_name ) * dotted_name : NAME ( '.' NAME ) *

Cool! We can kind of understand what's going on here just from reading. It looks like an import_stmt is either an import_name or an import_from which have the format import x and from x import y , respectively. What happens if we just change 'import' to 'accio' in lines 53 and 55? Let's try it. After making the change and saving the Grammar file, type the following command to compile:

$ make -s -j2

Ach. If only it was that easy. This throws an error:

Traceback (most recent call last): File "/Users/amyhanlon/projects/nagini/cpython/Lib/runpy.py" , line 151 , in _run_module_as_main mod_name , loader , code , fname = _get_module_details ( mod_name ) File "/Users/amyhanlon/projects/nagini/cpython/Lib/runpy.py" , line 113 , in _get_module_details code = loader . get_code ( mod_name ) File "/Users/amyhanlon/projects/nagini/cpython/Lib/pkgutil.py" , line 283 , in get_code self . code = compile ( source , self . filename , 'exec' ) File "/Users/amyhanlon/projects/nagini/cpython/Lib/sysconfig.py" , line 4 import sys ^ SyntaxError : invalid syntax

This error occurs while trying to execute a Python script! Compiling CPython requires running Python scripts! Interesting. Maybe at this point we remember that Python is bootstrapped. We look back at the Python Developer's Guide and we find that "Vast areas of CPython are written completely in Python: as of this writing, CPython contains slightly more Python code than C."

So then we wonder - when CPython is compiling, does it execute Python scripts with the Python that's currently being compiled? Or does it use another already-compiled muggle Python, like our environment Python? If it uses the Python that's currently being compiled, we'll need to change these .py scripts to say accio instead of import . Otherwise, what do we do? Our muggle Python only understands import and not accio ...

Let's look into one of the .py scripts within Lib to investigate. Here's the first line of the Lib/keyword.py script:

#! /usr/bin/env python

Aha! This script is executed via our environment Python! Our environment Python only understands import . So keyword.py needs to have import and not accio . However, since we got a SyntaxError on an import statement, that must mean that at least sometimes during the process of compiling we're required to use accio instead of import . Hrm... Any ideas?

Yo Dawg, I Heard You Like Pythons

What if we did something crazy like compiled an intermediary Python that understands both accio and import , and used that Python to compile another Python that only understands accio ? (Full credit for this idea goes to Allison Kaptur.)

So, for our intermediary Python we'll need to edit the Grammar file like so:

import_name : 'import' dotted_as_names | 'accio' dotted_as_names import_from : (( 'from' ( '.' * dotted_name | '.' + ) 'import' ( '*' | '(' import_as_names ')' | import_as_names )) | ( 'from' ( '.' * dotted_name | '.' + ) 'accio' ( '*' | '(' import_as_names ')' | import_as_names )))

Thus this Python should understand both import and accio . Let's compile.

$ make -s -j2

Eep! No errors! Just the warning about missing modules that we also received before we made any changes! Now we need to prepend our $PATH so that this Python will become our environment Python (but only for this terminal session). That way this intermediary Python will be used to compile our final Python. Let's make a symlink to the python.exe that was created when we ran make , and then add the path to that symlink to our $PATH:

$ mkdir bin $ cd bin $ ln -s ../python.exe python $ export PATH = ` pwd ` : $PATH

Now we'll need to duplicate this entire cpython directory and make our final Python:

$ cd ../ $ cp -r cpython nagini-python $ cd nagini-python

We want to change the Grammar file for this Python to only allow accio :

import_name : 'accio' dotted_as_names import_from : ( 'from' ( '.' * dotted_name | '.' + ) 'accio' ( '*' | '(' import_as_names ')' | import_as_names ))

And then we want to replace every instance of import in every .py file to accio . We'll use a blackbox bash command to accomplish that:

$ for i in ` find . -name '*.py' ` ; do sed -i '' 's/[[:<:]]import[[:>:]]/accio/g' $i ; done

Now we just need to compile this new Python!

$ make -s -j2

Let's make a symlink to this Python...

$ mkdir bin $ cd bin $ ln -s ../python.exe python $ export PATH = ` pwd ` : $PATH $ python

And fire it up...

>>> import sys File "<stdin>" , line 1 import sys ^ SyntaxError : invalid syntax >>> accio sys >>> sys . modules . keys () ['copy_reg', 'sre_compile', '_sre', 'encodings', 'site', '__builtin__', 'sysconfig', '__main__', 'encodings.encodings', 'abc', 'posixpath', '_weakrefset', 'errno', 'encodings.codecs', 'sre_constants', 're', '_abcoll', 'types', '_codecs', 'encodings.__builtin__', '_warnings', 'genericpath', 'stat', 'zipimport', '_sysconfigdata', 'warnings', 'UserDict', 'encodings.ascii', 'sys', '_osx_support', 'codecs', 'os.path', 'sitecustomize', 'signal', 'traceback', 'linecache', 'posix', 'encodings.aliases', 'exceptions', 'sre_parse', 'os', '_weakref']

HOLY SHIT IT WORKS!

Fin

That's it. We just compiled two Pythons and fooled around with source code for the sake of a joke. Grab yourselves a beer, friends. Victory.

My super messy and not-really-prepared-for-the-general-public GitHub repo contains both versions of Python, for reference.

tags: python harry potter bootstrapping cpython compilers grammar hacker school python internals