Exploring Dynamic Scoping in Python

Experimenting with Code Objects & Bytecode

Introduction

Ruby has anonymous code blocks; Python doesn't. Anonymous code blocks are (apparently) an important feature for implementing DSLs, much touted by Ruby protagonists. As far as I can tell, the major difference between code blocks in Ruby and functions in Python is that code blocks are executed in the current scope: you can rebind local variables in the scope in which the code block is executed. Python functions are lexically scoped, with the exception that you can't rebind variables in the enclosing scope.

Note: It turns out that this is wrong. Ruby code blocks are lexically scoped like Python functions. This article is really an exploration of dynamic scoping.

If you define a function inside a function or method which uses the variable 'x', this will be loaded from the scope in which the function was defined, not the scope in which it is executed. This is enormously useful, but perhaps not always the desired behaviour. If a function assigns to the variable 'x', the assignment always happens inside the scope of the function and does not affect the scope the function was defined in or executed in.

I thought it would be fun to try to implement this feature of anonymous code blocks for Python, using code objects. This should be a fun way to learn more about the implementation of Python scoping rules by experimenting with byte-code. If this sounds like it's a hack, then it's only because it is. It is interesting to note, however, that Aspect Oriented Programming is a well accepted technique in Java, and is mainly implemented at the bytecode level.

This article looks at the byte-code operations used in code objects and experiments with creating new ones. Although the details of the byte-codes are shown, no great technical knowledge should be needed to follow the article.
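The two scoping rules described above can be seen in a few lines. This is only a minimal sketch, written for Python 3 rather than the Python 2 used in the rest of this article; the names make_reader, call_elsewhere and make_writer are made up for the illustration.

```python
def make_reader():
    x = 'defining scope'
    def reader():
        # 'x' is looked up in the scope where reader was defined.
        return x
    return reader

def call_elsewhere(func):
    # This local 'x' is invisible to func: the lookup is lexical, not dynamic.
    x = 'calling scope'
    return func()

def make_writer():
    x = 'original'
    def writer():
        # Assignment creates a new local 'x'; the enclosing 'x' is untouched.
        x = 'rebound'
        return x
    inner_value = writer()
    return x, inner_value

print(call_elsewhere(make_reader()))
print(make_writer())
```

Under dynamic scoping, the first call would have picked up the 'x' of call_elsewhere instead.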

Code Objects

Python doesn't have code blocks. It does have code objects. These can be executed in the current scope, but they are inconvenient to create inside a program. The code must be stored as a string, compiled and then executed.

    >>> x = 3
    >>> codeString = "print x\nx = 7\n"
    >>> codeObject = compile(codeString, '<CodeString>', 'exec')
    >>> exec codeObject
    3
    >>> print x
    7
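For readers on Python 3, where print and exec are functions, roughly the same experiment can be sketched with an explicit dictionary standing in for the current scope. The namespace dictionary and the seen variable are illustrative choices, not part of the original session.

```python
code_string = "seen = x\nx = 7\n"
code_object = compile(code_string, '<CodeString>', 'exec')

# The dictionary plays the role of the scope the code object runs in.
namespace = {'x': 3}
exec(code_object, namespace)

# The code read the existing 'x' and then rebound it in that same scope.
print(namespace['seen'])
print(namespace['x'])
```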

Functions store a code object representing the body of the function as the func_code attribute. For a reference on function attributes, see the function type. The byte-code contains instructions telling the interpreter how to load and store values. It is the combination of the function attributes and the byte-code, including the code object attributes, that implements the scoping rules. You can't just execute the code object of a function:

    >>> x = 3
    >>> def function():
    ...     print x
    ...     x = 7
    ...
    >>> codeObject = function.func_code
    >>> exec codeObject
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in function
    UnboundLocalError: local variable 'x' referenced before assignment

The co_freevars attribute of the code object contains a tuple of the names of the variables from the enclosing scope used by the code object. There are various other attributes, like co_varnames, which tell the interpreter how to load names. For a reference on code objects, see: Code Objects (Unofficial Reference Wiki).

Code objects are immutable, or at least the interesting attributes are read-only, so we can't just change the attributes we are interested in. We can create new code objects. The documentation doesn't seem to encourage this though:

    >>> x = 3
    >>> from types import CodeType
    >>> print CodeType.__doc__
    code(argcount, nlocals, stacksize, flags, codestring, constants, names,
          varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]])

    Create a code object.  Not for the faint of heart.

In order to implement code blocks I would like to take the code object from a function and transform it into one which can be executed in the current scope. There is an interesting recipe which transforms bytecodes and creates new code objects in this way: Implementing the make statement by hacking bytecodes. Luckily there is an easier way.
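The code object attributes mentioned above can be inspected directly. The following is a small sketch for Python 3, where func_code is spelled __code__; the outer and inner functions are invented for the example, and code.replace assumes Python 3.8 or later.

```python
def outer():
    z = 1
    def inner(x):
        y = 2
        return x, y, z
    return inner

inner = outer()
code = inner.__code__  # Python 3 spelling of func_code

print(code.co_varnames)  # arguments and locals of inner
print(code.co_freevars)  # names taken from the enclosing scope

# The attributes are read-only: assignment fails rather than rescoping anything.
try:
    code.co_freevars = ()
except AttributeError as error:
    read_only_error = error
else:
    read_only_error = None
print(read_only_error)

# A changed copy has to be built as a new code object instead.
renamed = code.replace(co_name='renamed')
print(renamed.co_name, code.co_name)
```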

Byte-Codes

There is a great module called Byteplay. This lets you manipulate byte-codes and create new code objects. Ideal for my purposes. It is also great for exploring byte-codes. Let's see what the byte-code looks like for some functions. The Python Byte Code Instructions reference comes in handy here.

The following Python creates three code blocks and uses Byteplay to print out the names of the byte-code operations. The three code blocks come from a function which is defined in the global scope, the same code (without the argument 'x') compiled from a string in the global scope, and a function defined inside another function.

    from byteplay import Code
    from pprint import pprint

    z = 1
    def testFunction(x):
        y = 1
        print x
        print y
        print z

    print 'From Function:'
    code = Code.from_code(testFunction.func_code)
    byteCode1 = code.code
    pprint(byteCode1)

    codeObject = compile("""
    y = 1
    print y
    print z""", '<Summink>', 'exec')

    print
    print 'From current scope:'
    code = Code.from_code(codeObject)
    byteCode2 = code.code
    pprint(byteCode2)


    def anotherScope():
        z = 1
        def testFunction(x):
            y = 1
            print x
            print y
            print z
        code = Code.from_code(testFunction.func_code)
        byteCode3 = code.code

        return byteCode3

    byteCode3 = anotherScope()

    print
    print 'Code defined in another scope, using a local rather than a global.'

    pprint(byteCode3)

This prints out the following (you don't need to read it all):

    From Function:
    [(SetLineno, 6),
     (LOAD_CONST, 1),
     (STORE_FAST, 'y'),
     (SetLineno, 7),
     (LOAD_FAST, 'x'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (SetLineno, 8),
     (LOAD_FAST, 'y'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (SetLineno, 9),
     (LOAD_GLOBAL, 'z'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (LOAD_CONST, None),
     (RETURN_VALUE, None)]

    From current scope:
    [(SetLineno, 2),
     (LOAD_CONST, 1),
     (STORE_NAME, 'y'),
     (SetLineno, 3),
     (LOAD_NAME, 'y'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (SetLineno, 4),
     (LOAD_NAME, 'z'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (LOAD_CONST, None),
     (RETURN_VALUE, None)]

    Code defined in another scope, using a local rather than a global.
    [(SetLineno, 66),
     (LOAD_CONST, 1),
     (STORE_FAST, 'y'),
     (SetLineno, 67),
     (LOAD_FAST, 'x'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (SetLineno, 68),
     (LOAD_FAST, 'y'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (SetLineno, 69),
     (LOAD_DEREF, 'z'),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (LOAD_CONST, None),
     (RETURN_VALUE, None)]

In summary, this tells us:

- Store a local variable: STORE_FAST
- Load an argument: LOAD_FAST
- Load a variable local to the function: LOAD_FAST
- Load a global: LOAD_GLOBAL
- Load a value from the enclosing scope: LOAD_DEREF
- Load a value from the same scope: LOAD_NAME
- Store a value in the same scope: STORE_NAME

So in order to rescope a code block to execute in the current scope, we need to transform LOAD_FAST and LOAD_DEREF into LOAD_NAME, and STORE_FAST and STORE_DEREF (which we haven't seen here) into STORE_NAME.
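A similar survey can be sketched on Python 3 using the standard library dis module in place of Byteplay. The function names and the ops_for helper below are invented for this illustration. Exact opcode names differ between CPython versions (newer releases fuse some of the FAST opcodes into combined instructions, for instance), so no expected output is shown.

```python
import dis

z = 1

def global_user(x):
    y = 1
    return x, y, z

def another_scope():
    z = 1
    def closure_user(x):
        y = 1
        return x, y, z
    return closure_user

# The same kind of code compiled from a string, as module-level code.
module_code = compile("y = 1\nw = (y, z)\n", '<Summink>', 'exec')

def ops_for(code, name):
    """Collect the names of the opcodes in `code` that refer to `name`."""
    found = set()
    for instruction in dis.get_instructions(code):
        argval = instruction.argval
        # Fused opcodes in newer CPython versions carry a tuple of names.
        if argval == name or (isinstance(argval, tuple) and name in argval):
            found.add(instruction.opname)
    return found

for label, code in [('function', global_user.__code__),
                    ('closure', another_scope().__code__),
                    ('module-level', module_code)]:
    for name in ('x', 'y', 'z'):
        print(label, name, sorted(ops_for(code, name)))
```

The interesting column is 'z': it is reached through a different opcode in each of the three code objects, which is the distinction the rescoping transformation has to erase.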

Transforming Byte-codes

The Byteplay module allows us to iterate over the opcodes. It stores them as a list of tuples. Because lists are mutable we can replace the byte-codes we are interested in. The Byteplay module also has a dictionary called opmap, which is a mapping of opcode names to their symbolic values.

    from byteplay import Code, opmap

    LOAD_FAST = opmap['LOAD_FAST']
    STORE_FAST = opmap['STORE_FAST']
    LOAD_NAME = opmap['LOAD_NAME']
    STORE_NAME = opmap['STORE_NAME']
    LOAD_DEREF = opmap['LOAD_DEREF']
    STORE_DEREF = opmap['STORE_DEREF']

    def AnonymousCodeBlock(function):
        code = Code.from_code(function.func_code)
        newBytecode = []
        for opcode, arg in code.code:
            if opcode in (LOAD_FAST, LOAD_DEREF):
                opcode = LOAD_NAME
            elif opcode in (STORE_FAST, STORE_DEREF):
                opcode = STORE_NAME
            newBytecode.append((opcode, arg))

At the start of the function AnonymousCodeBlock we use Code.from_code to turn the function's code object into a Byteplay Code object. By the end of the loop we have a list, newBytecode, which holds our transformed bytecode. There is one more step. We need to turn this back into a code object, but one which executes in the current scope. This means that we need to set the freevars attribute to () (empty) and the newlocals attribute to False.

        code.code = newBytecode
        code.newlocals = False
        code.freevars = ()
        return code.to_code()

Because we're not interested in functions which take arguments, we ought to check the function we've been passed. inspect.getargspec makes this easy. The full AnonymousCodeBlock looks like this.

    import inspect

    from byteplay import Code, opmap

    LOAD_FAST = opmap['LOAD_FAST']
    STORE_FAST = opmap['STORE_FAST']
    LOAD_NAME = opmap['LOAD_NAME']
    STORE_NAME = opmap['STORE_NAME']
    LOAD_DEREF = opmap['LOAD_DEREF']
    STORE_DEREF = opmap['STORE_DEREF']

    def AnonymousCodeBlock(function):
        # Refuse functions that declare any arguments, *args or **kwargs.
        argSpec = inspect.getargspec(function)
        if [i for x in argSpec if x is not None for i in x]:
            raise TypeError("Function '%s' takes arguments" % function.func_name)

        code = Code.from_code(function.func_code)
        newBytecode = []
        for opcode, arg in code.code:
            # Turn local and enclosing-scope access into plain name lookups.
            if opcode in (LOAD_FAST, LOAD_DEREF):
                opcode = LOAD_NAME
            elif opcode in (STORE_FAST, STORE_DEREF):
                opcode = STORE_NAME
            newBytecode.append((opcode, arg))
        code.code = newBytecode
        # Run in the caller's namespace, with no free variables to resolve.
        code.newlocals = False
        code.freevars = ()
        return code.to_code()
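inspect.getargspec has since been removed from the standard library (as of Python 3.11). On Python 3 the same guard could be sketched with inspect.signature instead. The helper name reject_arguments and the two sample functions are hypothetical, and this covers only the argument check, not the byte-code transformation.

```python
import inspect

def reject_arguments(function):
    """Raise TypeError if `function` declares any parameters (hypothetical helper)."""
    parameters = inspect.signature(function).parameters
    if parameters:
        raise TypeError("Function '%s' takes arguments" % function.__name__)

def block():
    return 42

def not_a_block(x, *args, **kwargs):
    return x

reject_arguments(block)  # no parameters, so nothing is raised

try:
    reject_arguments(not_a_block)
except TypeError as error:
    print(error)
```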