Python Programming, news on the Voidspace Python Projects and all things techie.

Django on Jython, Python Implementations and Performance

Django now runs on Jython, which is great news. Jeff Hardy is also making progress running Django on IronPython. As usual the news sparked a plague of comments on Reddit. There seems to be a lot of confusion about the different implementations of Python, and about which bits of CPython act as the reference implementation. (Even Ruby is getting a language specification...)

CPython is the reference implementation but several aspects have been explicitly described as implementation details. These include:

Stack frames

Bytecode instructions

The Global Interpreter Lock

Reference counting for garbage collection

Jython and PyPy do use Python stack frames, and so tend to have fewer issues than IronPython when running Python applications that depend on certain obscure implementation details. (IronPython doesn't, and is faster in consequence.)
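As an illustration of the kind of dependency meant here, the sketch below looks up the name of the calling function through sys._getframe. The names caller_name and handler are made up for the example, and the Python documentation describes sys._getframe as a CPython implementation detail that is not guaranteed to exist elsewhere.

```python
import sys


def caller_name():
    # sys._getframe(1) is the frame of whoever called this function.
    # The leading underscore is a hint: the docs mark it as a CPython
    # implementation detail that other implementations need not provide.
    frame = sys._getframe(1)
    return frame.f_code.co_name


def handler():
    # Hypothetical caller, only here so there is a frame to inspect.
    return caller_name()


print(handler())
```

Code like this works wherever real Python frames exist, and is exactly the sort of thing that can trip up an implementation that does not create them.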

PyPy has implemented the GIL (mainly as a matter of convenience) - Jython and IronPython don't have a GIL and can scale multi-threaded code across several CPU cores.

None of PyPy, Jython and IronPython use reference counting for garbage collection. This means faster garbage collection but non-deterministic calling of destructors, which in CPython would normally be called as soon as the reference count drops to zero. It also means no uncollectable cycles, which can happen in CPython when you have cycles involving destructors.
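Code that needs cleanup to happen at a known point can avoid relying on the destructor at all. A minimal sketch, assuming a made-up Resource class (it is not from any library) that releases itself through the with statement and keeps __del__ only as a safety net:

```python
class Resource:
    """Hypothetical resource that records when it is released."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        # Idempotent, so a later call from __del__ does nothing.
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions

    def __del__(self):
        # Safety net only; when this runs depends on the implementation.
        self.close()


log = []
with Resource(log) as res:
    log.append("working")
# close() has already run here, whatever the garbage collector does.
print(log)
```

The with block releases the resource at the same point on any implementation, whether or not it counts references.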

IronPython uses native .NET strings, and so all strings are Unicode. In my experience this has made working with strings much more pleasant in IronPython - roll on Python 3. This also used to be the case with Jython, but I believe that Jython now has byte strings. This makes it easier to get Django running, as Django 1.0 uses the difference between byte-strings and Unicode strings to determine whether it is serving text or binary data.

IronPython does a lot of magic to allow you to store binary data in strings (it can still be a cause of bugs - but they are bugs and should be reported to the IronPython team), but you can't dispatch on type. This makes it questionable whether an unpatched Django will ever run on IronPython without some other flag (or way of patching in a compatible 'bytes' type implementation). Jeff certainly seems to be making good progress though.
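To make "dispatch on type" concrete, here is a sketch of the pattern (not Django's actual code). It uses Python 3 spelling, where bytes and str are distinct types; in Python 2 the equivalent pair is str and unicode. The helper name describe_payload is invented for the example.

```python
def describe_payload(value):
    # Hypothetical helper: pick a content kind from the Python type alone.
    if isinstance(value, bytes):
        return "binary"
    if isinstance(value, str):
        return "text"
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


print(describe_payload("hello"))
print(describe_payload(b"\x89PNG"))
```

If an implementation has only one string type, both kinds of value take the same branch, and the decision needs some other flag.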

A new page popped up recently on the Python wiki (relevant I promise):

This is my answer to the question "Why is Python slower than xxx Language?":

Python as a language is a set of rules (its syntax and semantics) and so doesn't have a 'speed'. Only a specific language implementation can have a measurable speed, and then we can only compare performance with a specific implementation of another language. In general you can't compare the speed of one language to another - you can only compare implementations.

Having said that, as a dynamic language Python will typically perform slower for specific benchmarks than standard implementations of some other languages (although it is faster than plenty of others). As a dynamic language a lot of information about the program can only be determined at runtime. This means that a lot of common compiler tricks, which rely on knowing the type of objects at compile time, can't work. Despite this there are a lot of things that can be done to improve the performance of dynamic languages (beyond the performance of statically typed languages, many believe), several of which have been done before in virtual machines like Strongtalk and are being explored for Python in the PyPy tracing JIT compiler.
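A small illustration of why type information is only available at runtime (the function name add is just an example):

```python
def add(a, b):
    # Nothing in the source says what types a and b are; the meaning of
    # "+" is looked up on the operands each time the call runs.
    return a + b


print(add(1, 2))
print(add("py", "thon"))
print(add([1], [2]))
```

The same function body does integer addition, string concatenation and list concatenation, so a compiler cannot pick one machine-level operation for the + ahead of time.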

Generators, finally and Iterator Finalization

Raymond Chen has run a series of posts on the implementation of iterators (generators in Python speak) in C#. The C# compiler creates an inner class that acts as a state machine, which is nothing like as elegant as the Python implementation of course.

Today was part 3 in the series:

This entry concentrates on an additional place (beyond the expected ones) where a finally block can be entered: inside the finalizer of an iterator. I was intrigued, and discovered that the same is true in Python. If you have a generator with a finally block, and the iterator is garbage collected before the generator is exhausted, then the finally block is executed:

Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
...     try:
...         for i in range(5):
...             yield i
...     finally:
...         print 'done'
...
>>> it = f()
>>> it.next()
0
>>> it.next()
1
>>> del it
done
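In the session above the finally block runs at del it because CPython's reference counting collects the generator straight away. On an implementation without reference counting that timing is not guaranteed, so here is a sketch (in Python 3 spelling, with an events list added only to make the result observable) of asking for the same cleanup explicitly through the generator's close method:

```python
events = []


def f():
    try:
        for i in range(5):
            yield i
    finally:
        # Recorded in a list instead of printed, so it can be checked.
        events.append("done")


it = f()
next(it)      # Python 3 spelling of it.next()
next(it)
it.close()    # raises GeneratorExit inside the generator at the yield
print(events)
```

Calling close on a suspended generator runs the pending finally block immediately, regardless of when the garbage collector gets round to the object.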

This is the right decision of course. Something that is the wrong decision (in my opinion) is that if a finally block is entered because of an exception, and there is a return in the finally block then the exception is swallowed instead of being raised. Either a return in a finally should be disallowed (as it is in C#) or the exception should be raised.
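A minimal sketch of the swallowing behaviour (the function name swallow is invented for the example):

```python
def swallow():
    try:
        raise ValueError("lost")
    finally:
        # Returning here discards the in-flight ValueError.
        return "returned from finally"


print(swallow())
```

The caller gets the return value and never sees the ValueError.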

The Python implementation of generators is particularly elegant because of the way functions / stack frames are implemented. As an overview... a stack frame has a code object associated with it. The code object has the bytecode sequence (as a byte-string), and the frame has a counter that points to the current bytecode instruction. Every time a new bytecode is executed the counter is advanced. When the function returns, nothing holds a reference to the stack frame anymore and it is garbage collected (actually frames are expensive to create, so a pool of zombie stack frames is kept for reuse).

When you create a generator it holds a reference to the stack frame, and every time you call next, execution continues at the next bytecode - until a yield or return is hit. The stack frame is kept alive until the generator is garbage collected.
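Some of this can be observed from Python itself. The sketch below is CPython-specific and written for Python 3; it relies on the generator attributes gi_frame and gi_code and the frame attribute f_lasti (the index of the last bytecode instruction executed). Other implementations may expose different details or none at all.

```python
def gen():
    yield "a"
    yield "b"


g = gen()
# The generator holds a frame, and the frame points at the code object.
print(g.gi_frame.f_code is g.gi_code)

next(g)
first = g.gi_frame.f_lasti    # position after stopping at the first yield
next(g)
second = g.gi_frame.f_lasti   # position after stopping at the second yield
print(first < second)         # the counter moved forward between yields

for _ in g:                   # run the generator to completion
    pass
print(g.gi_frame)             # None once the generator has finished
```

Between calls to next the frame simply sits there with its counter parked at the yield, which is all the state the generator needs to resume.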
