I’ve been using Python for a while. Recently I started noting down the nuances, oddities and counter-intuitive things I ran into. The list grew surprisingly fast.

Disclaimer: Most of the problems that I list here can be understood and explained. It’s just my opinion that something is odd, so forgive me if I raise something that, in your opinion, is not an issue at all.

It’s less apparent, but still worth pointing out, that the select module is inconsistent: some of its functions take a timeout in seconds, while others take it in milliseconds.
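A quick sketch of the mismatch (assuming a POSIX system, where select.poll and socket.socketpair are available):

```python
import select
import socket

# A connected socket pair; neither end has data, so every wait times out.
r, w = socket.socketpair()

# select.select() takes its timeout in *seconds*:
readable, _, _ = select.select([r], [], [], 0.05)
assert readable == []

# poll objects take their timeout in *milliseconds*:
p = select.poll()
p.register(r, select.POLLIN)
assert p.poll(50) == []  # 50 ms, not 50 seconds
```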

I’m often in doubt about the behaviour of get-like methods, due to their inconsistency: some of them raise an exception on a missing key, others just return None. Why None?
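The classic example is dict: indexing raises, while .get() silently returns None unless you pass an explicit default:

```python
d = {"a": 1}

# Indexing a missing key raises KeyError...
try:
    d["missing"]
    raised = False
except KeyError:
    raised = True
assert raised

# ...but .get() silently returns None, unless given an explicit default.
assert d.get("missing") is None
assert d.get("missing", 0) == 0
```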

Unfortunately pprint is not special-cased and has to be called with brackets. Python 3 at least made this consistent – print also always requires brackets there.

I’d love to use pprint.pprint in the same way print is used.
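In Python 3, where print is itself a function, the two are at least syntactically interchangeable; a small sketch with made-up data:

```python
from pprint import pprint, pformat

data = {"fruits": ["apple", "banana", "cherry"], "count": 3}
print(data)             # everything on one long line
pprint(data, width=30)  # wrapped and key-sorted for readability

# pformat returns the same pretty output as a string:
assert "\n" in pformat(data, width=30)
```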

Constructing tuples is misleading for beginners, and the confusion often appears in conjunction with print. For example foo(1, 2) is quite different from foo((1, 2)). On the other hand foo(1) is the same as foo((1)).
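A sketch of the difference, using a throwaway helper foo that just reports what it received:

```python
def foo(*args):
    return args  # report exactly what was passed

assert foo(1, 2) == (1, 2)       # two arguments
assert foo((1, 2)) == ((1, 2),)  # one argument: a tuple
assert foo(1) == foo((1))        # parentheses alone don't make a tuple
assert (1) == 1
assert isinstance((1,), tuple)   # the comma makes the tuple, not the parens
```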

All my friends who are learning Python have a problem with sort. Apparently <list>.sort() sorts in place and returns None, which causes a lot of confusion. What they usually need is the builtin sorted.
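A minimal demonstration of the difference:

```python
nums = [3, 1, 2]

result = nums.sort()   # sorts the list in place...
assert result is None  # ...and returns None
assert nums == [1, 2, 3]

# sorted() leaves the original untouched and returns a new list:
orig = [3, 1, 2]
assert sorted(orig) == [1, 2, 3]
assert orig == [3, 1, 2]
```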

Internals

Circular imports

It’s not a surprise that Python doesn’t handle circular imports gracefully. But what does that actually mean? Let’s create two files, a.py and b.py, and import b from a and a from b:

```
$ cat a.py
var_1 = 1
import b
var_2 = 2
print "Hello world from module a"
print "Imported b, b.var_1=%r b.var_2=%r" % (
    getattr(b, "var_1", None), getattr(b, "var_2", None))

$ cat b.py
var_1 = 1
import a
var_2 = 2
print "Hello world from module b"
print "Imported a, a.var_1=%r a.var_2=%r" % (
    getattr(a, "var_1", None), getattr(a, "var_2", None))
```

Think for a while about what result you would expect.

```
$ python -c "import a"
Hello world from module b
Imported a, a.var_1=1 a.var_2=None
Hello world from module a
Imported b, b.var_1=1 b.var_2=2
```

What actually happened? We requested module a. Module a starts running.

a requests module b. Flow goes to b.

b requests a. Python understands that it’s actually in the middle of creating module a, and gives back a reference to the half-loaded namespace of module a.

Module b prints out a.var_1, which already has the correct value, but a.var_2 is not set yet, so the getattr default of None is used instead.

After that everything continues normally.

Module naming and side effects

It’s often forgotten that a local import from inside a module, like import a, imports something quite different than the global import tmp.a. For example, if you created a file /tmp/a.py:

```
$ cd /tmp; PYTHONPATH=.. python
>>> import a
>>> import tmp.a
```

The two commands import different modules from Python’s point of view. If a.py has any side effects, they will be executed twice. A common bug is to use local paths from inside the module while encouraging users to use global module paths from outside; this leads to double imports. So, if you’re importing local files from inside a Python module, consider the relative-import syntax with a dot:

```
# Imagine we're in a module *tmp*, in a file *b.py*:
import a          # bad: imports a different module than *tmp.a*
import tmp.a      # better, but we can't rename the module easily
from . import a   # perfect! imports a.py from _this_ module
```
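The double-import effect can be reproduced from scratch. A sketch that builds a throwaway package on disk (the names tmp_pkg and a.py are made up for the illustration):

```python
import os
import sys
import tempfile

# Build a throwaway package: tmp_pkg/__init__.py and tmp_pkg/a.py.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "tmp_pkg")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "a.py"), "w") as f:
    f.write("print('side effect from a.py')\n")

# Put both the package dir and its parent on the path, mimicking the
# "local" and "global" import styles at once.
sys.path[:0] = [root, pkg]

import a          # loaded under the name "a"; the side effect runs
import tmp_pkg.a  # loaded *again* under "tmp_pkg.a"; it runs twice

assert sys.modules["a"] is not sys.modules["tmp_pkg.a"]
```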

What do imports import?

If I need a.b.c.d(), Python requires me to understand which of the parts name a package, which a module file, which a class and which a function. In this case I could assume that a.b.c are modules and d() is a global function:

```
>>> from a.b.c import d
>>> d()
```

But that can be wrong! a can be a module and b.c.d() can describe a class, a nested class and a method on it.

```
>>> from a import b
>>> b.c.d()
```

That’s not all. In normal cases a missing import causes an exception. Sometimes it doesn’t… My favourite example is os.path. I still don’t know if I should import os or os.path. Both versions work:

```
>>> import os
>>> os.path.devnull
'/dev/null'
>>> import os.path
>>> os.path.devnull
'/dev/null'
```
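The reason both spellings work is that os itself imports the platform-specific path module and registers it under the name os.path, which can be verified directly:

```python
import sys
import os  # plain "import os" is enough

# os pre-imports posixpath (or ntpath) and registers it in sys.modules
# as "os.path", so "import os.path" is never strictly necessary.
assert "os.path" in sys.modules
assert os.path is sys.modules["os.path"]
assert os.path.devnull in ("/dev/null", "nul")  # POSIX vs Windows
```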

Module reload

Apparently Python does allow you to dynamically reload modules. That’s a pretty neat feature, but in practice it’s not very useful – modules are usually imported once, at the top of a file, into the global namespace, and local imports from inside the code are considered slow.
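For completeness – reload was a builtin in Python 2; in Python 3 it moved to importlib. A sketch:

```python
import importlib
import json

# reload() re-executes the module's code and updates the *existing*
# module object in place, so old references to it keep working.
reloaded = importlib.reload(json)
assert reloaded is json
```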

Loading order

Python uses several mechanisms for loading modules:

- System modules are loaded from /usr/lib/python2.5.
- Others are in /usr/lib/python2.5/site-packages.
- There’s also /usr/share/python-support.
- And /usr/lib/python-support.
- I haven’t yet mentioned eggs.
- And eggs have *.pth files.

Python Eggs are really dirty. Install a few of them and run:

```
>>> import sys
>>> sys.path
['',
 '/usr/lib/python2.5/site-packages/multiprocessing-2.6.2.1-py2.5-linux-x86_64.egg',
 '/usr/lib/python2.5/site-packages/amqplib-0.6.1-py2.5.egg',
 ...]
```

Yes! Eggs are injected into the loading paths, polluting your system Python installation and hurting Python startup time. By the way, I tried to force Python to use eggs from my home directory: it’s painful. Not to mention the problems with platform-specific eggs.
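Whatever the interpreter version, the effective search order can at least be inspected directly (the directory names will differ per system):

```python
import sys

# sys.path is consulted in order; the first match wins.  Entries come
# from the script directory, PYTHONPATH, site-packages and *.pth files.
for entry in sys.path:
    print(entry or "(current directory)")

assert isinstance(sys.path, list)
```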

999+1 is not 1000

A well-known “feature” of Python integers is that they don’t play nicely with the is operator. Internally, small integers are cached and reused, so is, which compares object identity (memory location), happens to work. Larger integers are created as new objects every time, so is fails.

```
>>> 1 is 1
True
>>> 1000 is 1000
True
>>> 999 + 1 is 1000
False
>>> 2 + 1 is 3
True
```

To make things even worse, the behaviour changes between Python versions. For example 100 + 1 is 101 returns True in Python 2.5 but False in Python 2.4.
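In CPython the cached range is typically -5 to 256, but that is an implementation detail; a sketch:

```python
# CPython caches small integers (typically -5..256) as shared objects.
a = 256
b = 256
assert a is b  # same cached object -- an implementation detail!

c = 257
d = 257
assert c == d  # equality is the reliable test
# "c is d" may be True or False depending on the version and on how the
# code was compiled -- never use "is" to compare numbers.
```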

The order of unpacking

Have you ever wondered what the order is during tuple unpacking?

```
>>> _, _, _ = 1, 2, 3
>>> _
3
```

On the other hand this syntax is not allowed in function declarations. Strange.

```
>>> def a(_, _, _): print _
  File "<stdin>", line 1
SyntaxError: duplicate argument '_' in function definition
```

Speaking of function declarations, there’s a nice feature that allows you to define named parameters before unnamed ones, although this syntax works only for function definitions, not for calls:

```
>>> def foo(a=1, b=2, *args, **kwargs):
...     print "a=%r b=%r args=%r kwargs=%r" % (a, b, args, kwargs)
...
>>> foo(4, 5, 6)
a=4 b=5 args=(6,) kwargs={}
>>> foo(a=4, b=5, 6)  # I would expect this to work!
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
```

Python has deterministic garbage collection

Unlike many other dynamic languages, Python uses reference counting as its primary garbage-collection mechanism. During normal execution objects are freed the moment they lose their last reference. This means that while the program runs Python shouldn’t have any unexpected hiccups!

Unlike Java or Erlang, Python can run predictably smoothly. Am I saying that Python is a proper realtime language, and that you could use it in a medical ventilator? Well, Python does also have an advanced garbage collector, but it’s only used to free cyclic references. With proper programming discipline you can avoid creating reference loops. Oh, and please avoid defining __del__ destructors, as Python can’t free reference loops between objects that define them.
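The difference between plain reference-count frees and cyclic garbage can be observed directly. (Note: Python 3.4+ can collect cycles whose objects define __del__, unlike the Python 2 interpreters this article was written against.)

```python
import gc

freed = []

class Noisy(object):
    def __del__(self):
        freed.append(True)

# No cycle: the object dies the instant its last reference disappears.
obj = Noisy()
del obj
assert freed == [True]

# A reference cycle keeps the refcount above zero...
a = Noisy()
a.self = a  # the object references itself
del a
assert len(freed) == 1  # __del__ has NOT run yet

# ...until the cyclic collector steps in (Python 3.4+; Python 2 would
# have parked such objects in gc.garbage instead of freeing them).
gc.collect()
assert len(freed) == 2
```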