The Challenge

As a rough order of magnitude, Giles Thomas (co-founder of PythonAnywhere) estimates that there are between 1.8 and 4.3 million Python developers in the world.

So how hard can it be to find a Python developer? Well, not very hard at all if the goal is just to find someone who can legitimately list Python on their resume. But if the goal is to find a Python guru who has truly mastered the nuances and power of the language, then the challenge is most certainly a formidable one.

First and foremost, a highly effective recruiting process is needed, as described in our post In Search of the Elite Few – Finding and Hiring the Best Developers in the Industry. Such a process can then be augmented with targeted questions and techniques, such as those provided here, that are specifically geared toward ferreting out Python virtuosos from the plethora of some-level-of-Python-experience candidates.

Python Guru or Snake in the Grass?

So you’ve found what appears to be a strong Python web developer. How do you determine if he or she is, in fact, in the elite top 1% of candidates that you’re looking to hire? While there’s no magic or foolproof technique, there are certainly questions you can pose that will help determine the depth and sophistication of a candidate’s knowledge of the language. A brief sampling of such questions is provided below.

It is important to bear in mind, though, that these sample questions are intended merely as a guide. Not every “A” candidate worth hiring will be able to properly answer them all, nor does answering them all guarantee an “A” candidate. At the end of the day, hiring remains as much an art as a science.

Python in the Weeds…

While it’s true that the best developers don’t waste time committing to memory that which can easily be found in a language specification or API document, there are certain key features and capabilities of any programming language that any expert can, and should, be expected to be well-versed in. Here are some Python-specific examples:

Q: Why use function decorators? Give an example.

A decorator is essentially a callable Python object that is used to modify or extend a function or class definition. One of the beauties of decorators is that a single decorator definition can be applied to multiple functions (or classes). Much can thereby be accomplished with decorators that would otherwise require lots of boilerplate (or, even worse, redundant!) code. Flask, for example, uses decorators as the mechanism for adding new endpoints to a web application. Examples of some of the more common uses of decorators include adding synchronization, type enforcement, logging, or pre/post conditions to a class or function.
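To make the idea concrete, here is a minimal sketch of a logging decorator (the names logged, add and greet are hypothetical, chosen for illustration). It shows one decorator definition applied to two unrelated functions; functools.wraps copies the wrapped function's name and docstring onto the wrapper so introspection still works. Flask's @app.route(...) follows the same general pattern, though its internals are more involved.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("decorator_demo")

def logged(func):
    """Log every call to func, with its arguments and return value."""
    @functools.wraps(func)  # keep func's __name__, __doc__, etc. on the wrapper
    def wrapper(*args, **kwargs):
        log.info("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.info("%s returned %r", func.__name__, result)
        return result
    return wrapper

# One decorator definition, reused on two unrelated functions
@logged
def add(a, b):
    return a + b

@logged
def greet(name, punctuation="!"):
    return f"Hello, {name}{punctuation}"

print(add(2, 3))       # 5 (preceded by two log lines)
print(greet("Ada"))    # Hello, Ada!
```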

Q: What are lambda expressions, list comprehensions and generator expressions? What are the advantages and appropriate uses of each?

Lambda expressions are a shorthand technique for creating single-expression, anonymous functions. Their simple, inline nature often (though not always) leads to more readable and concise code than the alternative of formal function declarations. On the other hand, that terse inline nature, by definition, very much limits what they are capable of doing and their applicability: the body must be a single expression, with no statements. And being anonymous and inline, a lambda can only be used in multiple locations in your code by repeating it or by binding it to a name, which defeats its purpose (PEP 8 recommends a def statement in that case).

List comprehensions provide a concise syntax for creating lists. List comprehensions are commonly used to make lists where each element is the result of some operation(s) applied to each member of another sequence or iterable. They can also be used to create a subsequence of those elements whose members satisfy a certain condition. In Python, list comprehensions provide an alternative to using the built-in map() and filter() functions.

As the applied usage of lambda expressions and list comprehensions can overlap, opinions vary widely as to when and where to use one vs. the other. One point to bear in mind, though, is that a list comprehension executes somewhat faster than a comparable solution using map and lambda (some quick tests yielded a performance difference of roughly 10%). This is because map must call the lambda once per element, creating a new stack frame each time, whereas the list comprehension evaluates its expression inline without a function call.
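The following sketch shows the overlap side by side: each map/filter + lambda pair produces the same list as the corresponding comprehension. The timeit lines at the end are only a way to check the speed difference on your own machine; the figures will vary with hardware and Python version, so no particular result is implied.

```python
import timeit

nums = [1, 2, 3, 4, 5, 6]

# Transform every element: map() + lambda vs. a list comprehension
squares_map = list(map(lambda n: n * n, nums))
squares_comp = [n * n for n in nums]

# Keep a subsequence: filter() + lambda vs. a comprehension with a condition
evens_filter = list(filter(lambda n: n % 2 == 0, nums))
evens_comp = [n for n in nums if n % 2 == 0]

# Where a lambda shines: a short, throwaway key function
words = ["banana", "Apple", "cherry"]
by_lower = sorted(words, key=lambda w: w.lower())

# Measure the map/lambda vs. comprehension gap yourself (results vary)
setup = "data = list(range(1000))"
t_map = timeit.timeit("list(map(lambda n: n * n, data))", setup=setup, number=500)
t_comp = timeit.timeit("[n * n for n in data]", setup=setup, number=500)
print(f"map + lambda: {t_map:.4f}s   list comprehension: {t_comp:.4f}s")
```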

Generator expressions are syntactically and functionally similar to list comprehensions but there are some fairly significant differences between the ways the two operate and, accordingly, when each should be used. In a nutshell, iterating over a generator expression or list comprehension will essentially do the same thing, but the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly as needed. Generator expressions can therefore be used for very large (and even infinite) sequences, and their lazy (i.e., on demand) generation of values lowers memory usage and can improve performance. It is worth noting, though, that the standard Python list methods can be used on the result of a list comprehension, but not directly on that of a generator expression.
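A small sketch of the trade-offs described above (the sizes reported by sys.getsizeof are CPython implementation details that vary by version, so only their relative magnitude matters here):

```python
import itertools
import sys

n = 1_000_000
squares_list = [i * i for i in range(n)]   # builds all n items up front
squares_gen = (i * i for i in range(n))    # produces items lazily, one at a time

# Consuming either one gives the same result...
total_list = sum(squares_list)
total_gen = sum(squares_gen)

# ...but the list holds a million references, while a generator object
# stays small no matter how many items it will eventually yield
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(i * i for i in range(n))

# A generator can even draw from an infinite iterator; islice() takes just 5
evens = (i for i in itertools.count() if i % 2 == 0)
first_five = list(itertools.islice(evens, 5))

# List methods don't exist on a generator; materialize it first if you need them
gen = (c.upper() for c in "abc")
# gen.append("D")  # AttributeError: 'generator' object has no attribute 'append'
letters = list(gen)
letters.append("D")
```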

Q: Consider the two approaches below for initializing an array and the arrays that will result. How will the resulting arrays differ and why should you use one initialization approach vs. the other?

>>> # INITIALIZING AN ARRAY -- METHOD 1
>>> x = [[1,2,3,4]] * 3
>>> x
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
>>>
>>> # INITIALIZING AN ARRAY -- METHOD 2
>>> y = [[1,2,3,4] for _ in range(3)]
>>> y
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
>>>
>>> # WHICH METHOD SHOULD YOU USE AND WHY?

While both methods appear at first blush to produce the same result, there is an extremely significant difference between the two. Method 2 produces, as you would expect, an array of 3 elements, each of which is itself an independent 4-element array. In method 1, however, the members of the array all point to the same object. This can lead to what is most likely unanticipated and undesired behavior as shown below.

>>> # MODIFYING THE x ARRAY FROM THE PRIOR CODE SNIPPET:
>>> x[0][3] = 99
>>> x
[[1, 2, 3, 99], [1, 2, 3, 99], [1, 2, 3, 99]]
>>> # UH-OH, DON’T THINK YOU WANTED THAT TO HAPPEN!
>>>
>>> # MODIFYING THE y ARRAY FROM THE PRIOR CODE SNIPPET:
>>> y[0][3] = 99
>>> y
[[1, 2, 3, 99], [1, 2, 3, 4], [1, 2, 3, 4]]
>>> # THAT’S MORE LIKE WHAT YOU EXPECTED!
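A quick way to confirm the aliasing, rather than inferring it from mutation, is to compare object identity with is (a minimal sketch reusing the two initializations from the question):

```python
x = [[1, 2, 3, 4]] * 3
y = [[1, 2, 3, 4] for _ in range(3)]

# Method 1: every row is the very same list object
x_rows_shared = x[0] is x[1] is x[2]

# Method 2: each row is a distinct list object (three different ids)
y_rows_distinct = len({id(row) for row in y}) == 3

# Repetition with * is harmless when the element is immutable:
# assigning to one slot just rebinds that slot
safe = [0] * 4
safe[0] = 9
```

Note that [0] * 4 also stores one int object four times, but since ints cannot be modified in place, there is no shared state to trip over.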

Q: What will be printed out by the second append() statement below?

>>> def append(list=[]):
...     # append the length of a list to the list
...     list.append(len(list))
...     return list
...
>>> append(['a','b'])
['a', 'b', 2]
>>>
>>> append()  # calling with no arg uses default list value of []
[0]
>>>
>>> append()  # but what happens when we AGAIN call append with no arg?

Default argument values are evaluated only once, when the def statement is executed, not every time the function is called. Thus, once the list argument has been initialized to an empty list, subsequent calls to append without any argument specified will continue to use (and modify) that same list object. This will therefore yield the following, presumably unexpected, behavior:

>>> append()  # first call with no arg uses default list value of []
[0]
>>> append()  # but then look what happens...
[0, 1]
>>> append()  # successive calls keep extending the same default list!
[0, 1, 2]
>>> append()  # and so on, and so on, and so on...
[0, 1, 2, 3]
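The single evaluation is visible directly on the function object: Python stores default values in the function's __defaults__ tuple, so you can watch the shared list grow (a minimal sketch reusing the append function from the question; note that the parameter name list shadows the built-in inside the function):

```python
def append(list=[]):
    # append the length of a list to the list
    list.append(len(list))
    return list

# The default list was created once, when `def` ran, and lives on the function
print(append.__defaults__)   # ([],)
append()
append()
print(append.__defaults__)   # ([0, 1],)
```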

Q: How might one modify the implementation of the ‘append’ function in the previous question to avoid the undesirable behavior described there?

The following alternative implementation of the append method would be one of a number of ways to avoid the undesirable behavior described in the answer to the previous question:

>>> def append(list=None):
...     if list is None:
...         list = []
...     # append the length of a list to the list
...     list.append(len(list))
...     return list
...
>>> append()
[0]
>>> append()
[0]

Q: How can you swap the values of two variables with a single line of Python code?

Consider this simple example:

>>> x = 'X'
>>> y = 'Y'

In many other languages, swapping the values of x and y requires you to do the following:

>>> tmp = x
>>> x = y
>>> y = tmp
>>> x, y
('Y', 'X')

But Python makes it possible to do the swap with a single line of code (thanks to implicit tuple packing and unpacking) as follows:

>>> x, y = 'X', 'Y'  # starting again from the original values
>>> x, y = y, x
>>> x, y
('Y', 'X')
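The same packing/unpacking mechanism works beyond two plain variables, because the entire right-hand side is evaluated into a tuple before any name on the left is rebound. A short sketch (variable names are arbitrary):

```python
# Two variables: no temporary needed
a, b = 1, 2
a, b = b, a

# List elements can be swapped in place the same way
items = ["first", "middle", "last"]
items[0], items[-1] = items[-1], items[0]

# Three (or more) names can be rotated in one statement
p, q, r = "P", "Q", "R"
p, q, r = q, r, p
```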

Q: What will be printed out by the last statement below?

>>> flist = []
>>> for i in range(3):
...     flist.append(lambda: i)
...
>>> [f() for f in flist]  # what will this print out?

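Python closures bind variables by name, not by value: each lambda refers to the variable i itself and looks up its value only when the lambda is called (so-called late binding). By the time the lambdas are called, the loop has finished and i is 2, so the above line of code will print out the following: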

[2, 2, 2]

Presumably not what the author of the above code intended!

A workaround is either to create a separate function or to bind the current value of i as a default argument value (which is evaluated once, when each lambda is defined); e.g.:

>>> flist = []
>>> for i in range(3):
...     flist.append(lambda i=i: i)
...
>>> [f() for f in flist]
[0, 1, 2]
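The separate-function route can be sketched as follows, alongside functools.partial, which also captures the value at creation time. make_returner and identity are hypothetical helper names, and the last loop reproduces the late-binding pitfall for comparison:

```python
import functools

# A factory function: each call creates a new scope with its own `n`
def make_returner(n):
    return lambda: n

flist_factory = [make_returner(i) for i in range(3)]

# functools.partial stores the argument value when the partial is created
def identity(value):
    return value

flist_partial = [functools.partial(identity, i) for i in range(3)]

# The original pitfall: every lambda looks up `j` when called, after the loop ends
flist_late = []
for j in range(3):
    flist_late.append(lambda: j)
```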

Q: What are the key differences between Python 2 and 3?

Although Python 2 is formally considered legacy at this point (it reached end of life on January 1, 2020), its use is still widespread enough that it is important for a developer to recognize the differences between Python 2 and 3.

Here are some of the key differences that a developer should be aware of:

Text and Data instead of Unicode and 8-bit strings. Python 3.0 uses the concepts of text and (binary) data instead of Unicode strings and 8-bit strings. The biggest ramification of this is that any attempt to mix text and data in Python 3.0 raises a TypeError (to combine the two safely, you must decode bytes or encode Unicode, but you need to know the proper encoding, e.g. UTF-8). This addresses a longstanding pitfall for naïve Python programmers. In Python 2, mixing Unicode and 8-bit data would work if the string happened to contain only 7-bit (ASCII) bytes, but you would get UnicodeDecodeError if it contained non-ASCII values. Moreover, the exception would happen at the combination point, not at the point at which the non-ASCII characters were put into the str object. This behavior was a common source of confusion and consternation for neophyte Python programmers.
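A minimal Python 3 sketch of the text/data split (the sample string is arbitrary):

```python
text = "café"                       # str: a sequence of Unicode code points
data = text.encode("utf-8")          # bytes: b'caf\xc3\xa9' (5 bytes, 4 characters)

mixed_error = None
try:
    text + data                      # Python 3 refuses to guess an encoding...
except TypeError as exc:
    mixed_error = exc                # ...and raises TypeError instead

# Combine safely by converting explicitly, naming the encoding
as_text = text + data.decode("utf-8")      # 'cafécafé'
as_bytes = text.encode("utf-8") + data     # b'caf\xc3\xa9caf\xc3\xa9'
```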

print function. The print statement has been replaced with a print() function.

xrange – buh-bye. xrange() no longer exists (range() now behaves like xrange() used to behave, except it works with values of arbitrary size).

API changes. zip(), map() and filter() all now return iterators instead of lists; dict.keys(), dict.items() and dict.values() now return “views” instead of lists; and dict.iterkeys(), dict.iteritems() and dict.itervalues() are no longer supported.

Comparison operators. The ordering comparison operators (<, <=, >=, >) now raise a TypeError exception when the operands don’t have a meaningful natural ordering. Some examples of the ramifications of this include: expressions like 1 < '', 0 > None or len <= len are no longer valid; None < None now raises a TypeError instead of returning False; and sorting a heterogeneous list no longer makes sense, since all the elements must be comparable to each other.
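Several of these changes are easy to observe in a Python 3 interpreter. A short sketch (all values are arbitrary examples):

```python
# map(), filter() and zip() return lazy iterators rather than lists
squares = map(lambda n: n * n, [1, 2, 3])
squares_is_list = isinstance(squares, list)
squares_list = list(squares)

# dict.keys() returns a live view that reflects later changes to the dict
d = {"a": 1}
keys = d.keys()
d["b"] = 2

# range() is lazy, so even a huge range can be indexed without building a list
big = range(10**20)
last = big[-1]

# Ordering comparisons between unrelated types raise TypeError
heterogeneous_sort_failed = False
try:
    sorted([3, "two", 1])
except TypeError:
    heterogeneous_sort_failed = True
```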

More details on the differences between Python 2 and 3 are available here.

Q: Is Python interpreted or compiled?

As noted in Why Are There So Many Pythons?, this is, frankly, a bit of a trick question in that it is malformed. Python itself is nothing more than an interface definition (as is true with any language specification) of which there are multiple implementations. Accordingly, the question of whether “Python” is interpreted or compiled does not apply to the Python language itself; rather, it applies to each specific implementation of the Python specification.

Further complicating the answer to this question is the fact that, in the case of CPython (the most common Python implementation), the answer really is “sort of both”. Specifically, with CPython, code is first compiled and then interpreted. More precisely, it is not precompiled to native machine code, but rather to bytecode. While native machine code generally runs faster, bytecode is more portable. The bytecode is then interpreted in the case of CPython (or both interpreted and compiled to optimized machine code at runtime in the case of PyPy).
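In CPython you can inspect this compile-then-interpret pipeline directly. The sketch below uses the standard dis module and the built-in compile(); the exact instructions dis prints differ between CPython versions, so no listing is shown here:

```python
import dis
import types

def add(a, b):
    return a + b

# When `def` ran, the function body was compiled to a code object
code_obj = add.__code__
raw_bytes = code_obj.co_code        # the raw bytecode, as a bytes object

# dis disassembles the bytecode the interpreter loop will execute
dis.dis(add)

# compile() performs the source -> bytecode step explicitly...
module_code = compile("x = 1 + 2", "<example>", "exec")
namespace = {}
# ...and exec() hands that bytecode to the interpreter
exec(module_code, namespace)
```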

Q: What are some alternative implementations to CPython? When and why might you use them?

One of the more prominent alternative implementations is Jython, a Python implementation written in Java that utilizes the Java Virtual Machine (JVM). While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM.

Another is IronPython, written in C# and targeting the .NET stack. IronPython runs on Microsoft’s Common Language Runtime (CLR).

As also pointed out in Why Are There So Many Pythons?, it is entirely possible to survive without ever touching a non-CPython implementation of Python, but there are advantages to be had from switching, most of which are dependent on your technology stack.

Another noteworthy alternative implementation is PyPy whose key features include:

Speed. Thanks to its Just-in-Time (JIT) compiler, Python programs often run faster on PyPy.

Memory usage. Large, memory-hungry Python programs might end up taking less space with PyPy than they do in CPython.

Compatibility. PyPy is highly compatible with existing Python code. It supports cffi and can run popular Python libraries like Twisted and Django.

Sandboxing. PyPy provides the ability to run untrusted code in a fully secure way.

Stackless mode. PyPy comes by default with support for stackless mode, providing micro-threads for massive concurrency.
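When code needs to know which implementation it is running on (for example, to enable an implementation-specific optimization), the standard library can report it. A minimal sketch:

```python
import platform
import sys

# One of "CPython", "PyPy", "Jython" or "IronPython" per the platform docs
impl = platform.python_implementation()

# Lowercase identifier, e.g. "cpython" or "pypy"
impl_name = sys.implementation.name

print(f"Running on {impl} {platform.python_version()}")
```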

Q: What’s your approach to unit testing in Python?

The most fundamental answer to this question centers around Python’s unittest testing framework. Basically, if a candidate doesn’t mention unittest when answering this question, that should be a huge red flag.

unittest supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework. The unittest module provides classes that make it easy to support these qualities for a set of tests.

Assuming that the candidate does mention unittest (if they don’t, you may just want to end the interview right then and there!), you should also ask them to describe the key elements of the unittest framework; namely, test fixtures, test cases, test suites and test runners.
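A minimal sketch tying those four elements together; the Stack class is a hypothetical stand-in for real code under test, and the runner's output stream is captured only to keep the run quiet:

```python
import io
import unittest

class Stack:
    """Hypothetical class under test."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def __len__(self):
        return len(self._items)

class StackTest(unittest.TestCase):              # test case
    def setUp(self):                              # fixture: runs before each test
        self.stack = Stack()
        self.stack.push(1)

    def tearDown(self):                           # fixture: runs after each test
        self.stack = None

    def test_push_increases_length(self):
        self.stack.push(2)
        self.assertEqual(len(self.stack), 2)

    def test_pop_returns_last_item(self):
        self.assertEqual(self.stack.pop(), 1)

    def test_pop_empty_raises(self):
        self.stack.pop()
        with self.assertRaises(IndexError):
            self.stack.pop()

def run_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(StackTest)   # test suite
    runner = unittest.TextTestRunner(stream=io.StringIO())           # test runner
    return runner.run(suite)
```

In a real project you would normally let the command-line runner (python -m unittest) discover and run such test cases instead of building the suite by hand.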

A more recent addition to the unittest framework is mock. mock allows you to replace parts of your system under test with mock objects and make assertions about how they have been used. mock is now part of the Python standard library, available as unittest.mock in Python 3.3 onwards.

The value and power of mock are well explained in An Introduction to Mocking in Python. As noted therein, system calls are prime candidates for mocking: whether writing a script to eject a CD drive, a web server which removes antiquated cache files from /tmp, or a socket server which binds to a TCP port, these calls all feature undesired side-effects in the context of unit tests. Similarly, keeping your unit-tests efficient and performant means keeping as much “slow code” as possible out of the automated test runs, namely filesystem and network access.
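As a sketch of the /tmp cache-cleanup scenario, the test below patches os.path.exists and os.remove so that no file is ever touched; purge_cache_file is a hypothetical function invented for illustration. Note that stacked mock.patch decorators pass their mocks bottom-up, so the decorator nearest the method supplies the first mock argument:

```python
import io
import os
import unittest
from unittest import mock

def purge_cache_file(path):
    """Hypothetical helper: delete a cache file if it exists."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False

class PurgeCacheTest(unittest.TestCase):
    @mock.patch("os.remove")
    @mock.patch("os.path.exists", return_value=True)
    def test_removes_existing_file(self, mock_exists, mock_remove):
        self.assertTrue(purge_cache_file("/tmp/stale.cache"))
        mock_exists.assert_called_once_with("/tmp/stale.cache")
        mock_remove.assert_called_once_with("/tmp/stale.cache")

    @mock.patch("os.remove")
    @mock.patch("os.path.exists", return_value=False)
    def test_skips_missing_file(self, mock_exists, mock_remove):
        self.assertFalse(purge_cache_file("/tmp/missing.cache"))
        mock_remove.assert_not_called()   # no real (or mocked) deletion attempted

def run_purge_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(PurgeCacheTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```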

[Note: This question is for Python developers who are also experienced in Java.]

Q: What are some key differences to bear in mind when coding in Python vs. Java?

Disclaimer #1. The differences between Java and Python are numerous and would likely be a topic worthy of its own (lengthy) post. Below is just a brief sampling of some key differences between the two languages.

Disclaimer #2. The intent here is not to launch into a religious battle over the merits of Python vs. Java (as much fun as that might be!). Rather, the question is really just geared at seeing how well the developer understands some practical differences between the two languages. The list below therefore deliberately avoids discussing the arguable advantages of Python over Java from a programming productivity perspective.

With the above two disclaimers in mind, here is a sampling of some key differences to bear in mind when coding in Python vs. Java:

Dynamic vs static typing. One of the biggest differences between the two languages is that Java is restricted to static typing whereas Python supports dynamic typing of variables.

Static vs. class methods. A static method in Java does not translate to a Python class method. In Python, calling a class method involves an additional memory allocation that calling a static method or function does not. In Java, dotted names (e.g., foo.bar.method) are looked up by the compiler, so at runtime it really doesn’t matter how many of them you have. In Python, however, the lookups occur at runtime, so “each dot counts”.
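A brief sketch of the distinction (class and function names are hypothetical): @staticmethod is the closest analogue of a Java static method, @classmethod receives the class itself as its first argument, and binding a frequently used attribute to a local name is a common way to cut repeated dotted lookups in a hot loop:

```python
import math

class Temperature:
    scale = "Celsius"

    @staticmethod
    def c_to_f(celsius):
        # No implicit first argument: just a function namespaced in the class
        return celsius * 9 / 5 + 32

    @classmethod
    def describe(cls):
        # Receives the class it was called on (possibly a subclass) as `cls`
        return f"{cls.__name__} uses {cls.scale}"

class Kelvin(Temperature):
    scale = "Kelvin"

def hypotenuses(pairs):
    sqrt = math.sqrt   # look up math.sqrt once instead of on every iteration
    return [sqrt(a * a + b * b) for a, b in pairs]
```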

Method overloading. Whereas Java requires explicit specification of multiple same-named functions with different signatures, the same can be accomplished in Python with a single function that includes optional arguments with default values if not specified by the caller.
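For example, a single hypothetical connect function can stand in for what would be several overloads in Java:

```python
def connect(host, port=80, timeout=None, *, secure=False):
    """One function covering several 'overloads' via defaults (hypothetical API)."""
    scheme = "https" if secure else "http"
    suffix = f" (timeout={timeout}s)" if timeout is not None else ""
    return f"{scheme}://{host}:{port}{suffix}"

print(connect("example.com"))                       # http://example.com:80
print(connect("example.com", 8080))                 # http://example.com:8080
print(connect("example.com", 443, 5, secure=True))  # https://example.com:443 (timeout=5s)
```

When the behavior must differ by the type of an argument rather than by the number of arguments, the standard library's functools.singledispatch is another option.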

Single vs. double quotes. Whereas the use of single quotes vs. double quotes has significance in Java, they can be used interchangeably in Python (but no, it won’t allow beginning the same string with a double quote and trying to end it with a single quote, or vice versa!).

Getters and setters (not!). Getters and setters in Python are superfluous; rather, you should use the ‘property’ built-in (that’s what it’s for!). In Python, getters and setters are a waste of both CPU and programmer time.
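A minimal sketch of the property approach; the Account class and its validation rule are hypothetical. Callers use plain attribute syntax, while the class can still validate or compute values behind the scenes:

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance          # goes through the property setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

acct = Account(100)
acct.balance += 50                      # read via the getter, write via the setter

rejected = None
try:
    acct.balance = -1                   # the setter's validation still applies
except ValueError as exc:
    rejected = str(exc)
```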

Classes are optional. Whereas Java requires every function to be defined in the context of an enclosing class definition, Python has no such requirement.

Indentation matters… in Python. This bites many a newbie Python programmer.

The Big Picture

An expert knowledge of Python extends well beyond the technical minutiae of the language. A Python expert will have an in-depth understanding and appreciation of Python’s benefits as well as its limitations. Accordingly, here are some sample questions that can help assess this dimension of a candidate’s expertise:

Q: What is Python particularly good for? When is using Python the “right choice” for a project?

Although likes and dislikes are highly personal, a developer who is “worth his or her salt” will highlight features of the Python language that are generally considered advantageous (which also helps answer the question of what Python is “particularly good for”). Some of the more common valid answers to this question include:

Ease of use and ease of refactoring, thanks to the flexibility of Python’s syntax, which makes it especially useful for rapid prototyping.

More compact code, thanks again to Python’s syntax, along with a wealth of functionally-rich Python libraries (distributed freely with most Python language implementations).

A dynamically typed and strongly typed language, offering the rare combination of code flexibility while at the same time avoiding pesky implicit-type-conversion bugs.
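Both halves of that claim can be seen in a few lines (a minimal sketch):

```python
# Dynamic typing: a name can be rebound to a value of any type
value = 42
value = "forty-two"

# Strong typing: no implicit conversion between str and int
try:
    result = "1" + 1
except TypeError:
    result = None

# Conversions must be spelled out explicitly
explicit = int("1") + 1
```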

It’s free and open source! Need we say more?

With regard to the question of when using Python is the “right choice” for a project, the complete answer also depends on a number of issues orthogonal to the language itself, such as prior technology investment, skill set of the team, and so on. Although the question as stated above implies interest in a strictly technical answer, a developer who will raise these additional issues in an interview will always “score more points” with me since it indicates an awareness of, and sensitivity to, the “bigger picture” (i.e., beyond just the technology being employed). Conversely, a response that Python is always the right choice is a clear sign of an unsophisticated developer.

Q: What are some drawbacks of the Python language?

For starters, if you know a language well, you know its drawbacks, so responses such as “there’s nothing I don’t like about it” or “it has no drawbacks” are very telling indeed.

The two most common valid answers to this question (by no means intended as an exhaustive list) are:

The Global Interpreter Lock (GIL). CPython (the most common Python implementation) is not fully thread safe. In order to support multi-threaded Python programs, CPython provides a global lock that must be held by the current thread before it can safely access Python objects. As a result, no matter how many threads or processors are present, only one thread executes Python bytecode at any given time. In comparison, it is worth noting that the PyPy implementation discussed earlier in this article provides a stackless mode that supports micro-threads for massive concurrency.
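The GIL does not serialize everything, however: CPython releases it while a thread blocks, for example in time.sleep() or in I/O calls. In the sketch below, five threads that each sleep 0.2 s finish in roughly 0.2 s total rather than 1 s. The fake_io function is a hypothetical stand-in for real network or disk calls; CPU-bound pure-Python work would not overlap this way, which is why the multiprocessing module is a common alternative for it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    # Blocking call; CPython releases the GIL while the thread sleeps
    time.sleep(delay)
    return delay

def run_concurrently(n_tasks=5, delay=0.2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, [delay] * n_tasks))
    return results, time.perf_counter() - start

results, elapsed = run_concurrently()
print(f"{len(results)} blocking tasks finished in {elapsed:.2f}s")
```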

Execution speed. Python can be slower than compiled languages since it is interpreted. (Well, sort of. See our earlier discussion on this topic.)

Wrap Up

The questions and tips presented herein can be extremely valuable aids in identifying true Python development masters. We hope you find them to be a useful foundation for “separating the wheat from the chaff” in your quest for the elite few among Python software developers. Yet it is important to remember that these are merely intended as tools to be incorporated into the larger context of your overall recruiting toolbox and strategy.

And, for those who may have mistakenly read this guide hoping to learn how to capture a reptile (sorry dude, wrong kind of python!), we recommend instead checking out the Wildlife Foundation of Florida’s Python Challenge.