...making Linux just a little more fun!

Understanding Threading in Python

By Krishna G Pai

When programming in any language, the ability to spawn worker threads is integral to the performance of an application. Whether it is a separate thread handling user interaction in a GUI app while a potentially blocking process runs in the background (as your browser is doing now), or something else entirely, threading is essential. This document attempts to show what is and is not possible when threading in Python.

1. Why Threading in Python?

Let us say you write, in Python, a nifty utility that lets you filter your mail.

You build a GUI Frontend using PyGTK. Now if you embed the filter code in the frontend, you risk making the application unresponsive (you still have a dial up connection, and any server interaction entails a considerable waiting time). Since you don't work at Microsoft, you decide this is unacceptable and thus you start a separate thread each time you want to filter your mail.

Thus threads increase the responsiveness of your programs. Threads can also improve the throughput of a program that spends much of its time waiting on I/O, and they often make the algorithm simpler to express.

Combined with the power of Python, this makes programming in Python very attractive indeed.

2. The Basics

Let us first see how to start a simple thread. Threading is supported via the thread and threading modules. These modules are supposed to be optional, but if you use an OS that doesn't support threading, you'd better switch to Linux.

The code given below runs a simple thread in the background.

    #!/usr/bin/env python
    import time
    import thread

    def myfunction(string,sleeptime,*args):
        while 1:
            print string
            time.sleep(sleeptime) #sleep for a specified amount of time.

    if __name__=="__main__":
        thread.start_new_thread(myfunction,("Thread No:1",2))
        while 1:pass

We start a new thread by using the start_new_thread() function, which takes the callable object to be run along with the arguments to be passed to it, packed in a tuple.

2.1 Locks

Now that we have one thread running, running multiple threads is as simple as calling start_new_thread() multiple times. The problem now would be to synchronize the many threads which we would be running. Synchronization is done using a Lock object. Locks are created using the allocate_lock() factory function.

Locks are used as mutex objects, and are used for handling critical sections of code. A thread enters the critical section by calling the acquire() method, which can be either blocking or non-blocking. A thread exits the critical section by calling the release() method.

The following listing shows how to use the Lock object.

    #!/usr/bin/env python
    import time
    import thread

    def myfunction(string,sleeptime,lock,*args):
        while 1:
            #entering critical section
            lock.acquire()
            print string," Now Sleeping after Lock acquired for ",sleeptime
            time.sleep(sleeptime)
            print string," Now releasing lock and then sleeping again"
            lock.release()
            #exiting critical section
            time.sleep(sleeptime) # why?

    if __name__=="__main__":
        lock=thread.allocate_lock()
        thread.start_new_thread(myfunction,("Thread No:1",2,lock))
        thread.start_new_thread(myfunction,("Thread No:2",2,lock))
        while 1:pass

The code given above is fairly straightforward. We call lock.acquire() just before entering the critical section and then call lock.release() to exit the critical section.
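The acquire/release pattern described above can also be sketched in a form that terminates and is easy to check. This is only an illustration, assuming Python 3 and its threading module rather than the thread module used in this article; the names count_with_lock and try_lock_demo are made up for the example.

```python
import threading

def count_with_lock(num_threads=4, increments=10000):
    """Increment a shared counter from several threads, guarded by a Lock."""
    lock = threading.Lock()
    state = {"count": 0}

    def worker():
        for _ in range(increments):
            # 'with lock' calls acquire() on entry and release() on exit,
            # even if the body raises an exception.
            with lock:
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

def try_lock_demo():
    """Contrast a blocking acquire() with a non-blocking one."""
    lock = threading.Lock()
    lock.acquire()                         # lock is free, so this returns at once
    got_it = lock.acquire(blocking=False)  # lock is held: returns False, no waiting
    lock.release()
    return got_it

if __name__ == "__main__":
    print(count_with_lock())
    print(try_lock_demo())
```

The with statement is used here instead of explicit acquire() and release() calls so that the lock is released even if the critical section raises.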

The inquisitive reader now may be wondering why we sleep after exiting the critical section.

Let us examine the output of the above listing.

Output.

    Thread No:2 Now Sleeping after Lock acquired for 2
    Thread No:2 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:2 Now Sleeping after Lock acquired for 2
    Thread No:2 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:2 Now Sleeping after Lock acquired for 2

Here every thread is given an opportunity to enter the critical section. But the same cannot be said if we remove time.sleep(sleeptime) from the above listing.

Output without time.sleep(sleeptime)

    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2
    Thread No:1 Now releasing lock and then sleeping again
    Thread No:1 Now Sleeping after Lock acquired for 2

Why does this happen? The thread that releases the lock loops straight back to acquire() and usually gets the lock again before the waiting thread has had a chance to run; nothing guarantees that waiting threads are served in turn. More generally, Python is not fully threadsafe. Unlike Java, where threading was considered to be so important that it is a part of the syntax, in Python threads were laid down at the altar of Portability.

In fact the documentation reads:

Not all built-in functions that may block waiting for I/O allow other threads to run. (The most popular ones (time.sleep(), file.read(), select.select()) work as expected.)

It is not possible to interrupt the acquire() method on a lock -- the KeyboardInterrupt exception will happen after the lock has been acquired.

What this means is that quite probably any code like the following:

    while 1:
        lock.acquire()
        .....
        #some operation
        .....
        lock.release()

will cause starvation of one or more threads.

3. The Global Interpreter Lock

Currently, the Python interpreter (Python 2.3.4) is not thread safe. There are no priorities and no thread groups. Threads cannot be stopped, suspended, resumed or interrupted. That is, the support provided is very basic. However, a lot can still be accomplished with this meager support by using the threading module, as we shall see in the following sections.

One of the main reasons for these limitations is that, in actuality, only one thread is running at a time. This is because of something called the Global Interpreter Lock (GIL). In order to support multi-threaded Python programs, there's a global lock that must be held by the current thread before it can safely access Python objects. Without the lock, competing threads could cause havoc. For example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice. Thus only the thread that has acquired the GIL may operate on Python objects or call Python C API functions.

In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock, by default every 100 bytecode instructions in Python 2.3 (earlier releases used 10). This can however be changed using the sys.setcheckinterval() function. The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.
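For readers on a current interpreter, a small sketch of the same tuning knob may help. It assumes Python 3.2 or later, where the instruction-count setting sys.setcheckinterval() was replaced by a time-based one, sys.setswitchinterval(); the helper name with_switch_interval is invented for this example.

```python
import sys

def with_switch_interval(seconds, func):
    """Run func() with a temporary thread switch interval, then restore it.

    The switch interval is how long a thread may keep the GIL before the
    interpreter asks it to let another thread run.
    """
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        # Always restore the previous value, even if func() raises.
        sys.setswitchinterval(old)

if __name__ == "__main__":
    print("default interval:", sys.getswitchinterval())
    print("inside helper:  ", with_switch_interval(0.001, sys.getswitchinterval))
    print("restored:       ", sys.getswitchinterval())
```

A shorter interval makes threads take turns more often at the cost of more switching overhead; it does not let two threads run Python bytecode at the same time.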

In particular note:

C extensions can release the GIL.

Blocking I/O can release the GIL.

The Python interpreter keeps some bookkeeping information per thread, for which it uses a data structure called PyThreadState. Earlier the state was stored in global variables, and switching threads could cause problems. In particular, exception handling is now thread safe when the application uses sys.exc_info() to access the exception last raised in the current thread.

There's one global variable left, however: the pointer to the current PyThreadState structure. While most thread packages have a way to store "per-thread global data", Python's internal platform-independent thread abstraction doesn't support this yet. Therefore, the current thread state must be manipulated explicitly.

The global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer.

    ------------------------------------------------------------------
     Global Thread  | Global Thread  | Global Thread  | Global Thread |
     Pointer 1      | Pointer 2      | Pointer 2      | Pointer 2     |
    ------------------------------------------------------------------
           ^                ^                ^                ^
           |                |                |                |
           |                |                |                |
           |                |                |                |
        ------------------------------------------------------------
        |                 Global Interpreter Lock                  |
        ------------------------------------------------------------
           ^                ^                ^                ^
           |                |                |                |
           |                |                |                |
       Thread No        Thread No        Thread No        Thread No
           1                2                3                4

4. Using the Threading Module

Python manages to get a lot done using so little. The threading module uses the built-in thread package to provide some very interesting features that make your programming a whole lot easier. There are built-in mechanisms which provide critical section locks, wait/notify locks, etc. In particular we shall look at:

Using the Thread object

Profiling the Threaded code

Using the Condition, Event, and Queue object.

4.1 Using the Threading Library

The major Components of the Threading module are:

Lock object

RLock object

Semaphore Object

Condition Object

Event Object

Thread Object

While we have visited the Lock object in the previous sections, the RLock object is something new. RLock provides a mechanism for a thread to acquire the same lock multiple times, incrementing the depth of locking on each acquire and decrementing it on each release. RLock makes it very easy to write code which conforms to the classical Readers Writers Problem. The Semaphore object (or rather the Semaphore object factory) is the general implementation of the semaphore mooted by Dijkstra. We shall understand the implementation of the Condition, Event and Thread objects via some examples.
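The difference between Lock and RLock is easiest to see when a method that holds the lock calls another method that takes the same lock. The following is a minimal sketch, assuming Python 3's threading module; the Account class and the reacquire helper are hypothetical names used only for illustration.

```python
import threading

class Account:
    """Toy object whose methods all guard state with one re-entrant lock."""

    def __init__(self, balance=0):
        self._lock = threading.RLock()
        self.balance = balance

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def deposit_many(self, amounts):
        with self._lock:                 # locking depth 1
            for amount in amounts:
                self.deposit(amount)     # depth 2 in the same thread: no deadlock
            return self.balance

def reacquire(lock):
    """Try to take a lock the current thread already holds, without blocking."""
    lock.acquire()
    again = lock.acquire(blocking=False)
    if again:
        lock.release()                   # undo the second, nested acquire
    lock.release()
    return again

if __name__ == "__main__":
    print(Account(10).deposit_many([1, 2, 3]))
    print(reacquire(threading.Lock()), reacquire(threading.RLock()))
```

With a plain Lock in place of the RLock, deposit_many() would block forever on the nested acquire, which is why reacquire() uses a non-blocking attempt to show the difference safely.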

4.2 The Thread Object

The Thread object is a wrapper around the start_new_thread() function, which we saw earlier, but with a little more functionality. A Thread can be handed a callable to run through its target argument, but the approach used here is to subclass threading.Thread. The subclass should override only __init__() and run(). Do not override the start() method, and do not give run() any argument other than self. Note that you must call Thread.__init__() if you override __init__().

Let us see a simple example:

    #!/usr/bin/env python
    #simple code which uses threads
    import time
    from threading import Thread

    class MyThread(Thread):
        def __init__(self,bignum):
            Thread.__init__(self)
            self.bignum=bignum

        def run(self):
            for l in range(10):
                for k in range(self.bignum):
                    res=0
                    for i in range(self.bignum):
                        res+=1

    def test():
        bignum=1000
        thr1=MyThread(bignum)
        thr1.start()
        thr1.join()

    if __name__=="__main__":
        test()

There are two things to note here: the thread does not start running until the start() method is called, and join() makes the calling thread wait until the thread has finished execution.

So far, so good! However, being ever curious, we wonder whether there are any performance gains in using threads.
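To make the roles of start() and join() concrete, here is a small sketch that collects a result from a thread in two ways: by subclassing, and by passing a callable through the target argument of threading.Thread. It assumes Python 3; SumThread, run_subclass and run_target are illustrative names, not part of any library.

```python
import threading

class SumThread(threading.Thread):
    """Thread subclass that stores its result on the instance."""

    def __init__(self, n):
        super().__init__()       # the base class constructor must be called
        self.n = n
        self.result = None

    def run(self):               # executed in the new thread after start()
        self.result = sum(range(self.n))

def run_subclass(n):
    t = SumThread(n)
    t.start()                    # nothing runs until start() is called
    t.join()                     # wait here until run() has returned
    return t.result

def run_target(n):
    results = []
    t = threading.Thread(target=lambda: results.append(sum(range(n))))
    t.start()
    t.join()
    return results[0]

if __name__ == "__main__":
    print(run_subclass(10), run_target(10))
```

Reading t.result before join() returns would be a race, since run() may not have finished yet.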

4.3 Profiling Threaded Code.

It is the practice of every good programmer to profile his code, to find out his weak spots, his strengths, and in general to know his inner soul ;-). And since we are dealing with the Tao of Threading in Python, we might as well ask ourselves which is faster: two threads sharing the load, or one heavy duty brute force thread?

    Which is faster?

    thread1                          thread2
    --------                         ---------
    for i in range(bignum):          for i in range(bignum):
        for k in range(bignum):          for k in range(bignum):
            res+=i                           res+=i

    or?

    thread 3
    --------------------
    for i in range(bignum):
        for k in range(bignum):
            res+=i
    for i in range(bignum):
        for k in range(bignum):
            res+=i

Following the way of the masters, we make no assumptions and let the code do the talking. Generally there are two ways to profile code in Python: the most common and comprehensive is to use the profile.run() method; the other is to time the execution of the code using time.clock(). We shall do both. Consider the listing shown below.

    #!/usr/bin/env python
    #Let us profile code which uses threads
    import time
    from threading import Thread

    class MyThread(Thread):
        def __init__(self,bignum):
            Thread.__init__(self)
            self.bignum=bignum

        def run(self):
            for l in range(10):
                for k in range(self.bignum):
                    res=0
                    for i in range(self.bignum):
                        res+=1

    def myadd_nothread(bignum):
        for l in range(10):
            for k in range(bignum):
                res=0
                for i in range(bignum):
                    res+=1
        for l in range(10):
            for k in range(bignum):
                res=0
                for i in range(bignum):
                    res+=1

    def thread_test(bignum):
        #We create 2 Thread objects for the 2 threads.
        thr1=MyThread(bignum)
        thr2=MyThread(bignum)
        thr1.start()
        thr2.start()
        thr1.join()
        thr2.join()

    def test():
        bignum=1000
        #Let us test the threading part
        starttime=time.clock()
        thread_test(bignum)
        stoptime=time.clock()
        print "Running 2 threads took %.3f seconds" % (stoptime-starttime)
        #Now run without Threads.
        starttime=time.clock()
        myadd_nothread(bignum)
        stoptime=time.clock()
        print "Running Without Threads took %.3f seconds" % (stoptime-starttime)

    if __name__=="__main__":
        test()

Profiling Threaded Code in Python

We get some surprising results, notably the following:

    Running 2 threads took 0.000 seconds
    Running Without Threads took 5.160 seconds

Being ever sceptical, we try to profile the threaded code by running profile.run('test()'). What we see seems to add credence to the results obtained earlier.

    42 function calls in 5.170 CPU seconds

    Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    5.170    5.170 <string>:1(?)
         2    0.000    0.000    0.000    0.000 prof3.py:10(__init__)
         1    5.170    5.170    5.170    5.170 prof3.py:24(myadd_nothread)
         1    0.000    0.000    0.000    0.000 prof3.py:38(thread_test)
         1    0.000    0.000    5.170    5.170 prof3.py:50(test)
         1    0.000    0.000    5.170    5.170 profile:0(prof3.test())
         0    0.000             0.000          profile:0(profiler)
         2    0.000    0.000    0.000    0.000 threading.py:147(Condition)
         2    0.000    0.000    0.000    0.000 threading.py:152(__init__)
         1    0.000    0.000    0.000    0.000 threading.py:180(_release_save)
         1    0.000    0.000    0.000    0.000 threading.py:183(_acquire_restore)
         1    0.000    0.000    0.000    0.000 threading.py:186(_is_owned)
         1    0.000    0.000    0.000    0.000 threading.py:195(wait)
         2    0.000    0.000    0.000    0.000 threading.py:356(_newname)
         2    0.000    0.000    0.000    0.000 threading.py:373(__init__)
         2    0.000    0.000    0.000    0.000 threading.py:387(_set_daemon)
         4    0.000    0.000    0.000    0.000 threading.py:39(__init__)
         2    0.000    0.000    0.000    0.000 threading.py:402(start)
         6    0.000    0.000    0.000    0.000 threading.py:44(_note)
         2    0.000    0.000    0.000    0.000 threading.py:468(join)
         2    0.000    0.000    0.000    0.000 threading.py:507(isDaemon)
         5    0.000    0.000    0.000    0.000 threading.py:609(currentThread)

As I said, it seems to add credence, until we notice that run() isn't even called! What does this teach us? Apart from distrusting the profiler, never follow any mechanical way of testing your code. The output is misleading, and the profiler silent, because both time.clock() and the profiler measure only the time spent running the current thread.

How then do we measure time taken by the two threads? We use the time.time() function.

But doesn't time.time() give the absolute time, you ask? What about context switches? True, but since we are only interested in measuring the total time taken and not the work distribution, we can ignore context switches (and indeed the code has been structured to ignore context switches).

The correct way to profile the code is given in the following listing: Correct Way to Profile the Code.

The results we now obtain are more realistic and reliable.

    Running 2 threads took 5.125 seconds
    Running Without Threads took 5.137 seconds
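The wall-clock measurement described above can be sketched as follows. This is not the author's listing, only an illustration assuming Python 3; busy, timed, two_threads and one_thread are hypothetical helper names, and the loop size is an arbitrary choice.

```python
import threading
import time

def busy(n):
    """CPU-bound work: count up to n in pure Python."""
    res = 0
    for _ in range(n):
        res += 1
    return res

def timed(func, *args):
    """Return the wall-clock seconds func(*args) takes, using time.time()."""
    start = time.time()
    func(*args)
    return time.time() - start

def two_threads(n):
    """Split the work across two threads and wait for both."""
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def one_thread(n):
    """Do the same amount of work sequentially in the calling thread."""
    busy(n)
    busy(n)

if __name__ == "__main__":
    n = 2000000
    print("Running 2 threads took %.3f seconds" % timed(two_threads, n))
    print("Running Without Threads took %.3f seconds" % timed(one_thread, n))
```

Because the timer is started before start() and stopped after join(), the work done inside the worker threads is included in the measurement, which is exactly what the per-thread clock missed.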

As we can see, there is no significant difference between threaded and non-threaded apps.

4.4 Condition, Event and Queue Objects.

Conditions are a way of synchronizing multiple threads which wait for a particular condition to be true before starting any major processing. Condition objects are a very elegant mechanism with which to implement the Producer Consumer Problem. Indeed this is true for anything in Python. A Condition takes a lock object or, if none is supplied, creates its own RLock object. A thread waits for a particular condition to be true by using the wait() method, while it signals another thread by using the notify() or notifyAll() method.

Let us see how the classical Producer Consumer Problem is solved using this.

    #!/usr/bin/env python
    #Producer Consumer example using a Condition
    import thread
    import time
    from threading import *

    class itemQ:
        def __init__(self):
            self.count=0

        def produce(self,num=1):
            self.count+=num

        def consume(self):
            if self.count: self.count-=1

        def isEmpty(self):
            return not self.count

    class Producer(Thread):
        def __init__(self,condition,itemq,sleeptime=1):
            Thread.__init__(self)
            self.cond=condition
            self.itemq=itemq
            self.sleeptime=sleeptime

        def run(self):
            cond=self.cond
            itemq=self.itemq
            while 1:
                cond.acquire() #acquire the lock
                print currentThread(),"Produced One Item"
                itemq.produce()
                cond.notifyAll()
                cond.release()
                time.sleep(self.sleeptime)

    class Consumer(Thread):
        def __init__(self,condition,itemq,sleeptime=2):
            Thread.__init__(self)
            self.cond=condition
            self.itemq=itemq
            self.sleeptime=sleeptime

        def run(self):
            cond=self.cond
            itemq=self.itemq
            while 1:
                time.sleep(self.sleeptime)
                cond.acquire() #acquire the lock
                while itemq.isEmpty():
                    cond.wait()
                itemq.consume()
                print currentThread(),"Consumed One Item"
                cond.release()

    if __name__=="__main__":
        q=itemQ()
        cond=Condition()
        pro=Producer(cond,q)
        cons1=Consumer(cond,q)
        cons2=Consumer(cond,q)
        pro.start()
        cons1.start()
        cons2.start()
        while 1: pass

Producer Consumer Listing in Python
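The listing above runs forever, which makes it awkward to experiment with. A bounded variant that stops once a fixed number of items has been consumed might look like the sketch below. It assumes Python 3, where notifyAll() is spelled notify_all(); the produce_consume function and its parameters are invented for this illustration.

```python
import threading

def produce_consume(num_items=5, num_consumers=2):
    """Produce num_items integers and consume them all; return them sorted."""
    cond = threading.Condition()
    items = []          # shared buffer, only touched while holding cond
    consumed = []       # everything the consumers took out
    done = [False]      # set once the producer has finished

    def producer():
        for i in range(num_items):
            with cond:
                items.append(i)
                cond.notify()        # one new item, so wake one consumer
        with cond:
            done[0] = True
            cond.notify_all()        # every consumer must see the shutdown flag

    def consumer():
        while True:
            with cond:
                # Re-check the predicate in a loop: a woken thread may find
                # that another consumer has already taken the item.
                while not items and not done[0]:
                    cond.wait()
                if not items:        # producer finished and buffer is empty
                    return
                consumed.append(items.pop(0))

    threads = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(consumed)

if __name__ == "__main__":
    print(produce_consume())
```

The while loop around wait() mirrors the one in the Consumer class above: the condition being waited for is always checked again after waking up.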

Here the currentThread() function returns the Thread object for the currently running thread, which print displays by name. Note that wait() has an optional argument specifying the number of seconds it should wait before it times out. I would discourage its use because the timeout is implemented with a polling mechanism, and according to the source code we poll at least 20 times every second. Once again I would like to point out how not to use the Condition object. Consider the following, borrowed from Python-2.3.4/Lib/threading.py:

    def notify(self, n=1):
        currentThread() # for side-effect
        assert self._is_owned(), "notify() of un-acquire()d lock"
        __waiters = self.__waiters
        waiters = __waiters[:n]
        if not waiters:
            if __debug__:
                self._note("%s.notify(): no waiters", self)
            return
        self._note("%s.notify(): notifying %d waiter%s", self, n,
                   n!=1 and "s" or "")
        for waiter in waiters:
            waiter.release()
            try:
                __waiters.remove(waiter)
            except ValueError:
                pass

    def notifyAll(self):
        self.notify(len(self.__waiters))

Python-2.3.4/Lib/threading.py

What notify() does is maintain a list of threads waiting on the current condition. It then attempts to notify them by removing the first n waiters from the list. And since it removes the same first n waiters every time, this can potentially cause starvation of certain threads. So to test our theory out, let us modify the Producer Consumer listing given above by making the following changes:

    cons1=Consumer(cond,q,sleeptime=1)
    cons2=Consumer(cond,q,sleeptime=1)

This will potentially cause starvation of one of the threads, depending upon how they are inserted into the list. In fact a test run with the above changes shows this to be the case. Thus you will have to be careful of potential pitfalls when using Python threading. Notice also that notifyAll() wakes every waiting thread, which is inefficient when only one of them can proceed, so it should be avoided where notify() will do.

By now you must have a pretty good idea about how to go about threading in Python. For closure I shall briefly describe the Event and Queue Objects and describe how to use them.

The Event object is actually a thin wrapper on the Condition object so that we don't have to mess about with locks. The methods provided are very much self-explanatory: set(), clear(), isSet() and wait(timeout). One thing to note is that the Event object uses notifyAll(), so use it only when necessary.

The code given is a simple example of the event object. event.py
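As a rough idea of how an Event is used, several threads can block in wait() until another thread calls set(). The sketch below assumes Python 3, where isSet() is also available as is_set(); event_demo is a made-up name for the example.

```python
import threading

def event_demo(num_waiters=3):
    """Start threads that block on an Event, then release them all at once."""
    ready = threading.Event()
    seen = []
    seen_lock = threading.Lock()

    def waiter(index):
        ready.wait()                 # blocks until some thread calls set()
        with seen_lock:
            seen.append(index)

    threads = [threading.Thread(target=waiter, args=(i,))
               for i in range(num_waiters)]
    for t in threads:
        t.start()
    before = ready.is_set()          # False: nobody has called set() yet
    ready.set()                      # wakes every thread blocked in wait()
    for t in threads:
        t.join()
    return before, ready.is_set(), sorted(seen)

if __name__ == "__main__":
    print(event_demo())
```

Calling clear() afterwards would reset the flag so that later wait() calls block again.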

Although Queues don't come under the threading module, they provide an easy interface which should be suitable for solving most problems. The main advantage of Queue is that it doesn't depend on the threading module, so you can use it with the thread module instead. Queues are a simple and efficient way of implementing a stack, a priority queue, etc., since they handle both data protection and synchronization. The methods used are Queue(maxsize), put(item,block), get(block), qsize(), empty() and full().

A simple example using Queues is given in the following listing. q.py
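A typical use of a Queue is to hand work to a pool of worker threads and collect their results, with no explicit locks in user code. The following is a minimal sketch, assuming Python 3, where the module is named queue rather than Queue; square_all and the STOP sentinel are names chosen for this illustration.

```python
import queue
import threading

def square_all(numbers, num_workers=2):
    """Square each number using worker threads that communicate via Queues."""
    tasks = queue.Queue()
    results = queue.Queue()
    STOP = None                      # sentinel: tells a worker to exit

    def worker():
        while True:
            item = tasks.get()       # blocks until an item is available
            if item is STOP:
                break
            results.put(item * item)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(STOP)              # one sentinel per worker
    for w in workers:
        w.join()
    # All workers have exited, so every result is already in the queue.
    return sorted(results.get() for _ in numbers)

if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```

The sentinel approach assumes None never appears in the input; the Queue itself takes care of both data protection and blocking.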

Happy Hacking ! ;-}