Source code: Lib/multiprocessing/

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

The multiprocessing module also introduces APIs which do not have analogs in the threading module. A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism). The following example demonstrates the common practice of defining such functions in a module so that child processes can successfully import that module. This basic example of data parallelism using Pool,

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

will print to standard output

[1, 4, 9]
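Pool offers several evaluation strategies beyond map(). The following is a small sketch using apply_async() and imap_unordered(), both part of the Pool API, reusing the f defined above:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        # apply_async() evaluates a single call in a worker process and
        # returns an AsyncResult; get() blocks until the value is ready.
        res = p.apply_async(f, (10,))
        print(res.get(timeout=1))                       # prints "100"
        # imap_unordered() yields results as workers finish, in
        # arbitrary order, so we sort for a deterministic printout.
        print(sorted(p.imap_unordered(f, range(4))))    # prints "[0, 1, 4, 9]"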

The Process class¶

In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread. A trivial example of a multiprocess program is

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

To show the individual process IDs involved, here is an expanded example:

from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

For an explanation of why the if __name__ == '__main__' part is necessary, see Programming guidelines.

Contexts and start methods¶

Depending on the platform, multiprocessing supports three ways to start a process. These start methods are

spawn
    The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

    Available on Unix and Windows. The default on Windows and macOS.

fork
    The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

    Available on Unix only. The default on Unix.

forkserver
    When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.

    Available on Unix platforms which support passing file descriptors over Unix pipes.

Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725.

Changed in version 3.4: spawn added on all Unix platforms, and forkserver added for some Unix platforms. Child processes no longer inherit all of the parent's inheritable handles on Windows.

On Unix, using the spawn or forkserver start methods will also start a resource tracker process which tracks the unlinked named system resources (such as named semaphores or SharedMemory objects) created by processes of the program. When all processes have exited, the resource tracker unlinks any remaining tracked object. Usually there should be none, but if a process was killed by a signal there may be some “leaked” resources. (Neither leaked semaphores nor shared memory segments will be automatically unlinked until the next reboot. This is problematic for both objects because the system allows only a limited number of named semaphores, and shared memory segments occupy some space in the main memory.)

To select a start method, use the set_start_method() function in the if __name__ == '__main__' clause of the main module. For example:

import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    mp.set_start_method('spawn')
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()

set_start_method() should not be used more than once in the program.

Alternatively, you can use get_context() to obtain a context object. Context objects have the same API as the multiprocessing module, and allow one to use multiple start methods in the same program.

import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()

Note that objects related to one context may not be compatible with processes for a different context. In particular, locks created using the fork context cannot be passed to processes started using the spawn or forkserver start methods.
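Since the set of supported start methods depends on the platform, portable code can query it at runtime. Here is a minimal sketch using get_all_start_methods() and get_start_method(), both part of the multiprocessing API:

import multiprocessing as mp

if __name__ == '__main__':
    # List the start methods this platform supports,
    # e.g. ['fork', 'spawn', 'forkserver'] on Linux.
    print(mp.get_all_start_methods())
    # Report the method currently in effect (the platform default,
    # unless set_start_method() has already been called).
    print(mp.get_start_method())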
A library which wants to use a particular start method should probably use get_context() to avoid interfering with the choice of the library user (see the sketch after the warning below).

Warning: The 'spawn' and 'forkserver' start methods cannot currently be used with “frozen” executables (i.e., binaries produced by packages like PyInstaller and cx_Freeze) on Unix. The 'fork' start method does work.
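To illustrate that recommendation, here is a minimal sketch of a hypothetical library module (the module name mylib and the helper run_worker are invented for this example) that keeps a private context instead of calling set_start_method():

# mylib.py -- hypothetical library code illustrating get_context()
import multiprocessing as mp

# A private context: the library pins 'spawn' for its own workers
# without touching the application's global start method.
_ctx = mp.get_context('spawn')

def run_worker(func, *args):
    # Start a worker process using the library's private context.
    p = _ctx.Process(target=func, args=args)
    p.start()
    return p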

Exchanging objects between processes¶

multiprocessing supports two types of communication channel between processes:

Queues

The Queue class is a near clone of queue.Queue. For example:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

Queues are thread and process safe.

Pipes

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
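Because the pipe is duplex by default, the same pair of connection objects can carry traffic in both directions. A small sketch (the echo function is invented for this example):

from multiprocessing import Process, Pipe

def echo(conn):
    # Receive on the child's end of the pipe, then reply through the
    # same connection: the default pipe is two-way.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=echo, args=(child_conn,))
    p.start()
    parent_conn.send('hello')
    print(parent_conn.recv())   # prints "HELLO"
    p.join()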

Synchronization between processes¶

multiprocessing contains equivalents of all the synchronization primitives from threading. For instance one can use a lock to ensure that only one process prints to standard output at a time:

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()

Without using the lock, output from the different processes is liable to get all mixed up.
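Since these lock objects support the context manager protocol, just like their threading counterparts, the acquire()/release() pair above can equivalently be written as a with statement. A sketch of the same worker:

from multiprocessing import Process, Lock

def f(l, i):
    # The with statement acquires the lock on entry and releases it
    # on exit, even if the print raises an exception.
    with l:
        print('hello world', i)

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()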

Sharing state between processes¶

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

Shared memory

Data can be stored in a shared memory map using Value or Array. For example, the following code

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

will print

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

The 'd' and 'i' arguments used when creating num and arr are typecodes of the kind used by the array module: 'd' indicates a double precision float and 'i' indicates a signed integer. These shared objects will be process and thread-safe.

For more flexibility in using shared memory one can use the multiprocessing.sharedctypes module which supports the creation of arbitrary ctypes objects allocated from shared memory (see the sketch at the end of this section).

Server process

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,

from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)

will print

{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
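As a concrete illustration of the multiprocessing.sharedctypes module mentioned above, here is a minimal sketch (the Point structure and its field values are invented for this example) that allocates a ctypes Structure in shared memory:

from multiprocessing import Process
from multiprocessing.sharedctypes import Value
from ctypes import Structure, c_double

class Point(Structure):
    # A plain ctypes structure; instances can be allocated from
    # shared memory by sharedctypes.Value().
    _fields_ = [('x', c_double), ('y', c_double)]

def f(pt):
    # Mutations are visible to the parent because the structure
    # lives in shared memory.
    pt.x, pt.y = pt.y, pt.x

if __name__ == '__main__':
    pt = Value(Point, 1.0, 2.0)   # lock=True by default
    p = Process(target=f, args=(pt,))
    p.start()
    p.join()
    print(pt.x, pt.y)   # prints "2.0 1.0"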