Due to variation between operating systems and the way OS courses are taught, some programmers may have an outdated mental model about the difference between processes and threads in Linux. Even the name "thread" suggests something extremely lightweight compared to a heavy "process" - a mostly wrong intuition.

In fact, for the Linux kernel itself there's absolutely no difference between what userspace sees as processes (the result of fork) and as threads (the result of pthread_create). Both are represented by the same data structures and scheduled similarly. In kernel nomenclature both are called tasks (the main structure representing a task in the kernel is task_struct), and I'll be using this term from now on.

In Linux, threads are just tasks that share some resources, most notably their memory space; processes, on the other hand, are tasks that don't share resources. For application programmers, processes and threads are created and managed in very different ways. For processes there's a slew of process-management APIs like fork, wait and so on. For threads there's the pthread library. However, deep in the guts of these APIs and libraries, both processes and threads come into existence through a single Linux system call: clone.

The clone system call

We can think of clone as the unifying implementation shared between processes and threads. Whatever perceived difference there is between processes and threads on Linux is achieved through passing different flags to clone. Therefore, it's most useful to think of processes and threads not as two completely different concepts, but rather as two variants of the same concept: starting a concurrent task. The differences are mostly about what is shared between this new task and the task that started it.

Here is a code sample demonstrating the most important sharing aspect of threads: memory. It uses clone in two ways, once with the CLONE_VM flag and once without. CLONE_VM tells clone to share the virtual memory between the calling task and the new task clone is about to create. As we'll see later on, this is the flag used by pthread_create:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

static int child_func(void* arg) {
  char* buf = (char*)arg;
  printf("Child sees buf = \"%s\"\n", buf);
  // glibc's clone wrapper ends the child with a raw exit system call, so
  // flush stdio here or the line may be lost when stdout is a pipe or file.
  fflush(stdout);
  strcpy(buf, "hello from child");
  return 0;
}

int main(int argc, char** argv) {
  // Allocate stack for child task.
  const int STACK_SIZE = 65536;
  char* stack = malloc(STACK_SIZE);
  if (!stack) {
    perror("malloc");
    exit(1);
  }

  // When called with the command-line argument "vm", set the CLONE_VM flag on.
  unsigned long flags = 0;
  if (argc > 1 && !strcmp(argv[1], "vm")) {
    flags |= CLONE_VM;
  }

  char buf[100];
  strcpy(buf, "hello from parent");
  // The stack grows down on common architectures, so pass its top end.
  if (clone(child_func, stack + STACK_SIZE, flags | SIGCHLD, buf) == -1) {
    perror("clone");
    exit(1);
  }

  int status;
  if (wait(&status) == -1) {
    perror("wait");
    exit(1);
  }

  printf("Child exited with status %d. buf = \"%s\"\n", status, buf);
  free(stack);
  return 0;
}
```

Some things to note when clone is invoked:

- It takes a function pointer to the code the new task will run, similarly to threading APIs, and unlike the fork API. This is the glibc wrapper for clone; there's also a raw system call, which is discussed below.
- The stack for the new task has to be allocated by the parent and passed into clone.
- The SIGCHLD flag (the low byte of clone's flags holds the signal to deliver when the child terminates) tells the kernel to send SIGCHLD to the parent when the child terminates, which lets the parent use the plain wait call to wait for the child to exit. This is the only flag the sample passes into clone by default.

This code sample passes a buffer into the child, and the child writes a string into it. When called without the vm command-line argument, the CLONE_VM flag is off, and the parent's virtual memory is copied into the child. The child sees the message the parent placed in buf, but whatever it writes into buf goes into its own copy and the parent can't see it. Here's the output:

```
$ ./clone-vm-sample
Child sees buf = "hello from parent"
Child exited with status 0. buf = "hello from parent"
```

But when the vm argument is passed, CLONE_VM is set and the child task shares the parent's memory. Its write into buf is now observable from the parent:

```
$ ./clone-vm-sample vm
Child sees buf = "hello from parent"
Child exited with status 0. buf = "hello from child"
```

A bunch of other CLONE_* flags can specify other things that will be shared with the parent: CLONE_FILES will share the open file descriptors, CLONE_SIGHAND will share the signal dispositions, and so on.

Other flags are there to implement the semantics required by POSIX threads. For example, CLONE_THREAD asks the kernel to assign the same thread group ID to the child as to the parent, in order to comply with POSIX's requirement that all threads in a process share a single process ID.

Calling clone in process and thread creation

Let's dig through some code in glibc to see how clone is invoked, starting with fork, which is routed to __libc_fork in sysdeps/nptl/fork.c. The actual implementation is specific to the threading library, hence the location in the nptl folder. The first thing __libc_fork does is invoke the fork handlers potentially registered beforehand with pthread_atfork. The actual cloning happens with:

```c
pid = ARCH_FORK ();
```

where ARCH_FORK is a macro defined per architecture (exact syscall ABIs are architecture-specific). For x86_64 it maps to:

```c
#define ARCH_FORK() \
  INLINE_SYSCALL (clone, 4, \
                  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0, \
                  NULL, &THREAD_SELF->tid)
```

The CLONE_CHILD_* flags are useful for some threading libraries (though not for the default one on Linux today, NPTL). Otherwise, the invocation is very similar to the clone code sample shown in the previous section. You may wonder where the function pointer is in this call. Nice catch! This is the raw system call version of clone, where execution continues from the point of the call in both parent and child, close to the usual semantics of fork.

Now let's turn to pthread_create. Through a dizzying chain of macros it reaches a function named create_thread (defined in sysdeps/unix/sysv/linux/createthread.c) that calls clone with:

```c
const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
                         | CLONE_SIGHAND | CLONE_THREAD
                         | CLONE_SETTLS | CLONE_PARENT_SETTID
                         | CLONE_CHILD_CLEARTID
                         | 0);

ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
            clone_flags, pd, &pd->tid, tp, &pd->tid)
```

Browse through man 2 clone to understand the flags passed into the call. Briefly, it is asked to share the virtual memory, file system information (root, current directory, umask), open files, System V semaphore undo values and signal handlers with the parent thread/process. Additional flags are passed to implement proper identification: all threads launched from a single process have to share its process ID to be POSIX compliant.
Reading the glibc source code is quite an exercise in mental resilience, but it's really interesting to see how everything fits together "in the real world".