In 1998, Ask Ars was an early feature of the newly launched Ars Technica. Now, as then, it's all about your questions and our community's answers. Each week, we'll dig into our question bag, provide our own take, then tap the wisdom of our readers. To submit your own question, see our helpful tips page.

Question: What is a CPU thread (as in "multithreaded CPU," "simultaneous multithreading," etc.)?

Tech pundits, analysts, and reviewers often speak of "multithreaded" programs, or even "multithreaded processors," without ever defining what, exactly, a "thread" is. Truth be told, some of those using the term probably don't know precisely what it means, but the concept isn't hard to grasp. At least, it isn't hard when you look at it from the point of view of the CPU (the operating system definition of a "thread" is another matter).

From the CPU's perspective, a thread (short for "thread of execution") is merely an ordered sequence of instructions that tells the computer what to do. In most of my articles on Ars and in my book, I prefer to speak of "instruction streams" instead of "threads," because the thread is a more complicated and OS-centric concept. As far as most CPUs are concerned, they merely execute whatever instruction streams come into their front end, and they don't care if that instruction stream is from a process or a thread. There may be some special-purpose register values that differ between the two, but the basic functioning of the processor doesn't change.
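To make the idea concrete, here's a minimal sketch in Python: from the CPU's point of view, a thread is nothing more than an ordered list of instructions to be executed one after another. The toy interpreter and the instruction encoding below are invented purely for illustration.

```python
# A thread, as the CPU sees it: just an ordered sequence of instructions.
# Each instruction here is a tuple: (operation, destination, source a, source b).

def run_stream(stream, regs):
    """Execute an ordered instruction stream against a register file."""
    for op, dest, a, b in stream:
        if op == "add":
            regs[dest] = regs[a] + regs[b]
        elif op == "mul":
            regs[dest] = regs[a] * regs[b]
    return regs

# The CPU doesn't know or care whether this stream came from a process
# or a thread -- it simply executes whatever arrives at its front end.
stream = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
print(run_stream(stream, {"r0": 2, "r1": 3}))
```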

So when someone talks about a "multithreaded processor," they're talking about a processor that can execute multiple instruction streams simultaneously. There are two ways that a processor can perform such a feat: simultaneous multithreading, and using multiple cores. These two methods aren't mutually exclusive, and they're often used together.

Simultaneous multithreading (SMT) is a trick that lets the processor work on more than one thread at a time. The front end of the processor alternates among the different threads in a form of time-sharing, fetching batches of instructions from first one thread and then another. The actual execution core of most multithreaded processors typically doesn't know or care which instruction stream a particular instruction comes from—the parts of the machine that do track which instruction goes with which thread will handle the chore of retiring the right instructions with the right stream.
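The front end's time-sharing can be sketched as a round-robin fetch loop. The batch size and the strictly alternating policy below are invented for illustration; real SMT front ends use more sophisticated fetch policies.

```python
# A sketch of an SMT front end: each turn it fetches a batch of
# instructions from one thread, then the other, tagging every
# instruction with its thread ID so that retirement can later match
# results back to the right stream.

from itertools import cycle

def fetch_interleaved(streams, batch=2):
    """Round-robin fetch from several instruction streams into one queue."""
    cursors = {tid: 0 for tid in streams}
    fetched = []
    for tid in cycle(streams):
        if all(cursors[t] >= len(streams[t]) for t in streams):
            break  # every stream is exhausted
        chunk = streams[tid][cursors[tid]:cursors[tid] + batch]
        cursors[tid] += batch
        # Tag each instruction with its thread ID.
        fetched.extend((tid, instr) for instr in chunk)
    return fetched

streams = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"]}
print(fetch_interleaved(streams))
# → [(0, 'a0'), (0, 'a1'), (1, 'b0'), (1, 'b1'), (0, 'a2')]
```

Note that the shared queue interleaves both threads' instructions; downstream execution hardware can treat them uniformly because each carries its thread tag.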

The other way to make a multithreaded processor is to put more than one processor core on the same die. Each actively executing instruction stream is assigned to a single core, so a four-core processor can support four threads at once, or an eight-core processor can do eight threads at once, and so on.
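From software, you can see how many hardware threads the machine exposes and start one worker per hardware thread, as in this sketch. (One caveat for this particular language: CPython's global interpreter lock keeps CPU-bound threads from truly running in parallel, so a real parallel workload in Python would use `multiprocessing` instead; the structure is what matters here.)

```python
# One software thread per hardware thread: ask the OS how many logical
# processors exist, then start that many workers, each with its own
# instruction stream to chew through.

import os
import threading

def worker(results, i):
    results[i] = sum(range(1000))  # stand-in for a thread's work

n = os.cpu_count() or 1  # logical processors (cores x hardware threads)
results = [0] * n
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{n} hardware threads, all workers finished: {all(results)}")
```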

The life of a thread

An instruction stream enters the CPU by being fetched into the processor's front end. The first time a particular stream of instructions is fetched, such as when a new program is loaded, the instructions move from main memory into the processor's L1 cache. The front end then fetches instructions in batches from the L1 cache and decodes them into the processor's internal instruction format.

Once the instructions are decoded, they're ready to be dispatched to the chip's execution hardware, where the actual number-crunching happens. The execution units carry out the arithmetic and memory operations specified by the instructions, and write the results to the processor's registers.
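The path described above—fetch from the L1 cache, decode into the processor's internal format, execute, then write results to the registers—can be sketched as three stages. The two-operation instruction set and its text encoding are invented for illustration.

```python
# A toy fetch/decode/execute path, one stage per function.

def fetch(l1_cache, pc):
    return l1_cache[pc]                      # pull the next instruction

def decode(raw):
    op, dest, a, b = raw.split()             # translate into internal fields
    return op, dest, a, b

def execute(decoded, regs):
    op, dest, a, b = decoded
    if op == "add":
        regs[dest] = regs[a] + regs[b]       # the actual number-crunching
    elif op == "sub":
        regs[dest] = regs[a] - regs[b]

l1_cache = ["add r2 r0 r1", "sub r3 r2 r0"]  # instructions already in L1
regs = {"r0": 1, "r1": 4}
for pc in range(len(l1_cache)):
    execute(decode(fetch(l1_cache, pc)), regs)
print(regs)  # register file after both instructions complete
```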

In an out-of-order processor, where instructions are reordered to be executed in the fastest possible sequence, there's an additional step after execution. The instructions must be put back in program order, and their results committed in that order to the machine's visible state—the registers and, for stores, main memory.
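That extra step is handled by a structure commonly called a reorder buffer: results may arrive in any order, but they retire strictly from the head, in program order. Here's a minimal sketch; the completion flags are invented for illustration.

```python
# In-order retirement from a reorder buffer (ROB): an instruction can
# only retire if everything older than it has already completed.

def retire_in_order(rob):
    """rob: list of (sequence number, completed?) entries in program order.
    Retire from the head only while the head entry has finished executing."""
    retired = []
    for seq, done in rob:
        if not done:
            break          # head not finished: younger results must wait
        retired.append(seq)
    return retired

# Instruction 1 finished early, but 0 hasn't -- nothing retires yet.
print(retire_in_order([(0, False), (1, True), (2, True)]))   # → []
# Instructions 0 and 1 are done; 2 isn't, so retirement stops there.
print(retire_in_order([(0, True), (1, True), (2, False)]))   # → [0, 1]
```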

When a new thread is loaded into the processor, the original thread's state is saved out to main memory and all of the original thread's instructions are removed from the pipeline. The new thread then begins at the fetch stage, and is decoded, dispatched, and retired as described above.
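The switch described above can be sketched in miniature: save the outgoing thread's register state to memory, squash its in-flight instructions, and load the incoming thread's state. The data structures and field names here are invented for illustration.

```python
# A toy context switch between two threads on one core.

def context_switch(cpu, saved_states, old_tid, new_tid):
    saved_states[old_tid] = dict(cpu["regs"])  # save old thread's state to memory
    cpu["pipeline"].clear()                    # flush its in-flight instructions
    cpu["regs"] = dict(saved_states[new_tid])  # load the new thread's state
    return cpu

cpu = {"regs": {"r0": 7}, "pipeline": ["add r1 r0 r0"]}
saved_states = {2: {"r0": 99}}                 # thread 2's state, saved earlier
cpu = context_switch(cpu, saved_states, old_tid=1, new_tid=2)
print(cpu)  # thread 2's registers are in place; the pipeline is empty
```

From here the new thread proceeds through fetch, decode, dispatch, and retirement exactly as before.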