Recently, I spent some time discussing traces with Marc Smith (we previously wrote two papers on the matter). Something that came up is the idea of representing a sequential composition of parallel processes as a parallel composition of sequential processes. Let me explain…

We, as programmers, often start off reasoning sequentially and then add on reasoning about parallel processes later — usually with much difficulty. Think of producing a control-flow graph for an imperative program, for example. If you have a sequential program, it is usually quite straightforward to form a flow graph for the program. When you try to add parallel processes, you need multiple flow graphs side by side. And they will interact, so really you need to link them together at the points of interaction. It quickly gets quite involved, and leaves you wishing that you didn’t have to involve concurrency.

What Marc and I found is the opposite: our reasoning was straightforward for parallel processes, but sequential composition was a headache. We were trying to represent the trace of a program as a tree, with each node representing either a parallel composition or a process without parallelism. In fact, due to the associativity and commutativity of parallel composition, this type of tree can always be flattened into an unstructured bag of parallel processes:

We then wanted to link processes in the tree that communicated with each other, to work out information about dependence and to look at neighbourhoods of connected processes. That was all fine, but gets quickly difficult when you need to consider sequential composition, specifically the sequential composition of parallel compositions.

Imagine that one of the tree nodes is first a program without parallel composition, and later becomes the parallel composition of two sub-processes, i.e. P = Q;(R || S). We don’t have a way to represent sequence in our neat parallel trees above, and this causes our representation to become awkward. We can’t have a simple ordered sequential list of trees, because if two processes have sequence information, they can proceed in either order, so really we would need a sequential lattice of trees, which is getting nasty.

At this point we wished we didn’t have to deal with sequential composition — at least, not sequential composition that can contain parallel composition. And via a small transformation, it turns out that we don’t. Consider the process: a; ((b; b’) || c); d, where everything in there is assumed to be a communication. We can refactor this into a set of parallel processes (P1 || P2 || P3):

P1 = a; par1start ; par1end ; d

P2 = par1start ; b; b’; par1end

P3 = par1start ; c ; par1end

The key insight is really this: if you are running multiple processes in parallel, it is the same if you start them at the outset of the program, then synchronise with them to start them, as if you set them going when you need them. So even though the process doing c won’t happen until a has, we don’t wait to set that process off partway through the program. We set that process (P3 above) off straight away, then send it a message (par1start) when we want to really set it going. The par1end message makes sure that P1 doesn’t do anything more until P2 and P3 have completed: all three processes synchronise together on the par1start and par1end messages.

With this transformation, we never have any parallel composition inside sequential composition in our traces. All we have is one giant parallel composition of every process in the system, where each process is a sequential composition of events. There is no more nesting than that; we have flattened the process network into an unstructured bag of lists of events. In fact, we can now represent the processes as a graph (with links formed by the communications: the dotted lines in the diagram above). The semantics of the process are unchanged, and all the information about the ordering dependencies is still there if you want it (embodied in the unique parstart/parend communications: the italicised events in the graph above). This form of representation actually makes it really nice to track dependencies and relations throughout the process network, by following the communication links.

Addendum

You could even go one step further, and transform the system into a collection of parallel processes, where each process had at most three events in sequence, where at most one of those would be an event from the original program, i.e. the above would become:

P1A = a ; seq1

P1B = seq1; par1start ; seq2

P1C = seq2; par1end; seq3

P1D = seq3; d

P2A = par1start; b; seq4

P2B = seq4; b’ ; par1end

P3 = par1start ; c ; par1end

I’m not sure how useful this is, though.