[pypy-dev] STM

Hi all,

(Andrew: I'll answer your mail too, but this is independent.)

Sorry if this mail is more general than just pypy-dev, but I'll post it here at least at first.

When thinking more about STM (and also after reading a paper that cfbolz pointed me to), I would like to argue that the currently accepted way to add STM to languages is slightly bogus. So far, the approach is to take an existing language, like Python or C, and add a keyword like "transaction" that delimits a nested scope. You end up with syntax like this (example in Python):

    def foo():
        before()
        with transaction:
            during1()
            during2()
        after()

In this example, "during1(); during2()" is the content of the transaction. But the issue with this approach is that there is no way to structure the transactions differently. What if I want a transaction that starts somewhere and ends at some unrelated place?

This is of course related to the way "we" (I, but others before me) described how transactions could be applied to the CPython interpreter. When you think about the C source of CPython, there is a precise point at which the previous transaction should stop and the next one immediately start (namely, between bytecodes). But there is no way to express this using the syntax above. What would be needed is something like (example in C):

    void mainloop() {
        while (1) {
            dispatch_next_bytecode();
            call_stuff_between_bytecodes();
        }
    }

    void call_stuff_between_bytecodes() {
        ...;
        /* here, we would like the previous transaction to stop
           and the next one to start */
        ...;
    }

The above issue occurs when trying to apply STM to CPython, but I believe it to be much more general. The solution proposed with the "transaction" keyword assumes a generally-non-STM program into which we carefully insert STM here and there. This is reasonable from the point of view of performance, because running an STM transaction is costly.
But if we ignore this point of performance (for example, by assuming we have a hybrid HTM-STM-based approach that was carefully tweaked for years, with the GC and the JIT aware of it), then this argument no longer applies. The solution that we really need for CPython requires a different point of view: a generally-STM program, in which we carefully add here and there a "yield", i.e. a point at which it's ok to end the current transaction and start the next one.

While this is obvious in the case of the CPython interpreter, I'd argue that it is a useful point of view in general. The paper mentioned above says that the current approach to multithreading is completely non-deterministic, with a large collection of tools to help us "patch" it here and there to make it more deterministic --- i.e. multithreaded programs behave randomly, and you need to carefully use locks, semaphores, and so on, in order to get back something that behaves as you want, hopefully in all cases. That approach is, according to the paper, the reason why programming with multiple threads is so hard (and I tend to agree). The paper argues in favour of a would-be system that would be the other way around: deterministic, with explicit non-determinism added.

So, taking this point of view and making it more concrete, I'd like to argue that what we really need is not a nested "transaction" keyword, in C or in Python or in any language. That would be just another way to add some amount of determinism to an inherently non-deterministic approach. Instead, we need by default all threads to always run in a coarse-grained transaction; and using a "yield" keyword or function call, we would, as needed, make the transactions more fine-grained. (I ignore here the issue of I/O.) In other words, this would be a step back to the old world of cooperative multithreading: each thread needs to call "yield" regularly, otherwise it never allows other threads to run.
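To make this "coarse-grained transactions plus explicit yield points" idea concrete, here is a minimal sketch. The names `transaction_yield` and `worker` are hypothetical illustrations, not an actual PyPy API; in a single-threaded debug setting the yield primitive can simply do nothing, since code running alone is trivially atomic:

```python
def transaction_yield():
    # Hypothetical primitive: end the current transaction here and
    # immediately start the next one.  In this single-threaded debug
    # sketch it is a no-op; under STM it would be the only point where
    # the effects of other threads can become visible.
    pass

def worker(shared):
    # Everything between two yield points is one transaction: other
    # threads can never observe the state in the middle of it.
    shared["a"] = 1
    transaction_yield()     # the only place a thread switch may be observed
    shared["b"] = shared["a"] + 1

state = {}
worker(state)
```

The point is that the default is atomicity everywhere, and non-determinism is added explicitly by inserting calls to `transaction_yield()`, rather than the other way around.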
The difference from the old world of cooperative multithreading is that nowadays we have STM --- "never allows other threads to run" has become a figure of speech, a way to reason about programs *as if* they were really switching between threads only at the points where we call "yield". What *really* occurs is that each thread runs a transaction between two yield points, so each thread can actually run concurrently with the others; the only issue is that if the threads do conflict, then performance degrades as some threads need to repeat their action.

This gives a model where we can write a multithreaded application by starting with almost no "yield" points, and then, *if performance requires it*, carefully add extra yield points to reduce conflicts. In other words, it is a model in which we start writing multithreaded applications using just plain old (rather deterministic) cooperative multithreading, and then, as needs dictate, we insert some extra points of non-determinism in order to boost performance.

Following that reasoning, I no longer want to go for a PyPy version exposing a "with transaction" context manager. Instead, it would rather be something like a function to call at any point to "yield" control --- or, in the idea that it's better to avoid having yields at random unexpected points, an alternative solution that looks more promising: to start a new thread, the programmer calls a function "start_transactional_thread(g)" where g is a generator; when g yields, it marks the end of a transaction and the start of the next one. (The exact yielded value should be None.) The difference from the previous solution is that you cannot yield from a random function; your function has to have some structure, namely be a generator, in order to use this yield.
Also, it turns out that this model is easy to write a non-STM implementation for, for debugging and testing:

    import random

    _running = []

    def start_transactional_thread(g):
        _running.append(g)

    def run():
        "Call this in the main thread to run and wait for all started sub-threads"
        while _running:
            i = random.randrange(len(_running))
            g = _running.pop(i)
            try:
                next(g)
                _running.append(g)
            except StopIteration:
                pass

(Interestingly, this run() function is similar to stackless.run(). The exact relationship between stackless and this model is yet to be investigated, but it seems that one point of view on what I'm proposing might be essentially "use stackless, remove some determinism from the already-unclear tasklet switching order, and add STM under the hood".)

A bientôt,

Armin.
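For concreteness, here is the debug scheduler again in runnable form together with a small usage example; the `worker` generator and the shared `log` are illustrative inventions, not from the original mail:

```python
import random

_running = []

def start_transactional_thread(g):
    # Register a generator as a "thread"; each segment between yields is
    # one transaction.
    _running.append(g)

def run():
    """Call this in the main thread to run and wait for all started
    sub-threads (non-STM debug version: transactions are just run one
    at a time, in a random order)."""
    while _running:
        i = random.randrange(len(_running))
        g = _running.pop(i)
        try:
            next(g)              # run one transaction, up to the next yield
            _running.append(g)   # not finished: reschedule it
        except StopIteration:
            pass                 # the "thread" has terminated

# Usage: two "threads" appending to a shared log, yielding None between
# transactions as required by the model.
log = []

def worker(name, steps):
    for n in range(steps):
        log.append((name, n))    # work done inside one transaction
        yield                    # transaction boundary

start_transactional_thread(worker("a", 3))
start_transactional_thread(worker("b", 3))
run()
```

Each worker's entries appear in its own program order, but the interleaving between the two workers varies from run to run, which is exactly the controlled non-determinism the model allows.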