During the walk-through in Part 1 - Hello CUDA, you might have been puzzled by my constant calls to the current-context! function. Shouldn't one call be enough? Or, even cleaner and more functional, wouldn't it be better to just supply the context as an argument to context-aware functions? Yes, it would be much better, especially in more complex systems. However, early design decisions in popular technologies stay with us as legacy, and we do not have a choice.

The designers of CUDA tried to create an API that appeals to what people are used to. CPU programs rarely need arguments such as context, device, etc. As you have seen in the Hello CUDA article, a function call that programmers like is (launch! fun params), not (launch! ctx device stream something-else fun params). Most C++ programs are not as heavily multithreaded as Clojure programs are, so the hidden state rarely bites there, while managing extra arguments everywhere is an immediately obvious annoyance. However, in more dynamic threading setups, the side effects of that hidden, thread-bound state require constant nannying.

To a functional programmer, that mistake seems obvious. An additional parameter or two is much easier to deal with, especially with Clojure's macros, than having to constantly worry about switching the current context to the right one for the current thread. In the C++ world, it is still not so obvious! OpenCL takes the right approach by supplying the context as an argument to functions that need it, but that makes the API a bit more verbose than CUDA's, and it is one of the main sources of complaints about OpenCL!

I guess you've got the point by now, so I'll stop complaining about CUDA's sinful ways, and get on to how to handle contexts.

Here is how CUDA context handling was envisioned. The program has one or more threads and handles one or more contexts, while functions that need to be executed in a context do not receive it as an argument. Instead, each function gets the current context from the thread-local storage that the CUDA driver maintains for each thread.

This leads to the following situations:

One thread manages one context: this is the case in most example programs; CUDA seems easy and simple.

One thread manages many contexts: the programmer needs to keep references to the contexts and switch the current context whenever they want to change the GPU that will execute the code that follows.

Many threads manage one context: the programmer needs to set the current context in each thread before calling CUDA functions.

Many threads manage many contexts: there will be a bunch of references and a bunch of calls that set the current context.

Any of the previous cases, with the catch that the code does not control the thread it is executing in! Example: core.async!
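To make the thread-local part concrete, here is a minimal pure-Clojure model of it. It makes no CUDA calls at all: current-slot, set-current-ctx!, current-ctx, ctx-seen-by-new-thread and the keyword :gpu-0 are hypothetical stand-ins that I made up for illustration, not part of ClojureCUDA. It is only a sketch of why a context set in one thread is not visible in another.

```clojure
;; A toy model of the driver's per-thread "current context" slot.
;; Nothing here touches CUDA; all names are hypothetical stand-ins.
(def ^ThreadLocal current-slot (ThreadLocal.))

(defn set-current-ctx!
  "Models current-context!: overwrites this thread's current context."
  [ctx]
  (.set current-slot ctx)
  ctx)

(defn current-ctx
  "Models current-context: reads this thread's current context, or nil."
  []
  (.get current-slot))

(defn ctx-seen-by-new-thread
  "Starts a fresh thread and returns the context that thread sees as current."
  []
  (let [result (promise)
        t (Thread. ^Runnable (fn [] (deliver result (current-ctx))))]
    (.start t)
    (.join t)
    @result))

(set-current-ctx! :gpu-0)
(current-ctx)              ; the calling thread sees :gpu-0
(ctx-seen-by-new-thread)   ; a fresh thread sees nil until it sets its own
```

In this model, the last case from the list above corresponds to code that never knows which thread's slot it is going to read.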

Since calling current-context! replaces whatever context was previously current, it may not be enough for cases beyond simple programs. The CUDA driver maintains a context stack for each thread. The functions push-context! and pop-context! put a context on top of the stack, making it current, and, after the work in that context is done, remove it from the top, reverting the current context to the one that was previously on top. While current-context! is completely destructive, push/pop offer a somewhat more graceful mechanism. However, this mechanism still relies on side effects, and great care should be taken in multithreaded setups that use thread-pooled executors.
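One way to tame the push/pop pair is to wrap it in a macro that always pops, even when the body throws. The sketch below shows the idea on a toy, pure-Clojure model of the per-thread stack, so it runs without a GPU. The names ctx-stack, model-push!, model-pop!, model-current and with-pushed-ctx are hypothetical; with the real library you would presumably put push-context! and pop-context! in their place.

```clojure
;; A toy model of the per-thread context stack; no CUDA calls are made.
;; All names here are hypothetical stand-ins for illustration.
(def ^ThreadLocal ctx-stack (ThreadLocal.))

(defn model-push!
  "Models push-context!: puts ctx on top of this thread's stack."
  [ctx]
  (.set ctx-stack (cons ctx (.get ctx-stack)))
  ctx)

(defn model-pop!
  "Models pop-context!: removes and returns the top of this thread's stack."
  []
  (let [stack (.get ctx-stack)]
    (.set ctx-stack (next stack))
    (first stack)))

(defn model-current
  "Models current-context: the top of this thread's stack, or nil."
  []
  (first (.get ctx-stack)))

(defmacro with-pushed-ctx
  "Pushes ctx, evaluates body, and always pops, even if body throws."
  [ctx & body]
  `(do
     (model-push! ~ctx)
     (try
       ~@body
       (finally (model-pop!)))))

(model-push! :ctx-a)
(with-pushed-ctx :ctx-b
  (model-current))   ; :ctx-b is current inside the body
(model-current)      ; :ctx-a is current again afterwards
```

The try/finally only protects the thread that runs the body. If the body hands work over to another thread, that thread still has its own stack, so the care with thread pools is still needed.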

A small demonstration:

(do (def new-ctx (context (device 0))) (current-context! ctx) (= ctx (current-context)))

true

(do (push-context! new-ctx) (= new-ctx (current-context)))

true

(do (pop-context!) (= ctx (current-context)))

true

CUDA functions that work inside a context always use the context at the top of the calling thread's context stack.