Getting Started with OpenCL in Haskell

Introduction

OpenCL provides a way to interface with GPUs, CPUs, and other hardware to perform portable parallel processing. Over the last few weeks, I’ve found that Haskell has good support for OpenCL. For a beginner like myself, there are really three different things to learn about:

OpenCL execution model

OpenCL C dialect

Haskell bindings

If you’re brand new to OpenCL, I highly recommend this very short series on YouTube by Justin Hensley of AMD:

It has an early-90s Microsoft-instructional-video vibe, but aside from that, it’s really a great overview.

Which OpenCL package?

There are several OpenCL packages on Hackage, and it’s not particularly clear which you should use. I’ll try to summarise them here:

OpenCL - this is probably the package you want. It’s a fork of OpenCLRaw that provides a higher-level interface. This is the package I’ll use in my examples. (Thanks to Anthony Cowley for suggesting this package.)

OpenCLRaw - the original thin binding to the OpenCL C library. It exposes much of the API using types from Foreign.C.Types and, as a result, it’s not very convenient. The original homepage link from Hackage is also dead.

hopencl - a binding originally written by Benedict Gaster of AMD. I tried this package first, but creating OpenCL Contexts didn’t work for me.

OpenCLWrappers - yet another fork of OpenCLRaw. It doesn’t seem to offer much beyond what the OpenCL package does. I haven’t tried it, however.

language-c-quote - not an OpenCL binding per se, but a way to quasi-quote OpenCL C code. More on this later.

For the sake of completeness, these are the packages I’ll use in this example:

OpenCL - the OpenCL bindings

CLUtil - utilities built on OpenCL

language-c-quote - OpenCL C quasiquoting

mainland-pretty - pretty-printing quasiquoted OpenCL C

vector - indexed arrays for storing data

What about OpenCL versions?

It doesn’t seem to be necessary to match the OpenCL version on the machine you’re using with the OpenCL version targeted by a Haskell package. For instance, I’m writing this on a MacBook Pro with OpenCL 1.2, but I haven’t had any problems (yet) running the OpenCL package, which targets OpenCL 1.0.

The OpenCL C language is intended to be backwards compatible (this is mentioned in the spec), but I’m not sure how far this extends to the runtime. I’ll report more on this in the future if I discover any important caveats.

OpenCL Native Libraries

macOS has had its own OpenCL implementation since Snow Leopard (10.6), and it works with the OpenCL Haskell package. If you’re using a different platform, I can’t provide any guidance, except to say that you’ll probably need to install something that provides an Installable Client Driver (ICD) for OpenCL.

Hello-World Example

The example I’ll cover in this post is taken from my haskell-opencl-examples project on GitHub. Specifically, this is example 01-hello-world/Main.hs.

Imports and Pragmas

I use the QuasiQuotes language extension to quasiquote an OpenCL C kernel.

This is the full list of imports. Everything is imported explicitly or in qualified form, except for the Control.Parallel.OpenCL module.

Platforms, Devices and Contexts

The platform model of OpenCL centers around the notions of Platform, Device and Context:

Platform - a set of OpenCL Devices available to a host; allows creation of Contexts.

Device - something like a GPU or CPU.

Context - a group of Devices for computation.

Most interactions with OpenCL occur in IO. The code examples below indicate when they occur inside a do block of IO.

Important functions for enumerating the platforms and devices are the following:

clGetPlatformIDs - lists OpenCL platforms

clGetDeviceIDs - lists devices of a given type for a platform

clGetPlatformInfo - information about a platform

clGetDeviceXXX - various information about a device

We can get an overview of the OpenCL environment of a machine with a function like describePlatforms below.
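One way to write such a function, as a minimal sketch. It assumes the OpenCL package's Control.Parallel.OpenCL module, with clGetPlatformInfo queried for CL_PLATFORM_NAME and clGetDeviceName standing in for the clGetDeviceXXX family; the output format is my own choice.

```haskell
import Control.Monad (forM_)
import Control.Parallel.OpenCL

-- | Print every OpenCL platform and the devices it exposes.
-- Assumption: the platform and device names are the only
-- properties of interest here.
describePlatforms :: IO ()
describePlatforms = do
  platforms <- clGetPlatformIDs
  forM_ platforms $ \platform -> do
    platformName <- clGetPlatformInfo platform CL_PLATFORM_NAME
    putStrLn ("Platform: " ++ platformName)
    -- CL_DEVICE_TYPE_ALL asks for CPUs, GPUs and anything else.
    devices <- clGetDeviceIDs platform CL_DEVICE_TYPE_ALL
    forM_ devices $ \device -> do
      deviceName <- clGetDeviceName device
      putStrLn ("Device: " ++ deviceName)

main :: IO ()
main = describePlatforms
```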

On my machine, this produces the following output:

Platform: Apple
Device: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Device: Intel(R) HD Graphics 530
Device: AMD Radeon Pro 460 Compute Engine

Context and Queue Creation

In order to perform a computation, we need to create a context in which the computation will run. The context groups devices and allows creation of things like memory buffers, queues and compiled kernels. Contexts are created using either clCreateContext or clCreateContextFromType. In the example below, we’ll create a context for a CPU device using clCreateContextFromType.

OpenCL uses an asynchronous processing model. Commands are sent to a queue, and are then executed in a way that is determined by the OpenCL implementation. These enqueued operations can copy memory, execute kernels, and so on. Dependencies between enqueued actions are expressed by passing handles to them (events) at various points in the API. To create a queue for commands, we can use clCreateCommandQueue.

If any errors occur, they will be thrown as exceptions in IO. The last parameter to clCreateContextFromType is a function of type String -> IO (), which is also used to report errors.
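As a rough sketch of these two steps, the program below creates a CPU context and a command queue for its first device. It assumes an empty property list is acceptable to the platform (some ICD setups may want a platform property instead), and it uses clGetContextDevices to find the device; putStrLn serves as the String -> IO () error callback.

```haskell
import Control.Parallel.OpenCL

main :: IO ()
main = do
  -- Ask OpenCL for a context containing the CPU device(s).
  -- The callback receives error messages as plain strings.
  context <- clCreateContextFromType [] [CL_DEVICE_TYPE_CPU] putStrLn
  devices <- clGetContextDevices context
  case devices of
    [] -> putStrLn "The context contains no devices."
    (device : _) -> do
      -- No special queue properties (in-order, no profiling).
      _queue <- clCreateCommandQueue context device []
      name <- clGetDeviceName device
      putStrLn ("Created a command queue for: " ++ name)
```

Pattern matching on the device list avoids a partial head if the machine exposes no CPU device.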

Kernel

The kernel is the OpenCL code that we’re going to execute. This example uses the language-c-quote package so that the kernel source can be quasi-quoted.

In this example, the kernel source is supplied to OpenCL as a String. The kernel has to be compiled by OpenCL at runtime into a form that can be executed on the hardware we’ve chosen.
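A sketch of what this can look like follows. The kernel name doubleArray and the helper buildKernel are hypothetical names of mine. I'm also assuming the cfun quasiquoter from Language.C.Quote.OpenCL, and that ppr lives in Text.PrettyPrint.Mainland.Class (older mainland-pretty releases export it from Text.PrettyPrint.Mainland instead).

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Control.Parallel.OpenCL
import Language.C.Quote.OpenCL (cfun)
import Text.PrettyPrint.Mainland (prettyCompact)
import Text.PrettyPrint.Mainland.Class (ppr)

-- | OpenCL C source for a kernel that doubles each element.
-- The quasiquoter parses the code at compile time; pretty-printing
-- turns the syntax tree back into a String for OpenCL.
kernelSource :: String
kernelSource = prettyCompact . ppr $ [cfun|
    __kernel void doubleArray(__global float *in, __global float *out)
    {
        int i = get_global_id(0);
        out[i] = 2 * in[i];
    }
  |]

-- | Compile the source at runtime for one device and look up the
-- kernel by name (hypothetical helper).
buildKernel :: CLContext -> CLDeviceID -> IO CLKernel
buildKernel context device = do
  program <- clCreateProgramWithSource context kernelSource
  clBuildProgram program [device] ""  -- no extra compiler options
  clCreateKernel program "doubleArray"

-- Printing the generated source needs no OpenCL device at all.
main :: IO ()
main = putStrLn kernelSource
```

A benefit of quasi-quoting over a plain string literal is that syntax errors in the kernel are caught when the Haskell module is compiled rather than when OpenCL builds the program.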

Buffers

Data is sent from Haskell to OpenCL using a buffer. In this example, we’re going to use Storable Vectors (from the vector package) to hold data on the Haskell side. These are a good choice because they store data contiguously under the hood.

In order to use Vectors easily with OpenCL, we’ll make use of the CLUtil library by Anthony Cowley et al.
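To show what happens underneath, here is a sketch that creates the buffers with the raw OpenCL package API instead of CLUtil's helpers. The module name BufferSketch and the function makeBuffers are hypothetical. The input buffer is filled at creation time via CL_MEM_COPY_HOST_PTR, which is an assumption about a reasonable approach and not necessarily what CLUtil does internally.

```haskell
module BufferSketch (makeBuffers) where

import Control.Parallel.OpenCL
import qualified Data.Vector.Storable as V
import Foreign (castPtr, nullPtr, sizeOf)
import Foreign.C.Types (CFloat)

-- | Create an input buffer initialised from a storable vector and an
-- uninitialised output buffer of the same size (hypothetical helper).
makeBuffers :: CLContext -> V.Vector CFloat -> IO (CLMem, CLMem)
makeBuffers context input = do
  let nBytes = V.length input * sizeOf (undefined :: CFloat)
  -- unsafeWith exposes the vector's contiguous storage as a pointer.
  -- CL_MEM_COPY_HOST_PTR copies the data during buffer creation, so
  -- the pointer only has to stay valid for this call.
  bufIn <- V.unsafeWith input $ \ptr ->
    clCreateBuffer context
      [CL_MEM_READ_ONLY, CL_MEM_COPY_HOST_PTR]
      (nBytes, castPtr ptr)
  -- The output buffer has no host data yet, hence the null pointer.
  bufOut <- clCreateBuffer context [CL_MEM_WRITE_ONLY] (nBytes, nullPtr)
  pure (bufIn, bufOut)
```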

Run the kernel

To run the compiled kernel, we hook up the buffers to the kernel arguments and then enqueue the kernel to be run.
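In code, that step might look like the sketch below. The module name RunKernelSketch and the function runKernel are hypothetical, and I'm assuming a kernel whose first argument is the input buffer and whose second is the output buffer, with one work-item per element.

```haskell
module RunKernelSketch (runKernel) where

import Control.Parallel.OpenCL

-- | Bind the two buffers to kernel arguments 0 and 1, then enqueue
-- the kernel over a one-dimensional range (hypothetical helper).
runKernel
  :: CLCommandQueue
  -> CLKernel
  -> CLMem   -- ^ input buffer
  -> CLMem   -- ^ output buffer
  -> Int     -- ^ number of elements, one work-item each
  -> IO CLEvent
runKernel queue kernel bufIn bufOut nElem = do
  clSetKernelArgSto kernel 0 bufIn
  clSetKernelArgSto kernel 1 bufOut
  -- Global work size [nElem]; an empty local size lets the
  -- implementation choose. No events to wait on before starting.
  clEnqueueNDRangeKernel queue kernel [nElem] [] []
```

The returned event is what later operations can pass along to say that they depend on this kernel run.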

Copy output data

At this point, the kernel has not necessarily even started running. OpenCL has an asynchronous processing model where many execution details are left to the implementation. Consequently, it’s important to wait for computations to finish. In this case, we’ll use bufferToVector, from CLUtil, which internally waits for the kernel execution to complete. It knows which operations to wait for because execEvent is passed in as a parameter. When it has finished, bufferToVector returns a new vector created from the bufOut output buffer.
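For comparison, here is a sketch of the same read-back written against the raw OpenCL package API. This is not CLUtil's implementation, just an illustration of the idea: a blocking clEnqueueReadBuffer that lists the kernel's event as a dependency. The module name ReadBackSketch and the function readBack are hypothetical.

```haskell
module ReadBackSketch (readBack) where

import Control.Parallel.OpenCL
import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as VM
import Foreign (castPtr, sizeOf)
import Foreign.C.Types (CFloat)

-- | Copy n floats out of an OpenCL buffer into a fresh vector,
-- waiting for the given events first (hypothetical helper).
readBack :: CLCommandQueue -> CLMem -> Int -> [CLEvent] -> IO (V.Vector CFloat)
readBack queue buf n deps = do
  mv <- VM.new n
  let nBytes = n * sizeOf (undefined :: CFloat)
  -- The True flag makes the read blocking: the call returns only
  -- after the dependencies have finished and the data is copied.
  _ <- VM.unsafeWith mv $ \ptr ->
    clEnqueueReadBuffer queue buf True 0 nBytes (castPtr ptr) deps
  -- Safe to freeze without copying: mv is not used again.
  V.unsafeFreeze mv
```

Called as readBack queue bufOut nElem [execEvent], this would play the role that bufferToVector plays in the text.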

And that’s it! We can print the outputData vector to confirm that the operation worked as expected and multiplied all the elements by 2.

Recap

This example covered: