Our business is largely around making software faster. For that we use OpenCL, but do you know what this programming language is? Why can’t this speeding-up be done using other languages like Java, C#, C++ or Python?

OpenCL the answer to high-level languages, where we were promised superfast software that was very quick to write. After 20 years this was still a promise, as compilers had to guess too much what was intended. OpenCL gives the programmer more control in the places where more control is needed to get high-performing code and leave less guesses for the compiler.

It’s C with some extra power

It’s like normal C with three extra concepts, all with the aim to make the software run faster.

Explicit Data Transfer

In other introductions to OpenCL the data-transfers are mentioned as one of the last parts, but I find this the most important one. Reason: in most cases this is the main bottleneck in performance-targeted code.

When moving your stuff to another house, you pack all in boxes first before loading the truck. Or would you load each item into the truck one-by-one? Transport-costs would be much higher that way.

While it would be great that the fastest data-transfers should be done automatically, it simply doesn’t work like that. This means that designing the data-transfers is an important task when making fast software. OpenCL lets you do this.

Multiple cores

Most people have heard of “cores”, as made famous by Intel. Each core can do a part of a computation and effectively reduce runtime. OpenCL implements this by isolating the code that runs on each core – what goes in and out the protected code is done explicitly. This way the code is really easy to scale up to thousands of cores.

Would you choose the best-in-class to write the multiplication tables from 1 to 20, or have each student write one of them? Even though the slowest student will limit the rest, the total time is still lower.

Where a normal processor has 1, 2, 4 or 8 cores, a graphics processor has hundreds or even thousands of cores. OpenCL-software works on both.

Vectors

Modern processors can do computations on more than one data-item at the same time. They can be described as sub-cores. This means that each core has parallelism on its own.

When reading, do you read one word at once or character by character? Your brains can parse multiple characters at the same time.

OpenCL has support for “vectors” ( bundles of alike data) to be able to program these sub-cores.

It runs on many types of devices

OpenCL is famous for being the standard programming model for a lot of modern processors. There is no other programming language that can do the same. Support is available on:

CPUs; standard processors by Intel, AMD and ARM

GPUs; graphics cards by Intel, AMD and NVIDIA

FPGAs; processors that are programmed on the hardware-level, by Altera and Xilinx.

DSPs; digital signal processors by TI

Mobile graphics processors by ARM, Imagination, Qualcomm, etc.

See the rest of the list here.

This means that code can be ported to new devices in days or weeks instead of having to rewrite everything from scratch.

How does translating to OpenCL work?

When software needs to be faster, the first step is to find out its bottlenecks – these “hot spots” will be ported to OpenCL, while the rest remains the same. Then comes the hardest part: changing the algorithms such that data-transfers are more efficient and all cores are used. The last step is to look into low-level optimisations like the vectors.

Above is a very simplified representation of OpenCL. Still you’ve seen that the language is very unique and powerful. That will change, as its concepts are slowly getting embedded into existing languages – till then OpenCL is the only standard which fully enables all hardware features.