Almost every year, the number of CPU cores in our devices grows to improve overall performance and user experience. An eight-core phone is no big deal nowadays. However, there is another kind of programmable unit that most programmers ignore. It has a multitude of computing cores, hundreds of them: the GPU, or graphics processing unit, which is responsible for drawing the user interface and rendering 3D scenes in games. From the very beginning, it was a highly specialized device designed just for transforming and rendering the data it was given, and the data flowed in only one direction: from CPU to GPU.

However, since the arrival of Nvidia CUDA (Compute Unified Device Architecture) in 2007 and OpenCL (Open Computing Language) in 2009, graphics processing units have become accessible for general-purpose, bidirectional computations (called General-Purpose GPU Programming or simply GPGPU).

From my perspective as a .NET developer, it would be a great opportunity to have access to the huge computational power of hundreds of GPU cores, so I tried to figure out what the current state of the art is in my domain.

First of all, what are CUDA and OpenCL, and what are the differences between them?

Image 1. CUDA vs OpenCL

In general, they are APIs that allow a programmer to perform a specific set of computations on a GPU (or even on more exotic devices such as FPGAs). This means that instead of rendering the result on the display, the GPU returns it to the API caller.

There are considerable differences between the two technologies.

Firstly, CUDA is a proprietary framework developed and supported only by Nvidia, while OpenCL is an open standard rather than a complete solution or concrete implementation. Therefore, CUDA is available only on Nvidia devices, while any manufacturer may support OpenCL (by the way, Nvidia chips support it as well).

Secondly, CUDA is a GPU-specific technology (at least for now), while the OpenCL interface may be implemented by various kinds of devices (CPUs, GPUs, FPGAs, DSPs, etc.).

These differences have obvious consequences:

CUDA is typically a little more performant than OpenCL on Nvidia chips;

With a single manufacturer (Nvidia) behind CUDA, you can rely on its documentation being consistent with its implementation, which is not always the case with OpenCL;

OpenCL is the only way to go if you have to support hardware other than Nvidia chips.

How it works

Image 2. CUDA processing flow

Let us describe how GPGPU works using the scheme shown in Image 2:

Form the data to be processed in RAM;

Copy the data to be processed into video RAM;

Instruct the GPU to process the data;

Execute in parallel on each core;

Copy the result back to RAM.

It should be noted that general-purpose GPU computations of this kind are restricted in reasonable ways:

they cannot perform any IO;

they cannot directly reference data in computer memory.
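The flow and the two restrictions above can be illustrated with a minimal C# sketch. Note that this is a CPU-only simulation, not real GPU code: it uses no GPU library, the "video RAM" is just a separate array, and Parallel.For stands in for the GPU cores. All names (GpuFlowSketch, InvertKernel, Run) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;

// CPU-only simulation of the GPGPU processing flow (assumption: no real GPU is involved).
public static class GpuFlowSketch
{
    // The "kernel": executed once per element index.
    // It performs no IO and touches only the "device" buffers passed to it,
    // never the caller's host array.
    private static void InvertKernel(int index, byte[] deviceInput, byte[] deviceOutput)
    {
        deviceOutput[index] = (byte)(255 - deviceInput[index]);
    }

    public static byte[] Run(byte[] hostData)
    {
        // Copy the data from RAM into the simulated video RAM.
        var deviceInput = new byte[hostData.Length];
        Array.Copy(hostData, deviceInput, hostData.Length);
        var deviceOutput = new byte[hostData.Length];

        // Instruct the "GPU" to process the data: one kernel invocation per element,
        // executed in parallel (here on CPU threads instead of GPU cores).
        Parallel.For(0, deviceInput.Length, i => InvertKernel(i, deviceInput, deviceOutput));

        // Copy the result back to RAM.
        var hostResult = new byte[deviceOutput.Length];
        Array.Copy(deviceOutput, hostResult, deviceOutput.Length);
        return hostResult;
    }

    public static void Main()
    {
        // Form the data to be processed in RAM.
        byte[] pixels = { 0, 64, 128, 255 };

        byte[] result = Run(pixels);
        Console.WriteLine(string.Join(", ", result)); // prints: 255, 191, 127, 0
    }
}
```

The two explicit copies are the point of the sketch: on a real device they cross the bus between RAM and video RAM, so their cost has to be weighed against the time saved by the parallel kernel.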

Even though it looks simple in the general scheme, the computation model and the API are not that intuitive, especially considering that the native API itself is available only in C and C++.

Methinks this is the main reason why GPGPU programming is not really widespread yet.

GPGPU on the .NET platform

There is no native support for GPU programming on the .NET platform yet, so we will have to rely on third-party solutions. Moreover, there are not that many options to choose from, so let us briefly review the available alternatives among actively developed projects. Interestingly, most of them focus on Nvidia CUDA rather than OpenCL.

Alea GPU by QuantAlea

Alea GPU is a proprietary CUDA-based library with free and commercial editions. Even the free community edition allows you to produce commercial GPU-ready software for consumer-level graphics cards (the Nvidia GeForce series).

The documentation is really nice, with samples provided in both C# and F# and helpful supplemental images. I would say that Alea GPU is the most mature, well-documented and easy-to-use solution at the moment.

It is also cross-platform and compatible with .NET Framework and Mono.

Hybridizer by Altimesh

Hybridizer is another commercial CUDA-based library, but I would not say it is comparable with Alea GPU in terms of usability. Firstly, it is free for educational purposes only (and requires a license even then). Secondly, configuring an application is really awkward, since it requires an additional C++ project containing the generated code, which can be compiled only by Visual Studio 2015.

ILGPU by Marcel Köster

ILGPU is an open-source CUDA-based library with nice documentation and samples. It is not as abstract and easy to use as Alea GPU, but it is nevertheless an impressive and solid product, even though it is developed by a single person.

It is compatible with both .NET Framework and .NET Core.

Campy by Ken Domino

Campy is another example of an interesting open-source library being developed by a single programmer. It is still in early beta, but promises to have a really high-level API. It is built upon .NET Core.

I tried to use each of the mentioned solutions, but Hybridizer proved too awkward to configure, while Campy simply did not work on my hardware. Therefore, we will continue our evaluation with the Alea GPU and ILGPU libraries.

Evaluation

To have a taste of GPGPU programming on .NET, we will implement a simple app that will transform a set of images applying a simple filter to them.

There are going to be three implementations for comparison:

Using the standard Task Parallel Library of .NET Framework;

Using Alea GPU;

Using ILGPU.

Since both libraries use CUDA, we will have to have an Nvidia device. Fortunately, I’ve got one.

In general, my workstation is a mid-range PC with the following specs:

CPU: Intel Core i5-4460 (4 cores, no Hyper-Threading, 3.20 GHz base clock speed);

GPU: Nvidia GeForce GTX 1050 Ti (768 CUDA cores, 4 GB GDDR5 VRAM, 1290 MHz base clock speed);

RAM: 32 GB DDR3;

Storage: Samsung SSD 850 EVO 250 GB (which is not really necessary);

OS: Windows 10 Pro.

Before we proceed, we will have to install the CUDA Toolkit (required by ILGPU, but not by Alea GPU) from the official website: https://developer.nvidia.com/cuda-downloads

Both libraries are cross-platform, but since Alea GPU is not yet adapted for .NET Core, we will create a Windows-based console app using the latest .NET Framework version installed on my workstation (which is 4.7.1).

I will use the following NuGet packages:

Install-Package Alea -Version 3.0.4

Install-Package FSharp.Core -Version 4.5.0

Install-Package ILGPU -Version 0.3.0

Install-Package SixLabors.ImageSharp -Version 1.0.0-beta0004

FSharp.Core is required by Alea GPU, because it is built upon it.

ImageSharp is a nice cross-platform image processing library, which will simplify the process of reading and saving the image data for us.

General program flow

Our program is going to be quite straightforward, consisting of the following steps: