System configuration

I have an Intel Core i3 processor with an NVIDIA GeForce 710M and 4 GB of RAM, running Windows 10 64-bit. I also have Visual Studio 2012, in which I will configure the OpenCL SDK. The same guide can be used for AMD GPUs too, with some variations such as the location of the SDK folders.

1. Getting the required drivers and SDK

Two things are needed here. First, you need the OpenCL runtime for your graphics card, which you get simply by updating the NVIDIA graphics driver. Second, you need an OpenCL SDK for compiling OpenCL code. NVIDIA bundles it with its CUDA toolkit, so install the CUDA toolkit and you will get the OpenCL SDK too.

2. Setting up Visual Studio

I have Visual Studio 2012, so the configuration below is based on that version. Create a new Visual Studio C++ application (any template). Under src, create a new C file named main.c. Similarly, create a kernel file named kernel.cl. main.c will contain the host code, and kernel.cl will contain the kernel to be executed.

3. Project configurations

A 64-bit configuration is recommended for OpenCL. However, the newly created solution targets 32-bit only. To fix this, right-click on the project and choose Properties in the context menu. A Property Pages window will open.

Click on Configuration Manager on the right. In the Configuration Manager window, select <New...> from the Active solution platform dropdown menu. In the New Solution Platform window, choose x64 as the new platform and Win32 for the Copy settings from option. This makes the project target a 64-bit build.

Select x64 as new platform for 64-bit builds

For the OpenCL configuration, go to the C/C++ > General page. For Additional Include Directories, point to the include folder inside your CUDA toolkit installation folder.

Property Page

Other tutorials on the internet suggest that, instead of the full path, one can also provide the environment variable $(CUDA_INC_PATH) here (it is created automatically when the toolkit is installed). I found, however, that on doing so, Visual Studio's IntelliSense and auto-complete features fail to recognise OpenCL code and mark the whole codebase with errors (even though compilation succeeds). Hence, it is best to provide the direct path to the include folder, as done above.

Next, in Property Pages, go to Linker > Input and add OpenCL.lib to Additional Dependencies.

Add OpenCL.lib to Additional dependencies

Lastly, on the Linker > General page, add the environment variable $(CUDA_LIB_PATH) to Additional Library Directories. This variable contains the path to the directory containing OpenCL.lib, the import library that the linker needs (the variable is also created automatically when the CUDA toolkit is installed).

Setting path to OpenCL.lib

With this, the configuration is complete. Now, on to the code.

4. Adding code to the project

Since this is not an OpenCL programming tutorial, I suggest you copy the main.c code from here and the kernel.cl code from here. The original main.c loads its kernel from vector_add_kernel.cl, but our file is named kernel.cl, so change the filename in main.c.

On line 30, replace fp = fopen("vector_add_kernel.cl", "r"); with fp = fopen("kernel.cl", "r");
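If the filename is wrong, fopen returns NULL and the host program has no kernel source to build, so it helps to check for that case. The following is a minimal sketch of how a host program might load kernel.cl with that check in place. The helper name load_kernel_source is made up for illustration and is not part of the original main.c; the code is written to stay within C89 so that VS2012 accepts it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the original main.c): reads a whole
 * file into a NUL-terminated heap buffer. Returns NULL on failure.
 * The caller owns the returned buffer and must free() it. */
char *load_kernel_source(const char *path, size_t *out_size)
{
    FILE *fp;
    long length;
    size_t read_count;
    char *source;

    assert(path != NULL);

    /* Binary mode keeps the ftell() size and the fread() count consistent. */
    fp = fopen(path, "rb");
    if (fp == NULL) {
        fprintf(stderr, "Failed to open kernel file: %s\n", path);
        return NULL;
    }

    /* Find the file size by seeking to the end. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    length = ftell(fp);
    if (length < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    source = (char *)malloc((size_t)length + 1);
    if (source == NULL) {
        fclose(fp);
        return NULL;
    }

    read_count = fread(source, 1, (size_t)length, fp);
    fclose(fp);
    source[read_count] = '\0';

    if (out_size != NULL) {
        *out_size = read_count;
    }
    return source;
}
```

A call such as load_kernel_source("kernel.cl", &source_size) would then give a buffer and a length that can be handed to clCreateProgramWithSource. A NULL result means the file was not found in the program's working directory.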

5. Running the program

Press F5 to compile and run the program. If there are no compilation errors, Visual Studio will create a Debug build of the program and launch it.

A quick reminder: the C compiler in VS2012 only supports C89, so code that is not C89-compliant will fail to compile. VS2013 added support for many C99 features. Many examples on the internet fail to compile in VS2012 for this reason. I selected this example because it is C89-compliant.
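To illustrate the kind of difference that matters, here is a small sketch written so that a C89-only compiler accepts it. The function sum_c89 is made up for illustration and does not come from the tutorial's main.c. The points to notice are that all declarations sit at the top of the block and that the loop counter is declared outside the for statement.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative function, not taken from the tutorial's main.c. */
int sum_c89(const int *values, size_t count)
{
    /* C89: every declaration comes before the first statement in a block. */
    size_t i;
    int total = 0;

    assert(values != NULL || count == 0);

    /* C89: the loop counter cannot be declared inside the for statement,
     * so "for (size_t i = 0; ...)" is rejected by a C89-only compiler. */
    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Examples that declare variables in the middle of a block, or inside the for statement, are the typical ones that fail to compile as C in VS2012.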

A command prompt window will open, print some messages and close almost immediately. If you want to prevent the command prompt window from closing, you can do the following.

Open the Property Pages of the project again. Go to the Linker > System page. Change the SubSystem property to Console (/SUBSYSTEM:CONSOLE).

Config to prevent command prompt from closing down automatically

Now launch the program again without debugging by pressing Ctrl + F5. The program will run in a command prompt that does not close automatically; it waits for a key press before exiting.

Command Prompt waits for key press before exiting

6. A few thoughts on the code

This code has been ported from here, with changes incorporated from the code in a Dr. Dobb's article. The original code was not detecting my NVIDIA platform, so I changed the way it fetches platform information. The original looked like this:

    // Get platform and device information
    cl_platform_id platform_id = NULL;
    cl_device_id device_id = NULL;
    cl_uint ret_num_devices;
    cl_uint ret_num_platforms;
    cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
    ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1,
                         &device_id, &ret_num_devices);

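This did not work: it failed to find the NVIDIA platform. Presumably this is because clGetPlatformIDs(1, ...) returns only the first installed platform, which on my machine is Intel's. Changing CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_DEFAULT or CL_DEVICE_TYPE_ALL made the program choose the Intel CPU, which got stuck while building the kernel, for reasons unknown.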

Hence, I changed the code to the following, which is a more general way of finding OpenCL-compatible platforms.

    // Get platform and device information
    cl_device_id device_id = NULL;
    cl_uint ret_num_devices;
    cl_uint ret_num_platforms;
    cl_int ret = clGetPlatformIDs(0, NULL, &ret_num_platforms);
    cl_platform_id *platforms = NULL;
    platforms = (cl_platform_id*)malloc(ret_num_platforms*sizeof(cl_platform_id));
    ret = clGetPlatformIDs(ret_num_platforms, platforms, NULL);
    printf("ret at %d is %d\n", __LINE__, ret);
    ret = clGetDeviceIDs(platforms[1], CL_DEVICE_TYPE_ALL, 1,
                         &device_id, &ret_num_devices);

The first call, clGetPlatformIDs(0, NULL, &ret_num_platforms);, returns the number of platforms. This number is used to allocate memory for cl_platform_id *platforms. A second call, clGetPlatformIDs(ret_num_platforms, platforms, NULL);, populates platforms with data.
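The printf in the code above prints the raw value of ret, which is hard to read. As a rough sketch, a small helper can translate a few of the status codes that clGetPlatformIDs and clGetDeviceIDs may return into their names. The function names cl_status_name and report_cl_status are made up for illustration. The numeric values are written out as literals only so that the sketch compiles without the OpenCL headers; real host code would include CL/cl.h and use the symbolic constants.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: maps a few OpenCL status codes to their names.
 * Literal values are used so the sketch builds without CL/cl.h; real
 * host code would use the symbolic constants in the case labels. */
const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -31: return "CL_INVALID_DEVICE_TYPE";
    case -32: return "CL_INVALID_PLATFORM";
    default:  return "unrecognised status code";
    }
}

/* Hypothetical helper: prints a message to stderr when a call failed.
 * Returns 1 on success and 0 on failure. */
int report_cl_status(const char *call, int status)
{
    assert(call != NULL);

    if (status != 0) {
        fprintf(stderr, "%s failed: %s (%d)\n",
                call, cl_status_name(status), status);
        return 0;
    }
    return 1;
}
```

For example, report_cl_status("clGetDeviceIDs", ret) would name the error directly. CL_DEVICE_NOT_FOUND is the status that clGetDeviceIDs returns when the chosen platform has no device of the requested type.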

One can print the values of platforms, but I didn't. On my machine, platforms[0] is the Intel CPU and platforms[1] is the NVIDIA GPU, so I passed platforms[1] to the clGetDeviceIDs call.
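The order of the platforms can differ from one machine to another, so a hard-coded index is fragile. One possible alternative, sketched below, is to pick the platform by name. The helper find_platform_by_name is hypothetical. It assumes the names array has already been filled with the strings that clGetPlatformInfo reports for CL_PLATFORM_NAME, in the same order as the platforms array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first platform whose name
 * contains `wanted`, or -1 if none matches. `names` is assumed to hold the
 * CL_PLATFORM_NAME strings in the same order as the `platforms` array. */
int find_platform_by_name(const char *const *names, size_t count,
                          const char *wanted)
{
    size_t i;

    assert(wanted != NULL);

    for (i = 0; i < count; i++) {
        /* Substring match, so "NVIDIA" matches a longer vendor string. */
        if (names[i] != NULL && strstr(names[i], wanted) != NULL) {
            return (int)i;
        }
    }
    return -1;
}
```

The returned index could then replace the literal 1 in platforms[1], with a fallback for the case where -1 comes back because no platform name matches.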