CUDA 10 in Clojure

November 21, 2018

CUDA 10 support has just landed in ClojureCUDA with the latest version, 0.6.0. Install the CUDA 10 Toolkit, update your drivers, update the ClojureCUDA version in your project.clj, and you should be ready to go! I expect all your existing code to continue working without changes!
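For a Leiningen project, the version bump is a one-line change. Here is a minimal sketch of a project.clj; the project name and the Clojure version are placeholders I made up for illustration, and only the ClojureCUDA version comes from this release.

```clojure
;; project.clj: a minimal sketch.
;; "hello-cuda" and the Clojure version are hypothetical placeholders.
(defproject hello-cuda "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 ;; The ClojureCUDA release that adds CUDA 10 support.
                 [uncomplicate/clojurecuda "0.6.0"]])
```

The CUDA 10 Toolkit and a matching driver still have to be installed on the machine itself; the dependency only brings in the Clojure side.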

CUDA is the leading environment for high performance computing on Nvidia GPU hardware. You've probably heard about it. What you may not know is that, instead of wrangling with a C++ toolchain, you can use it directly from your interactive Clojure REPL!

Other Clojure high performance libraries, such as Neanderthal, take advantage of ClojureCUDA to deliver speed dynamically to your Clojure programs.

Check out these higher-level libraries to see how you can do fast calculations with a few lines of Clojure, without writing GPU code yourself. Even low-level code, though, is not so scary when you have the full power of Clojure's interactive environment. Here is an interactive REPL session that builds a bulk increment function for a whole array.

Notice that I don't write any explanations for the following lines of code, and I bet you can understand what's happening.

Play with the environment

    (require '[uncomplicate.clojurecuda.core :refer :all]
             '[uncomplicate.commons.core :refer :all])

    (init)
    true

    (device-count)
    2

    (def my-nvidia-gpu (device 0))
    #'user/my-nvidia-gpu

    (info my-nvidia-gpu)
    :async-engine-count 2
    :managed-memory true
    :multi-gpu-board false
    :maximum-surface2d-layered-layers 2048
    :maximum-texturecubemap-width 32768
    :ecc-enabled false
    :max-pitch 2147483647
    :max-grid-dim-y 65535
    :compute-mode :default
    :can-map-host-memory true
    :max-grid-dim-z 65535
    :pci-bus-id-string 0000:02:00.0
    :maximum-texture2d-mipmapped-width 32768
    :texture-pitch-alignment 32
    :kernel-exec-timeout false
    :maximum-texture2d-linear-height 65000
    :max-shared-memory-per-multiprocessor 98304
    :total-mem 11721506816
    :maximum-texture1d-layered-width 32768
    :maximum-texturecubemap-layered-layers 2046
    :maximum-texture3d-width 16384
    :maximum-surface2d-layered-height 32768
    :max-block-dim-z 64
    :maximum-surface1d-width 32768
    :maximum-surface3d-width 16384
    :name GeForce GTX 1080 Ti
    :maximum-texture3d-height-alternate 8192
    :max-threads-per-multiprocessor 2048
    :max-shared-memory-per-block 49152
    :maximum-texture3d-width-alternate 8192
    :compute-capability-major 6
    :texture-alignment 512
    :global-memory-bus-width 352
    :maximum-surface2d-layered-width 32768
    :memory-clock-rate 5505000
    :maximum-surfacecubemap-layered-layers 2046
    :maximum-surface2d-height 65536
    :clock-rate 1683000
    :concurrent-kernels 1
    :compute-capability-minor 1
    :maximum-texture2d-width 131072
    :max-threads-per-block 1024
    :maximum-texture1d-linear-width 134217728
    :integrated false
    :maximum-texture2d-layered-layers 2048
    :max-block-dim-x 1024
    :maximum-texture1d-mipmapped-width 16384
    :maximum-texture2d-mipmapped-height 32768
    :local-L1-cache-supported true
    :maximum-surface1d-layered-layers 2048
    :pci-bus-id 2
    :maximum-texture1d-layered-layers 2048
    :maximum-surfacecubemap-layered-width 32768
    :max-grid-dim-x 2147483647
    :maximum-texture2d-height 65536
    :global-L1-cache-supported true
    :maximum-texture2d-linear-pitch 2097120
    :maximum-texturecubemap-layered-width 32768
    :multi-gpu-board-group-id 0
    :pci-domain-id 0
    :maximum-surface3d-depth 16384
    :maximum-surface2d-width 131072
    :stream-priorities-supported true
    :multiprocessor-count 28
    :tcc-driver false
    :warp-size 32
    :unified-addressing true
    :maximum-texture3d-height 16384
    :L2-cache-size 2883584
    :maximum-surfacecubemap-width 32768
    :maximum-texture1d-width 131072
    :maximum-surface1d-layered-width 32768
    :maximum-surface3d-height 16384
    :pci-device-id 0
    :max-registers-per-block 65536
    :max-block-dim-y 1024
    :surface-alignment 512
    :maximum-texture3d-depth-alternate 32768
    :maximum-texture3d-depth 16384
    :total-constant-memory 65536
    :maximum-texture2d-linear-width 131072
    :max-registers-per-multiprocessor 65536
    :maximum-texture2d-layered-height 32768
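That is a lot of numbers, so here is a small sketch of how a few of them can be turned into something more readable. It does not touch the GPU: I copied four entries from the listing above into a plain Clojure map by hand, and the names device-info, total-mem-gib, and max-resident-threads are my own helpers for illustration, not part of ClojureCUDA.

```clojure
;; Four entries copied by hand from the device listing above.
(def device-info
  {:total-mem 11721506816
   :multiprocessor-count 28
   :max-threads-per-multiprocessor 2048
   :warp-size 32})

(defn total-mem-gib
  "Total device memory in GiB (2^30 bytes)."
  [info]
  (/ (:total-mem info) (double (* 1024 1024 1024))))

(defn max-resident-threads
  "Upper bound on threads resident on the whole device at once:
  multiprocessors times threads per multiprocessor."
  [info]
  (* (:multiprocessor-count info)
     (:max-threads-per-multiprocessor info)))

(total-mem-gib device-info)        ; a bit under 11 GiB
(max-resident-threads device-info) ; 28 * 2048 = 57344
```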

Create the context

    (def ctx (context my-nvidia-gpu))
    #'user/ctx

    (info ctx)
    '(:dev-runtime-pending-launch-count 2048
      :dev-runtime-sync-depth 2
      :malloc-heap-size 8388608
      :stack-size 1024
      :api-version 3020
      :stream-priority-range (0 -1)
      :cache-config :prefer-none
      :printf-fifo-size 1048576
      :device #object(jcuda.driver.CUdevice 0x1db4190c "CUdevice[nativePointer=0x0]")
      :shared-config :four-byte-bank-size)

    (= ctx (current-context))
    true

Play with memory

    (def gpu-array (mem-alloc 1024))
    #'user/gpu-array

    (def main-array (float-array (range 256)))
    #'user/main-array

    (take 10 main-array)
    0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

    (memcpy-host! main-array gpu-array)
    #object[uncomplicate.clojurecuda.internal.impl.CULinearMemory 0x515e61e0 "[email protected]"]
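The 1024 passed to mem-alloc is a size in bytes, which is exactly what the 256 floats of main-array need. A minimal sketch of that arithmetic, in plain Clojure with no GPU involved; float-bytes and element-count are hypothetical names I introduce here, not ClojureCUDA functions.

```clojure
;; Pure JVM arithmetic; neither a GPU nor ClojureCUDA is required.
(def element-count 256)

(defn float-bytes
  "Number of bytes needed to store n single-precision floats."
  [n]
  (* n Float/BYTES))

(float-bytes element-count) ; 256 floats * 4 bytes each = 1024
```

Computing the size this way, instead of hard-coding 1024, keeps the allocation in step with the array if the element count changes.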

Compute something already

    extern "C"
    __global__ void increment (int n, float *a) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) {
        a[i] = a[i] + 1.0f;
      }
    };

    (def kernel-source
      "extern \"C\" __global__ void increment (int n, float *a) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) { a[i] = a[i] + 1.0f; } };")

    (def hello-program (compile! (program kernel-source)))
    (def hello-module (module hello-program))
    (def increment (function hello-module "increment"))
    #'user/kernel-source
    #'user/hello-program
    #'user/hello-module
    #'user/increment

    (launch! increment (grid-1d 256) (parameters 256 gpu-array))
    nil

    (take 12 (memcpy-host! gpu-array (float-array 256)))
    1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0
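If the index arithmetic in the kernel looks cryptic, it may help to replay it on the CPU. The following is only a rough model in plain Clojure, not how the GPU schedules threads: simulate-increment is a hypothetical helper of mine, and the launch shape of 3 blocks of 100 threads is an assumption chosen so that there are more threads (300) than elements (256), which is the case the i < n guard exists for.

```clojure
(defn simulate-increment
  "CPU model of the increment kernel. Every (block, thread) pair computes
  its global index i and adds 1 to a[i] only when i < n."
  [n a grid-dim block-dim]
  (let [^floats result (float-array a)] ; work on a copy of the input
    (doseq [block-idx (range grid-dim)
            thread-idx (range block-dim)]
      ;; Same formula as blockIdx.x * blockDim.x + threadIdx.x
      (let [i (int (+ (* block-idx block-dim) thread-idx))]
        (when (< i n) ; surplus threads fall through and do nothing
          (aset result i (float (+ (aget result i) 1.0))))))
    result))

(def n 256)

;; 3 blocks * 100 threads = 300 threads for 256 elements (assumed shape).
(def simulated (simulate-increment n (float-array (range n)) 3 100))

(take 12 simulated)
```

The first twelve values come out as 1.0 through 12.0, the same as what the session copied back from the GPU. Without the guard, the last 44 threads would index past the end of the array.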