At the beginning of October I began looking at an open source, GPU-driven database called Alenka. Its primary developer, Anton, has been working on it for about four years. Over the following eight weeks, Anton was kind enough to provide guidance on using the software as well as fixes for various bugs I uncovered during my testing.

Alenka uses Nvidia's Thrust library's stable_sort_by_key for sorting, copy_if for filtering, copy_if and transform for grouping and, up until recently, used ModernGPU's RelationalJoin for joining records.

The software runs on CentOS and Ubuntu, and some users have reported getting it to run on Mac OS X, but as of this writing it doesn't yet run on Windows: ModernGPU, a library Alenka relies on, has yet to be ported to the latest version of Visual Studio and Nvidia's CUDA 8.

Installing Dependencies

The following was run on a fresh Ubuntu 16.04.1 LTS installation. The machine I'm using has an Nvidia GeForce GTX 1080 graphics card with 8 GB of GDDR5X memory, an Intel Core i5 4670K clocked at 3.4 GHz, 32 GB of system RAM, a 960 GB SSD and a second, 3 TB mechanical drive which stores the 1.1 billion taxi trips dataset I use in my benchmarks.

I'll first install a few dependencies to support Alenka and the GPU capabilities of my system.

```
$ sudo apt update
$ sudo apt install \
    freeglut3-dev \
    g++-4.9 \
    gcc-4.9 \
    libglu1-mesa-dev \
    libx11-dev \
    libxi-dev \
    libxmu-dev \
    nvidia-modprobe \
    bison \
    flex
```

When I started looking at Alenka in October, the 367 driver from Nvidia seemed to work best with my GTX 1080 card and Ubuntu 16.

```
$ sudo apt purge nvidia-*
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt install nvidia-367
```

Throughout these past eight weeks Nvidia has continued to release newer drivers but I've kept to 367 as it seems stable. When I last checked, 367.57 was the latest sub-revision of the 367 driver.

With the driver and its dependencies installed I'll reboot the system.

```
$ sudo reboot
```

I've set GCC 4.9 to be the default compiler on this system.

```
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 10
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 10
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20
```

Next I'll download the 64-bit version of the CUDA 8 platform distribution for Ubuntu 16.04. Note that the file downloads with a -deb suffix rather than a .deb extension; dpkg is happy to install it either way.

```
$ curl -O https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
$ sudo apt update
$ sudo apt install cuda
```

I'll then add the environment variables for the CUDA platform to my .bashrc file.

```
$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
```

I can now run the CUDA compiler that's been installed.

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
```

Compiling Alenka

I'll first clone the Alenka repository from GitHub.

```
$ cd ~
$ git clone https://github.com/antonmks/Alenka.git
```

As of this writing I'm using the d05ecd revision. I suggest always using the latest master revision of this software as each commit brings a lot of improvements but, for the sake of reproducibility, I'm including the revision I used while working on this blog post.

```
$ cd Alenka
$ git rev-parse HEAD
d05ecdf7f9d2e77b16d48b6bb7d5bc7da948789d
```

Alenka's Makefile includes configuration for various levels of CUDA compute capability. If you're using a Maxwell-series card from Nvidia, compute_50, compute_52 and compute_53 should work. I'm using the GTX 1080, a Pascal-series card that supports compute capability 6.1. Below are the modifications I've made to Alenka's Makefile.

```
$ vi Makefile
```

```
# GENCODE_SM30  := -gencode arch=compute_30,code=sm_30
# GENCODE_SM35  := -gencode arch=compute_35,code=sm_35
# GENCODE_SM50  := -gencode arch=compute_50,code=sm_50
GENCODE_SM61    := -gencode arch=compute_61,code=sm_61
GENCODE_FLAGS   := $(GENCODE_SM30) $(GENCODE_SM35) $(GENCODE_SM50) $(GENCODE_SM61)
```

Alenka relies on ModernGPU for some of its functionality so I'll clone it inside of Alenka's code repository. This way the ModernGPU library path in Alenka's Makefile doesn't need to point elsewhere.

```
$ git clone https://github.com/moderngpu/moderngpu.git
$ cd moderngpu
```

Again, for the sake of reproducibility, I'm using the d78a5f commit of ModernGPU.

```
$ git rev-parse HEAD
d78a5f9495f055c8eef3199fef8950e54b631088
```

Building ModernGPU is as straightforward as calling make.

```
$ make
```

I'll create a small piece of code to test that ModernGPU is working properly with my card.

```
$ vi hello.cu
```

```cpp
#include <moderngpu/transform.hxx>

using namespace mgpu;

int main(int argc, char **argv) {
    // The context encapsulates things like an allocator and a stream.
    // By default it prints device info to the console.
    standard_context_t context;

    // Launch five threads to greet us.
    transform([] MGPU_DEVICE(int index) {
        printf("Hello GPU from thread %d\n", index);
    }, 5, context);

    // Synchronize on the context's stream to send the output to the console.
    context.synchronize();

    return 0;
}
```

```
$ nvcc \
    -std=c++11 \
    --expt-extended-lambda \
    -gencode arch=compute_61,code=compute_61 \
    -I ./src/ \
    -o hello \
    hello.cu
$ ./hello
GeForce GTX 1080 : 1835.000 Mhz   (Ordinal 0)
20 SMs enabled. Compute Capability sm_61
FreeMem:   6678MB   TotalMem:   8110MB   64-bit pointers.
Mem Clock: 5005.000 Mhz x 256 bits   (320.3 GB/s)
ECC Disabled

Hello GPU from thread 0
Hello GPU from thread 1
Hello GPU from thread 2
Hello GPU from thread 3
Hello GPU from thread 4
```

With that working I'll change directory up one level and compile Alenka.

```
$ cd ..
$ make -j4
```

The above completed in 43 minutes.

Importing 1.1 Billion Taxi Trips

I'll be importing the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post. This data sits in 56 gzip files and decompresses into around 500 GB of raw CSV data. It lives on a 3 TB mechanical drive mounted at /media/mark/Archive2/ on my system.

Alenka doesn't support importing data from gzip files, so I'll create a loop that decompresses each gzip file into a file called data.csv, imports that file into a table called 'trips' in Alenka and then repeats with the next gzip file. The data will live on my 960 GB SSD drive once it's in Alenka's internal storage format.

```
$ mkdir -p ~/taxis && cd ~/taxis
$ vi load.sql
```

```
A := LOAD 'data.csv' USING (',') AS (
    trip_id{1}:int,
    vendor_id{2}:varchar(3) NO ENCODING,
    pickup_datetime{3}:int,
    dropoff_datetime{4}:int,
    store_and_fwd_flag{5}:varchar(1) NO ENCODING,
    rate_code_id{6}:int,
    pickup_longitude{7}:decimal(14,2),
    pickup_latitude{8}:decimal(14,2),
    dropoff_longitude{9}:decimal(14,2),
    dropoff_latitude{10}:decimal(14,2),
    passenger_count{11}:int,
    trip_distance{12}:decimal(14,2),
    fare_amount{13}:decimal(14,2),
    extra{14}:decimal(14,2),
    mta_tax{15}:decimal(14,2),
    tip_amount{16}:decimal(14,2),
    tolls_amount{17}:decimal(14,2),
    ehail_fee{18}:decimal(14,2),
    improvement_surcharge{19}:decimal(14,2),
    total_amount{20}:decimal(14,2),
    payment_type{21}:varchar(3) NO ENCODING,
    trip_type{22}:int,
    pickup{23}:varchar(50) NO ENCODING,
    dropoff{24}:varchar(50) NO ENCODING,
    cab_type{25}:varchar(6) NO ENCODING,
    precipitation{26}:int,
    snow_depth{27}:int,
    snowfall{28}:int,
    max_temperature{29}:int,
    min_temperature{30}:int,
    average_wind_speed{31}:int,
    pickup_nyct2010_gid{32}:int,
    pickup_ctlabel{33}:varchar(10) NO ENCODING,
    pickup_borocode{34}:int,
    pickup_boroname{35}:varchar(13) NO ENCODING,
    pickup_ct2010{36}:varchar(6) NO ENCODING,
    pickup_boroct2010{37}:varchar(7) NO ENCODING,
    pickup_cdeligibil{38}:varchar(1) NO ENCODING,
    pickup_ntacode{39}:varchar(4) NO ENCODING,
    pickup_ntaname{40}:varchar(56) NO ENCODING,
    pickup_puma{41}:varchar(4) NO ENCODING,
    dropoff_nyct2010_gid{42}:int,
    dropoff_ctlabel{43}:varchar(10) NO ENCODING,
    dropoff_borocode{44}:int,
    dropoff_boroname{45}:varchar(13) NO ENCODING,
    dropoff_ct2010{46}:varchar(6) NO ENCODING,
    dropoff_boroct2010{47}:varchar(7) NO ENCODING,
    dropoff_cdeligibil{48}:varchar(1) NO ENCODING,
    dropoff_ntacode{49}:varchar(4) NO ENCODING,
    dropoff_ntaname{50}:varchar(56) NO ENCODING,
    dropoff_puma{51}:varchar(4) NO ENCODING);

STORE A INTO 'trips' APPEND BINARY;
```

```
$ for filename in /media/mark/Archive2/Taxi\ Data/20M\ blocks/trips_x*.csv.gz; do
      gunzip -c "$filename" > data.csv
      ~/Alenka/alenka -l 200 load.sql
  done
```

During the import I could see multiple gigabytes of system memory being used.

```
top - 21:16:57 up  1:20,  1 user,  load average: 3,22, 2,27, 2,01
Tasks: 220 total,   3 running, 217 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0,9 us,  2,2 sy,  0,0 ni,  7,0 id, 90,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  1,3 us,  1,3 sy,  0,0 ni, 54,8 id, 42,5 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,9 us,  2,2 sy,  0,0 ni,  0,4 id, 96,5 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 51,1 us,  6,9 sy,  0,0 ni,  0,0 id, 40,7 wa,  0,0 hi,  1,3 si,  0,0 st
KiB Mem : 32824752 total,   235308 free,  2173876 used, 30415568 buff/cache
KiB Swap: 33430524 total, 33225596 free,   204928 used. 29118716 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7418 mark      20   0 15,968g 1,999g 992,6m R  58,9  6,4   0:56.58 alenka
```

And the nvidia-smi tool showed 1,533 MB of GPU memory being used by Alenka.
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0      On |                  N/A |
| 27%   53C    P2    57W / 200W |   2655MiB /  8110MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       932    G   /usr/lib/xorg/Xorg                             727MiB |
|    0      1685    G   compiz                                         390MiB |
|    0      8047    C   /home/mark/Alenka/alenka                      1533MiB |
+-----------------------------------------------------------------------------+
```

The import completed after 3 hours and 9 minutes.

Benchmarking Alenka

I'm keen to see how fast Alenka performs with the four benchmark queries I've run on various big data systems this year. As of this writing there are some issues to iron out before all of the queries execute properly. I'll describe how far Anton and I have come with each of them.

Query 1:

```
A := SELECT cab_type AS type_of_cab,
            COUNT(cab_type) AS cnt
     FROM trips
     GROUP BY cab_type;

DISPLAY A USING ('|');
```

This query runs for a few minutes with high CPU and memory consumption. It should be finishing in seconds at most, so something is going astray. Work is being carried out to fix this issue and, fingers crossed, at some point in the future I'll be able to provide a benchmark time for it.

```
top - 07:43:49 up 11:47,  1 user,  load average: 0,95, 0,48, 0,19
Tasks: 219 total,   2 running, 217 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0,2 us,  0,2 sy,  0,0 ni, 99,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  0,2 us,  0,4 sy,  0,0 ni, 99,4 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,2 us,  0,1 sy,  0,0 ni, 99,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  0,1 us, 99,9 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem : 32824752 total,   640440 free,  2443244 used, 29741068 buff/cache
KiB Swap: 33430524 total, 32682616 free,   747908 used. 27141564 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1419 mark      20   0 16,731g 3,764g 2,639g R 100,0 12,0   2:50.50 alenka
```

Query 2:

```
A := SELECT passenger_count AS pac,
            AVG(total_amount) AS ta,
            COUNT(passenger_count) AS cnt
     FROM trips
     GROUP BY passenger_count;

DISPLAY A USING ('|');
```

This query does complete, taking 15.43 seconds.
I've yet to audit the results but, for the record, here they are:

```
|37  |14.1900 |1         |
|208 |7.0416  |1508      |
|19  |5.0000  |1         |
|137 |59.6400 |1         |
|38  |7.2900  |1         |
|158 |14.4400 |1         |
|255 |17.9890 |10        |
|249 |9.5000  |1         |
|58  |25.9350 |2         |
|223 |9.5000  |1         |
|33  |8.5950  |2         |
|25  |7.5900  |1         |
|2   |13.7109 |161755340 |
|49  |3.2457  |26        |
|70  |10.6900 |1         |
|155 |90.2300 |1         |
|113 |13.3000 |1         |
|125 |16.6000 |1         |
|0   |10.6676 |3902029   |
|34  |16.8000 |1         |
|250 |12.5666 |3         |
|163 |15.5300 |1         |
|97  |9.9000  |1         |
|177 |17.0000 |1         |
|6   |14.3061 |23796601  |
|211 |7.0000  |1         |
|254 |6.5000  |1         |
|8   |25.4042 |876       |
|7   |25.7082 |913       |
|5   |13.1016 |77761602  |
|129 |8.7857  |7         |
|165 |12.1400 |1         |
|53  |7.2900  |1         |
|134 |55.1400 |1         |
|133 |10.3000 |1         |
|66  |19.3000 |1         |
|84  |43.8400 |1         |
|3   |13.3259 |48313914  |
|225 |16.0000 |1         |
|141 |18.9400 |1         |
|69  |5.7900  |1         |
|4   |13.4157 |23325370  |
|36  |61.5400 |1         |
|61  |31.3400 |1         |
|1   |13.1786 |772743590 |
|13  |31.5000 |1         |
|213 |2.5000  |4         |
|65  |23.3600 |3         |
|17  |39.9500 |1         |
|247 |19.4400 |1         |
|47  |9.0000  |1         |
|9   |41.7145 |422       |
|10  |42.4800 |16        |
|164 |62.1400 |1         |
|160 |15.3400 |1         |
|15  |12.0500 |2         |
|193 |7.5000  |1         |
```

Query 3:

```
A := SELECT passenger_count AS pac,
            YEAR(pickup_datetime) AS pickup_year,
            COUNT(passenger_count) AS pc
     FROM trips
     GROUP BY passenger_count, pickup_year;

DISPLAY A USING ('|');
```

This query crashes Alenka after 11.77 seconds with the following complaint:

```
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: out of memory
```

This is an interesting issue. During execution Alenka allocates all of the remaining memory on the GPU before terminating. I suspect the data is being loaded onto the GPU in one go and there isn't enough memory for the columns being worked with. Streaming the data in chunks and combining the partial results could help this query finish properly.
Query 4:

```
A := SELECT passenger_count AS pac,
            YEAR(pickup_datetime) AS pickup_year,
            CAST_TO_INT(trip_distance) AS distance,
            COUNT(passenger_count) AS the_count
     FROM trips
     GROUP BY passenger_count, pickup_year, distance;

B := ORDER A BY pickup_year ASC, the_count DESC;

DISPLAY B USING ('|');
```

This query crashes Alenka after 19.7 seconds with the same "out of memory" complaint from the Thrust library.