Update: MapD rebranded as OmniSci in 2018.

On August 2nd, Nvidia released a new Pascal-based Titan X graphics card. The new card brings an architectural upgrade over the previous Maxwell-based Titan X, raising the CUDA core count from 3,072 to 3,584 and the base clock speed from 1 GHz to 1.417 GHz.
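As a rough back-of-envelope comparison, peak FP32 throughput scales with cores × clock × 2 FLOPs per cycle. This ignores boost clocks, memory bandwidth and architectural IPC differences, so treat it only as a guide to what the spec bump alone suggests:

```shell
# Back-of-envelope peak FP32 throughput: cores x base clock (GHz) x 2 FLOPs/cycle.
# Ignores boost clocks and memory bandwidth, so this is a rough guide only.
maxwell=$(awk 'BEGIN { printf "%.1f", 3072 * 1.000 * 2 / 1000 }')
pascal=$(awk 'BEGIN { printf "%.1f", 3584 * 1.417 * 2 / 1000 }')
uplift=$(awk -v p="$pascal" -v m="$maxwell" 'BEGIN { printf "%.2f", p / m }')
echo "Maxwell: ${maxwell} TFLOPS, Pascal: ${pascal} TFLOPS, ~${uplift}x"
```

On paper that's roughly a two-thirds increase in raw compute per card, before any real-world query workloads are considered.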

In my previous benchmark, where I ran MapD across 4 Maxwell-based Titan Xs, I saw incredibly fast query times on consumer-grade hardware. In this blog post I'm going to see how much of an upgrade the new Pascal-based cards offer MapD when querying 1.1 billion taxi trips made in New York City over the course of six years.

The Hardware

For this benchmark I'll be using a CentOS 7.2.1511 machine with Intel Xeon E5-2667 v4 CPUs clocked at 3.20 GHz. This system has two of these chips, each with 8 cores and 16 threads.

```
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Stepping:              1
CPU MHz:               1200.375
BogoMIPS:              6406.18
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31
```

This machine has 512 GB of RAM.

```
$ cat /proc/meminfo
MemTotal:       528086776 kB
...
```

For storage there are two Samsung 850 EVO 1TB SSDs set up in a RAID 1 mirror.

```
$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda2[0] sdb2[1]
      1024960 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdb3[1] sda3[0]
      908492800 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk
```

```
$ lsblk
NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                     8:0    0 931.5G  0 disk
├─sda1                  8:1    0    64G  0 part
│ └─centos_titan-swap 253:0    0   128G  0 lvm   [SWAP]
├─sda2                  8:2    0  1001M  0 part
│ └─md126               9:126  0  1001M  0 raid1 /boot
└─sda3                  8:3    0 866.5G  0 part
  └─md127               9:127  0 866.4G  0 raid1 /
sdb                     8:16   0 931.5G  0 disk
├─sdb1                  8:17   0    64G  0 part
│ └─centos_titan-swap 253:0    0   128G  0 lvm   [SWAP]
├─sdb2                  8:18   0  1001M  0 part
│ └─md126               9:126  0  1001M  0 raid1 /boot
└─sdb3                  8:19   0 866.5G  0 part
  └─md127               9:127  0 866.4G  0 raid1 /
```

These drives expose ~930 GB of formatted capacity.

```
$ df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/md127      930G  401G  530G  44% /
...
```

The Maxwell-based Titan Xs would report a truncated model name when running Nvidia's system management interface:

```
$ nvidia-smi
...
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 28%   59C    P8    17W / 250W | 12199MiB / 12287MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
...
```

Thankfully they've now shortened the product name displayed and it reads simply as "TITAN X".

```
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X             On   | 0000:04:00.0     Off |                  N/A |
| 23%   25C    P8    14W / 250W |   2782MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X             On   | 0000:05:00.0     Off |                  N/A |
| 23%   25C    P8    14W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X             On   | 0000:08:00.0     Off |                  N/A |
| 23%   28C    P8    14W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X             On   | 0000:09:00.0     Off |                  N/A |
| 23%   27C    P8    15W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN X             On   | 0000:84:00.0     Off |                  N/A |
| 23%   29C    P8    15W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN X             On   | 0000:85:00.0     Off |                  N/A |
| 23%   26C    P8    15W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN X             On   | 0000:88:00.0     Off |                  N/A |
| 23%   22C    P8    13W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN X             On   | 0000:89:00.0     Off |                  N/A |
| 23%   27C    P8    15W / 250W |   2748MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```

I'll be using Nvidia's 367.35 driver, which is the lowest version you want to be running with these cards. If you're planning on building one of these systems and running it in a data centre, you'll need a 5U chassis: these consumer-grade cards have their power sockets facing upwards and the extra U is needed to give the power cables enough space.
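The per-GPU figures in that table can be totalled with a little awk. This is a sketch that feeds in two sample rows mirroring the format above; in practice you would pipe the real `nvidia-smi` output into the filter:

```shell
# Sum the per-GPU memory capacities from nvidia-smi's table output.
# Two sample rows stand in for the real output here.
total=$(printf '%s\n' \
  '| 23%   25C    P8    14W / 250W |   2782MiB / 12189MiB |      0%      Default |' \
  '| 23%   25C    P8    14W / 250W |   2748MiB / 12189MiB |      0%      Default |' |
  awk -F'|' '/MiB \// { split($3, m, "/"); gsub(/[^0-9]/, "", m[2]); sum += m[2] }
             END { print sum }')
echo "${total} MiB of GPU memory in total"
```

Across all eight cards that works out to roughly 95 GiB of GPU memory for MapD to cache data in.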

Loading 1.1 Billion Trips into MapD

To start, I'll download the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post. This data sits in 56 GZIP files and decompresses into around 500 GB of raw CSV data. I'll be storing this data on an NFS mount.

```
$ cd /theHoard/trips/
$ vi urls.txt
```

```
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaa.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xab.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xac.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xad.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xae.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xag.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xah.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xai.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xak.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xal.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xam.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xan.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xao.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xap.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xar.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xas.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xat.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xau.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xav.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xax.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xay.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xba.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbd.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbe.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbg.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbh.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbi.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbk.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbl.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbm.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbn.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbo.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbp.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbr.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbs.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbt.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbu.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbv.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbx.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xby.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xca.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcd.csv.gz
```

```
$ cat urls.txt | \
    xargs -n 1 \
          -P 6 \
          wget
```

I'll then create a schema for my trips table. The fragment size for this exercise will be 85 million records. The fragment size tells MapD how many records to group into each fragment, the unit it uses to spread data across the GPUs.

```
$ vi create_trips_table.sql
```

```sql
CREATE TABLE trips (
    trip_id                 INTEGER,
    vendor_id               VARCHAR(3)  ENCODING DICT,
    pickup_datetime         TIMESTAMP,
    dropoff_datetime        TIMESTAMP,
    store_and_fwd_flag      VARCHAR(1)  ENCODING DICT,
    rate_code_id            SMALLINT,
    pickup_longitude        DECIMAL(14,2),
    pickup_latitude         DECIMAL(14,2),
    dropoff_longitude       DECIMAL(14,2),
    dropoff_latitude        DECIMAL(14,2),
    passenger_count         SMALLINT,
    trip_distance           DECIMAL(14,2),
    fare_amount             DECIMAL(14,2),
    extra                   DECIMAL(14,2),
    mta_tax                 DECIMAL(14,2),
    tip_amount              DECIMAL(14,2),
    tolls_amount            DECIMAL(14,2),
    ehail_fee               DECIMAL(14,2),
    improvement_surcharge   DECIMAL(14,2),
    total_amount            DECIMAL(14,2),
    payment_type            VARCHAR(3)  ENCODING DICT,
    trip_type               SMALLINT,
    pickup                  VARCHAR(50) ENCODING DICT,
    dropoff                 VARCHAR(50) ENCODING DICT,
    cab_type                VARCHAR(6)  ENCODING DICT,
    precipitation           SMALLINT,
    snow_depth              SMALLINT,
    snowfall                SMALLINT,
    max_temperature         SMALLINT,
    min_temperature         SMALLINT,
    average_wind_speed      SMALLINT,
    pickup_nyct2010_gid     SMALLINT,
    pickup_ctlabel          VARCHAR(10) ENCODING DICT,
    pickup_borocode         SMALLINT,
    pickup_boroname         VARCHAR(13) ENCODING DICT,
    pickup_ct2010           VARCHAR(6)  ENCODING DICT,
    pickup_boroct2010       VARCHAR(7)  ENCODING DICT,
    pickup_cdeligibil       VARCHAR(1)  ENCODING DICT,
    pickup_ntacode          VARCHAR(4)  ENCODING DICT,
    pickup_ntaname          VARCHAR(56) ENCODING DICT,
    pickup_puma             VARCHAR(4)  ENCODING DICT,
    dropoff_nyct2010_gid    SMALLINT,
    dropoff_ctlabel         VARCHAR(10) ENCODING DICT,
    dropoff_borocode        SMALLINT,
    dropoff_boroname        VARCHAR(13) ENCODING DICT,
    dropoff_ct2010          VARCHAR(6)  ENCODING DICT,
    dropoff_boroct2010      VARCHAR(7)  ENCODING DICT,
    dropoff_cdeligibil      VARCHAR(1)  ENCODING DICT,
    dropoff_ntacode         VARCHAR(4)  ENCODING DICT,
    dropoff_ntaname         VARCHAR(56) ENCODING DICT,
    dropoff_puma            VARCHAR(4)  ENCODING DICT
) WITH (FRAGMENT_SIZE=85000000);
```

I'll create two environment variables with my credentials for MapD.

```
$ read MAPD_USERNAME
$ read MAPD_PASSWORD
$ export MAPD_USERNAME
$ export MAPD_PASSWORD
```

The following will create the table schema using the mapdql CLI tool.

```
$ mapdql mapd \
    -u $MAPD_USERNAME \
    -p $MAPD_PASSWORD \
    < create_trips_table.sql
```

I'll then check that the table has been created:

```
$ echo "\t" | mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
trips
User mapd disconnected from database mapd
```

MapD doesn't support loading CSV data from GZIP files at this time so I'll decompress the CSV files before loading them.

```
$ find *.gz | \
    xargs -n 1 \
          -P 16 \
          gunzip
```

With the table and files in place I'll load the 500 GB of CSV data into MapD.

```
$ for filename in *.csv; do
      echo "COPY trips
            FROM '/theHoard/trips/$filename'
            WITH (header='false');" | \
          mapdql \
              mapd \
              -u $MAPD_USERNAME \
              -p $MAPD_PASSWORD
  done
```

The above completed in 43 minutes.
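Two quick sanity checks on those numbers, assuming roughly 1.1 billion rows and the 85-million-row fragment size set in the schema (both figures are approximate):

```shell
# Fragments created: ceil(rows / FRAGMENT_SIZE). Fragments are the units
# MapD distributes across the GPUs. Row count is approximate.
rows=1100000000
frag=85000000
fragments=$(( (rows + frag - 1) / frag ))
echo "${fragments} fragments"

# Effective ingest rate for ~500 GB of raw CSV loaded in 43 minutes.
rate=$(awk 'BEGIN { printf "%.1f", 500 / 43 }')
echo "~${rate} GB/minute sustained"
```

That's a handful of fragments per card, and a sustained ingest rate north of 11 GB of raw CSV per minute.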