During the summer I benchmarked BrytlytDB, a GPU-powered database. The software performed well and since then its version 1.0 has sat at the seventh position on my benchmarks recap board. This week I've had a chance to benchmark version 2.0 on two p16.8xlarge AWS EC2 instances again. Previously, there were a number of manual steps that were needed to get the software up and running but thankfully those steps have been largely removed thanks to an AWS Marketplace page where you can spin up a BrytlytDB cluster on the instance sizes and counts of your choice with a few clicks.

For those unfamiliar with BrytlytDB, it's a GPU-powered database built on top of PostgreSQL. Joins, Stored procedures, ODBC (important for Tableau users) and the PostgreSQL wire protocol all work as expected. When queries are run their execution plans are put together by PostgreSQL's regular codebase before being passed off to BrytlytDB to execute them on one or more of Nvidia's CUDA-powered GPUs. The GPUs don't need to live on the same machine either; ByrtlytDB, as will be shown in this benchmark, can work with a cluster of machines hosting their own discrete GPUs connected via a regular network.

In this benchmark I'll see how well the new version 2.0 of BrytlytDB runs on 32 Tesla K80 GPUs spread across two EC2 instances when querying 1.1 billion taxi trips. I'll be using the same dataset I've used to benchmark Amazon Athena, BigQuery, ClickHouse, Elasticsearch, EMR, kdb+/q, MapD, PostgreSQL, Redshift and Vertica. I have a single-page summary of all these benchmarks for comparison.

A GPU-Powered AWS EC2 Cluster For this benchmark I'll be using two p2.16xlarge EC2 instances running Ubuntu 16.04.2 LTS in Amazon Web Services' us-east-1a region in Northern Virginia. Each machine has 8 Nvidia K80 cards which have 2 GPUs each, 64 virtual CPUs and 732 GB of memory. There's also 20 Gbit/s of networking capacity available to each instance. Below are the specifications of the compute capabilities available each one of the EC2 instances. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 64 On-line CPU(s) list: 0-63 Thread(s) per core: 2 Core(s) per socket: 16 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 79 Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz Stepping: 1 CPU MHz: 1230.230 CPU max MHz: 3000.0000 CPU min MHz: 1200.0000 BogoMIPS: 4600.08 Hypervisor vendor: Xen Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 46080K NUMA node0 CPU(s): 0-15,32-47 NUMA node1 CPU(s): 16-31,48-63 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida This is the layout of the GPUs available on one of the two EC2 instances. $ nvidia-smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI 384.98 Driver Version: 384.98 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla K80 Off | 00000000:00:0F.0 Off | 0 | | N/A 65C P0 57W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 1 Tesla K80 Off | 00000000:00:10.0 Off | 0 | | N/A 54C P0 73W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 2 Tesla K80 Off | 00000000:00:11.0 Off | 0 | | N/A 73C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 3 Tesla K80 Off | 00000000:00:12.0 Off | 0 | | N/A 55C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 4 Tesla K80 Off | 00000000:00:13.0 Off | 0 | | N/A 71C P0 60W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 5 Tesla K80 Off | 00000000:00:14.0 Off | 0 | | N/A 56C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 6 Tesla K80 Off | 00000000:00:15.0 Off | 0 | | N/A 74C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 7 Tesla K80 Off | 00000000:00:16.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 8 Tesla K80 Off | 00000000:00:17.0 Off | 0 | | N/A 65C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 9 Tesla K80 Off | 00000000:00:18.0 Off | 0 | | N/A 53C P0 72W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 10 Tesla K80 Off | 00000000:00:19.0 Off | 0 | | N/A 72C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 11 Tesla K80 Off | 00000000:00:1A.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 12 Tesla K80 Off | 00000000:00:1B.0 Off | 0 | | N/A 70C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 13 Tesla K80 Off | 00000000:00:1C.0 Off | 0 | | N/A 57C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 14 Tesla K80 Off | 00000000:00:1D.0 Off | 0 | | N/A 74C P0 58W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 15 Tesla K80 Off | 00000000:00:1E.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +-------------------------------+----------------------+----------------------+ The first machine has an IP address of 52.91.174.189 and acts as the Coordinator and host to the first 16 data nodes. The second machine has an IP address of 34.235.138.233 and hosts the second set of 16 data nodes. TCP port 5432 is open between the two machines for communicating via PostgreSQL's wire protocol, TCP port 7777 is open for global transaction manager communication and ports 15,432 through to 20,000 are open for data node communication. In addition to the default 20 GB EBS volumes on each EC2 instance there are six 500 GB General Purpose SSDs attached which offer a baseline of 100 IOPS each and can burst to 3,000 IOPS if need be. These drives are setup in a RAID 0 configuration on each instance. This gives each instance around 2.2 TB of formatted storage capacity in addition to the 32 GB boot drives. The Marketplace AMI sets up the RAID formation automatically. $ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 32G 0 disk └─xvda1 202:1 0 32G 0 part / xvdb 202:16 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdc 202:32 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdd 202:48 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvde 202:64 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdf 202:80 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdg 202:96 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data

BrytlytDB Up & Running The BrytlytDB software will be installed and running when the EC2 instances launch but since I'll be using the two machines as a cluster I'll register the second instance's data nodes on the first instance using the brytlyt PostgreSQL user account. $ vi register.sql CREATE NODE d16 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15432 ); CREATE NODE d17 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15433 ); CREATE NODE d18 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15434 ); CREATE NODE d19 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15435 ); CREATE NODE d20 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15436 ); CREATE NODE d21 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15437 ); CREATE NODE d22 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15438 ); CREATE NODE d23 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15439 ); CREATE NODE d24 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15440 ); CREATE NODE d25 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15441 ); CREATE NODE d26 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15442 ); CREATE NODE d27 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15443 ); CREATE NODE d28 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15444 ); CREATE NODE d29 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15445 ); CREATE NODE d30 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15446 ); CREATE NODE d31 WITH ( HOST = '34.235.138.233' , TYPE = 'datanode' , PORT = 15447 ); SELECT pgxc_pool_reload (); $ psql -U brytlyt -f register.sql Once that's run I can check that the data nodes have all registered. SELECT * FROM pgxc_node ; node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id ----------+-----------+-----------+----------------+----------------+------------------+------------- c0 | C | 5432 | localhost | f | f | 546080270 d0 | D | 15432 | 127.0.0.1 | f | f | -2088056588 d1 | D | 15433 | 127.0.0.1 | f | f | 342786568 d2 | D | 15434 | 127.0.0.1 | f | f | 823431008 d3 | D | 15435 | 127.0.0.1 | f | f | 907287964 d4 | D | 15436 | 127.0.0.1 | f | f | -1005442993 d5 | D | 15437 | 127.0.0.1 | f | f | -2088175618 d6 | D | 15438 | 127.0.0.1 | f | f | 653419373 d7 | D | 15439 | 127.0.0.1 | f | f | -2114583294 d8 | D | 15440 | 127.0.0.1 | f | f | -1181003975 d9 | D | 15441 | 127.0.0.1 | f | f | 1701757799 d10 | D | 15442 | 127.0.0.1 | f | f | 2938326 d11 | D | 15443 | 127.0.0.1 | f | f | -340513757 d12 | D | 15444 | 127.0.0.1 | f | f | 899853678 d13 | D | 15445 | 127.0.0.1 | f | f | -1529184863 d14 | D | 15446 | 127.0.0.1 | f | f | 297400288 d15 | D | 15447 | 127.0.0.1 | f | f | -2050394442 d16 | D | 15432 | 34.235.138.233 | f | f | -951284347 d17 | D | 15433 | 34.235.138.233 | f | f | 940304836 d18 | D | 15434 | 34.235.138.233 | f | f | 1871723707 d19 | D | 15435 | 34.235.138.233 | f | f | 1351481950 d20 | D | 15436 | 34.235.138.233 | f | f | -958755150 d21 | D | 15437 | 34.235.138.233 | f | f | 1823411759 d22 | D | 15438 | 34.235.138.233 | f | f | 1397545445 d23 | D | 15439 | 34.235.138.233 | f | f | 68184193 d24 | D | 15440 | 34.235.138.233 | f | f | -965766358 d25 | D | 15441 | 34.235.138.233 | f | f | 1469379654 d26 | D | 15442 | 34.235.138.233 | f | f | -2015650531 d27 | D | 15443 | 34.235.138.233 | f | f | 14563042 d28 | D | 15444 | 34.235.138.233 | f | f | 1604721461 d29 | D | 15445 | 34.235.138.233 | f | f | -557897158 d30 | D | 15446 | 34.235.138.233 | f | f | 1910249780 d31 | D | 15447 | 34.235.138.233 | f | f | -1701917292

Loading 1.1 Billion Trips Into BrytlytDB I'll first define the table schema for the 1.1 billion taxi trip records. $ psql -U brytlyt CREATE FOREIGN TABLE trips ( trip_id INTEGER , vendor_id VARCHAR ( 3 ), pickup_datetime DATE , dropoff_datetime DATE , store_and_fwd_flag VARCHAR ( 1 ), rate_code_id INTEGER , pickup_longitude DOUBLE PRECISION , pickup_latitude DOUBLE PRECISION , dropoff_longitude DOUBLE PRECISION , dropoff_latitude DOUBLE PRECISION , passenger_count INTEGER , trip_distance DOUBLE PRECISION , fare_amount DOUBLE PRECISION , extra DOUBLE PRECISION , mta_tax DOUBLE PRECISION , tip_amount DOUBLE PRECISION , tolls_amount DOUBLE PRECISION , ehail_fee DOUBLE PRECISION , improvement_surcharge DOUBLE PRECISION , total_amount DOUBLE PRECISION , payment_type VARCHAR ( 10 ), trip_type INTEGER , pickup VARCHAR ( 50 ), dropoff VARCHAR ( 50 ), cab_type VARCHAR ( 6 ), precipitation INTEGER , snow_depth INTEGER , snowfall INTEGER , max_temperature INTEGER , min_temperature INTEGER , average_wind_speed INTEGER , pickup_nyct2010_gid INTEGER , pickup_ctlabel VARCHAR ( 10 ), pickup_borocode INTEGER , pickup_boroname VARCHAR ( 13 ), pickup_ct2010 VARCHAR ( 6 ), pickup_boroct2010 VARCHAR ( 7 ), pickup_cdeligibil VARCHAR ( 1 ), pickup_ntacode VARCHAR ( 4 ), pickup_ntaname VARCHAR ( 56 ), pickup_puma VARCHAR ( 4 ), dropoff_nyct2010_gid INTEGER , dropoff_ctlabel VARCHAR ( 10 ), dropoff_borocode INTEGER , dropoff_boroname VARCHAR ( 13 ), dropoff_ct2010 VARCHAR ( 6 ), dropoff_boroct2010 VARCHAR ( 7 ), dropoff_cdeligibil VARCHAR ( 1 ), dropoff_ntacode VARCHAR ( 4 ), dropoff_ntaname VARCHAR ( 56 ), dropoff_puma VARCHAR ( 4 ) ) SERVER gm_fdw_server OPTIONS ( max_size '35000000' , is_random 'false' , index '10,24' ) DISTRIBUTE BY roundrobin ; Note in the above there are indices on the passenger count and cab type columns. Also, the timestamp columns are using a DATE column type. I've been told DATETIME fields will be supported in a future version of the software. This is important to take into consideration as there will be data loss as only the dates and not the times of the journeys can be queried. Also note that I had to use the larger INTEGER field instead of a SMALLINT field I normally use for several columns due to the current level of column type support in the software. For this benchmark I've downloaded and decompressed the 500 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post onto the /usr/local/data folder on the first EC2 instance. The following command will load the data onto the two EC2 instances. It took around 50 minutes to complete. count = 0 for filename in /usr/local/data/xa* ; do psql -c "EXECUTE DIRECT ON ( d $count ) 'SELECT load_data(''trips'', '' $filename '', '','', 1000000, ''0'', '''', ''overwrite'')'" 2 > & 1 & psql -c "EXECUTE DIRECT ON ( d $(( $count + 16 )) ) 'SELECT load_data(''trips'', '' $filename '', '','', 1000000, ''0'', '''', ''overwrite'')'" 2 > & 1 & (( count++ )) done