Vertica is a distributed, column-oriented analytics database. It started life out in 2005 being developed by, among others, Michael Stonebraker. Stonebraker has a long history of developing database software including Ingres and PostgreSQL. In 2011, Vertica Systems was acquired by HP. Not long after a free community version of Vertica was released.

Vertica has had some famous users including Facebook, Groupon and Zynga. Maxime Beauchemin's LinkedIn profile describes part of his time at Facebook was spent on pushing the limits of Vertica to satisfy the firm's needs.

Vertica will be best suited on a cluster of machines but for this benchmark I'm interested in how fast it can work off of a single machine.

The dataset being used is the same one I've used to benchmark Amazon Athena, BigQuery, Elasticsearch, kdb+/q, MapD, PostgreSQL, Presto, Redshift and Spark. I've compiled a single-page summary of these benchmarks.

The Hardware For this benchmark I'll be running a fresh installation of Ubuntu 14.04.3 LTS on an Intel Core i5 4670K clocked at 3.4 GHz, 16 GB of DDR3 RAM and a SanDisk SDSSDHII960G 960 GB SSD drive.

Vertica Up & Running I'll first install a number of prerequisites. $ sudo apt update $ sudo apt install \ dialog \ libsensors4 \ mcelog \ ntp \ pstack \ sysstat I'll then set the time zone on my system and make sure the system clock is synchronised. $ sudo dpkg-reconfigure tzdata $ sudo service ntp stop $ sudo ntpd -gq $ sudo service ntp start For this benchmark I've downloaded version 8.0.1 of the Community Edition of Vertica to my home folder. The following will deploy Vertica's installation files. $ sudo dpkg -i vertica_8.0.1-0_amd64.deb Selecting previously unselected package vertica. (Reading database ... 56852 files and directories currently installed.) Preparing to unpack vertica_8.0.1-0_amd64.deb ... Unpacking vertica (8.0.1-0) ... Setting up vertica (8.0.1-0) ... Vertica Analytic Database V8.0.1-0 successfully installed on host ubuntu To complete your NEW installation and configure the cluster, run: /opt/vertica/sbin/install_vertica To complete your Vertica UPGRADE, run: /opt/vertica/sbin/update_vertica ---------------------------------------------------------------------------------- Important ---------------------------------------------------------------------------------- Before upgrading Vertica, you must backup your database. After you restart your database after upgrading, you cannot revert to a previous Vertica software version. ---------------------------------------------------------------------------------- View the latest Vertica documentation at http://my.vertica.com/docs/ Processing triggers for man-db (2.6.7.1-1ubuntu1) ... Processing triggers for libc-bin (2.19-0ubuntu6.6) ... With those files in place I'll launch Vertica's installer. $ sudo /opt/vertica/sbin/install_vertica \ --hosts = 127 .0.0.1 \ --failure-threshold = NONE The installer will create a dbadmin user that I'll used to operate the database for this exercise. Vertica Analytic Database 8.0.1-0 Installation Tool >> Validating options... Mapping hostnames in --hosts (-s) to addresses... >> Starting installation tasks. >> Getting system information for cluster (this may take a while)... Default shell on nodes: 127.0.0.1 /bin/bash >> Validating software versions (rpm or deb)... >> Beginning new cluster creation... successfully backed up admintools.conf on 127.0.0.1 >> Creating or validating DB Admin user/group... Password for new dbadmin user (empty = disabled) Successful on hosts (1): 127.0.0.1 Provided DB Admin account details: user = dbadmin, group = verticadba, home = /home/dbadmin Creating group... Adding group Validating group... Okay Creating user... Adding user, Setting credentials Validating user... Okay >> Validating node and cluster prerequisites... Prerequisites not fully met during local (OS) configuration for verify-127.0.0.1.xml: HINT (S0151): https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0151 These disks do not have known IO schedulers: '/dev/mapper/ubuntu--vg- root' ('') = '' HINT (S0305): https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0305 TZ is unset for dbadmin. Consider updating .profile or .bashrc WARN (S0170): https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0170 lsblk (LVM utility) indicates LVM on the data directory. FAIL (S0020): https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0020 Readahead size of (/dev/mapper/ubuntu--vg-root) is too low for typical systems: 256 < 2048 FAIL (S0310): https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0310 Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'. System prerequisites passed. Threshold = NONE >> Establishing DB Admin SSH connectivity... Installing/Repairing SSH keys for dbadmin >> Setting up each node and modifying cluster... Creating Vertica Data Directory... Updating agent... Creating node node0001 definition for host 127.0.0.1 ... Done >> Sending new cluster configuration to all nodes... Starting agent... >> Completing installation... Running upgrade logic No spread upgrade required: /opt/vertica/config/vspread.conf not found on any node Installation complete. Please evaluate your hardware using Vertica's validation tools: https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=VALSCRIPT To create a database: 1. Logout and login as dbadmin. (see note below) 2. Run /opt/vertica/bin/adminTools as dbadmin 3. Select Create Database from the Configuration Menu Note: Installation may have made configuration changes to dbadmin that do not take effect until the next session (logout and login). To add or remove hosts, select Cluster Management from the Advanced Menu. The installer complained the dbadmin user doesn't have a time zone set for its newly created account so I'll set it manually before switching into the user. $ sudo sh -c "echo 'export TZ=Europe/Tallinn' >> \ /home/dbadmin/.bash_profile" $ sudo su - dbadmin I'll launch Vertica's Administration Tools to create a database. $ /opt/vertica/bin/adminTools This process is conducted through a series of text-based menus. I took the following steps through them. Skip past the licence file.

Accept the EULA.

Select option #6 "Configuration Menu".

Select option #1 "Create Database".

Create a database named "trips".

Create a password for the new database.

Select the "127.0.0.1" host for the database.

Set the catalogue and data pathnames to /home/dbadmin .

. After the database is created, go to the main menu and select "exit".

Loading 1.1 Billion Trips into Vertica I'll create an environment variable to store the password for the database. $ read VERTICA_PASS $ export VERTICA_PASS I'll then launch the command line client for Vertica. $ /opt/vertica/bin/vsql \ -U dbadmin \ -w $VERTICA_PASS The interface feels a lot like PostgreSQL. Running the \? command will output a list of available commands just like psql does. dbadmin=> \? See the Vertica Programmer's Guide for information on available commands. General \c[onnect] [DBNAME|- [USER]] connect to new database (currently "dbadmin") \cd [DIR] change the current working directory \q quit vsql \set [NAME [VALUE]] set internal variable, or list all if no parameters \timing toggle timing of commands (currently off) \unset NAME unset (delete) internal variable \! [COMMAND] execute command in shell or start interactive shell \password [USER] change user's password Query Buffer \e [FILE] edit the query buffer (or file) with external editor \g send query buffer to server \g FILE send query buffer to server and results to file \g | COMMAND send query buffer to server and pipe results to command \p show the contents of the query buffer \r reset (clear) the query buffer \s [FILE] display history or save it to file \w FILE write query buffer to file Input/Output \echo [STRING] write string to standard output \i FILE execute commands from file \o FILE send all query results to file \o | COMMAND pipe all query results to command \o close query-results file or pipe \qecho [STRING] write string to query output stream (see \o) Informational \d [PATTERN] describe tables (list tables if no argument is supplied) PATTERN may include system schema name, e.g. v_catalog.* \df [PATTERN] list functions \dj [PATTERN] list projections \dn [PATTERN] list schemas \dp [PATTERN] list table access privileges \ds [PATTERN] list sequences \dS [PATTERN] list system tables. PATTERN may include system schema name such as v_catalog, v_monitor, or v_internal. Example: v_catalog.a* \dt [PATTERN] list tables \dtv [PATTERN] list tables and views \dT [PATTERN] list data types \du [PATTERN] list users \dv [PATTERN] list views \l list all databases \z [PATTERN] list table access privileges (same as \dp) Formatting \a toggle between unaligned and aligned output mode \b toggle beep on command completion \C [STRING] set table title, or unset if none \f [STRING] show or set field separator for unaligned query output \H toggle HTML output mode (currently off) \pset NAME [VALUE] set table output option (NAME := {format|border|expanded|fieldsep|footer|null| recordsep|trailingrecordsep|tuples_only|title|tableattr|pager}) \t show only rows (currently off) \T [STRING] set HTML <table> tag attributes, or unset if none \x toggle expanded output (currently off) I'll create a table that will store the 1.1 billion taxi trips dataset. CREATE TABLE trips ( trip_id INTEGER , vendor_id VARCHAR ( 3 ), pickup_datetime DATETIME , dropoff_datetime DATETIME , store_and_fwd_flag VARCHAR ( 1 ), rate_code_id SMALLINT , pickup_longitude DECIMAL ( 18 , 14 ), pickup_latitude DECIMAL ( 18 , 14 ), dropoff_longitude DECIMAL ( 18 , 14 ), dropoff_latitude DECIMAL ( 18 , 14 ), passenger_count SMALLINT , trip_distance DECIMAL ( 6 , 3 ), fare_amount DECIMAL ( 6 , 2 ), extra DECIMAL ( 6 , 2 ), mta_tax DECIMAL ( 6 , 2 ), tip_amount DECIMAL ( 6 , 2 ), tolls_amount DECIMAL ( 6 , 2 ), ehail_fee DECIMAL ( 6 , 2 ), improvement_surcharge DECIMAL ( 6 , 2 ), total_amount DECIMAL ( 6 , 2 ), payment_type VARCHAR ( 3 ), trip_type SMALLINT , pickup VARCHAR ( 50 ), dropoff VARCHAR ( 50 ), cab_type VARCHAR ( 6 ), precipitation SMALLINT , snow_depth SMALLINT , snowfall SMALLINT , max_temperature SMALLINT , min_temperature SMALLINT , average_wind_speed SMALLINT , pickup_nyct2010_gid SMALLINT , pickup_ctlabel VARCHAR ( 10 ), pickup_borocode SMALLINT , pickup_boroname VARCHAR ( 13 ), pickup_ct2010 VARCHAR ( 6 ), pickup_boroct2010 VARCHAR ( 7 ), pickup_cdeligibil VARCHAR ( 1 ), pickup_ntacode VARCHAR ( 4 ), pickup_ntaname VARCHAR ( 56 ), pickup_puma VARCHAR ( 4 ), dropoff_nyct2010_gid SMALLINT , dropoff_ctlabel VARCHAR ( 10 ), dropoff_borocode SMALLINT , dropoff_boroname VARCHAR ( 13 ), dropoff_ct2010 VARCHAR ( 6 ), dropoff_boroct2010 VARCHAR ( 7 ), dropoff_cdeligibil VARCHAR ( 1 ), dropoff_ntacode VARCHAR ( 4 ), dropoff_ntaname VARCHAR ( 56 ), dropoff_puma VARCHAR ( 4 ) ) ORDER BY pickup_datetime , dropoff_datetime ; I'll then exit to the command line and execute the following to load the dataset in. The /home/mark/trips/ folder on my system has had it and its contents set to be owned by dbadmin . There are 56 gzip-compressed CSV files that make up the 1.1-billion-record dataset. $ time ( echo "COPY trips FROM '/home/mark/trips/trips_x*.csv.gz' GZIP DELIMITER ',' DIRECT;" | \ /opt/vertica/bin/vsql \ -U dbadmin \ -w $VERTICA_PASS ) The above took 3 hours 56 minutes and 43 seconds to complete. The dataset uses 153 GB of disk capacity when stored using Vertica's internal storage format. $ du -hs /home/dbadmin/trips/v_trips_node0001_data/ 153G /home/dbadmin/trips/v_trips_node0001_data/