Comparing Hosted Database Performance

While Compose is all about open source databases, we also live in the cloud, and so does Gary Sieling, who's written this Write Stuff article looking at practical ways to compare database performance across the network, in his case using SQL Server.

Many vendors of platform services resell server space in Amazon's data centers, allowing web developers who choose a "platform as a service" provider to mix and match vendors during or after building an application. If you choose to move a database for a production application from one provider to another, you need to run some tests to verify that you (and the vendor you've chosen) have set up the new database correctly.

Moving a database from one architectural platform to another requires careful analysis, because you don't have full access to the servers involved. For instance, it is common to use a shared database for development and then move to a service like Compose.io to get high availability and auto-scaling. Using the tests described below, you can compare the starting-state behavior of your application on Compose.io with an existing database, keeping in mind that the Compose.io installation will scale as you need new resources.

We can gain considerable insight into the point-in-time behavior of the migrated system simply by running carefully selected SQL queries against both the existing and the new database. If we run a query many times, we see not only baseline performance but also the variance in query times. For instance, the following graph compares the performance of a query that retrieves a lot of data, run from a web server in the AWS us-east-1 data center against databases in the us-east-1 and us-west-1 data centers:

Latency from U.S. East to U.S. West

Not surprisingly, co-locating the database and web servers is preferable.

While you don't need a test suite to verify that you've built a database in the right data center, you still want to verify that traffic between the web host and database doesn't leave the data center. For instance, if you set up a database with Amazon RDS directly and access it through Amazon's DNS, traffic stays in the data center. If instead you use a connection string pointing to the provided IP address, the performance is quite different:

Latency to RDS using an IP address vs DNS

Many platform-as-a-service vendors offer a cheap or free plan for development purposes, which is typically a database you share with many other people. When considering upgrade options, you might move to a better offering from your vendor, to Amazon RDS, to a custom virtual machine, or to a company like Compose.io. When comparing these options, SQL is the only interface to the database that is common across all vendors.

Many people evaluating the performance of SQL queries for the first time are quickly frustrated by the variability of timings, because databases cache as much as they can in memory. Each run of a query may be faster or slower depending on how many disk pages are currently in RAM. At a minimum, it is helpful to run each query twice, but I found in testing that running queries dozens or hundreds of times exposes rare latency spikes that can indicate underlying problems.

By running many test queries, we get a "fingerprint" for the infrastructure under test, which we can compare to get a sense of the differences. Ideally you want to choose several queries that are representative of your use case and that each stress one specific part of the infrastructure: memory, CPU, I/O, or network. I found it helpful to identify queries that run between 1 and 30 seconds - queries that are too fast only demonstrate how quickly you can acquire a connection, and queries that are too slow make iterative analysis difficult.
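The timing-and-summary loop behind these fingerprints can be sketched in a few lines. This is a minimal Python illustration, not the article's actual C# harness; `run_query` is a hypothetical callable standing in for one execution of your test query:

```python
import statistics
import time

def time_runs(run_query, n=100):
    """Execute run_query n times, returning per-run wall-clock durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Mean plus a two-standard-deviation band, matching the chart annotations below."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)
    return {"mean": mean, "upper": mean + 2 * stdev, "lower": mean - 2 * stdev}

# Example with a no-op stand-in for a real database query:
stats = summarize(time_runs(lambda: None, n=100))
```

Running the same harness against the old and new databases gives two comparable distributions rather than two single numbers.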

The following examples show several representative tests you can run.

Example 1: Connection creation

select 1;

Running a query that is essentially a no-op combines testing the speed of connection creation with network latency. If you rely on a "platform as a service" vendor for your web hosting, they may have set up connection pooling for you, so you'll get different results running this on their servers vs. your development workstation. For this case the only work required by the database is parsing the query, and the parsed query will likely be cached in RAM on the database.
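The connection-plus-no-op pattern can be demonstrated locally with Python's built-in sqlite3 module - an illustrative stand-in only, since an in-memory database has no network hop, unlike the SQL Server setup measured in this article:

```python
import sqlite3
import time

def connect_and_ping(n=100):
    """Open a fresh connection and run a no-op query n times; returns durations."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        conn = sqlite3.connect(":memory:")  # stands in for a network connection
        result = conn.execute("select 1").fetchone()
        conn.close()
        durations.append(time.perf_counter() - start)
        assert result == (1,)
    return durations

timings = connect_and_ping(10)
```

Against a real remote database, each iteration would additionally pay the network round trip, which is exactly what this test is meant to expose.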

For the charts in this article I’ve chosen to run each query 100 times against each of two databases. I've included lines for the average timing and two standard deviations. Unless otherwise mentioned the charts show a database in a shared host and a new RDS database in the same data center.

Before we proceed further it is instructive to run the connection test against the same database so you can see what it looks like when there is no difference:

Testing the same database from a developer workstation

The above chart was generated from a developer workstation connecting to the database - here is what it looks like when run for real on the production web server, against the old and new databases:

Testing the same database in AWS

The occasional timing spikes in this chart suggest that a connection pool may be periodically dropping and reacquiring connections, but otherwise the two environments appear equivalent so far.

Example 2: CPU

If you purchase a database run within a virtualized environment, you are allotted a percentage of the CPU on the physical hardware, as configured by the virtualization layer.

To create a CPU-bound query, we can generate SHA hashes in a tight loop, approximately a million times. The nice thing about this approach is that you can easily adjust how long it takes in order to get good measurements (note that the query below is for SQL Server). It requires the database to keep some data in RAM, so perceived performance differences between two environments may be caused by paging.

CPU usage - comparison of SQL Server on a shared host vs RDS

with cte (val, idx) as (
    select hashbytes('SHA1', 'test'), 1 idx
    from (
        select 1 idx union select 2 idx union select 3 idx
        union select 4 idx union select 5 idx
    ) a
    join (
        select 1 idx union select 2 idx union select 3 idx
        union select 4 idx union select 5 idx
    ) b on 1 = 1
    union all
    select hashbytes('SHA1', val), (idx + 1) idx
    from cte
    where idx < 32767
)
select count(*) from cte
option (maxrecursion 32767);
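The per-iteration work the query asks of the server - hash, then hash the result, and so on - can be mirrored outside the database. A minimal Python sketch using hashlib, with an iteration count you can tune until a run lands in the measurable 1-30 second window:

```python
import hashlib

def sha1_chain(iterations, seed=b"test"):
    """Repeatedly hash the previous digest, mimicking the recursive CTE's CPU load."""
    digest = hashlib.sha1(seed).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha1(digest).digest()
    return digest

# Scale the iteration count up or down to control how long one run takes.
result = sha1_chain(1000)
```

Because the work is pure computation over a few bytes, timings for this kind of workload isolate CPU allocation from disk and network effects.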

As you can see, the new database has much better throughput than the old one and, to my surprise, much less variance in its performance.

Often when database queries are CPU-bound they are doing large sorts or computations, which are best addressed by indexing or by materializing computations in advance. If that doesn't work for your use case, Compose.io provides options to scale 1.5x, 2x, or 3x to get the next level of CPU allocation (with a corresponding ratio of disk space also available for use).

Example 3: Network

Network performance between web servers and databases can be a big time sink. As in the CPU example, we can generate a large dataset, but this time stream the data to the client to test the network:

with cte (val, idx) as (
    select 'abcdefghijklmnop' val, 1 idx
    from (
        select 1 idx union all select 2 idx union all select 3 idx
        union all select 4 idx union all select 5 idx
    ) a
    join (
        select 1 idx union all select 2 idx union all select 3 idx
        union all select 4 idx union all select 5 idx
    ) b on 1 = 1
    union all
    select val, (idx + 1) idx
    from cte
    where idx < 10000
)
select * from cte
option (maxrecursion 32767);
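The same generate-and-stream pattern works in other databases that support recursive CTEs. As an illustrative stand-in (sqlite3 in-memory, so no real network is crossed), this sketch generates the rows server-side and pulls them all to the client:

```python
import sqlite3

def stream_rows(n):
    """Generate n rows server-side and fetch them all, like the T-SQL test above."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        with recursive cte(val, idx) as (
            select 'abcdefghijklmnop', 1
            union all
            select val, idx + 1 from cte where idx < ?
        )
        select * from cte
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows

rows = stream_rows(10000)
```

Against a remote database, the fetch phase dominates, which is what makes this a network test rather than a CPU test.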

In this case my old and new databases had similar behaviors, so to show a degenerate case, I'll show what happens if the database and web server are in separate data centers:

Latency from U.S. East to U.S. West

This is one of the most powerful network effects AWS holds over Azure - once you've chosen to host with one provider, you need to keep all parts of your infrastructure with that provider to avoid introducing major performance problems.

Example 4: I/O (Reading)

In my sample database I have a logging table with approximately 1 million rows. There is a column containing a log message, so we can force the database to pull each row from disk by making it check whether this column starts with a particular value. As long as there is no index on this column, the database must look at every row and cannot take any shortcuts.

select count(*) from logs where message like 'email%';
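The mechanics of this full-scan test can be reproduced on any database; a small sqlite3 sketch (an in-memory stand-in with hypothetical log messages, so it illustrates the scan rather than real disk I/O):

```python
import sqlite3

def scan_count(n_rows):
    """Build an unindexed logs table and force a full scan with a LIKE filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table logs (message text)")
    # Every 10th row starts with 'email'; the rest do not match the filter.
    conn.executemany(
        "insert into logs values (?)",
        [("email sent",) if i % 10 == 0 else ("login ok",) for i in range(n_rows)],
    )
    (count,) = conn.execute(
        "select count(*) from logs where message like 'email%'"
    ).fetchone()
    conn.close()
    return count
```

With no index on `message`, the `like 'email%'` predicate must be evaluated against every row, so the timing reflects read throughput on the table's storage.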

The new database is much faster in my test case, likely due to moving from a non-SSD based environment to one with SSDs:

I/O Test - Reads - Shared environment vs RDS

Example 5: I/O (Writing)

To test the performance of write I/O, I make a copy of the logging table and then roll it back. Some databases store table data, index data, and the undo information for a rollback on separate storage, so this test is a weighted average of all three.

This test does enough I/O to take some time but not enough to trigger the auto-scaling of Compose.io. This feature triggers based on each additional gigabyte of storage, so this test could be run for a longer time to see performance during and after additional resources are added to the database.

select * into logs_test from logs; rollback;
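The copy-then-rollback mechanics can be sketched with sqlite3 (an in-memory stand-in, so it demonstrates the transaction behavior rather than the disk cost measured in the article):

```python
import sqlite3

def copy_and_rollback(n_rows):
    """Copy a table inside a transaction, then roll back; returns remaining tables."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute("create table logs (message text)")
    conn.executemany("insert into logs values (?)", [("entry",)] * n_rows)
    conn.execute("begin")
    conn.execute("create table logs_test as select * from logs")
    conn.execute("rollback")
    # The copy should be gone after the rollback, leaving only the original table.
    tables = [r[0] for r in conn.execute(
        "select name from sqlite_master where type = 'table'"
    )]
    conn.close()
    return tables

tables = copy_and_rollback(100)
```

The rollback is the useful part of the design: the database does all the write work, but the test leaves no residue behind, so it can be rerun repeatedly.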

I/O Test - Shared environment vs RDS

Concluding Notes

I’ve found this style of analysis to be helpful during large database deployments and migrations, as it allows you to make educated guesses about database behaviors and spot configuration problems early. If you were to automate periodic tests you would get a benchmark for how your system behaves over time - for instance with Compose.io, some of the above tests would improve as your application scales and more resources are allocated to your deployment.

The charts above were generated using D3.js - the source code is on GitHub: https://github.com/garysieling/AWS-DB-tester. While the examples there use C# and SQL Server, they can easily be adapted to other languages and database products as needed.