Database performance is one of the major bottlenecks for most web applications. Most web developers are not database experts (and I’m no exception); there are, however, several basic methods to analyze and optimize database performance without resorting to expert consultants (such as those whose founders’ blogs are an invaluable source of MySQL knowledge).



The Performance Equation

Database performance is affected by many different variables – the machine specs, the OS, the database engine and its configuration, the table schema and the queries running against it. Since I’m dealing mostly (only?) with MySQL, this article mainly covers it (though much of it is probably relevant to other engines as well). As for the OS and machine specs, I’ll take them out of the equation, since I’m interested in optimizing relative performance on the same machine.

Basically I’m interested in optimizing:

The structure of my database tables (schema)

The structure of my application-level queries (SELECT, UPDATE, INSERT, DELETE)

Profiling Database Performance

Blindly optimizing queries and database schema is counter-productive. First we need to know what to optimize, and for that we need data. The act of gathering data for optimization is called profiling or performance analysis.

The first step I take when profiling database performance for a web app is to measure the running time of all the queries running in it. The absolute run time of a query is not necessarily a good measure of how well optimized it is, since some queries are naturally more complex or pull more data. It will, however, give me a good idea of where to start improving the response time of the application I’m optimizing. My main goal is not the performance of any specific query, but overall system performance.

Measuring the run time of a query can be done with simple timers using microtime() calls (Check out this post for an abstraction), running it in a tool that automatically provides such statistics (phpMyAdmin for example) or using an integrated profiler.

I am using the Zend_Db_Profiler, which is convenient for me since I’m using the Zend Framework and all database access converges to a Zend_Db_Adapter connection. The profiler basically uses the microtime() approach but integrates it transparently into all of the queries without me having to wrap them one by one.

Usage is pretty simple. First you need to pass an extra parameter to your Zend_Db_Adapter instance to activate profiling:

$params = array(
    'host'     => 'localhost',
    'username' => 'dbusername',
    'password' => 'dbpassword',
    'dbname'   => 'dbname',
    'profiler' => true  // turn on profiler, disabled by default
);
$db = Zend_Db::factory('PDO_MYSQL', $params);

The Zend Framework manual shows some advanced usage (such as profiling directly into firebug, which is very convenient), however for our purposes we simply want to dump the queries and their run time.

$profiler = $db->getProfiler();
$profile = '';
foreach ($profiler->getQueryProfiles() as $query) {
    $profile .= $query->getQuery() . "\n" . 'Time: ' . $query->getElapsedSecs() . "\n";
}
echo $profile;

This snippet goes, of course, after the view has been rendered and all queries have been executed.

Using a relevant dataset

It’s important to know whether the data you are profiling against is relevant for the point at which you believe you will start hitting performance issues. Granted, you can never really know when that will be, but it is always preferable to work with a dataset that resembles your (projected) production environment.

I will walk through a specific use-case I recently went through while preparing for the beta release of my startup, business platform Octabox (since defunct).

Running the profiler on one of the views in the application produced the following output on my development machine:

SELECT `history`.`id`, `history`.`type`, `history`.`action`, `tags`.`color`, `tags`.`id` AS `tag_id` FROM `history` LEFT JOIN `tags` ON tags.id=history.tag_id AND tags.user_id=1 WHERE (history.user_id=1) AND (history.logged_on >= '2008-09-27 00:00:00') ORDER BY `history`.`logged_on` DESC

Time: 0.00269

SELECT `tasks`.`id`, `tasks`.`task`,`lists`.`id` AS `list_id` FROM `tasks` INNER JOIN `lists` ON lists.id=tasks.list_id WHERE (tasks.done IS NULL) AND (tasks.due IS NOT NULL) AND (tasks.user_id=1) ORDER BY `tasks`.`due` ASC

Time: 0.000592

SELECT COUNT(*) AS `count` FROM `dispatch` WHERE (dispatch.user_id=1) AND (dispatch.status=0)

Time: 0.00044

At first sight it would appear that the first query is our immediate suspect for optimization, though the gains would not be great (it completes in just over 0.002 seconds). Switching to a database I had prepared beforehand with plenty of dummy data (~1.1GB, some tables over 5M rows), the output looks very different:

SELECT `history`.`id`, `history`.`type`, `history`.`action`, `tags`.`color`, `tags`.`id` AS `tag_id` FROM `history` LEFT JOIN `tags` ON tags.id=history.tag_id AND tags.user_id=1 WHERE (history.user_id=1) AND (history.logged_on >= '2008-09-27 00:00:00') ORDER BY `history`.`logged_on` DESC

Time: 0.000426

SELECT `tasks`.`id`, `tasks`.`task`,`lists`.`id` AS `list_id` FROM `tasks` INNER JOIN `lists` ON lists.id=tasks.list_id WHERE (tasks.done IS NULL) AND (tasks.due IS NOT NULL) AND (tasks.user_id=1) ORDER BY `tasks`.`due` ASC

Time: 58.16 //Hmmm… problem?

SELECT COUNT(*) AS `count` FROM `dispatch` WHERE (dispatch.user_id=1) AND (dispatch.status=0)

Time: 0.000322

A query that took just over 0.0005 seconds on a small database took almost a minute to complete on a bigger one. Surprisingly enough, the other queries remained blazingly fast (even faster than on my development machine with the small database, which is a credit to our server’s power).

Optimizing The Errant Query

So I had found a very problematic query, to say the least. Taking 58.16 seconds to complete is obviously not acceptable for any real-time application. Running EXPLAIN against the query revealed the following execution plan:

id  select_type  table  type    possible_keys  key      key_len  ref            rows     Extra
1   SIMPLE       tasks  index   user_id        user_id  9        NULL           1868081  Using where
1   SIMPLE       lists  eq_ref  PRIMARY        PRIMARY  4        tasks.list_id  1

The EXPLAIN results show the most immediate problem with the query – it was performing a join against every row in a 1.8M row table.

Examining the details of the query plan, it appears both selects are using an index, the first one limited by a WHERE clause (on the surface at least, since all rows were scanned). Since the entire table was being scanned very slowly despite using an index, I wanted to see how the query would respond if I removed the index from the picture. Using IGNORE INDEX(user_id), I re-ran the query (this time in the MySQL command line):
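Reconstructed from the task query shown earlier, the hinted version would look something like this (in MySQL, the IGNORE INDEX hint goes right after the table name):

```sql
-- Force MySQL to skip the poorly selective user_id index on tasks
SELECT SQL_NO_CACHE `tasks`.`id`, `tasks`.`task`, `lists`.`id` AS `list_id`
FROM `tasks` IGNORE INDEX (user_id)
INNER JOIN `lists` ON lists.id = tasks.list_id
WHERE (tasks.done IS NULL)
  AND (tasks.due IS NOT NULL)
  AND (tasks.user_id = 1)
ORDER BY `tasks`.`due` ASC;
```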

304 rows in set (1.53 sec)

Wow, that’s a big difference for not using an index. Running EXPLAIN on this reveals:

id  select_type  table  type    possible_keys  key      key_len  ref            rows     Extra
1   SIMPLE       tasks  ALL     NULL           NULL     NULL     NULL           1866244  Using where; Using filesort
1   SIMPLE       lists  eq_ref  PRIMARY        PRIMARY  4        tasks.list_id  1

Despite using a filesort to sort the results, the query was running almost 30 times faster than before. By forcing it to ignore an index that was not selective enough (in fact, not selective at all), the query ran a full table scan instead and completed much faster thanks to a better execution plan.

So we’ve improved greatly, but still not enough for my needs. Having one query take over a second to complete when all the others finish in sub-0.01-second times means it will become a bottleneck as the database grows (remember, the same query completed in just over 0.0005 seconds on a much smaller database).

Examining the structure of the table in question (tasks) revealed another surprise – there was no index on the user_id column at all. So what had happened? It turns out there used to be an index on the user_id column, but it was removed just before all the dummy data was inserted into the database. Running ANALYZE TABLE tasks brought the key information back in line with the current state (no index on user_id).

I wanted to try the original query, this time with an actual index on the filtering column (user_id). After adding the index, I re-ran the query:
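The article doesn’t show the exact statement used; assuming standard MySQL DDL and the column named above, restoring the index would look something like this (the index name is illustrative):

```sql
-- Recreate the missing single-column index on the filtering column
ALTER TABLE `tasks` ADD INDEX `user_id` (`user_id`);
```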

(304 total, Query took 0.0021 sec)

Much better! But I was not done yet. Running EXPLAIN yet again revealed the following execution plan:

id  select_type  table  type    possible_keys  key      key_len  ref            rows  Extra
1   SIMPLE       tasks  ref     user_id        user_id  4        const          368   Using where; Using filesort
1   SIMPLE       lists  eq_ref  PRIMARY        PRIMARY  4        tasks.list_id  1

So indeed, using an actual index was a big help, filtering the set down to only 368 rows before processing the rest of the query. However, you’ll notice it is still running a filesort, which we’d like to avoid. Ideally, I’d set up an index combination that both filters and sorts using the index.

Creating an index composed of both the ‘user_id’ and ‘due’ columns hit the nail on the head, leading to:
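A sketch of that composite index, again assuming standard MySQL DDL (the index name is mine): with (user_id, due), MySQL can jump straight to the user’s rows and read them already ordered by due, so no separate filesort pass is needed.

```sql
-- Composite index: equality filter on user_id first,
-- then rows within each user are stored sorted by due
ALTER TABLE `tasks` ADD INDEX `user_id_due` (`user_id`, `due`);
```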

(304 total, Query took 0.0008 sec)

Not as dramatic as the previous improvements, but still almost three times faster than the last iteration.

Final Words

The entire process took several hours of head scratching to arrive at all the results shown here. I managed to take a query that would cripple our server (58.16 seconds for a single completion!) and turn it into an extremely fast one (0.0008 seconds against a 1.8M-row table), using some basic profiling and an examination of the execution plan.

Though this was an edge case (a non-existent index being used to filter against a 1.8M-row table…), I gained valuable experience on potential pitfalls and on reading between the lines of an execution plan.

Some technical gibberish:

All queries were run with SQL_NO_CACHE to prevent caching from skewing the results. I ran each query at least 5 times to make sure I was getting consistent readings.
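For illustration, the SQL_NO_CACHE modifier goes right after the SELECT keyword; applied to the dispatch query from earlier it would look like:

```sql
-- Bypass the MySQL query cache so repeated runs measure real execution time
SELECT SQL_NO_CACHE COUNT(*) AS `count`
FROM `dispatch`
WHERE (dispatch.user_id = 1) AND (dispatch.status = 0);
```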

The specs of the server machine that ran the queries are as follows:

Intel \ 2.4 GHz 1066FSB – Conroe \ Xeon 3060 (Dual Core)

2 x Generic \ 1024 MB \ DDR2 667 ECC

2 x Maxtor \ 146GB:SAS:10K RPM \ Atlas 10K – SAS

CentOS Enterprise Linux – x86_64 – OS ES 5.0

Running PHP 5.2.6 with MySQL 5.0.45 on Apache 2.2.4
