Five simple ways to tune your LAMP application

Major web properties like Wikipedia, Facebook, and Yahoo! use the LAMP architecture to serve millions of requests a day, while web application software like WordPress, Joomla, Drupal, and SugarCRM uses this architecture to let organizations deploy web-based applications easily.

The strength of the architecture lies in its simplicity. While stacks like .NET and Java™ technology may demand massive hardware, expensive software stacks, and complex performance tuning, the LAMP stack runs on commodity hardware using open source software. Because that software is a loose set of components rather than a monolithic stack, performance tuning can be a challenge: each component needs to be analyzed and tuned on its own.

However, there are several simple performance tasks that can have a huge impact on websites of any size. In this article, we will look at five such tasks designed to optimize LAMP application performance. They should require little, if any, architectural change to your application, making them safe and easy options for maximizing responsiveness while keeping the hardware requirements of your web application in check.

Use an opcode cache

The easiest way to boost the performance of any PHP application (the "P" in LAMP, of course) is to take advantage of an opcode cache. For any website I work with, it's the one thing I make sure is present, since the performance impact is huge (response times are often half of what they are without one). But the big question most people new to PHP have is why the improvement is so drastic. The answer lies in how PHP handles web requests. Figure 1 outlines the flow of a PHP request.

Figure 1. PHP request

Since PHP is an interpreted language rather than a compiled one like C or the Java language, the entire parse-compile-execute sequence is carried out for every request. You can see how this can be time- and resource-consuming, especially when scripts rarely change between requests. After a script is parsed and compiled, it exists in a machine-readable form as a series of opcodes. This is where an opcode cache comes into play: it caches these compiled scripts as a series of opcodes to avoid repeating the parse and compile steps on every request. Figure 2 shows how this workflow looks.

Figure 2. PHP request that utilizes an opcode cache

So when cached opcodes for a PHP script exist, we can skip the parse and compile steps of the PHP request process, execute the cached opcodes directly, and output the results. The cache's checking algorithm takes care of situations where you may have changed the script file: on the first request for the changed script, the opcodes are automatically recompiled and re-cached for subsequent requests, replacing the stale entry.

Opcode caches have long been popular for PHP, with some of the first ones coming about back in the heyday of PHP V4. Today there are a few popular choices that are in active development and being used:

Alternative PHP Cache (APC) is probably the most popular opcode cache for PHP (see Related topics). It is developed by several core PHP developers, with major contributions from engineers at Facebook and Yahoo! that give it its speed and stability. It also sports several other speed improvements for handling PHP requests, including a user cache component we'll look at later in this article.

Wincache is an opcode cache developed most actively by the Internet Information Services (IIS) team at Microsoft® for use only on Windows® with the IIS web server (see Related topics). It was developed predominantly in an effort to make PHP a first-class development platform on the Windows-IIS-PHP stack, since APC was known not to work well there. It is very similar to APC in function and sports a user cache component, as well as a built-in session handler so you can use Wincache directly for session storage.

eAccelerator is a fork of one of the original PHP caches, the Turck MMCache opcode cache (see Related topics). Unlike APC and Wincache, it is only an opcode cache and optimizer, so it does not contain the user cache components. It is fully compatible across UNIX® and Windows stacks, and it is quite popular for sites that don't intend to leverage the additional features APC or Wincache provide. This is often the case if you will be using a solution like memcache to have a separate user cache server for a multi-web server environment.
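Whichever cache you choose, enabling it is usually just a matter of a few php.ini directives. Here is a minimal sketch for APC on a UNIX-style system (the shared-memory size is illustrative and should be sized to your codebase):

extension = apc.so
apc.enabled = 1
; Shared memory segment that holds the compiled opcodes and the user cache
apc.shm_size = 64M
; Keep stat checks on so edited scripts are recompiled automatically
apc.stat = 1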

Without a doubt, an opcode cache is the first step in speeding up PHP by removing the need to parse and compile a script on every request. Once this first step is completed, you should see an improvement in response time and server load. But there is more you can do to optimize PHP, which we'll look at next.

Optimize your PHP setup

While implementing an opcode cache delivers the biggest single performance win, there are a number of other tweaks you can make to optimize your PHP setup, based upon the settings in your php.ini file. These settings are more appropriate for production instances; on development or testing instances, you may not want to make these changes, since they can make it more difficult to debug application issues.

Let's take a look at a few items that are important to help performance.

Things that should be disabled

There are several php.ini settings that should be disabled, since they are often used for backward-compatibility:

register_globals — This functionality was on by default before PHP V4.2; it automatically assigns incoming request variables to normal PHP variables. Beyond the major security issues (unfiltered incoming request data gets mixed with normal PHP variable content), there is also the overhead of doing this on every request. So turning this off keeps your application safer and improves performance.

magic_quotes_* — This is another relic of PHP V4, where incoming form data was automatically escaped. It was designed as a security feature to help sanitize incoming data before it is sent to a database, but it isn't very effective, since it doesn't protect against the more common types of SQL injection attacks. Since most database layers support prepared statements that handle this risk much better, turning this off removes another annoying performance drag.

always_populate_raw_post_data — This is really only needed if, for some reason, you need to look at the entire payload of the incoming POST data unfiltered. Otherwise, it just keeps a duplicate copy of the POST data in memory, which isn't needed.
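In php.ini terms, disabling all three comes down to a few lines. A minimal sketch (on PHP V5.4 and later, register_globals and magic_quotes are gone entirely, so these lines only matter on older versions):

register_globals = Off
magic_quotes_gpc = Off
magic_quotes_runtime = Off
always_populate_raw_post_data = Off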

Disabling these options on legacy code can be risky, however, since the code may depend on them being set in order to run properly. Any new code should be developed without depending on these options, and you should look for ways to refactor your existing code away from using them if possible.
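For the escaping that magic_quotes_* tried to provide, the usual replacement is a prepared statement. Here is a minimal PDO sketch (the connection details, table, and column are placeholders, not part of any particular application):

<?php
$conn = new PDO('mysql:dbname=testdb;host=127.0.0.1', 'dbuser', 'dbpass');

// The bound placeholder keeps user input out of the SQL text entirely,
// so no manual escaping (and no magic_quotes) is needed.
$stmt = $conn->prepare('SELECT id, name FROM users WHERE email = :email');
$stmt->execute(array(':email' => $_POST['email']));
$user = $stmt->fetch(PDO::FETCH_ASSOC);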

Things that should be enabled or have their settings tweaked

There are some good performance options you can enable in the php.ini file to give your scripts a bit of a speed boost:

output_buffering — You should make sure this is on, since it flushes output back to the browser in large chunks rather than on every echo or print statement; the latter can significantly slow down your request response time.

variables_order — This directive controls the order of EGPCS (Environment, Get, Post, Cookie, and Server) variable parsing for the incoming request. If you aren't using certain superglobals (such as environment variables), you can safely remove them to gain a small speedup from not having to parse them on every request.

date.timezone — This directive was added in PHP V5.1 to set the default time zone for the DateTime functions introduced at the same time. If you don't set it in the php.ini file, PHP makes a number of system calls to figure out the time zone, and in PHP V5.3 a warning is emitted on every request.
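Putting these together, the corresponding php.ini lines might look like the following sketch (the buffer size and time zone are illustrative; use values that match your application and locale):

output_buffering = 4096
; Drop "E" if you don't need environment variables parsed on every request
variables_order = "GPCS"
date.timezone = "America/New_York"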

These are considered "low-hanging fruit" in terms of settings that should be configured on your production instance. There is one more thing you should look at as far as PHP is concerned: the use of require() and include() (as well as their siblings require_once() and include_once()) in your application. Optimizing your PHP configuration and code around these calls prevents unneeded file status checks on every request, which can slow down response times.

Manage your require()s and include()s

File status calls (meaning calls made to the underlying file system to check for the existence of a file) can be quite costly in terms of performance. One of the biggest culprits of file stats is the require() and include() statements, which are used to bring code into your script. The sibling calls require_once() and include_once() can be even more problematic, as they not only need to verify the existence of the file, but also that it hasn't been included before.

So what's the best way to deal with this? There are a few things you can do to speed this up.

Use absolute paths for all require() and include() calls. This makes it clear to PHP exactly which file you wish to include, so it doesn't have to search the entire include_path for your file (see the sketch after this list).

Keep the number of entries in the include_path low. This helps in situations where it's difficult to provide an absolute path for every require() and include() call (often the case in large, legacy applications), because PHP won't have to check locations where the file you are including can't be.
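As an illustration of the first pointer, anchoring an include to the current file's directory avoids the include_path search entirely. A small sketch (the lib/db.php file name is hypothetical):

<?php
// Relative include: PHP may have to search every entry in include_path.
//     require_once 'lib/db.php';

// Absolute include: the path is resolved directly, with no include_path scan.
require_once __DIR__ . '/lib/db.php';

// On PHP versions before 5.3, dirname(__FILE__) provides the same base path.
//     require_once dirname(__FILE__) . '/lib/db.php';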

APC and Wincache also have mechanisms for caching the results of file status checks made by PHP, so repeated file-system checks are not needed. They are most effective when you keep your include file names static rather than variable-driven, so it's important to try to do this whenever possible.

Optimize your database

Database optimization can become a pretty advanced topic quickly, and I don't have nearly the space here to do it full justice. But if you are looking to optimize the speed of your database, there are a few first steps you should take that address the most common issues encountered.

Put the database on its own machine

Database queries can become quite intense on their own, often pegging a CPU at 100 percent while running simple SELECT statements against reasonably sized data sets. If your web server and database server are both competing for CPU time on a single machine, requests will definitely slow down. Thus, I consider it a good first step to put the web server and database server on separate machines, and to make sure the database server is the beefier of the two (database servers love lots of memory and multiple CPUs).

Properly design and index tables

Probably the biggest issues with database performance come as a result of poor database design and missing indexes. SELECT statements are overwhelmingly the most common queries run in a typical web application. They are also the most time-consuming queries run on a database server, and they are the most sensitive to proper indexing and database design, so look to the following pointers for optimal performance.

Make sure each table has a primary key. This provides the table a default order and a fast way to join the table against other tables.

Make sure any foreign keys in a table (that is, keys that link a record to a record in another table) are properly indexed. Many databases will enforce constraints on these keys automatically, so that the value actually matches a record in the other table, which helps here.

Try to limit the number of columns in a table. Too many columns in a table can make the scan time for queries much longer than if there are just a few columns. In addition, if you have a table with many columns that aren't typically used, you are also wasting disk space with NULL value fields. This is also true with variable size fields, such as text or blob, where the table size can grow much larger than needed. In this case, you should consider splitting off the additional columns into a different table, joining them together on the primary key of the records.
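Putting the first two pointers together, an illustrative MySQL schema might look like the following sketch (the accounts and leads tables echo the example queries later in this article; the columns and engine choice are assumptions):

CREATE TABLE accounts (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(150) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE leads (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    account_id INT UNSIGNED NOT NULL,
    name       VARCHAR(150) NOT NULL,
    PRIMARY KEY (id),
    -- index the foreign key so joins back to accounts stay fast
    KEY idx_leads_account_id (account_id),
    CONSTRAINT fk_leads_account FOREIGN KEY (account_id) REFERENCES accounts (id)
) ENGINE=InnoDB;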

Analyze the queries being run on the server

The best tool for improving database performance is to analyze what queries are being run on your database server and how long they take. Just about every database has tools for doing this. With MySQL, you can take advantage of the slow query log to find the problematic queries. To use it, set slow_query_log to 1 in the MySQL configuration file, and set log_output to FILE to have the entries written to the file hostname-slow.log. The long_query_time threshold controls how many seconds a query must run before it counts as a "slow query"; I'd recommend starting at 5 seconds and moving it down toward 1 second over time, depending upon your data set. A sketch of the configuration follows, and if you then look at the log file, you'll see the queries detailed as in Listing 1.
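A minimal sketch of the relevant my.cnf lines (these variable names apply to MySQL 5.1 and later; the 5-second threshold is just the starting point suggested above):

[mysqld]
slow_query_log  = 1
log_output      = FILE
long_query_time = 5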

Listing 1. MySQL slow query log

/usr/local/mysql/bin/mysqld, Version: 5.1.49-log, started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock
Time                 Id Command    Argument
# Time: 030207 15:03:33
# User@Host: user[user] @ localhost.localdomain [127.0.0.1]
# Query_time: 13  Lock_time: 0  Rows_sent: 117  Rows_examined: 234
use sugarcrm;
select * from accounts inner join leads on accounts.id = leads.account_id;

The key thing to look at is Query_time, which shows how long the query took. Also look at the Rows_sent and Rows_examined values, since they can point to queries that are written incorrectly, examining or returning too many rows. You can delve deeper into how a query is executed by prepending EXPLAIN to it, which returns the query plan instead of the result set, as shown in Listing 2.

Listing 2. MySQL EXPLAIN results

mysql> explain select * from accounts inner join leads on accounts.id = leads.account_id;
+----+-------------+----------+--------+--------------------------+---------+---------+---------------------------+------+-------+
| id | select_type | table    | type   | possible_keys            | key     | key_len | ref                       | rows | Extra |
+----+-------------+----------+--------+--------------------------+---------+---------+---------------------------+------+-------+
|  1 | SIMPLE      | leads    | ALL    | idx_leads_acct_del       | NULL    | NULL    | NULL                      |  200 |       |
|  1 | SIMPLE      | accounts | eq_ref | PRIMARY,idx_accnt_id_del | PRIMARY | 108     | sugarcrm.leads.account_id |    1 |       |
+----+-------------+----------+--------+--------------------------+---------+---------+---------------------------+------+-------+
2 rows in set (0.00 sec)

The MySQL manual dives much deeper into the topic of the EXPLAIN output (see Related topics), but the big thing I look for is places where the type column is ALL, since that means MySQL is doing a full table scan rather than using a key for the lookup. These are the places where adding indexes will most significantly help query speed.
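When EXPLAIN shows a full scan on a table you are filtering or joining on, the usual remedy is an index on the column used in the join or WHERE clause. A sketch against the example above (the index name is arbitrary, and whether the optimizer uses it depends on your query and data):

ALTER TABLE leads ADD INDEX idx_leads_account_id (account_id);

-- Re-run the EXPLAIN afterward; the goal is to see the index show up in the
-- key column and the rows estimate drop for that table.
EXPLAIN SELECT * FROM accounts INNER JOIN leads ON accounts.id = leads.account_id;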

Effectively cache data

As we saw in the previous section, databases can easily be the biggest pain point of performance in your web application. But what if the data you are querying doesn't change very often? In this case, it may be a good option to store those results locally instead of calling the query on every request.

Two of the opcode caches we looked at earlier, APC and Wincache, have facilities for doing just this: you can store PHP data directly in a shared memory segment for quick retrieval. Listing 3 provides an example of how to do this.

Listing 3. Example of using APC for caching database results

<?php
function getListOfUsers()
{
    $list = apc_fetch('getListOfUsers');
    if ( empty($list) ) {
        // Cache miss: run the query and store the result for the next request.
        $conn = new PDO('mysql:dbname=testdb;host=127.0.0.1', 'dbuser', 'dbpass');
        $sql  = 'SELECT id, name FROM users ORDER BY name';
        foreach ($conn->query($sql) as $row) {
            $list[] = $row;
        }
        apc_store('getListOfUsers', $list);
    }
    return $list;
}

We only need to run the query once. Afterward, we push the result into the APC user cache under the key getListOfUsers. From then on, until the cache entry expires, you can fetch the result array directly out of the cache, skipping the SQL query entirely.
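Note that apc_store() also accepts an optional time-to-live, in seconds, as a third argument, so an entry can expire on its own rather than living until the cache is cleared or the server restarts. For example, apc_store('getListOfUsers', $list, 600); would keep the list for roughly ten minutes (the value is illustrative; pick one that matches how often the underlying data changes).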

APC and Wincache aren't the only choices for a user cache; memcache and Redis are other popular choices that don't require you to run the user cache on the same server as the Web server. This gives added performance and flexibility, especially if your web application is scaled out across several Web servers.
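As a hedged sketch of the same caching pattern with the Memcached extension (the cache host, key, and the helper that rebuilds the list are placeholders):

<?php
$cache = new Memcached();
$cache->addServer('cache1.example.com', 11211); // hypothetical dedicated cache box

$list = $cache->get('getListOfUsers');
if ($list === false) {
    // Cache miss: rebuild the list (for example, with the query from Listing 3)
    // and store it for ten minutes.
    $list = getListOfUsersFromDatabase(); // hypothetical helper
    $cache->set('getListOfUsers', $list, 600);
}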

Conclusion

In this article, we looked at five simple ways to tune your LAMP application for better performance: at the PHP level, by leveraging an opcode cache and optimizing the PHP configuration; at the database level, by improving design and indexing; and at the application level, by leveraging a user cache (using APC as an example) to avoid repeated database calls when the data doesn't change very often.

Related topics