Régis Leroy is a web performance and security auditor in France. In the first part of this two-part tutorial, he walks through what makes an HTTP server tick and gives tips on how to get more out of Apache. In the second part, he discusses strategies for adapting Apache behaviors to nginx.

1 Introduction

We'll use a PHP application, Drupal, as the base example of how a migration from Apache to Nginx can work, but a lot of the information presented here can be applied to other web application technologies.

To make the example more realistic, we will use the Drupal CMS as a representative PHP application. Not because this application is the best of all existing applications, but because Drupal 7, as a semi-object, semi-procedural, entirely-stored-in-document-root piece of software, is a good example of a technical-debt-laden PHP application where a good HTTP server configuration can have a great impact. I'm pretty sure you will find valuable things in this study even for other PHP applications.

This is also an advanced study of Drupal on Apache, as you will not always have the option of migrating to Nginx, but you should always scrutinize your web server configuration. And showing an advanced Nginx configuration without showing the equivalent on Apache would be unfair.

1.1 Avoid the big mistakes

This text is quite long. Some of the content is Drupal-oriented, and we'll talk about Apache, PHP-FPM, and Nginx. If you do not have time to read it in its entirety, you should at least know a few things you should not do when migrating a PHP application to Nginx:

- You should not try to translate mod_rewrite rules to Nginx rewrite rules; avoid the online converters. try_files and locations will solve most of your problems.

- You should not map all .php extensions in locations to PHP execution; use that to debug PHP execution, but never allow PHP execution for all locations which contain this extension.

- You should not trust configuration examples of 10 or 15 lines that claim to solve all the problems. Real-world configurations have more things to handle than just being a transparent proxy between the browser and your PHP engine.

I think the right way of doing such a migration is to understand how HTTP servers work, and to understand the needs of your PHP application. So let's dive into the details.

1.2 Drupal with Nginx

If you have a Drupal project and want to test Nginx, you need to know about the drupal-with-nginx GitHub project by @perusio. This project contains a very large Nginx-for-Drupal configuration. It is in fact a set of different configurations, where you need to read the files and comment or uncomment lines to match your needs. There are a huge number of good recipes in this project and we will use some of them in this document.

It can be quite hard for beginners to understand all these rules, and it will be even harder to add specific rules if you do not understand how these settings work. We will use a simpler version in this document, with less features, to get something that you can manage and understand completely.

1.3 HTTP 101

An HTTP server works on HTTP requests and acts on several different levels. We need to be sure that we use the same words for the same things. It's very easy to work on HTTP web applications without learning the details, as we have a lot of good abstractions in place, but if you need to alter a web server configuration it's better to ensure you have the right word for each piece.

We start with one important thing, the URL (or URI). For those who really need details (others can skip to the schema below), a URI is a Uniform Resource Identifier on the network; a URI can be a URL (Uniform Resource Locator) or a URN (with an N for Name), but a URN provides nothing to actually fetch the requested resource. In the HTTP world our URI is a URL because it contains a protocol (http or https) and this is a valid way to get the final resource.

https://www.example.com/some/path/file.php?foo=bar&response=42
\_____/ \_____________/\_________________/\___________________/
protocol     domain          location           query string

Note here that the query string is a very special part of a URL; both Apache and Nginx are hard to use for query string analysis. Your application should use locations and limit usage of query strings to very specific situations (like cache keys, pagination, etc.). So-called SEO URLs, or "clean URLs" in Drupal parlance, are not just better for search engines; they are also far easier to manage on the HTTP server side.

For example, query strings are not available in mod_rewrite's RewriteRule, only in RewriteCond, and without the URL-decoding phase. They are also not available in Nginx's locations.
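To illustrate the point, here is a minimal Nginx sketch (the /search location and the X-Paginated header are invented for this example): a location matches only the decoded path, and any test on the query string has to go through the $args / $arg_name variables.

```nginx
# A request for /search?page=2 matches this location on its path only;
# the query string never takes part in location matching.
location /search {
    # Query-string tests use the $arg_* variables (here $arg_page):
    if ($arg_page) {
        add_header X-Paginated "yes";
    }
    try_files $uri /index.php?$query_string;
}
```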

More HTTP vocabulary: requests and responses, bodies and headers. HTTP, if you do not use the SSL layer of HTTPS, is a very simple TCP/IP protocol in clear text. You can test it with simple tools like telnet (this won't be available with HTTP/2, so practice now). Let's make a simple HTTP request to identify the header and body parts:

$ telnet 127.0.0.1 80
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
GET /foo.txt HTTP/1.1    <-- here I start typing: <METHOD> <URL> <PROTOCOL VERSION>
Host: www.example.com    <-- request HEADERS (one per line), here I add one request header
                         <-- empty line, this is the end for a GET request
HTTP/1.1 301 Moved Permanently
Date: Mon, 29 Sep 2014 09:32:53 GMT
Server: Apache/2.4.9 (Unix)
Location: http://example.com/foo.txt
Content-Length: 234
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>

After the request, which was a simple GET request without a body part (the body part is used for POST requests), we received the server response in three parts:

<PROTOCOL VERSION> <STATUS> <TEXT STATUS>
<Several lines of server response headers
 ...
 ...>
BLANK LINE
<The response BODY>

In this example the status is a redirection, so we have a Location header. Note that the response body can start with a line containing a number (the chunk size, in hexadecimal) in the case of chunked content.
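As an illustration (not taken from the trace above), a chunked response interleaves hexadecimal chunk sizes with the data, and a zero-sized chunk marks the end:

```http
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

4
Wiki
5
pedia
0

```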

On the request side, a POST request can contain a body; a GET cannot, but both can contain query string arguments. The real difference between these methods is that no GET request should be allowed to make any change server-side: never, not even closing the session.

This was a very simple example. If you make web applications you should really know some of the HTTP response status codes; this is the minimum list:

- 200 Success

- 302 Temporary redirection

- 301 Permanent redirection (never start with that; test with 302 first)

- 400 Client request error

- 401 Authentication required

- 403 Access forbidden

- 404 Page not found

- 500 Server error (misconfiguration of the HTTP server or an unhandled application error)

- 502 Bad gateway or proxy error, which may be a timeout or error from a proxied server, or from a delegated service like php-fpm

- 504 Gateway timeout. Almost the same as 502; in case of a timeout we should get a 504, but it's not always the case in reality (see 502).

Make it a habit to track HTTP communication with your favorite browser's developer tools.

When working on the HTTP server configuration you will especially love the "copy as curl" request extraction provided by Chrome; it will help you replay a connected session on the command line if you need some cookies.

Using the -i flag (show response headers) with curl, you will be able to track this full communication without a browser. This can help you debug a hard-to-reach HTTP server (when load balancers, virtual IPs and firewalls are in play it's sometimes hard to know whom you are talking to with your browser). See how we altered the Accept-Encoding header to avoid receiving a gzip-compressed response in the terminal:

$ curl -i 'http://www.example.com/' \
    -H 'Pragma: no-cache' \
    -H 'Accept-Encoding: none' \
    -H 'Accept-Language: en-US,en;q=0.8,fr;q=0.6,id;q=0.4' \
    -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36' \
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
    -H 'Referer: http://www.example.com/foo' \
    -H 'Cookie: zorglub=37; zorglub2=37; NO_CACHE=Y; SSESS09f4b7b1aacb8517c3f60c929221eee5=CtfD9-VFa6nxoB0Xe4QFU3IiqONZmPoAOZv1VeFXpt4; has_js=1' \
    -H 'Connection: keep-alive' \
    -H 'Cache-Control: no-cache' \
    127.0.0.1

We'll see the Keepalive feature later; of course some even more advanced features exist (like tunneling or ranges), but this is just a survival kit.

One last thing to know about is cookies. HTTP is a stateless protocol; every request is independent of the previous one. To track session state, or persist information across the communication, the server can add a Set-Cookie header to the response.

The browser will then store this cookie information (one or more cookies, each with a name, a valid domain or subdomain, a valid path, and a duration). On the next request to the server, if the domain name, path and duration match, the browser will add the Cookie header to the request.
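A sketch of this exchange, with an invented cookie name and value:

```http
HTTP/1.1 200 OK
Set-Cookie: SESSID=abc123; Domain=www.example.com; Path=/; Max-Age=3600

(... later, the next request from the same browser ...)

GET /next-page HTTP/1.1
Host: www.example.com
Cookie: SESSID=abc123
```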

These cookies are mostly used by applications (to load session data in PHP, for example), but they can have some impact on the HTTP server side. If you have a Vary header in the response with the Cookie keyword, it means the page content could vary based on the presence or absence of a cookie in the request. Most reverse proxies will avoid serving cached pages if you have a cookie in the request.

We'll keep things simple here, but both web developers and admins managing HTTP servers should really understand cookie issues.

2 HTTP servers

2.1 The HTTP server mission

Both Apache and Nginx are HTTP servers. While studying the migration from one of these servers to the other, we'll try to better understand the role this piece plays in a web application.

A web application is based upon HTTP and XmlHTTP requests (when using AJAX), usually limited to GET, POST and HEAD requests. At a very simple first look at the interactions between the PHP code and the HTTP server, we could say that the PHP code starts its job after a request has been completely received by the HTTP server. The application is then responsible for building the response content, sometimes altering some HTTP headers. After that, the HTTP server takes the response and sends it to the requester.

But this is really too simple. The job of the HTTP server is in fact:

- Managing all incoming requests. The main difference with the PHP part is that here we work with parallel user requests: a lot of requests could be present at the same time.

- Handling the TCP/IP connection, sometimes with a keep-alive duration.

- Identifying the requested domain name, and using the right configuration for this domain.

- Managing the protocol (usually http or https).

- Mapping the requested location (which is URL-decoded) to the real filesystem directories and file tree; let's call that file mapping. Here we will encounter symbolic links and aliases.

- Sending back the requested file (usually this is the final step for static files). This could be done using chunks or ranges.

- Managing errors, and sometimes a first level of authentication (HTTP authentication, the ugly little login/password popup).

- Managing internal redirections on some locations or events (like the index.php fallback redirector).

- Launching external applications, effectively delegating the response generation (CGI, PHP modules, FastCGI connectors).

- Behaving as a proxy for another HTTP server.

- Some advanced things like load balancing, reverse proxy caching, etc.

- Compression of the response, negotiated with the client (from the Accept-Encoding header).

- Usually adding some security around the application, with custom HTTP headers, dedicated modules, or even just by enforcing the application contexts (like avoiding PHP code execution from uploaded images).

In my mind the HTTP server is an important piece of the web application, especially in production environments. But it's not an easy part, and most developers just rely on 'magic' things from the beard-team to get the application live in production. My goal here is to take a progressive look at how things work in Apache and Nginx so you will be able to choose one tool for good reasons.

2.2 Managing incoming requests

This is one of the biggest differences between Apache and Nginx. Nginx was created to solve problems encountered by older web servers like Apache in the way they handle massive numbers of incoming requests.

2.2.1 Apache MPMs: prefork, worker and event

On the Apache side, the classical way of managing incoming requests is the prefork MPM. The acronym MPM means 'Multi-Processing Module'. With the prefork model, Apache manages several child processes, and each process can handle one HTTP request at a time. Forking a new process is a slow task, so the child processes are created in advance (pre-forked). If you know your server can handle 100 Apache processes (because you have checked the average memory size of these processes), it's effectively a good idea to always have this number of processes awaiting orders or working. You could also let Apache increase or decrease the number of children, but that's a dangerous thing: if a big number of HTTP requests comes in within a small amount of time, Apache will be very slow at creating enough forks to manage all these requests.

Even with a low number of activated modules in Apache you will see that the memory footprint of these forks is big. If we take our Drupal PHP application, running with the Apache mod_php module, each fork could even grow up to the PHP memory_limit setting. This setting is usually high with Drupal (128 megabytes or more), because rebuilding all wiped caches can take a lot of memory. Fortunately you should never have all of your Apache processes rebuilding the caches at the same time. But Drupal is usually quite bad at managing memory, especially if you use a lot of views, and in the end you could get an average of 30 or 40 megabytes (or more) per Apache process. So if you need 100 processes, able to manage 100 parallel HTTP requests, that's 100 * 40 MB, around 4 gigabytes.

With the prefork model you cannot manage a big number of parallel requests. Then comes the worker MPM. With this MPM Apache manages fewer processes (usually 4 or 5 by default) and manages threads inside each process (by default between 25 and 75 threads per process). This allows for more parallel requests; but in the case of mod_php you should not use this MPM. The PHP module, and especially the PHP extensions, have never been completely stable and safe in multi-threaded environments. On most distributions, activating mod_php will automatically activate the prefork MPM.

The main problem is always the memory footprint of PHP inside Apache; it makes things hard to predict and hard to monitor. A good solution is to remove PHP from Apache and run it as an autonomous daemon.

That's the goal of php-fpm. Some people have tried running PHP as CGI without php-fpm; from my own experience this is quite unstable and you may end up with silent crashes of PHP processes (crashed but still running with high CPU). php-fpm is not simply PHP-as-FastCGI: it's a daemon able to manage several pools of applications (each with its own PHP settings and its own pool of processes), and able to perform a graceful reload (without killing running requests). The communication between the HTTP server and the PHP server is then a socket (a UNIX socket or a network socket).

We'll use this same method with Nginx: PHP as an independent php-fpm daemon, since there is no mod_php available on Nginx.
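A minimal sketch of this socket wiring on the Nginx side (the paths are placeholders; the socket must match the listen directive of your php-fpm pool):

```nginx
location = /index.php {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # UNIX socket mode:
    fastcgi_pass unix:/path/to/fpm.sock;
    # network socket mode:
    # fastcgi_pass 127.0.0.1:9000;
}
```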

By using php-fpm instead of mod_php you do not remove the memory problem of your PHP application, but at least this memory problem is no longer inside the HTTP server. And this server does not only handle PHP pages, but also static files and even cached pages (a reverse proxy cache). Nginx can easily be used as a reverse proxy cache, and Apache also has some cache modules, which, in my opinion, are harder to deal with. Another solution for the cache part could be Varnish. This matters especially with Apache because we still have a big problem on the HTTP server side: managing a large number of incoming connections; not only hundreds of connections, but maybe thousands.

With Apache 2.4 the event MPM became stable. This MPM is inspired by the Nginx way of managing incoming requests, and should be able to manage big numbers of incoming requests. But it's a fairly new MPM, maybe not as stable as the prefork model. To understand the event system, or the Nginx way of managing requests, we'll have a look at another problem: persistent connections.

2.2.2 TCP keepalive

With HTTP/1.1 comes the keepalive addition to the HTTP protocol. By default, an HTTP connection is one-shot: one request, one response, then the communication is closed.

But usually after this first exchange you get an HTML file containing links to other resources. These are GET links that the browser will load automatically, asking for some more files: CSS style sheets, JavaScript files, image files, etc.

DOMAIN/LOCATION -----GET----------------> Server
HTML PAGE <------------------------------+
  |
  +-- css links -----GET----------------> Server
  |     CSS FILE <-----------------------+
  |       |
  |       +-- IMG url() -----GET-------> Server
  |             IMG FILE <--------------+
  |
  +-- js links -----GET-----------------> Server
  |     JS SCRIPT <----------------------+
  |
  +-- IMG tags -----GET-----------------> Server
        IMG FILE <-----------------------+

The page could also contain some 30x redirections to a new page, or trigger some AJAX requests.

The fact is that in the first seconds you will usually need more than just one request/response interaction between the browser and the server. Establishing a TCP/IP connection to the server can be quite slow, so the idea is to ask the server for a keepalive mode. The server may accept it, keeping the TCP/IP connection open and waiting for your next HTTP requests.

The server can refuse this keep-open mode, and it can close the connection at any step. But it's a nice feature for the browser (which usually opens several connections to the server anyway, to fetch more things faster -- usually 3).

If KeepAlive is activated on Apache, the default duration of this mode is usually between 5 and 15 seconds. 2 or 3 seconds would be better, because to manage this mode Apache takes one of the available processes (in prefork) or threads (in worker) and keeps it attached to the browser session, usually doing nothing but waiting for new requests. This is especially wasteful if everything was delivered within 2 seconds and the keepalive is set to 15 seconds.
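A hedged example of such a shorter window in the Apache configuration (the values follow the suggestion above; they are not the defaults):

```apache
KeepAlive On
# Close idle kept-alive connections after 2 seconds instead of the
# usual 5-15, so workers are not parked waiting on silent browsers.
KeepAliveTimeout 2
MaxKeepAliveRequests 100
```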

If you do not have a lot of available workers and they spend their time doing nothing but waiting, this is a good way to obtain a denial of service by misconfiguration. One solution is to disable KeepAlive handling on the server.

This KeepAlive was not made to wait for the user to click on the next page.

Then come the long keepalive modes, comet for example. Applications try to maintain a stream between the server and the browser, even adding some push-to-browser features; this enables chats or dynamic server-side controlled page alterations. In these modes the KeepAlive feature of HTTP becomes important.

The whole model of managing browser-to-server connections with dedicated workers is ill-suited both for long-lived connections and for high numbers of parallel connections.

2.2.3 Nginx: event-based handler

The solution was to rethink the way connections are managed. The idea is to use low-level kernel capabilities to loop very fast over a very big number of open connections. When something comes in from one of these connections the task is assigned to a thread; when something is ready to be sent to a connection, it is sent.

But no thread is given the task of managing one dedicated connection and the long periods of silence on this connection. It's like a call center: a robot manages the incoming calls and ensures workers always have things to do, where any silence on the line makes the robot assign the worker to a different customer. A very efficient call center.

Nginx supports several different algorithms for this event-based system; you can choose one, or ask for the best Nginx guess by omitting the use keyword in the events section.
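For instance, forcing the mechanism explicitly (usually unnecessary, as the default guess picks the right one for the platform):

```nginx
events {
    # epoll on Linux; kqueue would be the choice on BSD systems.
    use epoll;
}
```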

You can have several worker processes in Nginx, each one managing its own set of connections. It is usually considered a good idea to run one worker per available CPU core on the server, but you can use auto to let Nginx decide.

Let's look at our first Nginx settings; note how some settings are scoped in the events section.

worker_processes 4;
worker_rlimit_nofile 8192;

events {
    worker_connections 4096;
    multi_accept on;
}

Here we set 4 workers and allow 4096 connections per worker (to client browsers, but also to backends such as proxied servers or php-fpm). The worker_rlimit_nofile setting raises the limit on open file descriptors; the file limit could otherwise prevent managing a big number of connections.

The multi_accept option is there to increase the connection acceptance rate. You will not need it if you do not have high loads.

Nginx has demonstrated that event-based systems can easily handle thousands of parallel connections, and can even manage streamed connections (I'm not saying you should use PHP or Drupal to do streaming or push, even with Nginx).

So if there is at least one reason to study Nginx as an HTTP server, it should be to check how it behaves under a high load of incoming requests. But do not forget that without any cache management, removing the HTTP server bottleneck may simply move the bottleneck to the PHP/MySQL side.

2.3 PHP-FPM

With both Nginx and Apache we'll use PHP via php-fpm. We could try a very secure configuration by using the chroot mode of php-fpm, but this has a lot of impact on PHP features. We'll do without chroot, but at least we will control the part of the filesystem that can be accessed by Drupal by enforcing the open_basedir setting.

I will not alter the main php-fpm.conf file. On most distributions this file loads pools from a directory; what you need to do is remove the default pool from this directory and add yours (something like /etc/php-fpm.d/pool.d/www-example.conf):

Note: comments in these files start with the ; character.

[myproject]
prefix = /path/to/myproject

; network mode
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
; unix socket mode
; listen = /path/to/fpm.sock
listen.owner = my_project_user
listen.group = www-data
;listen.group = nginx
listen.mode = 0660

user = my_project_user
group = www-data
;group = nginx

pm = dynamic
pm.max_children = 100
pm.start_servers = 15
pm.min_spare_servers = 10
pm.max_spare_servers = 30
pm.max_requests = 500
pm.status_path = /status-php
ping.path = /ping
ping.response = pong

request_terminate_timeout = 30s
request_slowlog_timeout = 5s
slowlog = /var/log/php/fpm_$pool.log.slow
chdir = /path/to/myproject/www
catch_workers_output = no

env[HOSTNAME] = $HOSTNAME
env[TMP] = /path/to/myproject/tmp
env[TMPDIR] = /path/to/myproject/tmp
env[TEMP] = /path/to/myproject/tmp
env[DOCUMENT_ROOT] = /path/to/myproject/www

php_value[include_path] = ".:/path/to/myproject/www:/path/to/myproject/www/include"
php_value[open_basedir] = "/path/to/myproject/www:/path/to/myproject/private:/path/to/myproject/tmp"
php_admin_flag[file_uploads] = 1
php_admin_value[upload_tmp_dir] = "/path/to/myproject/tmp"
php_admin_value[upload_max_filesize] = "50M"
php_admin_value[max_input_time] = 120
php_admin_value[post_max_size] = "50M"
php_admin_value[max_input_vars] = "1000"
php_admin_value[suhosin.post.max_vars] = "1000"
php_admin_value[suhosin.request.max_vars] = "1000"
php_admin_value[error_log] = "/var/log/php/fpm.$pool.log"
php_admin_value[log_errors] = 1
php_admin_value[display_errors] = 1
php_admin_value[html_errors] = 0
php_admin_value[display_startup_errors] = 0
php_admin_value[define_syslog_variables] = 1
php_value[error_reporting] = 6143
php_value[max_input_time] = "120"
php_value[max_execution_time] = "30"
php_value[memory_limit] = "128M"
php_value[session.gc_maxlifetime] = 3600
php_admin_value[session.gc_probability] = 1
php_admin_value[session.gc_divisor] = 100
php_admin_value[magic_quotes_gpc] = 0
php_admin_value[register_globals] = 0
php_admin_value[session.auto_start] = 0
php_admin_value[mbstring.http_input] = "pass"
php_admin_value[mbstring.http_output] = "pass"
php_admin_value[mbstring.encoding_translation] = 0
php_admin_value[expose_php] = 0
php_admin_value[allow_url_fopen] = 1
php_admin_value[safe_mode] = 0
php_admin_value[variables_order] = PGCSE
php_admin_value[cgi.fix_pathinfo] = 1
php_admin_value[cgi.discard_path] = 0

This is a pool configuration example, using a temporary directory in the project tree and not the shared /tmp. The open_basedir setting must include the temporary directory, and the Drupal private files directory if it is not inside the web directory (which is really better).

We will not do too much fine tuning on the PHP side, as we have enough things to check on Apache and Nginx. But you could at least add APC (on very old PHP versions) or the newer OPcache opcode cache. Ensure you have an ini file loaded with opcache settings:

opcache.enabled = 1
opcache.enable_cli = 1
opcache.memory_consumption = 128
opcache.interned_strings_buffer = 32
opcache.max_accelerated_files = 4000
opcache.max_wasted_percentage = 5
opcache.use_cwd = 1
opcache.validate_timestamps = 1
opcache.revalidate_freq = 30
opcache.revalidate_path = 0
opcache.save_comments = 1
opcache.load_comments = 0
opcache.fast_shutdown = 0
opcache.enable_file_override = 1
opcache.optimization_level = "0xffffffff"
opcache.blacklist_filename = ""
opcache.max_file_size = 0
opcache.force_restart_timeout = 180
opcache.error_log = "/var/log/php/opcache_error.log"
opcache.log_verbosity_level = 1

The important opcache settings here are revalidate_freq, which means files are checked for modification only every 30 seconds (so be patient when you alter the settings files, or reload PHP); and blacklist_filename, where I do not add the settings filename. The settings file is accessed on every request, so I prefer waiting 30 seconds rather than bypassing the opcode cache on every access to this file.

3 Drupal: try to get more from Apache

3.1 Some PHP files, some .htaccess files

The default Apache configuration shipping with PHP applications like Drupal is a set of .htaccess files. There are several reasons for that:

- Drupal should be able to run on hosted environments, where you have very little control of the HTTP server configuration.

- Drupal could run on prefixed directories or on dedicated domain names (even managing multisites based on these names)

- Drupal is built like all old PHP applications, with the whole source code inside the DocumentRoot.

We have a PHP application with, by default, all directories and files reachable via HTTP requests. The default security is then based on directories, and the default way to add configuration to directories is .htaccess files, which are bad. More on that later.

Another problem with a default Drupal is that you have a lot of PHP files available in the DocumentRoot (even with strange extensions like .inc or .module); one of them is very important:

- index.php : The central application bootstrapper.

Others are less important:

- install.php : If you install from the browser instead of using drush site-install .

- update.php : If you update from the browser instead of using drush updb .

- cron.php : If you run asynchronous tasks with HTTP requests instead of using drush .

- xmlrpc.php : If you allow incoming web services.

- modules/statistics/statistics.php : If you activate the AJAX statistics, this is the AJAX bootstrapper used.

A lot of other PHP files are present but never need to be requested directly. The bootstrapper scripts listed above are the only ones that should ever be requested (unless you use a very specific module).
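One way to enforce that on Apache is an execution allow-list. This is a sketch assuming php-fpm behind mod_proxy_fcgi (Apache 2.4.10+) with a placeholder socket path, not the article's final configuration:

```apache
# Only the known Drupal bootstrappers are handed to PHP;
# any other *.php file falls through to the default (non-PHP) handler.
<FilesMatch "^(index|install|update|cron|xmlrpc)\.php$">
    SetHandler "proxy:unix:/path/to/fpm.sock|fcgi://localhost"
</FilesMatch>
```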

Things are in fact worse than that. Even if most of the PHP scripts coming with Drupal are harmless, having a PHP file uploaded and executed could get you in trouble. So some other .htaccess files are added by Drupal in several places (private directory, temporary directory, public files directory), containing variations on this:

Deny from all    <-- not in default/files
# Turn off all options we don't need.
Options None
Options +FollowSymLinks
# Set the catch-all handler to prevent scripts from being executed.
SetHandler Drupal_Security_Do_Not_Remove_See_SA_2006_006
<Files *>
  # Override the handler again if we're run later in the evaluation list.
  SetHandler Drupal_Security_Do_Not_Remove_See_SA_2013_003
</Files>
# If we know how to do it safely, disable the PHP engine entirely.
<IfModule mod_php5.c>
  php_flag engine off
</IfModule>

So, as you can see, a lot of things are done to prevent PHP execution there.

In my mind you have a few places where PHP execution is dangerous: sites/all/libraries, and sometimes sites/all/modules or sites/all/themes (and many more). Libraries come with demos and examples; JavaScript libraries, for example, can come with very bad PHP files, where by bad I mean XSS vectors or even remote code execution vectors.

On one hand we have a limited number of legitimate PHP entry points; on the other, several areas at risk of unwanted PHP execution. My solution is to alter the HTTP server configuration to only execute a very limited list of PHP files. If you are building your own application you should study opportunities to store most of your code outside of the web document root; just keep in that root the real assets (js, images, css) and one index.php bootstrapper. But this is quite hard to achieve with Drupal.

One other thing that should be done is removing the usage of .htaccess files. From the Nginx point of view these files are nothing: Nginx does not understand .htaccess files, and does not provide anything similar.

From the Apache point of view a .htaccess file is a piece of Apache httpd configuration applied to a directory; it's (almost) the same thing as a <Directory /path/to> instruction in the main configuration.

Checking for dynamic .htaccess files in each directory and parent directory on every request is something that slows down the HTTP server. An efficient configuration should not be altered by per-directory files that could change at any time.

This is done with AllowOverride None in Apache (set in <Directory /> ) so this no- .htaccess instruction starts at the filesystem root.

We will still need to read the .htaccess to understand the things inside and rewrite that for main configuration files, either for Nginx or Apache.
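As a first step, you can inventory every .htaccess that will need translating. A quick sketch (a throwaway tree is built here so the command is self-contained; point find at your real docroot instead):

```shell
# Inventory the .htaccess files that must be translated into the main
# configuration. The temporary tree stands in for a real Drupal docroot.
demo=$(mktemp -d)
mkdir -p "$demo/sites/default/files"
touch "$demo/.htaccess" "$demo/sites/default/files/.htaccess"
count=$(find "$demo" -name '.htaccess' | wc -l)
echo "$count .htaccess files to translate"
rm -rf "$demo"
```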

3.2 Analysis of the default .htaccess

#
# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)(~|\.sw[op]|\.bak|\.orig|\.save)?$|^(\..*|Entries.*|Repository|Root|Tag|Template)$|^#.*#$|\.php(~|\.sw[op]|\.bak|\.orig\.save)$">
  Order allow,deny
</FilesMatch>

Here, a list of file extensions that should not be served for direct external requests (URIs) is established. Here we see the problems that stem from having everything stored inside the DocumentRoot. This will have to be done on the Nginx side as well.
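A rough Nginx counterpart could look like this (a sketch only; adapt the pattern to the exact extension list you need):

```nginx
# Refuse direct requests for source, backup, and template extensions
# anywhere under the document root.
location ~* "\.(engine|inc|info|install|make|module|profile|test|po|sh|sql|theme|tpl(\.php)?|xtmpl)$|~$|\.(bak|orig|save|sw[op])$" {
    deny all;
}
```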

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

This removes the Indexes option from the autoindex module. Without it, accessing any directory which does not contain a valid index file would generate a directory listing. When using Apache I usually remove the mod_autoindex module entirely; on Nginx you simply need to avoid the autoindex directive.

# Follow symbolic links in this directory.
Options +FollowSymLinks

By default Apache will not follow symbolic links. This option is great; you could use it to replace the sites/default directory with a symbolic link to another directory tree. sites/default will contain your settings files, with maybe some alterations, and all uploaded/contributed content in sites/default/files.

It can make your life easier to manage this directory outside of the main code-and-assets directories when upgrading the website's code. sites/default/files is also the only directory where the PHP user needs write access (that is the Apache user or group with mod_php, but it could be a different one with php-fpm). Moving it to a different place can help you manage the big differences between this directory and the others.
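The symlink setup can be sketched like this (throwaway temporary paths stand in for your real project layout):

```shell
# Sketch: keep the files directory outside the release tree and symlink it
# into place, so code upgrades never touch uploaded content.
release=$(mktemp -d)    # stands in for <docroot>/sites/default
shared=$(mktemp -d)     # stands in for the out-of-tree storage for uploads
ln -s "$shared" "$release/files"
ls -ld "$release/files" | cut -c1    # prints "l": it is a symbolic link
```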

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

This sends all 404s to Drupal, so Drupal can track them and render nice error messages. It can also slow down your website, launching a PHP-MySQL dynamic application for every 404. You could instead generate a static HTML error page for 404s and remove this internal redirection.
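If you go the static route, the directive is a one-liner (the path to the pre-generated page is an example, not a Drupal convention):

```apache
# Serve a pre-generated static page instead of bootstrapping Drupal
# for every 404 (the file path is an example).
ErrorDocument 404 /errors/404.html
```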

# Set the default handler.
DirectoryIndex index.php index.html index.htm

This states that any access to a directory URL (ending with / ) will be internally redirected to the index.php file in this directory, or index.html or index.htm if not found. The htm* fallbacks might be needed by some external libraries or modules added to your Drupal, but I think you can remove these fallbacks (so you would get a 404, because you do not have the Indexes option activated).

With Apache, be careful of the mod_negotiation module and remove the MultiViews option if you have this module loaded. If this option is added somewhere in the main configuration, it could redirect a 404 request to a file with the same name and a known extension -- very insecure.

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_environment_initialize() in
# includes/bootstrap.inc for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_flag magic_quotes_gpc                 off
  php_flag magic_quotes_sybase              off
  php_flag register_globals                 off
  php_flag session.auto_start               off
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_flag mbstring.encoding_translation    off
</IfModule>

Here we have some PHP configuration, set by Apache. If you use Nginx, these settings should go in the php-fpm pool settings. If you stay on Apache, they can remain here or move to the VirtualHost.
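A sketch of the php-fpm pool equivalent (the pool file location varies by distribution, e.g. /etc/php5/fpm/pool.d/www.conf; the php_admin_* variants cannot be overridden by the application at runtime):

```ini
; php-fpm pool equivalent of the php_flag/php_value lines above.
php_admin_flag[magic_quotes_gpc] = off
php_admin_flag[register_globals] = off
php_admin_flag[session.auto_start] = off
php_admin_value[mbstring.http_input] = pass
php_admin_value[mbstring.http_output] = pass
```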

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
  # Enable expirations.
  ExpiresActive On
  # Cache all files for 2 weeks after access (A).
  ExpiresDefault A1209600

  <FilesMatch \.php$>
    # Do not allow PHP scripts to be cached unless they explicitly send cache
    # headers themselves. Otherwise all scripts would have to overwrite the
    # headers set by mod_expires if they want another caching behavior. This may
    # fail if an error occurs early in the bootstrap process, and it may cause
    # problems if a non-Drupal PHP file is installed in a subdirectory.
    ExpiresActive Off
  </FilesMatch>
</IfModule>

This part is only active if the expires module is enabled in Apache. It allows a 2-week cache for assets by default and lets Drupal generate the expiration headers for HTML content (where they will depend on cache_page settings and user cookies).
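A rough Nginx equivalent (a sketch; tune the extension list to your assets): long expiration for static assets only, while PHP responses keep the cache headers Drupal generates.

```nginx
# Two-week cache for static assets; PHP responses are untouched.
location ~* \.(css|js|png|jpe?g|gif|ico|svg)$ {
    expires 14d;
}
```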

And after that we have this frightening part of Apache configuration... mod_rewrite (but mostly comments):

# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Set "protossl" to "s" if we were accessed via https://. This is used later
  # if you enable "www." stripping or enforcement, in order to ensure that
  # you don't bounce between http and https.
  RewriteRule ^ - [E=protossl]
  RewriteCond %{HTTPS} on
  RewriteRule ^ - [E=protossl:s]

  # Make sure Authorization HTTP header is available to PHP
  # even when running as CGI or FastCGI.
  RewriteRule ^ - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

  # Block access to "hidden" directories whose names begin with a period. This
  # includes directories used by version control systems such as Subversion or
  # Git to store control files. Files whose names begin with a period, as well
  # as the control files used by CVS, are protected by the FilesMatch directive
  # above.
  #
  # NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
  # not possible to block access to entire directories from .htaccess, because
  # <DirectoryMatch> is not allowed here.
  #
  # If you do not have mod_rewrite installed, you should remove these
  # directories from your webroot or otherwise protect them from being
  # downloaded.
  RewriteRule "(^|/)\." - [F]

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  #
  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} .
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http%{ENV:protossl}://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  #
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  # RewriteRule ^ http%{ENV:protossl}://%1%{REQUEST_URI} [L,R=301]

  # Modify the RewriteBase if you are using Drupal in a subdirectory or in a
  # VirtualDocumentRoot and the rewrite rules are not working properly.
  # For example if your site is at http://example.com/drupal uncomment and
  # modify the following line:
  # RewriteBase /drupal
  #
  # If your site is running in a VirtualDocumentRoot at http://example.com/,
  # uncomment the following line:
  # RewriteBase /

  # Pass all requests not referring directly to files in the filesystem to
  # index.php. Clean URLs are handled in drupal_environment_initialize().
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^ index.php [L]

  # Rules to correctly serve gzip compressed CSS and JS files.
  # Requires both mod_rewrite and mod_headers to be enabled.
  <IfModule mod_headers.c>
    # Serve gzip compressed CSS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

    # Serve gzip compressed JS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

    <FilesMatch "(\.js\.gz|\.css\.gz)$">
      # Serve correct encoding type.
      Header set Content-Encoding gzip
      # Force proxies to cache gzipped & non-gzipped css/js files separately.
      Header append Vary Accept-Encoding
    </FilesMatch>
  </IfModule>
</IfModule>

The last part is about directly sending compressed files generated by Drupal for CSS and JavaScript files. The content is self-explanatory.

You have one rule making the HTTP Authorization header available for CGI; this can be dropped if you do not use HTTP authorization. You also have two rules setting a protossl env variable, but I think this kind of rule is not needed in most cases. You could comment them out.

The most important part was this one:

# Pass all requests not referring directly to files in the filesystem to
# index.php. Clean URLs are handled in drupal_environment_initialize().
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^ index.php [L]

A rule is always an aggregate of 0 or more RewriteCond followed by a RewriteRule . So this is just one rule and it states:

IF the requested file is not a real file on the filesystem
AND IF the requested file is not a real directory on the filesystem
AND the request URI is not /favicon.ico
THEN rewrite everything to our main bootstrapper index.php

Note that previous versions were redirecting to index.php?q=<original location> , but this has been removed. The q get argument is now emulated in includes/bootstrap.inc with this:

// When clean URLs are enabled, emulate ?q=foo/bar using REQUEST_URI. It is
// not possible to append the query string using mod_rewrite without the B
// flag (this was added in Apache 2.2.8), because mod_rewrite unescapes the
// path before passing it on to PHP. This is a problem when the path contains
// e.g. "&" or "%" that have special meanings in URLs and must be encoded.
$_GET['q'] = request_path();

The request_path function takes the q argument, if it exists, or gets the REQUEST_URI environment variable (fed by the HTTP and PHP layers) with removal of ? and a URL decoding. The comment explains that this allows for management of locations containing & or % .

In my mind you should not have locations containing such characters; the % character is reserved for URL encoding. The & is allowed in locations, but if you configure your Drupal to use paths containing these characters you are a strange person. You will have problems in several places.

It's like using http:// in the location, or even <script> : it could work, but sooner or later one piece of the whole system will break your request. So if you ever have issues in the Drupal router because you use & and % in the location the best way of fixing that will be to fix the application.

If you do not use clean URLs in drupal, the requests will look like:

A - http://www.example.com/index.php?q=my/path

If you do use clean URLs the requests are:

B - http://www.example.com/my/path

Which is rewritten by mod_rewrite to

B1 - http://www.example.com/index.php

And then PHP emulates an incoming URI of :

B2 - http://www.example.com/index.php?q=my/path

These two steps (B1 and B2) could be done in one step (directly rewriting to B2) if you do not have the problem of & and % characters in your locations.
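The one-step form maps naturally to Nginx with try_files (a sketch of the classic Drupal pattern): anything that is not a real file or directory goes straight to index.php with the q argument filled in.

```nginx
# Requests not matching a real file or directory are handed to Drupal.
location / {
    try_files $uri $uri/ /index.php?q=$uri&$args;
}
```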

Drupal also catches requests for missing assets and redirects them internally to index.php . This is used to generate the sites/[default]/files/styles/* resized images. On the first request these files do not exist; the request is redirected to index.php, the files are generated via gd , then the response is sent back and the new file is saved on disk. Future requests will fail the RewriteCond %{REQUEST_FILENAME} !-f test and be served directly by the HTTP server.
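On Nginx the same behavior can be sketched with a dedicated location for image styles: if the derivative already exists on disk it is served directly; otherwise Drupal is bootstrapped to generate it.

```nginx
# First request generates the derivative via Drupal; later requests
# are served straight from disk by try_files.
location ~ ^/sites/.*/files/styles/ {
    try_files $uri /index.php?q=$uri&$args;
}
```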

And...that's it. We have now analyzed most of the features of Drupal for an HTTP server.

3.3 Application settings

Drupal could work without bothering too much on the HTTP server side. It's built for that, like WordPress (which is maybe the best example of such behavior). Take a subdirectory of your default document root, upload the code, add an ugly chmod -R 777 (seriously, never do that), request your server on a valid IP or DNS name, with the subdirectory, and it works. But that's not enough for a serious production installation.

Your job is to make things work well, but only as expected!

The default settings will work, but if you do nothing you will encounter two sorts of big problems. The first one is a performance problem; the second one is about security.

Drupal, as with many PHP applications, can guess at things; but it will be far more secure if you edit and enforce the settings. This will avoid some cases of Drupal working in unexpected ways.

You have several ways of editing settings in Drupal. The most obvious one is to feed the content from the backoffice forms, which will feed the variables table. Your problem then is that these settings are recorded in the database, and when you install another version of your website in another context (production, qa, development, preproduction...) you will have to alter some of these settings.

So the right way of storing any contextual setting is to use the settings file. This file is generated after the installation and is in <docroot>/sites/default/settings.php (or, if you use the multisites feature, <docroot>/sites/<site directory>/settings.php ). Usually I add an include at the bottom of this file, pulling in another local.settings.php, to play well with code version control systems.
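That tail of settings.php can be sketched like this (the local.settings.php name is my convention, not a Drupal default): per-environment overrides live in a file kept out of version control.

```php
// Pull in per-environment overrides if they exist.
$local_settings = dirname(__FILE__) . '/local.settings.php';
if (file_exists($local_settings)) {
  include $local_settings;
}
```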

The settings.php file is a generated file, containing the database credentials, your salt key for passwords, and some other default settings, usually not stored in a version control system.

Any variable set in a settings file will override the database and backoffice form values. These files can be used for contextual variables, but also for any variable that should not be altered via the backoffice.

This is an example of such a settings file (do not just copy paste this to your production server). Some variables have to be set in $conf and some others are global variables. Key names of the $conf array are the variable names (the name used via drush vget , drush vset , or via the variable_get() function in the code).

// set jpeg compression level
$conf['image_jpeg_quality'] = 95;
// disallow the poor-man cron, we do it via drush
$conf['cron_safe_threshold'] = 0;
// Activate the caches for blocks and anonymous pages
$conf['cache'] = 1;
$conf['block_cache'] = 1;
$conf['cache_lifetime'] = 0;
// this will be in Cache-Control: public max-age
$conf['page_cache_maximum_age'] = 21600;
// aggregation and compression of js, css and pages
$conf['preprocess_css'] = 1;
$conf['preprocess_js'] = 1;
$conf['page_compression'] = 0;
$conf['js_gzip_compression'] = 0;
$conf['css_gzip_compression'] = 0;
// reverse proxy settings
$conf['reverse_proxy'] = true;
$conf['reverse_proxy_header'] = 'HTTP_X_FORWARDED_FOR';
$conf['omit_vary_cookie'] = FALSE;
// filesystem
$conf['file_directory_path'] = 'sites/default/files';
$conf['file_public_path'] = 'sites/default/files';
$conf['file_private_path'] = '/path/to/project/private';
$conf['file_temporary_path'] = '/path/to/project/tmp';
$conf['file_directory_temp'] = '/path/to/project/tmp';
$conf['file_chmod_directory'] = 02770;
$conf['file_chmod_file'] = 0660;
umask(0000);
// NO_CACHE cookie from cookie_cache_bypass_advanced module
$conf['cookie_cache_bypass_adv_cache_lifetime'] = 60;
$conf['cookie_cache_bypass_adv_cookie_path'] = 'entire_site';
$conf['cookie_cache_bypass_adv_set_time'] = 'after_validate';
// 1 means some errors get to the end user, 2 means all errors, 0 none.
$conf['error_level'] = 0;
// default domain and prefix: YOU MUST alter that
$base_url = 'http://www.example.com';
$cookie_domain = '.example.com';
// drush hack
$conf['base_url'] = $base_url;

Using settings to manipulate your site configuration is as natural as using the drush command line tool instead of the backoffice interfaces. It does not mean everything should be set on the command line and with configuration files, but all the SysOps tasks should be available from these tools.

One very important setting in this example is the $base_url variable. It is reused when generating the links in your HTML pages -- for example, links to your CSS and JS files.

It is also used when generating the one-time password for people using the 'forgot my password' link. Every production website should have this variable set; check this page for a list of bad things that could be done to your Drupal if you do not enforce this setting. Setting error_level to 0 is also a must on a production website, to avoid information disclosure issues.

3.4 VirtualHosts, domain names, path prefix

Your Drupal website is now building response pages with the right name. But by default, any HTTP request that reaches your server IP could reach your Drupal installation. You need a Virtualhost.

By default you have one VirtualHost on an HTTP server, responding to any requests. The problems you may encounter are:

- You could be handling several websites on the same server.
- You want to have the website on the root URL, not on a path prefix (which is usually only used on dev environments).
- You would like to have one centralized working configuration, independent of any other HTTP application running on the server.
- You want the website to work only for expected domain names.

If your website is www.example.com , it should only be reached on this name. Failure to ensure that will lead to security problems (mostly the same attacks as the ones linked on the $base_url settings link, and some others like the last SA_CORE_2014_003).

The third point is also quite important. If you look at a default Debian-packaged Apache HTTP server, when you add some packaged LAMP application like phpmyadmin you will find some global-scope configuration files added to Apache. Aliases on /phpmyadmin are added on the global scope of the Apache configuration files. So every website running on this server has the /phpmyadmin alias.

This file is included from the main Apache configuration; it should be included from one specific VirtualHost and be available on one domain name (maybe only from localhost).

3.4.1 Configuration scopes

But what is a VirtualHost? You have several levels of scope for configuration settings in Apache (and the same is true for Nginx), and some configuration settings are added, inherited, or overridden at each scope. One of these levels is the VirtualHost.

The VirtualHost is defined by the name and port used. You can define different VirtualHosts on different IPs, domain names (DNS records), or for different ports. If your site is available on both HTTPS and HTTP you have at least two different VirtualHosts, on different ports. The goal is to alter some parts of the configuration at this level.

If you look at Apache or Nginx configurations, you have levels that go from globally-shared settings to very specific settings.

For Apache you have:

- Server config: The global configuration.
- VirtualHost: You work on a subset of valid IP and/or domain names and/or ports.
- Location: You work on the URI, on the location part.
- Directory: You work on the filesystem, the real filesystem tree.
- .htaccess: A subset of Directory.
- File: You work with final files.

On Nginx you have:

- main: This is the same as Server config in Apache.
- events: Where you have the connection management.
- http: A global level for everything related to HTTP.
- server: This is the equivalent of VirtualHost.
- location: Where you work on the location part of the URL (so without the domain and the query string).

If you take any Nginx parameter in the documentation you will find the Context where this parameter is available (here in this example, Context: http, server, location ).

If you do the same on the Apache side you also have the context of this parameter validity (here Context: server config, virtual host, directory, .htaccess ).

In fact, you also have some other contexts in Nginx, like mail for a mail proxy, or nested locations, if, or if in location. The way things are inherited from these contexts is not so simple; once you understand the basics of Nginx, you will learn a lot by reading more about inheritance. There is, for example, this blog post from Martin Fjorvald, which is a good complement to IfIsEvil and Pitfalls from the Nginx documentation.

Using the global scope to define a very specific setting is a bad thing (like in our Debian PHPMyAdmin example), because it makes other websites on the same server behave in an unexpected way. If you want to ensure the websites run only in the expected way, you have to make some effort to keep this configuration tree simple and logical.

If you look closely at the Nginx contexts you will see that there is no directory level. Apache configurations are usually very directory-centric, based on subtrees of the filesystem; this is especially true with .htaccess files.

Nginx configuration are location-centric. The URL is used quite extensively, and this is one of the things you should understand to start thinking about your configuration in the right way. Note that using location-based configuration can be achieved in Apache; if you plan to move to Nginx later on, you can already start to think about your Apache configuration in this mode.

3.4.2 Default VirtualHost

A good thing to do is to add both a default catch-all VirtualHost and a real VirtualHost, that handles the application at the right domain name. The default VirtualHost will catch all the bad requests (with empty or bad host headers); it could also catch your localhost traffic, part of your monitoring traffic, and internal requests for Apache, like the internal dummy connections that Apache sends to its child processes for graceful reloads.

For Apache to ensure that this VirtualHost is the first loaded (in alphabetical order of the file name) we generally use a 00-default file:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName _default_
    DocumentRoot /var/www/default
    LogLevel warn
    ErrorLog /var/log/apache2/error.log
    CustomLog /var/log/apache2/access.log combined
    DirectoryIndex index.html

    # starting from filesystem root directory
    <Directory />
        # no .htaccess files
        AllowOverride None
        # and everything is forbidden by default
        Require all denied
    </Directory>

    # and after the DocumentRoot we allow access
    <Directory /var/www/default>
        # Apache 2.4 syntax
        Require all granted
    </Directory>
</VirtualHost>

The *:80 means this VirtualHost will listen for incoming requests on all IPs, on port 80. With Apache 2.4 you do not need to declare a matching NameVirtualHost instruction. If you look at the default VirtualHost from a distribution package you will certainly find a lot more things inside. Usually you do not need these things.

You will also usually find a DocumentRoot pointing at /var/www and not /var/www/default . You can keep this setting, but only if you do not use /var/www as the base directory for other websites. The document root is the starting point of your web server; the location / is a direct link to your document root. If you have a website stored in /var/www/www.example.com/www with a matching VirtualHost having a document root set to this directory, but you keep the default VirtualHost document root in /var/www... the locations will collide. And this is usually unexpected. And unexpected things lead to unexpected leaks.

The Nginx version of a default VirtualHost is:

server {
    listen 80 default_server;
    server_name "";
    root /var/www/default;
    index index.html;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log info;
}

You can see that one very important difference in the syntax is the presence of ; at the end of each instruction.

By the way, if you try to build Nginx and Apache configurations at the same time, on the same server, do not forget that only one of these two HTTP servers can bind to port 80.

3.4.3 Connect an Apache 2.4 VirtualHost to php-fpm with mod_proxy_fcgi

We can now add our dedicated VirtualHost.

We'll start with the Apache 2.4 version. Note the usage of variables defined in the global scope that could be set in another file; this is one of the new features of Apache 2.4. This will not be available in Nginx -- at least not with variables defined in the global scope -- but you can define variables in the server context and use them in the location contexts.

Define my-project-port 80
Define my-project-path /path/to/project
Define my-project-domain www.example.com
Define my-project-docroot ${my-project-path}/www
Define my-project-var-path ${my-project-path}/var

<VirtualHost *:${my-project-port}>
    ServerName ${my-project-domain}
    ServerAdmin webmaster@${my-project-domain}
    DocumentRoot ${my-project-docroot}
    ErrorLog /var/log/apache2/my-project-error.log
    CustomLog /var/log/apache2/my-project-access.log combined

This is the base of a VirtualHost: hostname, document root, and some logs.

    <Directory />
        AllowOverride None
        Require all denied
    </Directory>

Here we enforce the fact that everything is forbidden by default, starting from the root of the filesystem. And we remove support for .htaccess files.

    # DocumentRoot Directory
    <Directory ${my-project-docroot}>
        Require all granted
        # ignore all Drupal's .htaccess
        # they are all replaced with local instructions
        AllowOverride None
        # Follow symbolic links in this directory.
        Options +FollowSymLinks -Indexes -Multiviews
        # Set the default handler
        DirectoryIndex index.php
    </Directory>

And this is the Directory section for the document root, with access granted to web files, symbolic links followed, and two dangerous options removed (Indexes, which generates automatic listings of directory contents, and MultiViews, which maps files not found to files of the same name with known extensions).

    # .svn & .git directories must be avoided!!
    RedirectMatch 404 /\.svn(/|$)
    RedirectMatch 404 /\.git(/|$)

    <FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|markdown|md|theme|tpl(\.php)?|xtmpl)(~|\.sw[op]|\.bak|\.orig|\.save)?$|^(\..*|Entries.*|Repository|Root|Tag|Template)$|^#.*#$|\.php(~|\.sw[op]|\.bak|\.orig\.save)$">
        Require all denied
    </FilesMatch>

This is the same content as what could be found in the Drupal .htaccess file. With some more extensions, you could add more.

Always test that these exclusions are really enforced. Note that they are not set inside the Directory section but in the whole VirtualHost scope. When used in the .htaccess file they are only valid for the document root directory and its subdirectories; here, if you add an alias to a directory outside of the document root, these instructions will still apply.

    # Customized error messages.
    ErrorDocument 404 /index.php
    ErrorDocument 403 /index.php

Here we connect some error pages to Drupal; you could also use some static error pages.

    # Requires mod_expires to be enabled.
    <IfModule mod_expires.c>
        # Enable expirations.
        ExpiresActive On
        # Cache all files for 2 weeks after access (A).
        ExpiresDefault A1209600

        <FilesMatch \.php$>
            # Do not allow PHP scripts to be cached unless they explicitly send cache
            # headers themselves. Otherwise all scripts would have to overwrite the
            # headers set by mod_expires if they want another caching behavior. This may
            # fail if an error occurs early in the bootstrap process, and it may cause
            # problems if a non-Drupal PHP file is installed in a subdirectory.
            ExpiresActive Off
        </FilesMatch>
    </IfModule>

This part is taken from the .htaccess. It sets some default expirations rules on assets.

Now comes the difficult task. We need to connect PHP requests coming in to Apache to the php-fpm FastCGI daemon running on port 9000. With Apache 2.4, things are simpler than the convoluted actions required on Apache 2.2.

You need to use mod_proxy_fcgi. You could try a Unix-socket-based connection if you have a very recent Apache 2.4 version, but the network socket mode, at least, is stable. The classical way of doing it is to use an expression of this form:

# We will not use that line, it's an example, see below, we'll use mod_rewrite
ProxyPassMatch ^/(index.php(/.*)?)$ fcgi://127.0.0.1:9000/www/$1

It may work, but if you have some complex rules mixing aliases, rewrite rules, file instructions, and finally this proxy, you will quite certainly have issues with the order of execution of all these rules. The priority order in Apache is far from obvious; it is not based on the declaration order.

There's a page in the Apache documentation about the precedence order. But even with this documentation it's quite hard to understand which directives are applied before the proxy mapping. Worse, the Proxy directive takes top priority.

So using the ProxyPassMatch instruction you will experience difficulties if you want to add some restrictions on the PHP execution.

And here I would like to show you an example of a PHP restricted access mode. To secure a Drupal installation, I would like to only allow the execution of index.php .

I would also like to internally use the index.php?q=/my/path form, but only internally; external web users will not have this possibility (only the /my/path mode). This is a good trick to ensure that any later rule based on the location could be added (like preventing /admin or /node/*/edit or /user* on a specific domain).

Drupal has an internal router which is able to fix a lot of strange mistakes in the URI, and preventing usage of the q argument will prevent any abuse of it, like index.php?q=/////%61dmin . To make these improvements we will not use just one rewrite rule -- redirecting missing pages to index.php?q=$1 -- we will use at least three rules: one preventing the q argument usage, one restricting the list of authorized PHP files, and one doing the rewrite to the index file.

If we use these rules and a ProxyPassMatch , the proxy rule will be applied before mod_rewrite. So we need to remove the ProxyPassMatch rule and instead add a fourth rule using mod_rewrite to make the proxy call to php_fpm via the [P] tag -- but I do not like the short form of mod_rewrite tags because I'm a human being, so I'll use [proxy] .

You need to enable mod_proxy_fcgi and mod_rewrite ; telling mod_rewrite to use the proxy will execute mod_proxy_fcgi, so the ProxyPassMatch instruction is not a requirement. You can activate mod_rewrite debug options (which are different in Apache 2.4; you need to use LogLevel alert rewrite:trace6 instead of RewriteLogLevel) to trace the four rules' execution and loops. You may even find bugs in my rules, but at least you will have total control over the proxy execution order.

Here's the final set of rules (which are not in the Directory section but directly in the VirtualHost scope):

    <IfModule mod_rewrite.c>
        RewriteEngine on

        ########## RULE 1
        # cleanurl is activated so ALL urls
        # MUST be accessed on /toto/titi and MUSTN'T be accessed on index.php?q=/toto/titi
        # main reason is that applying url rules (like restricting /admin access) is far
        # easier in the cleanurl form than in parameter form.
        # WARNING: must allow real internal redirect of /toto/titi to q=/toto/titi
        # (done in rule 3) so the rule applies only if the rewriting process is starting
        # (no internal redirect)
        RewriteCond %{ENV:REDIRECT_STATUS} ^$
        # detect non-blank QUERY_STRING (some parameters are present after the ?)
        RewriteCond %{QUERY_STRING} . [nocase]
        # we prevent any query with a q= parameter
        RewriteCond %{QUERY_STRING} (^|&|%26|%20)(q|Q|%71|%51)(=|%3D). [nocase]
        # 403 FORBIDDEN !
        RewriteRule .* - [F,L]

        ########## RULE 2
        # deny direct access to php files which aren't index.php
        # (like an injected phpinfo.php, or worse)
        # This is also a protection against xmlrpc.php or update.php abuse
        # and also against files coming from libraries
        RewriteCond %{ENV:REDIRECT_STATUS} ^$
        RewriteCond %{REQUEST_FILENAME} -f
        RewriteCond %{REQUEST_FILENAME} .*\.php
        RewriteCond %{REQUEST_FILENAME} !(/index\.php|/another-authorized-php-file\.php)
        RewriteRule .* - [F,L]

        ########## RULE 3
        # cleanurl handling
        # for things which aren't real files or directories then
        # take the given url and give it to index.php?q=...
        # All urls that didn't match ALL previous rewriteConds are still there
        # squeeze real files or directories, if they really exist
        # then Drupal won't be called
        RewriteCond ${my-project-docroot}%{REQUEST_FILENAME} !-f
        RewriteCond ${my-project-docroot}%{REQUEST_FILENAME} !-d
        # do not handle the favicon with Drupal bootstrap
        RewriteCond %{REQUEST_URI} !=/favicon.ico
        # do not redirect direct access to index.php on index.php?q=index.php
        RewriteCond %{REQUEST_URI} !=/index.php
        # put everything still there to Drupal index.php
        # [L/last] = stop rewriting here for matching rules
        # [QSA/qsappend] = Appends any query string created in the rewrite target
        # to any query string that was in the original request URL
        RewriteRule ^/(.*)$ /index.php?q=$1 [qsappend]

        ########## RULE 4
        # PHP-FPM proxy for index.php
        # instead of using
        #   ProxyPassMatch ^/(index.php(/.*)?)$ fcgi://127.0.0.1:9000/www/$1
        # from mod_proxy, we use the [P/proxy] tag of mod_rewrite, because mod_proxy
        # would prevent other mod_rewrite rules from being applied
        RewriteRule ^/(index.php(/.*)?)$ fcgi://127.0.0.1:9000/www/index.php [last,proxy]
    </IfModule>
</VirtualHost>

This is a working Apache 2.4 + php-fpm configuration. It could be used with all Apache mpm (prefork, worker, event). The configuration is a good starting point for a secure way of running Drupal.

Other things can still be added (reverse proxy caching, security headers management, etc.). By showing you the Apache version, my goal is to demonstrate that most things can also be done with the "ancestor". It may be harder or less documented, but if you encounter sysops who only use Apache, because they have their ways of working with it and their specific powerful modules (mod_security, mod_macros, env variables and Defines, etc.), you can still find ways of running advanced PHP configurations.