Are you managing a server running a blog or content website and expecting or experiencing a massive number of requests?

I have been in this situation recently myself, running a WordPress instance on a single virtual machine that suddenly got over 30,000 requests in just a single day (hint: it’s one you probably know if you’re reading this).

How did I manage this without the website going down or becoming unbearably slow? With the help of the amazing tool called Varnish, which is available in Fedora!

Introducing: Varnish

Varnish is a caching daemon that sits between your visitors and the web daemon itself. For every incoming request, it checks whether it already has a cached version of that page.

If it does, it serves that; if it doesn’t, it requests the page from the backend web server, serves it to the visitor, and caches it for the next visitor who requests the same page.

How does Varnish work?

Varnish almost seems like magic and can improve your site’s performance massively, but there are some things to take into account.

Among other things:

It will not cache any requests that had cookies in them.

It will treat /mypage?counter=1 as a different page from /mypage?counter=2, so requesting the second will not result in a cache hit for the first.
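If you know a particular query argument never affects the page content, you can strip it before the cache lookup. A minimal sketch for sub vcl_recv; the counter argument, and the assumption that it is the last thing in the URL, are hypothetical:

```vcl
# Hypothetical sketch: strip a "counter" query argument (assumed to be
# the final part of the URL) so both URLs map to one cache entry.
if (req.url ~ "\?counter=") {
    set req.url = regsub(req.url, "\?counter=.*", "");
}
```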

How do you tell Varnish that the page has changed and it needs to refresh its cache?

First things first: Varnish is an amazing tool if your website has a lot of public, static content. It is less well-suited to websites where the majority of requests are authenticated and should return different results for different people, because it caches the entire web page.

If you want caching for websites that require users to be signed in, or for very dynamic websites, Varnish is less of an ideal fit and may even increase the load. In that case, you need caching that’s more specific to the application itself; tools like Memcached work better in these situations.

However, one case where Varnish is ideal is blogs, where most people who visit your website will not be logged in and will be reading (mostly) static content.

Getting started with Varnish

So, how do you set this up? Very easily – you start by installing Varnish!

$ sudo dnf install varnish

Next up, you need to make sure it is used. For this article, I’m going to assume you are running the website on the same server as your Varnish setup and you are using Apache as the backend server.

You now have two ways to use Varnish:

Let Varnish handle requests directly on port 80 and proxy them to Apache on port 8080.

Set up two sites with Apache: one that listens on port 80 and proxies to Varnish on its default port (6081), which in turn proxies to Apache on some other port.

I’m going to focus on the first option, since it’s slightly more tricky, and leave the second, which has the advantage of enabling SSL, as a challenge to you, the reader. In short: you add

ProxyPass / http://localhost:6081/

to the Apache virtual host on port 80, and add a second virtual host listening on port 8080 where you serve the actual application.
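For the curious, that second setup could look roughly like this in Apache configuration, assuming mod_proxy and mod_proxy_http are enabled; the ServerName and DocumentRoot values are placeholders, and Varnish stays on its default port 6081:

```apache
# Public-facing virtual host: forwards everything to Varnish.
<VirtualHost *:80>
    ServerName yoursite.com
    ProxyPass / http://localhost:6081/
    ProxyPassReverse / http://localhost:6081/
</VirtualHost>

# Backend virtual host: serves the actual application to Varnish.
<VirtualHost *:8080>
    ServerName yoursite.com
    DocumentRoot /var/www/html
</VirtualHost>
```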

Configuring Apache for Varnish

To set up Apache, go to /etc/httpd/conf/ and open httpd.conf. Search for the line saying Listen 80, and change it to Listen 8080.

Note: in essence, 8080 could be any port, but I’m picking 8080 because the SELinux policy marks it as a “secondary httpd port”, which ensures that Varnish is allowed to connect to it and Apache is allowed to listen on it. If you pick another port, you’ll need a custom policy to allow those two steps.

Next up, we set up Varnish. Go to /etc/varnish/ and open varnish.params. Search for VARNISH_LISTEN_PORT=6081 and change it to VARNISH_LISTEN_PORT=80.

Now enable Varnish (systemctl enable varnish.service) to make sure it starts on boot of the system.

To put both port changes into effect without any noticeable downtime, restart both services with a single command: systemctl restart httpd.service varnish.service. Make sure httpd is the first of the two, so Varnish doesn’t crash on start because Apache is still listening on port 80.
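As a quick recap, the whole procedure comes down to these commands. The sed invocations are a sketch that assumes the stock Fedora configuration files; double-check the result before restarting:

```shell
# Make Apache listen on 8080 instead of 80
sudo sed -i 's/^Listen 80$/Listen 8080/' /etc/httpd/conf/httpd.conf

# Make Varnish listen on 80 instead of 6081
sudo sed -i 's/^VARNISH_LISTEN_PORT=6081$/VARNISH_LISTEN_PORT=80/' /etc/varnish/varnish.params

# Start Varnish on boot, then restart both services (httpd first)
sudo systemctl enable varnish.service
sudo systemctl restart httpd.service varnish.service
```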

Now, at this point you have set up Varnish with a default setup, which actually works quite well!

Testing Varnish

You can verify that it works by running:

curl -i http://yoursite.com

You should see an X-Varnish header with a single number behind it. This is the Varnish request ID, which can be used for debugging.

If you perform the same request a second time (press the up arrow and Enter), you should see that the X-Varnish header now contains two numbers. The first is the request ID of the current request; the second is most likely the same number you saw on the first request. That’s the ID of the request whose result Varnish cached and is now serving.

If you see two numbers on the second request, Varnish served it from cache and Apache was not hit at all. How easy was that?
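Put together, such a session might look like this; the request IDs are illustrative, and yours will differ:

```shell
$ curl -i http://yoursite.com
HTTP/1.1 200 OK
X-Varnish: 32770
Via: 1.1 varnish-v4
...

$ curl -i http://yoursite.com
HTTP/1.1 200 OK
X-Varnish: 32773 32770
Age: 12
Via: 1.1 varnish-v4
...
```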

At this moment, you should have a working Varnish instance that caches all unauthenticated requests and speeds up your static website considerably!
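You can also keep an eye on how well the cache performs with varnishstat; a hit counter that grows much faster than the miss counter means the cache is doing its job:

```shell
$ varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
```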

Customizing Varnish

You might want to customize some parts of the Varnish request-handling flow. You do that by editing /etc/varnish/default.vcl, which is actually a program written in the Varnish Configuration Language (VCL). It has a few subroutines that fire during different stages of request handling.
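A stripped-down default.vcl has roughly this shape; the backend definition reflects the Apache-on-8080 setup from earlier, and the comments describe when each subroutine fires:

```vcl
vcl 4.0;

# Where Varnish fetches pages it does not have cached.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Runs for every incoming request, before the cache lookup.
}

sub vcl_backend_response {
    # Runs after a response has been fetched from the backend.
}

sub vcl_deliver {
    # Runs just before the response is sent back to the visitor.
}
```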

Ignoring arguments and cookies

One of the main things you may want to handle here is making sure that, for some pages, query arguments or cookies are ignored when determining whether a page can be served from cache (for example, because the pages are static content that depends on neither, but your framework still sends them).

For example, if you’re running WordPress, it’s fine to always cache /wp-includes/, since that contains only static JavaScript and CSS files. To tell Varnish about this, add the following to sub vcl_recv:

if (req.url ~ "^/wp-includes/") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*", "");
}

That tells Varnish to ignore cookies and query parameters, so even a request for /wp-includes/page.js?timestamp=1 will return the cached entry, even if the cache was recorded for /wp-includes/page.js?timestamp=2.

(Strictly speaking, it removes all cookies and strips the query string from the URL before checking whether the page is already in the cache. Note that this also means the query arguments and cookies are not available to the backend.)

Not caching specific pages

Another common case is pages that you never want served from the cache. For example, the WordPress cron page is not something you want cached.

For that, add the following to vcl_recv:

if (req.url ~ "^/wp-cron.php") {
    return (pass);
}

This will tell Varnish to always pass the request on to the backend, and never serve it from cache.
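The same pattern extends to other pages that must always reach the backend. For example, a hypothetical rule that also bypasses the WordPress admin and login pages could look like:

```vcl
# Hypothetical sketch: never serve the admin, login or cron pages from cache.
if (req.url ~ "^/(wp-admin|wp-login\.php|wp-cron\.php)") {
    return (pass);
}
```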

Clearing Varnish cache

Now, the last thing is that you might want your application to tell Varnish to clear its cache. For example, WordPress can tell Varnish that a new article has been published, so Varnish should purge its cache and store fresh copies that include the new article.

Since we don’t want to allow these requests from external sources (that would let anyone clear the cache, defeating the entire purpose), we will first set up an Access Control List (ACL) named “purge”. You can name it anything, but this name always makes sense to me.

Put the following before any of the “sub” blocks in the VCL file:

acl purge {
    "localhost";
    "127.0.0.1";
}

Now, to actually use it and allow localhost to purge the cache, add the following to vcl_recv:

if (req.method == "PURGE") {
    if (!client.ip ~ purge) {
        return (synth(405, "Not allowed."));
    }
    return (purge);
}

After doing that, the localhost machine can send a purge request. Try it with:

curl -I --request PURGE http://localhost/

You should get “HTTP/1.1 200 Purged”, and the next request will again be fetched fresh from the backend (hint: Varnish HTTP Purge is a WordPress plugin that can perform these PURGE requests automatically upon changes).

Now you have a Varnish caching server that will cache unauthenticated requests and requests for static content, and it should speed up your site considerably. You can always tweak the Varnish VCL file a lot more, but this should at least get you started with a basic cache server.

Featured image credit to 33Hops.com.