One of the key goals we set ourselves when we developed the new iPlayer was that it would have to be fast to use. We understand that any delay in getting you to the video is frustrating, as the site is just a jumping-off point into TV and Radio content.

But how do we make things fast? Displaying a web page in the browser involves many steps: some we can control, some we can't. We can't control the time the request and response spend travelling over the network, but we can control how long the pages take to generate and how large they are. We also have a degree of control over how long those pages take to render in your browser.

We had our work cut out for us on the new version of iPlayer.

Personalised websites require much more processing power and data storage

The current site uses one back-end service that we pull data from to build the pages. The new site uses many more, and we both post and pull data from them.

This means that every returning user gets a different homepage. There's already a small amount of difference between each homepage on our current site (your recently played) but the new site is driven much more by your favourites, recommendations and friends; they're key parts of the experience and they have to be fast.

We started developing in PHP

The BBC is standardising on PHP as its web tier development tool. Our current site is developed using Perl and Server Side Includes, and it's something that's well understood, but our new web tier framework (based on Zend) means that teams can share components and modules. In fact, the team responsible for the social networking functionality develop modules that anyone within the BBC can integrate into their site easily.

This does come at a cost though: the usage of a framework sometimes introduces delay in generating a page as it needs to get hold of resources to do so. In some cases this is necessary, especially if there's an element of personalisation, but in others our web tier is just repeating the same tasks.

All this against a growing demand

The site will have to support a massive number of page views and users every day: on average 8 million page views a day from 1.3 million users. Previous versions of the site were able to grow into this demand; we'll have to hit the ground running from day one.

This graph shows our growth over the last year in terms of monthly page views.

So how do we do this?

One of the first things we can do is optimise the time it takes to generate the page.

Although changing architectures can be risky, we were confident that the one we moved to would enable us to meet all the challenges. At the heart of page generation is a PHP and customised Zend-based layer called PAL. This system then needs to integrate with our login system (BBC iD), our programme metadata system (Dynamite), our social networking systems, a key-value data store and a few others. The homepage alone for a logged-in user with friends requires 15 calls across these services. Even if each of those calls takes a few milliseconds, we can spend a second or two just collecting the information required, which would push us well out of our 2.5s target.
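As a rough illustration of that budget (not the actual iPlayer code), the sketch below adds up the time spent waiting on services. It assumes the calls run one after another, which the article does not state, and the per-call latencies are hypothetical; only the 15 calls and the 2.5-second target come from the text above.

```python
TARGET_SECONDS = 2.5   # page-load target quoted in the article
HOMEPAGE_CALLS = 15    # service calls for a logged-in user with friends


def collection_seconds(per_call_ms, calls=HOMEPAGE_CALLS):
    """Total time waiting on services, assuming calls run one after another."""
    return per_call_ms * calls / 1000


def remaining_budget_seconds(per_call_ms, calls=HOMEPAGE_CALLS,
                             target=TARGET_SECONDS):
    """Time left within the target for rendering, transfer and everything else."""
    return target - collection_seconds(per_call_ms, calls)


if __name__ == "__main__":
    # Hypothetical per-call latencies in milliseconds (assumed, not measured).
    for per_call_ms in (10, 50, 100):
        print(per_call_ms,
              collection_seconds(per_call_ms),
              remaining_budget_seconds(per_call_ms))
```

At a hypothetical 100 ms per call, the 15 calls take 1.5 seconds, which leaves 1 second of the 2.5-second target for everything else.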

We proved our architecture before we built it

At the start of re-architecting iPlayer, we did what we could to eliminate guesswork. We developed a number of architectures based on our requirements, and then built prototypes of three of them; all built to serve the homepage, which we then tested against some basic volumetrics. This gave us plenty of data about how many requests we could serve a second and CPU loads, which we could then weigh up against other softer factors, like how our dev team could work with it.

We ended up going for the one that offered us a good balance between these factors, as this let us be the most flexible in building pages, rather than constraining what we could do with the site just to squeeze out extra speed.

We cache a lot

Caching means storing a copy of the data in memory so subsequent requests for that data don't have to do the expensive things such as database queries.

It also allows us to get around any delays introduced by our framework starting up, as there's no such delay when delivering from cache.

Caching has its problems though. The data may have changed in the underlying system (programmes become available to play for example) but the change won't be reflected in our cache. This means we can only cache for seconds or minutes, but with the millions of page views we get, it can still make a crucial difference.
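The trade-off between freshness and speed can be sketched with a small time-limited cache. This is an in-memory stand-in written for illustration, not Memcached's or Varnish's API; the class name `TTLCache`, the `get_or_load` method and the injectable `clock` are all hypothetical.

```python
import time


class TTLCache:
    """Keeps each value for a fixed number of seconds, then reloads it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader() if missing or expired."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the expensive work
        value = loader()             # e.g. a database query or service call
        self._store[key] = (now + self._ttl, value)
        return value
```

With a 30-second lifetime, a change in the underlying system can go unseen for up to 30 seconds, but every request in that window avoids the expensive lookup.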

Data caching: We cache the data returned from the services. We use Memcached for this. Sometimes we share data between pages.

HTML caching: We also cache the resulting HTML for a short time. When you're hitting a page, it's highly likely you're just seeing the cached page. We use Varnish for this. Caching in this way is nothing new, but Varnish has a few tricks up its sleeve that we use, which I'll explain later.

We broke the page into personalised and standard components

If you look at our homepage, many of those components are the same for everyone, but some are just for you. With traditional whole-page caching in some reverse proxy caches, it's not possible to treat the two differently, so we break the page up. The main body of the page is cached; then, when the page loads, we use Ajax (XHR) requests to load in the personalised components.

Varnish gives us the ability to control the caching at a low level like this. Every time we generate a page or a fragment, we can tell Varnish how long we want to cache it for. The main bulk of the homepage doesn't need caching for long to get some benefit, but your favourites we can cache for longer (although still only for a few minutes), and we know when you add a new favourite, so we can clear out the cache and replace it with the new content. This means that as you browse the site, the page loads quicker and your experience is smoother.
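The idea of caching fragments with their own lifetimes, and clearing one when its data changes, can be sketched as follows. This is an in-memory illustration only, not how Varnish is configured or called; `FragmentCache`, `render_favourites`, `add_favourite` and the sample data are hypothetical names invented for the example.

```python
import time


class FragmentCache:
    """Caches rendered HTML per (fragment, user), each with its own lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (fragment, user_id) -> (expires_at, html)

    def get(self, fragment, build, ttl_seconds, user_id=None):
        """Return cached HTML, rebuilding it if missing or expired.

        user_id=None means the fragment is shared by everyone.
        """
        key = (fragment, user_id)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = build()
        self._entries[key] = (now + ttl_seconds, html)
        return html

    def purge(self, fragment, user_id=None):
        """Drop one fragment so the next request rebuilds it."""
        self._entries.pop((fragment, user_id), None)


# Hypothetical stand-in for the favourites service.
favourites = {"alice": ["Programme A"]}


def render_favourites(user_id):
    items = "".join("<li>%s</li>" % title for title in favourites[user_id])
    return "<ul>%s</ul>" % items


def add_favourite(cache, user_id, title):
    """Store the new favourite, then clear that user's cached fragment."""
    favourites[user_id].append(title)
    cache.purge("favourites", user_id)
```

Because the purge happens at the moment of the write, the personalised fragment can be given a lifetime of minutes without the user ever seeing a stale favourites list after adding to it.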

We use loads of servers

After we've optimised all we can using a single server, we then scale horizontally using multiple servers joined together in a pool. None of our web servers store any state about who you are and what you're doing, so your request can go to any server at any time.
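To show why statelessness makes the pool interchangeable, here is a minimal sketch. The round-robin choice, the server names and the request shape are assumptions for illustration; the article does not say how requests are distributed.

```python
from itertools import cycle


def handle_request(server_name, request):
    """Everything needed arrives with the request; nothing is kept between calls."""
    return "%s served %s for %s" % (
        server_name, request["path"], request["user_id"])


class ServerPool:
    """Hands each request to the next server in turn (round-robin, assumed)."""

    def __init__(self, server_names):
        self._next_server = cycle(server_names)

    def dispatch(self, request):
        return handle_request(next(self._next_server), request)


if __name__ == "__main__":
    pool = ServerPool(["web1", "web2"])        # hypothetical server names
    request = {"path": "/", "user_id": "alice"}
    for _ in range(3):
        print(pool.dispatch(request))
```

Since no server remembers the user, consecutive requests from the same person can land on different machines and still produce the same page.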

We also serve pages out of two locations (or data centres). This gives us a higher degree of resilience to failures; we can lose an entire data centre and still be able to serve the site.

We load tested the site before we launched

We're able to track how the site is used, which lets us produce detailed volumetrics of how we think the new site is going to be used. Some of it is estimation, but it's always backed up with data. From these we can produce detailed load tests that simulate usage of the site. This enables us to find and resolve any problems we may experience under load, before we go live.
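As a simple example of turning volumetrics into a load-test target, the sketch below converts daily page views into a per-second rate. Only the 8 million page views a day comes from the article; the peak factor and headroom are hypothetical, and the calculation counts page views only, not the extra Ajax requests each page makes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_page_views_per_second(page_views_per_day):
    """Daily page views spread evenly across the day."""
    return page_views_per_day / SECONDS_PER_DAY


def load_test_target(page_views_per_day, peak_factor, headroom=1.0):
    """Rate to simulate: the average scaled by an assumed peak and safety margin."""
    return (average_page_views_per_second(page_views_per_day)
            * peak_factor * headroom)


if __name__ == "__main__":
    daily = 8_000_000                              # figure from the article
    print(round(average_page_views_per_second(daily), 1))
    print(round(load_test_target(daily, peak_factor=3), 1))  # 3 is assumed
```

Eight million page views a day averages out at roughly 93 a second; with a hypothetical peak factor of 3, a load test would aim for roughly 278 a second.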

The end result

We're not 100% there yet (this is a beta after all), but from this sample 24 hours of monitoring data you can see that, apart from a couple of spikes, we're doing well at keeping to our target of 2.5 seconds. (We were also able to trace the spikes to some misbehaving components on the platform.)

We're currently working hard behind the scenes to make sure we can continue to serve at this speed as usage increases, spreading the load across our infrastructure.

At the end of this though, we hope the result of our efforts is that you won't notice a thing: it'll just work.

Simon Frost is Technical Architect for BBC iPlayer.