The hardware experiment – London colocation

By David Mytton,

CEO & Founder of Server Density.

Published on the 10th October, 2013.

Recently we’ve been reviewing the infrastructure that powers our server and website monitoring service, Server Density, and as a result we have started an experiment looking into buying and colocating our own physical hardware.

Currently, the service is run from 2 US data centers with Softlayer, and we’re very happy with them. The ability to deploy new hardware or cloud VMs within hours or minutes on a monthly contract, plus supporting services like global IPs, is very attractive. However, we’re now spending a significant amount of money each month, which makes it worth considering running our own hardware.

In particular, the large servers which power our high throughput time series MongoDB databases for graphing are very expensive when you project the cost out over a long period. We’re processing over 25TB of inbound data, and working at that volume while keeping the graphs fast requires lots of RAM and big SSDs, both of which are very expensive when billed monthly.

What about the cloud?

Cloud infrastructure like EC2 or Rackspace Cloud is perfect for a number of use cases. It’s great for startups who want cheap (in the short term) servers and don’t know their workload patterns. It’s also great for elastic workloads and scaling quickly. However, our use case is completely different – we have a consistent level of traffic all day, every day and it only grows. It doesn’t fluctuate because servers are constantly sending us data all the time. This means it’s very easy to predict our workload and the flexibility the cloud offers isn’t necessary.

What about dedicated?

We currently have a mixture of dedicated servers (for our databases, which need high memory and guaranteed disk I/O performance) and VMs (for web servers and other processing tasks such as alerting, which is usually CPU bound). These are managed by us but rented from Softlayer, so we don’t need to deal with networking, failed hardware, etc. However, if you project the cost out beyond a single month, the rental fees add up to the full purchase price of the server after around 6 months, and since the useful lifetime of a server is usually much longer than that, there is a significant cost saving.

So why choose colocation?

If you have a consistent workload then you can buy significantly higher spec hardware at a much lower cost than renting it monthly. It counts as an asset, so there are also tax benefits, and you get much more control over your infrastructure.

There are downsides

Renting servers from the likes of Softlayer or Amazon removes a lot of the “old school” sysadmin work. As soon as you manage your own setup there are quite a few things you need to consider:

What happens in an emergency when hardware fails? You need someone to physically repair/replace the hardware.

We will design our infrastructure so multiple servers can fail without needing immediate replacement. We already have redundancy at both the server and data center levels and will do the same here: redundancy at the server level, plus the ability to fail over to an entirely separate data center if everything fails.

We’ll also consider the most common failure scenarios and ensure they can be fixed by remote hands rather than needing someone physically at the data center – things like hot swappable disks and power supplies make this easy, provided we keep spares on site.

And finally we’ll be fairly close to the facilities so can send people there if absolutely necessary. We’re looking at data centers just 3 miles from our London office, some a bit further away on the other side of London (12 miles away) as well as nearby European facilities in Amsterdam and Frankfurt.

Can you deal with all the extra technical requirements of the supporting infrastructure, in particular networking?

Network problems are the worst to debug because they’re often transient and can very easily cause massive outages. You’re responsible for all of this. We have in-house expertise, with several of us having prior hardware experience, and we have support contracts with key vendors so we can escalate issues when necessary.

If you need to scale suddenly, can you provision new capacity in time, and do you have the financial resources to make the hardware purchases in one go?

This is more difficult, but given our predictable demand we don’t think it will be an issue. That said, we are looking at having either our primary or secondary data center also offer us rentable servers which can be introduced to the rack, or at least the private network, on short notice. We can rent hardware from them on a short term basis whilst we ramp up our own hardware capacity.

Experimenting with one London colocation server

Given the importance of our infrastructure, we have decided to start the experiment with a single server to run our internal tools.

The old setup

Right now we have the following servers at Softlayer powering some internal stuff:

Build master (buildbot): VM x2 CPU 2.0GHz, 2GB RAM – $89/m

Build slave (buildbot): VM x1 CPU 2.0GHz, 1GB RAM – $40/m

Staging load balancer: VM x1 CPU 2.0GHz, 1GB RAM – $40/m

Staging server 1: VM x2 CPU 2.0GHz, 8GB RAM – $165/m

Staging server 2: VM x1 CPU 2.0GHz, 2GB RAM – $50/m

Puppet master: VM x2 CPU 2.0GHz, 2GB RAM – $89/m

Total: $473 USD/m

It’s also worth mentioning that Softlayer include 1TB of public data transfer by default, but these servers mostly generate internal private network traffic, which is free anyway.

The colocation replacement server

We have purchased a Dell 1U rack server to replace these 6 servers:

Dell PowerEdge R415 Rack Chassis

x2 AMD Opteron 4280 Processor (2.8GHz, 8C, 8M L2/8M L3 Cache, 95W), DDR3-1600 MHz

32GB Memory for 2CPU (8x4GB Dual Rank LV RDIMMs) 1600MHz

x4 1TB, SATA, 3.5-in, 7.2K Hard Drive (Hot-plug)

1 Redundant Power Supply (2 PSU) 500W

Total: £2,066 GBP ($3,346 USD).

We also purchased rack rails and an additional 2 disks as spares. The plan is to virtualise the server and run multiple VMs from the one machine.

Our own RAID

To achieve our goal of server level redundancy and the ability for remote hands to fix as much as possible, the disks are hot swappable and we will also run RAID10. However, the only RAID card Dell offers has an admin interface which requires Silverlight. We didn’t want a core part of the system to rely on a proprietary plugin which will most likely be end of life soon, so we ordered the server without a RAID controller and bought our own instead – an Adaptec RAID 6405E 4 channel SATA PCI-Express storage controller – for £123 ($199 USD).

The Adaptec card comes with cables, but I found they were too long! Next time we will probably order cables from Dell; this time we ended up ordering a shorter 30cm SAS/SATA internal cable instead.

KVM and setting things up

Servers are designed to run headless – without a keyboard, mouse or monitor – for the majority of their life, but you still need these for the initial setup. We didn’t have any non-Mac displays in the office so had to buy a cheap Dell TFT, before @devopstom suggested a KVM Console to USB 2.0 Portable Laptop Crash Cart Adapter, which presents the server’s console on a Mac, Linux or Windows system.

Total costing

The total cost of the server is therefore £2,189 ($3,544 USD) which, based on replacing those 6 servers at $473/m, means we will break even on the hardware after 8 months. And since the Dell server is much higher spec, we will be able to fit more on there and/or give more capacity to the existing tools.
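The break-even arithmetic can be sketched as a few lines of code (figures from this post; a deliberately simple comparison that ignores colocation and power fees, which are covered separately below):

```python
from math import ceil

# Back-of-the-envelope break-even for buying vs renting.
# One-off hardware cost (~$3,544 USD) vs the $473/month rental it replaces.
hardware_cost_usd = 3544
monthly_rental_usd = 473

months_to_break_even = ceil(hardware_cost_usd / monthly_rental_usd)
print(months_to_break_even)  # → 8
```

Anything the server does beyond month 8 is effectively cost saved compared to renting.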

London colocation pricing

I found that most London colocation providers don’t charge for the rack itself – everything is based around networking and power. The former is easy to figure out from the stats we already have, but power is more difficult: you actually have to buy the hardware and run different loads on it to figure out what you need, using something like this Energie Power Meter.

Power

To make things more complex, some providers quote in Amps and others in kW (often written as kWh). They’re not strictly interchangeable – Amps measure current draw and kW measure power – but given a known supply voltage you can convert between them, so for comparing quotes you can treat them as equivalent.

Power pricing gets complex and is charged differently at each facility. For example, Telecity charge as follows: a minimum of 1.74kW @ £677.92/m plus an £8.09/m capacity reservation charge; above 2kW, a facilities management charge (FCM) of £250/m per kW, usage at £0.185 per kWh, and a capacity reservation charge (CRC) of £55.11/m per kW. e.g. at 3kW:

– FCM = £250 * 3 * 12

– Usage = £0.185 * 24 * 365.25

– CRC = £55.11 * 3 * 12

= £20,917.71/year
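As a rough sketch in code, the per-kW components quoted above work out as follows (mirroring the lines as listed; exactly how these stack with the minimum commitment depends on the individual contract):

```python
# Rough sketch of the Telecity-style charges quoted above, at 3kW.
# How these combine with the minimum commitment is contract-specific.
kw = 3

fcm = 250 * kw * 12          # facilities management charge, £/m per kW
usage = 0.185 * 24 * 365.25  # usage at £0.185/kWh for one kW running all year
crc = 55.11 * kw * 12        # capacity reservation charge, £/m per kW

print(f"FCM £{fcm:,.2f}  usage £{usage:,.2f} per kW  CRC £{crc:,.2f}")
```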

The Dell server (above) draws around 0.57A at idle, but you have to test against a real workload because if you go over your allocated amount you will likely be shut down. We’ve not got that far yet so I don’t have any figures for now – this will be in a followup blog post.
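For reference, converting a measured current draw into power and energy is simple arithmetic – a sketch assuming UK mains at roughly 230V (an assumption; and as noted, idle draw understates a real workload):

```python
# Sketch: convert a measured current draw to power and monthly energy.
# Assumes UK mains at ~230V; idle figures understate real workloads.
volts = 230
idle_amps = 0.57  # idle draw of the Dell R415 above

watts = idle_amps * volts               # power: amps * volts, ~131W
kwh_per_month = watts / 1000 * 24 * 30  # energy if it idled all month

print(f"~{watts:.0f}W, ~{kwh_per_month:.0f} kWh/month at idle")
```

This is how you sanity check a quote in Amps against one in kW before committing to a capacity.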

Locations for London colocation

There are a number of big name providers, plus quite a few smaller companies who resell space in the larger facilities and occasionally have their own. There seem to be 3 key sites in London:

West London / Heathrow, including Acton (conveniently close to our office!)

Central, around Holborn and the City

Docklands

Where you pick depends on things like:

How quickly can you reach the data center in an emergency? Do you need to send someone on-site to fix things? Are they coming from your office or from home? What public transport links are there, and what happens out of hours when trains etc might not be running?

Whether you have any strict latency requirements, e.g. being close to the City and the London Stock Exchange, where milliseconds count for real time trading.

Networking

All colocation packages charge based on a minimum committed bandwidth. They’ll usually give you multiple 10/100/1000 ports which can burst but you pay based on a known monthly minimum. Pricing decreases based on the amount committed but generally looks like this:

10-50Mbps: £20-25/Mbps/month

51-100Mbps: £15/Mbps/month

100-150Mbps: £13/Mbps/month

One company (Coreix) offered us significantly lower rates, starting at £50/m for 150Mbps and going up to £1,500/m for 1000Mbps. This is suspiciously low compared to other providers, so I’m unsure what to think about it.
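The tier rates above translate into a monthly bill roughly as follows (a sketch only – the first tier is taken at its £22.50 midpoint, and real quotes vary by provider):

```python
# Sketch of monthly cost from the committed-bandwidth tiers quoted above.
# First tier taken at the midpoint of the £20-25 range; real quotes vary.
def monthly_cost_gbp(committed_mbps: int) -> float:
    if committed_mbps <= 50:
        rate = 22.50  # £/Mbps/month for 10-50Mbps commits
    elif committed_mbps <= 100:
        rate = 15.0   # 51-100Mbps
    else:
        rate = 13.0   # 100-150Mbps
    return committed_mbps * rate

print(monthly_cost_gbp(100))  # 100Mbps committed at £15/Mbps → 1500.0
```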

These are prices for the networking products the data centers themselves offer – usually a multi-homed product across multiple links/providers – but you are free to work directly with the transit vendors present in each data center. If you have large traffic requirements, want to choose a specific transit provider, or just want control over all the networking, this is an option. Since we have no custom needs and are just moving into the world of colo, a package from the data center provider is a good choice for us.

Inter data center metro fiber

All the big data centers are connected via a “metro” fiber ring, so you can have multiple facilities, as we plan to. We already get this from Softlayer for free as part of their private network – one of their great features – but with colo you have to pay for it. Pricing is based on committed usage, and we generate a large amount of internal traffic from database replication and processing of monitoring payloads.

Examples of pricing are 100Mbps for £150-350/m and 1000Mbps for £750/m, although again Coreix were much cheaper quoting us £350 for 1000Mbps.

Choosing a provider

We got quotes from Telecity and Equinix as the two big players, plus Andrews & Arnold (who also provide our office connectivity but are very expensive), Coreix and 4D as smaller players. The big guys own multiple facilities; the smaller ones have one facility of their own and resell space in Telecity. Coreix are suspiciously cheap compared to everyone else, by quite a large margin.

We’re likely to pick one of the big players for our two core data centers, but for this experiment we’ll host the internal tools server with a different provider. This gives us some vendor redundancy, and the big guys only sell by the quarter or half rack at a minimum. Our tools server only needs 1U for this experiment, but ultimately we’ll be purchasing at least one full rack in each data center.

I’ll be following up this post in a month or so once we’ve deployed the server, with some final costings and anything else I learn.