Moving about 35 servers, something like 2,000 pounds of computer hardware, 50 blocks doesn’t seem like that big of a thing. However, in our geek microcosm, moving to a new colocation facility was a year-long adventure with lessons learned, arguments, designs and redesigns, a hurricane, and many weekends of preparation that somehow resulted in a setup we are all very proud of. We always want to share our experiences, so this post covers some of the lessons we learned, and the next post will have all the technical detail of our new facility.

Lesson 1: Be Prophets

Sometimes you run your systems, and sometimes they run you. If you are not always looking forward, they will eventually ruin you, and your job can seem like a post-apocalyptic nightmare. What your bottlenecks will be in capacity planning isn’t always obvious. Our network design was going to break down or get ugly unless we went from cabinets to a cage, and further down the road our provider did not have enough power available for our growth.

We did make the time to look to the future. We realized our capacity limits months before we started to feel the pressure, but because the process took a year there was still the feeling of the systems running us. They are cruel taskmasters – anyone who has seen The Terminator or The Matrix knows this. When they rule you, you end up in a reactive position and you start to lose control. The move was a reminder that this can’t be allowed to happen: we need to stay vigilant and look to the future.

Lesson 2: Beware the Purveyors of Colocation Space

They are tricksters and they want your gold. Some of them are masters of space and power and manipulate them to create the illusion of a good deal. Although a bit of an oversimplification, as a customer you generally care about a metric like “dollars per year to host a server”. However, colocation facilities generally don’t bill that way. They bill on space, power, and internet. So they do things like:

Offer cheaper square footage, but require more square footage per rack (Although more space can mean more room in a cage to move around in)

Only provide lower power options (e.g. 120V/20A circuits instead of something like 208V/30A) so you need more racks for the same number of servers (which will cause a bigger price difference over time with growth)

Force you to grow in increments of multiple racks so you are paying for space (And maybe even power) you don’t need as you grow

Put in some initial fees and initial higher prices just so they can remove or lower them to make you think you are getting a deal

If you model the costs based on your needs, accounting for growth, and look at the total cumulative costs you can see through their illusions.
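As a rough illustration of that kind of model, here is a minimal Python sketch. Everything in it is an assumption made up for the example, not our actual quotes or tooling: the `Offer` fields, the prices, the 400 watts per server, the 80% derating of each circuit, one circuit per rack, and the simplification that power (not physical space) limits how many servers fit in a rack. Internet costs are left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class Offer:
    """One colocation quote. All figures are hypothetical inputs."""
    name: str
    circuit_volts: float       # e.g. 120 or 208
    circuit_amps: float        # e.g. 20 or 30
    monthly_per_rack: float    # space plus power, dollars per rack per month
    rack_increment: int = 1    # racks can only be added in multiples of this
    setup_fee: float = 0.0     # one-time fee

    def servers_per_rack(self, watts_per_server, derate=0.8):
        # Usable power is volts * amps, derated for continuous load
        # (the 80% figure is an assumption; check your own circuits).
        usable_watts = self.circuit_volts * self.circuit_amps * derate
        return int(usable_watts // watts_per_server)

    def racks_needed(self, servers, watts_per_server):
        per_rack = self.servers_per_rack(watts_per_server)
        if per_rack < 1:
            raise ValueError("one server draws more than the circuit allows")
        racks = math.ceil(servers / per_rack)
        # Round up to the provider's growth increment.
        return math.ceil(racks / self.rack_increment) * self.rack_increment


def cumulative_cost(offer, start_servers, added_per_year, years, watts_per_server):
    """Total dollars paid over `years`, with servers added once per year."""
    total = offer.setup_fee
    for month in range(years * 12):
        servers = start_servers + added_per_year * (month // 12)
        total += offer.racks_needed(servers, watts_per_server) * offer.monthly_per_rack
    return total


# Made-up quotes: a cheap-looking low-power rack versus a pricier high-power one.
offers = [
    Offer("cheap racks, 120V/20A", 120, 20, monthly_per_rack=900.0),
    Offer("pricier racks, 208V/30A", 208, 30, monthly_per_rack=2000.0,
          rack_increment=2, setup_fee=5000.0),
]
for offer in offers:
    cost = cumulative_cost(offer, start_servers=35, added_per_year=10,
                           years=5, watts_per_server=400)
    print(f"{offer.name}: ${cost:,.0f} over 5 years")
```

The point of the sketch is the shape of the comparison rather than the numbers: plug in real quotes and your own growth estimate, then compare the cumulative totals instead of the per-rack sticker price.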

Lesson 3: Holistic Design and Reality

We are control freaks. We decided to own everything inside our cage, including the racks and PDUs. Short of building our own facility, this gave us a blank slate for the genesis of our perfect facility. Even a blank slate runs into reality, though:

There are constraints on what is actually available to buy, and on how much you can really know about what you are getting

Team members have different visions

Each choice you make along the way affects all the other choices. For example, vertical PDUs mean you need a place on the rack to put them, and certain types of cable management and cable arms might also take up that space. We discovered this as we went through a few different passes of various equipment that we had to return because it didn’t all work together.

Our biggest error with this was not making one person ultimately responsible for the physical design. Choices need to be made, and not everyone’s ideas can be reconciled with each other and the constraints of reality. For a holistic design, eventually someone has to reconcile reality with what everyone wants, or you end up with a bunch of individually well thought out pieces that don’t fit together (as well as a bit of frustration).

Lesson 4: If it isn’t Right, Tear it Down and Do it Again

“One of my most productive days was throwing away 1000 lines of code.”

–Ken Thompson

When something isn’t right and you decide to move forward regardless, you may have to live with it for a long time. Even worse, it can create ripples, forcing you to make further bad choices in other areas and causing broken window syndrome. The discipline to take a step backwards is a quality that is easy to respect and hard to have.

When the wiring wasn’t great, even though it took a lot of time: “AGAIN!” When we purchased the wrong stuff, we returned it and started “AGAIN!” In both these instances we did this more than once. It is often a hard call to make, but it will pay off in the long run if you can summon your inner drill sergeant. Eventually, in order to maintain momentum, you have to move forward, but here is a rule that can help: if you are moving forward out of laziness, then it is the wrong call.

Lesson 5: Disasters Really do Happen

“Superstorm Sandy” hit the week before we were supposed to move, doing significant damage to the current facility. The facility did manage to stay up the whole time, but only because of a bucket brigade of people carrying diesel fuel. Oddly, for us it was more like “Serendipitous Sandy”. It:

Gave us time to rectify some mistakes

Gave us confidence in our secondary datacenter since we failed over to it before the storm

Gave us time during our move since we were comfortable running out of our secondary facility

Mostly it was a good taste of reality: disasters happen and you had better be prepared.

Lesson 6: It is Better When the Hosting Provider Owns the Building

If your building isn’t owned by the datacenter, then there are more likely to be conflicts of interest and complications. We don’t know if our new provider is going to renew their lease, so we might be moving again. Also, during Sandy there were some issues with building access, since the building owner wasn’t too worried about the needs of its customers’ customers. Don’t overlook this factor.

Lesson 7: Pace Yourself

The actual move itself is exciting but also exhausting. We set Alex Miller on the task of finding great physical movers and he delivered with Morgen Industries. They were fast, on time, personable, and flexible. They handled de-racking, moving, and re-racking the servers. Having that part handled by movers made it so we had more energy for configuring, cabling, and fixing problems that came after that. It played a major part in ending up with a great result.

When moving in, we worked in shifts. You can only fit so many people in the space, so this seemed like the most efficient method. However, the second shift spent its whole shift learning what the first shift had figured out. So it is better to leave yourself the time to make sure there is overlap between shifts. A move is often a chance to get things right, so make sure you pace yourself in a way that will allow your team to get it right.

Lesson 8: Value Craftsmanship

Moving was in many ways a chance to start over and fix issues that arose from rapid organic growth over the past few years. The whole team is proud of our result. It might make us weirdoes, but when cabling looks this neat it is a sexy and sleek piece of art. I am proud of the team and we are all proud of the work we did. That has real value, even if it isn’t easily measured. In the next post I’ll get into all the detail of the beautiful result of our chaotic journey.