09 June 2008

It's quite possible to make your living from the internet without really considering how it's constructed. I came across this while talking with my friend Aaron. He's a bright guy, but like many people he assumes the net is done "with satellites or whatever", or never thinks about it.

The general structure of the internet is pretty similar to the global air network. Picture those glowing arcs connecting cities on the back of airline magazines. Then recall the hour you spent in line before boarding, the hour spent driving to the airport, the two hours from the big hub airport to the smaller city you really want to go to, and how relieved you are to be traveling now instead of last week when a storm in Chicago somehow messed up flights to Los Angeles. That's basically how your data feels, too.

The internet is, in fact, a series of tubes. Very fragile tubes.

The long-haul cables that run the global economy are less than 2 inches in diameter, buried under railroad tracks, highways, or ocean sediment. Within densely-settled areas the network is relatively redundant. But between countries and across oceans and mountains, data flows through an uncomfortably small set of bottlenecks.

Any minor disaster can damage large portions of the 'net. In 2004 the Miami/Sao Paulo traffic was suddenly re-routed through Washington DC, then New York, then across the ocean to Brussels(!), then back to SP. As late as 1998, a train wreck or wildfire in northern Florida could cut off large parts of Latin America for days. This year Suez suffered multiple failures and re-routing flooded the already overloaded Europe-Asia network for several weeks.

Even on a good day you can see the problem. Look at how a packet of data might travel from San Francisco to Hong Kong:



(start)

Folsom St, San Francisco (1 mile)

Pine St, San Francisco (2 miles)

Pine St, San Francisco

Oakland, CA (10 miles)

Sacramento, CA (80 miles)

San Jose, CA (120 miles)

Oakland, CA (40 miles)

San Jose, CA (40 miles)

Hong Kong (7,000 miles)

(finish)



That poor little packet of data rattled all around California, looking for an uncongested cable over the Pacific. At each hop, a decision is made to send it on to some other place that may have better luck. The system works pretty well under stress, which is good because stress is there all of the time.

The internet is a map of trade volume between cities.

This is actually true for all forms of high-volume transport, so there is a lot of history to learn from. Infrastructure is insanely expensive and slow to build even though it almost always pays off in the long run. Short hops between financial/military/political/industrial hubs tend to get built up first. Just look at how many ways there are to travel between New York, Boston, and DC, for example.

Or look at area codes. At the time the precursor to the modern phone system was built, dialing a 1 took 1/9th the time of dialing a 9. Silly, but true. So there was a premium on lower numbers. New York's area code was 212, DC 202, Los Angeles 213, Chicago 312. El Paso, Texas? 915. Anchorage, Alaska? 907. Miami was definitely not a hub at the time, but it was important to the Navy and Air Force. Miami's code is 305.

So if several factors of demography, geography, and politics align, there may form a route of sufficient capacity between two points. If not, too bad. It takes years to build up demand, more years to begin the project, and more and more years to finish it. There are bribes, labor riots, sabotage, political chicanery, etc. The first trans-continental rail link in the US was completed in 1869. Twenty years earlier, people had been hijacking ships in Louisiana to sail around South America and crash-land ashore at San Francisco.

The same dramas play out when cables are planned and laid. The connection between Seattle and Tokyo is excellent. Mexico City and Dallas? Fairly new and really fast. New York to London? World-class. But try to get an email from Barcelona to Bangalore, and often you'll find that it routes through America. Companies are scrambling to build up Europe-Asia links. They've been scrambling since the late 1980s, and it will be some time before they get there.

The internet is not a magic leprechaun.

Even if there weren't these human problems, there would still be fundamental ones. Let's imagine a perfectly balanced world-wide network. You have an internet business based in San Francisco. Your people are in SF, your technology suppliers are in SF, most of your customers are in the United States. You have a small but growing customer base in Hong Kong and China. We have a perfect 'net, so there are no silly congestion problems and there is just as much bandwidth across oceans as between cities. Rack space in HK is twice the price of rack space in SF.

So what is the no-brainer place to put your next server farm, hire people to maintain it, set up office space, pay property and business taxes, etc? Correct. Hong Kong.

No matter how good the internet gets it will never be faster than a small fraction of the speed of light.

Recently I was looking at the server logs of a site located in the US. The response times had a very high variance, which indicates a severe bottleneck somewhere. After a lot of poking around, trying to find the problem within the server farm, I had the bright idea to segment the logs by source country. And there it was: the response times were all over the map because the users were all over the map. From the perspective of the 'net, Stockholm and Singapore are just down the block while India and Sao Paulo are past the moon.
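Here is a rough sketch of that segmenting trick in Python. The records, country codes, and timings are invented for illustration (they are not from the real logs), and the function names are my own; the point is only that a spread which looks alarming for the whole site can shrink to almost nothing once you group by where the request came from.

```python
from collections import defaultdict
from statistics import median, pstdev

# Hypothetical (country code, response time in ms) pairs.
# These numbers are made up for illustration, not taken from real logs.
SAMPLE_LOG = [
    ("SE", 180), ("SE", 200), ("SE", 190),
    ("SG", 210), ("SG", 230), ("SG", 220),
    ("IN", 900), ("IN", 1000), ("IN", 950),
    ("BR", 1100), ("BR", 1200), ("BR", 1150),
]


def segment_by_country(records):
    """Group response times by the country the request came from."""
    groups = defaultdict(list)
    for country, millis in records:
        groups[country].append(millis)
    return dict(groups)


def summarize(times):
    """Return the median and the spread (population std deviation) in ms."""
    return {"median": median(times), "spread": pstdev(times)}


# Spread across every request, ignoring where it came from.
overall = summarize([millis for _, millis in SAMPLE_LOG])

# Spread within each country on its own.
per_country = {
    country: summarize(times)
    for country, times in segment_by_country(SAMPLE_LOG).items()
}

print("overall:", overall)
for country, stats in sorted(per_country.items()):
    print(country, stats)
```

If the per-country spreads are small while the overall spread is large, the "bottleneck" is probably geography rather than anything inside the server farm.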

Rules of thumb:

- Light takes about 100 milliseconds to travel 10,000 miles and back.

- A network packet on a good route may take 3 to 10 times that long.

- A network packet has not "arrived" until a return acknowledgment has been sent all the way back.

- The longer and more complicated the route from here to there, the higher the chance (often more than 10%!) that a packet will get lost.

- The larger the file you send, the more packets it has to be chopped up into, and the more likely it is that one will be lost and have to be re-transmitted. All else being equal, the transit time of a file increases more than linearly with its size.
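The rules of thumb above can be turned into back-of-the-envelope arithmetic. The sketch below makes a few assumptions that are mine, not measurements: the speed of light in a vacuum, a payload of 1,460 bytes per packet, and packet losses that are independent of each other (real losses tend to come in bursts). The function names are just for illustration.

```python
import math

SPEED_OF_LIGHT_MILES_PER_SEC = 186_282  # speed of light in a vacuum
ASSUMED_PAYLOAD_BYTES = 1460  # assumption: payload carried by one packet


def light_round_trip_ms(one_way_miles):
    """Time for light to cover the distance and come back, in milliseconds."""
    return 2 * one_way_miles / SPEED_OF_LIGHT_MILES_PER_SEC * 1000


def packet_round_trip_range_ms(one_way_miles):
    """Rule of thumb: a packet on a good route takes 3 to 10 times as long."""
    base = light_round_trip_ms(one_way_miles)
    return 3 * base, 10 * base


def packets_needed(file_bytes, payload_bytes=ASSUMED_PAYLOAD_BYTES):
    """Number of packets a file is chopped into (at least one)."""
    return max(1, math.ceil(file_bytes / payload_bytes))


def chance_of_any_loss(num_packets, loss_rate):
    """Chance that at least one packet is lost, assuming independent losses."""
    return 1 - (1 - loss_rate) ** num_packets


miles = 10_000
low, high = packet_round_trip_range_ms(miles)
print(f"light round trip over {miles} miles: {light_round_trip_ms(miles):.0f} ms")
print(f"packet round trip, good route: {low:.0f} to {high:.0f} ms")

# Hypothetical 1% per-packet loss rate, chosen only for illustration.
for size in (10_000, 100_000, 1_000_000):
    n = packets_needed(size)
    print(f"{size} bytes -> {n} packets, "
          f"chance of at least one loss: {chance_of_any_loss(n, 0.01):.3f}")
```

Even with a loss rate far below the 10% mentioned above, a file of a megabyte or so is close to certain to lose at least one packet along the way, which is why big transfers over long routes feel disproportionately slow.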