The ThousandEyes team spent several days last week at Interop monitoring the health of InteropNet, the volunteer-built network that powers the conference. We set out to instrument InteropNet in order to monitor the health of services used by conference attendees and vendors. Over the course of the week while the network was up and humming, we gathered performance data and dug into network problems. Let’s take a look inside InteropNet to see how it performed.

Instrumenting InteropNet

Gathering performance data for InteropNet was a relatively simple exercise. Our in-house InteropNet guru and ThousandEyes solution engineer Ken Guo installed six agents to generate tests and record performance data. These six agents, virtual appliances running on Dell servers, were distributed across InteropNet. Four were located on VLANs that served the exhibit hall and conference rooms (PEDs). Two were located in InteropNet data centers, one in Sunnyvale and the other in Las Vegas. With these six agents deployed, we were able to generate a view of the InteropNet topology.

Figure 1: InteropNet topology with agents in green and network interfaces in blue. Larger blue circles represent devices with multiple interfaces. Links between devices are shown in gray, with thicker lines representing more paths from the agents to external services.

Figure 2: Using the same InteropNet topology as Figure 1, here we can see the locations of agents as well as interfaces in blue grouped by the device they are associated with (Avaya and Cisco switches and routers) leading to primary ISP CenturyLink (Qwest).

Testing InteropNet Performance

We set up a number of tests that actively probed services on InteropNet, including:

From InteropNet to key services: mobile app, registration server, social media sites, Salesforce.com and Webex

From InteropNet to the LV and SFO data centers as well as EWR and DEN edges

From POPs around the US to the Interop registration site, website and CenturyLink circuits

Local DNS resolver

Authoritative DNS server for Interop.com and Interop.net

Interop BGP prefixes

Troubleshooting Critical Services

While at Interop we were on the lookout for service interruptions. One affected Salesforce, which is popular with sales and business development folks at the show. We noticed two periods where Salesforce was unavailable from InteropNet, each lasting up to 10 minutes.

Figure 3: Salesforce.com is unavailable from InteropNet locations in Las Vegas and San Francisco.
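Spotting outage periods like these amounts to grouping consecutive failed availability checks into windows. Here is a minimal sketch of that idea, not how the ThousandEyes platform does it. The function name `outage_windows`, the sample format and the timestamps are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def outage_windows(samples: Sequence[Tuple[int, bool]]) -> List[Tuple[int, int]]:
    """Group consecutive failed checks into (first_failed, last_failed) windows.

    `samples` is a hypothetical list of (timestamp_seconds, available) pairs,
    assumed to be sorted by timestamp.
    """
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_failed: Optional[int] = None
    for ts, available in samples:
        if not available:
            if start is None:
                start = ts  # first failed check opens a new window
            last_failed = ts
        elif start is not None:
            # a successful check closes the open window
            windows.append((start, last_failed))
            start = None
    if start is not None:
        # the series ended while the service was still unavailable
        windows.append((start, last_failed))
    return windows


# Illustrative values only, not measurements from InteropNet.
checks = [(0, True), (300, False), (600, False), (900, True), (1200, False), (1500, True)]
for first, last in outage_windows(checks):
    print(f"unavailable from t={first}s to t={last}s")
```

Each window runs from the first failed check to the last failed check, so its true duration is only known to within one test interval on either side.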

The lack of Salesforce availability corresponded with high levels of packet loss and latency on the path between InteropNet and Salesforce. Packet loss averaged 57% and latency jumped to 2 seconds.

Figure 4: Salesforce availability is impacted by high packet loss and latency.
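For readers who want to see how loss and latency figures like these are derived, here is a rough sketch that summarizes one round of probes. It assumes each probe yields either a round-trip time in milliseconds or `None` when no reply came back. The function name and the sample values are hypothetical and are not data from the show.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (packet loss percent, mean RTT in ms over answered probes).

    A value of None in `rtts_ms` stands for a probe that got no reply.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe")
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    # Latency is undefined when every probe was lost.
    avg_rtt = mean(answered) if answered else None
    return loss_pct, avg_rtt


# Illustrative values only, not measurements from InteropNet.
sample = [1900.0, None, 2100.0, None, None, 2000.0, None]
loss, latency = summarize_probes(sample)
print(f"loss={loss:.1f}% latency={latency:.0f} ms")
```

Note that latency is averaged over answered probes only, so under heavy loss it rests on a small number of samples.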

When we drilled into the path visualization between InteropNet and Salesforce.com we immediately saw the culprit. InteropNet’s primary ISP, CenturyLink, peers with Comcast Business Network en route to Salesforce data centers in California. The two spikes in packet loss coincide with traffic dropping on the San Jose edge between CenturyLink (Qwest) and Comcast.

Figure 5: From InteropNet on the left, packets are being lost en route to Salesforce.com as they transit from CenturyLink to Comcast Business.

We rewound by 30 minutes to see how these nodes were performing when availability was unaffected. At this point we see that CenturyLink and Comcast Business are peering in San Jose without issue.

Figure 6: Under normal conditions traffic transits the same CenturyLink and Comcast Business nodes en route to Salesforce.com NA1, on the right.

From this information, we can conclude that the two service interruptions of Salesforce.com on InteropNet were caused by changes occurring at the peering point between CenturyLink and Comcast in San Jose. In this particular case, the network hiccups occurred when most attendees were likely not using the show network. But having visibility allowed the InteropNet team to monitor for problems as they arose throughout the week.

Interop Show Network: Viewing InteropNet’s Autonomous System

We also monitored Border Gateway Protocol (BGP) routing to InteropNet over the course of the conference in order to gain visibility into any routing issues that might occur. BGP defines the preferred routes that traffic will take from networks around the Internet to InteropNet, as identified by the Interop Show Network Autonomous Systems (AS 290 and AS 53692). These two networks have routes via CenturyLink (Qwest) (AS 209), the primary ISP, to the rest of the Internet.

Figure 7: The Interop Show Network (AS 290 and 53692) is connected via BGP routes to CenturyLink (Qwest) (AS 209), the primary ISP, which then peers with dozens of networks to make InteropNet reachable to locations around the globe, in green.
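One simple check that follows from this layout is to confirm that every observed AS path ends at an Interop origin AS and reaches it through AS 209. The sketch below illustrates that check under stated assumptions: AS paths are given as lists of integers with the origin last, the function names are hypothetical, and the vantage-point AS numbers (64500 to 64502) come from the range reserved for documentation rather than from real observations.

```python
from typing import List, Optional, Sequence

INTEROP_ORIGINS = {290, 53692}  # Interop Show Network, as named in the text
PRIMARY_ISP = 209               # CenturyLink (Qwest), as named in the text


def upstream_of_origin(as_path: Sequence[int]) -> Optional[int]:
    """Return the AS adjacent to the origin, ignoring repeated (prepended) ASNs."""
    collapsed: List[int] = []
    for asn in as_path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed[-2] if len(collapsed) >= 2 else None


def check_path(as_path: Sequence[int]) -> str:
    """Classify a path as 'ok', 'unexpected origin' or 'unexpected upstream'."""
    if not as_path or as_path[-1] not in INTEROP_ORIGINS:
        return "unexpected origin"
    if upstream_of_origin(as_path) != PRIMARY_ISP:
        return "unexpected upstream"
    return "ok"


# Hypothetical paths as seen from made-up vantage points.
for path in ([64500, 209, 290], [64501, 209, 53692, 53692], [64502, 64500, 290]):
    print(path, "->", check_path(path))
```

A path flagged as "unexpected upstream" is not necessarily a problem, but it is the kind of change worth a closer look when a prefix is supposed to be reachable through a single primary ISP.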

It’s a Wrap

By now InteropNet has been torn down, only to be rebuilt again next year. In the end, InteropNet performed beautifully. Performance to key applications was speedy. Service interruptions were minimal. We had a blast helping the InteropNet team build a network from scratch!