Sometimes a user with performance issues proudly presents me with a traceroute, points to a particular hop in the network, and accuses it of being the problem because of high latency on the link. About 1 time in 1000 they are correct and the link is totally saturated. The other 999 times, well, let me explain.

Traceroute Output

Here’s a typical traceroute I might be sent by a user (IPs and hostnames are altered to protect the innocent):

```
$ traceroute www-europe
traceroute to www-europe (18.9.4.17), 64 hops max, 52 byte packets
 1  gateway (57.239.196.133)  11.447 ms  18.371 ms  25.057 ms
 2  us-atl-edge (137.16.151.202)  13.338 ms  20.070 ms  19.119 ms
 3  us-ga-core (57.239.129.37)  103.789 ms  105.998 ms  103.696 ms
 4  us-nyc-core (57.239.128.189)  107.601 ms  103.116 ms  103.934 ms
 5  us-east-core (57.239.13.42)  103.099 ms  104.215 ms  109.042 ms
 6  us-east-bb1 (57.239.111.58)  107.824 ms  104.463 ms  103.482 ms
 7  uk-south-bb1 (57.240.117.81)  106.439 ms  111.156 ms  104.761 ms
 8  uk-south-core (57.240.117.61)  103.408 ms  104.430 ms  103.277 ms
 9  uk-london-core (57.240.132.178)  131.883 ms  104.071 ms  104.161 ms
10  uk-london-edge (99.88.4.133)  104.642 ms  105.685 ms  106.011 ms
11  www-europe (18.9.4.17)  103.465 ms  103.630 ms  104.228 ms
```

“Look!” the user cries. “The link from atl-edge to ga-core is clearly all messed up, because the latency goes from 20 ms to 106 ms!”

Oh No It Doesn’t

Isn’t it amazing that the link in question apparently adds around 85 ms of latency, yet the link between hops 6 and 7 (the jump from the east coast of the USA to the United Kingdom) appears to show no latency increase at all? In fact, isn’t it odd that the latency for every hop from 3 onwards is about the same?
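To put numbers on that observation, here is a minimal Python sketch that parses the hop lines from the traceroute above, takes the median of each hop’s three probes, and prints the change from the previous hop. The regex assumes the classic Unix `traceroute` layout shown here (one hostname and IP per hop, no `*` timeouts); it is illustrative rather than a general-purpose parser.

```python
import re
from statistics import median

# Hop lines from the traceroute above (one responding router per hop, no timeouts).
TRACE = """\
 1  gateway (57.239.196.133)  11.447 ms  18.371 ms  25.057 ms
 2  us-atl-edge (137.16.151.202)  13.338 ms  20.070 ms  19.119 ms
 3  us-ga-core (57.239.129.37)  103.789 ms  105.998 ms  103.696 ms
 4  us-nyc-core (57.239.128.189)  107.601 ms  103.116 ms  103.934 ms
 5  us-east-core (57.239.13.42)  103.099 ms  104.215 ms  109.042 ms
 6  us-east-bb1 (57.239.111.58)  107.824 ms  104.463 ms  103.482 ms
 7  uk-south-bb1 (57.240.117.81)  106.439 ms  111.156 ms  104.761 ms
 8  uk-south-core (57.240.117.61)  103.408 ms  104.430 ms  103.277 ms
 9  uk-london-core (57.240.132.178)  131.883 ms  104.071 ms  104.161 ms
10  uk-london-edge (99.88.4.133)  104.642 ms  105.685 ms  106.011 ms
11  www-europe (18.9.4.17)  103.465 ms  103.630 ms  104.228 ms
"""

# hop number, hostname, IP in parentheses, then the RTT columns
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+)\s*ms")


def parse_hops(text):
    """Return (hop number, hostname, median RTT in ms) for each hop line."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            rtts = [float(x) for x in RTT_RE.findall(m.group(4))]
            hops.append((int(m.group(1)), m.group(2), median(rtts)))
    return hops


def largest_jump(hops):
    """Return (from_hop, to_hop, delta_ms) for the biggest hop-to-hop RTT increase."""
    pairs = zip(hops, hops[1:])
    prev, cur = max(pairs, key=lambda p: p[1][2] - p[0][2])
    return prev[0], cur[0], cur[2] - prev[2]


hops = parse_hops(TRACE)
for (num, name, rtt), prev in zip(hops, [None] + hops[:-1]):
    delta = "" if prev is None else f"{rtt - prev[2]:+8.3f} ms"
    print(f"{num:2d}  {name:16s} {rtt:8.3f} ms {delta}")
a, b, d = largest_jump(hops)
print(f"largest jump: hop {a} -> hop {b} ({d:+.3f} ms)")
```

On this data the largest jump is between hops 2 and 3 (about +84.7 ms on the medians), while the transatlantic step from hop 6 to hop 7 moves by only about 2 ms.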

Many readers will already know why this happens, but for those who don’t (and there’s no shame in that), this pattern indicates an MPLS network in the path, with the MPLS Provider Edge (PE) router at hop 2.

Why?

Remember that one of the benefits of an MPLS network is that the network core (the Provider, or P, routers) doesn’t have to know anything about the routes at the edge. A P router needs to know only two things: 1) where all the other MPLS-capable routers are (usually learned via OSPF or IS-IS), and 2) where to forward each incoming MPLS frame based on its incoming label. P routers are relatively dumb switches, which allows them to move traffic around faster than a native IP router could. So what’s the problem?
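As a rough illustration of how little a P router has to know, here is a minimal Python sketch of label swapping. The `PRouter` class, its label table, and the label values are all hypothetical; the point is that the forwarding decision uses only the incoming label, and the IP packet inside the payload is never consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPacket:
    label: int      # top MPLS label
    ttl: int        # TTL carried in the label entry
    payload: bytes  # the IP packet, opaque to a P router


class PRouter:
    """Hypothetical P router: it holds a label table but no IP routing table."""

    def __init__(self, name, lfib):
        self.name = name
        # incoming label -> (outgoing label, next-hop router name)
        self.lfib = lfib

    def forward(self, pkt):
        """Swap the label, decrement the TTL, and return (next hop, packet)."""
        out_label, next_hop = self.lfib[pkt.label]
        return next_hop, LabeledPacket(out_label, pkt.ttl - 1, pkt.payload)


p1 = PRouter("P1", {17: (23, "P2"), 18: (31, "P3")})
next_hop, out = p1.forward(LabeledPacket(label=17, ttl=64, payload=b"<IP packet>"))
print(next_hop, out.label, out.ttl)  # P2 23 63
```

Nothing in `forward` looks at the destination IP address, which is exactly why a P router cannot answer the question “how do I get back to the sender?” on its own.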

Traceroute works by sending probes with an incrementing TTL. When the TTL expires, the router on which it expires will usually send an ICMP Time Exceeded (“TTL exceeded in transit”) message back to the sender, and that’s how traceroute discovers each hop in the network. So what happens when the TTL expires on a P router? The P routers have no knowledge of the edge networks, so how could one of them route an ICMP packet back to a source it doesn’t know about? MPLS labels lead one way, towards the destination, and carry no return path, so the P router does the only thing it can: it takes the outgoing label it was going to use, builds a new MPLS frame containing the ICMP TTL Expired message, and that frame gets switched all the way to the egress PE router on the far side of the MPLS network (PE-B in this case).

PE-B receives the frame, examines the ICMP message inside it, and looks at its destination address, which is my PC. As a PE router, PE-B knows how to reach my PC (that is, which label to use to send traffic back into the MPLS network), so it packages the ICMP packet inside MPLS and sends it back across the MPLS network towards me.
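The detour described above can be traced with a small Python sketch. The topology (`my-pc`, `gateway`, `PE-A`, three P routers, `PE-B`, `server`) is hypothetical, the sketch assumes TTL propagation is enabled, and it assumes the return LSP runs back over the same P routers, which a real network need not do.

```python
# Hypothetical path from my PC to the server; the PE/P roles are assumptions.
PATH = ["my-pc", "gateway", "PE-A", "P1", "P2", "P3", "PE-B", "server"]
P_ROUTERS = {"P1", "P2", "P3"}
EGRESS_PE = "PE-B"


def reply_path(probe_ttl):
    """Routers a traceroute reply traverses for a probe sent with probe_ttl.

    Assumes TTL propagation is on (so the TTL can expire on a P router) and that
    the return LSP retraces the forward path.
    """
    if not 1 <= probe_ttl < len(PATH):
        raise ValueError("probe_ttl out of range for this path")
    responder = PATH[probe_ttl]  # TTL 1 expires at PATH[1], TTL 2 at PATH[2], ...
    if responder not in P_ROUTERS:
        # Routers outside the MPLS core know a route back to my PC and reply directly.
        return PATH[probe_ttl::-1]
    # A P router has no route to my PC: the reply is label-switched onward to the
    # egress PE, which then sends it back across the core towards my PC.
    egress = PATH.index(EGRESS_PE)
    onward = PATH[probe_ttl:egress + 1]
    back = PATH[egress - 1::-1]
    return onward + back


for ttl in range(1, len(PATH)):
    print(ttl, " -> ".join(reply_path(ttl)))
```

For a probe with TTL 3 (expiring on `P1`), the reply travels `P1 -> P2 -> P3 -> PE-B` and only then turns around, whereas the reply from `PE-A` at TTL 2 comes straight back.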

In other words, any ICMP TTL Expired message generated within the MPLS network actually flows to the far side of the MPLS network and then back again. That is why the hops inside it all show a similar round-trip time, and why in this example those times are all large: each reply has to cross from the US to the UK and then from the UK back to the US to reach my PC.
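A back-of-the-envelope model shows why the numbers flatten out. In this minimal Python sketch the one-way link delays are made-up values (with a 35 ms link standing in for the Atlantic crossing), links are assumed symmetric, and router processing and queueing time are ignored. For a P router, the probe’s trip out plus the reply’s trip on to the egress PE always add up to the full one-way delay to that PE, so every P router reports the same RTT.

```python
# Hypothetical path and one-way link delays (ms), assumed symmetric.
PATH = ["my-pc", "gateway", "PE-A", "P1", "P2", "P3", "PE-B", "server"]
LINK_DELAY_MS = [5, 1, 2, 3, 35, 2, 1]  # delay of the link PATH[i] -> PATH[i + 1]
P_ROUTERS = {"P1", "P2", "P3"}
EGRESS_PE = "PE-B"


def one_way(index):
    """One-way delay (ms) from my-pc to PATH[index]."""
    return sum(LINK_DELAY_MS[:index])


def reported_rtt(index):
    """RTT traceroute would show for PATH[index], ignoring processing time."""
    if PATH[index] in P_ROUTERS:
        egress = PATH.index(EGRESS_PE)
        probe = one_way(index)                     # my-pc -> P router
        onward = one_way(egress) - one_way(index)  # P router -> egress PE (reply)
        back = one_way(egress)                     # egress PE -> my-pc (reply)
        return probe + onward + back               # always 2 * one_way(egress)
    return 2 * one_way(index)


for i in range(1, len(PATH)):
    print(f"{i:2d}  {PATH[i]:8s} naive {2 * one_way(i):4d} ms   reported {reported_rtt(i):4d} ms")
```

With these numbers, the 2 ms link from `PE-A` to `P1` appears to add 84 ms, while the 35 ms crossing between `P2` and `P3` appears to add nothing, the same pattern as in the user’s traceroute.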

If you’ve not seen this before, it can be very confusing. As a result, I’ve seen time wasted troubleshooting links that actually have no problems, all thanks to traceroute.

Side note: not all MPLS networks copy the incoming packet’s IP TTL into the MPLS label (a behavior usually called TTL propagation), so the TTL will not always expire in the middle of the MPLS network. In that case traceroute may see the whole MPLS network as a single hop, and you get no insight into its internal nodes.
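The difference can be sketched by simulating the TTL handling. In this minimal Python model (hypothetical path and router names, a single label, no penultimate-hop popping), the ingress PE either copies the IP TTL into the label or, when propagation is off, pushes a fixed label TTL of 255, a common choice but an assumption here. With propagation off, the P routers never expire the probe and vanish from the traceroute, leaving the ingress and egress PEs looking like neighbors.

```python
# Hypothetical path; PE-A is the ingress PE and PE-B the egress PE.
PATH = ["gateway", "PE-A", "P1", "P2", "P3", "PE-B", "server"]
P_ROUTERS = {"P1", "P2", "P3"}
INGRESS_PE, EGRESS_PE = "PE-A", "PE-B"
NO_PROPAGATE_LABEL_TTL = 255  # assumed label TTL pushed when propagation is off


def expiring_hop(probe_ttl, propagate_ttl):
    """Return the router where a probe with probe_ttl expires, or 'server'."""
    ip_ttl = probe_ttl
    label_ttl = None
    for hop in PATH:
        if hop == "server":
            return hop  # the destination answers with whatever TTL is left
        if hop in P_ROUTERS:
            label_ttl -= 1  # P routers only see (and decrement) the label TTL
            if label_ttl == 0:
                return hop
            continue
        if hop == EGRESS_PE and propagate_ttl:
            ip_ttl = label_ttl  # copy the label TTL back into the IP header
        ip_ttl -= 1
        if ip_ttl == 0:
            return hop
        if hop == INGRESS_PE:
            label_ttl = ip_ttl if propagate_ttl else NO_PROPAGATE_LABEL_TTL


def traceroute(propagate_ttl):
    """Send probes with TTL 1, 2, 3, ... until the server answers."""
    hops, ttl = [], 1
    while True:
        hop = expiring_hop(ttl, propagate_ttl)
        hops.append(hop)
        if hop == "server":
            return hops
        ttl += 1


print("propagation on: ", traceroute(True))
print("propagation off:", traceroute(False))
```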


