The Basics of Troubleshooting – Part 2 – Traceroute

This is another blog post in my series called Interviewing for an IT Job. If you have not read the series announcement and my previous posts, please do so.

Index of Related Posts:

1. Interviewing for an IT Job

2. What You Need to Know When Interviewing For a Job in IT

3. What to Expect When Going Through the Technical Interview

4. What You Should Know about Headhunters and Recruiters

5. Tips for Networking Success

6. 5 Tips for Successful Webcam Interviews

7. The Basics of Troubleshooting – Part 1 – Ping

8. The Basics of Troubleshooting – Part 2 – Traceroute

9. The Basics of Troubleshooting – Part 3 – Firewalls

10. The Basics of Troubleshooting – Part 4 – NAT

11. The Basics of Troubleshooting – Part 5 – PAT

12. The Basics of Troubleshooting – Part 6 – 1:1 NAT

13. The Basics of Troubleshooting – Part 7 – Port Forwarding

Last week I introduced the most basic technique and usually the first step in troubleshooting a device that cannot be reached by a user. We used the networking utility Ping, which in most cases can give you the first clue about the problem at hand.

It is very important to understand that although Ping is a great tool for troubleshooting, it is not foolproof. As I will demonstrate in my articles, a ping may not reach the intended destination for a number of other reasons (e.g. firewalls, MTU, ICMP filtering).

Today we are going to add a twist to the problem I presented last week and I will demonstrate how to use the networking utility Traceroute.

Disclaimer

The scenarios I am going to use for these posts will vary in complexity, and they may not exactly match what you come across in your organization.

If you are a seasoned network engineer, systems administrator or help desk engineer you may find these troubleshooting scenarios very simple.

Keep in mind that the examples I am going to use here have the sole objective to expose you to the basic methodology of troubleshooting.

In each scenario, I am going to emphasize the logical steps that I expect from a candidate or support engineer when troubleshooting a particular problem. In most cases there will be more than one logical path that will help you identify the problem. That is OK. I just want to expose you to the tools, concepts, naming conventions and technologies that you should know to be successful in this career.

It’s Troubleshooting Time!

We Are Back at Acme Corporation

1. Acme has a few users connected to their Local Area Network (LAN). For this scenario we are going to work with a laptop that has a fixed IP address of 192.168.1.200.

2. Acme has a Windows file server. Its IP address is 192.168.1.10 and its Fully Qualified Domain Name (FQDN) is app.acme.com.

3. In this very simplistic network, all devices are connected to a single network switch.

4. Acme’s router’s IP address is 192.168.1.1 and its FQDN is router.acme.com.

5. There is a firewall between Acme’s LAN and the Internet.

6. We are going to connect to Google’s servers. The IP address of the server we are going to use here is 173.194.115.18. Its FQDN is www.google.com.

7. The Internet is represented by a cloud.

8. One of Acme’s employees works from home.

Assumptions

For this first troubleshooting scenario, we are not going to take the firewall (5) into consideration. We are going to assume that all traffic from the LAN can reach the Internet (7) and vice versa.

We are not going to worry about protocols, PAT, NAT, etc.

Acme LAN users should be able to ping all devices on the LAN and WAN (Wide Area Network, in this case the Internet).

You should be able to troubleshoot all the issues on your own.

We are just going to identify where the problem may be located.

You are working from the same office as Acme’s user. Your PC, running Windows, is connected to the same switch.

For these scenarios, we are going to assume Google has only one server (173.194.115.18).

Troubleshooting Case

For this case, assume that you are working as a Help Desk support engineer. You are at your desk, the telephone rings and you need to help the user on the other end. Here we go…

John, one of Acme’s employees, explains that last night he was working on a report for his boss. He was using Google to research data about widgets and everything was working great, but this morning, when he got to the office, he could not connect to Google’s search page at http://www.google.com. He tells you that he can connect to the XYZ application, which runs off the server app.acme.com. It’s 8 AM and he needs to get the report ready for a meeting at 10 AM. He needs your help.

So, what are you going to do first?

Before you start troubleshooting any issue, do the following:

Make sure you truly understand what the user told you. If necessary, ask more questions about the problem.

Write down all the important details, such as IP addresses, URLs, application and server names, and times when the events took place.

Let’s get started…

In this case, the user gave you some important information:

He was able to access www.google.com the night before.

He cannot reach the same URL this morning.

However, he can connect to Acme’s server on the LAN.

I would expect you to do the following:

Open a DOS window.

Type the following command: ping www.google.com. Ping is a great networking utility that can be used to test if a device can be reached.

Based on the picture above, what have we learned?

We know that the Domain Name System (DNS) resolution is working. DNS translated the URL www.google.com to the IP address 173.194.115.18.

We know that we cannot ping that IP address, as we are repeatedly getting the Request Timed Out message.

What you can do next is ping another website or IP address on the Internet. For example, you could try to ping www.yahoo.com.

That worked. We can see the reply messages from 98.138.252.30.

We have just confirmed that we can reach the WAN; however, we cannot get to Google’s server at 173.194.115.18.
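The reasoning so far can be written down as a small decision function. This is only a minimal sketch in Python: the names PingCheck and triage are made up for this illustration, and the true/false values are typed in by hand from what we observed above rather than measured by the script.

```python
from dataclasses import dataclass


@dataclass
class PingCheck:
    """What we observed when pinging one host (hypothetical helper type)."""
    name_resolved: bool  # did DNS return an IP address for the host name?
    got_reply: bool      # did at least one echo reply come back?


def triage(target: PingCheck, control: PingCheck) -> str:
    """Suggest where to look next, given the target and a second 'control' host."""
    if not target.name_resolved:
        return "DNS: the target name did not resolve"
    if target.got_reply:
        return "OK: the target answers ping"
    if control.got_reply:
        return "PARTIAL: the Internet is reachable, but the target is not"
    return "WAN: nothing outside the LAN answers; check the router or the ISP"


# Observations from this scenario, entered by hand.
google = PingCheck(name_resolved=True, got_reply=False)
yahoo = PingCheck(name_resolved=True, got_reply=True)
print(triage(google, yahoo))
```

The order of the checks mirrors the order of the questions you would ask on the call: did the name resolve, did the target reply, and did anything else on the Internet reply.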

Last week, I told you that for the initial troubleshooting during my technical interview, I would be satisfied if you told me the server could not be reached and concluded that the server was unavailable.

Here is the Twist

John, the Acme employee who called you about the issue where he could not reach Google’s server, has just called you back with more information. He says, “I was talking to Mary, the sales manager, and explained to her that I was a bit late with my report because I could not get to Google to finish it. She told me that she was working on a presentation and was using Google at that time without any issues.”

So, let’s look at the information once more (use Acme’s network diagram for your reference):

John is at Acme’s office (#1). He can connect to Yahoo but not to Google.

Mary (#8) is working from home. She can connect to Yahoo and Google.

You are located at Acme’s office (#1), like John. You tried to ping Google’s server and got the infamous “Request Timed Out” message, indicating that you could not reach Google. However, you were able to ping Yahoo’s server.

What’s Up With That?

I told you before that we are keeping this scenario very simple, so we are not dealing with firewalls, payloads, MTU sizes, etc. But before we move forward, I want to make you aware of how the networks that are a part of the Internet are connected to each other.

As you may know, to connect to the Internet you need an Internet Service Provider (ISP). Think about your Internet provider at home: Verizon, AT&T, Time Warner, Charter, etc.

When you connect to your ISP, your computer becomes a part of this large network called the Internet. Behind the scenes though, your ISP is also connected to other networks and ISPs through Internet Exchange Points (IX or IXP), which interconnect those networks and exchange Internet traffic between them.

So when you think about the Internet and how information travels from your computer to Google servers and back, you need to understand that there are many routes that can take that data from one end to another. Take a look at Sprint’s backbone in the US and notice the number of routes from coast to coast. These routes add redundancy to Sprint’s backbone, so if one route is unavailable for whatever reason (e.g. fiber cut), the traffic is re-routed through another route.

So, back to our troubleshooting: just because you cannot ping Google’s server from your location does not mean the server is down, especially if someone can reach the server from a different location (remember Mary?).
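To make that point concrete, here is a minimal Python sketch that compares results from more than one location. The function name interpret and the location labels are assumptions made for this example; the True/False values are simply what John and Mary reported.

```python
from typing import Dict


def interpret(reachable_from: Dict[str, bool]) -> str:
    """Compare reachability of one server as seen from several locations."""
    if not reachable_from:
        return "no data yet"
    results = set(reachable_from.values())
    if results == {True}:
        return "reachable from every location tested"
    if results == {False}:
        return "unreachable from every location tested; the server may be down"
    # Mixed results: the server is up, so suspect the path from the failing sites.
    failing = sorted(name for name, ok in reachable_from.items() if not ok)
    return "likely a path problem from: " + ", ".join(failing)


# Acme's office cannot reach Google, but Mary can from home.
print(interpret({"Acme office": False, "Mary's home": True}))
```

A mixed result is the interesting one: the server answers somebody, so the problem is more likely somewhere along the route from the failing location.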

That’s when traceroute can help us troubleshoot the issue.

Let’s see how it works. Once again I am assuming there are no firewalls and I should be able to reach the server using ICMP. Google’s server IP address is 173.194.115.18.

I am going to type the following command on my Windows PC:

tracert 173.194.115.18



If you are using a Mac or Linux, type traceroute 173.194.115.18.
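If you want to script this step, the sketch below picks the right command name for the operating system it runs on. It is a minimal Python example, not a finished tool: the function names trace_command and run_trace are made up for this post, and it assumes the tracert or traceroute utility is installed and on your PATH. Also keep in mind that traceroute on Mac and Linux sends UDP probes by default, while tracert on Windows uses ICMP, so the two may behave differently when ICMP or UDP is filtered along the way.

```python
import platform
import subprocess
from typing import List, Optional


def trace_command(target: str, system: Optional[str] = None) -> List[str]:
    """Build the trace command for the given OS (defaults to the current one)."""
    system = system or platform.system()
    if system == "Windows":
        return ["tracert", target]
    # macOS reports "Darwin"; it and Linux both use the traceroute utility.
    return ["traceroute", target]


def run_trace(target: str) -> str:
    """Run the trace and return its raw text output (needs network access)."""
    completed = subprocess.run(
        trace_command(target), capture_output=True, text=True
    )
    return completed.stdout


# Only build and show the command here; call run_trace() when you are ready.
print(" ".join(trace_command("173.194.115.18", system="Windows")))
```

Passing the operating system name as a parameter keeps the command builder easy to check without actually sending any packets.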

From the picture above, we can see that the packets are leaving my PC, reaching Verizon’s network and then timing out after reaching hop 5, which is a part of Verizon’s backbone.
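Reading a trace boils down to finding the last hop that answered. The Python sketch below shows that idea on a hand-written list of hops. The hop data is hypothetical: apart from Acme’s router at 192.168.1.1, the addresses are placeholders from the ranges reserved for documentation, not the real Verizon hops from my trace, and the name last_responding_hop is made up for this example.

```python
from typing import List, Optional, Tuple

# (hop number, address that answered, or None if the hop timed out)
Hop = Tuple[int, Optional[str]]


def last_responding_hop(hops: List[Hop]) -> Optional[Hop]:
    """Return the last hop that answered, or None if no hop answered."""
    last = None
    for hop in hops:
        if hop[1] is not None:
            last = hop
    return last


# Hypothetical trace: replies up to hop 5, then only timeouts.
hops = [
    (1, "192.168.1.1"),   # Acme's router
    (2, "198.51.100.1"),  # placeholder ISP hops
    (3, "198.51.100.9"),
    (4, "203.0.113.5"),
    (5, "203.0.113.77"),
    (6, None),
    (7, None),
    (8, None),
]
print(last_responding_hop(hops))  # (5, '203.0.113.77')
```

The last responding hop tells you how far the packets got, which is the detail you would pass on to your ISP when reporting the problem.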

So traceroute confirmed what we saw when we tried to ping Google’s server from Acme’s office.

As Mary reported, she was able to get to the server from her home, so the next step is to run a traceroute to the server from another location. There are many free tools on the Internet for this. For this example I am going to use Central Ops (http://centralops.net/co/).

Go to http://centralops.net/co/, click on Traceroute and type in the IP address 173.194.115.18 in the “to” box and click “go”.

As you can see, the trace is complete and the server (173.194.115.18) is responding.

Notice that Central Ops uses a different ISP (networklayer.com) and their ISP can reach Google.

When you use traceroute, you may even see the same ISP being used by different locations, but the routes may be completely different. It is not uncommon for data centers to have two or more ISPs connected to their networks (e.g. AT&T and Verizon) for redundancy purposes.

Wrapping Up

As you can see, when troubleshooting connectivity issues, especially for devices located on the Internet, if you just ping the device, you may miss the big picture. It is important to know that traceroute can give a lot more detail about the route taken by the packets leaving your computer towards the target device.

Traceroute may also help you identify whether the issue is in fact a routing issue with your ISP.

Resource List

Below is a list of links to important concepts and information that you should be familiar with.

Local Area Network (LAN) – http://en.wikipedia.org/wiki/LAN

Wide Area Network (WAN) – http://en.wikipedia.org/wiki/Wide_area_network

Fully Qualified Domain Name (FQDN) – http://en.wikipedia.org/wiki/FQDN

Domain Name System (DNS) – http://en.wikipedia.org/wiki/DNS

Uniform Resource Locator (URL) – http://en.wikipedia.org/wiki/URL

Router – http://en.wikipedia.org/wiki/Router_(computing)

Network Switch – http://en.wikipedia.org/wiki/Network_switch

Firewall – http://en.wikipedia.org/wiki/Firewall_(computing)

Ping – http://en.wikipedia.org/wiki/Ping_(networking_utility)

Nslookup – http://en.wikipedia.org/wiki/Nslookup

Traceroute – http://en.wikipedia.org/wiki/Traceroute

Ping-of-Death – http://www.cert.org/advisories/CA-1996-26.html

Denial-of-Service (DoS) Attack – http://en.wikipedia.org/wiki/Denial-of-service_attack

Network Address Translation (NAT) – http://www.cisco.com

What’s Next?

In my next article, we are going to talk about firewalls and go through another troubleshooting scenario.

Cheers!

Fabio.