Need relief for your DNS headaches? First, it helps to understand how the domain name system works under the hood.

DNS is the Internet's phonebook. Whenever you type in or click a human-readable web link (such as hpe.com), your web browser calls on a domain name system (DNS) resolver to resolve its corresponding Internet Protocol (IP) address.

DNS is essential. Without it, there is no Internet. Period. End of statement.

DNS is not just for browsers, though. If it runs on the Internet (Slack, email, you name it), DNS works behind the scenes to make sure all the application requests hook up with the appropriate Internet resources. Whether it is a website, email link, or FTP site, it has an IPv4 address or its IPv6 equivalent. No single server tracks them all: the 13 DNS root server addresses sit at the top of the hierarchy and point resolvers toward the top-level domain servers, which in turn point to the authoritative DNS servers that hold the records for each domain.

People and companies alike reach DNS via DNS servers. To preserve privacy, many are now turning from their ISP-based DNS servers to public ones, including Cisco OpenDNS, Cloudflare’s brand new 1.1.1.1 service, or Google Public DNS.

Unfortunately, DNS is not the prettiest of systems. "Nothing about DNS is as simple as it might look," according to Geoff Huston, chief scientist at the Asia-Pacific Network Information Centre, a regional Internet registry. As a result, when something goes wrong with DNS, network connectivity often goes haywire. And system administrators lose sleep.

How DNS works

DNS is structured in a hierarchy using different managed areas called “zones,” with the root zone at the top.

What does that mean for you, as an end user or a system administrator? Let's start with a simple example of you going to a website using your PC browser. After you click on a web link, your browser calls for the page's IP address. The browser accomplishes this by forwarding the request to your DNS recursive server. This server can run on a local server using a program such as BIND or Dnsmasq; your Internet service provider's DNS server; or a DNS service like Cisco OpenDNS, Cloudflare 1.1.1.1, or Google Public DNS. You, your IT department, or your ISP decide which one is used.

As the term recursion suggests, the recursive DNS server checks to see if it already has a cached DNS record from a higher level DNS server and if that address still has a valid time-to-live (TTL). The TTL defines the length of time (in seconds) for which a DNS record is valid and kept in cache; all DNS addresses have TTLs. The bigger the TTL value, the longer the DNS resolver holds the information. Because the Internet is constantly changing, even the most stable IP addresses (such as mail records) have relatively short TTLs: between an hour (3,600 seconds) and a day (86,400 seconds).
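To make the TTL idea concrete, here is a minimal sketch of a cache that forgets a record once its TTL runs out. It is an illustration only, not how BIND or any particular resolver stores records; the class name TTLCache and the injectable clock are assumptions made for the example.

```python
import time


class TTLCache:
    """Toy DNS cache: each record expires ttl seconds after it is stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be shown without waiting
        self._records = {}       # name -> (address, expiry timestamp)

    def put(self, name, address, ttl):
        """Store an address and remember when it stops being valid."""
        self._records[name] = (address, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if it is missing or expired."""
        entry = self._records.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._records[name]   # stale: the caller must ask upstream again
            return None
        return address
```

A record stored with ttl=3600 is served from the cache for an hour; after that, get() returns None and the resolver has to look the name up again.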

Your local recursive DNS server may be only the first server to be checked as the web browser does its best to connect you to your destination address. If your recursive server doesn't have the address in its cache, it sends the request further up the line to a DNS root name server. The root server doesn't hold the answer itself; it refers the recursive server to the appropriate top-level domain (TLD) server, such as the one for .com. DNS servers are also constantly updated and interacting with each other.

Finally, the TLD server refers the request to the domain's authoritative name server. These servers hold, and are the final authority for, DNS resource records. DNS is also the world’s largest distributed database.
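The walk from the root to a TLD server to an authoritative server can be sketched with a few in-memory tables. Everything here is hypothetical: the server names (tld-com, ns-example), the tables, and the address are invented for illustration, and a real resolver talks to these servers over the network rather than reading dictionaries.

```python
# Hypothetical in-memory "servers"; real ones are reached over the network.
ROOT = {"com.": "tld-com"}                                   # root refers by TLD
TLD_SERVERS = {"tld-com": {"example.com.": "ns-example"}}    # TLD refers by domain
AUTHORITATIVE = {"ns-example": {"www.example.com.": "192.0.2.10"}}


def resolve(name):
    """Walk root -> TLD -> authoritative, returning (address, path taken)."""
    labels = name.rstrip(".").split(".")
    tld = labels[-1] + "."
    domain = ".".join(labels[-2:]) + "."
    fqdn = ".".join(labels) + "."
    path = ["root"]

    tld_server = ROOT.get(tld)            # the root only knows who runs the TLD
    if tld_server is None:
        return None, path
    path.append(tld_server)

    auth_server = TLD_SERVERS[tld_server].get(domain)   # the TLD knows the domain's server
    if auth_server is None:
        return None, path
    path.append(auth_server)

    # Only the authoritative server holds the actual record.
    return AUTHORITATIVE[auth_server].get(fqdn), path
```

The returned path shows where a lookup stopped, which is the same question you ask when tracing a real failure: which level of the hierarchy stopped giving useful referrals?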

Be aware that this is a simplification of the process. To really understand the subject, I recommend you study Cricket Liu and Paul Albitz's classic book, "DNS and BIND."

All this works great when it works—which is why you can type in a URL and arrive at a website seemingly immediately. But when it doesn’t work, well, that’s why you’re reading this article.

How DNS doesn't work

DNS has problems. First, it has no security built in, to speak of.

The Internet is filled with garbage sites—and thus DNS is, too. "Most new domain names are malicious," says Paul Vixie, a DNS pioneer and longtime principal author of BIND.

DNS can also be used to spread malware, most commonly in the form of DNS cache poisoning. In this type of attack, an attacker tricks a DNS server into accepting forged data, or takes control of the server outright, and so inserts bad information in the DNS data cache. Then, when you try to go to a site, the DNS response sends you to a bogus version of the site, which then infects your system with malware.

Adding insult to injury, DNS poisoning can spread. For example, if your ISP’s DNS is infected, the polluted DNS entry will spread to your local recursive DNS server and all the other DNS servers that rely on it for their own DNS records.

Fortunately, there is a fix for DNS cache poisoning. DNS Security Extensions (DNSSEC) digitally signs DNS data so that a resolver can verify a response really came from the zone's authoritative source and was not altered along the way. To eliminate cache poisoning once and for all, DNSSEC must be deployed at each step in the lookup process, from root zone to final domain name.
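The "every step" requirement can be pictured as a chain with no missing links. The sketch below does no cryptography at all; it only checks, against a set of zones you tell it are signed, whether any zone between the root and the domain is unsigned. The function names are made up for this example, and treating every label as its own zone is a simplification.

```python
def zone_chain(name):
    """Return the zones from the root down to name: '.', 'com.', 'example.com.'."""
    labels = name.rstrip(".").split(".")
    chain = ["."]
    # Build the chain from the rightmost label (the TLD) toward the full name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


def first_unsigned_zone(name, signed_zones):
    """Return the first zone in the chain that is not signed, or None if all are."""
    for zone in zone_chain(name):
        if zone not in signed_zones:
            return zone
    return None
```

If first_unsigned_zone() returns anything other than None, a validating resolver has nothing to check at that level, and the protection stops there.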

However, even though the software has been around for more than a decade, according to a 2017 APNIC study, DNSSEC has been deployed by only 1 percent of .com, .net, and .org domains. And of those, says the report, "39% of the domains use insufficiently strong key-signing keys; and although 82% of resolvers in our study request DNSSEC records, only 12% of them actually attempt to validate them."

In short, even with DNSSEC, you can't trust DNS records. You need to work with your upstream ISPs and DNS providers to make sure DNSSEC is configured properly, every step of the way.

That’s not all. For one thing, DNS is historically underprovisioned. That's one reason why distributed denial-of-service attacks on DNS, such as the 2016 assault on DNS provider Dyn (later acquired by Oracle), are so devastating.

Worse still, as one Internet backbone provider pointed out, DNS is the Internet's single point of total failure. We also have no choice but to use DNS, so we're stuck. We have to make it work despite any technical weaknesses.

Tracking down DNS problems

Many people blame DNS for any and all Internet problems. That's because "web browsers tend to blame DNS when it's not DNS," says book author and DNS expert Liu. "It's often unfairly blamed. You must make sure it really is DNS and it's not just that your Internet connection is down." Otherwise, he points out, "you will end up on wild goose chases."

So when there's an Internet problem, look elsewhere, to begin with. Start with simple steps such as the classic "Are your cables connected?" You should also check to see if other people are having the same problem. For that, sites such as DownForEveryoneOrJustMe are handy.

If you suspect a DNS problem with one of your own websites, find out if your site can be reached by users from other locations. For example, if you can't reach your site using your own DNS servers but others access it just fine, your problem may well be with your DNS. But if no one can reach the website or it resolves to a site that has nothing to do with your company, chances are you haven't renewed your domain name. This is a dumb mistake, but it happens to even the smartest companies.

Once you're sure it truly is a DNS problem, you must work out exactly where it is.

Nslookup, the quick-and-dirty command to see if DNS is working on your current machine, shows the hostname and IP address of the DNS server configured for your computer. If this works, you at least know your local DNS is alive and well.
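If you would rather script this first check, a rough Python equivalent is to ask the operating system's resolver for a name and see whether anything comes back. This is a sketch, not a replacement for nslookup: it goes through whatever the OS consults (including the local hosts file) and does not tell you which DNS server answered. The function name resolve_locally is an assumption for this example.

```python
import socket


def resolve_locally(hostname):
    """Return the sorted unique addresses the system resolver gives for hostname.

    An empty list means the lookup failed.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_locally("hpe.com") on a healthy machine should return one or more addresses; an empty list tells you the local lookup path is where to start digging.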

The primary tool for tracking down DNS trouble is the BIND dig command. It helps you see a named site's IP address records, view the query route a DNS server uses to get answers from an authoritative nameserver, and diagnose other DNS problems. (There are other tools. While Liu says he uses dig most of the time, he also likes Neustar UltraTools DNS tools.)
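For repeated checks, it can help to wrap dig in a small script. The sketch below builds a dig command line and runs it only if dig is installed. The helper names dig_command and run_dig are invented for this example; the dig arguments used (@server, the record type, and +trace) are standard dig syntax.

```python
import shutil
import subprocess


def dig_command(name, record_type="A", server=None, trace=False):
    """Build the argument list for a dig query."""
    cmd = ["dig"]
    if server:
        cmd.append("@" + server)     # ask this server instead of the default one
    cmd += [name, record_type]
    if trace:
        cmd.append("+trace")         # follow the delegations down from the root
    return cmd


def run_dig(name, **options):
    """Run dig if it is installed and return its text output, else None."""
    if shutil.which("dig") is None:
        return None
    result = subprocess.run(
        dig_command(name, **options), capture_output=True, text=True, timeout=30
    )
    return result.stdout
```

Comparing run_dig("hpe.com") with run_dig("hpe.com", server="1.1.1.1") is a quick way to see whether your usual resolver and a public one agree, and trace=True shows each referral from the root down.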

To use dig properly, "think about how DNS works," Liu says. "There are so many components involved in DNS: local recursive DNS, DNS forwarder, and more. If any component goes wrong, it breaks. There are lots of different possibilities, and you have to trace it down step by step.

"You need to think like a DNS server. Walk your way up the DNS steps," Liu says. But, he warns, "This can be very tedious."

Start by looking through the logs. Liu uses grep on Linux; on Windows, he uses Event Viewer. But use whatever tool works best for you.
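If you prefer a script to grep, a few lines of Python can pull out log lines that mention a failing DNS response code. This is a sketch under assumptions: it matches the standard response code names SERVFAIL, REFUSED, and NXDOMAIN wherever they appear in a line, and it knows nothing about the log format of any particular DNS server, so adjust the pattern to what your logs actually contain.

```python
import re

# Standard DNS response codes that usually signal trouble.
PATTERN = re.compile(r"\b(SERVFAIL|REFUSED|NXDOMAIN)\b")


def suspicious_lines(lines):
    """Yield (line number, response code, line) for lines mentioning a failure code."""
    for number, line in enumerate(lines, start=1):
        match = PATTERN.search(line)
        if match:
            yield number, match.group(1), line.rstrip("\n")
```

Pass it an open log file (or any list of strings) and it yields only the lines worth a closer look, with their line numbers.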

If something goes wrong, look in the DNS server configuration files. BIND servers have text-based config files with persnickety syntax. “It's very easy to make mistakes. I've been doing BIND for 30 years and I still make mistakes," Liu says.

Windows and Linux are somewhat different. “Many Windows administrators forget about DNS entirely, and they forget that it needs to be managed,” says Liu. While it is simple to set up DNS in Windows, troubleshooting problems isn't easy. “You must make sure the data is correct and kept up to date,” he cautions. “When the chips are down, [Windows admins] discover just how complicated things really are."

One way to prevent problems is to document the existing configuration. All IT departments should keep accurate records of their DNS setup. "Many folks make the job more cumbersome than it has to be, because they don't have the documentation on how DNS is set up in their organization,” says Liu. For instance, the documentation should make it clear: Is DNS configured to use more than one forwarder? Is there any forwarder? If you take the time to document the system when things are working fine, when you do encounter a problem, you won’t have to figure out as much in the heat of the “But the Internet isn’t working!” moment.

DNS trouble will happen. It's just a matter of when. But with the right steps, you'll be able to find the problem and resolve it.

DNS troubleshooting: Lessons for leaders