Google have very recently changed how their Public DNS servers behave when a name server doesn't respond nicely to a UDP request for a CNAME record. Now, if the response is not "nice", Google does not get the CNAME record, and when you ask Google's servers for it, it's not found.

Google are helping by white-listing affected name servers in the meantime, and will hopefully come to some longer term solution, since the problem looks to be affecting lots of people.

The Google Public DNS Discussion Group is where you can see the problems being reported, and where you request help with the white-listing discussed below.

Symptoms

The problem will typically show itself in one of these ways:

customers saying your web site is not found

customers can't ping or look up (e.g. with nslookup) your server name (e.g. myserver.mydomain.com)

parts of your software are logging errors like "server not found"

nslookup or dig return errors like "SERVFAIL" or "Warning: Message parser reports malformed message packet"



The errors / indicators above cover a variety of problems, so we next need to confirm whether the problem is the specific case mentioned - Google's recent changes to DNS.

Confirm the Cause is Google's Change

Below, replace "myserver.mydomain.com" with the name you are having trouble with.

1. First test: Can Google DNS see your server (DNS entry)?

Linux/Windows

nslookup myserver.mydomain.com 8.8.8.8

The command above will look up your server name using Google DNS server 8.8.8.8. Google should know your server name (especially if this problem has just turned up).

If you get an error like this:

*** google-public-dns-a.google.com can't find myserver.mydomain.com: Non-existent domain

then we need to carry on. If you don't get an error, your problem lies elsewhere and unfortunately this blog probably won't help you.
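If you want to script this first test rather than read the output by eye, here is a minimal sketch. The function name classify_lookup is my own invention, and the error patterns it looks for are assumptions based on common nslookup messages, so your version may word them differently.

```shell
#!/bin/sh
# classify_lookup (hypothetical helper): reads nslookup output on stdin
# and prints "failed" if it contains a typical error message, else "ok".
# Assumption: the patterns below cover the errors your nslookup prints.
classify_lookup() {
    if grep -Eqi "can't find|Non-existent domain|NXDOMAIN|SERVFAIL|timed out"; then
        echo "failed"
    else
        echo "ok"
    fi
}

# Usage (needs network access):
#   nslookup myserver.mydomain.com 8.8.8.8 2>&1 | classify_lookup
```

A result of "failed" means carry on to the second test; "ok" means Google can see your record and the problem lies elsewhere.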

2. Second test: Can other DNS servers see your server?

The following commands will test your domain against other well-known servers. Google has a great page about diagnosing DNS issues where I found these server addresses.

nslookup myserver.mydomain.com 4.2.2.1
nslookup myserver.mydomain.com 4.2.2.2
nslookup myserver.mydomain.com 208.67.222.222
nslookup myserver.mydomain.com 208.67.220.220

If those commands successfully look up your server name, then the problem IS GOOGLE SPECIFIC - carry on to try to fix the problem below.

If the commands also have errors, then your problem is not Google-DNS specific. You'll need to go back to some fundamentals about your server name entry in DNS (has it been deleted or changed?). If LOTS of DNS servers can't see your server name then there's a broad issue.
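The two tests can be combined into one small script. This is only a sketch: count_failures, resolver_verdict and the LOOKUP variable are names I made up, and it assumes your lookup command exits with a non-zero status when it cannot resolve the name (recent BIND nslookup builds do, but not every nslookup does, so check yours).

```shell
#!/bin/sh
# LOOKUP is the command run for each query; override it to use another tool.
LOOKUP=${LOOKUP:-nslookup}

# count_failures NAME RESOLVER...
# Runs "$LOOKUP NAME RESOLVER" once per resolver and prints how many failed.
# Assumption: the lookup command exits non-zero on failure.
count_failures() {
    name=$1
    shift
    failures=0
    for resolver in "$@"; do
        if ! $LOOKUP "$name" "$resolver" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# resolver_verdict GOOGLE_FAILURES OTHER_FAILURES
# Prints the conclusion the two tests lead to.
resolver_verdict() {
    if [ "$1" -eq 0 ]; then
        echo "problem-lies-elsewhere"
    elif [ "$2" -eq 0 ]; then
        echo "google-specific"
    else
        echo "not-google-specific"
    fi
}

# Usage (needs network access):
#   google=$(count_failures myserver.mydomain.com 8.8.8.8)
#   others=$(count_failures myserver.mydomain.com 4.2.2.1 4.2.2.2 208.67.222.222 208.67.220.220)
#   resolver_verdict "$google" "$others"
```

Only a "google-specific" verdict means the fixes below apply to you.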

Fix the Problem - NOW!

Given you might be stressed if you've reached this point, let's get straight to fixing it. You should DO BOTH FIXES so it doesn't occur again.

1. Immediate fix - short term

Google are being very friendly about helping out people affected by the problem, so they will "white-list" the necessary name servers. This will take less than 2 hours to take effect.

You have 2 steps to do:

i) find your name servers

If your problem record is "myserver.mydomain.com" then you want to find the Name Servers for the domain "mydomain.com". You can do this by running the command below, or by going to your service provider's web site (whoever you signed up with to provide your internet domain name) and looking there.

nslookup -type=SOA mydomain.com

Which gives us

Server:  local.lookup
Address:  10.1.1.1

Non-authoritative answer:
mydomain.com
        primary name server = my.nameserver.com
        responsible mail addr = support.mydomain.com
        serial  = 2015010901
        refresh = 86400 (1 day)
        retry   = 7200 (2 hours)
        expire  = 3600000 (41 days 16 hours)
        default TTL = 86400 (1 day)

ii) ask Google (nicely) to white-list the primary name server (my.nameserver.com in the output above).

Go to the Google Public DNS Discussion Group forum and post there asking for help. Our issue was resolved within the hour via this forum.

Then just watch for the fix to take effect.
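If you have several domains to check, pulling the primary name server out of the SOA output (step i above) can be scripted. A minimal sketch, with a made-up function name primary_ns: it assumes the output labels the field either "primary name server" (as in the output shown above) or "origin" (as BIND's nslookup does).

```shell
#!/bin/sh
# primary_ns (hypothetical helper): reads `nslookup -type=SOA` output on
# stdin and prints the primary name server from the first matching line.
# Assumption: the field is labelled "primary name server" or "origin".
primary_ns() {
    sed -n -E 's/^[[:space:]]*(primary name server|origin)[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\2/p' |
        head -n 1
}

# Usage (needs network access):
#   nslookup -type=SOA mydomain.com | primary_ns
# With dig, the first field of the short SOA answer is the same server:
#   dig +short SOA mydomain.com | awk '{print $1}'
```

Whatever name it prints is the one to include in your post to the forum.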

2. Longer Term Fix

This is a longer term fix because most service providers are not very responsive to such changes (we're still waiting). Correcting what is probably a long-standing problem might be fairly hard for the service providers, but hopefully being pushed by Google will help them get on with it. In the meantime, the short term fix above by Google is exceptionally important for restoring normal operation.

We need to collect information to give to the responsible service provider so they can fix the issue. We can use the nslookup (Windows and Linux) or dig (Linux) commands to get it:

Using "nslookup" (Windows or Linux) - remember to replace the 2 names with yours:

nslookup myserver.mydomain.com my.nameserver.com

results in

;; Got SERVFAIL reply from 117.55.229.73, trying next server

or, using "dig" on Linux:

dig +norecurse myserver.mydomain.com. @my.nameserver.com

results in

;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.

Send the above error information to your service provider (the owners of the name server). They should hopefully take it seriously that Google aren't liking their implementation of DNS and do something about it.
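To make the report to your provider easier to put together (and to re-check later whether they have changed anything), you can scan the dig output for the warning signs shown above. This is a rough sketch: dig_problems is a name I made up, and the patterns are assumptions taken from the messages in this post.

```shell
#!/bin/sh
# dig_problems (hypothetical helper): reads dig or nslookup output on stdin
# and prints one line per warning sign found (nothing if the output is clean).
# Assumption: the messages are worded as in the examples in this post.
dig_problems() {
    awk '
        /malformed message packet/           { print "malformed-response" }
        /Truncated, retrying in TCP/         { print "truncated-udp" }
        /status: SERVFAIL/ || /Got SERVFAIL/ { print "servfail" }
    ' | sort -u
}

# Usage (needs network access):
#   dig +norecurse myserver.mydomain.com. @my.nameserver.com 2>&1 | dig_problems
# The same query forced over TCP, for comparison with the UDP behaviour:
#   dig +norecurse +tcp myserver.mydomain.com. @my.nameserver.com 2>&1 | dig_problems
```

If the UDP query reports problems and the TCP query comes back clean, that difference is worth including in what you send to the provider.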

3. Alternative Longer Term Fix

If you don't see much joy from your service provider on the long term fix (remember you can re-run the commands above every now and then to test whether they have changed anything), you could also change service providers. Amazon Web Services have a DNS service called Route53 which lets you create an "alias" to one of your AWS Load Balancers without using a CNAME record. Using Route53, you would never have been exposed to this particular issue.

Changing service providers is non-trivial so do your reading and decide for yourself. I'll certainly be looking because our service provider has been disappointing in resolving this urgent outage issue.

Conclusion

Google's April 2015 changes to DNS processing have been disruptive (certainly for me), but at least they have provided a responsive way to work around the issue.

The longer term fix is for you to convince your DNS service provider to respond to UDP protocol requests as expected by Google. Remember you can always switch to a DNS service provider that doesn't experience the problem.

Resources:

Google Guide to Diagnosing DNS issues

Google Public DNS Discussion Group