Amazon Route 53 is a highly available and scalable Domain Name System (DNS) service. As such, it efficiently translates names such as aws.amazon.com into IP addresses. Route 53 also includes a very handy failover feature. Once enabled, this feature performs health checks at regular intervals and switches to a backup site if the primary one appears to be unresponsive.

Today we are improving the health check model with an option for more frequent checks and additional control over the failover threshold.

More Frequent Checks

Route 53 runs each health check at 30-second intervals by default. You now have the option to reduce this interval to 10 seconds. With the faster check enabled, Route 53 can detect unhealthy endpoints and initiate the failover process sooner.

Failover Threshold Control

You can now specify the number of consecutive health check observations that are required for Route 53 to confirm that an endpoint has switched from a healthy to an unhealthy state or vice versa. You can vary this value from 1 to 10 observations as desired (the default value is 3 observations).
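If you script your configuration, the two settings might be expressed along the following lines with the AWS SDK for Python (boto3). This is only a sketch under stated assumptions: the helper names build_health_check_config and create_fast_health_check, the IP address, the port, and the /health path are placeholders invented for illustration, and the sketch assumes the RequestInterval and FailureThreshold fields of the Route 53 CreateHealthCheck API.

```python
import uuid


def build_health_check_config(ip_address, path, interval=30, threshold=3):
    """Build a health check configuration dict (hypothetical helper).

    The defaults mirror the values described in this post:
    a 30 second interval and a threshold of 3 observations.
    """
    if interval not in (10, 30):
        raise ValueError("interval must be 10 or 30 seconds")
    if not 1 <= threshold <= 10:
        raise ValueError("threshold must be between 1 and 10")
    return {
        "IPAddress": ip_address,        # placeholder endpoint address
        "Port": 80,                     # assumed port for this example
        "Type": "HTTP",
        "ResourcePath": path,           # placeholder health check path
        "RequestInterval": interval,    # seconds between checks
        "FailureThreshold": threshold,  # consecutive observations required
    }


def create_fast_health_check(config):
    """Send the configuration to Route 53 (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("route53")
    return client.create_health_check(
        CallerReference=str(uuid.uuid4()),  # unique token for this request
        HealthCheckConfig=config,
    )


# Build (but do not send) a config with the faster interval and a threshold of 5.
config = build_health_check_config("192.0.2.10", "/health", interval=10, threshold=5)
```

Calling create_fast_health_check(config) would submit the request. The builder is kept separate so that the values can be validated and inspected without touching your AWS account.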

Choosing Intervals and Thresholds

You should take several factors into account as you choose the intervals and settings that are appropriate for your application.

Each health check is currently performed from a dozen or so locations (this number could conceivably change in the future). If you switch to a 10 second interval, each location will check six times per minute, so each of your servers will receive 60 to 70 checks per minute.
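The arithmetic behind that estimate can be sketched as follows. The function name checks_per_minute is made up for this example, and the location counts of 10 and 12 are assumptions standing in for "a dozen or so"; the actual number of locations is not fixed.

```python
def checks_per_minute(locations, interval_seconds):
    """Estimate how many health checks one endpoint receives per minute."""
    # Each location checks once per interval, so 60 / interval times a minute.
    return locations * 60 // interval_seconds


# Assumed location counts; the real number may differ and can change.
print(checks_per_minute(10, 10))  # 60
print(checks_per_minute(12, 10))  # 72
print(checks_per_minute(12, 30))  # 24 at the default 30 second interval
```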

Lower values for the failover threshold will result in a faster response when an endpoint becomes unavailable. However, low values may also initiate a failover when a server is overloaded for a short period of time.

To make sure that your users are routed to healthy endpoints promptly, set the TTL (Time to Live) of the DNS records to 60 seconds (shorter times are allowed but may increase the number of DNS lookups and your AWS bill).

Putting all of the factors together, the total failover time can be computed as follows:

Failover = TTL + (Interval * Threshold)

If you have a TTL of 60 seconds, checks at 10 second intervals, and a threshold of 5, the failover process will take 110 seconds:

Failover = 60 + (10 * 5)
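To compare a few combinations quickly, the formula can be wrapped in a small function. This is just an illustration of the formula above; the name failover_seconds is made up for this example.

```python
def failover_seconds(ttl, interval, threshold):
    """Failover = TTL + (Interval * Threshold), all times in seconds."""
    return ttl + interval * threshold


print(failover_seconds(60, 10, 5))  # 110, the example above
print(failover_seconds(60, 30, 3))  # 150, with the default interval and threshold
print(failover_seconds(60, 10, 1))  # 70, the fastest setting at a 60 second TTL
```

A threshold of 1 gives the shortest time, but as noted above it also makes a failover more likely when a server is only briefly overloaded.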

Console Support

You can activate and manage both of these features from the AWS Management Console.

There is a small additional charge for fast interval health checks. See the Route 53 pricing page for additional information.

— Jeff;