Transient errors are intermittent errors caused by a short-lived outage of a specific resource or service. For example, a network route might be unavailable for a few milliseconds or seconds, a web service might be under high load and returning intermittent HTTP 503 (Service Unavailable) responses, or a database you’re trying to access might be in the process of being moved to a different server and hence be unavailable for a few seconds.

For many transient errors, it makes sense to back off and retry the current operation after waiting a few seconds. The back-off strategy employed could be one of the following:

Retry Immediately: Retry the failed operation immediately without waiting.

Retry at Fixed Intervals: Retry the failed operation after waiting a fixed amount of time. That is, the wait period between subsequent retries is fixed.

Retry with Exponential Back-Off: Exponentially increase the wait time between subsequent retries. E.g., retry after waiting 2, 4, 8, 16, 32… seconds.
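
The three strategies above can be sketched as functions that map a retry number to a wait time. This is only a minimal illustration: the class and method names (BackOffDelays, Immediate, Fixed, Exponential) are hypothetical and do not come from any library.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class BackOffDelays
{
    // Retry immediately: no wait before any retry.
    public static TimeSpan Immediate(int retryNumber) => TimeSpan.Zero;

    // Retry at fixed intervals: the same wait before every retry.
    public static TimeSpan Fixed(int retryNumber, double intervalSec) =>
        TimeSpan.FromSeconds(intervalSec);

    // Retry with exponential back-off: baseSec raised to the 1-based retry number,
    // so a base of 2 yields waits of 2, 4, 8, 16, 32 seconds.
    public static TimeSpan Exponential(int retryNumber, double baseSec) =>
        TimeSpan.FromSeconds(Math.Pow(baseSec, retryNumber));
}
```

For example, BackOffDelays.Exponential(3, 2) gives the 8 second wait before the third retry, while BackOffDelays.Fixed(3, 2) stays at 2 seconds.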

Why Exponential Backoff?



Exponential back-off is critical when communicating with any web service or cloud-based service like Windows Azure. If the cloud service provider is already experiencing transient issues, immediate retries from multiple clients tend to make the situation worse. Sometimes this flood of requests leads to a Denial of Service (DoS) type situation for the service. To guard against this, many services throttle clients that make too many requests within a certain span of time. Using exponential back-off ensures that each client calling the service gives it enough breathing room to recover.

Some exponential back-off algorithms also add a randomly calculated delta to the back-off time. This ensures that if many clients are using the same back-off algorithm, their retry times have a lower probability of coinciding. For example, instead of using just the raw exponential back-off time, which retries at 2, 4, 8, 16 seconds and so on, the formula adds a random +/- 20% delta so that the retries might happen at 1.7, 4.2, 8.5, 15.4 seconds.
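
As a rough sketch of the random delta described above, the helper below scales the raw exponential wait by a random factor between 0.8 and 1.2. The names JitteredBackOff, NextDelay and jitterFraction are made up for this illustration, and the +/- 20% figure is simply the one used in the example above.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class JitteredBackOff
{
    // Note: Random is not thread-safe. If several threads retry concurrently,
    // guard this instance with a lock or use one instance per thread.
    private static readonly Random Rng = new Random();

    // Returns baseSec^retryNumber, adjusted by a random delta of up to
    // +/- jitterFraction (0.2 means +/- 20%).
    public static TimeSpan NextDelay(int retryNumber, double baseSec, double jitterFraction = 0.2)
    {
        double rawSeconds = Math.Pow(baseSec, retryNumber);

        // NextDouble() is in [0, 1); map it to [-jitterFraction, +jitterFraction).
        double delta = (Rng.NextDouble() * 2.0 - 1.0) * jitterFraction;

        return TimeSpan.FromSeconds(rawSeconds * (1.0 + delta));
    }
}
```

With a base of 2, the third retry would then wait somewhere between 6.4 and 9.6 seconds instead of exactly 8, so clients that failed at the same moment are less likely to retry at the same moment.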

So How Do We Implement Retry with Exponential Backoff?



I’m going to show you three ways of incorporating exponential back-off in any code where retries are needed. This post details a home-grown retry logic with exponential back-off that I’ve been using for a while. Subsequent posts will show how to do this in a more sophisticated way via readily available libraries. The benefit of the home-grown recipe is that you do not need to install any additional dependencies. Just copy and paste the code snippets below and you’re all set.

Scenario:

We’re going to request the homepage of https://microsoft.sharepoint.com. This page requires a valid claims token, so the request is going to fail with an HTTP 403 (Forbidden) response. This response, while expected in this case, is a nice way to simulate errors which we can retry.

Client code requirements:

We need to retry the operation up to 3 times. The code should back off exponentially, i.e., the wait time between retries should increase exponentially. E.g., the first retry happens after 2 seconds, the second after 4 seconds, the third after 8 seconds and so on.

Client Code



The following code creates the HTTP request:

// Requires: using System.Net;
static void ExecuteHTTPGet(string requestUri)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
    request.KeepAlive = false;
    request.Method = "GET";

    // This line will throw an exception if the HTTP GET fails
    using (HttpWebResponse webResponse = (HttpWebResponse)request.GetResponse())
    {
        int requestStatus = (int)webResponse.StatusCode;
    }
}

Notice that we’re not catching any exception that might be thrown by the client. Catching the exception and retrying the operation will be delegated to our Retry logic.

Custom Retry Logic With Exponential Backoff



// Requires: using System; using System.Threading;

// Enum representing the back-off strategy to use. Required parameter for DoActionWithRetry()
enum BackOffStrategy
{
    Linear = 1,
    Exponential = 2
}

// Retry a specific code block wrapped in an Action delegate.
// The action runs once and is then retried up to maxRetries times;
// if the last retry also fails, the exception propagates to the caller.
static void DoActionWithRetry(Action action, int maxRetries, int waitBetweenRetrySec, BackOffStrategy retryStrategy)
{
    if (action == null)
    {
        throw new ArgumentNullException(nameof(action), "No action specified");
    }

    int retryCount = 0;
    while (true)
    {
        try
        {
            action();
            return;
        }
        catch (Exception ex)
        {
            if (retryCount >= maxRetries)
            {
                // Retries exhausted: rethrow so the caller sees the failure
                throw;
            }

            retryCount++;

            // Maybe log the number of retries
            Console.WriteLine("Encountered exception {0}, retrying operation", ex.ToString());

            TimeSpan sleepTime;
            if (retryStrategy == BackOffStrategy.Linear)
            {
                // Wait time is fixed
                sleepTime = TimeSpan.FromSeconds(waitBetweenRetrySec);
            }
            else
            {
                // Wait time increases exponentially
                sleepTime = TimeSpan.FromSeconds(Math.Pow(waitBetweenRetrySec, retryCount));
            }

            Thread.Sleep(sleepTime);
        }
    }
}

Here we first define an enum to specify the back-off strategies available. Based on the values in this enum, we’ve structured the code inside the catch block of DoActionWithRetry() to modify the wait time for each subsequent retry. Notice how the formula raises the base wait time (waitBetweenRetrySec) to the power of retryCount to calculate the exponential wait time.

// Wait time increases exponentially
sleepTime = TimeSpan.FromSeconds(Math.Pow(waitBetweenRetrySec, retryCount));
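
Because the base wait time is itself the base of the power, the growth rate depends heavily on the value passed in: a base of 2 gives 2, 4, 8 seconds, a base of 5 gives 5, 25, 125 seconds, and a base of 1 never grows at all. The sketch below puts the formula from this post next to one possible alternative that starts at the base wait and doubles on each retry. The alternative is an assumption added for comparison, not something this post’s code uses, and the names BackOffFormulas, PowerOfBase and DoublingFromBase are hypothetical.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class BackOffFormulas
{
    // The formula used in this post: the base wait raised to the retry count.
    public static double PowerOfBase(double waitBetweenRetrySec, int retryCount) =>
        Math.Pow(waitBetweenRetrySec, retryCount);

    // Alternative (an assumption, not from this post): start at the base wait
    // and double it on every subsequent retry.
    public static double DoublingFromBase(double waitBetweenRetrySec, int retryCount) =>
        waitBetweenRetrySec * Math.Pow(2, retryCount - 1);
}
```

For a base of 2 both formulas agree (2, 4, 8 seconds). For a base of 5, the first gives 5, 25, 125 seconds while the doubling variant gives 5, 10, 20 seconds, so which one fits depends on how quickly you want the waits to grow.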

Putting it all together

So now that we have the operation we want to execute and a generic retry block, let’s use them in our main function:

static void Main(string[] args)
{
    try
    {
        // 3 retries with a base wait of 2 seconds, so the waits grow as 2, 4, 8 seconds
        DoActionWithRetry(() =>
        {
            ExecuteHTTPGet("https://microsoft.sharepoint.com");
        }, 3, 2, BackOffStrategy.Exponential);
    }
    catch (Exception)
    {
        // At this point you can either log the error or log the error and rethrow the exception, depending on your requirements
        Console.WriteLine("Exhausted all retries - exiting program");
        throw;
    }
}

The code will retry the HTTP GET request on the URL 3 times and will throw an exception if it encounters a failure the fourth time around. When the number of retries has been exhausted, it’s typically recommended to log the exception and then terminate the thread or application.

And that’s it!



Stay tuned for the next post which’ll show how to do this in a fancier way 🙂