Azure SQL transient errors

Handling Azure SQL transient errors in .NET Core using Polly

One thing people will notice when migrating to using cloud services are intermittent errors occur at a higher rate than you’re used to when running on premise.

These errors are often called transient errors and can be mitigated by just retrying the operation. While some operations might be fine to retry in all cases i.e. fetching data, others like creating orders, deducting money, etc. might not or at least need a little more fine grain control. Also, some errors makes no sense to retry and as a result it’s better to fail fast.

Writing the code to handle can easy turn to spaghetti code and it isn’t code you want to copy paste as it’s easy to get wrong and hard to maintain — so you really do want a framework to assist you with these scenarios.

Introducing Polly

This is how the authors describe Polly:

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner.

Which really sums up what it does, it lets you create policies using a fluent API, policies which then can be reused and can range from being generic to target very specific scenarios.

Polly is available as NuGet package and works with both old .NET and shiny .NET Core, which will be what I’m using in this post.

Also worth noting Polly is open source available on GitHub.

Polly’s also member of the .NET Foundation.

Creating policies

Defining a policy with Polly is really straightforward in the Polly namespace you Policy class which is what you utilize to create your policies

Above we’ve via the Handle<TException> method defined a policy that if a TimeoutException occurs it’ll wait 3 seconds and retry the operation 5 times before failing on the 6th time which is defined using the WaitAndRetry method.

Because the delay of each retry attempt in this case is retrieved using a Func<int, TimeSpan> delegate it’s really easy to implement more advanced scenarios like a sliding delay

to let our policy handle more exceptions we can use one to many Or<TException> method after the Handle<TException> call

you can also make your policy even more granular by inspecting the exception with an exception predicate

In this case we’ll fail fast if it’s a .tmp file that’s not found. You can also set policies based on result values, besides retries there’s also support for circuit-breaker, timeout, fallback, composing policies together and more, so the policy “engine” is very flexible.

Using policies

We now have our policy in place and want to put it into good use, here the policies Execute method comes in play

You can also create asynchronous policies to and then we can utilize the policies ExecuteAsync method

It’s almost too good to be true, there’s a lot happening under the hood — but in my opinion “hidden” under a very easy to understand and maintainable API.

Azure SQL policies

When is comes to SQL server there’s a few errors known to be safe to retry, so we explicitly look for those, so if we begin with it could look something like this:

where SqlRetryCount is just a constant for how many retries to do and ExponetialBackoff if a method that exponentially increases the delay between each attempt

TimeoutException:s will always be retried but SqlExceptions will be passed to AnyRetryableError method for assessment if it’s retryable or not

AnyRetryableError will iterate all errors and with the RetryableError method check if it’s an error known to be retriable and here is where the magic happens

we will switch on SqlException error number and use a few known constants to determine if it should fail fast or retry.

Taking the policy for a spin

In this case, I’m going to do a simple API that extends SqlCOnnection with WithRetry methods, i.e. open SQL connection

giving us an open sync and async method, which in its simplest form usage could look something like this

combining with a simple SQL to .NET object mapper like Dapper results in pretty clean code

the above code will open the sql connection using the retry policy, but the query will be executed without retry, which could be what you want for some operations, but for a select it’s often safe/what you want to retry the whole operation. For that we add an overload that lets us execute code within the policy boundary

this will open the connection and then invoke and return the result of Func you pass to it all within the scope of policy and just small refactoring of the calling code

and async variant usage could look something like this

Conclusion

Polly is really powerful, yet really easy to get started with and also fairly easy to retrofit into an existing application — so I definitely think you should take it for a spin.

I’ve created a complete sample repository on GitHub with the code from this post, thanks for reading! ❤