How to handle errors and exceptions in large scale software projects (with good practices and examples)

By Filip

“I didn’t run into any bugs in testing, so there are no bugs…right?”

Unfortunately, large scale software is too complex to be bug free, no matter how much testing you do.

You simply cannot test for all the different ways your users interact with your application. It's therefore important to understand the differences between errors and exceptions in your application, and the correct ways to handle them. That lets you take a proactive approach to monitoring errors and maintain a healthy application for both your development team and your end users.

The problem with just testing

Even with the most thorough testing process, you are still only testing specific situations, and your own biases come into play.

Imagine thousands of users suddenly using your application in ways neither you nor your team thought of; they will almost certainly run into something you didn't during testing.

How to handle errors in your application properly

Simply put, bugs can lead to both errors and exceptions. Errors and exceptions are terms that have different meanings depending on who you ask.

The main question should be how you can better handle these errors and exceptions so they don't have negative consequences.

Firstly, let’s look at some definitions, and why the differences are important.

Errors and exceptions - what’s the difference?

Some programming languages have their own definitions for errors and exceptions, but here is how I define the differences.

Note: The examples and specifics in this article are from .NET, but the key principles are not language specific.

Errors

Errors are programming mistakes where there is no way to recover or continue gracefully. They usually need a programmer to step in and change the code to fix them. Errors can sometimes be turned into exceptions so that they can be handled within the code.

Error handling best practices

Errors can usually be avoided with simple checks. Where simple checks won't suffice, errors can be turned into exceptions so that the application can handle the situation gracefully.

Exceptions

Exceptions take advantage of language specific semantics to represent that something exceptional has happened. They are thrown and caught so the code can recover, handle the situation, and avoid entering an error state.

Exception handling best practices

Exceptions can be thrown and caught so the application can recover or continue gracefully. Unhandled exceptions (which are errors) can also be logged so they are looked at by a developer to fix the underlying error.

Example one

A user error, where the user enters the wrong data, is not exceptional and does not need to be handled with an exception, but it can still result in an error or unrecoverable state. The code should have simple checks to stop this from happening without an exception. Use front-end and back-end validation instead and, for this example, only throw an exception as the last line of defence.
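
As a rough sketch of what that could look like in C#, the validator below rejects bad input with a message rather than an exception, and only the save step throws as a last line of defence. The names `AgeInput`, `TryValidate`, and `Save`, and the 0 to 150 range, are made up for illustration.

```csharp
using System;

public static class AgeInput
{
    // Hypothetical validator: returns false with a message instead of throwing,
    // because wrong user input is expected rather than exceptional.
    public static bool TryValidate(string raw, out int age, out string message)
    {
        message = "";
        if (!int.TryParse(raw, out age))
        {
            message = "Please enter a whole number.";
            return false;
        }
        if (age < 0 || age > 150)
        {
            message = "Please enter an age between 0 and 150.";
            return false;
        }
        return true;
    }

    // Last line of defence deeper in the back end: if invalid data gets this far,
    // a check was missed somewhere, so an exception is appropriate.
    public static void Save(int age)
    {
        if (age < 0 || age > 150)
        {
            throw new ArgumentOutOfRangeException(nameof(age), age, "Age should have been validated before saving.");
        }
        // ... persist the value (omitted in this sketch)
    }
}
```

The caller can show `message` straight back to the user, so the common case never touches a try/catch block.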

Example two

A file won't open and the code throws a FileLoadException or FileNotFoundException. This is an exceptional situation and should not break your application. It can happen for a number of reasons, so you must anticipate it and your application should be able to handle it.
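
Here is a minimal sketch of anticipating a missing file, assuming it is acceptable for your application to fall back to default content. `SettingsLoader` and `LoadOrDefault` are hypothetical names; `File.ReadAllText` and the two exception types are standard .NET.

```csharp
using System;
using System.IO;

public static class SettingsLoader
{
    // Hypothetical loader: falls back to the supplied default when the file
    // (or its directory) does not exist. Other exceptions are not caught here,
    // because this sketch does not know how to recover from them.
    public static string LoadOrDefault(string path, string defaultContent)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            return defaultContent;
        }
        catch (DirectoryNotFoundException)
        {
            return defaultContent;
        }
    }
}
```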

“It is an error to not handle an exception.” Henning Thielemann

Now that we have defined errors and exceptions, there are some easy to follow practices that are great for handling them, which I'll go into below.

What can go wrong will go wrong…at least once

So, if I catch every exception, my code will be error free, right?

As I mentioned earlier, not all errors result in an exception. The main problem with this conclusion is you don’t know what is going wrong. There could be a number of issues with your code and by catching the exception and doing nothing with it, you lose this information.

Don’t just catch every exception and continue as if nothing has happened.

The purpose of the catch block is to handle the situation where applicable.

What not to do: catch ‘em all

In the example below, the email object may be missing or corrupted, since we don't know where the exception was thrown or which exception it was. It's highly likely this will cause problems if the variable is used outside of the try/catch block later in the code.

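```csharp
// Anti-pattern: do not copy this.
ProcessedEmail email = null;
try
{
    email = ProcessEmail(rawEmail);
    var automaticResponse = GenerateAutomaticResponse(email);
    Reply(automaticResponse);
}
catch (Exception e)
{
    // The exception is swallowed, so we never learn what failed or where.
}

// email may still be null here if ProcessEmail threw.
Save(email);
```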

How to code the application to recover by itself

Throwing and catching exceptions is a great way to let the application recover by itself and prevent it from running into an error state.

If you know which types of exception might be thrown, it is better to be explicit within the catch block, as each type of exception means the code has stopped unexpectedly for a different reason. (We talk a little about architecting software errors for better error reporting here.)

Be specific with the exception type so you can provide feedback to the user (if applicable) and handle other situations more gracefully as you know exactly what has failed.

Why is it important to specify which type of exception to catch?

Depending on how your program continues, certain exceptions can leave data corrupted or cause unexpected behavior. This leads to errors down the road for the application.

If you know exactly which exception has occurred, you should know which steps to follow to recover.

Or, if you are unable to recover, you should know how to handle this situation gracefully.
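
To make this concrete, here is one way the email flow could look with specific catch blocks. This is only a sketch: the `ProcessedEmail` shape, the "sender|body" format, the status strings, and the choice of `FormatException` and `TimeoutException` are all assumptions for illustration.

```csharp
using System;

public sealed class ProcessedEmail
{
    public string Sender { get; }
    public string Body { get; }
    public ProcessedEmail(string sender, string body) { Sender = sender; Body = body; }
}

public static class EmailPipeline
{
    // Hypothetical parser: expects "sender|body" and throws FormatException otherwise.
    public static ProcessedEmail ProcessEmail(string rawEmail)
    {
        if (rawEmail == null) throw new ArgumentNullException(nameof(rawEmail));
        var parts = rawEmail.Split('|');
        if (parts.Length != 2) throw new FormatException("Expected 'sender|body'.");
        return new ProcessedEmail(parts[0], parts[1]);
    }

    // Returns a status string so the caller knows exactly what happened.
    public static string Handle(string rawEmail, Action<ProcessedEmail> reply, Action<ProcessedEmail> save)
    {
        ProcessedEmail email;
        try
        {
            email = ProcessEmail(rawEmail);
        }
        catch (FormatException)
        {
            // Known failure: there is no valid email to reply to or save.
            return "rejected";
        }

        var status = "replied";
        try
        {
            reply(email);
        }
        catch (TimeoutException)
        {
            // Known, recoverable failure: the email itself is fine, only the reply is pending.
            status = "reply-pending";
        }

        save(email); // only reached with a fully constructed email
        return status;
    }
}
```

Anything that is not a `FormatException` or `TimeoutException` is deliberately left uncaught here, so it surfaces as an unhandled exception instead of being hidden.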

So, can it recover?

A lot of the time, the exception has enough information to know what has gone wrong, and within the catch block you can sometimes recover from the error state. You can do this by fixing some data, re-fetching data, or even asking the user to try again.

You can catch exceptions, but sometimes the application still can't continue because the data it relies on has been corrupted in an unrecoverable way, or because it expected the data to be in a different shape.
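
Trying again can be sketched as a small retry helper. I'm assuming here that `TimeoutException` stands in for a failure worth retrying; the `Retry.Execute` name and its parameters are hypothetical.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Hypothetical helper: runs the operation up to maxAttempts times,
    // retrying only on TimeoutException. On the final attempt the exception
    // filter is false, so the exception propagates to the caller.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay); // wait before the next attempt
            }
        }
    }
}
```

A call might look like `Retry.Execute(() => FetchReport(id), 3, TimeSpan.FromSeconds(1))`, where `FetchReport` is whatever operation you want to re-run.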

Example

What about an IndexOutOfRangeException on an array? How can a program recover from this? This is an example of an error being turned into an exception. Your application expects the data to be in a certain shape, but it isn't. Although recovery isn't always possible, the application can now avoid entering the error state and handle the situation gracefully. If the exception is logged, a developer can fix the cause by adding some simple checks before the array is accessed, or by changing how it is accessed.
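
The simple check a developer might add for the out-of-range case could look something like this. `SafeArray.TryGetAt` is a made-up helper name, not a .NET API.

```csharp
using System;

public static class SafeArray
{
    // Hypothetical helper: checks the bounds first, so an out-of-range index
    // becomes a false return value instead of an exception.
    public static bool TryGetAt(int[] items, int index, out int value)
    {
        if (items != null && index >= 0 && index < items.Length)
        {
            value = items[index];
            return true;
        }

        value = 0;
        return false;
    }
}
```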

How to handle unhandled exceptions

There are exceptions you won't expect, and these usually represent an error in the code.

You can log unhandled exceptions that aren't caught by your code, as most languages provide a way to do this (e.g. ASP.NET's Application_Error and JavaScript's global window.onerror handler).

Any unhandled exceptions represent errors. Your code did not expect them and was therefore unable to recover or handle the situation gracefully.

It's a good idea to log these so you are able to fix the cause. This way, errors won't be thrown as exceptions over and over, and exceptions stay exceptional. If they do happen, you want to know about them so you can catch and handle them.
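
Outside of ASP.NET, one option in .NET is the `AppDomain.CurrentDomain.UnhandledException` event. The sketch below wires it to a logging callback; the `UnhandledExceptionLogger` class and its `Register` method are hypothetical, and where the log actually goes is up to you.

```csharp
using System;

public static class UnhandledExceptionLogger
{
    private static Action<Exception> _log = _ => { };

    // Call once at startup. Calling it again would subscribe the handler twice.
    public static void Register(Action<Exception> log)
    {
        _log = log ?? throw new ArgumentNullException(nameof(log));
        AppDomain.CurrentDomain.UnhandledException += OnUnhandled;
    }

    // Runs when an exception escapes every catch block in the application.
    public static void OnUnhandled(object sender, UnhandledExceptionEventArgs args)
    {
        // ExceptionObject is typed as object, so check before using it as an Exception.
        if (args.ExceptionObject is Exception ex)
        {
            _log(ex);
        }
    }
}
```

This handler only gives you a chance to record the exception. It does not recover from it.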

Error logging can help by capturing these errors

Having a place where you can view these logged errors and exceptions is key not only to debugging but also to prioritizing what to fix and when.

Furthermore, you don't want to be relying on screenshots and more information from already frustrated users. Error logging can also allow your team to be proactive when something goes wrong and actually contact the users affected.

Letting them know you are fixing the problem will boost your customer relationship, and you can also fix the errors before other users run into them.

Example

An error in the code creating multiple incorrect billing charges is usually more important than an error which fails to display a specific details page, even if the details page error happens more often.

Ultimately, you want your application to run into as few exceptions as possible, but when it does run into them, you want to know about it.

Only 1% of users report errors, so that’s a lot of errors that are still out there in the wild.

A partial solution

Writing some code to save the exception and stack trace to a file, or to send it via email so you are notified as the error occurs, is a possible partial solution.
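
The file version can be as small as the sketch below. `FileErrorLog` and the one-entry-per-append format are assumptions; `Exception.ToString()` already includes the type, message, stack trace, and any inner exceptions.

```csharp
using System;
using System.IO;

public static class FileErrorLog
{
    // Hypothetical logger: appends a UTC timestamp plus the full exception text to a file.
    public static void Append(string logPath, Exception ex)
    {
        var entry = $"{DateTime.UtcNow:O} {ex}{Environment.NewLine}";
        File.AppendAllText(logPath, entry);
    }
}
```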

Example

One user is running into thousands of exceptions. One hundred users are also encountering a less frequent error. Which one is more important? Without knowing the specifics of the error, the one that affects more users is more important.
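
If your logged errors carry a user identifier, ranking by users affected rather than by raw count is a short query. In this sketch the `(ErrorKey, UserId)` event shape and the `ErrorPriority.Rank` name are assumptions about how you store your logs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ErrorPriority
{
    // Ranks errors by distinct users affected first, then by how often they occur.
    public static List<(string ErrorKey, int UsersAffected, int Occurrences)> Rank(
        IEnumerable<(string ErrorKey, string UserId)> events)
    {
        return events
            .GroupBy(e => e.ErrorKey)
            .Select(g => (
                ErrorKey: g.Key,
                UsersAffected: g.Select(e => e.UserId).Distinct().Count(),
                Occurrences: g.Count()))
            .OrderByDescending(r => r.UsersAffected)
            .ThenByDescending(r => r.Occurrences)
            .ToList();
    }
}
```

With the scenario above, the error hitting one hundred users would sort ahead of the one a single user triggers thousands of times.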

Using the stack trace of the exception should help locate where the error might be and you should be able to either reproduce it or read the code to understand what went wrong.

Sometimes this still isn't enough and the problem needs investigating further. If this happens, add more information to the exception before it is logged, consisting of context specific details (such as account IDs or specific object states) that will allow you to reproduce the error locally.
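
In .NET, one place to put that extra context is the exception's `Data` dictionary. The `Billing.Charge` method below is a made-up operation that exists only to show the pattern, and the keys are arbitrary.

```csharp
using System;

public static class Billing
{
    // Hypothetical operation that can fail.
    public static void Charge(string accountId, decimal amount)
    {
        try
        {
            if (amount <= 0) throw new InvalidOperationException("Amount must be positive.");
            // ... call the payment provider (omitted in this sketch)
        }
        catch (Exception ex)
        {
            // Attach context, then rethrow so the exception is still handled or logged upstream.
            ex.Data["AccountId"] = accountId;
            ex.Data["Amount"] = amount;
            throw; // "throw;" keeps the original stack trace
        }
    }
}
```

Catching `Exception` is reasonable here only because the block rethrows immediately. Whatever logs the exception later needs to write out `ex.Data` as well, since `ToString()` does not include it.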

Time to fix your errors

Now, you should have caught all of the errors and exceptions, and logged the unhandled ones…now what?

Depending on the scale of your application, noise from error notifications is a problem.

You can do some smart things with email filtering or grep, which can be useful to group and separate errors into different folders or files.

This can help but is only a partial solution to the issue of noise.

Years ago, I personally went down this path but quickly realized there are a number of reasons why this is only a partial solution. The trouble was, I was still unaware of which errors were affecting users the most. I was focused on the most thrown errors rather than the most detrimental to the application/user experience; and because of this, I never really had a clear view of what was going wrong.

I had no visual representation of what was going on, and had to run manual queries to figure it out, which was quite time consuming.

Conclusion

Errors and exceptions will always be thrown in large scale software. Handling your errors properly will define you as a software team and help you create better processes around exceptions and errors.

Good applications contain code that will recover from exceptions when possible. Handling and logging exceptions is very important to the health of your software!

Further reading

Troy Hunt: Error tracking done right

Software errors and crashes

Error vs. exception