Today we have another guest post by Phillip Johnston about error logging. Phillip is an embedded systems consultant specializing in bare-metal firmware development in C and C++. Embedded Artistry is a website dedicated to teaching others about C, C++, embedded systems, and development processes. You can contact him via email, or on Twitter.

Logging Horrors

I was recently reading Timeless Laws of Software Development by Jerry Fitzpatrick when I encountered a mind-jarring section about error logging and its downsides (page 45, “Minimize Error Logging”). I have been a long-time proponent of error logging, and I’ve written many embedded logging libraries over the past decade.

While I initially was skeptical of Fitzpatrick’s attitude toward error logging, I started paying closer attention to the log files I was working with as well as the use of logging in my own code. What I noticed wasn’t pretty.

This quote from a Stack Exchange post neatly summarizes my general experience with error logs:

I recently had to investigate a field issue for our large enterprise application. I was horrified by the logs that I had to comb through in an attempt to find the problem and at the end of the day the logs did not help at all identifying/isolating the bug.

A Recent Experience

I experience this horror on a regular basis. Consider this simple log sample captured on the Snapdragon Flight drone:

[08500/02] 03:03.626 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.626 HAP:4133:41100 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.626 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.626 HAP:4133:41416 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.627 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.627 HAP:4133:41731 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.627 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.627 HAP:4133:42060 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.627 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.627 HAP:4133:42400 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.628 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.628 HAP:4133:42728 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.628 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.628 HAP:4133:43037 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.628 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.628 HAP:4133:43346 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.630 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.630 HAP:4133:43665 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.630 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.630 HAP:4133:43993 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.630 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.630 HAP:4133:44343 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.630 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.630 HAP:4133:44660 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/02] 03:03.630 HAP:4133:I2C_Write failed: 285212872 num_bytes_written 0 0414 i2c.c [08500/00] 03:03.630 HAP:4133:44968 Error: i2c write failed. Reported -1 bytes written 0270 I2CDevObj.cpp [08500/00] 03:03.630 HAP:4133:44999 Initialized 1 Sensor (max supported: 16) 0130 sensor.cpp [08500/00] 03:03.630 HAP:4133:45031 --- Sensors Detected --- 0133 sensor.cpp [08500/00] 03:03.630 HAP:4133:45056 Sensor 0: switchAddr=0x70, switchChannel=0 0139 sensor.cpp

If you’re a new developer working on this project for the first time, or a client who is testing my software, it is reasonable to think that something terrible has happened. Since the log file looks so concerning, you might spend an hour or two investigating these errors before realizing that these I2C write failures are expected and acceptable.

This particular program supports dynamic detection of connected sensors using an I2C write. If the sensor is not connected, the write will fail and we will move on to the next slot. Even worse, the same I2C error spew occurs if you try to run an I2C bus sweep to detect connected devices. The log becomes practically worthless in such a situation, as each I2C sweep call would generate up to 128 I2C error prints. The obvious answer is to disable the log statement, but the underlying I2C layer was provided to us as a pre-compiled library.

This example is hardly a shining beacon of useful error logging, but is indicative of typical logging experiences.

The Dark Side of Error Logging

Logging is a seductive technique. We cannot always debug failures when they occur, and customers and teammates offer us vague descriptions of the problems they run into. Logging is the natural step to bridge this gap. We can create an impartial reference that allows us to reconstruct the sequence of events leading to an error. Once that ball is rolling, why not log as much information as possible? You never know what will be useful, especially when you’re trying to debug a tricky 1-in-100,000 problem.

Consider this: how many times have you successfully identified and fixed a critical issue using existing log statements? I can’t think of a single case where existing log statements were useful enough to debug a problem – and in such cases, the natural reaction is to add additional logging.

Just because a technique is cherished, doesn’t mean it’s wise.

–Jerry Fitzpatrick

Disadvantages

One of the primary disadvantages of error logging is that it adds additional code, and we want to minimize the code we write and maintain. We typically add conditional logic or exception handling blocks in order to detect and log errors. Logging code is unattractive and breaks up the flow of a function, making code more difficult to parse at a glance. Collectively, these factors contribute to an increase in our program size and clutter.

Logging isn’t free. There is always a runtime and log storage cost. In some cases, especially on embedded systems, log statements can create or hide timing-related bugs. Even if logging calls are disabled in release builds, the error detection logic is rarely disabled. Complex inputs to logging functions may still be evaluated even if the results won’t be logged.

Log files are often ignored by developers and users. If a log file indicates an obvious failure or bug, someone has to actively look at the log to notice it. This is why we should prefer assertions and unit tests: they will catch and report failures in an automated fashion, ensuring errors will be fixed quickly.

Once a log statement has been added, it is rarely removed. Many log statements become irrelevant or misleading due to code changes. Our logs end up full of red herring statements that send us chasing false errors or unrelated problems. Teams will fight back against removing statements with vague arguments such as, “You never know when we might need that information”.

Software maintenance is a fight against disorder and clutter, and our log files are no different. Without active push-back against over-logging, our log files become unapproachable, indecipherable, and send us on snipe hunts.

Common Anti-Patterns

Aside from general disadvantages with error logging, I’ve observed a variety of anti-patterns across the numerous code bases I’ve worked on. The three most common are:

Log-and-throw Logging unexceptional or uninteresting failures Senseless log statements

For a more detailed dive into exception handling anti-patterns, check out this post by Oracle.

Log-and-throw

The most common (and most annoying) anti-pattern is log-and-throw, which is applicable both to error logging and exception handling.

Consider this exception example:

try { someFunction(); } catch (…) { log_error("Exception occurred!"); throw; }

Or this return example:

int r = someFunction(); if(r < 0) { log_error("SomeFunction returned error code: %d

", r); return r; }

The general flow of both cases is to check for an error or exception, log that it occurred, and then re-throw the exception or return the error. In most cases, the caller of the function will detect an error, log it, and re-throw or return. This often happens all the way up the call stack. Each additional error check and log statement adds code bloat and duplicated information in the logs.

If we cannot handle an exception, we should not catch it. Additionally, we should either log the error or throw/return it, but never both. This ensures that each error is logged only once.

If you’re looking for a quick way to reduce the lines of code in your programs, eliminate duplicated error detection and log statements.

Logging Unexceptional or Uninteresting Errors

It’s difficult to get entire teams to agree on definitions for FATAL , ERROR , WARNING , INFO , and DEBUG . This is especially true if there is nobody enforcing proper use of logging levels.

In many cases, I observed that developers will always log at whatever levels are shown by default. This is the lazy workaround for changing the logging level during runtime, and it results in decreased log usability.

Additionally, not all errors are created equally. In the I2C write example above, failures are acceptable and expected. By logging that case as an error, the library’s log statement communicates a different level of seriousness than our code does. The I2C write failure should have been demoted to DEBUG , INFO , or WARNING .

By setting realistic expectations for utilizing different logging levels and enforcing the standard, you can keep your logs clean and usable.

Senseless Log Statements

While error logging has its uses, I regularly encounter vague log statements such as:

log_error("Auth failed"); log_info(""Status: %d", resp->status_code"); log_error("Unknown state: %d", state);

While these log statements make sense within the context of the function they’ve been added to, they are rarely understandable when reading through the log. Often I need to do a search through the code base to find out where a specific string is used. Common such strings such as “status” or “Unknown state” are often re-used throughout the code base, making it difficult for developers to understand which case they are hitting.

Make sure that your log statements provide a complete and unique statement that can be used debugging. It is helpful to provide additional context, such as the function or file name where the log function was called.

Error Logging Guidelines

I’ve painted a bleak picture of error logging, but there is no denying that it is a useful tool in our toolkit. However, we need to collectively adopt guidelines for a healthier approach to error logging.

Here are my error logging guidelines:

If it’s worth saving to a log file, it’s worth adding an assertion, unit test, or showing in the user interface. Prefer automated and visible solutions over logging whenever possible. Keep your logs small and simple. Log only the most obvious or critical failures. Only allow a minimal amount of logging into your mainline code. Brutally push back on logging during code reviews. Only allow statements in master when need or usefulness is demonstrable. Focus on only using logging for controlled experiments. When the experiment is finished or the bug is fixed, delete the logging code. Log each exception or error only once, at the highest level where it is handled. Add a last-chance exception handler at the entry point of your program for cases where you did not catch an exception. Create sensible and understandable log statements. These messages should be useful on their own and not require hunting through the code base to find context. All logging should be disable-able. Logic and input arguments must not remain in the program if logging is disabled.