I just celebrated a year at my customer. When I arrived, the project was in good shape with a few rough edges. There was a solid code review process, a stable Jenkins server, motivated people, a reasonable number of integration tests and a product owner who is very involved and usually nearby. We serve thousands of users, and our analyst has the confidence to do a release at the busiest time of day.

Clean up your logging and keep it clean

In my first week I was amazed by the thousands of daily error messages in the production logging and by the fact that nobody was screaming in panic about it. Most people were used to this volume because it had grown gradually over time (not because people didn’t care, it just happened). The problem with this phenomenon is that small warnings get lost in the sea of errors. New features were introduced, they logged some errors and nobody noticed. The customer noticed eventually. When your feedback loop is this long, the error is harder to fix: the feature you built is no longer fresh in your memory and may have been ‘contaminated’ with code from other features.

My goal was to reduce the error logging to zero and to make sure the team is alerted when error messages show up on the dashboard. Our logging is collected in a central place, made available through Kibana and displayed on a large monitor in the team room. This made things very easy. (If you are not collecting your logs in a central place, it really is time to start now. Products like Logstash and Graylog are great tools to make this happen.)

Now you have to create some structure in the error messages. Start with a top 10 of the most frequent error messages and create tickets for them. This usually solves 80% of the problems. I also created a ‘known errors’ page that lists these log messages with a ticket number (or an explanation of the solution). A lot of errors recur in the long term (even if you’re pretty sure they won’t. Yes, also on your project), so this will be a good investment.

Repeat this cycle until the number of errors approaches zero.
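
If your logging tool cannot produce the top 10 for you, a small script can. The sketch below is only an illustration: it assumes a hypothetical log line layout of timestamp, level and message, and it replaces numbers and hex ids with placeholders so that similar errors are counted together. The names top_errors, normalize and without_known, and the sample ticket number, are made up for this example.

```python
import re
from collections import Counter

# Assumed (hypothetical) log line layout: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def normalize(message):
    """Replace variable parts (hex ids, numbers) so similar errors group together."""
    message = re.sub(r"\b0x[0-9a-fA-F]+\b", "<hex>", message)
    message = re.sub(r"\d+", "<n>", message)
    return message.strip()


def top_errors(lines, limit=10):
    """Return the `limit` most frequent ERROR messages as (message, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[normalize(match.group("message"))] += 1
    return counts.most_common(limit)


def without_known(ranked, known_errors):
    """Drop entries that already appear on the 'known errors' page."""
    return [(msg, count) for msg, count in ranked if msg not in known_errors]


if __name__ == "__main__":
    sample = [
        "2020-01-01T10:00:00 ERROR Timeout after 30 ms for user 42",
        "2020-01-01T10:00:01 ERROR Timeout after 31 ms for user 7",
        "2020-01-01T10:00:02 WARN Slow query",
        "2020-01-01T10:00:03 ERROR Missing field name",
    ]
    # Hypothetical 'known errors' page: normalized message -> ticket number
    known = {"Missing field name": "TICKET-1"}
    for message, count in without_known(top_errors(sample), known):
        print(count, message)
```

Running this once per cycle gives you the candidates for new tickets, and whatever is already on the ‘known errors’ page is filtered out.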

Now it is time to prevent this from happening again by appointing a ‘developer of the day’.

Appoint a developer of the day

At my previous customer, a developer of the day was introduced. At first it might seem like a waste of resources, but it has several advantages: