We all know (I hope) how important it is to comment code we develop and maintain. We do it to make it easier to scan and understand it. To be able to test and modify it. For our colleagues and ourselves when we get back to it a few months later.

I want to make a case that documenting data structures (tables and columns) in our applications databases is more important, but even more overlooked than documentation of code. Here are a few reasons why I believe so.

Database goes beyond implementation project

From the organization point of view application code is considered just within implementation project and team. A database, on the other hand, will be used in multiple team initiatives and projects. Its organizational span is much wider.

More access points

Think about it - application code will be "accessed" mostly by developers and a compiler. However, the database will be accessed not only by the application but by multiple other programs, reports, spreadsheets, ETLs, ad hoc queries, BI systems, etc.

More users

When you think about it, code and its documentation are used almost exclusively by developers. They are the one who needs to understand it. For most of the rest of stakeholders application, you are developing is a black box. If something goes wrong, users, testers, PMs or anyone else doesn't look into code in search for a reason but call development/maintenance guys.

It is slightly different in the case of data. There are much more people and applications that need to access it directly in the database. There are support consultants, business analysts, data analysts, developers in other teams (DWH/BI) to name a few.

Users are more scattered: organizationally and geographically

There is a good chance that a development team is a small group of people sitting in one room and chatting over coffee every day. But when you consider all the people accessing data they might not even know of their own existence. Those people span across the organization in various IT and business departments, outside vendors and consultants. This is also true physically - those people can be on different continents.

And in time

Data users are not only scattered in space but also in time. Many applications take few months (sometimes years) to implement and once they are deployed code, and functionality gets frozen. There are some external features developed (reporting etc.), but it happens that core code is gone. So is the team. But it's a different story with the database - there's a good chance that reporting and analytics will be developed throughout its entire lifetime. And even longer - some databases after applications get decommissioned are kept online for reporting and reference.

End of application is not end of data

When your application faces the end of its life, and you are migrating to something new, you may dump your old application with its code to the garbage. You can't do the same with data. Not only data owners will need to access data long after this application is gone, but they will also most likely need to migrate its data to the new system. And this requires a good understanding of the data model.

Summary

I hope that this article made you think. Think, and check if you have the documentation of your databases. If not, then it's never too late to start documenting them. Check out Dataedo - this lightweight tool will get you started in minutes.

PS. If you don't have comments in your code, it is a reason to worry as well.

Get a free database documentation tool