We ran tickgit against the source code at commit 9bf52c2, then imported the CSV output into SQLite so we could run queries against it. Note that the tool only finds TODOs in the tree of the checked-out commit; it does not account for TODOs that were added and subsequently removed. The numbers therefore reflect only the TODOs still “live” in the code at that commit.
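As a rough illustration of the CSV-to-SQLite step, here is a minimal Python sketch using only the standard library. The column names (`file_path`, `line_no`, `author`, `commit_hash`, `authored_at`, `comment`) and the sample rows are assumptions made for this example, not the tool's documented output; check the header row of your own export and adjust accordingly.

```python
import csv
import io
import sqlite3

# Hypothetical column names; check the header row of your own CSV export.
COLUMNS = ["file_path", "line_no", "author", "commit_hash", "authored_at", "comment"]


def load_todos(conn: sqlite3.Connection, csv_file) -> int:
    """Create a `todos` table and fill it from a CSV file object with a header row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todos ("
        + ", ".join(f"{c} TEXT" for c in COLUMNS)
        + ")"
    )
    reader = csv.DictReader(csv_file)
    # Columns missing from the CSV are stored as NULL.
    rows = [tuple(row.get(c) for c in COLUMNS) for row in reader]
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO todos VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Tiny made-up sample standing in for real tool output.
SAMPLE_CSV = """file_path,line_no,author,commit_hash,authored_at,comment
pkg/a.go,10,alice,abc123,2017-03-01,TODO: handle errors
pkg/a.go,42,bob,def456,2019-06-15,TODO (bob) remove this shim
pkg/b.go,7,alice,abc123,2018-11-30,TODO: add tests
"""

conn = sqlite3.connect(":memory:")
inserted = load_todos(conn, io.StringIO(SAMPLE_CSV))
total = conn.execute("SELECT COUNT(*) FROM todos").fetchone()[0]
```

With a real export, you would pass an open file handle instead of the in-memory sample and connect to a database file rather than `:memory:`.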

Totals (for 9bf52c2)

2,380 TODOs across 1,230 files from 363 distinct authors

460 TODOs with an assignee e.g. // TODO (patrickdevivo) Fix the ...

489 TODOs were added in 2019 so far

860 days (or 2.3 years) is the average age of a TODO

The oldest TODO is from Jun 6, 2014 (from “First commit”)

The most recent TODO is from Dec 9, 2019

This file has the most TODOs at 33

deads2k has added the most (current) TODOs (git blame) at 147

This commit added the most TODOs (that are still in the source) at 64
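Two of the totals above, the number of TODOs with an assignee and the average TODO age, could be derived along the following lines. This is only a sketch: the regular expression assumes an assignee is a name in parentheses directly after the word TODO (as in the example comment above), and the sample comments and dates are made up for illustration rather than taken from the Kubernetes results.

```python
import re
from datetime import date

# Assumed convention: an assignee is a name in parentheses right after "TODO",
# as in "// TODO (patrickdevivo) Fix the ...". Other styles are not matched.
ASSIGNEE_RE = re.compile(r"\bTODO\s*\(\s*([^)\s]+)\s*\)")


def assignee(comment: str):
    """Return the assignee named in a TODO comment, or None if there is none."""
    match = ASSIGNEE_RE.search(comment)
    return match.group(1) if match else None


def average_age_days(added_dates, as_of: date) -> float:
    """Mean number of days between each TODO's blame date and `as_of`."""
    ages = [(as_of - added).days for added in added_dates]
    if not ages:
        raise ValueError("no TODOs to average over")
    return sum(ages) / len(ages)


# Made-up sample data, not taken from the Kubernetes results.
sample = [
    ("// TODO (patrickdevivo) Fix the ...", date(2019, 12, 1)),
    ("// TODO: handle the error case", date(2019, 11, 9)),
    ("# TODO(alice): remove after migration", date(2019, 12, 7)),
]
as_of = date(2019, 12, 9)

assigned = [assignee(text) for text, _ in sample]
with_assignee = sum(1 for name in assigned if name is not None)
mean_age = average_age_days([added for _, added in sample], as_of)
```

Note that the age here is measured from the date git blame reports for the line, so a TODO that was merely moved or reformatted would appear younger than it really is.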

Summaries

Files with the Most TODOs

Top TODO Count by Author

Commits Adding the Most TODOs

TODO Count by Year Added

To produce similar results, try tickgit todos --csv-output to get raw TODO data. We used SQLite to query for the above summaries.
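For readers who want to reproduce the summaries, the queries might look something like the sketch below. It assumes a `todos` table with hypothetical columns `file_path`, `author`, `commit_hash`, and `authored_at` (an ISO date string); these names are our illustration, not the tool's documented schema, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (file_path TEXT, author TEXT, commit_hash TEXT, authored_at TEXT)"
)
# Made-up rows standing in for imported TODO data.
conn.executemany(
    "INSERT INTO todos VALUES (?, ?, ?, ?)",
    [
        ("pkg/a.go", "alice", "abc123", "2017-03-01"),
        ("pkg/a.go", "bob", "def456", "2019-06-15"),
        ("pkg/b.go", "alice", "abc123", "2018-11-30"),
        ("pkg/c.go", "alice", "789aaa", "2019-01-20"),
    ],
)


def top_counts(column: str, limit: int = 10):
    """Return (value, count) pairs for one column, most frequent first."""
    # Whitelist the column name, since it is interpolated into the SQL text.
    if column not in {"file_path", "author", "commit_hash"}:
        raise ValueError(f"unexpected column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM todos "
        f"GROUP BY {column} ORDER BY n DESC, {column} ASC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def count_by_year():
    """Return (year, count) pairs, oldest year first."""
    query = (
        "SELECT strftime('%Y', authored_at) AS year, COUNT(*) "
        "FROM todos GROUP BY year ORDER BY year"
    )
    return conn.execute(query).fetchall()


files_with_most = top_counts("file_path")   # Files with the Most TODOs
top_authors = top_counts("author")          # Top TODO Count by Author
top_commits = top_counts("commit_hash")     # Commits Adding the Most TODOs
by_year = count_by_year()                   # TODO Count by Year Added
```

Each summary is a plain GROUP BY over one column, so the same helper covers files, authors, and commits; only the by-year count needs the date truncated first.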

Conclusions and Questions

These results are from a fairly off-the-cuff look at what TODO comments in the Kubernetes source code look like. We get a sense of the top TODO creators, which tracks more or less with the top contributors to the project.

We also see that for “large” source code, developer behavior around TODO comments doesn’t seem to be out of the norm; there’s just more of it.

An important observation is that there are more TODO comments than there are GitHub issues. This is interesting in that it indicates a significant amount of latent “work,” or to-do items, that is not easily accessible unless you spend time in the source code itself.

Core contributors likely have a good idea of their area of the codebase and strong intuitions about their own TODOs and “latent work.” This is fairly opaque to outside observers, though. GitHub issues (or other public ticket trackers) are more easily accessible to those not “in the weeds” of the project.

As most developers understand, software projects “live and breathe.” There’s frequent change, continuous improvement, constant imperfection and lots of discussion. Workflow and process are very important because good code requires continual reflection. We see a part of this in action through the use of TODO comments in the Kubernetes source. An average TODO age of 2.3 years does seem quite high, though without a benchmark it is hard to say for sure. Those closer to the code will be much better able to pass judgment; it would also be interesting to see how this source code compares to that of other big open source projects.

A more in-depth analysis of a codebase’s TODOs might involve a look at all of the TODOs in the history, not just the ones currently in the source code.

What’s the rate at which TODOs are closed over time?

What’s the average lifetime of a TODO comment?

How do popular codebases compare to one another?

Does it Matter?

TODO comments typically cover the type of work that might be too small for a ticket, but important enough to note and describe in a code comment (though plenty of TODOs will reference issues/tickets). Since they are part of the code, they are often “closer” to the work that needs to get done. They are easy to add but, it seems, just as easy to lose (there are 1.8k+ TODOs added prior to 2019 still in the Kubernetes source).

We hope that by creating a tool that surfaces metadata about code, we can make it easier for software developers to get work done, in projects of any size. Surfacing TODOs is just one piece of that.