How old is our kernel?


April 2005 was a bit of a tense time in the kernel development community. The BitKeeper tool which had done so much to improve the development process had suddenly become unavailable, and it wasn't clear what would replace it. Then Linus appeared with a new system called git; the current epoch of kernel development can arguably be dated from then. The opening event of that epoch was commit 1da177e4, the changelog of which reads:

Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!

The community did, indeed, let it rip; some 180,000 changesets have been added to the repository since then. Typically hundreds of thousands of lines of code are changed with each three-month development cycle. A while back, your editor began to wonder how much of the kernel had actually been changed, and how much of our 2.6.33-to-be kernel dates back to 2.6.12-rc2, which was tagged at the opening of the git era. Was there anything left of the kernel we were building in early 2005?

Answering this question is a simple matter of bashing out some ugly scripts and dedicating many hours of processing time. In essence, the "git blame" command can be used to generate an annotated version of a file which lists the last commit to change each line of code. The lines attributed to each commit ID can be tallied, then associated with major version releases. At the end of the process, one has a simple table showing the percentage of the current kernel code base which was created for each major release since 2.6.12. Here's what it looks like:

In summary: just over 31% of the kernel tree dates back to 2.6.12, and has not been modified since then. Our kernel may be changing quickly, but parts of it have not changed at all for nearly five years. Since then, we see a steady stream of changes, with more recent kernels being more strongly represented than the older ones. That curve will partly be a result of the general increase in the rate of change over time; 2.6.13 had fewer than 4,000 commits, while 2.6.33 will have almost 11,000. Still, one has to wonder what happened with 2.6.20 (5,000 commits) to cause that release to represent less than 2% of the total code base.
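The tallying procedure described above (run "git blame", count lines per commit, group commits by release) can be sketched roughly as follows. This is a minimal illustration, not the scripts actually used for this article; the function names are hypothetical, and the mapping from commit ID to release (`release_of`) is assumed to be supplied by the caller.

```python
import re
import subprocess
from collections import Counter

SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def count_lines_per_commit(porcelain_text):
    """Count how many lines of one file each commit was the last to change."""
    counts = Counter()
    current = None
    # Split on "\n" only: source files may contain form feeds and the like.
    for line in porcelain_text.split("\n"):
        if line.startswith("\t"):
            # In porcelain output a TAB-prefixed line is file content,
            # attributed to the most recent header line seen.
            if current is not None:
                counts[current] += 1
            continue
        first = line.split(" ", 1)[0]
        if SHA_RE.match(first):
            current = first
    return counts


def percent_by_release(commit_counts, release_of):
    """Group per-commit line counts by release; return percentages.

    release_of is a caller-supplied (hypothetical) function that maps a
    commit ID to a release name.
    """
    totals = Counter()
    for sha, n in commit_counts.items():
        totals[release_of(sha)] += n
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {rel: 100.0 * n / grand for rel, n in totals.items()}


def blame_file(path, repo="."):
    """Run git blame on one file and tally its lines per commit."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "--", path],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return count_lines_per_commit(out)
```

One plausible way to build `release_of` is to ask "git describe --contains" for the first tag containing each commit and cache the answers; that choice is an assumption here. Summing the counters from `blame_file()` over every file in the tree before calling `percent_by_release()` would yield a table of the kind discussed above.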

Much of the really old material is interspersed with newer lines in many files; comments and copyright notices, in particular, can go unchanged for a very long time. The 2.6.12 top-level makefile set VERSION=2 and PATCHLEVEL=6, and those lines have not changed since; the next line (SUBLEVEL=33) was changed in December.

There are interesting conclusions to be found at the upper end of the graph as well. Using this yardstick, 2.6.33 is the smallest development cycle we have seen in the last year, even though this cycle will have replaced some code added during the previous cycles. 4.2% of the code in 2.6.33 was last touched in the 2.6.33 cycle, while each of the previous four kernels (2.6.29 through 2.6.32) still represents more than 5.5% of the code to be shipped in 2.6.33.

Another interesting exercise is to look for entire files which have not been touched in five years. Given the amount of general churn and API change which has happened over that time, files which have not changed at all have a good chance of being entirely unused. Here is a full list of files which are untouched since 2.6.12 - all 1062 of them. Some conclusions:

Every kernel tarball carries around drivers/char/ChangeLog, which is mostly dedicated to documenting the mid-90s TTY exploits of Ted Ts'o. There is only one change since 1998, and that was in 2001. Files like this may be interesting from a historical point of view, but they have little relevance to current kernels.

Unsurprisingly, the documentation directory contains a great deal of material which has not been updated in a long time. Much of it need not change; the means by which one configures an ISA Sound Blaster card is pretty much as it always was - assuming one can find such a card and an ISA bus to plug it into. Similarly, Klingon language support (Documentation/unicode.txt), Netwinder support, and such have not seen much development activity recently, so the documentation can be deemed to be current, if not particularly useful. All told, 41% of the documentation directory dates back to 2.6.12. There was a big surge of documentation work in 2.6.32; without that, a larger percentage of this subtree would look quite old.

Some old interfaces haven't changed in a long time, resulting in a lot of static files in include/. <linux/sort.h> declares sort(), which is used in a number of places. <include/fcdevice.h> declares alloc_fcdev(), and includes a warning that "This file will get merged with others RSN." Much of the sunrpc interface has remained static for a long time as well.

Ancient code abounds in the driver tree, though, perhaps unsurprisingly, old header files are much more common than old C files. The ISDN driver tree has been quite static.

Much of sound/oss has not been touched for a long time and must be nicely filled with cobwebs by now; there hasn't been much of a reason to touch the OSS code for some time.

net/decnet/TODO contains a "quick list of things that need finishing off"; it, too, hasn't been changed in the git era. One wonders how the DECnet hackers are doing on that list...
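For readers wanting to reproduce a list of untouched files, one possible approach (an assumption, not necessarily how the list above was generated) is to take every file in the current tree and subtract those which differ from a base revision. The helper names below are hypothetical. Note that this compares content rather than history, so a file which was changed and later reverted would still be reported as untouched.

```python
import subprocess


def split_nul(text):
    """Split NUL-separated git output into a list of paths."""
    return [p for p in text.split("\0") if p]


def unchanged_files(all_files, changed_files):
    """Return the sorted paths in all_files that are not in changed_files."""
    changed = set(changed_files)
    return sorted(p for p in all_files if p not in changed)


def git_paths(repo, *args):
    """Run a git command that prints NUL-separated paths."""
    out = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return split_nul(out)


def untouched_since(base, repo=".", head="HEAD"):
    """List files at `head` whose content is identical to that in `base`."""
    all_files = git_paths(repo, "ls-tree", "-r", "--name-only", "-z", head)
    changed = git_paths(repo, "diff", "--name-only", "-z", base, head)
    return unchanged_files(all_files, changed)
```

Calling something like `untouched_since("v2.6.12")` in a kernel checkout would then produce a candidate list; the tag name is given only as an example.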

So which subsystem is the oldest? This plot looks at the kernel subsystems (as defined by top-level directories) and gives the percentage of 2.6.12 code in each:

The youngest subsystem, unsurprisingly, is tools/, which did not exist prior to 2.6.29. Among subsystems which did exist in 2.6.12, the core kernel, with about 13% of its code dating from that release, is the newest. At the other end, the sound subsystem is more than 45% 2.6.12 code - the highest in the kernel. For those who are curious about the age distribution in specific subsystems, this page contains a chart for each.
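The per-subsystem breakdown amounts to grouping per-file line counts by top-level directory. A minimal sketch, assuming the per-file numbers (lines surviving from the base release, and total lines) have already been collected by some other means; the function name and the "(top level)" label are illustrative only:

```python
from collections import defaultdict


def old_share_by_subsystem(per_file):
    """Percentage of base-release lines per top-level directory.

    per_file maps a path to (lines_from_base_release, total_lines).
    """
    old = defaultdict(int)
    total = defaultdict(int)
    for path, (old_lines, total_lines) in per_file.items():
        # Files such as the top-level Makefile have no directory part.
        top = path.split("/", 1)[0] if "/" in path else "(top level)"
        old[top] += old_lines
        total[top] += total_lines
    return {d: 100.0 * old[d] / total[d] for d in total if total[d]}
```

Weighting by line count, rather than averaging per-file percentages, keeps a handful of tiny files from skewing a directory's figure.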

In summary: even in a code base which is evolving as rapidly as the kernel, there is a lot of code which has not been touched - even by coding style or white space fixes - in the last five years. Code stays around for a long time.

(For those who would like to play with this kind of data, the scripts used have been folded into the gitdm repository at git://git.lwn.net/gitdm.git).

Note: this article has been edited to fix an error which overstated the amount of 2.6.12 code remaining in the full kernel.

