Version-string schemes for the Java SE Platform and the JDK

(This is a long note on a complex topic that's inherently difficult to discuss. If you wish to reply, please first read all the way through to the end.)

In my proposal to adopt a strict six-month release cadence I suggested that, going forward, the version strings of feature releases be of the form $YEAR.$MONTH [1][2]. Thus next year's March release would be 18.3, the September release would be 18.9, and so on each year.

Not everyone likes this proposal, which isn't surprising -- discussions of version-string schemes, much like those of language syntax, often tend to degenerate into bike-sheds [3][4]. That's due, in part, to the use of version strings -- across the software industry and for several decades now -- to encode multiple not-quite-orthogonal axes of information, which can answer different but often related questions:

  - Compatibility -- "Will my code break if I upgrade to this release?"

  - Significance -- "How different is this release from what I have now?"

  - Security -- "Does this release contain new security fixes?"

  - Support -- "For how long will this release be supported?"

  - Identity -- "Against exactly which build was this bug reported?"

  - Time -- "When did this release ship? How far behind am I?"

Convention dictates that the principal part of a version string, i.e., the version number, be a sequence of numerals separated by period characters. (Let's ignore, for now, optional information such as pre-release status and build numbers.) Convention also dictates that version numbers be pointwise totally ordered, that they increase monotonically over time, and that the version number of a feature release be a prefix of the version numbers of its update releases.

Given these conventions and a strict, time-based release cadence, which of the above axes are both important and appropriate to encode into version numbers? Which are practical to encode into version numbers?
Which should have more weight, i.e., be encoded in the earlier numerals of version numbers, and which should have less weight, i.e., be encoded in the later numerals?

Some considerations for each axis, in turn:

  - Compatibility is obviously important -- it's one of the core values of the Java Platform, after all -- but it's problematic in at least two respects and hence not a sound basis for version numbers.

    First: Compatibility is, itself, multi-dimensional and therefore difficult to encode into a simple sequence of numerals. What counts as an incompatible change? Some cases are obvious, e.g., a language change after which some old source files no longer compile, a JVM change after which some old class files are no longer valid, or an API change that removes an existing module, package, type, or element thereof. Many cases are, however, less than obvious, e.g., a language change after which some previously-rejected source files do compile, a bug fix that changes the element order of an array returned by an API, an enhancement that allows a command-line option to accept some previously-rejected arguments, or an optimization that removes an internal API.

    It might be practical to encode compatibility information into a two- or three-numeral version number for something as simple as a single library whose only interface with the outside world is its API [5]. It's far from clear how to do that, though, in a way that's easy for everyone to understand for something as complex as the Java Platform itself, and implementations thereof.

    Second: The compatibility of a particular release with any of its predecessors depends upon the set of features in that release. In a time-based release model, however, the set of features is not known until late in each release cycle, after the final feature is merged. This complicates discussions of any specific release and the tracking of changes in JIRA and related systems.
    If, e.g., we use the leading numerals of version numbers to encode compatibility in the usual way, with the first numeral increasing only when incompatible changes are made, then would the March 2018 release be version 9.1, or 10? We can't know until some time in December 2017, when the release closes for stabilization. We could address this problem by establishing secondary, time-based labels for releases, but that would be awkward and could lead to even more confusion.

  - Significance is even harder to measure than compatibility, and like compatibility it depends upon the set of features in a release and hence can't be known until late in a release cycle. The best we can do for significance is insist that, over time, differences in version numbers roughly reflect differences in release content. An increment of the first numeral of a version number should indicate a greater amount of change than increments of later numerals.

  - Security is important, but the security level can't be encoded in one of the earlier numerals of a version number since it evolves at a rate that's unpredictable relative to all the other axes and would therefore violate the monotonicity constraint.

    (JEP 223 [6] solved this problem by using the third numeral of a version number to record the security level of a release within a particular major-release family, resetting that number only at the next major release. That scheme was, however, designed under the assumption of multi-year major releases, each of which could have several simultaneous update-release lines. If security fixes are routinely delivered in one stream of update releases per feature release, as envisioned in this new model [7], then there's less reason to encode the security level in the version number.)

  - The support lifetime of a release is useful information, but it's not appropriate to encode that into the version number of the Java SE Platform or the JDK.
    The version number should be identical in all implementations of a given release, but the support lifetime of a release may vary from implementor to implementor. Oracle might choose, e.g., to offer support to its customers for twenty years on releases three years apart, but another implementor might offer support for ten years on releases two years apart.

  - Identity is important, especially for use in bug reports, but it need not be encoded in the version number itself. It's reasonable to ask that bug reports include the full version string, so it suffices to include a build identifier or other implementation information after the version number itself (e.g., 9+181, the full version string of JDK 9 GA).

These considerations leave us with the final axis, time, as the leading candidate for the primary basis of Java SE and JDK version numbers. This would be a departure from past releases, in which we've used version numbers that roughly encode both compatibility and significance. It is, however, a better fit for a strict, time-based release model since the version number of any particular release is known well in advance.

The compatibility level of a release would still be indicated by the length of its version number, since we'll continue our long-standing practice of making obviously-incompatible changes only in feature releases. The security level of an update release would, similarly, be reflected in its version number as a whole, since a later release will always contain more security fixes than its predecessor.

The main remaining question, then, is that of how to encode time in version numbers: As an absolute value, derived from the date of the release, or as a relative value, measuring the amount of time since the previous release of the same type?

In the abstract, absolute times have three attractive properties:

  - Absolute times reflect release dates, so they make it clear to all involved -- both developers of the JDK and users of the JDK -- that these are time-based releases. There can be no question of delaying a release in order to add "just one more feature" to it.

  - Absolute times make it easy to figure out how old a release is, so that as a user you can understand how far behind you are. Relative times require you to know what the time units are, and when these time-based version numbers were adopted.

  - Absolute times are independent of the release cadence. If in a few years we switch to an even faster cadence, say every three months, then an absolute scheme would need no change but a relative scheme would need to be revised with a new time unit and starting point.

Now, at last, for some concrete alternatives:

  (A) Absolute: $YY$MM, padding the month number with a zero as needed, and $YY$MM.$AGE for update releases, where $AGE is the number of months since $YY$MM.

  (B) Absolute: $YY.$M as proposed, without padding the month number, and $YY.$M.$AGE for update releases, where $AGE is as above.

  (C) Relative: $N, where $N is the number of half-years since JDK 9 GA (September 2017) plus nine, and $N.$AGE for update releases, where $AGE is as above.

($AGE is more useful than another incrementing counter since it leaves room for emergency update releases without having to renumber subsequent update releases that are already in development.)
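
The three definitions above are mechanical enough to sketch in code. What follows is only an illustration, not a proposed API: the class `VersionSchemes` and its methods are hypothetical names, and the sketch assumes that feature releases fall exactly on the six-month cadence counted from JDK 9 GA and that $AGE is a whole number of calendar months.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper (not part of the JDK): formats version numbers
// under the three candidate schemes (A), (B), and (C).
final class VersionSchemes {

    // JDK 9 GA (September 2017), the origin of the relative scheme (C).
    static final YearMonth JDK9_GA = YearMonth.of(2017, 9);

    // $AGE: whole months from a feature release's GA to the given release.
    static long age(YearMonth ga, YearMonth release) {
        return ga.until(release, ChronoUnit.MONTHS);
    }

    // Appends ".$AGE" for update releases; a GA release (age 0) has no suffix.
    static String withAge(String feature, long age) {
        return age == 0 ? feature : feature + "." + age;
    }

    // (A) $YY$MM, with the month zero-padded.
    static String schemeA(YearMonth ga, YearMonth release) {
        String feature = String.format(Locale.ROOT, "%02d%02d",
                ga.getYear() % 100, ga.getMonthValue());
        return withAge(feature, age(ga, release));
    }

    // (B) $YY.$M, with the month not padded.
    static String schemeB(YearMonth ga, YearMonth release) {
        String feature = (ga.getYear() % 100) + "." + ga.getMonthValue();
        return withAge(feature, age(ga, release));
    }

    // (C) $N = 9 + number of half-years since JDK 9 GA.
    // Assumption: ga lies on the six-month cadence (March or September).
    static String schemeC(YearMonth ga, YearMonth release) {
        long halfYears = JDK9_GA.until(ga, ChronoUnit.MONTHS) / 6;
        return withAge(Long.toString(9 + halfYears), age(ga, release));
    }

    public static void main(String[] args) {
        YearMonth ga = YearMonth.of(2018, 3);
        // GA, then updates one and four months later.
        for (int months : new int[] {0, 1, 4}) {
            YearMonth release = ga.plusMonths(months);
            System.out.println(schemeA(ga, release) + "  "
                    + schemeB(ga, release) + "  "
                    + schemeC(ga, release));
        }
        // Prints:
        // 1803  18.3  10
        // 1803.1  18.3.1  10.1
        // 1803.4  18.3.4  10.4
    }
}
```

Note that only `schemeC` needs to know the cadence (the division by six) and the starting point; the two absolute schemes depend on nothing but the GA date.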

Examples of these alternatives, for the next two feature releases and their first two update releases:

                                (A)       (B)       (C)

    GA (March 2018)             1803      18.3      10
    First update (April)        1803.1    18.3.1    10.1
    Second update (July)        1803.4    18.3.4    10.4

    GA (September 2018)         1809      18.9      11
    First update (October)      1809.1    18.9.1    11.1
    Second update (January)     1809.4    18.9.4    11.4

Some pros (+) and cons (-) of each alternative (some of these points are subjective, but so it goes with this topic):

  (A) $YY$MM, with $YY$MM.$AGE for updates

    (+) Has the three advantages of absolute times.

    (+) The `Runtime.Version` API introduced by JEP 223 can be adapted fairly easily. Code that parses raw version strings would need little or no change (as long as it already does so correctly!).

    (-) On the surface 1803 is an enormous leap from 9, is likely to cause confusion, and has connotations of being very old [8].

  (B) $YY.$M, with $YY.$M.$AGE for updates

    (+) Has the three advantages of absolute times.

    (+) Similar to some other significant platforms, e.g., Ubuntu Linux, and less shocking in appearance than (A).

    (-) People unfamiliar with the scheme could conflate 18.3 and 18.9 as being minor releases of JDK 18, which isn't the case. There is some evidence of similar confusion around Ubuntu releases [9].

    (-) The logical "major" version number is now a pair of numbers, year and month. We could mitigate this in the `Runtime.Version` API by encoding the year and month as $YY$MM in the existing major number, and adding new methods that return the year and month. Code that parses raw version strings will likely require change, including code not just in the JDK itself but in existing tools and CI systems [a].

  (C) $N, with $N.$AGE for updates

    (+) The most straightforward and least-surprising option, and familiar from other rapidly-evolving projects such as Firefox and Chrome.

    (+) The `Runtime.Version` API can be adapted very easily, and code that (correctly) parses raw version strings would need no change.
    (-) Lacks the three advantages of absolute times.

    (-) If we ever switch to an even faster cadence then we could eventually have very large version numbers, as in (A). In the limit we could wind up in a situation like that of CoreOS, whose latest stable release is numbered 1520.6 [b].

These are three plausible alternatives; there are countless others, but I suspect that many if not most are minor variants of these three. To mention just two examples:

  - We could simplify our grandchildren's lives and represent the year with four digits rather than two. That would, however, lead to even longer version numbers.

  - We could zero-pad the month number in (B) so as to be exactly like Ubuntu ($YY.$MM) which might make it a bit more obvious that JDK 18.03 isn't an update release of JDK 18. This would only work, though, so long as we never ship a feature release after September in any particular year. (Ubuntu ships in April (04) and October (10), so zero-padding really only helps them half the time.)

                              * * *

If you've read this far, my question to you now is not the question that you might expect. Please don't say which version-number scheme you prefer for Java SE and the JDK. Instead, please only communicate any additional information that's relevant to the choice of such a scheme. In particular:

  - Are there additional pros and cons to the alternatives listed above?

  - Are there additional alternatives worth considering, and if so what are their pros and cons?

  - Are there specific experiences with other projects or products that can inform this choice?

In order to discourage this from devolving into another version-numbering bike-shed discussion I'll give much greater weight to your first reply to this message than to any other, so please think and write carefully before you post. I'll also ignore replies-to-replies -- if you really want to argue with someone else about one scheme vs. another then I won't stop you, but I don't think that's a useful use of most readers' time.
Finally, I'll heavily discount replies that quote more text from this message than add new text of their own, so please quote just the text that's actually needed to provide context for your reply.

In a week or so I'll summarize any new information received, and then make a specific proposal.

- Mark


[1] https://mreinhold.org/blog/forward-faster
[2] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004281.html
[3] http://bikeshed.org/
[4] https://wiki.haskell.org/Wadler's_Law
[5] http://semver.org
[6] http://openjdk.java.net/jeps/223
[7] https://mreinhold.org/blog/forward-faster#Proposal
[8] https://en.wikipedia.org/wiki/1803
[9] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004429.html
[a] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004352.html
[b] https://coreos.com/releases