We need more keywords, captain!

This document proposes a possible move that will buy us some breathing room in the perpetual problem where the keyword-management tail wags the programming-model dog.

## We need more keywords, captain!

Java has a fixed set of _keywords_ (JLS 3.9) which are not allowed to be used as identifiers. This set has remained quite stable over the years (for good reason), with the exceptions of `assert` added in 1.4, `enum` added in 5, and `_` added in 9. In addition, there are also several _reserved identifiers_ (`true`, `false`, and `null`) which behave almost like keywords.

Over time, as the language evolves, language designers face a challenge; the set of keywords imagined in version 1.0 is rarely suitable for expressing all the things we might ever want our language to express. We have several tools at our disposal for addressing this problem:

 - Eminent domain. Take words that were previously identifiers, and turn them into keywords, as we did with `assert` in 1.4.
 - Recycle. Repurpose an existing keyword for something that it was never really meant for (such as using `default` for annotation values or default methods).
 - Do without. Find a way to pick a syntax that doesn't require a new keyword, such as using `@interface` for annotations instead of `annotation` -- or don't do the feature at all.
 - Smoke and mirrors. Create the illusion of context-dependent keywords through various linguistic heroics (restricted keywords, reserved type names).

In any given situation, all of these options are on the table -- but most of the time, none of these options are very good. The lack of reasonable options for extending the syntax of the language threatens to become a significant impediment to language evolution.

#### Why not "just" make new keywords?

While it may be legal for us to declare `i` to be a keyword in a future version of Java, this would likely break every program in the world, since `i` is used so commonly as an identifier.
(When the `assert` keyword was added in 1.4, it broke every testing framework.) The cost of remediating the effect of such incompatible changes varies as well; invalidating a name choice for a local variable has a local fix, but invalidating the name of a public type or an interface method might well be fatal.

Additionally, the keywords we're likely to want to reclaim are often those that are popular as identifiers (e.g., `value`, `var`, `method`), making such fatal collisions more likely. In some cases, if the keyword candidate in question is sufficiently rarely used as an identifier, we might still opt to take that source-compatibility hit -- but names that are less likely to collide (e.g., `usually_but_not_always_final`) are likely not the ones we want in our language. Realistically, this is unlikely to be a well we can go to very often, and the bar must be very high.

#### Why not "just" live with the keywords we have?

Reusing keywords in multiple contexts has ample precedent in programming languages, including Java. (For example, we (ab)use `final` for "not mutable", "not overridable", and "not extensible".) Sometimes, using an existing keyword in a new context is natural and sensible, but usually it's not our first choice. Over time, as the range of demands we place on our keyword set expands, this may well descend into the ridiculous; no one wants to use `null final` as a way of negating finality. (While one might think such things are too ridiculous to consider, note that we received serious-seeming suggestions during JEP 325 to use `new switch` to describe a switch with different semantics. Presumably to be followed by `new new switch` in ten years.)

Of course, one way to live without making new keywords is to stop evolving the language entirely. While there are some who think this is a fine idea, doing so because of the lack of available tokens would be a silly reason.
We are convinced that Java has a long life ahead of it, and developers are excited about new features that enable them to write more expressive and reliable code.

#### Why not "just" make contextual keywords?

At first glance, contextual keywords (and their friends, such as reserved type identifiers) may appear to be a magic wand; they let us create the illusion of adding new keywords without breaking existing programs. But the positive track record of contextual keywords hides a great deal of complexity and distortion.

Each grammar position is its own story; contextual keywords that might be used as modifiers (e.g., `readonly`) have different ambiguity considerations than those that might be used in code (e.g., a `matches` expression). The process of selecting a contextual keyword is not a simple matter of adding it to the grammar; each one requires an analysis of potential current and future interactions. Similarly, each token we try to repurpose may have its own special considerations; for example, we could justify the use of `var` as a reserved type name because the naming conventions are so broadly adhered to. Finally, the use of contextual keywords in certain syntactic positions can create additional considerations for extending the syntax later.

Contextual keywords create complexity for specifications, compilers, and IDEs. With one or two special cases, we can often deal well enough, but if special cases were to become more pervasive, this would likely result in more significant maintenance costs or bug tail. While it is easy to dismiss this as “not my problem”, in reality, this is everybody’s problem. IDEs often have to guess whether a use of a contextual keyword is a keyword or an identifier, and they may not have enough information to make a good guess until they have seen more input. This results in worse highlighting, auto-completion, and refactoring for users -- or worse. These problems quickly become everyone's problems.
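
To make the ambiguity concrete, here is a minimal sketch, assuming Java 10 or later (where `var` is a reserved type name). The class and method names are hypothetical and chosen only for illustration. It shows the same spelling acting as a type-position marker, a variable name, a parameter name, and a method name within a few lines, which is exactly the kind of input a tool has to disambiguate from context.

```java
// Illustration only: assumes Java 10+, where `var` is a reserved type name
// rather than a true keyword. All names here are hypothetical.
final class ContextualVarSketch {

    // `var` is still a legal method name and a legal parameter name.
    static int var(int var) {
        return var + 1;
    }

    static int demo() {
        // The first `var` asks for local type inference;
        // the second `var` is an ordinary identifier naming the variable.
        var var = 41;

        // Here `var(...)` is a method call and the argument `var` is the local variable.
        return var(var);
    }
}
```

Calling `ContextualVarSketch.demo()` returns 42. A reader (or an IDE) can only classify each occurrence of `var` by looking at what surrounds it.
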
So, while contextual keywords are one of the tools in our toolbox, they should also be used sparingly.

#### Why is this a problem?

Aside from the obvious consequences of these problems (clunky syntax, complexity, bugs), there is a more insidious hidden cost -- distortion. The accidental details of keyword management pose a constant risk of distortion in language design.

One could consider the choice to use `@interface` instead of `annotation` for annotations to be a distortion; having a descriptive name rather than a funky combination of punctuation and keyword would surely have made it easier for people to become familiar with annotations.

In another example, the set of modifiers (`public`, `private`, `static`, `final`, etc.) is not complete; there is no way to say “not final” or “not static”. This, in turn, means that we cannot create features where variables or classes are `final` by default, or members are `static` by default, because there’s no way to denote the desire to opt out of it. While there may be reasons to justify a locally suboptimal default anyway (such as global consistency), we want to make these choices deliberately, not have them made for us by the accidental details of keyword management. Choosing to leave out a feature for reasons of simplicity is fine; leaving it out because we don't have a way to denote the obvious semantics is not.

It may not be obvious from the outside, but this is a constant problem in evolving the language, and an ongoing tax that we all pay, directly or indirectly.

## We need a new source of keyword candidates

Every time we confront this problem, the overwhelming tendency is to punt and pick one of the bad options, because the problem only comes along every once in a while. But, with the features in the pipeline, I expect it will continue to come along with some frequency, and I’d rather get ahead of it.
Given that all of these current options are problematic, and there is not even a least-problematic move that applies across all situations, my inclination is to try to expand the set of lexical forms that can be used as keywords.

As a not-serious example, take the convention that we’ve used for experimental features, where we prefix provisional keywords in prototypes with two underscores, as we did with `__ByValue` in the Valhalla prototype. (We commonly do this in feature proposals and prototypes, mostly to signify “this keyword is a placeholder for a syntax decision to be made later”, but also because it permits a simple implementation that is unlikely to collide with existing code.) We could, for example, carve out the space of identifiers that begin with underscore as being reserved for keywords. Of course, this isn’t so pretty, and it also means we'd have a mix of underscore and non-underscore keywords, so it’s not a serious suggestion, as much as an example of the sort of move we are looking for.

But I do have a serious suggestion: allow _hyphenated_ keywords where one or more of the terms are already keywords or reserved identifiers. Unlike restricted keywords, hyphenated keywords create much less trouble for parsing: `non-null`, for example, cannot be confused with a subtraction expression, and the lexer can always tell with fixed lookahead whether `a-b` is three tokens or one. This gives us a lot more room for creating new, less-conflicting keywords. And these new keywords are likely to be good names, too, as many of the missing concepts we want to add describe their relationship to existing language constructs -- such as `non-null`.

Here are some examples where this approach might yield credible candidates. (Note: none of these are being proposed here; this is merely an illustrative list of examples of how this mechanism could form keywords that might, in some particular possible future, be useful and better than the alternatives we have now.)

 - `non-null`
 - `non-final`
 - `package-private` (the default accessibility for class members, currently not denotable)
 - `public-read` (publicly readable, privately writable)
 - `null-checked`
 - `type-static` (a concept needed in Valhalla, which is static relative to a particular specialization of a class, rather than the class itself)
 - `default-value`
 - `eventually-final` (what the `@Stable` annotation currently suggests)
 - `semi-final` (an alternative to `sealed`)
 - `exhaustive-switch` (opting into exhaustiveness checking for statement switches)
 - `enum-class`, `annotation-class`, `record-class` (we might have chosen these as an alternative to `enum` and `@interface`, had we had the option)
 - `this-class` (to describe the class literal for the current class)
 - `this-return` (a common request is a way to mark a setter or builder method as returning its receiver)

(Again, the point is not to debate the merits of any of these specific examples; the point is merely to illustrate what we might be able to do with such a mechanism.)

Having this as an option doesn't mean we can't also use the other approaches when they are suitable; it just means we have more, and likely less fraught, options with which to make better decisions.

There are likely to be other lexical schemes by which new keywords can be created without impinging on existing code; this one seems credible and reasonably parsable by both machines and humans.

#### "But that's ugly"

Invariably, some percentage of readers will have an immediate and visceral reaction to this idea. Let's stipulate for the record that some people will find this ugly. (At least, at first. Many such reactions are possibly-transient (see what I did there?) responses to unfamiliarity.)
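
Whatever one makes of the aesthetics, the mechanical claim made earlier (that a lexer can tell with fixed lookahead whether `a-b` is three tokens or one) is easy to illustrate. The following is a minimal sketch, not a description of how javac does or would do it. The class name, the two-entry keyword table, and the matching rule (a hyphenated keyword is recognized only when spelled without whitespace and not followed by another identifier character) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer, illustration only. It handles words, whitespace, and
// single-character punctuation; numbers, strings, and comments are ignored.
final class HyphenLexerSketch {

    // Hypothetical hyphenated keywords, borrowed from the illustrative list.
    // Because this table is finite, the lookahead needed is bounded by the
    // longest entry plus one character.
    private static final String[] HYPHENATED = { "non-final", "non-null" };

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                String keyword = hyphenatedKeywordAt(src, i);
                if (keyword != null) {
                    // One token: the whole hyphenated keyword.
                    tokens.add(keyword);
                    i += keyword.length();
                } else {
                    // Ordinary word: stop at the first non-identifier character,
                    // so a following '-' becomes its own token.
                    int j = i + 1;
                    while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) {
                        j++;
                    }
                    tokens.add(src.substring(i, j));
                    i = j;
                }
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    // Returns the hyphenated keyword starting at `start`, or null if none does.
    private static String hyphenatedKeywordAt(String src, int start) {
        for (String keyword : HYPHENATED) {
            int end = start + keyword.length();
            boolean matches = src.startsWith(keyword, start);
            boolean wordEnds = end >= src.length()
                    || !Character.isJavaIdentifierPart(src.charAt(end));
            if (matches && wordEnds) {
                return keyword;
            }
        }
        return null;
    }
}
```

Under these assumptions, `tokenize("a-b")` yields the three tokens `a`, `-`, `b`, while `tokenize("non-null String s")` yields `non-null`, `String`, `s`. Existing code that writes `non - null` or `non-nullable` still lexes as a subtraction.
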