Towards a plan for L10 and L20

I’ve been pulling together a rough draft of high-level requirements for the L10 and L20 milestones. These are mostly through the lens of the programming model, but they indirectly affect VM requirements (usually in obvious ways.)

(Note to observers: this is mostly capturing the active plan, rather than the rationale and justification for these decisions. These will follow in a separate document.)

# Towards requirements for L10 and L20 milestones

The last year has been a tremendous one for Project Valhalla; the L-World prototype has proven more successful than we could have hoped, and building on the success of L-World, the end-to-end plan for specialized generics with gradual migration compatibility has started to come into full view. It's time to put a stake in the ground and talk about some deliverables.

As L-World started to look like it was going to succeed, we identified three buckets, into which features could be sorted, as a structuring mechanism:

 - L1 -- First usable prototype of L-World; now delivered
 - L10 -- First preview-able milestone; this would include the ability to declare value classes which can be flattened into arrays and objects, and instantiate _erased_ generics over values
 - L100 -- Specializable generics over values, and migrating key generic classes (e.g., Collections and Streams)

These constituted not so much a plan, as a recognition that there were several significant phases ahead of us. Let's put some meat on these phases, and carve things a little finer.

## L10 -- First previewable delivery

Nontrivial language and JVM features typically go through a round of [_preview_][jep12], in which they are part of the specification and implementation but behind a flag and with the possibility of refinement before they become a permanent part of the platform. For a feature as significant as value types, we will surely want to Preview it before finalization.
A preview feature should be complete; in the context of Valhalla, this means that a complete and self-consistent programming model is needed; features should not look like they are "bolted on".

The main feature of L10 is the ability to declare and use value classes; we can think of the remaining features as the "closure" of adding this feature to Java, which is to say, all the other things we have to add in to arrive at a sensible, self-consistent, understandable programming model.

First and foremost, a value class is a class, and is declared like one:

```{.java}
value class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

(A modifier is not the only option; this could be indicated by a choice of supertype as well; syntax is all TBD.)

Value types are non-nullable; each declared value class `V` gives rise to two types: `V`, and its nullable counterpart, `V?`. The value set of `V?` is the union of the value set of `V`, with the singleton set `{ null }`. For a value class `V`, we get the following subtype relationships:

    V <: V? <: ValObject <: Object

In addition, if value class `V` implements interface `I`, then `V <: I`.

Value classes have some restrictions, which are enforced by both the compiler and runtime:

 - They cannot extend any other class
 - They are implicitly final
 - Their fields are implicitly final
 - Their instances are non-nullable
 - They cannot be synchronized on (nor used with other identity-sensitive operations, such as `Object.wait()`)

The type `V` is translated as `QV;`; the type `V?` is translated as `LV;`.

#### Object model

We add two new "top" types to the object model; `java.lang.RefObject` and `java.lang.ValObject` (names to be bikeshod.) All "regular" classes ("identity classes") are subtypes of `RefObject`; all value types are subtypes of `ValObject`. (Interfaces can not be subtypes of either `RefObject` or `ValObject`, but interfaces can be implemented by both value classes and identity classes.
It is often helpful to think of `Object` as being an "honorary" interface.) Whether `RefObject` is a class or interface is also currently an open issue.

While it is disruptive to retrofit new top types into the hierarchy, there are several good reasons for doing so. The first is that the object hierarchy is a powerful pedagogical tool -- not only is "everything an object", but "everything is an `Object`." Users learning the language, upon learning that there are identity objects and value objects, will see this division prominently reflected in the root types of the object hierarchy.

Another reason is that there will likely be behavior which is common to all identity objects, or to all value objects, and these types represent a sensible place to declare that behavior, using tools (static methods, final methods, etc) that users already understand.

The final (and perhaps most important) reason is that it lets us talk about this important distinction in the language. Having these as types means users can dynamically test whether something is a value object or identity object when they need to:

```
if (x instanceof RefObject) { ... }
if (y instanceof ValObject) { ... }
```

Methods that rely on identity can declare this in their signature:

```
void m(RefObject o) { ... }
```

And generic classes that only make sense to be instantiated with reference types can do so in the standard way:

```
class Foo<T extends RefObject> { ... }
```

For all the same reasons, we will want to reflect nullability in the type system (such as with an interface type `Nullable`, which would be implemented by all identity types, plus nullable value types.)

Unlike primitives, and unlike earlier Valhalla designs, there _are no box types_. `V?` is not a box for `V`; boxes serve to connect non-Object values to `Object`, but values _already_ are `Object`s.

#### Intrinsic operations

Being objects, value types inherit all the members of `Object`, and we must provide sensible default behaviors for them.
For identity objects, the default behaviors for `equals()`, `hashCode()`, and `toString()` are identity-based (identity equality, identity hash code, and the name of the class appended with the identity hash code); for value objects, they should be state-based.

We define a relation over all values (identity classes, value classes, and primitives), called _substitutability_, as follows:

 - Two identity instances are substitutable if they refer to the same object.
 - Two primitives are substitutable if they are `==` (modulo special pleading for `NaN`, as per `Float::equals` and `Double::equals`).
 - Two value instances `a` and `b` are substitutable if they are of the same type, and for each of the fields `f` of that type, `a.f` and `b.f` are substitutable.

We then say that for any two values, `a == b` iff `a` and `b` are substitutable. The default implementation of `Object::equals` for value classes implements `a == b`, as it does for identity classes.

Similarly, we define a total _substitutability hash code_ function, as follows:

 - For an identity instance, it is the value of `System::identityHashCode`;
 - For a primitive, it is the value of the `hashCode` method of the corresponding wrapper type;
 - For a value, it is constructed deterministically from the substitutability hash codes of the value's fields.

The method `System::identityHashCode` should return the substitutability hash code for value arguments; as for reference classes, the default `Object::hashCode` for value classes also returns the substitutability hash code. (If we were starting clean, we might prefer separate API points for identity and substitutability, and then a merged API point.)

Certain operations that are nominally allowable on all objects are forbidden (and result in runtime exceptions): synchronization, and `Object::wait` and friends.
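
The substitutability relation and its hash code, as defined above, can be emulated in today's Java to get a feel for the semantics. The following is only a rough sketch under stated assumptions: `ValueLike` is a hypothetical marker interface standing in for "is a value class", `PointV` is an illustrative class, and reflection over declared fields stands in for whatever the VM would actually do. None of these names come from the plan itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical marker standing in for "is a value class"; not a real or proposed API. */
interface ValueLike {}

/** Illustrative value-like class: final, final fields, no superclass other than Object. */
final class PointV implements ValueLike {
    final int x;
    final double y;
    PointV(int x, double y) { this.x = x; this.y = y; }
}

final class Substitutability {
    private Substitutability() {}

    /** Emulates the proposed meaning of a == b for two references. */
    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                        // same object, or both null
        if (a == null || b == null) return false;
        if (!(a instanceof ValueLike) || !(b instanceof ValueLike)) return false; // distinct identity objects
        if (a.getClass() != b.getClass()) return false; // value instances must have the same type
        try {
            for (Field f : instanceFields(a.getClass())) {
                Object x = f.get(a), y = f.get(b);
                boolean same = f.getType().isPrimitive()
                        ? x.equals(y)            // wrapper equals, so NaN follows Float/Double::equals
                        : substitutable(x, y);   // recurse for reference-typed fields
                if (!same) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Emulates the substitutability hash code. */
    static int substitutabilityHash(Object o) {
        if (o == null) return 0;
        if (!(o instanceof ValueLike)) return System.identityHashCode(o);
        int h = o.getClass().getName().hashCode();
        try {
            for (Field f : instanceFields(o.getClass())) {
                Object x = f.get(o);
                int part = f.getType().isPrimitive() ? x.hashCode() : substitutabilityHash(x);
                h = 31 * h + part;               // deterministic combination of field hashes
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return h;
    }

    /** Non-static declared fields, sorted by name so the iteration order is stable. */
    private static List<Field> instanceFields(Class<?> c) {
        List<Field> fields = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(substitutable(new PointV(1, 2.0), new PointV(1, 2.0)));
        System.out.println(substitutable(new Object(), new Object()));
    }
}
```

Because value classes cannot extend other classes, looking only at the declared fields of the instance's own class is enough in this emulation. Using the wrapper `equals` for primitive fields is what makes two `NaN` fields compare as substitutable.
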
It is an open question what we should do for weak references to values that contain references to identity objects; perhaps weak references are restricted to `RefObject`.

Values are instantiated with instance creation expressions: `new V(...)` (though such expressions are not necessarily translated in the same way as for identity classes.) Value classes have constructors, and these constructors are written like constructors for reference types. Because all fields are final, the constructors must initialize all the fields of the class.

#### Mirrors and reflection

For each value class `V`, there are two reflection mirrors: a standard mirror (for `V`), and a nullable mirror (for `V?`). The latter is used for reflection over members that use `V?` in their signature; the method `Object::getClass` returns the standard mirror for all instances of `V`. Similarly, mirrors for `V[].class` and `V?[].class` are needed.

A value class name `V` can appear on the RHS of `instanceof`; both `V` and `V?` can be used as cast targets.

We will likely want a reflective method `Class::isValueClass`. Fields, methods, and constructors for value classes can be reflected using existing abstractions.

#### Arrays

Arrays are covariant; if `T <: U`, then `T[] <: U[]`. The subtyping relations above therefore give rise to their array counterparts:

    V[] <: V?[] <: ValObject[] <: Object[]

The array type `V[]` is translated as `[QV;`; the array type `V?[]` is translated as `[LV;`.

#### Nullable types

Nullable value types (`V?`) are nullable in the same sense that all reference types are -- `null` is a member of their value set, but the dereference operators can throw NPE when applied to a null operand.
(We realize there is likely to be a highly-vocal constituency who are really hoping that null-freedom would be enforced by the compiler (and therefore that we'd introduce null-safe operators such as `?.`). Our decision here is not out of ignorance that this is potentially desirable, nor out of ignorance that doing it this way makes it even harder to achieve the null-safe nirvana that such users long for.)

#### Serialization

Value classes are classes; to not be able to opt into serialization like other classes would be a significant irregularity. However, many of the serialization mechanisms (like `readObject`) depend on mutation; additional mechanisms for safely serializing value classes may be needed. This is an open issue.

#### Values and generics in L10

With respect to generics, there are two undesirable fates that L10 must steer clear of. We know that specialized generics are coming, and L10 embodies a deliberate choice to ship values before we ship specializable generics.

One wrong move would be to simply ban the use of generics over values; this would be a huge loss for reuse, as there are so many useful, well-tested, well-understood generic libraries out there.

The other wrong move would be to interpret `Foo<V>` as an erased instantiation of `Foo`. For erased type parameters, `null` is always considered to be a member of the value set, which means that we might get unexpected NPEs when generic code puts a `null` where it is within its rights to do so. (In the worst case, methods that use `null` as a sentinel, like `Map::get`, become unusable.)

What we will do is allow the instantiation of erased generics with _nullable_ values; we can say `Foo<V?>`, but not `Foo<V>` -- just as we can say `Foo<Integer>` but not `Foo<int>` today. (This leaves us free to assign a meaning to `Foo<V>` later for specializable generics.)
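
The hazard described above already has an analogue in today's Java, in the `Foo<Integer>` versus `Foo<int>` case. The sketch below is only an illustration using existing `java.util` classes; `ErasedNullDemo` and its method name are made up for this example. It shows erased generic code handing back a `null` sentinel from `Map::get`, and the NPE that results when the caller treats the result as a non-nullable `int`.

```java
import java.util.HashMap;
import java.util.Map;

final class ErasedNullDemo {
    private ErasedNullDemo() {}

    /** Returns true if treating a Map::get miss as a non-nullable int throws NPE. */
    static boolean missingKeyThrowsOnUnboxing() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        try {
            // Map::get returns null for a missing key; unboxing that null to int throws.
            int n = counts.get("missing");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(missingKeyThrowsOnUnboxing());
    }
}
```

Requiring `Foo<V?>` for erased instantiation keeps that `null` visible in the type, in the same way that `Map<String, Integer>` does today.
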
So users can declare value classes, and generify over their nullable counterparts, with erasure, and later, they'll be able to generify over the value itself, with specialization. One can think of this as all type variables of erased generic classes (which is all generic classes, today) having an implicit bound `T extends Nullable`.

For any type `T`, the expression `T.default` evaluates to the default value for type `T` -- the value initially held by fields or array elements. For identity classes, this is `null`; for value classes, this is one where all fields have their default (zero) value. The locution `T.default` can be used both for concrete types `T` and for type variables (for type variables in erased generic classes, this is equivalent to `null`.)

## L20 -- Migration support for value-based classes

The next sensible milestone after L10 adds one new feature: the ability to migrate existing [value-based classes][vbc] to value types. While theoretically we could merge this into L10, the reason to separate them is that L10 is useful to a variety of use cases (numeric-intensive code, machine learning, optimized data structures) that have no immediate need for migration, and we don't want to delay the delivery of L10 (and the critical feedback that will come with broad distribution) for the sake of optimizing some JDK classes.

Value instances, like all other values, are initialized with an all-zero value (null, zero, false, etc.) However, for some value classes, the all-zero value is not a natural member of the domain, and asking class implementations to deal with it is likely to be a sharp edge. For these types -- and also for value types migrated from value-based classes -- we introduce a new mechanism: _null-default value classes_. This is a value class whose default all-zero bit value is interpreted as `null`, rather than one whose fields all hold their default value. (The opposite of null-default is _zero-default_.)
A null-default value class is declared with the `null-default` modifier, and is a nullable type (implicitly implements `Nullable`):

```{.java}
null-default value class Person {
    String first;
    String last;
}
```

A `null-default` value class _is implicitly zero-hostile_; if the state on exit from the constructor has zeros for all fields, an exception is thrown.

Classes that are intended to be compatibly migrated from value-based classes to value classes must be declared `null-default` (and therefore their implementations must conform to the zero-hostility requirements). Inner value classes are implicitly `null-default`.

For a null-default type, `T.default` evaluates to `null`. For a null-default value type `T`, `T?` denotes the same type as `T`.

#### Translation

To ease migration compatibility, we adopt a hybrid translation strategy for null-default value classes. When a null-default value class appears in a _method descriptor_, we translate it with an `L` descriptor; only when it appears in a _field descriptor_ do we use the more precise `Q` descriptor. This is a trade-off; using slightly looser types in method descriptors may give up some calling-convention optimizations, but allows us to compatibly migrate classes like `LocalDateTime`. (This strategy of "loose types on the stack, sharp types on the heap" will show up again later when we get to migration of erased generics to specialized.)

#### Field linkage

Because we use the sharper types in field descriptors, it is possible that existing code will have a `CONSTANT_Fieldref_info` for a migrated field that refers to the field by `L` descriptor, but the descriptor in the target class has been migrated to `Q`. Link resolution of field bytecodes is adjusted to paper over this potential mismatch.
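
The descriptor rules stated so far (`V` as `QV;`, `V?` as `LV;`, and the hybrid scheme for null-default classes) can be summarized as a small decision function. This is a sketch, not the actual compiler or VM logic: the `Descriptors` class, its enums, and in particular the `fieldDescriptorsLink` matching rule are assumptions made for illustration, since the text only says that link resolution is "adjusted to paper over" the `L`/`Q` mismatch. Array descriptors are left out.

```java
final class Descriptors {
    private Descriptors() {}

    /** Hypothetical classification of a class for translation purposes. */
    enum Kind { IDENTITY, ZERO_DEFAULT_VALUE, NULL_DEFAULT_VALUE }

    /** Where the type appears. */
    enum Site { FIELD, METHOD }

    /**
     * Descriptor for one use of a class.
     * `nullable` marks a use written as V?; it is irrelevant for null-default
     * classes, where T? denotes the same type as T.
     */
    static String describe(String internalName, Kind kind, boolean nullable, Site site) {
        boolean sharp;
        switch (kind) {
            case ZERO_DEFAULT_VALUE:
                sharp = !nullable;              // V -> QV;   V? -> LV;
                break;
            case NULL_DEFAULT_VALUE:
                sharp = (site == Site.FIELD);   // Q on the heap, L on the stack
                break;
            default:
                sharp = false;                  // identity classes always use L
        }
        return (sharp ? "Q" : "L") + internalName + ";";
    }

    /**
     * Assumed linkage rule: a field reference links if the descriptors are
     * identical, or if a legacy L reference meets a migrated Q declaration
     * of the same class.
     */
    static boolean fieldDescriptorsLink(String refDesc, String declaredDesc) {
        if (refDesc.equals(declaredDesc)) return true;
        return refDesc.startsWith("L")
                && declaredDesc.startsWith("Q")
                && refDesc.substring(1).equals(declaredDesc.substring(1));
    }

    public static void main(String[] args) {
        String name = "java/time/LocalDateTime";
        System.out.println(describe(name, Kind.NULL_DEFAULT_VALUE, false, Site.METHOD));
        System.out.println(describe(name, Kind.NULL_DEFAULT_VALUE, false, Site.FIELD));
    }
}
```

With `LocalDateTime` treated as a migrated null-default class, method descriptors keep the `L` form that existing callers were compiled against, and only field descriptors change, which is the one place the linkage adjustment is needed.
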
#### Null-default value types and erased generics

Null-default value types are nullable, and so can be used to instantiate erased generic classes (unlike zero-default value classes, which require an explicit indication of nullability at the use site). For a migrated type `T`, there will likely be existing code that uses generics such as `List<T>`; when these types are migrated to null-default value classes, these locutions continue to be valid (and continue to mean the same thing -- erased instantiation of `List` with `T`).

#### Library support

As part of this milestone, we should expect to migrate classes such as `Optional`, `LocalDateTime`, and other suitable value-based classes.