JEP: ?
Title: Lazy Static Final Fields
Author: John Rose, Remi Forax
Organization: Oracle
Created: 2014/04/10
Type: Experimental
State: Draft
Exposure: Open
Component: --/--
Scope: SE
Discussion: amber dash dev at openjdk dot java dot net
Start: ?
Effort: M
Duration: M
Template: 1.0

Summary

Expand the behavior of final variables to include optional lazy evaluation patterns, in both the language and the JVM. In doing so, extend Java's pre-existing lazy evaluation mechanisms from their current per-class granularity to per-variable granularity.

Motivation

Java uses lazy evaluation pervasively. Almost every linkage operation potentially triggers a lazy evaluation, such as the execution of a <clinit> method (class initializer bytecode) or invocation of a bootstrap method (for an invokedynamic call site or CONSTANT_Dynamic constant).

Class initializers are coarse-grained compared to mechanisms using bootstrap methods, because their contract is to run all initialization code for a whole class, rather than some initialization that may pertain to a particular field of that class. Such coarse-grained initialization effects make it especially difficult to predict and isolate the side effects of using one static field from the class, since computing the value of one field entails computation of all static fields in the same class.

So touching one field touches them all. In AOT compilers, this makes it difficult to optimize a static field reference, even if the field has a clearly analyzable constant value. It only takes one extra-complicated static field in a class to make all fields non-optimizable. A similar problem appears with proposed mechanisms for constant-folding (at javac time) constant fields with complex initializers.
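The coarse granularity described above can be observed in today's Java. The following is a minimal sketch, not part of the proposal; the names CoarseInitDemo, Constants, CHEAP, EXPENSIVE, and record are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records which static initializers have run.
final class CoarseInitDemo {
    // The log lives in the enclosing class, so reading it does not
    // trigger the initialization of Constants.
    static final List<String> LOG = new ArrayList<>();

    static final class Constants {
        // Neither initializer is a constant expression, so both
        // assignments are compiled into the <clinit> of Constants.
        static final String CHEAP = record("CHEAP");
        static final String EXPENSIVE = record("EXPENSIVE");

        private static String record(String name) {
            LOG.add(name);
            return name;
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + LOG);
        String value = Constants.CHEAP; // first use initializes the whole class
        System.out.println("after reading " + value + ": " + LOG);
    }
}
```

Reading only CHEAP leaves both entries in the log, because the single class initializer assigns every static field of Constants.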

As an example of an extra-complicated static field initialization, which in some codebases appears in almost every file, consider logger initialization:

private final static Logger LOGGER = Logger.getLogger("com.foo.Bar");

This harmless-looking initialization triggers a tremendous amount of behind-the-scenes activity at class initialization time – though it is unlikely that the logger is needed at class initialization time, or even at all. Deferring the creation to first use would streamline initialization, and might result in optimizing away the initialization entirely.

Final variables are very useful; they are the main mechanism for Java APIs to denote constant values. Lazy variables are also well-proven. Since Java 7 they have been an increasingly important part of JDK internals, expressed via the internal @Stable annotation. The JIT can optimize both final and "stable" variables more fully than other variables. Adding lazy finals will make these useful design patterns usable in more places. Finally, their adoption will allow libraries such as the JDK to reduce their reliance on <clinit> code, with likely improvement to startup and AOT optimizations.

Description

A field may be declared with a new modifier lazy, a contextual keyword recognized only as a modifier. Such a field is called a lazy field, and must also be static and final.

A lazy field must be supplied with an initializer. The compiler and runtime arrange to execute the initializer on the first use of the variable, not when the containing scope (the class) is initialized.

Each lazy static final field is associated at compile time with a constant pool entry which supplies its value. Since constant pool entries are themselves lazily computed, this is sufficient to assign a well-defined value to any static lazy final variable associated with the constant pool entry. (More than one lazy variable can be associated with a single entry, although this is not envisioned as a useful feature.) The association is recorded in a classfile attribute named LazyValue, which must refer to a constant pool entry that can be ldc-ed to a value that can be converted to the type of the lazy field. The allowed conversions are the same as those used by MethodHandle.invoke.

Thus, a lazy static field may be viewed as a named alias of a constant pool entry within the class that defined the field. Tools such as compilers may exploit this property.

A lazy field is never a constant variable (in the sense of JLS 4.12.4) and is explicitly excluded from contributing to a constant expression (in the sense of JLS 15.28). Thus, it never possesses a ConstantValue attribute, even if its initializer is a constant expression. Instead, a lazy field possesses a new kind of classfile attribute called LazyValue, which the JVM consults when linking a reference to that particular field. The format of this new attribute is similar to the old one, because it also points to a constant pool entry, in this case the one which resolves the field value.

When linking a lazy static field, the normal process of executing class initializers is not bypassed. Instead, the declaring class is initialized according to the rules of JVMS 5.5, which includes running its <clinit> method if it has one. In other words, a getstatic bytecode of a lazy static field performs the same linkage actions as a getstatic of any other static field. After initialization (or during an already-started initialization in the current thread), the JVM then resolves the constant pool entry associated with the field, and stores the value of that constant pool entry into that field.

Since lazy static final fields cannot be blank finals, they cannot be assigned to, even in those limited contexts where blank finals may be assigned to.

There is a rule in Java which requires that a static variable may only appear in the initializers of static variables which occur later on in the class body. This rule reduces (but does not eliminate) the possibility that an untimely read of a static variable may obtain the default value of that variable, rather than its initial value.

    class C {
        static int x = y;  // error: illegal forward reference
        static int y = 42;
    }

These ordering constraints are observed even for lazy static fields, as if they were not declared lazy. Thus, a lazy static field's initializer can only refer to a static field of the same class that occurs earlier in the same source file.

If in some case two lazy values must depend on each other in a circular relationship, the cycle can be hidden by the use of a private static method. In that case, a true cyclic dependency will cause a stack overflow error. In the case of non-lazy statics, an analogous cycle would cause a default value to become visible.

    class C {
        // lazy static final Object x = y, y = x;  // error
        lazy static final Object x = ycycle(), y = x;
        private static Object ycycle() { return y; }
    }
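For comparison, the non-lazy analogue of this cycle compiles and runs in today's Java, and shows the default value becoming visible. This is a sketch for illustration only; the class name EagerCycleDemo is hypothetical, and the fields mirror the example above with the lazy modifier removed.

```java
// Hypothetical demo: the same cycle with ordinary (eager) static finals.
final class EagerCycleDemo {
    // x is assigned first. ycycle() reads y before y has been assigned,
    // so it returns y's default value (null). y is then assigned from x.
    static final Object x = ycycle(), y = x;

    // The method call hides the forward reference from the compiler.
    private static Object ycycle() { return y; }

    public static void main(String[] args) {
        System.out.println("x = " + x + ", y = " + y); // prints "x = null, y = null"
    }
}
```

No error is reported here: both fields silently end up holding null, which is the behavior the lazy variant replaces with a stack overflow.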

Any non-lazy static field initializer or class initializer block may also refer to a lazy static field that precedes it in the source file. This is usually not desirable, as it would tend to cancel the benefit of the lazy field, but may be useful in combination with conditional expressions or control flow.

The purpose of the ordering rule is to require the user to specify a nominal initialization order for lazy statics. The actual dynamic initialization order may differ, but the nominal order serves to demonstrate statically that there are no unintentional cyclic dependencies between the statics, lazy and otherwise.

Lazy fields may be recognized by the core reflection API by use of two new API points on java.lang.reflect.Field. The new query method isLazy returns true if and only if the field was declared lazy. The new query method isAssigned returns false if and only if the field is lazy and has not been initialized, at the moment the method is called. (It may return true on the very next call in the same thread, depending on race conditions.) Other than isAssigned, there is no way to observe whether a lazy field has been initialized yet.

(The isAssigned reflective call is provided only to assist with occasional problems with circular initialization dependencies. Perhaps we can get away without implementing it, although people who code with lazy variables occasionally want to ask gently whether a lazy variable is set yet, in the same way that users of mutexes occasionally want to ask whether a mutex is locked, but without actually seizing the lock.)

To preserve implementation freedom, the contract of isAssigned is minimized. If a JVM can prove that a lazy static variable can be initialized without observable side effects, it may do so at any time; in such a case the isAssigned query will report true even before any getstatic is executed. The minimized contract for isAssigned is that if it returns false, none of the side effects from initializing that variable have yet been observed by the current thread, whereas if it returns true, then the current thread can, in the future, observe all side effects of initialization. This contract allows compilers to substitute ldc for getstatic of their own fields, and allows JVMs to avoid tracking detailed initialization states of finals with shared or degenerate constant pool entries.

Multiple threads may race to initialize a lazy final. As is already the case with CONSTANT_Dynamic constant pool entries, the JVM picks an arbitrary winner of such a race and provides the value from that winner to all racing threads, as well as recording it for all future accesses. Thus, JVM implementations may elect to use CAS operations, if the platform supports those, to resolve races.
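The race-resolution rule can be approximated at the library level with a compare-and-set on an atomic reference. The sketch below is only an analogy, not the JVM's mechanism; the class RacyLazyCell is hypothetical, and it assumes the initializer never returns null.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical library-level analogue of a lazily resolved value.
// Assumption: the initializer never returns null.
final class RacyLazyCell<T> {
    private final AtomicReference<T> slot = new AtomicReference<>();
    private final Supplier<T> initializer;

    RacyLazyCell(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    T get() {
        T seen = slot.get();
        if (seen != null) {
            return seen;
        }
        T candidate = initializer.get(); // may run in several racing threads
        // Only the first CAS succeeds; losers discard their candidate
        // and adopt the winner's value.
        return slot.compareAndSet(null, candidate) ? candidate : slot.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RacyLazyCell<Object> cell = new RacyLazyCell<>(Object::new);
        Object[] results = new Object[4];
        Thread[] threads = new Thread[results.length];
        for (int i = 0; i < threads.length; i++) {
            final int index = i;
            threads[i] = new Thread(() -> results[index] = cell.get());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (Object r : results) {
            System.out.println(r == results[0]); // every thread saw the same winner
        }
    }
}
```

As in the rule above, the initializer may execute more than once under contention, but exactly one result is recorded and handed to every caller.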

When the JVM stores a value into a lazy final field, it performs a freeze operation. This freeze happens before any getstatic instruction is allowed to see the field value. This is how pre-existing rules for safe publication apply to lazy finals.

The effect of a lazy final is closely similar to the effect of a static final defined on its own class, with no other static finals.

    class C {
        lazy static final Object x = xval(), y = yval();
    }
    f() { ... getstatic C.x ... }

    =>

    class C_x { static final Object x = xval(); }
    class C_y { static final Object y = yval(); }
    f() { ... getstatic C_x.x ... }

The difference is that a true cyclic dependency between lazy statics will cause a stack overflow, rather than the observation of a default value.
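The one-class-per-field translation can be written by hand in today's Java as a nested holder class. The sketch below applies it to the logger from the Motivation section; the names HolderDemo, LoggerHolder, logger, and loggerInits are hypothetical, and the counter exists only to make the deferred initialization visible.

```java
import java.util.logging.Logger;

// Hypothetical demo of the holder-class pattern (the C_x shape above).
final class HolderDemo {
    // Counts how often the logger initializer has run (demo only).
    static int loggerInits = 0;

    // One holder class per would-be lazy field. It is initialized on
    // first access to LOGGER, not when HolderDemo itself is initialized.
    private static final class LoggerHolder {
        static final Logger LOGGER = create();

        private static Logger create() {
            loggerInits++;
            return Logger.getLogger("com.foo.Bar");
        }
    }

    static Logger logger() {
        return LoggerHolder.LOGGER;
    }

    public static void main(String[] args) {
        System.out.println("inits before first use: " + loggerInits);
        logger();
        logger();
        System.out.println("inits after two uses: " + loggerInits);
    }
}
```

The cost of this workaround is one extra class per field and an accessor method in place of a field reference, which is what the proposed modifier is meant to avoid.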

Note that a class can convert a static to a lazy static without breaking binary compatibility. A client's getstatic instruction is identical in both cases. When the variable's declaration changes to lazy, then the getstatic instruction links differently.

Alternatives

Use nested classes as holders for single lazy variables.

Define some sort of library API for managing lazy values or (more generally) monotonic data.

Refactor would-be lazy static variables as nullary static methods and populate their bodies with ldc of CONSTANT_Dynamic constants, by some means.

Use non-final variables for publication of lazily evaluated data, being careful not to modify them, and to fence their initialization for safe publication.

(N.B. The above workarounds do not provide a binary-compatible way to evolve existing static constants away from their current reliance on <clinit>.)
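As one illustration of these workarounds, the non-final alternative is commonly written as double-checked locking over a volatile field. This is a sketch under stated assumptions, not a recommendation of the proposal; the class FencedLazyDemo, its placeholder value "computed", and the computations counter are all hypothetical.

```java
// Hypothetical demo: a non-final static published lazily and safely.
final class FencedLazyDemo {
    // Counts initializer runs (demo only; updated under the lock).
    static int computations = 0;

    // Non-final on purpose. The volatile modifier supplies the fence
    // needed for safe publication to other threads.
    private static volatile String value;

    static String value() {
        String local = value;          // single volatile read on the fast path
        if (local == null) {
            synchronized (FencedLazyDemo.class) {
                local = value;         // re-check under the lock
                if (local == null) {
                    computations++;
                    local = "computed"; // placeholder for a costly initializer
                    value = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(value() + " " + value());
        System.out.println("computations: " + computations);
    }
}
```

Clients must call an accessor method rather than read the field, so an existing public static final constant cannot be migrated to this pattern without recompiling its users.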

In the direction of adding more functionality, we could allow lazy fields to be non-static and/or non-final, preserving current correspondences and analogies between static and non-static field behaviors. The constant pool cannot be a backing store for non-static fields, but it can still contribute bootstrap methods (that depend on the current instance). Frozen arrays (if implemented) could be given lazy variations, perhaps. Such investigations seem plausible as follow-on projects for the current proposal.