Key Takeaways

Project Valhalla is developing inline classes to improve affinity of Java programs to modern hardware

Inline classes enable developers to write types that behave more like Java's inbuilt primitive types

Instances of inline classes do not have object identity, which opens up a number of optimization opportunities

The arrival of inline classes reopens the debate around Java's generics and type erasure

Although promising, this is still a work in progress and not production ready yet

In this article, I'll introduce inline classes. This feature is the evolution of what were previously referred to as "value types." The exploration and research of this feature is still ongoing and is a major work stream within Project Valhalla, which has already been covered by InfoQ and in Oracle's Java magazine.

Why Inline Classes?

The goal of inline classes is to improve the affinity of Java programs to modern hardware. This is to be achieved by revisiting a very fundamental part of the Java platform — the model of Java's data values.

From the very first versions of Java until the present day, Java has had only two types of values: primitive types and object references. This model is extremely simple and easy for developers to understand, but can have performance trade-offs. For example, dealing with arrays of objects involves unavoidable indirections and this can result in processor cache misses.

Many programmers who care about performance would like the ability to work with data that utilizes memory more efficiently. Better layout means fewer indirections, which means fewer cache misses and higher performance.
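The layout argument above can be sketched in ordinary Java, without any Valhalla features. This is only an illustration under stated assumptions: the class and method names (LayoutSketch, Point, sumViaReferences, sumFlattened) are hypothetical, and the "flattened" version is packed by hand into an int[] to mimic what a runtime might do automatically for an inline class.

```java
// Ordinary Java (no Valhalla features). All names here are hypothetical.
final class LayoutSketch {

    // A conventional class: each instance is a separate heap object with its own header.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Array of references: reading each element follows a pointer to a Point object.
    static long sumViaReferences(Point[] points) {
        long total = 0;
        for (Point p : points) {
            total += p.x + p.y;
        }
        return total;
    }

    // Hand-flattened layout: x and y are stored side by side in a single int[].
    static long sumFlattened(int[] xy) {
        long total = 0;
        for (int i = 0; i < xy.length; i += 2) {
            total += xy[i] + xy[i + 1];
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000;
        Point[] points = new Point[n];
        int[] xy = new int[2 * n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(i, 2 * i);
            xy[2 * i] = i;
            xy[2 * i + 1] = 2 * i;
        }
        // Both layouts hold the same data, so the sums agree.
        System.out.println(sumViaReferences(points) == sumFlattened(xy)); // prints "true"
    }
}
```

Both methods compute the same result; the difference is that the second walks one contiguous block of memory, whereas the first may touch a separate heap location per element.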

Another major area of interest is the idea of removing the overhead of needing a full object header for each data composite — flattening the data.

As it stands, each object in Java's heap has a metadata header as well as the actual field content. In HotSpot, this header is essentially two machine words: mark and klass. The first is the mark word, which contains metadata that is specific to this particular object instance.

The second word of metadata is known as the klass word, which is a pointer to metadata (stored in the Metaspace area of memory) that is shared with all other instances of the same class. This klass pointer is crucial to understanding how the runtime implements certain language features, such as virtual method lookup.

However, for this discussion of inline classes, the data held in the mark word is especially important, as it is inherently tied to the concept of identity of Java objects.

Inline Classes and Object Identity

Recall that in Java, two object instances are not considered equal just because they have the same values for all their fields. Java uses the == operator to determine whether two references are pointing at the same memory location, and objects are not considered identical if they are stored separately in memory.

NOTE: This notion of identity is linked with the ability to lock a Java object. In fact the mark word is used to store the object monitor (among other things).
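As a reminder of how identity behaves for regular classes today, here is a minimal sketch in standard Java. The names IdentitySketch and Pair are hypothetical and exist only for this illustration.

```java
import java.util.Objects;

// Standard Java: two objects with identical field values are still distinct identities.
final class IdentitySketch {

    static final class Pair {
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }

        // Field-by-field equality has to be written explicitly for a regular class.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Pair)) return false;
            Pair that = (Pair) other;
            return a == that.a && b == that.b;
        }

        @Override
        public int hashCode() {
            return Objects.hash(a, b);
        }
    }

    public static void main(String[] args) {
        Pair p = new Pair(1, 2);
        Pair q = new Pair(1, 2);
        System.out.println(p == q);      // prints "false": different objects in memory
        System.out.println(p.equals(q)); // prints "true": same field values
    }
}
```

For an inline class, the intent described in this article is that only the second kind of comparison is meaningful, because there is no separate identity to compare.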

For inline classes, however, we want the composites to have semantics that are essentially those of primitive types. In that case the only thing that matters for equality is the bit pattern of the data, not where in memory that pattern appears.

Therefore, by removing the object header, we also remove the composite's unique identity. This change frees the runtime to make significant optimizations in layout, calling convention, compilation, and allocation.

NOTE: The removal also has other implications for the design of inline classes. For example they cannot be synchronized upon (because they have neither a unique identity nor anywhere to store the monitor).

It is important to realize that Valhalla is a project that goes all the way down through the language and VM and eventually reaches the metal. This means that it might look like just one new construct (inline class) to the programmer, but the feature depends on many layers underneath.

NOTE: Inline classes are not the same as the forthcoming records feature. A Java record is just a regular class that is declared with reduced boilerplate and has some standardized, compiler generated methods. Inline classes, on the other hand, are a fundamentally new concept within the JVM, and change Java's model of memory in fundamental ways.

The current prototype of inline classes (referred to as LW2) is functional, but it is still at a very, very early stage. Its target audience is advanced developers, library authors, and toolmakers.

Working with the LW2 Prototype

Let's dive into some examples of what can be done with inline classes in their current state in LW2. I will be able to show the effects of inline classes using low-level techniques (such as bytecode and heap histograms). Future prototypes will add more user-visible and higher-level aspects, but they haven't been completed yet, so I will have to stick to the low-level.

To obtain a build of OpenJDK that supports LW2, the easiest option is to download a prebuilt binary; Linux, Windows and Mac builds are available. Alternatively, experienced open-source developers can build their own binary from scratch.

Once the prototype is downloaded and installed, we can develop some inline classes using it.

To make an inline class in LW2, a class declaration is tagged with the inline keyword.

The rules for inline classes (for now — some of these may be relaxed or changed in future prototypes) are:

Interfaces, annotation types, enums cannot be inline classes

Top level, inner, nested, local classes may be inline classes

Inline classes are not nullable and instead have a default value

Inline classes may declare inner, nested, local types

Inline classes are implicitly final so cannot be abstract

Inline classes implicitly extend java.lang.Object (like enums, annotations, and interfaces)

Inline classes may explicitly implement regular interfaces

All instance fields of an inline class are implicitly final

Inline classes may not declare instance fields of their own type

javac automatically generates hashCode(), equals(), and toString()

javac does not allow clone(), finalize(), wait(), or notify() on inline classes
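The rule that inline classes are not nullable and instead have a default value resembles how primitives already behave in Java. The following sketch uses only standard Java to show that analogy; it does not use inline classes, and the names DefaultValueSketch, allZero and allNull are hypothetical.

```java
// Standard Java analogy: primitive array elements start at a default value,
// whereas reference array elements start as null.
final class DefaultValueSketch {

    static boolean allZero(int[] values) {
        for (int v : values) {
            if (v != 0) return false;
        }
        return true;
    }

    static boolean allNull(Object[] values) {
        for (Object o : values) {
            if (o != null) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] primitives = new int[3];         // elements default to 0, never null
        Integer[] references = new Integer[3]; // elements default to null

        System.out.println(allZero(primitives)); // prints "true"
        System.out.println(allNull(references)); // prints "true"
    }
}
```

Under the rules listed above, a freshly created array of an inline type would be expected to behave like the first array rather than the second.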

Let's look at our first example of an inline class, and see what an implementation of a type like Optional would look like as an inline class. To reduce indirection and for clarity of demonstration, we are going to write a version of an optional type that holds a primitive value, similar to the type java.util.OptionalInt in the standard JDK class library:

package infoq;

import java.util.NoSuchElementException;
import java.util.function.IntConsumer;

public inline class OptionalInt {
    private boolean isPresent;
    private int v;

    private OptionalInt(int val) {
        v = val;
        isPresent = true;
    }

    public static OptionalInt empty() {
        // New semantics for inline classes
        return OptionalInt.default;
    }

    public static OptionalInt of(int val) {
        return new OptionalInt(val);
    }

    public int getAsInt() {
        if (!isPresent)
            throw new NoSuchElementException("No value present");
        return v;
    }

    public boolean isPresent() {
        return isPresent;
    }

    public void ifPresent(IntConsumer consumer) {
        if (isPresent)
            consumer.accept(v);
    }

    public int orElse(int other) {
        return isPresent ? v : other;
    }

    @Override
    public String toString() {
        return isPresent
                ? String.format("OptionalInt[%s]", v)
                : "OptionalInt.empty";
    }
}

This should compile using the current LW2 version of javac. To see the effects of the new inline classes technology, we need to look at bytecode, using the javap tool that can be invoked like this:

$ javap -c -p infoq/OptionalInt.class

Disassembling our OptionalInt type, we see some interesting aspects of the inline class in the bytecode:

public final value class infoq.OptionalInt {
  private final boolean isPresent;
  private final int v;

The class has a new modifier value that is left over from an earlier prototype where the feature was still called value types. The class and all instance fields have been made final even though that wasn't specified in the source code. Next, let's look at the object construction methods:

public static infoq.OptionalInt empty();
  Code:
     0: defaultvalue  #1   // class infoq/OptionalInt
     3: areturn

public static infoq.OptionalInt of(int);
  Code:
     0: iload_0
     1: invokestatic  #11  // Method "<init>":(I)Qinfoq/OptionalInt;
     4: areturn

private static infoq.OptionalInt infoq.OptionalInt(int);
  Code:
     0: defaultvalue  #1   // class infoq/OptionalInt
     3: astore_1
     4: iload_0
     5: aload_1
     6: swap
     7: withfield     #3   // Field v:I
    10: astore_1
    11: iconst_1
    12: aload_1
    13: swap
    14: withfield     #7   // Field isPresent:Z
    17: astore_1
    18: aload_1
    19: areturn

For a regular class, we would expect to see a compiled construction sequence like this simple factory method:

// Regular object class
public static infoq.OptionalInt of(int);
  Code:
     0: new           #5   // class infoq/OptionalInt
     3: dup
     4: iload_0
     5: invokespecial #6   // Method "<init>":(I)V
     8: areturn

The difference in the two bytecode sequences is clear: inline classes do not use the new opcode. Instead, we encounter two brand new bytecodes that are specific to inline classes: defaultvalue and withfield.

defaultvalue is used to create new value instances

withfield is used instead of setfield

NOTE: One of the consequences of this design is that the result of defaultvalue must, for every inline class, be a consistent and usable value of the type.

It's worth noticing that the semantics of withfield is to replace the value instance on top of stack with a modified value with an updated field. This is slightly different from setfield (which consumes the object reference on the stack) because inline classes are always immutable and are not necessarily always represented as references.
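To make the defaultvalue and withfield sequence more concrete, here is a rough emulation in ordinary Java. It is only an analogy: the names WithFieldSketch, OptInt, DEFAULT, withV and withIsPresent are hypothetical, and unlike a real inline class this version still allocates regular objects with identity.

```java
// Ordinary Java emulation of the construction pattern seen in the bytecode.
// All names are hypothetical; real inline classes need no such boilerplate.
final class WithFieldSketch {

    static final class OptInt {
        final boolean isPresent;
        final int v;

        private OptInt(boolean isPresent, int v) {
            this.isPresent = isPresent;
            this.v = v;
        }

        // Stand-in for defaultvalue: every field holds its zero value.
        static final OptInt DEFAULT = new OptInt(false, 0);

        // Stand-ins for withfield: return a new value with one field changed,
        // leaving the receiver untouched.
        OptInt withV(int newV) {
            return new OptInt(isPresent, newV);
        }

        OptInt withIsPresent(boolean newIsPresent) {
            return new OptInt(newIsPresent, v);
        }

        // Mirrors the compiled constructor: start from the default, then set v, then isPresent.
        static OptInt of(int val) {
            return DEFAULT.withV(val).withIsPresent(true);
        }
    }

    public static void main(String[] args) {
        OptInt some = OptInt.of(42);
        System.out.println(some.isPresent + " " + some.v);                     // prints "true 42"
        System.out.println(OptInt.DEFAULT.isPresent + " " + OptInt.DEFAULT.v); // prints "false 0"
    }
}
```

Note that the default value here is exactly the empty optional, which is what makes it a usable value of the type.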

To complete our first look at the bytecode, we notice that the other methods of the class include auto-generated implementations of hashCode() and equals(), which use invokedynamic as a mechanism.

public final int hashCode();
  Code:
     0: aload_0
     1: invokedynamic #46,  0  // InvokeDynamic #0:hashCode:(Qinfoq/OptionalInt;)I
     6: ireturn

public final boolean equals(java.lang.Object);
  Code:
     0: aload_0
     1: aload_1
     2: invokedynamic #50,  0  // InvokeDynamic #0:equals:(Qinfoq/OptionalInt;Ljava/lang/Object;)Z
     7: ireturn

In our case, we have explicitly provided an override of toString(), but this method would also usually be auto-generated for inline classes.

public java.lang.String toString();
  Code:
     0: aload_0
     1: getfield      #7   // Field isPresent:Z
     4: ifeq          29
     7: ldc           #28  // String OptionalInt[%s]
     9: iconst_1
    10: anewarray     #30  // class java/lang/Object
    13: dup
    14: iconst_0
    15: aload_0
    16: getfield      #3   // Field v:I
    19: invokestatic  #32  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
    22: aastore
    23: invokestatic  #38  // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
    26: goto          31
    29: ldc           #44  // String OptionalInt.empty
    31: areturn

To drive our inline class, let's look at a small driver program contained in Main.java:

public static void main(String[] args) {
    int MAX = 100_000_000;
    OptionalInt[] opts = new OptionalInt[MAX];
    // Fill even slots with present values and odd slots with the empty value
    for (int i = 0; i < MAX; i++) {
        opts[i] = OptionalInt.of(i);
        opts[++i] = OptionalInt.empty();
    }
    long total = 0;
    for (int i = 0; i < MAX; i++) {
        OptionalInt oi = opts[i];
        total += oi.orElse(0);
    }
    // Keep the process alive long enough to take a heap histogram
    try {
        Thread.sleep(60_000);
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Total: " + total);
}

The bytecode for Main is not shown as it contains no surprises. In fact, it is the same (apart from package names) as the code that would be generated if Main used java.util.OptionalInt instead of our inline class version.

This is, of course, part of the point — to make inline classes minimally intrusive to mainstream Java programmers and provide their benefits without too much cognitive overhead.

Heap Behaviour for inline classes

Having noted the features of the compiled value class's bytecode, we can now execute Main and take a quick look at runtime behavior, starting with the contents of the heap.

$ java infoq.Main

Note that the thread delay at the end of the program is only there to allow us to have time to produce a heap histogram from the process.

We do this by running another tool in a separate window: jmap -histo:live <pid>, which produces results like this:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:             1      800000016  [Qinfoq.OptionalInt;
   2:          1687          97048  [B (java.base@14-internal)
   3:           543          70448  java.lang.Class (java.base@14-internal)
   4:          1619          51808  java.util.HashMap$Node (java.base@14-internal)
   5:           452          44600  [Ljava.lang.Object; (java.base@14-internal)
   6:          1603          38472  java.lang.String (java.base@14-internal)
   7:             9          33632  [C (java.base@14-internal)

This shows that we have allocated one single array of infoq.OptionalInt values, and that it occupies roughly 800M (100 million elements each of size 8).

As expected, there are no standalone instances of our inline class.

NOTE: Readers who are familiar with the internal syntax for Java type descriptors may note the appearance of a new, Q-type descriptor to denote a value of an inline class.

To have something to compare this to, let's recompile Main using the version of OptionalInt from java.util instead of our inline class version. Now the histogram looks completely different (output from Java 8):

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:      50000001     1200000024  java.util.OptionalInt
   2:             1      400000016  [Ljava.util.OptionalInt;
   3:          1719          98600  [B
   4:           540          65400  java.lang.Class
   5:          1634          52288  java.util.HashMap$Node
   6:           446          42840  [Ljava.lang.Object;
   7:          1636          39264  java.lang.String

We now have a single array comprising 100 million elements of size 4, which are references to the object type java.util.OptionalInt. We also have 50 million instances of OptionalInt, plus one for the empty value instance, giving a total memory utilization for the non-inline class case of around 1.6G.

This means that the use of inline classes reduces memory overhead by about 50%, in this extreme case. This is a good example of what is meant by the phrase "codes like a class, works like an int."
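The arithmetic behind the two histograms can be checked with a short calculation. In this sketch the 16-byte array header and the 24-byte size per java.util.OptionalInt instance are inferred from the byte counts reported by jmap above, not measured independently, and the class and method names are hypothetical.

```java
// Worked arithmetic for the two heap histograms shown in the article.
// Header and object sizes are inferred from the jmap figures, not measured here.
final class HeapArithmeticSketch {

    static final long ELEMENTS = 100_000_000L;
    static final long ARRAY_HEADER_BYTES = 16; // inferred: 800000016 - 100M * 8
    static final long OBJECT_BYTES = 24;       // inferred: 1200000024 / 50000001

    // Inline case: one flattened array whose elements hold the 8-byte values directly.
    static long inlineCaseBytes() {
        return ARRAY_HEADER_BYTES + ELEMENTS * 8;
    }

    // Reference case: one array of 4-byte references plus the objects they point to.
    static long referenceCaseBytes() {
        long arrayBytes = ARRAY_HEADER_BYTES + ELEMENTS * 4;
        long instances = ELEMENTS / 2 + 1; // 50 million present values plus one shared empty instance
        return arrayBytes + instances * OBJECT_BYTES;
    }

    public static void main(String[] args) {
        long inline = inlineCaseBytes();
        long reference = referenceCaseBytes();
        System.out.println("inline:    " + inline);    // prints "inline:    800000016"
        System.out.println("reference: " + reference); // prints "reference: 1600000040"
        // Integer percentage of memory saved by the inline layout
        System.out.println("saved %:   " + (100 * (reference - inline) / reference)); // prints "saved %:   50"
    }
}
```

The totals match the first two rows of each histogram, which is where the figure of about 50% comes from.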

Benchmarking with JMH

Let's also take a look at a simple JMH benchmark. This is intended to allow us to see the effect of removing the indirections and cache misses, in terms of reduced program run time.

Details of how to set up and run a JMH benchmark can be found on the OpenJDK site.

Our benchmark will directly compare our inline implementation of OptionalInt with the version found in the JDK.

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class MyBenchmark {

    @Benchmark
    public long timeInlineOptionalInt() {
        int MAX = 100_000_000;
        infoq.OptionalInt[] opts = new infoq.OptionalInt[MAX];
        for (int i = 0; i < MAX; i++) {
            opts[i] = infoq.OptionalInt.of(i);
            opts[++i] = infoq.OptionalInt.empty();
        }
        long total = 0;
        for (int i = 0; i < MAX; i++) {
            infoq.OptionalInt oi = opts[i];
            total += oi.orElse(0);
        }
        return total;
    }

    @Benchmark
    public long timeJavaUtilOptionalInt() {
        int MAX = 100_000_000;
        java.util.OptionalInt[] opts = new java.util.OptionalInt[MAX];
        for (int i = 0; i < MAX; i++) {
            opts[i] = java.util.OptionalInt.of(i);
            opts[++i] = java.util.OptionalInt.empty();
        }
        long total = 0;
        for (int i = 0; i < MAX; i++) {
            java.util.OptionalInt oi = opts[i];
            total += oi.orElse(0);
        }
        return total;
    }
}

Performing a single run on a modern, high-spec MacBook Pro gave this result:

Benchmark                             Mode  Cnt  Score   Error  Units
MyBenchmark.timeInlineOptionalInt    thrpt   25  5.155 ± 0.057  ops/s
MyBenchmark.timeJavaUtilOptionalInt  thrpt   25  0.589 ± 0.029  ops/s

This shows that the inline class version achieves nearly nine times the throughput of the java.util version in this specific case (5.155 versus 0.589 ops/s). However, it is important not to read too much into this example; it is merely for demonstration purposes.

As the JMH framework itself warns: "Do not assume the numbers tell you what you want them to tell."

For example, in this case the infoq.OptionalInt version of the benchmark allocates roughly 50% less memory. Is it this reduction in allocation that accounts for the performance speedup? Or are there other performance effects as well? This benchmark, in isolation, does not tell us; it is simply a single data point.

This rough benchmark should not be taken seriously or used as anything other than an indication that inline classes have the potential to show significant speedups under some carefully chosen circumstances.

For example, in the LW2 prototype, only interpreted mode and the C2 (server) JIT compiler are supported. There is no C1 (client) compiler, no tiered compilation, and no Graal. In addition, the interpreter is not optimized, as the focus has been on the JIT implementation. All of these features would be expected to be present in a shipping version of Java, and in their absence all performance numbers are completely unreliable.

In fact, it's not just performance where so much work still remains to be done, compared to the current LW2 preview. Fundamental questions still remain, such as:

How to extend generics to allow abstraction over all types, including primitives, values, and even void?

What should the true inheritance hierarchy look like for inline classes?

What to do about type erasure and backwards compatibility?

How to enable existing libraries (especially the JDK) to compatibly evolve to fully take advantage of inline classes?

How many of the current LW2 constraints can, or should, be relaxed?

While most of these are still open questions, one area where LW2 has tried to provide answers is by prototyping a mechanism for inline classes to be used as the type parameter (the "payload") in a generic type.

Inline classes as type parameters

In the current LW2 prototype we must overcome a problem, as Java's model of generics implicitly assumes nullability of values, and inline classes are not nullable.

To solve this, LW2 uses a technique called indirect projection. This is like a form of autoboxing for inline classes, and allows us to write a type Foo? for any inline type Foo.

The end result is that the indirect projection type can be used as the parameter in a generic type (whereas the real inline type cannot) like this:

public static void main(String[] args) {
    List<OptionalInt?> opts = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
        opts.add(OptionalInt.of(i));
        opts.add(OptionalInt.empty());
        opts.add(null);
    }
    int total = opts.stream()
                    .mapToInt(o -> {
                        if (o == null) return 0;
                        OptionalInt op = (OptionalInt) o;
                        return op.orElse(0);
                    })
                    .reduce(0, (x, y) -> x + y);
    System.out.println("Total: " + total);
}

Instances of the inline class can always be cast to an instance of the indirect projection, but to go the other way, a null check is required, as seen in the body of the lambda in the example.
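Since indirect projection is described as being like autoboxing, the same null-check pattern can be illustrated in standard Java with int and Integer. This is an analogy only, not the LW2 mechanism itself, and the names NullableProjectionSketch and sumSkippingNulls are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Standard Java analogy: Integer plays the role of the nullable projection,
// and int plays the role of the non-nullable inline type.
final class NullableProjectionSketch {

    static int sumSkippingNulls(List<Integer> values) {
        return values.stream()
                .mapToInt(boxed -> {
                    // Going from the nullable form to the non-nullable form needs a null check.
                    if (boxed == null) return 0;
                    int unboxed = boxed;
                    return unboxed;
                })
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            values.add(i);    // int to Integer always succeeds
            values.add(null); // only the boxed form can hold null
        }
        System.out.println("Total: " + sumSkippingNulls(values)); // prints "Total: 10"
    }
}
```

Omitting the null check here would throw a NullPointerException on unboxing, which is the same hazard the lambda in the LW2 example guards against.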

NOTE: The use of indirect projections is still highly experimental. The final version of inline classes may well use a different design altogether.

There is still a huge amount of work to be done before inline classes are ready to become a real feature in the Java language. Prototypes like LW2 are fun for the interested developer to experiment with, but it should always be remembered that these are just an intellectual exercise. Nothing in the current builds guarantees anything about the final form that the feature may eventually take.

About the Author

Ben Evans is a co-founder of jClarity, a JVM performance optimization company. He is an organizer for the LJC (London's JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. Ben is a Java Champion; 3-time JavaOne Rockstar Speaker; author of "The Well-Grounded Java Developer", the new edition of "Java in a Nutshell" and "Optimizing Java". He is a regular speaker on the Java platform, performance, architecture, concurrency, startups and related topics. Ben is sometimes available for speaking, teaching, writing and consultancy engagements; please contact him for details.