Summary

Enable Java compilers to use novel code generation strategies (intrinsification) in order to improve the performance of certain Java SE methods.

Motivation

In modern JVM implementations, Just-In-Time (JIT) compilers do an excellent job of optimizing bytecode at run time. A considerable amount of bytecode is "clerical" in nature (shuffling data from the stack to the heap and back again) and can be optimized with techniques such as box elimination and method inlining. However, a JIT compiler can perform only limited analysis in a reasonable amount of time and space, so it may miss some optimization opportunities. Unfortunately, the way that method invocations in source code are compiled to bytecode tends to increase the chances of a miss.

For example, consider an invocation of the method String::format. The first argument is a format string such as "%s %d", followed by varargs of any type. A Java compiler generates bytecode that boxes primitive varargs, creates an array, initializes it, and invokes the method; the bytecode of the method's body reverses these steps to obtain the values to interpolate according to the format string. Unfortunately, the method's body is too large to inline, so the JIT compiler can eliminate neither the boxing and unboxing of primitive varargs nor the shuffling of varargs into an array and out again. Even more unfortunately, the format string is usually a constant expression, yet without inlining it is parsed every time the method's body runs.
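To make these steps concrete, the traditional translation can be written out at source level. This is only an illustration: javac performs the steps in bytecode rather than by rewriting source, and the class and method names below are hypothetical.

```java
// Illustrative only: the source-level equivalent of what javac emits for
//   String.format("%s %d", name, age)
final class VarargsDesugaring {

    static String traditional(String name, int age) {
        // 1. Box the primitive vararg (javac emits a call to Integer.valueOf).
        Integer boxedAge = Integer.valueOf(age);
        // 2. Allocate the varargs array and fill it.
        Object[] args = new Object[] { name, boxedAge };
        // 3. Invoke the method. Inside, Formatter parses "%s %d" again on
        //    every call and unboxes the Integer in order to render it.
        return String.format("%s %d", args);
    }

    public static void main(String[] unused) {
        System.out.println(traditional("John", 30));
    }
}
```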

String::format is important because it is a concise and reliable way to implement toString. However, some developers avoid it purely for performance reasons and instead use more verbose and error-prone mechanisms. By optimizing invocations of String::format, the most readable and maintainable way to implement toString also becomes the most performant way.

JEP 280 changed the translation of string concatenation to use invokedynamic, resulting in faster bytecode, less allocation churn, and more uniform optimizability. We can apply the same technique to String::format (and closely related methods such as java.util.Formatter::format) by compiling the invocation with an alternate translation strategy that customizes the bytecode for each specific invocation based on information available at compile time, such as the static types and values of the actual arguments.

Goals

Enable JDK developers to (i) tag methods as candidates for intrinsification by a Java compiler, and (ii) for those candidate methods, implement alternate translations of invocations that result in behavior which conforms to the specification of the method.

Non-Goals

It is not a goal to allow intrinsification of methods declared outside the core Java SE modules.

Description

Traditionally, a Java compiler translates a method invocation in source code to one of the bytecodes invokevirtual, invokeinterface, invokespecial, or invokestatic. This JEP allows the compiler to use an alternate translation when certain designated methods of the Java SE API are invoked. The use of an alternate translation is called intrinsification, and the invocation is said to be intrinsified.

For the compiler to intrinsify a specific invocation of a given method, all of the following have to happen:

1. The method opts in to intrinsification at its declaration site, as part of its specification.
2. The compiler identifies this invocation as intrinsifiable.
3. The compiler knows of an intrinsic processor for the method.
4. The intrinsic processor indicates an alternate translation strategy.
5. The compiler generates the bytecode corresponding to the indicated strategy.

Opting in to intrinsification

For a method of the Java SE API to opt in to intrinsification, it must be designated as an intrinsic candidate, via the annotation @IntrinsicCandidate . A compiler can thus recognize an invocation of such a method as intrinsifiable, and may (but is not required to) delegate the translation decision to an intrinsic processor.
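The declaration of the annotation is not shown here, so the following is a minimal sketch under stated assumptions. @Documented comes from the text; the @Target and @Retention choices, and the helper class Strings, are assumptions made for illustration. Because Strings lives outside java.base, a compiler following the restriction described next would ignore this designation; the sketch only shows what opting in looks like at the declaration site.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Sketch of the marker annotation. @Documented is stated in the text;
// @Target and @Retention are assumptions. RUNTIME is used only so this demo
// can read the annotation reflectively; a compiler reading class files would
// need no more than CLASS retention.
@Documented
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface IntrinsicCandidate {}

// Hypothetical static method opting in at its declaration site.
final class Strings {
    private Strings() {}

    @IntrinsicCandidate
    static String format(String fmt, Object... args) {
        return String.format(fmt, args);
    }
}

final class AnnotationDemo {
    // Reports whether Strings.format carries the marker annotation.
    static boolean formatIsCandidate() {
        try {
            Method m = Strings.class.getDeclaredMethod("format", String.class, Object[].class);
            return m.isAnnotationPresent(IntrinsicCandidate.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] unused) {
        System.out.println(formatIsCandidate()); // prints "true"
    }
}
```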

The space of methods that can opt in to intrinsification is deliberately restricted, out of an abundance of caution about the broad impact of generating novel bytecode. Only a method declared in a package exported by the java.base module may be designated as an intrinsic candidate, and only if it is either (i) an instance method of a final class, or (ii) a static method, so that the compiler can be sure of its behavior. The designation is ignored on any other method.

(It might seem that a final instance method in a non-final class is suitable, but the body of such a method may invoke non-final instance methods of the same class; those methods may be overridden in subclasses, so the behavior of the final instance method is not predictable enough for intrinsification. The behavior of a non-final method in a non-final class is even less predictable, which is why java.io.PrintStream::format is not proposed for intrinsification by this JEP despite its clear similarities with String::format.)
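The argument in the parenthetical can be demonstrated with two small classes (illustrative, not from the JDK). The final method greet has a single body, yet its result depends on the runtime class of the receiver, so a compiler cannot fix its behavior when translating a call site.

```java
// Illustrative classes showing why a final instance method in a
// non-final class is not predictable at compile time.
class Greeter {
    // greet itself cannot be overridden...
    final String greet(String name) {
        // ...but its body calls a method that subclasses can override.
        return salutation() + ", " + name;
    }

    String salutation() {
        return "Hello";
    }
}

class ShoutingGreeter extends Greeter {
    @Override
    String salutation() {
        return "HEY";
    }
}

final class FinalMethodDemo {
    public static void main(String[] unused) {
        Greeter plain = new Greeter();
        Greeter shouting = new ShoutingGreeter();
        // Same final method, different behavior depending on the runtime class.
        System.out.println(plain.greet("Ann"));    // prints "Hello, Ann"
        System.out.println(shouting.greet("Ann")); // prints "HEY, Ann"
    }
}
```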

The annotation type IntrinsicCandidate is part of the Java SE API, and is meta-annotated with @Documented to flag the significance of applying the annotation.
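The eligibility rule can be expressed as a reflection-based check. This is a sketch only: a compiler would apply the rule to symbols at compile time rather than through reflection, the helper names are hypothetical, and the check covers the placement rule, not the presence of the annotation.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper applying the eligibility rule to a reflected method.
final class IntrinsicEligibility {

    static boolean isEligible(Method m) {
        Class<?> owner = m.getDeclaringClass();
        Module module = owner.getModule();
        // Declared in a package that java.base exports (unnamed modules have a null name).
        boolean exportedByJavaBase = "java.base".equals(module.getName())
                && module.isExported(owner.getPackageName());
        // Either a static method, or an instance method of a final class.
        boolean predictable = Modifier.isStatic(m.getModifiers())
                || Modifier.isFinal(owner.getModifiers());
        return exportedByJavaBase && predictable;
    }

    // Demo lookup that turns the checked exception into an unchecked one.
    static Method method(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] unused) {
        System.out.println(isEligible(method(String.class, "format", String.class, Object[].class))); // true: static
        System.out.println(isEligible(method(String.class, "length")));                              // true: String is final
        System.out.println(isEligible(method(java.io.PrintStream.class, "format",
                String.class, Object[].class)));                                                     // false: PrintStream is not final
    }
}
```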

Intrinsic processors

A Java compiler may provide a mechanism for discovering intrinsic processors. An intrinsic processor specifies which method or methods it can process; if the compiler knows of no intrinsic processor for a given method, invocations of that method are not intrinsified. For predictability, all intrinsic processors are disabled by default and may be enabled with the javac command-line option -XDintrinsify=all. If no intrinsic processor indicates an alternate translation, or if the compiler decides to ignore such an indication, the compiler must generate bytecode according to JLS 15.12.3.
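The chain of conditions listed earlier, together with the defaults in this paragraph, can be modeled in a few lines. Every type below is hypothetical and does not correspond to javac internals. The enabled parameter stands in for -XDintrinsify=all, and the final bytecode-generation step is reduced to a label.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the compiler-side decision; not javac code.
final class IntrinsicDecision {

    /** What the compiler knows about one call site (simplified). */
    record Invocation(String owner, String name,
                      boolean annotatedCandidate, boolean eligible,
                      Object constantFirstArg) {}

    /** A translation, reduced to a descriptive label. */
    record Translation(String description) {}

    /** A processor names the methods it handles and may decline any call site. */
    interface IntrinsicProcessor {
        boolean handles(String owner, String name);
        Optional<Translation> translate(Invocation call);
    }

    static final Translation STANDARD = new Translation("standard invoke* bytecode");

    /** Example processor: acts only when the format string is a compile-time constant. */
    static final IntrinsicProcessor FORMAT = new IntrinsicProcessor() {
        @Override
        public boolean handles(String owner, String name) {
            return owner.equals("java/lang/String") && name.equals("format");
        }

        @Override
        public Optional<Translation> translate(Invocation call) {
            return call.constantFirstArg() instanceof String
                    ? Optional.of(new Translation("invokedynamic"))
                    : Optional.empty();
        }
    };

    static Translation choose(Invocation call, List<IntrinsicProcessor> processors, boolean enabled) {
        // Processors are off by default; ineligible or unmarked methods never qualify.
        if (!enabled || !call.annotatedCandidate() || !call.eligible()) {
            return STANDARD;
        }
        for (IntrinsicProcessor p : processors) {
            if (p.handles(call.owner(), call.name())) {
                // The processor may decline; the compiler then falls back.
                return p.translate(call).orElse(STANDARD);
            }
        }
        return STANDARD; // no known processor for this method
    }
}
```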

Generation of alternate bytecode

An intrinsic processor may indicate an alternate translation for a specific invocation of a given method: for example, replacing the invocation with an invokedynamic that uses a given bootstrap method, with a call to another method, or with a constant load. The compiler may then generate precise bytecode for that translation rather than the traditional bytecode.
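As a toy illustration, a processor for String.format might inspect a constant format string together with the static types of the remaining arguments and choose between a constant load, a JEP 280-style concatenation recipe, or declining. Everything here is hypothetical. The recipe syntax (U+0001 marks one dynamic argument) follows StringConcatFactory.makeConcatWithConstants. %s is accepted only for arguments of static type String, which cannot be Formattable. Like the example translation later in this JEP, mapping %d to plain concatenation ignores Formatter's locale-dependent digits.

```java
import java.util.List;
import java.util.Optional;

// Toy intrinsic processor for String.format (hypothetical; not javac code).
final class FormatTranslationPicker {

    interface Translation {}

    /** Replace the call with a constant load of this value. */
    record ConstantLoad(String value) implements Translation {}

    /** Replace the call with invokedynamic using this concatenation recipe. */
    record ConcatRecipe(String recipe) implements Translation {}

    static Optional<Translation> pick(String fmt, List<Class<?>> argTypes) {
        // U+0001 and U+0002 are tags in a recipe; declining keeps the toy simple.
        if (fmt.indexOf('\u0001') >= 0 || fmt.indexOf('\u0002') >= 0) {
            return Optional.empty();
        }
        if (fmt.indexOf('%') < 0) {
            // No conversions and no arguments: the result is the format string itself.
            return argTypes.isEmpty() ? Optional.of(new ConstantLoad(fmt)) : Optional.empty();
        }
        StringBuilder recipe = new StringBuilder();
        int next = 0; // index of the next argument to consume
        for (int i = 0; i < fmt.length(); i++) {
            char c = fmt.charAt(i);
            if (c != '%') {
                recipe.append(c); // literal text is copied into the recipe
                continue;
            }
            if (i + 1 == fmt.length() || next == argTypes.size()) {
                return Optional.empty(); // dangling '%' or missing argument
            }
            char conversion = fmt.charAt(++i);
            Class<?> type = argTypes.get(next++);
            boolean simple = (conversion == 's' && type == String.class)
                    || (conversion == 'd' && (type == int.class || type == long.class));
            if (!simple) {
                return Optional.empty(); // flags, widths, other conversions or types
            }
            recipe.append('\u0001'); // one dynamic argument slot
        }
        return next == argTypes.size()
                ? Optional.of(new ConcatRecipe(recipe.toString()))
                : Optional.empty();
    }
}
```

Declining is always safe, because the compiler then falls back to the standard translation.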

Example

Let's analyze the benefits of intrinsifying String::format to avoid the boxing overhead, the varargs overhead, and the repeated parsing of a constant format string (the first argument). Consider the following invocation:

    String name = ...
    int age = ...
    String s = String.format("%s: %d", name, age);

Traditionally, this results in boxing age to an Integer, allocating a varargs array, storing name and the boxed age into that array, and then parsing and interpreting the format string, all on every invocation. The bytecode is lengthy (in this listing, name is "John" and age is 30):

     0: ldc           #2   // String John
     2: astore_1
     3: bipush        30
     5: istore_2
     6: ldc           #3   // String %s: %d
     8: iconst_2
     9: anewarray     #4   // class java/lang/Object
    12: dup
    13: iconst_0
    14: aload_1
    15: aastore
    16: dup
    17: iconst_1
    18: iload_2
    19: invokestatic  #5   // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
    22: aastore
    23: invokestatic  #6   // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
    26: astore_3
    27: return

When the format string is constant, as it almost always is, an intrinsic processor can select an alternate translation. Note that neither name nor age needs to be a constant variable:

    String s = name + ": " + Integer.toString(age);

Given this translation, the compiler can optimize it to an invokedynamic using the mechanics of JEP 280, resulting in the following bytecode:

     0: ldc           #2   // String John
     2: astore_1
     3: bipush        30
     5: istore_2
     6: aload_1
     7: iload_2
     8: invokedynamic #3, 0   // InvokeDynamic #0:format:(Ljava/lang/String;I)Ljava/lang/String;
    13: astore_3
    14: return

As well as the evident simplification, this bytecode runs between 30 and 50 times faster than traditional bytecode.
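The bootstrap method behind #0:format is not named in the listing. Assuming it is StringConcatFactory.makeConcatWithConstants, the bootstrap introduced by JEP 280, the same kind of call site can be linked by hand with java.lang.invoke, as sketched below. The class name is hypothetical. The recipe "\u0001: \u0001" has one dynamic slot for name and one for age, and age is passed as an int without being boxed into a varargs array.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hand-linked equivalent of the invokedynamic call site, assuming the
// JEP 280 bootstrap StringConcatFactory.makeConcatWithConstants.
final class IndyFormatDemo {

    private static final MethodHandle FORMAT = link();

    private static MethodHandle link() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "format", // call-site name; not used by this bootstrap
                    MethodType.methodType(String.class, String.class, int.class),
                    "\u0001: \u0001"); // two dynamic arguments around a constant ": "
            return site.dynamicInvoker();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String format(String name, int age) {
        try {
            // invokeExact: the static types (String, int) -> String must match exactly.
            return (String) FORMAT.invokeExact(name, age);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] unused) {
        System.out.println(format("John", 30)); // prints "John: 30"
    }
}
```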

Risks and Assumptions

If not implemented properly, an alternate translation might not be fully compatible, in behavior, with the method's specification or its original implementation.

Even if properly implemented, an alternate implementation may not properly track changes made to the original implementation in the future.

Even if properly implemented and tracked, the maintenance of intrinsic candidate methods and their alternate translations is made more difficult, since changes may need to be made in two places and must be behaviorally identical.

There is no guarantee that the performance of an alternate implementation will be superior, for every execution of every program on every machine, to the performance that would have been achieved by the original implementation.
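For the String.format example, the compatibility risk can be probed with a differential check that runs the original call and the alternate translation on the same inputs. Such a check surfaces one subtlety. java.util.Formatter localizes the digits produced by %d using the locale's zero digit, whereas Integer.toString always produces ASCII digits. The sketch therefore pins Locale.ROOT; under a default locale with non-ASCII digits, the two translations could differ. The class name and the input values are illustrative.

```java
import java.util.Locale;

// Illustrative differential check: original call versus alternate translation.
final class TranslationCheck {

    static String original(String name, int age) {
        // Locale pinned: Formatter localizes %d digits, Integer.toString does not.
        return String.format(Locale.ROOT, "%s: %d", name, age);
    }

    static String alternate(String name, int age) {
        return name + ": " + Integer.toString(age);
    }

    // Returns true if both translations agree on every sample input.
    static boolean allAgree() {
        String[] names = { "John", "", null, "100%" };
        int[] ages = { 30, 0, -7, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (String n : names) {
            for (int a : ages) {
                if (!original(n, a).equals(alternate(n, a))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] unused) {
        System.out.println(allAgree() ? "all cases agree" : "mismatch found");
    }
}
```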