by Ivan St. Ivanov

Last week in the Bulgarian JUG we had an event dedicated on trying out the OpenJDK Valhalla project’s achievements in the area of using primitives parameters of generics. Our colleague and dedicated blogger Mihail Stoynov already wrote about our workshop. You can find very useful links in his post about the slides that I showed, the VM that he prepared (which you can use to try Valhalla yourself) and even the recording of the meeting (which unfortunately for some of you is in Bulgarian).

In this three-part series of blogs I would like to go in some more details about the current implementation proposal and the reasoning behind the decisions. No, I am not an expert in this matter. I just try to follow the project Valhalla mailing list as well as I read Brian Goetz’s State of Specialization document. So look at this series more as explaining the generics proposals in layman’s terms.

Introduction

Java generics is one of its most widely commented topics. While the discussion whether they should be reified, i.e. the generic parameter information is not erased by javac, is arguably the hottest topic for years now, the lack of support for primitives as parameter types is something that at least causes some confusion. It leads to applying unnecessary boxing when for example you want to put an int into a List (read on to find out about the performance penalty). It also leads to adding “companion” classes in most of the generic APIs, like IntStream and LongStream for example.

One of the goals of OpenJDK’s project Valhalla is among others to research the possibility to generify over primitives in the language, the Virtual Machine and the standard libraries. Yes, it’s just research. Which means that the current proposal may or may not appear in the future versions of Java. What I am pretty sure is that it won’t make it to Java 9, which is about to be shipped next year.

The State of Generics

As already mentioned, it is not possible at the moment to define generic type or method parameterized with a primitive type. So if you want to create, let’s say, a List of integers, you will have to consider using the wrapper class:

List<Integer> intList = new ArrayList<>();

Besides looking kind of artificial, this brings also huge performance penalty. It comes from the way reference types are layed out in memory. If we take the array list above, internally it is represented as an array (of Integer’s). Which means that we will get an array with references to Integer objects scattered in the heap. Looking at a single Integer object we need memory not only for the int itself but for all other things needed by the virtual machine: pointer to the class object, some space for the garbage collector flags, others for the object monitor used by the Java synchronization infrastructure, etc. So instead of beautifully ordered ints, we potentially get something like this:

The memory overhead is not the only issue here. The modern processor architectures rely on several layers of cache. This makes going to the main memory for fetching the value of a certain variable a very expensive operation in terms of CPU cycles. That is why most of the optimizations done by the JVM tend to put as much data in the registers as possible. But the problem is that when talking about arrays (remember, ArrayList is represented as an array), the CPU instructions can only cache contiguous memory addresses. Thus if our Integers are scattered around the heap, most probably our VM will not be able to put them in the registers and we’ll have to pay the performance penalty of the cache misses.

Why not generics over primitives?

Normal question here would be why doesn’t Java support generifying over primitive types. The short answer is: because of generic type erasure by javac. Slightly longer answer follows in the next few paragraphs.

Let’s suppose that we have the following class definition:

public class Box<T> { private T value; public Box(T value) { this.value = value; } public T get() { return value; } }

The generic type T is only used by javac to ensure that correct types are boxed and then retrieved. If you decompile the product of the above class’s compilation, you will notice that the type of the value variable, the constructor parameter and the return type of the get() method are all java.lang.Object. Simply the compiler “erases” the information that you coded above and replaces it with the type that is the parent of all reference types. In Java there is no such thing as a common type of all types (both reference and primitive). Something like Any in Scala for example. That is why you cannot apply erasure to all types: with the current implementation javac doesn’t know to what they should be converted.

Why erasure at all?

Astute readers will ask the question: “But why we need this erasure anyway?” Before trying to answer it, let me first elaborate a bit on the compatibility topic.

Suppose that we have type A. And then we have a class C that uses or extends A. And class A is changed in some manner. We say that this change is source incompatible if the class C does not compile any more after this change (this is rather simplistic explanation, there are also a couple of other subtle causes of source incompatibilities, but let’s keep it simple). Some of you might remember when the enum keyword was added to the language in Java 5 – all the code that used that as identifier did not compile anymore. The same will be the fortune of all of you that use sun.misc.Unsafe BTW ;).

Next, suppose that we change somehow A and let’s say that both classes live in different jars. Class C may still compile after A’s change, however if you drop the hypothetical A.jar in the class path of our program, class C may refuse to link. This is considered as binary incompatibility. You may refer to the Java Language Specification for more information on that matter.

Going back to the generics story. If it was decided upon their introduction in Java 5 that the generic type is not going to be erased, then most likely it would break at least the binary compatibility of your classes. As generics were applied to the most widely used part of the API: the collection library, it would mean that virtually any meaningful Java program in this world would have to be at least recompiled on the day its users decided to upgrade to Java 5. The situation becomes even more complicated, because most of the libraries that we use in our program are developed and maintained by someone else. Which means that if one wanted to upgrade to Java 5, they would need to wait for all the external libraries to be recompiled with Java 5.

The bottom line is that non-erased generic type parameters would have brought a “flag day” when everybody should have recompiled and delivered a new version of their libraries. Which might be fine for smaller or more obedient language communities, but is not the case for Java.

Conclusion

So there are really compelling reasons why we are not able to generify our types and methods over primitive types. In the next installment of this series we’ll look at the current proposal in OpenJDK’s project Valhalla on how it can be implemented without breaking compatibility with older releases of Java.