158 Polymorphism Performance Mysteries Explained

Author: Dr. Heinz M. Kabutz Date: 2008-04-01 Java Version: 6 Category: Performance

Abstract: In this newsletter, we reveal some of the polymorphism mysteries in the JDK. The HotSpot Server Compiler can distinguish between mono-morphism, bi-morphism and poly-morphism. The bi-morphism is a special case which executes faster than poly-morphism. Mono-morphism can be inlined by the compiler in certain circumstances, thus not costing anything at all.

A warm welcome to The Java(tm) Specialists' Newsletter, sent to you from the Island of Crete in the Mediterranean. Last week I was in Estonia, a delightful little country just south of Finland. The total population is about 1.2 million people. For those Dilbert lovers out there, Estonia is not Elbonia ... ;-) Therefore, do not expect to find waist-high mud and pigs on phone lines. Instead, you will find a booming little country with super fast internet, plus if you are lucky, a bit of snow and ice. It is a country that is worth visiting, especially now that they are a member of the Schengen states, which simplifies travel for people on odd passports (such as my South African). We had several programmers from other cities and countries on our Java Specialist Master Course.

javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.

Polymorphism Performance Mysteries Explained

After our previous newsletter, Cliff Click, the author of the Server HotSpot VM, confirmed what we had observed by experimentation. Before we go into the details of mono-morphism, bi-morphism and poly-morphism, I would like to show you some code that you can use today to demonstrate to your friends and colleagues that Java is "infinitely" faster than C++ / C# / Pascal / [their favourite language here]. Infinitely faster, that is, at doing nothing :-)

Let's take the following class FastJavaFast. It will call a method 1000000000000000000000000000 times in under a second. That is 10 to the power of 27. The method will not just be any method, but an overridden method (like virtual methods in C++). You need to start the program with the -server flag in Java 6.

public class FastJavaFast { public void test() { for ( int i = 0 ; i < 1000 * 1000 * 1000 ; i++) { test2(); } } public void test2() { for ( int i = 0; i < 1000 * 1000 * 1000 ; i++) { test3(); } } public void test3() { for ( int i = 0; i < 1000 * 1000 * 1000 ; i++) { foo(); } } // this is a "virtual" method, in the C++ sense public void foo() { } public static void main(String[] args) { FastJavaFast fast = new FastJavaFast() { // I am overriding the foo() method public void foo() { // do nothing } }; long time = System.currentTimeMillis(); fast.foo(); fast.test3(); fast.test2(); fast.test(); time = System.currentTimeMillis() - time; System.out.println( "time = " + time); } }

On my old Dell D800 notebook, I can execute the test in just a couple of milliseconds! Calling an overridden method 1,000,000,000,000,000,000,000,000,000 in under a second is simply blazingly fast! If we keep on calling test(), it eventually executes in 0 milliseconds. Swallow that one, C#! No other language can do nothing as fast as Java :-)

Now for the explanation. The first time that a method is run, the HotSpot Compiler tries to optimize the code. It uses something called On-Stack-Replacement (OSR), where it replaces some of the byte codes with optimized code. This is not the best optimization, but a start. If you call the method a second time, the newly optimized method is called. Since we call the foo() method first, the HotSpot compiler determines that there is only one implementation of the method being called. When we call the test3() method, it has already determined that foo() can be inlined, and it eliminates the loop altogether. We then call test2(), where it again has already determined that test3() does nothing, thus eliminating the loop in test2(). Lastly, we call test() and similarly, the loop is optimized out.

You can test this by inlining the calls to test2() and test3(), thus having a method test() such as:

public void test() { for ( int i = 0 ; i < 1000 * 1000 * 1000 ; i++) { for ( int j = 0 ; j < 1000 * 1000 * 1000 ; j++) { for ( int k = 0 ; k < 1000 * 1000 * 1000 ; k++) { foo(); } } } }

I estimate this should take approximately 79,274,479,959 years to complete on my Dell ...

The amazing performance of inlining code should affect how we code Java. We should stay clear of long methods and rather break them up more. Lots of small methods usually result in less duplicate copy and paste. Since the inlining is done for you by the Server HotSpot Compiler, there is no cost in calling the method.

This was one of the reasons for the strange results in my previous newsletter, where the initial results were faster than subsequent runs. Instead of calling a test method many times in our main() method, we should rather extract that code into a separate method and call that from main().

Thanks to Cliff Click's input, I changed my test code to show these effects. First off, instead of hard-coding the instances, I generate a bunch of random B subclasses and put them into an array. In my test, I then iterate over the A instances and call the run() method. Instead of showing you all the classes again, I will just show you one example and refer you to my website for a zip file containing all the sources.

We start with a common interface for all the tests, called Test.java. The run() method calls B's foo() function, which might be a subclass instance. The description() returns a String showing what the test is doing.

public interface Test { public void run(); public String description(); }

For example, here we have A3, B3, C3:

public class A3 implements Test { private final B3 b; public A3(B3 b) { this .b = b; } public void run() { b.f(); } public String description() { return "Bi-Morphic: Two subclasses, via interface" ; } } public interface B3 { public void f(); } public class C3 implements B3 { public void f() { } }

In this new test class, we choose the test at start-up with a command-line argument. Depending on the test chosen, we generate instances of different test cases. The command line arguments are:

0 - No-morphic: Single subclass, pointed to directly 1 - Mono-Morphic: Single subclass, via interface 2 - Bi-Morphic: Two subclasses, via interface 3 - Poly-Morphic: Three subclasses, via interface 4 - Poly-Morphic: Four subclasses, via interface 5 - Poly-Morphic: Eight subclasses, via interface

We generate the test data by making instances of the relevant classes (A1, A2, etc.) with the corresponding B* classes and subclasses. For A4, the interface would be B4 and the implementations of B4 would be C4, D4, E4. When we run the test, we pick a random set of 10 instance, and then call the run() method on those. In order to "warm-up" the HotSpot Server Compiler, we run through all of the tests first, to let the compiler settle. Once that is completed, we do the actual runs. You can see from our results below that our standard deviation is rather low.

import java.lang.reflect.Constructor; import java.util.Random; public class PolymorphismCliffRandomTest { private static final int UPTO = 100 * 1000 * 1000 ; public static void main(String[] args) throws Exception { int offset = Integer.parseInt(args[ 0 ]); Test[] tests = generateTestData(offset); printDescriptions(tests); System.out.println( "Warmup" ); testAll(tests); System.out.println( "Actual run" ); printHeader(tests); testAll(tests); } private static void testAll(Test[] tests) { for ( int j = 0 ; j < 10 ; j++) { runTests(tests); } } private static void printDescriptions(Test[] tests) { System.out.println(tests[ 0 ].getClass().getSimpleName() + ": " + tests[ 0 ].description()); System.out.println(); } public static void runTests(Test[] tests) { long time = System.currentTimeMillis(); test(tests); time = System.currentTimeMillis() - time; System.out.print(time + "\t" ); System.out.flush(); System.out.println(); } public static void test(Test[] sources) { Test t0 = makeRandomTest(sources); Test t1 = makeRandomTest(sources); Test t2 = makeRandomTest(sources); Test t3 = makeRandomTest(sources); Test t4 = makeRandomTest(sources); Test t5 = makeRandomTest(sources); Test t6 = makeRandomTest(sources); Test t7 = makeRandomTest(sources); Test t8 = makeRandomTest(sources); Test t9 = makeRandomTest(sources); for ( int i = 0 ; i < UPTO / 10 ; i++) { t0.run(); t1.run(); t2.run(); t3.run(); t4.run(); t5.run(); t6.run(); t7.run(); t8.run(); t9.run(); } } private static Test makeRandomTest(Test[] sources) { return sources[(( int ) (Math.random() * sources.length))]; } private static void printHeader(Test[] tests) { System.out.print(tests[ 0 ].getClass().getSimpleName()); System.out.print( '\t' ); System.out.println(); } private static Test[] generateTestData(int offset) throws Exception { switch (offset) { default : throw new IllegalArgumentException( "offset:" + offset); case 0 : return fillSources(A1. class , B1. class , B1. class ); case 1 : return fillSources(A2. class , B2. class , C2. class ); case 2 : return fillSources( A3. class , B3. class , C3. class , D3. class ); case 3 : return fillSources( A4. class , B4. class , C4. class , D4. class , E4. class ); case 4 : return fillSources( A5. class , B5. class , C5. class , D5. class , E5. class , F5. class ); case 5 : return fillSources( A6. class , B6. class , C6. class , D6. class , E6. class , F6. class , G6. class , I6. class , J6. class ); } } private static Test[] fillSources( Class<? extends Test> aClass, Class<?> bClass, Class<?>... bClasses) throws Exception { Test[] sources = new Test[ 1000 ]; Random rand = new Random( 0 ); Constructor<? extends Test> constr = aClass.getConstructor(bClass); for ( int i = 0 ; i < sources.length; i++) { int offset = Math.abs(rand.nextInt() % bClasses.length); Object b = bClasses[offset].newInstance(); sources[i] = constr.newInstance(b); } return sources; } }

We run this six times, one for each test case. Here are the test cases:

A1: No-morphic: Single subclass, pointed to directly A2: Mono-Morphic: Single subclass, via interface A3: Bi-Morphic: Two subclasses, via interface A4: Poly-Morphic: Three subclasses, via interface A5: Poly-Morphic: Four subclasses, via interface A6: Poly-Morphic: Eight subclasses, via interface

On my server, the average results are:

A1 A2 A3 A4 A5 A6 0 (0) 0 (0) 495 (9) 2855 (31) 2838 (18) 2837 (10)

When you run it with Java 6 -server, A1 and A2 come back with an average of 0 (0). The standard deviation is in brackets. In the case of a simple pointer to B or where there is a single subclass of B, the method call is inlined.

A3 is the bi-morphic case, where we have two subclasses of B. In this case, the results come back as 495 (9). The Server HotSpot compiler, according to Cliff Click, deals with bi-morphism as a special case of poly-morphism: "Where the server compiler can prove only two classes reach a call site, it will insert a type-check and then statically call both targets (which may then further inline, etc)."

The cases A4, A5 and A6 are all where we have more than two subclasses of B. In these cases, the results come back as 2855 (31), 2838 (18) and 2837 (10). They are pretty much equal and much slower than the bi-morphic case. According to Cliff Click: "HotSpot uses an inline-cache for calls where the compiler cannot prove only a single target can be called. An inline-cache turns a virtual (or interface) call into a static call plus a few cycles of work. It's is a 1-entry cache inlined in the code; the Key is the expected class of the 'this' pointer, the Value is the static target method matching the Key, directly encoded as a call instruction. As soon as you need 2+ targets for the same call site, you revert to the much more expensive dynamic lookup (load/load/load/jump-register). Both compilers use the same runtime infrastructure, but the server compiler is more aggressive about proving a single target."

It was nice having my discoveries confirmed by the guy who wrote the JVM Server Hotspot Compiler :-) Thanks to Cliff Click for taking the time to explain how it works. We also saw that Java is really fast at doing nothing, so be extra careful with empty loops that you inserted into your code instead of Thread.sleep(). Since Java 6, they are in danger of being removed.

We see again how good design wins: short methods, lots of polymorphism. If we do not need the polymorphism, it is optimized out of our code. Thus the penalty for using it is extremely small, usually non-existant where it counts.

I am in the process of expanding my Java Design Patterns Course to include more patterns. This has been one of my most popular courses to date. If this would interest you, please let me know :-).

Kind regards

Heinz

P.S. In my last newsletter, I mentioned that we were trying to buy an olive grove on which to build our family house. We managed to finally buy it yesterday. If you can believe this, the lawyers are on strike in Greece at the moment. Even the kids at school strike! They say that you have not experienced true Greece if you have not been inconvenienced by at least one strike whilst on holiday here ;-)

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

Load Disqus comments

When you load these comments, you'll be connected to Disqus. Privacy Statement.

Please enable JavaScript to view the comments powered by Disqus.

Related Articles

105 Performance Surprises in Tiger 2005-03-28 Java 5 has some interesting performance optimizations than can make our code faster without our intervention. We look at some of the new features. Full Article

170 Discovering Objects with Non-Trivial Finalizers 2009-02-27 It is well known that implementing a non-trivial finalize() method can cause GC and performance issues, plus some subtle concurrency bugs. In this newsletter, we show how we can find all objects with a non-trivial finalize() method, even if they are not currently eligible for finalization. Full Article

068 Appending Strings 2003-04-21 Java compilers convert Strings appended with + to StringBuffer. The generated bytecode is compiler specific, with javac and the Eclipse compiler producing slightly different results. Full Article

Browse the Newsletter Archive