I have prepared three algorithms to test the compilers: a Mandelbrot Set generator, a Knapsack solver and a Modular Algebra calculator. The full code is available on Github.

I’ve tried to write the code in the most idiomatic Kotlin way, taking advantage of extension functions, data classes, context programming, for each loop. Not only the resulting code is very easy to read, but also the performances are not significantly slower than using a “C” like syntax.

I have chosen these three algorithms to have some realistic example of very different CPU-intensive tasks, and the code is completely in Kotlin, without using any Java library.

Mandelbrot Set

The Mandelbrot Set is probably the most famous fractal shape and you may have seen it even if you don’t know the name.

Mathematically is defined as the Set of all points in the Complex plane where the function z <- z²+c will not diverge when iterated. It is quite easy to generate the set iterating the functions for some points on the complex plane and creating an image out of them.

Since the goal here is the performance and not the graphics I kept things simple using text graphic.

Let’s start looking at the code for the Mandelbrot Set. Here is the definition of a type to represent the Complex numbers:

data class Complex(val r: Double, val i: Double){

operator fun times(other: Complex) =

Complex(

r = this.r * other.r - this.i * other.i,

i = this.i * other.r + this.r * other.i

)

operator fun plus(other: Complex) =

Complex(r = this.r + other.r, i = this.i + other.i)

operator fun minus(other: Complex) =

Complex(r = this.r - other.r, i = this.i - other.i)

fun squared() = this * this

fun squaredModule() = r * r + i * i

fun Double.toComplex() = Complex(r=this, i=0.0)

}

Then the actual calculation for each point of the Set:

fun mandelSet(initZ: Complex, c:Complex, maxIter: Int): Int {

var z = initZ

(1..maxIter).forEach{

z = z.squared() + c

if (z.squaredModule() >= 4)

return it

}

return maxIter

}

You can see here how using operator overloading and a data class to represent complex numbers can really simplify the code and making easier to understand.

Once we defined in the Complex class the rules to do operations on complex numbers, the mandelSet function has only to check if the operation z <- z²+c is “escaping” or not and in case after how many iteration it will exceed the threshold of 4.

Here you can see the characteristic cardioid shape of Mandelbrot Set in the output rendered in AsciiArt:

Knapsack problem

The Knapsack problem can be defined in several ways. Imagine a thief that just broke into a watch shop. He can steal as many watches as he wants, provided that he doesn’t exceed the maximum weight that he can carry on his knapsack.

Being a rational thief, he definitely wants to optimize the value of the watches in his knapsack. Every watch has a price and a weight attached. So we need an algorithm to find the set of Watches with the maximum total price for a given weight.

Real-world applications of this algorithm include optimizing cuts and materials for CNC machines and marketing strategies for allocating advertising budget.

For example, let’s start with a shop with only 3 watches defined like this:

val shop = Knapsack.shop(

Watch(weight = 1, price = 1),

Watch(weight = 3, price = 2),

Watch(weight = 1, price = 3)

)

If we have a maximum weight of 1, it’s better for us to pick up the third watch, rather than the first, because the value is higher.

If we have a maximum weight of 3, we can choose the number 2 (price 2) or number 1 and 3 (price 1 + 3). In this case, it’s better to pick up 1 and 3 even if their combined weight is less than the maximum.

These are the complete solutions for this shop:

assertEquals(3, selectWatches(shop, maxWeight = 1))

assertEquals(4, selectWatches(shop, maxWeight = 2))

assertEquals(4, selectWatches(shop, maxWeight = 3))

assertEquals(5, selectWatches(shop, maxWeight = 4))

assertEquals(6, selectWatches(shop, maxWeight = 5))

As you can see, as the number of watches available increase, the number of possible choices become high very, very quickly. It is a classical NP-Hard problem.

To solve it in reasonable times we need to cheat a bit and use Dynamic Programming. We can build a map with the already optimized solution for each set of watches and so we can avoid recalculating them each time.

The general algorithm is based on an exhaustive search based on recursion. This is the Kotlin code to solve it, separated in a memoization function and the recursive search of maximum value.

typealias Memoizer = MutableMap<String, Int> fun priceAddingElement(memo: Memoizer, shop: Set<Watch>, choice: Set<Watch>, maxWeight: Int, priceSum: Int): Int =

shop.filter { !(it in choice) && it.weight <= maxWeight }

.map {

selectWatches(

memo,

shop,

maxWeight - it.weight,

choice + it,

priceSum + it.price) }

.filter { it > priceSum }

.max() ?: priceSum

fun selectWatches(memo: Memoizer, shop: Set<Watch>, maxWeight: Int, choice: Set<Watch>, priceSum: Int): Int =

memoization(memo, generateKey(choice)) {

priceAddingElement(memo, shop, choice, maxWeight, priceSum)}

private fun memoization(memo: Memoizer, key: String, f: () -> Int): Int = when (val w = memo[key]) {

null -> f().also { memo[key] = it }

else -> w

}

I really like how Kotlin allows expressing the intention clearly without having to repeat yourself.

Modular Algebra

Modular algebra is a very useful and interesting branch of mathematics. It’s also called “clock algebra” because it resembles the operation with hours in a clock, for example, 7 + 6 usually is equal to 13, but if we are talking about the time it will be equal to 1. Here the clock is a modulo 12 algebra, but we can generalize it to any number.

It has a lot of interesting properties, for example keeping adding a number which is prime with the modulo will generate a sequence with all the numbers up to the modulo. This means that if we add a number prime to 12 in our clock algebra we will generate a sequence with all the hours. For example, 5 is the smallest number prime to 12, the sequence of 5 in modulo 12 is: 5, 10, 3, 8, 1, 6, 11, 4, 9, 2, 7, 0 and then it repeats itself.

This is the class to define a modular number and its operations. Note that division is quite problematic in modular algebras and I used a brute force approach as a shortcut:

data class ModularNumber(val num: Int, val modulo: Int): Comparable<ModularNumber> {



override fun compareTo(other: ModularNumber): Int = num.compareTo(other.num)



operator fun plus(other: ModularNumber) = (num + other.num).toModularNumber()

operator fun minus(other: ModularNumber) = (num - other.num).toModularNumber()

operator fun times(other: ModularNumber) = (num * other.num).toModularNumber()

operator fun div(other: ModularNumber) = ((1..modulo).first { (it * other.num) % modulo == (this.num % modulo) }).toModularNumber()

operator fun inc() = (num + 1).toModularNumber()



fun squared() = (num*num).toModularNumber()



fun Int.toModularNumber(): ModularNumber =

ModularNumber(this % modulo, modulo)



}

What can we do with such a charming algebra? We can draw nice patterns! To produce they we only need to compare the square of two numbers, inside modular algebra.

data class PlaneGrid(val size: Int, val initPredicate: (Int) -> Boolean) {

private val grid = BooleanArray(size*size, initPredicate)



operator fun get(x: Int, y: Int): Boolean = grid[coord(x,y)]



private fun coord(x: Int, y: Int) = (y-1) * size + x -1



fun count(): Int = grid.count {it}

}



val compareSquares: (ModularNumber, ModularNumber) -> Boolean =

{ x, y -> x.squared() >= y.squared() }



fun compareSquares(size: Int, modulo: Int): Int =

ModularField(modulo).applyFunction(compareSquares)(size).count()



fun sumOfFunction(from: Int, to: Int, f: (Int) -> Int) =

(from..to)

.map { f(it) }

.sum() data class ModularField(val modulo: Int) { fun applyFunction(f: (ModularNumber, ModularNumber) -> Boolean): (Int) -> PlaneGrid =

{ size ->

PlaneGrid(size) { index ->

val xMod = (index % size).toModularNumber()

val yMod = (index / size).toModularNumber()

f(xMod, yMod)

}

} fun Int.toModularNumber(): ModularNumber =

ModularNumber(this % modulo, modulo)

}

These are some examples of modular patterns generated by the above functions:

Modulo 5 Modulo 15 Modulo 12

@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@@@ @@@@@ @@@@@ @

@@ @@ @@ @@ @@ @@@@@@ @@ @@ @@@ @@@ @@@

@@ @@ @@ @@ @ @ @ @ @ @ @ @

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@ @@@ @@@

@@@@@@@@@@@@@@@@@@@@ @ @ @@@@@ @@@@@ @@@@@ @

@@@@ @@@@ @@@@ @@@@ @ @@ @@ @ @ @@@@@@@@@@@@@@@@@@@@

@@ @@ @@ @@ @@ @@@@@@ @@ @@ @@@@@ @@@@@ @@@@@ @

@@ @@ @@ @@ @@ @@@@@@ @@ @@ @@@ @@@ @@@

@@@@ @@@@ @@@@ @@@@ @ @@ @@ @ @ @ @ @

@@@@@@@@@@@@@@@@@@@@ @ @ @@@ @@@ @@@

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@@@ @@@@@ @@@@@ @

@@ @@ @@ @@ @ @ @ @ @ @@@@@@@@@@@@@@@@@@@@

@@ @@ @@ @@ @@ @@@@@@ @@ @@ @@@@@ @@@@@ @@@@@ @

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@ @@@ @@@

@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@ @ @ @

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@ @@@ @@@

@@ @@ @@ @@ @@ @@@@@@ @@ @@ @@@@@ @@@@@ @@@@@ @

@@ @@ @@ @@ @ @ @ @ @ @@@@@@@@@@@@@@@@@@@@

@@@@ @@@@ @@@@ @@@@ @@@@@@@@@@@@@@ @@@@ @@@@@ @@@@@ @@@@@ @

Benchmarks

And now the part you were waiting for all along. Let’s compare the performance of the two compilers!

Let’s remember that Graal is written in Java and it is taking advantage of new researches in the compiler space, but it is still relatively young. C2 on the other side is extremely well-tuned and mature.

Now a word about testing methodology: All tests run on my Notebook with i7 4.4Ghz 32Gb. I ran the tests several times with different parameters and the results have been consistent. The actual values in the graphs have been taken in the same session without other applications running.

I run a test called PerformanceTest in each project which executes an end to end benchmark continuously. In this way, the compiler cannot optimize the code for a specific case. Then I choose the faster result, assuming it is the one without GC and OS pauses. All tests are running mono-thread.

Here are the results in microseconds as an average of many runs:

These results are consistent in showing a definite performance gain using Graal over Hotspot. The main reason is the more sophisticated inlining and escape analysis optimizations.

In some cases, Graal EE special optimizations (like loop unrolling) give it a further advantage.

As a methodology, I always suggest starting with an end-to-end continuous running benchmark, as it is the safest way to collect performance metrics. It will give us the best approximation of production performance.

If we want to understand better where the bottlenecks are, we need to create more specific micro-benchmarks. Fortunately for us, there is an incredibly good library to do exactly this.

JMH

Enter Java Micro-benchmark Harness. JMH is a library to simplify the task to implement Java microbenchmarks correctly. JMH has been created by some JVM developers and it is used for JVM internal performance test. We can take this as a good indication of its quality.

How to set up JMH using Gradle (and Kotlin)?

You just need to edit your build.gradle file. First, add the dependencies:

testImplementation("org.openjdk.jmh:jmh-core:1.23")

testImplementation("org.openjdk.jmh:jmh-generator-annprocess:1.23")

Then, add the Gradle JMH plugin:

plugins {

kotlin("jvm") version "1.3.70"

id("me.champeau.gradle.jmh") version "0.5.0"

}

Finally, add a section with the properties you want to set. You can find the full list here.

jmh {

jmhVersion = "1.23"

jvmArgs = listOf(

"-Xms6g",

"-Xmx6g",

"-Dgraal.ShowConfiguration=info",

"-XX:+AlwaysPreTouch",

"-XX:+UnlockExperimentalVMOptions",

"-XX:+UseJVMCICompiler"

)

warmupForks = 0

fork = 2

warmupIterations = 5

iterations = 5

}

How to create a test for JMH?

To create a new benchmark we only need to put an annotation around a function.

@Benchmark

fun knapsack(blackhole: Blackhole) {

blackhole.consume(Knapsack.selectWatches(shop, 199))

}

There are many possible optional parameters for the benchmark annotation, the default mode is to count the operations for a second, but there are many other possible ways to run the test.

In this test, you can see we are using blackhole class to prevent the compiler to consider the benchmark as “dead code” since we are not doing anything with the result of calculations.

Let’s see the results:

From this graph, we can see a bigger difference in raw CPU performance than in the continuous run test.

One difference is that JMH uses some techniques to avoid garbage collection cycles, why the continuous run after a while (usually from a few seconds to some minutes) would fill all the memory and require a full GC cycle.

Graal garbage collection is currently not as good as the Hotspot one, so this can partly explain the results. On the positive side, RedHat is currently investing to bring their great Shenandoah garbage collector to GraalVM. Expect new benchmarks when this will happen!