A blog post by Renato Athaydes benchmarks the “hidden costs” of Kotlin described in a series of blog posts by Christophe B.

I recommend reading both first. This post does not cover why these features might be costly, nor the benchmarking code used to test them.

While the original blog posts were geared towards using Kotlin on Android, the benchmarks run on the JVM. Android, however, does not use the JVM. It has its own runtime, ART (previously Dalvik). I decided to modify the benchmarks to run on an Android device.

Just to be clear, nothing here invalidates any results of the previous benchmarking blog post. These benchmarks run on a completely different architecture and runtime.

Methodology

My fork is on GitHub. I had to port the benchmarks from JMH to Spanner (a fork of Caliper), but they should be functionally the same. Also, because Android does not natively support Java 8 lambdas until API 24, the desugar tool is used to backport the feature. Tests were run on a Google Pixel running Android 7.1.2. It would be interesting to hear whether the results differ on other Android versions or devices.

Results

Please refer to the original benchmark blog post for what code was run in each benchmark. I’m only going to provide the results on Android here.

Lambdas

javaLambda runtime(ns): min = 16.89, 1st qu. = 17.35, median = 17.57 (-), mean = 17.60, 3rd qu. = 18.03, max = 18.11
javaLambdaGeneric runtime(ns): min = 26.58, 1st qu. = 27.16, median = 27.55 (-), mean = 27.52, 3rd qu. = 27.94, max = 28.30
kotlinInlinedFunction runtime(ns): min = 10.23, 1st qu. = 10.24, median = 10.30 (-), mean = 10.50, 3rd qu. = 10.74, max = 11.33
kotlinLambda runtime(ns): min = 28.62, 1st qu. = 28.83, median = 29.63 (-), mean = 29.68, 3rd qu. = 30.20, max = 31.88

Note: all results are runtimes in nanoseconds, so lower is better.

Unlike before, we do see the Kotlin lambda take more time than the Java version. I’ve also added a version of the Java lambda that uses a generic Function instead of ToIntFunction. Its median (27.55 ns) is close to that of the Kotlin version (29.63 ns), so the extra time is most likely due to boxing the primitive int. The Kotlin inlined function is the fastest of them all, avoiding boxing, a few null checks, and a method lookup and call.
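For readers who have not seen the original posts, here is a minimal Kotlin sketch of the pattern being measured. It is not the benchmark code, and the names `applyBoxed` and `applyInlined` are hypothetical. It assumes the usual Kotlin/JVM compilation scheme, in which a non-inline lambda parameter becomes a generic `Function1` object.

```kotlin
// Non-inline higher-order function: the lambda is passed as a Function1 object.
// Because Function1 is generic, the Int argument and the Int result are boxed
// to java.lang.Integer on each invocation.
fun applyBoxed(value: Int, f: (Int) -> Int): Int = f(value)

// Inline higher-order function: the compiler copies this body and the lambda's
// body into the call site, so no function object is used and nothing is boxed.
inline fun applyInlined(value: Int, f: (Int) -> Int): Int = f(value)
```

Both functions compute the same value; the difference lies only in the bytecode generated at the call site.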

Companion Objects

javaPrivateConstructorCallFromStaticMethod runtime(ns): min = 77.83, 1st qu. = 78.72, median = 80.03 (-), mean = 81.09, 3rd qu. = 83.70, max = 86.39
kotlinPrivateConstructorCallFromCompanionObject runtime(ns): min = 100.68, 1st qu. = 107.23, median = 109.71 (-), mean = 109.52, 3rd qu. = 113.14, max = 116.23
kotlinPrivateStaticConstructorCallFromCompanionObject runtime(ns): min = 98.46, 1st qu. = 101.49, median = 106.56 (-), mean = 105.00, 3rd qu. = 107.30, max = 110.67

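Again, we do see a cost to the Kotlin version. I’ve added an additional method that applies the advice in “Exploring Kotlin’s hidden costs” to use const and @JvmStatic. In these numbers, however, it only narrows the gap slightly: its median is 106.56 ns versus 109.71 ns without the advice, and both Kotlin versions remain roughly a third slower than the Java version (80.03 ns).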
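As a minimal sketch of that advice, here is a hypothetical `Widget` class (not the benchmark code) whose companion object holds a `const` constant and a `@JvmStatic` factory that calls the private constructor:

```kotlin
class Widget private constructor(val id: Int) {
    companion object {
        // `const` makes DEFAULT_ID a compile-time constant: its value is inlined
        // at each use site instead of being read through a getter on Companion.
        const val DEFAULT_ID = 42

        // @JvmStatic additionally emits a static Widget.create() method, so Java
        // callers do not have to go through the Widget.Companion instance.
        // (Hypothetical factory, for illustration only.)
        @JvmStatic
        fun create(): Widget = Widget(DEFAULT_ID)
    }
}
```

From Kotlin the call site looks the same either way (`Widget.create()`); the annotations change what the compiler emits and how Java code can reach these members.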

Local Functions

javaLocalFunction runtime(ns): min = 92.55, 1st qu. = 94.46, median = 99.79 (-), mean = 99.13, 3rd qu. = 102.84, max = 105.55
javaLocalFunctionWithoutCapturingLocalVariable runtime(ns): min = 7.49, 1st qu. = 7.53, median = 7.58 (-), mean = 7.60, 3rd qu. = 7.66, max = 7.84
kotlinLocalFunctionCapturingLocalVariable runtime(ns): min = 105.18, 1st qu. = 108.60, median = 110.76 (-), mean = 114.35, 3rd qu. = 122.18, max = 127.57
kotlinLocalFunctionWithoutCapturingLocalVariable runtime(ns): min = 7.23, 1st qu. = 7.27, median = 7.36 (-), mean = 7.37, 3rd qu. = 7.48, max = 7.49

I’ve added an additional Java lambda that does not capture a value. Again, unlike before (starting to see a pattern?), there is a huge difference between the capturing and non-capturing versions: roughly 100 to 110 ns versus about 7.5 ns, more than an order of magnitude. This doesn’t seem language-specific, though; the Kotlin and Java versions are close to each other in both cases. The cost here is capturing versus not capturing.
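The sketch below contrasts a local function that captures a variable with one that receives the value as a parameter. The function names are hypothetical, and the comments describe the compilation scheme discussed in the series for the compiler of that time; newer Kotlin compilers may emit different bytecode, so it is worth checking the output for your version (for example with Show Kotlin Bytecode in Android Studio).

```kotlin
// The local function reads `offset` from the enclosing scope. Per the series,
// the compiler of that time turned such a capturing local function into an
// object created on each call of the enclosing function (version-dependent).
fun sumCapturing(values: IntArray, offset: Int): Int {
    fun shifted(v: Int) = v + offset
    var total = 0
    for (v in values) total += shifted(v)
    return total
}

// Passing `offset` explicitly means nothing is captured, so no per-call
// closure state is needed.
fun sumNonCapturing(values: IntArray, offset: Int): Int {
    fun shifted(v: Int, off: Int) = v + off
    var total = 0
    for (v in values) total += shifted(v, offset)
    return total
}
```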

Null Safety

javaSayHello runtime(ns): min = 425.72, 1st qu. = 442.25, median = 457.86 (-), mean = 458.86, 3rd qu. = 469.51, max = 516.85
kotlinSayHello runtime(ns): min = 439.67, 1st qu. = 440.48, median = 455.82 (-), mean = 468.59, 3rd qu. = 472.44, max = 577.72

And now we’ve come across a result that seems to agree with the previous blog post: the two versions take practically the same time. However, any cost of the null check is completely dwarfed by the string allocation. To see its actual effect, I removed the string concatenation.

javaSayHello runtime(ns): min = 10.95, 1st qu. = 10.96, median = 10.98 (-), mean = 11.03, 3rd qu. = 11.12, max = 11.21
kotlinSayHello runtime(ns): min = 13.82, 1st qu. = 13.99, median = 14.54 (-), mean = 14.37, 3rd qu. = 14.62, max = 14.75

So the null checks do have a cost, though at roughly 3.5 ns per call (comparing medians), it’s pretty small.
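For context, here is a minimal sketch of where those checks come from (hypothetical function names, not the benchmark code). It assumes a recent Kotlin/JVM compiler, which emits `Intrinsics.checkNotNullParameter` for non-null parameters of non-private functions; older versions used `Intrinsics.checkParameterIsNotNull`.

```kotlin
// Public function: the compiler inserts a null check on `name` at the start of
// the method body, because Java callers could still pass null.
fun nameLength(name: String): Int = name.length

// Private function: only Kotlin code in this file can call it, and the type
// system already rules out null, so no parameter check is emitted.
private fun nameLengthUnchecked(name: String): Int = name.length

// Hypothetical public entry point: `names` is checked once per call, while the
// per-element calls go to the unchecked private helper.
fun totalLength(names: List<String>): Int {
    var total = 0
    for (n in names) total += nameLengthUnchecked(n)
    return total
}
```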

Varargs

javaIntVarargs runtime(ns): min = 138.62, 1st qu. = 140.98, median = 142.37 (-), mean = 142.58, 3rd qu. = 144.88, max = 145.74
kotlinIntVarargs runtime(ns): min = 371.31, 1st qu. = 375.83, median = 392.00 (-), mean = 390.80, 3rd qu. = 405.13, max = 407.57

This result agrees most closely with the previous blog post. The Java version is more than 2.5 times as fast as the Kotlin one (median 142.37 ns versus 392.00 ns).
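Assuming the benchmark exercises the spread operator as described in the series, the cost comes from the copy the compiler makes when an existing array is passed with `*`. The hypothetical functions below make that copy observable:

```kotlin
// A vararg Int parameter is received as an IntArray.
fun sumOf(vararg values: Int): Int {
    var total = 0
    for (v in values) total += v
    return total
}

// Hypothetical helper that overwrites the first element of the array it receives,
// used only to show that the spread operator hands over a copy.
fun zeroFirst(vararg values: Int): Int {
    values[0] = 0
    return values.sum()
}

// Taking a plain IntArray instead of a vararg avoids the copy when the caller
// already has an array.
fun sumOfArray(values: IntArray): Int {
    var total = 0
    for (v in values) total += v
    return total
}
```

Calling `zeroFirst(*numbers)` with `numbers = intArrayOf(5, 1)` returns 1, yet `numbers[0]` is still 5 afterwards, because the function received a copy of the array.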

Delegated Properties

javaSimplyInitializedProperty runtime(ns): min = 90.50, 1st qu. = 91.68, median = 93.43 (-), mean = 96.48, 3rd qu. = 100.62, max = 111.42
kotlinDelegateProperty runtime(ns): min = 211.95, 1st qu. = 217.71, median = 233.14 (-), mean = 235.72, 3rd qu. = 252.38, max = 268.81

We do see a cost to using the delegated property. However, at roughly 2.5 times the runtime of the plain property (median 233.14 ns versus 93.43 ns), it is much more than 10%.
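A minimal sketch of the difference, using a hypothetical `ConstantDelegate` rather than whatever delegate the benchmark uses: every read of a delegated property goes through the delegate's `getValue`, which also receives a `KProperty` describing the property.

```kotlin
import kotlin.properties.ReadOnlyProperty
import kotlin.reflect.KProperty

// Hypothetical delegate that always returns a fixed value.
class ConstantDelegate(private val value: String) : ReadOnlyProperty<Any?, String> {
    override fun getValue(thisRef: Any?, property: KProperty<*>): String = value
}

class Holder {
    // Plain property: a read is a getter returning a backing field.
    val plain: String = "hello"

    // Delegated property: a read calls ConstantDelegate.getValue(this, <property metadata>).
    val delegated: String by ConstantDelegate("hello")
}
```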

Ranges (Indirect Reference)

kotlinIndirectRange runtime(ns): min = 133.25, 1st qu. = 134.83, median = 138.75 (-), mean = 140.82, 3rd qu. = 147.47, max = 154.65
kotlinLocallyDeclaredRange runtime(ns): min = 2.30, 1st qu. = 2.32, median = 2.36 (-), mean = 2.35, 3rd qu. = 2.37, max = 2.39

OK, unlike before, this is the opposite of a “not significant” cost. The medians differ by a factor of almost 60 (138.75 ns versus 2.36 ns), well over an order of magnitude!
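The two shapes being compared can be sketched as follows (hypothetical names, not the benchmark code). The assumption is that the indirect case obtains the range through a property getter, as in the series' example:

```kotlin
// Indirect reference: the range comes from a property getter, so each call
// builds an IntRange object and the `in` check calls its contains() method.
private val validRange: IntRange
    get() = 1..10

fun isValidIndirect(i: Int): Boolean = i in validRange

// Range written directly in the `in` expression: the compiler can reduce this
// to two integer comparisons (1 <= i && i <= 10) without creating an IntRange.
fun isValidDirect(i: Int): Boolean = i in 1..10
```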

Ranges (Non-primitive Types)

javaStringComparisons runtime(ns): min = 19.94, 1st qu. = 20.07, median = 20.48 (-), mean = 20.39, 3rd qu. = 20.62, max = 20.84
kotlinStringRangeInclusionWithConstantRange runtime(ns): min = 42.95, 1st qu. = 43.30, median = 43.67 (-), mean = 43.81, 3rd qu. = 44.48, max = 44.73
kotlinStringRangeInclusionWithLocalRange runtime(ns): min = 144.33, 1st qu. = 147.76, median = 154.53 (-), mean = 154.06, 3rd qu. = 157.01, max = 171.33

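Again, unlike before, the difference here is quite large: compared with plain string comparisons, the constant range takes about twice as long and the local range about 7.5 times as long. Note: I’ve included the Java string comparison benchmark as well. It isn’t reported in the previous blog post, even though it is part of the benchmark.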
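A sketch of the two approaches (hypothetical names; the bounds are arbitrary): a range of a `Comparable` type such as `String` is a `ClosedRange` whose `contains` uses `compareTo`, whereas the manual check calls `compareTo` directly, much like the Java version does.

```kotlin
// `..` on a Comparable type builds a ClosedRange<String>; `in` then calls its
// contains(), which compares against both bounds with compareTo.
fun inRangeKotlin(s: String): Boolean = s in "bar".."foo"

// Hand-written equivalent: `>=` and `<=` on Strings compile to compareTo calls,
// with no range object involved.
fun inRangeManual(s: String): Boolean = s >= "bar" && s <= "foo"
```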

Ranges (Iteration)

kotlinRangeForEachFunction runtime(ns): min = 454.21, 1st qu. = 475.43, median = 508.76 (-), mean = 507.04, 3rd qu. = 524.45, max = 587.05
kotlinRangeForEachLoop runtime(ns): min = 157.03, 1st qu. = 166.79, median = 172.07 (-), mean = 170.62, 3rd qu. = 175.78, max = 176.92
kotlinRangeForEachLoopWithStep1 runtime(ns): min = 444.81, 1st qu. = 447.64, median = 455.73 (-), mean = 460.28, 3rd qu. = 472.08, max = 489.84

Like before, the forEach function is much slower than a simple for loop, taking about three times as long (median 508.76 ns versus 172.07 ns). However, the explicit step is nearly as costly (455.73 ns).
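A sketch of the three loop shapes (hypothetical names; summing is just a stand-in for the benchmark's loop body). The comments reflect the costs described in the series; the exact bytecode depends on the compiler version.

```kotlin
// Plain for loop over a literal range: compiled to a simple counting loop.
fun sumForLoop(n: Int): Int {
    var total = 0
    for (i in 1..n) total += i
    return total
}

// forEach on a range: forEach is inline, but it walks the IntRange through the
// generic Iterable interface, creating an iterator and boxing each element.
fun sumForEach(n: Int): Int {
    var total = 0
    (1..n).forEach { total += it }
    return total
}

// Explicit step: per the series, `step` builds an IntProgression object before
// the loop starts, even when the step is 1.
fun sumWithStep(n: Int): Int {
    var total = 0
    for (i in 1..n step 1) total += i
    return total
}
```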

Iterations: Collection Indices

kotlinCustomIndicesIteration runtime(ns): min = 231.88, 1st qu. = 236.17, median = 260.12 (-), mean = 256.89, 3rd qu. = 273.05, max = 280.20
kotlinIterationUsingLastIndexRange runtime(ns): min = 98.95, 1st qu. = 105.44, median = 107.80 (-), mean = 107.10, 3rd qu. = 109.29, max = 112.01

And finally, iterating over a range that ends at lastIndex is more than twice as fast as using a custom indices property (median 107.80 ns versus 260.12 ns).
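The two iteration styles can be sketched with a hypothetical `Bag` container (not the benchmark code): a custom `indices` property whose getter builds an `IntRange`, versus a `lastIndex` property that is just an `Int`.

```kotlin
// Hypothetical container used only for illustration.
class Bag(private val items: IntArray) {
    val size: Int get() = items.size
    operator fun get(i: Int): Int = items[i]
}

// Custom indices: the getter allocates a new IntRange object on every access.
val Bag.indices: IntRange
    get() = 0 until size

// lastIndex is a plain Int, so `0..bag.lastIndex` is a literal range that the
// compiler can turn into a simple counting loop.
val Bag.lastIndex: Int
    get() = size - 1

fun sumViaIndices(bag: Bag): Int {
    var total = 0
    for (i in bag.indices) total += bag[i]
    return total
}

fun sumViaLastIndex(bag: Bag): Int {
    var total = 0
    for (i in 0..bag.lastIndex) total += bag[i]
    return total
}
```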

Conclusion

At least on Android, all the advice in the “Exploring Kotlin’s hidden costs” series is correct. This underlines the need not only to measure, but to measure on the platform you are targeting.