In a recent pull request review at work I suggested using context bounds to declare effect capabilities instead of implicit values, as this is what I see most often in OSS projects and it has also been my preference for a while. It makes the code look nicer, and the two approaches are equivalent: context bound constraints get translated into implicit values at compile time.

Context bound

    def p1[F[_]: Applicative: Console]: F[Unit] =
      Console[F].putStrLn("a") *>
        Console[F].putStrLn("b") *>
        Console[F].putStrLn("c")

Implicit values

    def p2[F[_]](implicit ev: Applicative[F], c: Console[F]): F[Unit] =
      c.putStrLn("a") *>
        c.putStrLn("b") *>
        c.putStrLn("c")

Every time we call Console[F] what we are doing is invoking the “summoner” method normally defined as follows:

    object Console {
      def apply[F[_]](implicit ev: Console[F]): Console[F] = ev
    }
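To make the summoner's role concrete, here is a minimal dependency-free sketch. The `Id` alias and the `idConsole` instance are assumptions made up for illustration (the real project uses cats' `Id`), and the `Console` trait is reduced to the single `putStrLn` method seen in this post. It shows that `Console[Id]` simply hands back the very instance the compiler already resolved:

```scala
object SummonerSketch {
  // Stand-in for cats.Id, defined locally so the sketch has no dependencies.
  type Id[A] = A

  trait Console[F[_]] {
    def putStrLn(s: String): F[Unit]
  }

  object Console {
    // The summoner: returns whatever instance implicit resolution found.
    def apply[F[_]](implicit ev: Console[F]): Console[F] = ev

    // Hypothetical instance, only here for illustration.
    implicit val idConsole: Console[Id] = new Console[Id] {
      def putStrLn(s: String): Id[Unit] = println(s)
    }
  }

  def main(args: Array[String]): Unit = {
    val summoned = Console[Id]             // goes through Console.apply
    val direct   = implicitly[Console[Id]] // plain implicit lookup
    // Both references point to the same object: apply adds a call, not a new value.
    assert(summoned eq direct)
    summoned.putStrLn("same instance")
  }
}
```

So the only thing that could cost anything at runtime is the extra method call itself.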

So the implicit value gets resolved at compile time only once, but I got a very good question:

Is there any performance penalty introduced by the summoner?

And I thought I knew the answer… But I wasn’t very sure, so it was time to examine the JVM bytecode and see the differences!

JVM bytecode

Here’s the bytecode generated for both p1 and p2. Let’s take a look at the differences.

Context bound program

    public <F extends java.lang.Object> F p1(cats.Applicative<F>, com.github.gvolpe.Console<F>);
      descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
      flags: ACC_PUBLIC
      Code:
        stack=4, locals=3, args_size=3
           0: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           3: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           6: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
           9: aload_2
          10: invokevirtual   #65     // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
          13: ldc             #67     // String a
          15: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          20: aload_1
          21: invokevirtual   #77     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          24: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
          27: aload_2
          28: invokevirtual   #65     // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
          31: ldc             #79     // String b
          33: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          38: invokeinterface #82, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          43: aload_1
          44: invokevirtual   #77     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          47: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
          50: aload_2
          51: invokevirtual   #65     // Method com/github/gvolpe/Console$.apply:(Lcom/github/gvolpe/Console;)Lcom/github/gvolpe/Console;
          54: ldc             #84     // String c
          56: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          61: invokeinterface #82, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          66: areturn
        LineNumberTable:
          line 10: 0
          line 11: 24
          line 10: 43
          line 12: 47
        LocalVariableTable:
          Start  Length  Slot  Name         Signature
              0      67     0  this         Lcom/github/gvolpe/summoner$;
              0      67     1  evidence$1   Lcats/Applicative;
              0      67     2  evidence$2   Lcom/github/gvolpe/Console;
      Signature: #49  // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
      MethodParameters:
        Name         Flags
        evidence$1   final
        evidence$2   final

Implicit values program

    public <F extends java.lang.Object> F p2(cats.Applicative<F>, com.github.gvolpe.Console<F>);
      descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
      flags: ACC_PUBLIC
      Code:
        stack=4, locals=3, args_size=3
           0: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           3: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           6: aload_2
           7: ldc             #90     // String 1
           9: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          14: aload_1
          15: invokevirtual   #77     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          18: aload_2
          19: ldc             #92     // String 2
          21: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          26: invokeinterface #82, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          31: aload_1
          32: invokevirtual   #77     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          35: aload_2
          36: ldc             #94     // String 3
          38: invokeinterface #73, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          43: invokeinterface #82, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          48: areturn
        LineNumberTable:
          line 15: 0
          line 16: 18
          line 15: 31
          line 17: 35
        LocalVariableTable:
          Start  Length  Slot  Name         Signature
              0      49     0  this         Lcom/github/gvolpe/summoner$;
              0      49     1  evidence$3   Lcats/Applicative;
              0      49     2  c            Lcom/github/gvolpe/Console;
      Signature: #49  // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
      MethodParameters:
        Name   Flags
        ev     final
        c      final

So the bytecode generated for the context bound approach has three extra getstatic and invokevirtual pairs, one for each Console[F] call, but what does this actually mean? Below are the definitions given by Wikipedia:

getstatic: get a static field value of a class, where the field is identified by field reference in the constant pool index (indexbyte1 << 8 + indexbyte2)

invokevirtual: invoke virtual method on object objectref and puts the result on the stack (might be void); the method is identified by method reference index in constant pool (indexbyte1 << 8 + indexbyte2)
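To tie those two instructions back to the source, here is a rough dependency-free sketch of what each style boils down to. Everything in it is an illustrative assumption rather than the project's code: the local `Id` alias, the recording `Console[Id]` instance, and the method names `viaSummoner`, `desugared` and `direct`. The sketch assumes Scala 2.12 or 2.13.

```scala
import scala.collection.mutable.ListBuffer

object DesugarSketch {
  type Id[A] = A // local stand-in for cats.Id

  trait Console[F[_]] {
    def putStrLn(s: String): F[Unit]
  }

  object Console {
    def apply[F[_]](implicit ev: Console[F]): Console[F] = ev
  }

  // Hypothetical instance that records lines instead of printing them.
  val out: ListBuffer[String] = ListBuffer.empty[String]
  implicit val recordingConsole: Console[Id] = new Console[Id] {
    def putStrLn(s: String): Id[Unit] = { out += s; () }
  }

  // What we write with a context bound and the summoner.
  def viaSummoner[F[_]: Console]: F[Unit] =
    Console[F].putStrLn("a")

  // Roughly what the compiler makes of it: an evidence parameter plus an
  // explicit call on the Console companion object, which is where
  // getstatic (load Console$.MODULE$) and invokevirtual (call apply) come from.
  def desugared[F[_]](implicit evidence: Console[F]): F[Unit] =
    Console.apply[F](evidence).putStrLn("a")

  // The implicit-values style uses the parameter directly (just aload_2).
  def direct[F[_]](implicit c: Console[F]): F[Unit] =
    c.putStrLn("a")

  // Runs all three variants and returns what was recorded.
  def run(): List[String] = {
    out.clear()
    viaSummoner[Id]
    desugared[Id]
    direct[Id]
    out.toList
  }

  def main(args: Array[String]): Unit = println(run())
}
```

All three variants behave identically; they differ only in the extra load-and-call pair.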

So, is it slower? How can we know? There’s only one way…

Benchmark it all!

When you are not sure about a performance question or issue, benchmark your code. Benchmarking is not easy, but fortunately on the JVM we have a fantastic tool: the Java Microbenchmark Harness, or JMH for short.

Here are the simple benchmarks I wrote, calling each method a thousand times via replicateA and measuring the throughput:

    import cats.Id
    import cats.implicits._
    import org.openjdk.jmh.annotations._

    class benchmarks {

      @Benchmark
      @BenchmarkMode(Array(Mode.Throughput))
      def contextBoundSummoner(): Unit =
        p1[Id].replicateA(1000).void

      @Benchmark
      @BenchmarkMode(Array(Mode.Throughput))
      def evidenceSummoner(): Unit =
        p2[Id].replicateA(1000).void
    }

And these are the results, running with 20 iterations, 5 warm-up iterations, 1 fork and 1 thread:

    sbt> jmh:run -i 20 -wi 5 -f1 -t1
    [info] Benchmark              Mode  Cnt      Score     Error  Units
    [info] contextBoundSummoner  thrpt   20  15777.375 ± 593.111  ops/s
    [info] evidenceSummoner      thrpt   20  17302.136 ± 442.127  ops/s
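To put those two rows in perspective, a small sketch can compute the relative gap and check whether the reported score ± error ranges overlap. The `Result` case class and the helper names are made up for this illustration; only the numbers come from the run above. On these figures the context bound version comes out a little under 9% slower, and the two ranges do not overlap.

```scala
object ThroughputGap {
  // One row of the JMH output: score and the reported ± error, in ops/s.
  final case class Result(name: String, score: Double, error: Double) {
    def low: Double  = score - error
    def high: Double = score + error
  }

  val contextBound: Result = Result("contextBoundSummoner", 15777.375, 593.111)
  val evidence: Result     = Result("evidenceSummoner", 17302.136, 442.127)

  // Throughput lost by the slower result, as a fraction of the faster one.
  def relativeGap(slower: Result, faster: Result): Double =
    (faster.score - slower.score) / faster.score

  // True when the two score ± error ranges share at least one point.
  def overlap(a: Result, b: Result): Boolean =
    a.low <= b.high && b.low <= a.high

  def main(args: Array[String]): Unit = {
    println(s"relative gap: ${relativeGap(contextBound, evidence) * 100} %")
    println(s"ranges overlap: ${overlap(contextBound, evidence)}")
  }
}
```

This is only arithmetic on a single run with one fork, so it says nothing about how stable the gap is across runs.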

Conclusion

The difference is small enough to not be a performance concern so I would still recommend using the context bounds approach but remember to benchmark and deeply analyze your code before jumping to conclusions!

After publishing this on Twitter I got good feedback and some suggestions, so here’s the update.

Macro-based summoner: imp

Chris Birchall shared this interesting macro-based project named imp by Erik Osheim and I ran the same analysis with it.

First of all, the summoner was changed accordingly using the summon macro.

    object Console {
      import imp.summon
      import language.experimental.macros

      def apply[F[_]: Console]: Console[F] = macro summon[Console[F]]
    }

And here are the results of the benchmarks, running 20 iterations like before:

    sbt> jmh:run -i 20 -wi 5 -f1 -t1
    [info] Benchmark              Mode  Cnt      Score     Error  Units
    [info] contextBoundSummoner  thrpt   20  14881.788 ± 626.458  ops/s
    [info] evidenceSummoner      thrpt   20  15039.118 ± 411.016  ops/s

The macro-based summoner removes the overhead, as it claims to. The scores are now within each other's error margins, and that’s because the JVM bytecode generated for both methods is exactly the same! Indeed, running the benchmarks more times gives similar results, and sometimes the winner is the classic evidenceSummoner. So we can safely claim that, with the macro, both methods p1 and p2 are exactly the same for the JVM.

FWIW someone else has run benchmarks on imp before. They’re slightly different though.

@inline final

Pavel Khamutou suggested adding the @inline annotation to the summoner, and I have also made it final, so here’s how it looks:

    object Console {
      @inline final def apply[F[_]](implicit ev: Console[F]): Console[F] = ev
    }

Unfortunately the generated JVM bytecode was the same as without the annotation, so the benchmark results were very similar to the first ones.

-opt:l:inline & -opt-inline-from:** compiler flags

Kaidax suggested turning on the inliner compiler flags as described in this Lightbend blog post. At first I didn’t see any change, but /u/zzyzzyxx pointed out on Reddit that I was doing it wrong (thanks!). I tried once again and this time the bytecode did change.
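For reference, one way to pass these two flags in an sbt build is sketched below. This is an assumed `build.sbt` fragment for a Scala 2.12+ project, not necessarily how the benchmark repository sets them:

```scala
// build.sbt (fragment, assumed layout)
scalacOptions ++= Seq(
  "-opt:l:inline",      // enable the optimizer's inliner
  "-opt-inline-from:**" // allow inlining from any class on the classpath
)
```

The `@inline` annotation on the summoner only takes effect once the inliner is enabled like this.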

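The invokevirtual calls to Console$.apply have been removed. Each one is replaced by a null check on the Console$ module (ifnonnull, aconst_null, athrow), after which the evidence parameter is loaded directly with aload_2. The getstatic on Console$.MODULE$ is still there, but it only feeds the null check.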

    public <F extends java.lang.Object> F p1(cats.Applicative<F>, com.github.gvolpe.Console<F>);
      descriptor: (Lcats/Applicative;Lcom/github/gvolpe/Console;)Ljava/lang/Object;
      flags: ACC_PUBLIC
      Code:
        stack=4, locals=3, args_size=3
           0: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           3: getstatic       #56     // Field cats/implicits$.MODULE$:Lcats/implicits$;
           6: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
           9: ifnonnull       14
          12: aconst_null
          13: athrow
          14: aload_2
          15: ldc             #63     // String a
          17: invokeinterface #69, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          22: aload_1
          23: invokevirtual   #73     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          26: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
          29: ifnonnull       34
          32: aconst_null
          33: athrow
          34: aload_2
          35: ldc             #75     // String b
          37: invokeinterface #69, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          42: invokeinterface #78, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          47: aload_1
          48: invokevirtual   #73     // Method cats/implicits$.catsSyntaxApply:(Ljava/lang/Object;Lcats/Apply;)Lcats/Apply$Ops;
          51: getstatic       #61     // Field com/github/gvolpe/Console$.MODULE$:Lcom/github/gvolpe/Console$;
          54: ifnonnull       59
          57: aconst_null
          58: athrow
          59: aload_2
          60: ldc             #80     // String c
          62: invokeinterface #69, 2  // InterfaceMethod com/github/gvolpe/Console.putStrLn:(Ljava/lang/Object;)Ljava/lang/Object;
          67: invokeinterface #78, 2  // InterfaceMethod cats/Apply$Ops.$times$greater:(Ljava/lang/Object;)Ljava/lang/Object;
          72: areturn
        StackMapTable: number_of_entries = 3
          frame_type = 255 /* full_frame */
            offset_delta = 14
            locals = [ class com/github/gvolpe/summoner$, class cats/Applicative, class com/github/gvolpe/Console ]
            stack = [ class cats/implicits$, class cats/implicits$ ]
          frame_type = 255 /* full_frame */
            offset_delta = 19
            locals = [ class com/github/gvolpe/summoner$, class cats/Applicative, class com/github/gvolpe/Console ]
            stack = [ class cats/implicits$, class cats/Apply$Ops ]
          frame_type = 88 /* same_locals_1_stack_item */
            stack = [ class cats/Apply$Ops ]
        LineNumberTable:
          line 10: 0
          line 34: 14
          line 10: 14
          line 11: 26
          line 34: 34
          line 11: 34
          line 10: 47
          line 12: 51
          line 34: 59
          line 12: 59
        LocalVariableTable:
          Start  Length  Slot  Name         Signature
              0      73     0  this         Lcom/github/gvolpe/summoner$;
              0      73     1  evidence$1   Lcats/Applicative;
              0      73     2  evidence$2   Lcom/github/gvolpe/Console;
      Signature: #49  // <F:Ljava/lang/Object;>(Lcats/Applicative<TF;>;Lcom/github/gvolpe/Console<TF;>;)TF;
      MethodParameters:
        Name         Flags
        evidence$1   final
        evidence$2   final

The benchmark results show that it has effectively been optimized:

    sbt> jmh:run -i 20 -wi 5 -f1 -t1
    [info] Benchmark              Mode  Cnt      Score     Error  Units
    [info] contextBoundSummoner  thrpt   20  16330.873 ± 462.765  ops/s
    [info] evidenceSummoner      thrpt   20  15768.175 ± 587.291  ops/s

Benchmarking machine

The benchmarks were run on an Ubuntu 18.04 LTS machine with 16 GB RAM and an Intel® Core™ i7-8550U CPU @ 1.80GHz × 8, on Oracle Java 8:

    Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

Source code

Try it out yourself: https://github.com/gvolpe/summoner-benchmarks

Conclusion #2

The conclusion remains the same. Context bound constraints are my favorite and as demonstrated have very little overhead. Using the macro-based solution is interesting but if you really care about that level of performance maybe the JVM isn’t what you’re looking for? :)

Thank you all for your amazing feedback!