Be careful when passing functions around as parameters. You may be sacrificing performance for convenience.

I am a Java programmer who has decided to make the switch to Scala. One of the first things I do when learning a new language is to take existing code and convert it to the new language. This way I have a reference implementation to compare the new code against. I try to compare things like code size and performance to see how the new language stacks up.

I chose to convert an implementation of the greedy 2-Opt algorithm from Java to Scala and see how it performs. 2-Opt is a local search heuristic that shortens a path through a set of vertices by repeatedly swapping pairs of edges whenever the swap lowers the total cost. This code is CPU intensive, so it should provide a good performance comparison. My first Scala implementation ran around 1.75x slower than the Java implementation. I was surprised at this result. I knew my code was correct because the results were identical to those of the Java code. So I decided to break down my implementation to see if I could identify the problem.

My Scala implementation made use of many of the features Scala offers around functions. I decided to rewrite the Scala code so that it was as close to the Java code as possible. When I did this, the running times of the Java code and the Scala code were within 25 milliseconds of each other. At this point I decided to dig a little deeper to identify the exact Scala construct that was causing the delay, and to write a test program that highlighted the problem.

It turned out to be how the cost function was being passed into the 2-Opt algorithm. In graph theory, the cost function determines the cost or weight of taking the path between two vertices. The cost function needs to be passed in because it may change depending on the types of vertices or the space that they are in. I chose to pass the function in as a parameter because it allowed me to create all my cost functions in one class and pass in the one I needed, whereas in Java I would have needed a cost interface and then implemented each cost function as its own class.
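To make the convenience concrete, here is a minimal sketch of that style. It is not the code from my test program: `Vertex`, `Costs`, `manhattan` and `pathCost` are hypothetical names made up for illustration, and `pathCost` is only a stand-in for the real algorithm.

```scala
import scala.math.{abs, round, sqrt}

object CostDemo {
  // Hypothetical Vertex; assumed here to have integer coordinates.
  final case class Vertex(x: Int, y: Int)

  // All cost functions live together in one place.
  object Costs {
    // Rounded Euclidean distance in two dimensions.
    def euclidean2D(v1: Vertex, v2: Vertex): Int = {
      val xd = v2.x - v1.x
      val yd = v2.y - v1.y
      round(sqrt((xd * xd + yd * yd).toDouble)).toInt
    }

    // Hypothetical second cost function, added only for illustration.
    def manhattan(v1: Vertex, v2: Vertex): Int =
      abs(v2.x - v1.x) + abs(v2.y - v1.y)
  }

  // Stand-in for an algorithm that receives the cost function as a parameter:
  // it sums the cost of each consecutive pair of vertices.
  def pathCost(v: Array[Vertex], costFunc: (Vertex, Vertex) => Int): Int =
    v.zip(v.drop(1)).map { case (a, b) => costFunc(a, b) }.sum
}
```

The caller then picks the cost function at the call site, for example `CostDemo.pathCost(verts, CostDemo.Costs.euclidean2D)`, without declaring an interface or a class per cost function.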

I originally passed the function directly into the 2-Opt with the following signature:

def twoOptCostFunction(v:Array[Vertex], costFunc:(Vertex, Vertex)=>Int) : Array[Vertex]

It takes a function that accepts two Vertex arguments and returns an Int. With this signature the code ran twice as slow as with this one:

def twoOptCostClass(v:Array[Vertex], costClass:CostClass) : Array[Vertex]

The two methods were identical except for the signature and the way the cost function was called; the cost function itself was the same.

Here is the cost class:

import scala.math.{round, sqrt}

class CostClass {
  // Euclidean2D cost function
  def cost(v1:Vertex, v2:Vertex):Int = {
    val xd = v2.x - v1.x
    val yd = v2.y - v1.y
    round(sqrt(xd*xd + yd*yd)).toInt
  }
}

Here is how the methods are called:

twoOptCostClass(verts1, costClass)
twoOptCostFunction(verts2, costClass.cost)

Here is how the cost function is called in each method, first in twoOptCostClass and then in twoOptCostFunction:

if (costClass.cost(v(v1),v(v2))+costClass.cost(v(v3),v(v4)) > costClass.cost(v(v1),v(v3))+costClass.cost(v(v2),v(v4))){

if (costFunc(v(v1),v(v2))+costFunc(v(v3),v(v4)) > costFunc(v(v1),v(v3))+costFunc(v(v2),v(v4))){
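For readers who want to try the comparison themselves, here is a rough sketch of the two call styles side by side. It is not the test program linked at the end of this post: `Vertex`, `sumViaClass`, `sumViaFunction` and `timeMillis` are hypothetical names, the loops only sum costs along the array rather than running 2-Opt, and the wall-clock timer is crude (it ignores JIT warm-up), so treat any numbers it gives with caution.

```scala
import scala.math.{round, sqrt}

object CallStyleSketch {
  // Hypothetical Vertex; assumed here to have integer coordinates.
  final case class Vertex(x: Int, y: Int)

  class CostClass {
    // Euclidean2D cost function, as in the post.
    def cost(v1: Vertex, v2: Vertex): Int = {
      val xd = v2.x - v1.x
      val yd = v2.y - v1.y
      round(sqrt((xd * xd + yd * yd).toDouble)).toInt
    }
  }

  // Style 1: the cost is a method called directly on the class.
  def sumViaClass(v: Array[Vertex], costClass: CostClass): Long = {
    var total = 0L
    var i = 0
    while (i < v.length - 1) {
      total += costClass.cost(v(i), v(i + 1))
      i += 1
    }
    total
  }

  // Style 2: the cost is a function value, called through Function2.
  def sumViaFunction(v: Array[Vertex], costFunc: (Vertex, Vertex) => Int): Long = {
    var total = 0L
    var i = 0
    while (i < v.length - 1) {
      total += costFunc(v(i), v(i + 1))
      i += 1
    }
    total
  }

  // Crude timer: returns the result and the elapsed milliseconds.
  def timeMillis[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }
}
```

A run would wrap each call, for example `timeMillis(sumViaClass(verts, costClass))` and `timeMillis(sumViaFunction(verts, costClass.cost))`, repeat both several times on a large array, and check that the two results agree before comparing the times.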

The net result of my findings is a bug filed in scala-trac, a reluctance to assign functions to vals, and a reluctance to make the switch to Scala. I am waiting to see what the Scala compiler team does with the bug. From some notes added to the bug, it seems that if the cost function's parameters had been primitives it would have performed better. This is somewhat limiting to me, and in general I would not use this feature at all if the performance is lagging.
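One way to act on that note, sketched below, would be to have the cost function take vertex indices instead of Vertex objects. This is my own assumption about how the suggestion could be applied, not something taken from the bug report, and I have not measured it. The names `euclideanByIndex` and `swapImproves` and the parallel coordinate arrays are hypothetical. The idea relies on the standard library specializing `Function2` for `Int`, `Long` and `Double` arguments.

```scala
import scala.math.{round, sqrt}

object PrimitiveCostSketch {
  // Hypothetical layout: vertices stored as parallel coordinate arrays.
  // The returned cost function takes two vertex indices (primitive Ints).
  def euclideanByIndex(xs: Array[Int], ys: Array[Int]): (Int, Int) => Int =
    (i: Int, j: Int) => {
      val xd = xs(j) - xs(i)
      val yd = ys(j) - ys(i)
      round(sqrt((xd * xd + yd * yd).toDouble)).toInt
    }

  // The same comparison as in the post, with costFunc taking indices.
  def swapImproves(costFunc: (Int, Int) => Int,
                   v1: Int, v2: Int, v3: Int, v4: Int): Boolean =
    costFunc(v1, v2) + costFunc(v3, v4) > costFunc(v1, v3) + costFunc(v2, v4)
}
```

The trade-off is that the cost function is now tied to one particular vertex representation, which is exactly the kind of limitation mentioned above.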

Here is the bug in scala-trac: https://issues.scala-lang.org/browse/SI-4920

Here is the source code for my test program: FuncReferenceTest.scala

This entry was posted on Thursday, August 18th, 2011 at 2:38 pm and is filed under java, performance, scala.