by Heiko Seeberger

The Scala collections, which are part of the standard library, are known for their vast amount of high-level functional operations like map, flatMap, filter, sliding or groupBy, just to name a handful. These not only allow for high developer productivity – just imagine implementing something like groupBy yourself every time you need it – but usually also give us reasonable or even excellent performance. This proves true in particular when dealing with concurrent programming, because the default collections in Scala are immutable and using immutable objects instead of synchronization or defensive copies results in increased performance at large.

Nevertheless there are some situations when we have to pay a substantial penalty for using these nice high-level and thread-safe collections. Luckily Scala is a multi-paradigm language geared to real-world applications and hence lets us pick the right tool among several for the job at hand: In these situations, when collections and functional programming don’t give us the performance we need, we can use arrays and imperative programming.

The Use Case: Calculate Hex-Code for Bytes

Let’s take a look at the following use case: Write a function that takes an array of bytes – which is represented as an Array[Byte] in Scala source code and as a native JVM array after compilation – and returns an array of UTF-8 characters representing the concatenated hex-codes of all the bytes.

To give an example: Array[Byte](0, 1, 15, 16) should be transformed to Array[Char](0, 0, 0, 1, 0, F, 1, 0) . As each byte corresponds to two hex characters, the resulting array has twice the size of the input.

Now you might ask how this is related to collections. Well, Scala allows us to treat an array as a collection of type scala.collection.Seq , i.e. like a sequence. Therefore we can apply all these high-level functional operations to arrays, e.g.:

scala > Array ( 1 , 2 , 3 ) . map ( _ + 1 ) res0 : Array [ Int ] = Array ( 2 , 3 , 4 ) scala> Array(1, 2, 3).map(_ + 1) res0: Array[Int] = Array(2, 3, 4)

Of course we could use an existing library, e.g. Apache Commons Codec, but maybe we don’t want to depend on an external library for such a simple task or we just want to have some fun and hack some Scala ourselves.

Implementation Design

Scala allows us to add extension methods to any existing type without subclassing, simply by defining an implicit class that wraps a value of the type to be extended. By using a value class via extending AnyVal the Scala compiler is able to avoid creating instances of the wrapper and instead inline everything, so there’s negligible runtime overhead.

As we want to be able to call toHex and/or toHexString on any byte array, our implementation looks like this:

implicit class ByteArrayOps ( val bytes : Array [ Byte ] ) extends AnyVal { def toHexString : String = new String ( toHex ) def toHex : Array [ Char ] = ??? } implicit class ByteArrayOps(val bytes: Array[Byte]) extends AnyVal { def toHexString: String = new String(toHex) def toHex: Array[Char] = ??? }

We are going to provide two implementations, one using a high-level and functional approach and one using imperative programming tuned for arrays.

Benchmarking

As we are interested in the performance of the different implementations, we obviously have to run some benchmarks. For the JVM, JMH is the de-facto standard for micro-benchmarks and since my former coworker Konrad Malawski has created sbt-jmh – an sbt-plugin for JMH – we can easily run JMH benchmarks from sbt, which is the build tool of our choice. All we have to do is add the following line to the plugins definition of our sbt project, which resides under project/plugins.sbt:

addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.6")

Functional Approach

As mentioned, we can treat an array as a sequence and hence we can use the aforementioned method flatMap to transform the given Array[Byte] :

def toHex : Array [ Char ] = bytes. flatMap { byte => val high = digits ( ( byte & 0xF0 ) >>> 4 ) // digits is Array('0', '1', ..., 'E', 'F') val low = digits ( byte & 0x0F ) Array ( high, low ) } def toHex: Array[Char] = bytes.flatMap { byte => val high = digits((byte & 0xF0) >>> 4) // digits is Array('0', '1', ..., 'E', 'F') val low = digits(byte & 0x0F) Array(high, low) }

For each byte from the given input we calculate the high and low hex character simply by using a suitable bit mask, some bit shifting when needed and a lookup of the appropriate hex character. Then we return a new array consisting of the two hex characters which is the reason why we have to use flatMap instead of map .

For anybody familiar with the Scala collections, this approach should look straightforward. In any case it should be obvious that this implementation creates a lot of intermediate arrays, one for each element of the given input. With a little pondering it should also become clear that flatMap cannot preallocate a single array for the return value, but instead has to create an intermediate result for each step. Hence we expect this approach to create a lot of intermediate arrays and involve a substantial amount of copying, two factors which might negatively impact performance. But let’s wait and see what the benchmarks tell us.

Imperative Approach

Now, instead of transforming the given input, we preallocate the result – which we can do because we know its size for this special case – and use a loop to index into the input and result:

def toHex : Array [ Char ] = { val hex = Array. ofDim [ Char ] ( bytes. length * 2 ) // 2 hex chars for each byte var n = 0 while ( n < bytes. length ) { hex ( n * 2 ) = digits ( ( bytes ( n ) & 0xF0 ) >>> 4 ) hex ( n * 2 + 1 ) = digits ( bytes ( n ) & 0x0F ) n + = 1 } hex } def toHex: Array[Char] = { val hex = Array.ofDim[Char](bytes.length * 2) // 2 hex chars for each byte var n = 0 while (n < bytes.length) { hex(n * 2) = digits((bytes(n) & 0xF0) >>> 4) hex(n * 2 + 1) = digits(bytes(n) & 0x0F) n += 1 } hex }

Of course this code is harder to understand, because instead of simply declaring what needs to be done it expresses in great detail how to compute the result. On the other hand we may expect better performance, because we don’t allocate any arrays except for the final result and perform all element access via index which is known to be very fast for arrays.

Benchmark Results

Using the sbt-jmh plugin we can run some benchmarks. For a byte array of size 1.024 we get the following results:

jmh:run -wi 10 -i 10 -f 2 -t 1 ... [info] Benchmark Mode Cnt Score Error Units [info] Benchmarks.benchmarkImperative thrpt 20 544482.362 ± 4692.282 ops/s [info] Benchmarks.benchmarkNaive thrpt 20 40273.748 ± 733.762 ops/s

Of course we all know that we have to be very careful when interpreting results of micro-benchmarks. Nevertheless these results clearly show that the imperative approach is about one order of magnitude faster than the functional one, which matches our earlier assumptions.

Conclusion

We have shown that in some situations an imperative approach using arrays can be much more performant than using the functional collection API. Of course this is not to promote imperative programming, but instead to show the flexibility of Scala and the freedom to pick the right tools.

The full source code is on GitHub. As always, comments are welcome.