Scala: String Interpolation Performance

Performance comparison between different kinds of string concatenation/formatting in Java/Scala

UPD: Here is the Pull Request to scala-compiler with changes inspired by this post.

String concatenation is a basic building block in every modern programming language. Many different projects, especially in Web, produce a lot of strings. And it’s interesting, how this problem is solved in different languages. In our case, in Java and Scala (because it’s much easier to compare, and these I use day-to-day).

There are a lot of posts on Internet about this topic. But still, there are something interesting to say about it (I hope you won’t be bored). Yes, this is about performance and microbenchmarking, I know.

Basically, this post is about how to make robust string concatenation (with macros!) and performance comparison. You may skip this big part right to Graphs and Conclusion.

What is string concatenation?

In JVM String object is immutable. If you need to create a new String with a content of two other strings, you need to create a third one. In Java you may concatenate not only string but any objects — Java compiler will convert an object to its string representation automatically. Scala compiler also does it.

What will compiler do, when we write such code in Java:

int a = 1;

String b = "s";

String str = a + b;

This code will be converted to this code:

int a = 1;

String b = "s";

String str = new StringBuilder()

.append(a)

.append(b)

.toString();

Nothing new, it’s well-known thing. But, it’s important to understand, what is happening there:

new StringBuilder creates a class with a char array of 16 elements. append method converts an argument to chars and stores into an internal array. If an array is not big enough, it will create a bigger array. toString creates a String object, but char array is not shared between StringBuilder and this new String object (because it’s mutable in StringBuilder).

So, in order to concatenate 2 objects, we need to create an intermediate object, big enough to store the string representation of our 2 objects.

How to improve default concatenation

We have 2 heavy operations from the list above: allocation of a buffer in StringBuilder (and reallocations!) and a creation of String (making copy of StringBuilder’s buffer).

Let’s put some efforts to deal with it. Because I had a lot of experience in .NET, I know, how string concatenation is implemented there.

What’s about C#.NET?

In C# the same problem solved in a slightly different way. Actually, many problems in .NET solved slightly differently.

This concatenation will be converted by the compiler to call to static method Concat (yes, in .NET they use CamelCase instead of camelCase in Java, yet another little difference):

String str = String.Concat(a, b);

And inside Concat we may see, how it works:

public static String Concat(String str0, String str1) {

if (IsNullOrEmpty(str0)) {

if (IsNullOrEmpty(str1)) {

return String.Empty;

}

return str1;

} if (IsNullOrEmpty(str1)) {

return str0;

} int str0Length = str0.Length;

String result = FastAllocateString(str0.Length + str1.Length);

FillStringChecked(result, 0, str0);

FillStringChecked(result, str0Length, str1);

return result;

}

What is interesting here is resulting string length precomputation. They create a char buffer of resulting size, fill it and use in String object without copying. Obviously, this should work much faster (and it’s).

Let’s try to do something similar in Scala.

Optimized string concatenation

So, what do we need is precompute a StringBuilder length to avoid any reallocations.

def concat(o1: Object, o2: Object, o3: Object,

o4: Object, o5: Object, o6: Object): String = {

concatNonNull(orNull(o1), orNull(o2), orNull(o3),

orNull(o4), orNull(o5), orNull(o6))

} private def orNull(o: Object): String =

if (o == null) "null" else o.toString private def concatNonNull(s1: String, s2: String,

s3: String, s4: String,

s5: String, s6: String): String = {

new StringBuilder(s1.length + s2.length + s3.length + s4.length + s5.length + s6.length)

.append(s1)

.append(s2)

.append(s3)

.append(s4)

.append(s5)

.append(s6)

.toString

}

This is a version for 6 arguments (we may generate as many as we want). Precalculation of a StringBuilder buffer size should give us some performance boost (we will see it later).

Fast StringBuilder.toString

Let’s take a look at StringBuilder.toString implementation:

@Override

public String toString() {

// Create a copy, don't share the array

return new String(value, 0, count);

}

And in String’s constructor we see:

public String(char value[], int offset, int count) {

// ... validations omitted

this.value = Arrays.copyOfRange(value, offset, offset+count);

}

Aha! Arrays.copyOfRange creates a new char[] and copy data there. We know for sure, that char array will be filled up and we can use it directly (we’re good ppl, we won’t modify it after string creation). In String class, there is a constructor, that don’t do copy, but… it’s package-private. Meaning, we can’t use it. Or we can?

We may try to use fast reflection (MethodHandle) to access to private constructor! Looks pretty easy:

public static String fastNewString(char[] chars) throws Throwable {

Constructor<String> constructor = String.class

.getDeclaredConstructor(char[].class, boolean.class);

constructor.setAccessible(true);

// it should be cached

MethodHandle newString = MethodHandles.lookup()

.unreflectConstructor(constructor);

return (String) newString.invokeExact(chars, true);

}

Spoiler: it doesn’t work! Details below.

That’s it?

So, that’s it, no more optimizations? Of course not! What else to we do in concatenation? The toString method calls for each argument. This also could be optimized. How?

Imagine, you pass an integer there, which is primitive. In order to pass it, you need to box it and then we call Integer.toString method. What’s wrong with toString? It allocates new memory. StringBuilder has an optimization for appending primitive values, it calculates how many chars needed to print an integer (in order to reallocate internal buffer, if necessary) and then uses internal method Integer.getChars, passing internal char array there. So, there are no additional allocations for printing integer.

Having such generic method, we can’t utilize StringBuilder power to not do unnecessary allocations. But, we’re in Scala world, we may use macroses! The very good explanation is in this StackOverflow topic.

Scala String Interpolation

But before I move to macros implementation of string concatenation, let’s take a look at Scala’s way of string formatting which is called String Interpolation:

val value1 = 1

val value2 = "abc" s"start${value1}middle${value2}end"

This is an equivalent of Java of Scala simple string concatenation, but looks much better:

"start" + value1 + "middle" + value2 + "end"

Scala compiler transforms such interpolation to this code:

new StringContext("start", "middle", "end").s(value1, value2)

Oh no. Again, more allocations? Yes. At least 2: one for an array of string constants and another for an array of arguments. And also, inside method s, there is a processing of string constants (yes, on each call). Doesn’t look as highly optimal.

Macros to rule them all

We may create a macros, that will generate another code for string interpolation. We still can use this neat syntax, but do whatever we want to concatenate these string constants and arguments.

For example, we can precalculate length of StringBuilder, and then append arguments there and be sure, that there won’t be any other allocation.

The code of macros is relatively small (around 100 lines), but it’s hard to demonstrate it well. I will explain what it does. This code:

def value1: Int

def value2: Object sfi"a${value1}b${value2}c"

Will be transformed to this code:

val local_2 = {

val tmp = value2

if (tmp eq null) "null" else tmp.toString

}

new java.lang.StringBuilder(12 + local_2.length)

.append("a")

.append(value1)

.append("b")

.append(local_2)

.append("c")

.toString

Important parts of generated code:

sfi stands for “super fast interpolation” :) Precalculated length constant — 12 — consists of 3 characters for strings “a”, “b” and “c” and 9 characters allocated for int (value1). It may be not enough (for bigger int), but this is a simplification for now. We don’t box integer and don’t call toString, but using append method overload for int. It should be faster.

Obviously, macros could be much richer — we can calculate exactly length of integer (and all primitive values) to not over/under allocate buffer. We may determine whether it’s safe to not create local variables (if it’s already local variable or field) to reduce bytecode size.

But, for testing purposes, I think it’s good enough.

Let’s microbenchmark it!

Basically, I will test this case:

public static String concat(int value1, String value2,

Object nullObject) {

return value1 + "a" + value2 + "b" + value2 + nullObject;

}

With various input arguments, resulting string will be of 7, 17, 29, 75, 212, 1004 and 1006 characters long, which should cover a lot of use cases.

What to test

Besides various input arguments there will be benchmarks for different kinds of concatenation/formatting (all listed in this test):

Java concatenation, as in code fragment above;

String.format method;

java.text.MessageFormat;

slf4j message formatter (just because it’s a popular logging framework);

Scala concatenation;

s string interpolator;

f string interpolator;

raw string interpolator;

optimized concatenation 1 (with length precalculated);

optimized concatenation 2 (without copying of char array);

optimized concatenation via string interpolation + macros (soInterpolator);

super fast interpolation via macros (sfiInterpolator).

Charts and tables

As you can see, optimized concatenation works better on strings longer than 16 characters. String interpolations, based on macros (so and sfi), are the best. Scala’s interpolations (s, f and raw) are slow, its performance lower than regular string concatenation and approximately the same as slf4j.

The more convenient chart is here.