Update: Note that this is about the old implementation and no longer applies.

There’s been a thread on Scala debate in the last day or two about breaking the current array implementation. It’s in most regards a very tedious thread, but one thing that’s emerged in the course of the thread is that people don’t really understand how Scala arrays work. This isn’t surprising – they’re a bit confusing. It looks like they must be horribly inefficient, but actually they’re only mildly inefficient and much of the time the compiler can remove even that and treat them as normal Java arrays.

This post is basically a “What’s going on here?” post. I’ll explain some of the basics of how Scala arrays work and what you can expect in terms of performance. Warning: This post will contain JVM bytecode. If that scares you… well, close your eyes while reading or something. :-)

There are essentially two distinct cases here. The case where the arrays are polymorphic (depend on a type parameter we don’t know the class of. e.g. an Array[T]) and the case where they’re monomorphic (depend on a fully realised type parameter. e.g. an Array[Int] or an Array[String]. I’m also going to count things like Array[List[T]], where the class is known even though the actual type depends on a parameter. This is technically wrong, but works for the purpose of this discussion).

The issue here is that Scala supports polymorphic arrays where the type parameter ranges across both object and value types. This is interesting, because the JVM doesn’t. Consequently it comes with some overhead in the implementation. Occasionally this is a significant performance hit.

Ok. So, Scala is boxing arrays to make them appear polymorphic. It must be generating horribly inefficient code to do so, right? Let’s take a look to see what happens:

Consider the following method:

def foo = { val bar = new Array[Int](3); bar(0) = 1; bar; }

So, we’re expecting something involving boxing ints to and from objects. Let’s use javap to look at the generated bytecode.

public int[] foo(); Code: 0: iconst_3 1: newarray int 3: astore_1 4: aload_1 5: iconst_0 6: iconst_1 7: iastore 8: aload_1 9: areturn



Oh, um. So, not much boxing huh?If you can’t read bytecode, what this is doing is basically exactly what a Java compiler using the corresponding code for int[] would generate. It’s using the correct instructions for creating and storing things into a primitive int array.

Object arrays behave similarly:

def foo = { val bar = new Array[String](3); bar(0) = "fish"; } public java.lang.String[] foo(); Code: 0: iconst_3 1: anewarray #14; //class java/lang/String 4: astore_1 5: aload_1 6: iconst_0 7: ldc #16; //String fish 9: aastore 10: aload_1 11: areturn

This correctly generates a String[] as you’d expect.

When you have a primitive array, Scala generates pretty efficient bytecode for it – it just uses a native Java primitive array. Similarly when you have an object array of a known class type.

Note that it’s specifically the classtype that has to be known. The actual type is of no importance.Here’s another example of it doing the right thing:

def foo[T] = { val bar = new Array[List[T]](3); bar(0) = Nil; bar } public scala.List[] foo(); Code: 0: iconst_3 1: anewarray #14; //class scala/List 4: astore_1 5: aload_1 6: iconst_0 7: getstatic #20; //Field scala/Nil$.MODULE$:Lscala/Nil$; 10: aastore 11: aload_1 12: areturn

Multidimensional arrays behave similarly:

def foo = { val bar = new Array[Array[Int]](10, 10); bar(0)(3) = 1; bar; } public int[][] foo(); Code: 0: ldc #13; //int 10 2: ldc #13; //int 10 4: multianewarray #15, 2; //class "[[I" 8: astore_1 9: aload_1 10: iconst_0 11: aaload 12: iconst_3 13: iconst_1 14: iastore 15: aload_1 16: areturn

So far, nothing very interesting to see here. It works exactly like you’d expect it to. Where the magic starts to happen is when you have an Array[T].

So, what happens if we use one of these things polymorphically?

def foo = { val baz = new Array[Int](3); bar(baz, 3); } def bar[T](ts : Array[T], t : T) = ts(0) = t;

Clearly we need to do something. In Java this would just work by using an Object[] under the scenes, but for Scala we can’t do that as we need to be able to pass it an array of primitives.

So what we do is we box the array. We create a standard wrapper which can take both a primitive array and an object array and present an array like API for both. This is in fact the thing you’re seeing when you see the API for scala.Array – all the methods like map, flatMap, projection, etc. belong to the BoxedArray wrapper class (or one of its superclasses).

Let’s see how this looks at the bytecode level:

public void bar(scala.runtime.BoxedArray, java.lang.Object); Code: 0: aload_1 1: iconst_0 2: aload_2 3: invokevirtual #18; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V 6: return public void foo(); Code: 0: iconst_3 1: newarray int 3: astore_1 4: aload_0 5: new #28; //class scala/runtime/BoxedIntArray 8: dup 9: aload_1 10: invokespecial #31; //Method scala/runtime/BoxedIntArray."":([I)V 13: iconst_3 14: invokestatic #37; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 17: invokevirtual #41; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V 20: return

So, the method bar just generates the expected code using the BoxedArray class. On calling this code, we first create an int[] as normal, we box it with a BoxedIntArray (an implementation of BoxedArray for wrapping an int[]) and call the method as expected. There’s a bit of boxing overhead, but it’s not too bad. The main point is that at the backend we are still using a primitive array for storage.

Ok. This is all very well, but what happens if we need to create a polymorphic array? How can I do new Array[T](3) if I don’t know what T is so don’t know whether it needs to be a primitive or an object array how can I create this?

The key observation here is that as long as we are polymorphic in the array type we must box anyway. So you never have to expose a native Java array until a point where you know what the actual type is. Until that point we use a custom boxed array implementation called BoxedAnyArray. What happens here is that it starts out using an Object[] until it knows what type it is. At the point where we request it as a Java native array we now have a concrete array type we want it as. Consequently we create an array of that type at that point, populate it from the Object[] and change the BoxedAnyArray in place to now be backed by that array instead of the Object[] it originally had. The Object[] is now discarded.

Because the BoxedAnyArray’s backing store is now the newly created (possibly primitive) array, any updates to and from the boxed version will now point to this new array. Hence the native array and the boxed array will remain in synch, even though this isn’t the array it started out with. Further, and this is important, after this point we have a native array in our hands and can treat it accordingly. Anything that was using the BoxedAnyArray continues to do so, but there is no “taint” associated with our newly created array and in particular no additional overhead.

Unfortunately this trick doesn’t come for free. BoxedAnyArray is kindof a slow beast (I think some of this can be fixed with an additional level of indirection actually, but it currently isn’t). It does a lot of checking in the apply and update methods and, additionally, they’re synchronized. While you continue to use BoxedAnyArray your performance will suffer much more than the other BoxedArray implementations.

Let’s read some more bytecode:

def newarray[T] = new Array[T](3); def foo = { val x = newarray[Int] x(0) = 3; x } public scala.runtime.BoxedArray newarray(); Code: 0: new #14; //class scala/runtime/BoxedAnyArray 3: dup 4: iconst_3 5: invokespecial #17; //Method scala/runtime/BoxedAnyArray."":(I)V 8: areturn public int[] foo(); Code: 0: getstatic #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$; 3: aload_0 4: invokevirtual #29; //Method newarray:()Lscala/runtime/BoxedArray; 7: getstatic #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class; 10: invokevirtual #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object; 13: astore_2 14: aload_2 15: instanceof #41; //class scala/runtime/BoxedArray 18: ifeq 37 21: getstatic #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$; 24: aload_2 25: checkcast #41; //class scala/runtime/BoxedArray 28: getstatic #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class; 31: invokevirtual #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object; 34: goto 38 37: aload_2 38: checkcast #43; //class "[I" 41: astore_1 42: aload_1 43: iconst_0 44: iconst_3 45: iastore 46: aload_1 47: areturn

It’s generating slightly redundant bytecode here, in that it’s testing to see if the local variable is a BoxedArray or an int[] before unboxing it despite the fact that it’s always a BoxedArray, but that’s not a big deal. The key point here is that we had a BoxedAnyArray and we end up with an int[].

Here’s an example combining the two instances of polymorphism we’ve seen so far:

def foo = { val x = newarray[Int] bar(x, 3); } public void foo(); Code: 0: getstatic #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$; 3: aload_0 4: invokevirtual #43; //Method newarray:()Lscala/runtime/BoxedArray; 7: getstatic #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class; 10: invokevirtual #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object; 13: astore_2 14: aload_2 15: instanceof #21; //class scala/runtime/BoxedArray 18: ifeq 37 21: getstatic #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$; 24: aload_2 25: checkcast #21; //class scala/runtime/BoxedArray 28: getstatic #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class; 31: invokevirtual #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object; 34: goto 38 37: aload_2 38: checkcast #55; //class "[I" 41: astore_1 42: aload_0 43: new #57; //class scala/runtime/BoxedIntArray 46: dup 47: aload_1 48: invokespecial #60; //Method scala/runtime/BoxedIntArray."":([I)V 51: iconst_3 52: invokestatic #66; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 55: invokevirtual #68; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V 58: return

Note how the BoxedAnyArray gets unboxed and reboxed into a BoxedIntArray before being passed anywhere. This is a good thing – BoxedIntArray is a much lighter weight thing.

Summary:

When used monomorphically, Scala arrays are Java arrays.

When used polymorphically, Scala arrays are scala.runtime.BoxedArray instances. There is a separate subclass of BoxedArray for each primitive type and one for object. When a monomorphic method receives a BoxedArray it will usually unbox it rather than call its apply and update methods.

There is also BoxedAnyArray. This has a fairly high overhead and is used *only* for creating polymorphic arrays. The compiler will get rid of it at the earliest possible opportunity and rebox as other forms.

So, what does this mean in terms of performance?

Firstly, if you use arrays monomorphically, they will be Java arrays and have no additional overhead associated with them (you’ll need to box them for invoking methods on Array like map, projection, etc, but this isn’t too bad). So if you want really high performance code, only use monomorphic arrays.

At the next level, you can still get reasonably high performance code if you use polymorphic arrays but only create them monomorphically. You’ll get a boxing overhead for primitives, but that’s true of any polymorphic code in Scala.

Slowest of all is when you create an array polymorphically and continue to use it polymorphically. Once you’re in monomorphic code it will regain normal performance characteristics and future uses of it will behave as if it had been instantiated monomorphically (but previously stored references to it will continue to pass through a BoxedAnyArray and have the associated overhead).

One final point of warning. Consider the following code:

class Foo[T]{ private[this] val arr = new Array[T](2); def apply(i : Int) = arr(i); def update(i : Int, t : T) = arr(i) = t; }

Note that the Array never escapes this. So there’s no way to unbox it!

Looking at the bytecode confirms:

public class arrays.Foo extends java.lang.Object implements scala.ScalaObject{ public arrays.Foo(); Code: 0: aload_0 1: invokespecial #12; //Method java/lang/Object."":()V 4: aload_0 5: new #14; //class scala/runtime/BoxedAnyArray 8: dup 9: iconst_2 10: invokespecial #17; //Method scala/runtime/BoxedAnyArray."":(I)V 13: putfield #21; //Field arr:Lscala/runtime/BoxedArray; 16: return public void update(int, java.lang.Object); Code: 0: aload_0 1: getfield #21; //Field arr:Lscala/runtime/BoxedArray; 4: iload_1 5: aload_2 6: invokevirtual #27; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V 9: return public java.lang.Object apply(int); Code: 0: aload_0 1: getfield #21; //Field arr:Lscala/runtime/BoxedArray; 4: iload_1 5: invokevirtual #38; //Method scala/runtime/BoxedArray.apply:(I)Ljava/lang/Object; 8: areturn }

The array never gets unboxed and consequently continues to use the higher overhead BoxedAnyArray forever. If you instead use an Array[AnyRef] here and cast your code will run noticably faster (which was the observation which sparked this whole discussion)