In my last post, I was looking at improving the performance of my planet-rendering code. One of the problems I was having was that while thing ran pretty quickly on my desktop (580ms to render the "inferno" planet, which is one of the more complex ones) I didn't really bother to try running in on an actual device until much later.

Boy, was that a mistake! To make sure I wasn't doing anything really stupid, I made a quick-n-dirty Android app that just lets me quickly test the planet rendering code. You can see it in action below on a couple of different planet types (click for larger version):

Now, keep in mind these are running my Galaxy Nexus -- not a slow device by any stretch of the imagination. But check out the rendering times! While I was able to render these images in a couple of hundred milliseconds on my desktop, they were taking almost two orders of magnitude longer on the device! Over 6 seconds for the "inferno"? Crazy!

Obviously, that's completely unacceptable. What's the problem? It turns out, I rather glibly rejected the reason yesterday when I was doing my initial performance tweaks:

Every time we call setPixelColour or do some interpolation between colours or whatever, it's generating a new piece of garbage. What if we just make the Colour class a bunch of static methods that work directly on int "argb" values? Well I tried that, and it turns out it makes no difference whatsoever.

Of course, what I failed to realise is that it makes no difference on the desktop JVM. I should've known better and actually tried it on a device first.

Always Measure

The first question: how do I know it's because of garbage collection? Mostly it's because every time I generate an image, LogCat would spew out a bunch of lines like this:

dalvikvm(6316): GC_CONCURRENT freed 431K, 7% free 12898K/13831K, paused 1ms+2ms dalvikvm(6316): GC_CONCURRENT freed 381K, 7% free 12928K/13831K, paused 1ms+2ms dalvikvm(6316): GC_CONCURRENT freed 433K, 7% free 12936K/13831K, paused 1ms+3ms dalvikvm(6316): GC_CONCURRENT freed 121K, 5% free 13211K/13831K, paused 1ms+2ms

It was spewing out around 10-15 of these per image. That's 10-15 times the garbage collector would run to generate one image. The next thing I tried was running the app under ddms , which has some pretty neat heap-tracking features. I could look at all the objects allocated and see where the real problems were.

At the end of the day, though, I guess I didn't know unless I tried. I had some time today, so I decided have a go at reducing the garbage I made, and see what kind of difference it makes.

A quick look at the output from ddms shows I was allocating tens of thousands of Colour , Vector2 and Vector3 objects per image. If I could reduce the number of allocations, I reckon the garbage collections would drop and performance would greatly improve.

Avoid creating unnecessary objects

The most simple change was made to some of my interfaces that were unnecessarily creating objects. For example, I changed this:

public Vector3 normalized() { double s = 1.0 / length(); return new Vector3(x * s, y * s, z * s); }

To this:

public void normalize() { double s = 1.0 / length(); scale(s); }

This doesn't allocate a new Vector3 object, I just need to be careful where I don't want the original object to be modified to create a copy first (doing that means I can more easily manage the lifetime of that copy, though).

Another one that you see come up quite a bit is this kind of pattern:

List<Vector3> points = // some list of Vector3 objects for (Vector3 v : points) { // do stuff }

This causes an allocation of an ArrayListIterator object every time you execute that loop. If you're looping multiple times per pixel in your image, then that's quite a bit of garbage. The solution is to use a "traditional" loop:

List<Vector3> points = // some list of Vector3 objects int numPoints = points.size(); for (int i = 0; i < numPoints; i++) { Vector3 v = points.get(i); // do stuff }

Another place I was allocating unnecessary objects was in my Vector2.hashCode() implementation, which looked like this:

@Override public int hashCode() { return new Double(x).hashCode() ^ new Double(y).hashCode(); }

This one took a bit of thought, but I eventually figured out the solution by looking at the Android source code for Double.hashCode() and just implementing it directly:

@Override public int hashCode() { long lx = Double.doubleToRawLongBits(x); long ly = Double.doubleToRawLongBits(y); return (int) (lx ^ ly); }

Object Pooling - not just for database connections

The canonical example of when to use object pooling is database connections. Expensive to create, easy to re-use. Most advice you'll find on object pooling will specifically tell you that it's not worth it for small objects that are used only for a short time - the garbage collector is designed for that sort of usage anyway.

But in memory-constrained devices like Android, where garbage collections are much more frequent (desktop JVMs are much less memory-constrained and can let garbage build up until a lot more - interfering with the running of your program less) avoiding any kind of allocation at all is important.

So here's the (very simple) ObjectPool class that I came up with:

public class ObjectPool<T extends ObjectPool.Pooled> { private PooledCreator mCreator; private Pooled[] mObjects; private int mFreeIndex; public ObjectPool(int poolSize, PooledCreator creator) { mCreator = creator; mObjects = new Pooled[poolSize]; mFreeIndex = -1; } public T borrow() { if (mFreeIndex < 0) { return (T) mCreator.create(); } return (T) mObjects[mFreeIndex--]; } public void release(Pooled obj) { if (obj == null) { return; } if (mFreeIndex >= mObjects.length - 1) { return; } mObjects[++mFreeIndex] = obj; } public interface Pooled { } public interface PooledCreator { Pooled create(); } }

We allocate a fixed-size array at startup, then, when you request a new object, it'll check if one is available from the "pool" and if not it'll just allocate a new one. When you "release" an object, it'll go back on the free pool (unless there's no room left in the pool).

The nice thing about this is that it simply falls back to letting the garbage collector handle the object if you forget to release it, or you fill up the pool.

The class is not thread-safe, but I plan to generate all planet images on a single worker thread anyway. Adding thread-safety is not a huge task, but we're going for performance here after all.

Adding this to my Vector2 , Vector3 and Colour classes is easy:

public class Vector3 implements ObjectPool.Pooled { public static ObjectPool<Vector3> pool = new ObjectPool<Vector3>(250, new Vector3Creator()); public double x; public double y; public double z; private Vector3() { } . . . } Vector3 v = Vector3.pool.borrow(); // do stuff... Vector3.pool.release(v);

The number of changes throughout my code was actually not all that much. The main difficulty came from ensuring that every time I got an object out of the pool, I made sure I returned it (as I mentioned, not doing so doesn't actually cause a problem, you just defeat the purpose of pooling). The other thing that caused me a bit of trouble was making sure I didn't release an object to the pool more than once (because that would cause all sorts of problems when it came time to remove it and you found it was affecting random other instances of the class somewhere else!)

The ddms tool came in handy here, because I could look at every place an object was allocated but not freed.

The Results

And the verdict? I think the results speak for themselves:

The performance now is actually comparable to the performance I was getting on the desktop! A 4½ times speed-up!

It's still not perfect. There are a few more places I could reduce the generated garbage, but this was very big bang for very little buck, so I'm quite happy with the overall result.