By now you've probably heard about ART and how it will improve the speed and performance of Android, but how does it actually perform today? The new Android Runtime promises to cut out a substantial amount of overhead by losing the baggage imposed by Dalvik, which sounds great, but it's still far from mature and hasn't been seriously optimized yet. I took to running a battery of benchmarks against it to find out if the new runtime could really deliver on these high expectations. ART is definitely showing some promise, but I have to warn you that you probably won't be impressed with the results you'll see here today.

Reality Check

Let's be honest, benchmarking apps tend to be inaccurate and unreliable, often giving wildly varying results even when run in precisely identical situations. However, they are the only option available for recording meaningful and measurable values on performance. Further, since most popular benchmarks are built on the NDK (Native Development Kit), they won't gain any benefit from running under ART. Despite these limitations, there are some interesting and unexpected results that help us learn a little more about the current state of performance.

How The Benchmarks Were Run

Each benchmark was run at least 4 times on a completely stock Nexus 5 (it isn't even rooted) with both Dalvik and ART. To ensure there was no interference from apps at startup, a minimum of 5 minutes was given after a reboot before any tests were run. In addition to the 6 benchmarking apps listed below, I also tried 2 browser benchmarks (SunSpider & BrowserMark) in Chrome, but neither displayed significantly different scores. So, let's get to the results.

Linpack for Android

One of the key factors in getting good test results is knowing that the tools are measuring the right thing. While many of the benchmark apps target the NDK, a few stick to the SDK. The first and most consistent among them is Linpack for Android, a port of the already popular benchmarking app used throughout numerous computing platforms. It produces a score by performing a series of calculations on floating point numbers. I think this is an obvious choice after reading the description, "This test is more a reflection of the state of the Android Dalvik Virtual Machine than of the floating point performance of the underlying processor." Thanks to ART, scores are 10%-14% higher than they would be with Dalvik. Not too shabby… (link)

Real Pi Benchmark

Calculating digits of Pi is another popular way of stressing a processor, and particularly suitable because most methods stick to integer calculations and avoid floating-point math entirely. Along with Linpack, this gives us coverage of both basic mathematical operations. On top of it, Real Pi happens to use native code to perform the AGM+FFT formula, but uses Java for Machin's formula. On the native side, ART came out about 3.5% faster, probably due to interface optimizations rather than mathematical performance. More importantly, testing with the java code turned out to be 12% faster. (link) Note: in this test, lower numbers are better.

Quadrant Standard

The previous tests are highly specific to mathematical performance, so it's time to branch out to test more of the system. Both Linpack and Real Pi show some positive improvement with ART, but Quadrant gave a result that borders on the amazing, perhaps even too good. The CPU score is off the charts for ART, almost doubling that of Dalvik, which is substantially better than even the most optimistic estimates we've heard so far... While tests for I/O, 2D, and 3D rendering show fairly negligible differences, Dalvik does take an oddly high 9% advantage in the memory test. (link)

3D Mark

I was leery of using a benchmarking app that clearly focuses on the NDK, as it theoretically shouldn't be affected very much by ART. However, as the tests were run, an interesting pattern emerged where the Dalvik runtime repeatedly held a slight advantage. It's difficult to attribute a reason for Dalvik to do better here, but I'm open to theories. (link)

AnTuTu Benchmark

Breaking performance down even further, AnTuTu helps to expose a pattern. It's increasingly clear that ART is making significant strides with floating-point operations, but doesn't usually turn out huge gains for integers. A strong showing in "RAM Operation" also hints at better use of caching as opposed to just raw memory I/O. These high scores indicate areas where the Dalvik virtual machine was probably very expensive, causing more extensive overhead. The other results weren't particularly remarkable except for the Storage I/O, which might suggest a couple of specific optimizations. One significantly low score appears for UX Dalvik, but it's not clear what AnTuTu is measuring, so this may not be particularly relevant. (link)

CF-Bench

For the ultimate in number production, Chainfire's own benchmark tool takes out a lot of the guesswork by performing tests built on both the SDK and NDK. Again, native code displays a small but curious advantage on Dalvik. Here we can see the integer calculations are swinging back towards Dalvik, as well. Mostly confirming the pattern, floating-point operations demonstrate a significant speed gain, this time in the 23%-33% range. (link)

Other Interesting Measurements

Measuring the first boot after switching runtimes isn't your typical test, no doubt, but the time it takes is quite striking. I wanted to record just how long it took to complete both the App Optimization step and then the total time to actually reach the unlock screen. When I ran this test, I had 149 apps installed.

The Other Stuff

While numbers can be helpful, they don't tell the full story. Benchmarks usually push the hardware to work as hard as possible for a few seconds, then switch to a new test that does the same thing. Sadly, this ignores details that aren't easily measured. I don't have a good way to measure the smarter timing of memory management (especially garbage collection) or better handling of multiple threads. While I can't show numbers for these things, I can demonstrate them. The classic test for a browser simply requires flinging the page as fast as possible and watching it try to keep up. After stress testing Chrome for Android with the mobile version of David's gigantic HTC One review, it turns out that even the supercharged SoC of the Nexus 5 can't quite keep up while running on Dalvik… ART, on the other hand, never lost a pixel. Take a look for yourselves.



Left: running on Dalvik, right: running on ART

To be fair, switching to the desktop version and giving a single fling will easily send you into blank screen territory, but it's still obvious that the renderer catches up faster on ART than on Dalvik. When more optimizations are in place, maybe we won't be far off from flawless scrolling even in the desktop version. For another demonstration, a user by the name of spogbiper has posted his own side-by-side comparison with two Nexus 7s. The one running ART seems to be more responsive.

https://www.youtube.com/watch?v=9CfU_-bBJ50 https://www.youtube.com/watch?v=xgnZbdO4NV0

Summary And Conclusions

The numbers and the videos together paint a picture of where ART stands today. It will definitely make a difference, but its current incarnation just hasn't matured enough to deliver significant gains. Floating-point calculations and basic responsiveness are obviously reaping the benefits of the new runtime, but that's about it. There's little or no overall improvement for integer calculations, most regular code execution, or much of anything else. In fact, it looks like gamers would be better served by sticking to Dalvik, for now.

Why aren't the benchmarks blowing us away? If I were to make a guess, it's probably because the first goal in developing ART was to make sure it was functional and stable before the heavy optimizations came into effect. If that's the case, there is probably quite a bit of code for error-checking and logging just to ensure everything is operating as it should, which might even be responsible for more overhead than we had with Dalvik. Even in the places where ART doesn't outperform Dalvik, the numbers tend to remain reasonably close. As subsequent versions of the runtime emerge from Mountain View, we should expect to see the performance gap growing wider as ART pulls ahead.

Now for the real question: is it worth switching to ART right now? Google obviously isn't recommending it for regular users, and I tend to agree. While ART seems very solid and I feel like responsiveness is better - possibly just the placebo effect - there are still circumstances where it is unstable and causes apps to crash. If there is even a single instance where you have to switch back to Dalvik to get an app to run correctly, that inconvenience far outweighs the minimal performance gain you might have had. Once I've finished this series, I will probably stick to Dalvik for the remainder of KitKat; and I imagine most people will be better served by doing the same.