
Context

or

Why These Rules Matter

It is important to understand the context in which these best practices are necessary. In particular, it is critical to understand that mobile devices have severe limitations that are completely different from the world of desktop and server computing. Moreover, failure to take these constraints into account when developing applications can lead to poor performance and excessive memory consumption not only for any given application, but also for the entire device, because many apps with similar problems together create a poorly performing device overall.

Here are some of the important constraints, limitations, and realities of past, current, and foreseeable-future mobile devices:

Memory

Memory is very limited on mobile devices. While this is not true across all devices, it is certainly true on a large portion of the mobile ecosystem. This is particularly important to remember because the devices that we, the application developers, use are probably significantly faster/better/newer than the majority of devices that users either own now or will own in the future. For example, while typical reasonable devices that we might own, such as the Nexus 5, have 1GB to 2GB of memory, 512MB is a very common memory configuration both in the U.S. and in emerging markets where low-end phones are prevalent. So judging your app’s memory requirements based on systems with 2GB or more simply isn’t realistic.

It is also crucial to remember that Android runs multiple activities and services in parallel. This dynamic is critical to creating a good user experience as users switch between recent apps, because these apps don’t have to re-launch from scratch. But this means that if any of these apps consume more memory than they need to, then there will be less system memory left over for the others. When that happens, the system will evict app processes (shutting them down), forcing the user into a situation where apps are constantly re-launching when the user switches to them because they cannot stay present in the background due to memory pressure.

So overall: use as little memory as you can, because the entire system suffers if you don’t.

CPU

First of all, it’s important to state the obvious: even the highest-end mobile CPUs are significantly slower than average CPUs on desktops, and orders of magnitude slower than the CPUs in some servers (especially when you consider the constraint of typically only having one or two cores on mobile devices, compared to the massive parallelization available in cloud computing).

But even when you realize that you’re dealing with inherently slower CPUs in the mobile space, you also need to consider that most of your users are probably using slower devices than you (a typical software developer) own. The same advice applies to memory as it does to CPU performance: low-end devices abound in the world and are being sold all the time. So don’t benchmark your app’s performance on your relatively new and decent Nexus device, because chances are great that most of your users will have devices with slower processors and smaller memory configurations.

Another problem with CPU power is that even if a user’s device has a decent CPU, that processor is not always running at its maximum speed. That is, the system will throttle the CPU down whenever it can to make sure that the battery lasts as long as possible and that the system does not overheat. In general, this happens when it is not going to be noticeable by the user, like when the screen is off, or there is no user input happening, or there are no animations running. This down-clocking has two implications for applications: (1) your app may run in some situations with down-clocked CPUs, so that you are only getting a fraction of the possible speed of even a decent CPU, and (2) your app can cause the system to avoid down-clocking the CPU by doing things that are triggers for leaving the CPU processing speed maximized (for example, frequent, constant, or infinite animations should be avoided because the system will always try to run at maximum power during animations to allow for jank-free animations). You don’t want to pay for this higher CPU rate and usage with shorter battery life, so avoid work that keeps the CPU at full speed when that work isn’t needed.

GPU

GPU performance advice is similar to the CPU advice, above. But there are some additional factors to be aware of:

Uploads can be expensive: Uploading large textures (bitmaps) to the GPU can be quite expensive on any system, and the larger the bitmap, the longer the operation will take. This means that constantly thrashing bitmap-dependent graphics operations such as, well, bitmaps, paths (which are rasterized into bitmaps), and large amounts of new/different text (an issue with some non-English languages with large character sets) can cause performance problems because of the large number of textures being copied to the GPU.

Fill rate hasn’t kept pace with raw GPU performance: Often, the problem we face with GPU performance isn’t the raw performance of the hardware for drawing geometry, or even textures, but rather the sheer number of pixels to be filled on high-density devices. These devices with high-resolution screens cause a performance problem because the hardware cannot fill that many pixels within a single frame of animation. Overlapping content makes this worse: when an application redraws the same area many times because of layered content such as window backgrounds, container backgrounds, and translucent views, the result is known as overdraw.
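To make the cost of overlapping content concrete, here is a minimal sketch that estimates how many times each screen pixel is filled per frame. The class name, method name, and layer sizes are hypothetical and chosen only for illustration; real overdraw depends on what the framework can clip or skip.

```java
// Hypothetical helper: estimates average overdraw for one frame.
final class OverdrawEstimate {
    /**
     * Returns total pixels written divided by pixels on screen.
     * A value of 1.0 means every pixel is filled exactly once.
     */
    static double overdrawFactor(long screenPixels, long[] layerPixels) {
        long drawn = 0;
        for (long pixels : layerPixels) {
            drawn += pixels; // each layer fills its own area again
        }
        return (double) drawn / screenPixels;
    }
}
```

For a 1080 x 1920 screen with a full-screen window background, a full-screen container background, and a translucent view covering half the screen, `overdrawFactor` returns 2.5: the GPU fills two and a half screens' worth of pixels to show one.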

Memory == Performance

Many of the practices discussed in this guide are about more optimal memory usage. But it is important to note that memory is closely tied to runtime performance as well as other related things like battery life. This is because the more you allocate, the more the device has to do for your application. Memory allocations and collections require increased activity from the runtime. Larger heaps mean more memory for your application, thus less memory for the overall device, which leads to other activities being told to reduce their memory consumption or being killed outright, which also leads to slower overall perceived performance on the device as the user navigates between different activities that then have to be restarted. Larger heaps also lead to much longer GC pauses since a larger heap takes longer to traverse for both allocations and collections. And all of this takes more battery because the more the device has to do, the more the limited battery is used and drained.

So while it is tempting to break down the best practices in this document in terms of things that can be done to address either memory or performance concerns, the truth is that they are all closely related and should be considered techniques for writing a good Android app overall.

Low-End Devices

As mentioned in some of the sections above, part of the problem with performance is that the device you use to develop your app on is probably significantly more powerful and modern than most of the devices that your app will run on. This is due to a combination of users still having older devices as well as low-end devices still being sold. This is an important point: the problem is not that users just need to, and eventually will, get a more modern phone, but rather that many of the “modern” phones being sold today are using older/cheaper/slower hardware because manufacturers can sell these as cheaper devices. The common way to think about Moore’s Law is to imagine that we’ll all have more powerful devices in the near future. But Moore’s Law also means that it is increasingly easy to make cheaper devices, not just faster devices. Building for the traditional Moore’s Law model of more powerful systems means cutting yourself out of the huge market of cheaper, less powerful devices. So while 2GB may be common in current mainstream devices in 2015, there are still many devices being sold with 512MB out in the real world, especially in emerging market countries.

An easy way to fix this problem, at least for the application you develop, is to get ahold of the cheapest device available and use that as your main development device.

Smooth Frame Rate

The jank-free experience we shoot for on Android is sub-16 millisecond (ms) frame times. That is, applications need to be able to process input, layout, draw, and do anything else required to display the next frame on the screen within that 16 ms window. This speed allows the system to render at 60 frames per second (fps) during animations and input gestures. Animations must be able to repaint the necessary parts of the screen at 60 fps to achieve perceptibly smooth motion. Anything less than this rate is detectable by the user as slow or jerky animation. What is even more problematic is an application that can usually render in less than 16 ms, but which drops a frame now and then because it cannot consistently hit that speed; this kind of intermittent hiccup is very noticeable to the user.

Hitting a smooth frame rate means any particular frame needs to execute all rendering code (including the framework processing the rendering commands to send them to the GPU and the GPU drawing them into the buffer which is then shown on the screen) within 16 ms. This is why seemingly small performance problems, such as losing 5 ms to a garbage collection event, can be hugely significant: they severely limit the time left in the frame for the actual rendering. And the closer the application is to the 16 ms boundary, the easier it is to hit this jank when events like GCs kick in.

Note that missing the 16 ms barrier doesn’t mean that the app can take 17 ms and thus achieve a slightly lower rate like 59 fps. Buffers can only be posted to the screen at intervals of 1/60th of a second, so if your app misses the window for one frame, it will wait until the next. If it takes your app 17 ms to render a frame, this will be visible to the user as taking twice as long, or only hitting 30 fps for that frame. This causes double-jeopardy jank for that time period, where not only did the animation pause for that skipped frame, but also some of the content showed up late because it took so long to get there, causing a discontinuity of motion.
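The arithmetic behind this can be sketched in a few lines. This is a simplified model, assuming a fixed 60 Hz display and ignoring buffering details; the class and method names are hypothetical. It uses the exact interval of 1000/60 ms (about 16.7 ms) rather than the rounded 16 ms figure used in the text.

```java
// Hypothetical model of how render time maps onto 60 Hz display intervals.
final class FrameTiming {
    static final double VSYNC_INTERVAL_MS = 1000.0 / 60.0; // ~16.67 ms

    /** Number of whole display intervals a frame occupies before it can be shown. */
    static int vsyncIntervalsUsed(double renderMs) {
        return Math.max(1, (int) Math.ceil(renderMs / VSYNC_INTERVAL_MS));
    }

    /** Frame rate the user perceives for that frame. */
    static double effectiveFps(double renderMs) {
        return 60.0 / vsyncIntervalsUsed(renderMs);
    }
}
```

Under this model, a 10 ms frame occupies one interval (60 fps), while a 17 ms frame occupies two, so `effectiveFps(17.0)` is 30.0, not 59. The same applies to a frame that needs 12 ms of work plus a 5 ms GC pause.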

Runtime

There are two runtimes on Android to be aware of: Dalvik and ART. Prior to the Lollipop release, Android used the Dalvik runtime. Although ART was available as a developer option for testing purposes in KitKat, it became the only runtime in Lollipop.

Dalvik uses a Just-in-Time (JIT) compiler which is able to perform some micro-optimizations, but not nearly as many as some other JIT compilers. ART uses an Ahead-of-Time (AOT) compiler which is able to optimize more aggressively than Dalvik. However, neither of these runtimes offers the level of optimization that is common on server and desktop runtime platforms, such as method inlining and escape analysis. There is some inlining that ART performs on leaf methods, but further optimizations will only be available in future releases when the new optimizing compiler is available. Since app developers will need to support older releases of Android for some time to come, they should continue to care about the constraints of the current and previous compilers.

In general, ART performance over Dalvik is improved by anywhere from 30–200+% through several mechanisms. The compiler in ART performs additional optimizations, for example, improving interface dispatch substantially. The scope of optimization is a bit larger for ART (it has a method compiler vs. a trace-based one), and allocations are much faster. Finally, mutator/application threads contend less with garbage collection and are paused less frequently for shorter periods of time.

Garbage Collection

Garbage Collection (GC) is the process by which the runtime frees memory for objects which are no longer referenced. GC can also be the source of significant performance problems if the amount of work that the GC has to do exceeds the time that it has to do that work while allowing the application to hit a smooth frame rate.

The garbage collectors in Dalvik and ART differ substantially. One major difference is that Dalvik is not a moving collector. This means that all allocated objects will stay at the same location in the heap, which makes it harder and more time-consuming for Dalvik to find free memory for new objects, especially as the heap becomes more populated and fragmented over time and whenever there are large objects for which it must find room. Heap fragmentation can also lead to more frequent GC pauses as Dalvik attempts to clear the heap of dead objects. These GC pauses are quite expensive and can easily take 10–20 ms on fast devices under normal circumstances. It’s important to note that the duration of garbage collections is proportional to the number of objects in the heap, which is another reason to avoid allocating objects when possible.

ART brought improved garbage collection dynamics. For one thing, ART is a moving collector; it is able to compact the heap when a long pause in the application won’t impact user experience (for example, when the app is in the background and is not playing audio). Also, there is a separate heap for large objects like bitmaps, making it faster to find memory for these large objects without wading through the potentially fragmented regular heap. Pauses in ART are regularly in the realm of 2–3 ms.

Although ART has vastly improved garbage collection performance over Dalvik, it is still a concern when writing Android apps because even the small 2–3 ms pauses (not to mention the longer pauses that can come from more extreme situations) can be enough to take a frame over the 16 ms boundary, which leads to jank during rendering or animations. So while garbage collection is not as expensive on Lollipop and later, it is still something to be avoided when possible, particularly during situations like animations, where a missed frame will be noticeable to the user.
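One common way to avoid triggering collections during an animation is to allocate scratch objects once and reuse them on every frame. The sketch below is illustrative only: the class and method names are hypothetical, and the approach assumes single-threaded use where the caller reads the result before the next call.

```java
// Hypothetical per-frame helper that reuses one buffer instead of allocating.
final class ReusedPoint {
    // Allocated once, when the helper is created, not once per frame.
    private final float[] scratch = new float[2];

    /**
     * Linearly interpolates between (x0, y0) and (x1, y1) at fraction t.
     * Returns the shared scratch array; its contents are overwritten
     * by the next call, so callers must not hold on to it.
     */
    float[] lerp(float x0, float y0, float x1, float y1, float t) {
        scratch[0] = x0 + (x1 - x0) * t;
        scratch[1] = y0 + (y1 - y0) * t;
        return scratch;
    }
}
```

Calling `lerp` on every frame creates no garbage, so it gives the collector nothing new to do while the animation runs. The trade-off is that the returned array is shared state, which is why this pattern is usually kept local to the code that draws.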

UI Thread

Many of the performance/jank problems that we have seen come from doing too much on the UI thread. Android is a single-threaded UI system, where all operations that happen on views, including drawing those views, happen on the single UI thread in the activity. Anything else that happens on this same thread while the view hierarchy is trying to draw can cause jank because the thread simply doesn’t have time to do it all in the limited 16 ms it has to achieve a smooth frame rate.

In the Lollipop release, the framework introduced the “Render Thread,” which is now responsible for sending the actual rendering operations to the GPU. This thread offloads some of the processing off of the UI thread. But input, scrolling, and animations still happen on the UI thread, so that thread must still be responsive, even on this more recent release.
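The general pattern for keeping the UI thread free can be sketched in plain Java. This is not Android framework code: the two `Executor` parameters stand in for a background worker and the UI thread, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: do slow work off the UI thread, then hand back the result.
final class UiThreadOffload {
    /**
     * Runs slowWork on the background executor, then delivers its result
     * to showResult on the executor that represents the UI thread.
     */
    static <T> CompletableFuture<Void> loadThenShow(
            Supplier<T> slowWork,
            Executor background,
            Executor uiThread,
            Consumer<T> showResult) {
        return CompletableFuture
                .supplyAsync(slowWork, background)     // slow part: off the UI thread
                .thenAcceptAsync(showResult, uiThread); // fast part: back on the UI thread
    }
}
```

The point of the split is that only the short step of showing the result competes with drawing for the frame budget; the slow step runs elsewhere.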

Storage

Storage characteristics differ across the wide variety of Android devices, but storage can be both slow and limited. An application that takes up 500MB might be considered small in a desktop environment, but on a device that has only 8GB (which is common for low to medium level devices in 2015), or even 4GB with a removable SD card, to store the entire OS, all applications, and all media downloaded by the user, it’s a significant chunk. In these situations, an app this large may cause the user to evict other content to make room for it, or to uninstall it because there is simply not enough room for it.

Storage performance is also a concern. It may be tempting to think that flash memory on a mobile device behaves like a desktop SSD, but these media devices can have drastically different performance.

Also, note that external SD card storage may have significant variability in I/O performance depending on the vendor, chipset, and speed class. But apps should not prevent users from using SD cards, because doing so defeats the storage expansion that is necessary on low-end devices that come with little built-in storage.

Network

It is easy for many software developers in cities with modern infrastructure to make assumptions about the speed and capabilities of wifi and carrier networks that do not apply in many parts of the world. Rather than the LTE or 4G data that many of us may expect, or faster wifi speeds from pervasive access points at home and work, much of the world commonly operates on 2G networks and may incur heavy data charges for data transactions. This leads to two common problems that apps have:

Reliance on fast network speeds: Apps that are heavily dependent on large media sources (video, audio, images) may have no choice when those objects are requested. But if there are parts of the app experience that can avoid downloads until specifically requested, that creates a better experience when network capabilities aren’t up to the downloads.
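Deferring a download until it is specifically requested can be sketched as a small lazy holder. The class name and the `fetch` supplier are hypothetical; in a real app the supplier would wrap whatever network call the app already uses, and that call should not run on the UI thread.

```java
import java.util.function.Supplier;

// Hypothetical holder that performs a fetch only when the value is first needed.
final class DeferredDownload<T> {
    private final Supplier<T> fetch;
    private T value;
    private boolean loaded;

    DeferredDownload(Supplier<T> fetch) {
        this.fetch = fetch; // nothing is downloaded at construction time
    }

    /** Fetches on the first call and returns the cached value afterwards. */
    synchronized T get() {
        if (!loaded) {
            value = fetch.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

A screen can create one `DeferredDownload` per optional item and call `get()` only when the user asks for it, so a user on a slow or metered network never pays for content they did not open.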

Over-syncing: Just because your app wants up-to-date information, that doesn’t mean that the user needs it, or, more importantly, that doesn’t mean that the device needs to suffer the experience of all apps interacting with the network constantly. This dynamic can easily make the device work too hard as well as prevent it from going to sleep, which leads to an overall horrible battery life experience.

Every Device is a Village

It is easy to view your application as the most important application that the user will run on their device. And, in fact, it might just be that. However, the user will run other applications, including the System UI and their Home/Launcher app, even if it’s just to get to your application. The more resources that your application uses (memory, CPU, GPU, and battery), the more the rest of the applications on the device will suffer, and the worse that device will appear to the user. And the larger your process is, the more the system will be tempted to evict it from memory when it needs extra resources, which means that your application will take longer to launch when the user returns to it, making your app suffer in terms of user experience and satisfaction.

So while being a good citizen on an Android device is just a Good Thing to Do, it’s also the pragmatic thing for every application to do, because the entire device (and its user) suffers if all applications are greedy.

Tragedy of the Commons

One of the biggest problems in mobile is that of the Tragedy of the Commons. Specifically, apps will act in the interest of their specific situation and will not, by themselves, kill the performance or experience of the device. And profiling or analyzing that specific app may not flag issues as being major problems that need to be fixed by the application. But the overall effect on the device, or the platform overall, is that all of the apps are taking pieces of limited resources (e.g., CPU, memory, bandwidth), resulting in an overall user experience that suffers.

A great example of this dynamic is seen in applications syncing too often. It might make sense for your specific application to query the server over the network at some specific frequency. But if the user has more than 100 applications that are all doing this, the net result is a device which will never sleep and which will run out of battery much sooner than if all of the applications depended on a lazy, batch-oriented sync system.
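The benefit of batching can be illustrated with a toy model. This is only a sketch of the idea, not how any platform scheduler is implemented: each requested sync time is pushed back to the next shared window boundary, and the number of distinct wakeups is counted. The names are hypothetical and times are assumed to be non-negative milliseconds.

```java
import java.util.TreeSet;

// Hypothetical model of batching sync requests into shared wakeup windows.
final class BatchedSync {
    /** Rounds a requested time up to the next multiple of the window length. */
    static long alignToWindow(long requestedMs, long windowMs) {
        return ((requestedMs + windowMs - 1) / windowMs) * windowMs;
    }

    /** Counts how many distinct times the device must wake up. */
    static int wakeups(long[] requestedMs, long windowMs) {
        TreeSet<Long> distinct = new TreeSet<>();
        for (long t : requestedMs) {
            distinct.add(alignToWindow(t, windowMs));
        }
        return distinct.size();
    }
}
```

With five apps requesting syncs at 5, 20, 45, 70, and 110 seconds, a one-minute window reduces five separate wakeups to two (at 60 and 120 seconds). Each app waits a little longer for its data, but the device gets to sleep in between.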
