We have had multi-core processors in our PCs for over a decade, and today they are considered the norm. At first it was dual-core, then quad-core, and today companies like Intel and AMD offer high end desktop processors with 6 or even 8 cores. Smartphone processors have a similar history. Dual-core energy efficient processors from ARM arrived about 5 years ago, and since then we have seen the release of ARM based 4, 6 and 8 core processors. However, there is one big difference between the 6 and 8 core desktop processors from Intel and AMD and the 6 and 8 core processors based on the ARM architecture: most ARM based processors with more than 4 cores use at least two different core designs.

While there are some exceptions, in general an 8 core ARM based processor uses a system known as Heterogeneous Multi-Processing (HMP) which means that not all the cores are equal (hence Heterogeneous). In a modern 64-bit processor this would mean that a cluster of Cortex-A57 or Cortex-A72 cores would be used in conjunction with a cluster of Cortex-A53 cores. The A72 is a high performance core, while the A53 has greater energy efficiency. This arrangement is known as big.LITTLE where big processor cores (Cortex-A72) are combined with LITTLE processor cores (Cortex-A53). This is very different to the 6 or 8 core desktop processors that we see from Intel and AMD, as on the desktop power consumption isn’t as critical as it is on mobile.


When multi-core processors first came to the desktop, a lot of questions were raised about the benefits of a dual-core processor over a single core processor. Was a dual-core 1.6GHz processor “better” than a 3.2GHz single core processor? What about Windows? Could it utilize a dual-core processor to its maximum potential? What about games: aren’t they better on single-core processors? Don’t applications need to be written in a special way to use the extra cores? And so on.

Multi-processing primer

These are legitimate questions, and of course the same questions have been asked about multi-core processors in smartphones. Before we look at the question of multi-core processors and Android apps, let’s take a step back and look at multi-core technology in general.

Computers are very good at doing one thing. You want to calculate the first 100 million prime numbers? No problem, a computer can loop round and round all day crunching those numbers. But the moment you want a computer to do two things at once, like calculating those primes while running a GUI so you can also browse the web, then suddenly everything becomes a bit more difficult.

I don’t want to go too deep here, but basically there is a technique known as preemptive multi-tasking which allows the available CPU time to be split among multiple tasks. A “slice” of CPU time will be given to one task (a process) and then a slice to the next process, and so on. At the heart of operating systems like Linux, Windows, OS X, and Android is a bit of technology called a scheduler. Its job is to work out which process should receive the next slice of CPU time.
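To make the idea of time slices concrete, here is a toy round-robin simulation in Python. It is only a sketch: real schedulers in Linux or Android use priorities and far more elaborate policies, and the task names, work units, and the `round_robin` function are all invented for illustration.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Simulate time slicing on a single CPU.

    tasks:   dict mapping a task name to the units of CPU time it still needs.
    quantum: the length of one time slice.
    Returns the (task, units_run) slices in the order they were granted.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: preempt the task and send it to the back of the queue.
            queue.append((name, remaining - run))
    return timeline


if __name__ == "__main__":
    # Hypothetical workload: three tasks competing for one core.
    for name, run in round_robin({"primes": 5, "gui": 2, "browser": 3}, quantum=2):
        print(f"{name:8s} runs for {run} unit(s)")
```

No task has to finish before the next one gets a turn, which is why the GUI stays responsive while the primes are still being crunched.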

Schedulers can be written in different ways. On a server the scheduler might be tuned to give priority to tasks performing I/O (like writing to the disk, or reading from the network), whereas on a desktop the scheduler will be more concerned with keeping the GUI responsive.

When there is more than one core available the scheduler can give one process a slice of time on CPU0, while another process gets a slice of run-time on CPU1. This way a dual-core processor, together with the scheduler, can allow two things to happen at once. If you then add more cores, then more processes can run simultaneously.

You will have noticed that the scheduler is good at slicing up the CPU resources between different tasks like calculating primes, running the desktop, and using a web browser. However a single process like calculating primes can’t be split across multiple cores. Or can it?

Some tasks are sequential by nature. To make a cake you need to crack some eggs, add some flour, make the cake mix etc, and then at the end put it into the oven. You can’t put the cake tin into the oven until the cake mix is ready. So even if you had two chefs in a kitchen you can’t necessarily save time on one task. There are steps to be followed and the order can’t be broken. You can multi-task, in that while one chef is making the cake the other can prepare a salad, but tasks which have a predefined sequence can’t benefit from dual-core processors or even 12 core processors.


However not all tasks are like that. Many operations that a computer performs can be split into independent tasks. To do this the main process can create another process and farm out some of the work to it. For example, if you are using an algorithm to find prime numbers, that doesn’t rely on previous results (i.e. not a Sieve of Eratosthenes), then you could split the work in two. One process could check the first 50 million numbers and the second process could check the second 50 million. If you have a quad-core processor then you could split the work into four, and so on.
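As a rough sketch of that idea, the Python below tests each number by trial division (so no result depends on an earlier one), cuts the range into equal chunks, and hands each chunk to a separate worker process. The function names are made up for this example, and a process pool is just one of several ways to farm out the work.

```python
from concurrent.futures import ProcessPoolExecutor


def is_prime(n):
    """Trial division: every number is tested on its own, with no shared state."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes(bounds):
    """Count the primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo, hi, parts):
    """Cut [lo, hi) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def parallel_count(lo, hi, workers):
    """Give each worker process one chunk and add up the partial counts."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))
```

In a script you would call something like `parallel_count(0, 100_000, workers=4)` from under an `if __name__ == "__main__":` guard. Note that equal-sized chunks are not equal amounts of work, because larger numbers take longer to test, so the worker with the last chunk will usually finish last.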

But for that to work the program needs to be written in a special way. In other words the program needs to be designed to split the workload into smaller chunks rather than doing it in one lump. There are various programming techniques for doing this, and you might have heard expressions like “single-threaded” and “multi-threaded.” These terms broadly mean programs which are written as just one thread of execution (single-threaded, all lumped together) or as individual tasks (threads) which can be independently scheduled to get time on the CPU. In short, a single-threaded program won’t benefit from running on a multi-core processor, whereas a multi-threaded program will.

OK, we are almost there, just one more thing before we look at Android. Depending on how an operating system has been written, some actions that a program performs can be multi-threaded by nature. Often the different bits of an OS are themselves independent tasks and when your program performs some I/O or maybe draws something to the screen that action is actually carried out by another process on the system. By using what are known as “non-blocking calls” it is possible to get a level of multi-threading into a program without actually specifically creating threads.

This is an important aspect for Android. One of the system level tasks in Android’s architecture is the SurfaceFlinger. It is a core part of the way Android sends graphics to the display. It is a separate task that needs to be scheduled and given a slice of CPU time. What this means is that certain graphic operations need another process to run before they are complete.

Android

Because of processes like the SurfaceFlinger, Android benefits from multi-core processors without a specific app actually being multi-threaded by design. Also, because there are lots of things always happening in the background, like sync and widgets, Android as a whole benefits from using a multi-core processor. As you would expect, Android lets developers create multi-threaded apps. For more information on this see the Processes and Threads section in the Android documentation. There are also some multi-threaded examples from Google, and Qualcomm have an interesting article on programming Android apps for multi-core processors.

However the question still remains, are the majority of Android apps single-threaded, and as such only use one CPU core? This is an important question because if the majority of Android apps are single-threaded then you could have a smartphone with a monster multi-core processor, but in reality it will perform the same as a dual-core processor!


There seems to be some confusion about the difference between quad-core and octa-core processors. In the desktop and server world octa-core processors are built using the same core design replicated across the chip. However for the majority of ARM based octa-core processors there are high performance cores and cores with better energy efficiency. The idea is that the more energy efficient cores are used for more menial tasks while the high performance cores are used for the heavy lifting. However it is also true that all the cores can be used simultaneously, like on a desktop processor.

The key thing to remember is that an octa-core big.LITTLE processor has eight cores for power efficiency, not for performance.

Testing


It is possible to get data from Android about how much it has used each core in the processor. For those who are technically minded, the information can be found in the /proc/stat file. I wrote a tool which grabs the per core usage information from Android while an app is running. To increase the efficiency, and lessen the performance hit of the monitoring, the data is only collected while the test app is active. The analysis of the collected data is done “off-line.”
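For readers who want to try something similar, here is a minimal Python sketch of how per core usage can be derived from two snapshots of /proc/stat. It is not the tool described above. The function names and the sample numbers are invented, and it assumes the standard Linux layout where each `cpuN` line lists cumulative tick counters starting with user, nice, system, idle, iowait, irq, softirq and steal.

```python
def parse_proc_stat(text):
    """Return {core: (busy_ticks, total_ticks)} for each "cpuN" line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines such as "cpu0 ..."; skip the aggregate "cpu" line
        # and everything else (intr, ctxt, btime, ...).
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        # Counters, in order: user nice system idle iowait irq softirq steal.
        # Trailing guest counters are ignored because they are already part of user.
        ticks = [int(value) for value in fields[1:9]]
        total = sum(ticks)
        idle = sum(ticks[3:5])  # idle + iowait
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for core, (busy1, total1) in after.items():
        if core not in before:
            continue  # core missing from the first snapshot (it may have been offline)
        busy0, total0 = before[core]
        elapsed = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


# Made-up snapshots standing in for two reads of /proc/stat taken a moment apart.
SAMPLE_BEFORE = """cpu  200 0 100 700 0 0 0 0 0 0
cpu0 100 0 50 350 0 0 0 0 0 0
cpu1 100 0 50 350 0 0 0 0 0 0
intr 12345
"""
SAMPLE_AFTER = """cpu  260 0 125 815 0 0 0 0 0 0
cpu0 150 0 75 375 0 0 0 0 0 0
cpu1 110 0 50 440 0 0 0 0 0 0
intr 12400
"""

if __name__ == "__main__":
    before = parse_proc_stat(SAMPLE_BEFORE)
    after = parse_proc_stat(SAMPLE_AFTER)
    for core, pct in sorted(utilization(before, after).items()):
        print(f"{core}: {pct:.1f}% busy")
```

The counters are cumulative since boot, which is why usage has to be computed from the difference between two reads rather than from a single snapshot.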

Using this tool, which doesn’t have a name yet, I ran a series of different types of apps (gaming, web browsing etc) on a phone with a quad-core Qualcomm Snapdragon 801 processor and again on a phone with an octa-core Qualcomm Snapdragon 615 processor. I have collated the data from these test runs and with the help of Android Authority’s Robert Triggs, I have generated some graphs which show how the processor is being used.

Let’s start with an easy use case. Here is a graph of how the cores in the Snapdragon 801 are used when browsing the web using Chrome:

The graph shows how many cores are being used by Android and the web browser. It doesn’t show how much the core is being used (that comes in a moment) but it shows if the core is being utilized at all. If Chrome was single-threaded then you would expect to see one or two cores in use and maybe a blip up to 3 or 4 cores occasionally. However we don’t see that. What we see is the opposite, four cores are being used and it occasionally dips down to two. In the browsing test I didn’t spend time reading the pages that loaded, as that would have resulted in no CPU use. However I waited until the page was loaded and rendered, and then I moved on to the next page.
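Counting "cores in use" from per core utilization figures can be sketched in a few lines of Python. The threshold below is an assumption made for this example (the article does not say what cut-off separates an idle core from an active one), and the sample values are invented.

```python
def active_cores(usage, threshold=1.0):
    """Count the cores whose utilization (in percent) is above `threshold`."""
    return sum(1 for pct in usage.values() if pct > threshold)


def active_core_series(samples, threshold=1.0):
    """Apply active_cores() to every sample in a test run."""
    return [active_cores(sample, threshold) for sample in samples]


if __name__ == "__main__":
    # Hypothetical samples from a quad-core phone, one dict per scan interval.
    run = [
        {"cpu0": 40.0, "cpu1": 0.0, "cpu2": 12.5, "cpu3": 0.5},
        {"cpu0": 65.0, "cpu1": 22.0, "cpu2": 30.0, "cpu3": 8.0},
    ]
    print(active_core_series(run))
```

A graph built this way says nothing about how hard each core is working, only whether it was doing anything at all during the interval.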

Here is a graph showing how much each core was utilized. This is an averaged-out graph (as the real one is a scary scrawl of lines). This means that the peak usages are shown as less. For example the peak on this graph is just over 90%, however the raw data shows that some of the cores hit 100% multiple times during the test run. However it still gives us a good representation of what was happening.
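To see why averaging lowers the peaks, consider this small Python sketch of a trailing moving average. The window size and the sample values are invented, and the exact smoothing used for the graphs is not specified, so treat this as an illustration of the effect rather than the method.

```python
def moving_average(samples, window):
    """Average each sample with up to `window - 1` samples that came before it."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    raw = [20, 100, 30, 100, 40]  # hypothetical utilization of one core, in percent
    smooth = moving_average(raw, window=3)
    print("raw peak:     ", max(raw))
    print("smoothed peak:", round(max(smooth), 1))
```

A core that hits 100% for a single sample is averaged with its quieter neighbours, so the smoothed line never reaches the raw peak.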

So what about an octa-core? Will it show the same pattern? As you can see from the graph below, no it doesn’t. Seven cores are consistently being used with the occasional spike to 8, and a few times when it dips to 6 and 4 cores.

Also the averaged core usage graph shows that the scheduler behaved quite differently since the Snapdragon 615 is a big.LITTLE processor.

You can see that there are two or three cores which run more than the others, however all the cores are being utilized in some way or another. What we are seeing is how the big.LITTLE architecture is able to swap threads from one core to another depending on the load. Remember the extra cores are here for energy efficiency, not performance.


However I think we can safely say that it is a myth that Android apps only use one core. Of course this is to be expected since Chrome is designed to be multi-threaded, on Android as well as on PCs.

Other apps

So that was Chrome, an app that is designed to be multi-threaded. What about other apps? I ran some tests on other apps and, briefly, this is what I discovered:

Gmail – On a quad-core phone the core usage was evenly split between 2 and 4 cores. However average core utilization never went above 50% which is to be expected as this is a relatively light app. On an octa-core processor the core usage bounced between 4 and 8 cores, but with a much lower average core utilization of less than 35%.

YouTube – On a quad-core phone only 2 cores were used, and on average at less than 50% utilization. On an octa-core phone YouTube mainly used 4 cores with the occasional spike to 6, and drop to 3. However the average core utilization was just 30%. Interestingly the scheduler heavily favored the big cores and the LITTLE cores were hardly used.

Riptide GP2 – On a phone with a quad-core Qualcomm processor this game used two cores most of the time with the other two cores doing very little. However on a phone with an octa-core processor, between six and seven cores were used consistently, although most of the work was done by just three of those cores.

Templerun 2 – This game probably exhibits the single-threaded problem more than the other apps I tested. On an octa-core phone the game used between 4 and 5 cores consistently and peaked at 7 cores. However really only one core was doing all the hard work. On a quad-core Qualcomm Snapdragon 801 phone, two cores shared the work fairly evenly, and two cores did very little. On a quad-core MediaTek phone all four cores shared the workload. This highlights how a different scheduler and different core designs can drastically alter the way the CPU is used.

Here is a selection of graphs for you to peruse. I have included a graph showing the octa-core phone idle, as a base reference:

One interesting app was AnTuTu. I ran the app on the octa-core phone and this is what I saw:

As you can see, the latter part of the test completely maxes out all the CPU cores. It is clear that the benchmark is artificially creating a high workload, and since nearly all the cores are running at full speed then SoCs with more cores will score better for that part of the test. I never saw this kind of workload on any normal apps.

In one way it is the benchmarks which are artificially inflating the performance benefits of octa-core phones (rather than the power efficiency advantages). For a more comprehensive look at benchmarking check out Beware of the benchmarks, how to know what to look for.

Why are light apps using 8 cores?

If you look at an app like Gmail you will notice an interesting phenomenon. On a quad-core phone the core usage was evenly split between 2 and 4 cores, but on an octa-core phone the app used between 4 and 8 cores. How come Gmail can run on 2 to 4 cores on a quad-core phone but needs at least four cores on an octa-core phone? That doesn’t make sense!

The key again is to remember that on big.LITTLE phones not all the cores are equal. What we are actually seeing is how the scheduler uses the LITTLE cores first and then, as the workload increases, the big cores are brought into play. For a while there is a small amount of crossover and then the LITTLE cores go to sleep. Then when the workload decreases the opposite happens. Of course this is all happening very fast, thousands of times per second. Look at this graph which shows the utilization of big vs LITTLE cores during my testing of Epic Citadel:

Notice how at first the big cores are being used and the LITTLE cores are inactive. Then, at around the 12 second mark, the big cores start to be used less and the LITTLE cores spring to life. At the 20 second mark the big cores increase their activity again and the LITTLE cores go back down to almost zero usage. You can see this again at the 30 second mark, the 45 second mark, and at the 52 second mark.

At these points the number of cores being used fluctuates. For example, in the first 10 seconds only 3 or 4 cores are being used (big cores), and then at the 12 second mark the core usage peaks at 6 and then drops again to 4, and so on.
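A big vs LITTLE comparison like this boils down to averaging the per core figures within each cluster. The Python below is a sketch only: which core numbers belong to which cluster varies between SoCs, so the `BIG` and `LITTLE` tuples are assumptions, and the sample values are invented.

```python
# Assumed core-to-cluster mapping; check the real layout for a given SoC.
BIG = ("cpu0", "cpu1", "cpu2", "cpu3")
LITTLE = ("cpu4", "cpu5", "cpu6", "cpu7")


def cluster_average(sample, cluster):
    """Mean utilization (percent) of a cluster; a core with no reading counts as 0."""
    return sum(sample.get(core, 0.0) for core in cluster) / len(cluster)


if __name__ == "__main__":
    # Hypothetical sample: the big cores are busy and the LITTLE cores are nearly idle.
    sample = {"cpu0": 80.0, "cpu1": 60.0, "cpu2": 40.0, "cpu3": 20.0,
              "cpu4": 0.0, "cpu5": 0.0, "cpu6": 10.0}
    print("big:   ", cluster_average(sample, BIG))
    print("LITTLE:", cluster_average(sample, LITTLE))
```

Plotting the two averages over time gives one line per cluster, which makes the hand-over between big and LITTLE cores much easier to spot than eight separate lines.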

This is big.LITTLE in action. A big.LITTLE processor isn’t designed like the octa-core processors for PCs. The extra cores allow the scheduler to pick the right core for the right job. In all my tests I did not see any real-world apps that used all 8 cores at 100%, and that is how it should be.

Caveats and wrap-up

The first thing to underline is that these tests do not benchmark the performance of the phones. My testing only shows if Android apps run across multiple cores. The advantages or disadvantages of running over multiple cores, or running on a big.LITTLE SoC, are not covered. Neither are the benefits or drawbacks of running parts of an app on two cores at 25% utilization, rather than on one core at 50%, and so on.

Secondly, I haven’t yet had a chance to run these tests on a Cortex-A53/Cortex-A57 setup or a Cortex-A53/Cortex-A72 setup. The Qualcomm Snapdragon 615 has a quad-core 1.7GHz ARM Cortex A53 cluster and a quad-core 1.0GHz A53 cluster.

Thirdly, the scan interval for these statistics is around one third of a second (i.e. around 330 milliseconds). If a core reports its usage is 25% in that interval and another core reports its usage is 25% then the graphs will show both cores running simultaneously at 25%. However it is possible that the first core did all of its work in the first half of the interval and the second core did all of its work in the second half. This would mean that the cores were used consecutively and not simultaneously. At the moment my test setup doesn’t allow me any greater resolution.
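This limitation can be shown with a small Python sketch. It models one scan interval as a row of millisecond ticks and compares two made-up scenarios: both cores busy at the same moments, and the cores busy one after the other. The interval is set to 332 ms (rather than "around 330") purely so that the percentages come out round.

```python
INTERVAL_MS = 332  # one scan interval; chosen so that 83 ms is exactly 25%


def reported_usage(busy_ticks):
    """Utilization (percent) that a per-interval counter would report."""
    return 100.0 * len(busy_ticks) / INTERVAL_MS


def overlap_ms(core_a, core_b):
    """Milliseconds during which both cores were busy at the same time."""
    return len(core_a & core_b)


# Scenario A: both cores are busy during the same 83 ms.
simultaneous = (set(range(0, 83)), set(range(0, 83)))
# Scenario B: core 0 works in the first half, core 1 in the second half.
consecutive = (set(range(0, 83)), set(range(166, 249)))

if __name__ == "__main__":
    for label, (core0, core1) in (("simultaneous", simultaneous),
                                  ("consecutive", consecutive)):
        print(label, reported_usage(core0), reported_usage(core1),
              "overlap:", overlap_ms(core0, core1), "ms")
```

Both scenarios report 25% for each core, yet only the first one involves any genuinely parallel work, and the per-interval figures alone cannot tell them apart.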

But having said all that, clearly Android apps are able to take advantage of multi-core processors and big.LITTLE allows the scheduler to pick the best core combination for the current workload. If you still hear people saying things like “but a smartphone doesn’t need 8 cores” then just throw your hands up in despair, as it means they don’t understand Heterogeneous Multi-Processing and they don’t understand that big.LITTLE is about power efficiency and not overall performance.

Chrome - Active cores on a quad-core phone.

Chrome - Core usage on quad-core phone.

Chrome - Number of cores in use on octa-core phone.

Chrome - Core usage on octa-core phone.

AnTuTu running on an octa-core phone.