Testing Performance of Mobile Apps - Part 1: How Fast Can Angry Birds Run?

Newman Yang, XBOSoft

Introduction

Mobile computing is the next step in the evolution of computing where mobile devices become the client device rather than a PC. With new development technologies and platforms, the cost of entry in building a software application, especially a mobile application, is much lower than it was 5 or even 2 years ago. This can be coupled with SaaS based business models that shift user expectations and increase their demands for usability and performance as they can easily download someone else's application for a free trial if they are not happy with yours. So with more competition and more picky users, quality is a main concern for both new entrants to the market, and those that have existing users and are giving access to their software via a mobile device.

For mobile quality assurance and testing, one of the most critical concerns is mobile user experience, and a primary component of the user experience is the performance of the application. Evaluating and testing the performance of a mobile application is not as straightforward as evaluating and testing the performance of traditional web-based solutions because there are several additional variables, such as application structure (browser versus native), network used (2G, 3G, 4G, etc.), and payload structure. When we do mobile performance testing, we like to systematically decompose the tests as follows:

Client application performance:

This system component has two variables: browser versus native application, coupled with the device's own hardware and software configuration. For the device configuration, we consider the normal variations of model, processor, and memory, and the application's usage of those resources when executing typical user scenarios. Similar to comparing a client-server application with a browser-server application, for mobile we still consider a native application a client if it needs remote access to a server application. Some native applications, for example a dictionary or a solitaire card game, are totally standalone. But these days, many native applications reside on the mobile device and still communicate readily with a server application. The native application is sometimes used for better presentation, security, and more flexible configuration as opposed to a browser based application.

Mobile browser-based application performance is usually heavily dependent on network and server application performance. The performance is usually slower as a result and leads to a reduced user experience. In addition, some browsers may have higher performance than others as there is no standard. Your server application also needs to be able to recognize the device/browser combination in order to render properly.

Network performance

The application may behave differently on different networks because network protocols affect throughput and delay. Our clients often want us to test on different networks and in different countries because carriers sometimes add overhead to data transmission and network latency can vary. Perceived latency also depends on the application: how efficient its transmission methods and algorithms are, and how much data is transmitted (often referred to as the payload). For a native mobile application, the user's perception of performance can be improved depending on how much of the application and its data resides on the local device versus the server application. In a way, we have gone back to the client-server paradigm, where we want to store more data on the device to reduce network delay while the local computing device has limited capacity. This is changing with newer 4G LTE networks and the most recent devices with dual processors and large amounts of RAM, but 4G still has limited coverage, and the latest devices can cost over $600 depending on what subsidies you get through carrier service packages.

Server performance

Whether the application is browser based or native, there will still be many computing cycles and data transactions server-side, whether cloud-based or web-server based with various infrastructure options. Examining the server performance is similar to measuring website or webapp performance, where we need to decompose our analysis into the components of the server that provide the services, including the database, application server, and associated hardware. Each of these components has many variables that can result in numerous permutations. In addition, each permutation involves interaction between its components, which could significantly impact performance yet is sometimes unpredictable and dependent on the situation and context. Given this, there are many performance testing tools that try to account for these variables, but no tool can solve all parts of the equation, and most tools are specialized towards a specific platform and testing aspect, e.g. iPhone native app performance.

Mobile Performance Testing for Local Applications

Given the complex nature of mobile application performance testing, in this article we'll address mobile client application performance for a native application (versus browser based, see above). We decided to test a local native application because native applications primarily use local resources and the results will be more dependable and controlled whereas browser based applications can be heavily dependent on the server side application and network. We'll discuss some of the tools used for local device and application performance testing, laying out the considerations made, and the results along the way. This article is the first of a series. Future articles will discuss mobile performance testing from the network and server application perspectives.

As with any testing, we need to have a clear objective, structured approach, and parameters defined before we get started. At minimum, we need the following:

Parameter | Possible Values
Objective | Discover performance based on different hardware/software configurations with the same application for a defined task; discover which platform meets a set performance objective with the least cost; determine which platform uses the least resources
Type of application | Game; Ecommerce; Banking; Information
Point of view | Gamer; Shopper; Developer; Tester; Buyer
Defined task | Buy a product; look up a ticker symbol; obtain a bank balance, transfer money amongst accounts; kill a man with spear; start an application; download one page of information; start up the game

Table 1. Determining Test Objectives and Parameters
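The parameters in Table 1 can be written down in a small structure before any test is run, which makes it harder to start testing without a defined objective. The following Python sketch is only an illustration: the `PerfTestPlan` class, its field names, and the `is_complete` check are hypothetical and are not part of any tool discussed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfTestPlan:
    """Hypothetical record of the four parameters listed in Table 1."""
    objective: str
    application_type: str
    point_of_view: str
    defined_tasks: tuple = ()

    def is_complete(self) -> bool:
        # A plan is usable only when every parameter has a non-empty value.
        return all([self.objective, self.application_type,
                    self.point_of_view, self.defined_tasks])


plan = PerfTestPlan(
    objective="Discover performance on different hardware/software "
              "configurations with the same application for a defined task",
    application_type="Game",
    point_of_view="Gamer",
    defined_tasks=("Start up the game", "Throw the bird a far distance"),
)
print(plan.is_complete())
```

A plan that is missing any of the four parameters fails the check, which is a cheap reminder to settle the objective first.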

When we examine our goals, we can develop a more definitive objective statement such as:

For this test, our objective, from the point of view of a gamer, is to determine for the game Angry Birds (a local native mobile application for the Android platform) the impact of CPU and memory configuration on two typical tasks: starting the game, and the 2D/3D movement of throwing the bird a far distance, as shown in Figure 1.

Figure 1. Angry Birds

Based on these objectives, we begin to examine tools and what they can do for us in meeting our goals. We need this type of contextual information for our tests because certain types of applications require different types of resources. For instance, resources that are critical to the performance of a gaming application with 3D graphics such as the CPU are different than what may be critical to a language dictionary, which may need more and quicker memory for a large database. So when we design our test, we need to think carefully about what we are trying to figure out, and how we will do that.

In testing, we always verify, whether it is verification of data, defects, or anything else, and the same goes for local mobile device performance testing and the tools we use. Sometimes we get different results from different tools. Ideally, the results are the same or similar enough to support a conclusion about the performance of the device, but we still need to verify. For this reason we usually choose several tools, a minimum of two and preferably three, to verify our results. To illustrate local performance testing on three different phones, we chose Quadrant, Smartbench, and Linpack, compared their results, and analyzed them in light of our objective of investigating performance with Angry Birds.

Using Quadrant

Quadrant Advanced (made by Aurora Softworks) is a popular free tool for testing mobile device performance for the Android platform.

Figure 2. Quadrant by Aurora Softworks

After running the tests, Quadrant connects to its database and generates a comparison table that includes your device and some popular devices, which Aurora Softworks updates periodically. While the scores of the other devices can be used for comparison, we prefer to test with real devices because the particular configuration of the phones in the database is not shown. As shown in Figure 2, the horizontal axis at the top shows the total score. Note that these scores are relative, and there is no threshold for good or bad performance. In other words, a score of 1000 has no meaning on its own; it only tells us that if Phone 1 scores 1000 and Phone 2 scores 500, Phone 1 performs about twice as well as Phone 2 in that benchmark. At the bottom of the screen, Quadrant shows the details for the current device: the total score and a score for each of the following performance characteristics:

CPU - Performance of the device's central processing unit. The higher the score, the faster it can process data and instructions.

Mem - Similar to a computer's RAM, memory is used for storing data as well as for operational computations and tasks. Its performance is closely tied to that of the CPU. If the score is too low, the application may respond slowly or even crash.

I/O - Input and output. Represents the speed of reads and writes to peripherals and of bus operations.

2D and 3D - Indicates the processing speed for graphics. Most popular games, such as Angry Birds, need strong 2D and 3D performance.

Obviously, different applications have different requirements in order to perform well. For example, some games need strong 3D graphics, while other applications that don't have strong presentation or motion requirements just need a suitable CPU and enough memory. On the other hand, they may need strong processing and a lot of memory management and swapping.

Quadrant is an excellent tool when you have a limited number of devices and want to compare the performance of a few real devices with several others that you do not have. For instance, suppose you are a developer who wants to find out whether your stock trading application for Android has performance issues on some popular devices, but you do not have enough time and resources to test on many devices. You can only execute the application on the three devices on hand (Lenovo A1, HTC Desire HD, Motorola ME 860), but you want to see how the performance on other popular devices compares with these three. As mentioned earlier, these are still just ballpark comparisons because the configuration of the devices in Quadrant's comparison list is not provided.

For our tests, the three devices, with their hardware, operating system, and price (everyone is concerned with price and performance), are shown in Table 2:

Item | Lenovo A1 | HTC Desire HD | Motorola ME 860
Android Version | V2.3.4 | V2.2 | V2.2
Price | $110 | $380 | $420
CPU | Qualcomm MSM7627T 800MHz | Qualcomm MSM8255 1024MHz | Nvidia Tegra2 1024MHz
Memory | RAM 512MB | RAM 768MB | RAM 1024MB

Table 2. Price and Phone/OS tested

After executing Quadrant on all three devices, we had the following results (note that higher numbers indicate better/stronger performance):

Item | Lenovo A1 | HTC Desire HD | Motorola ME 860
Total Score | 730 | 1446 | 2054
CPU | 760 | 1963 | 2262
Memory | 911 | 1094 | 2584
I/O | 478 | 2602 | 2910
2D | 357 | 565 | 546
3D | 1145 | 1006 | 1967

Table 3. Results from Quadrant by Aurora Softworks

As seen in Table 3, the HTC Desire HD and Motorola ME 860 are much stronger than the Lenovo A1 in CPU, I/O and 2D. So we can presume that if the application has no performance issues on the Lenovo A1, it should also not have performance issues on the other devices.

Figure 3. Quadrant results for our 3 phones tested

It is also important to note the difference between the HTC and Motorola devices. In CPU, I/O, and 2D they are about the same: the Motorola is incrementally better in CPU and I/O, and the HTC is marginally ahead in 2D. In memory and 3D, however, the Motorola device scores about twice as high as the HTC device. So for a stock trading application and its typical user scenarios, we would guess that the difference would not matter very much, since stock trading is not done in 3D and memory performance is not that important. For a 3D game, however, we suspect that the Motorola device would perform much better. Also notice that the Lenovo A1 performed better than the HTC Desire HD in 3D, so although it lagged severely in the other categories, it had good 3D performance. This is worth investigating further.
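The comparison between the HTC and Motorola scores can be made explicit by dividing one set of Quadrant scores by the other. The sketch below uses the numbers from Table 3; the function name `score_ratios` is only illustrative.

```python
# Quadrant scores copied from Table 3.
quadrant = {
    "HTC Desire HD": {
        "Total": 1446, "CPU": 1963, "Memory": 1094,
        "I/O": 2602, "2D": 565, "3D": 1006,
    },
    "Motorola ME 860": {
        "Total": 2054, "CPU": 2262, "Memory": 2584,
        "I/O": 2910, "2D": 546, "3D": 1967,
    },
}


def score_ratios(a, b):
    """Return score a / score b for every category, rounded to two decimals."""
    return {category: round(a[category] / b[category], 2) for category in a}


ratios = score_ratios(quadrant["Motorola ME 860"], quadrant["HTC Desire HD"])
for category, ratio in ratios.items():
    print(f"{category}: {ratio}")
```

A ratio near 1 means the two devices are roughly equal in that category, while a ratio near 2 means the Motorola scores about twice as high. Because Quadrant scores are relative, the ratios are more informative than the raw numbers.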

Another test we often conduct is to test the same device with different operating system versions, e.g. Android 2.2 versus Android 2.3. In general, when designing tests, we like to have many parameters but vary only one at a time, as in a controlled lab experiment. Otherwise it is too difficult to ascertain what changed the performance.
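Varying only one parameter at a time can be sketched as a small generator that starts from a baseline configuration and changes exactly one value per test run. The parameter names and the baseline values below are assumptions chosen for illustration; they are not a test plan we actually executed.

```python
def one_factor_at_a_time(baseline, alternatives):
    """Yield configurations that differ from the baseline in exactly one parameter."""
    for name, values in alternatives.items():
        for value in values:
            if value != baseline[name]:
                config = dict(baseline)  # copy so the baseline is never modified
                config[name] = value
                yield config


# Hypothetical baseline: same device and application, only the OS version varies.
baseline = {
    "device": "HTC Desire HD",
    "android": "2.2",
    "app": "Angry Birds V2.1.1",
}
alternatives = {"android": ["2.2", "2.3"]}

configs = list(one_factor_at_a_time(baseline, alternatives))
for config in configs:
    print(config)
```

Any difference in results between the baseline run and a generated configuration can then be attributed to the single parameter that changed.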

For a tester, Quadrant can help determine whether performance issues are caused by the device or by the application itself. For example, since the Motorola ME 860 performs much better in 3D than both the HTC and Lenovo devices according to Quadrant, we can use this information in our analysis of the performance-related play characteristics of 3D games on the three phones.

Scenario | Lenovo A1 | HTC Desire HD | Motorola ME 860
1. Launch Angry Birds V2.1.1 | 19 seconds | 14 seconds | 10 seconds
2. Throw the bird to a far distance | 8 seconds | 6 seconds | 5 seconds

Table 4. Results for launching Angry Birds on 3 phones

As seen in Table 4, we launched Angry Birds to verify the 3D results and determine if their magnitude was correct. The Lenovo device performed at about half the level of the Motorola in our device tests, but better than we expected when compared to the HTC. We also conducted a typical task of throwing the bird and it appears the Lenovo did quite well in this situation when compared to its Quadrant test scores.
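To check whether the magnitude of the benchmark scores matches the real task, both can be expressed relative to the same reference device. The sketch below uses the Quadrant 3D scores from Table 3 and the launch times from Table 4, with the Motorola ME 860 as the reference; treating relative speed as the inverse of relative time is a simplifying assumption.

```python
# Values copied from Table 3 (Quadrant 3D score) and Table 4 (launch time in seconds).
score_3d = {"Lenovo A1": 1145, "HTC Desire HD": 1006, "Motorola ME 860": 1967}
launch_seconds = {"Lenovo A1": 19, "HTC Desire HD": 14, "Motorola ME 860": 10}

reference = "Motorola ME 860"


def relative_to_reference(device):
    """Return (relative 3D score, relative launch speed) against the reference device."""
    rel_score = score_3d[device] / score_3d[reference]
    # Speed is the inverse of time, so a shorter launch gives a larger value.
    rel_speed = launch_seconds[reference] / launch_seconds[device]
    return round(rel_score, 2), round(rel_speed, 2)


for device in score_3d:
    print(device, relative_to_reference(device))
```

Where the two relative values for a device are far apart, the benchmark score alone is a poor predictor of that task, and the task is worth timing directly.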

Using Smartbench

Smartbench is another popular tool with capabilities similar to Quadrant's. As mentioned above, we don't like to assume, so we always use more than one tool for relative validation of results on the same device. Smartbench has only two general performance measurements: productivity and games. Practically speaking, productivity relates to business applications that may involve calculations and data transfer and therefore use significant CPU and memory, while games performance relates to graphics.

Using the same devices, we found that the results are very similar to those from Quadrant. The Lenovo A1 tested weaker than the other two devices in both productivity and game performance.

Items | Lenovo A1 | HTC Desire HD | Motorola ME 860
Productivity | 383 | 1212 | 2732
Game | 909 | 1045 | 2488

Table 5. Smartbench comparison table

Figure 4. Smartbench Performance Results

Examining the test results in Figure 4 and Table 5, it's easy to see that the Motorola device is the highest performer in both Productivity and Games. What is most interesting is the size of the gap: the Motorola device scored more than twice as high as the HTC device in both categories, similar to the Memory and 3D results in the Quadrant tests.

Using Linpack

To further verify results, we often use another tool called Linpack, which measures CPU floating point performance. Single Thread shows the results of executing a single-threaded program, while Multi-Thread shows the results of executing a multi-threaded program; the two reveal different CPU performance characteristics depending on the type of application. If we only run a simple application that doesn't need multiple threads, such as composing email, the CPU processes it as a single thread. If we compose email and listen to music at the same time, or run a complex application with simultaneous tasks, the CPU processes the work as multiple threads.

As seen in the results in Table 6, the most important item is MFLOPS, which stands for millions of floating point operations per second. The more MFLOPS a device can execute, the better. 'Time' is the length of time taken to execute a given program, where less is better. Condensing the results into table format, we get the following:

Item | Lenovo A1 | HTC Desire HD | Motorola ME 860
Single Thread (MFLOPS) | 9.914 | 39.01 | 38.938
Single Thread Time | 8.46 | 2.15 | 2.15
Multi Thread (MFLOPS) | 8.554 | 32.511 | 51.961
Multi Thread Time | 19.72 | 3.25 | 3.24

Table 6. Linpack comparison table

Figure 5. Linpack Performance Results

From the comparison table, we can see that in the single-threaded test the HTC Desire HD performs about the same as the Motorola ME 860, while the Lenovo A1 is much slower. In the multi-threaded test, however, the Motorola ME 860 is better than the HTC Desire HD: 52.0 versus 32.5 MFLOPS, almost 60% higher. The Motorola ME 860 can therefore process multi-threaded tasks much faster than the HTC Desire HD.
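The percentage quoted above follows directly from the MFLOPS values in Table 6. As a small worked example (the helper name `percent_higher` is only illustrative):

```python
def percent_higher(a, b):
    """Return how much higher a is than b, as a percentage of b."""
    return (a - b) / b * 100


# MFLOPS values copied from Table 6.
htc = {"single": 39.01, "multi": 32.511}
motorola = {"single": 38.938, "multi": 51.961}

multi_gap = percent_higher(motorola["multi"], htc["multi"])
single_gap = percent_higher(motorola["single"], htc["single"])

print(f"Multi thread:  Motorola is {multi_gap:.1f}% higher than HTC")
print(f"Single thread: Motorola is {single_gap:.1f}% higher than HTC")
```

The multi-thread gap comes out just under 60%, while the single-thread gap is a fraction of a percent in the HTC's favor, which is why the two devices are treated as equal for single-threaded work.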

In real life, this means that when running a complex application (such as Angry Birds) or executing many applications simultaneously (such as doing email and SMS, running a game in the background, and playing music), the Motorola ME 860 will be much faster than the HTC Desire HD. But for some simple applications, such as a 2D game, the Motorola ME 860 will be no faster than the HTC Desire HD. So this gets back to the user scenarios and profiles under which we design our tests. We need to think carefully about what the user will really do, not in a lab environment, but in real life. Will they do only one thing at a time? Probably not.

To examine and verify this, we decided to check if Angry Birds can run smoothly while playing music (playing one mp3) at the same time. We also recorded performance for deleting 100 SMS messages simultaneously.

Scenarios | Lenovo A1 | HTC Desire HD | Motorola ME 860
Play Angry Birds V2.1.1 and play music at the same time | Slight discontinuity but tolerable | Continuous | Continuous
Play Angry Birds and delete 100 SMS simultaneously | 4 seconds | 3 seconds | 2 seconds

Table 7. Executing real life scenarios

Besides the above example scenarios, we can also do different verification tests such as calling, playing other games, etc. More than likely, the results will tell us that the Lenovo A1's performance is worse than the other two devices. But, getting back to our price in Table 2, the Lenovo still can provide users with the ability to play Angry Birds while doing other basic phone functions. So, for those with a limited budget, the Lenovo can meet their needs.
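The budget argument can be roughed out by relating the Quadrant total score from Table 3 to the price from Table 2. Score per dollar is not a metric reported by any of the tools; it is an assumption made here purely for illustration, and because Quadrant scores are relative it should be read as a ballpark indicator only.

```python
# Prices from Table 2 and Quadrant total scores from Table 3.
devices = {
    "Lenovo A1": {"price_usd": 110, "total_score": 730},
    "HTC Desire HD": {"price_usd": 380, "total_score": 1446},
    "Motorola ME 860": {"price_usd": 420, "total_score": 2054},
}


def score_per_dollar(device):
    """Return Quadrant total score points per US dollar of purchase price."""
    return device["total_score"] / device["price_usd"]


# Sort device names from best to worst value for money.
ranked = sorted(devices, key=lambda name: score_per_dollar(devices[name]), reverse=True)
for name in ranked:
    print(f"{name}: {score_per_dollar(devices[name]):.2f} points per dollar")
```

On this rough measure the cheapest phone comes out ahead, which is consistent with the observation that it can still meet the needs of users on a limited budget.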

Summary

In summary, when conducting mobile performance tests, you need to narrow down your scope and objective. Sometimes your objective may shift slightly as tests are executed and you adapt based on your results. In all cases, you should run the same tests on the same devices with different tools in order to validate your results. Variables in your tests can include:

Operating system

Device Model

CPU

RAM

When you get the results, you need to examine them carefully in order to uncover any inconsistencies, and then run more tests to verify and reason why these inconsistencies occurred and under what situations. You also need to cross-verify with multiple tools in order to weed out any inconsistent performance characteristics. Choice of the tool is not as important as the analysis and achieving of your objective. Of course if you have no objective, you'll most likely not achieve it.
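One simple way to cross-verify tools is to compare the device ranking each measurement produces and flag any measurement that disagrees with the majority. The sketch below applies this to the results from Tables 3, 5, and 6. Comparing rankings rather than raw scores is a choice made for this example, since the tools use different scales.

```python
from collections import Counter

# Scores copied from Table 3 (Quadrant), Table 5 (Smartbench), and Table 6 (Linpack).
results = {
    "Quadrant total": {
        "Lenovo A1": 730, "HTC Desire HD": 1446, "Motorola ME 860": 2054},
    "Smartbench productivity": {
        "Lenovo A1": 383, "HTC Desire HD": 1212, "Motorola ME 860": 2732},
    "Smartbench games": {
        "Lenovo A1": 909, "HTC Desire HD": 1045, "Motorola ME 860": 2488},
    "Linpack single thread": {
        "Lenovo A1": 9.914, "HTC Desire HD": 39.01, "Motorola ME 860": 38.938},
    "Linpack multi thread": {
        "Lenovo A1": 8.554, "HTC Desire HD": 32.511, "Motorola ME 860": 51.961},
}


def ranking(scores):
    """Return device names ordered from highest to lowest score."""
    return tuple(sorted(scores, key=scores.get, reverse=True))


rankings = {tool: ranking(scores) for tool, scores in results.items()}

# The consensus is the ranking produced by the largest number of measurements.
consensus, _ = Counter(rankings.values()).most_common(1)[0]
outliers = [tool for tool, order in rankings.items() if order != consensus]

print("Consensus ranking:", consensus)
print("Measurements that disagree:", outliers)
```

Any measurement listed as disagreeing is a candidate for a follow-up test. A disagreement does not mean the tool is wrong; it may reflect a real difference between workloads, as with single-threaded versus multi-threaded execution.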

Getting back to the bigger picture, the performance of the device and the application are tied together, and are specific to the user scenario and the user's goals. As we have shown, the Lenovo, although it performs worse than the other two phones, can still get the job done for much less. If I am a developer, and I want to test my application on phones that I think have good market penetration, I may not be shooting for the greatest performance on the expensive phones. Rather, I want to make sure my application works acceptably on less expensive phones with more market share.

Aside from the local device, if you test your application locally, and discover that it has acceptable performance, but still performs poorly in a real situation, it usually can be attributed to either network or application server performance constraints. We'll cover each of these in a future article.


This article was originally published in the Fall 2012 issue of Methods & Tools