Anirudh Regidi

Update: Following our report, UL Benchmarks conducted its own investigation and delisted the Oppo Find X and Oppo F7 from its 3DMark rankings. This is a particularly heavy blow for the Find X, since the phone had been listed at number 4 in the 3DMark Slingshot rankings.

Would you ever trust a company that lies to your face, a company that takes advantage of your ignorance to fool you into believing a lie? Whatever your stance on such behaviour, after three weeks of testing, we can categorically state that Huawei (and sub-brand Honor) and Oppo (and sub-brand Realme) are cheating on benchmarks, and have been for some time. These fudged figures are being used to promote and sell their phones.

Huawei and Oppo are no strangers to controversy and have been pulled up several times in the past for precisely such antics. Nor are they the first companies to be caught cheating: both Samsung and OnePlus (which is, unsurprisingly, related to Oppo) have been caught doing the same thing. Unlike Huawei and Oppo, however, Samsung and OnePlus at least had enough respect for their communities to stop.

It might seem like we’re making a big deal out of something insignificant. Who cares about benchmarks, right? But the fact that these companies are willing to lie to us, so consistently and persistently, is symptomatic of a deeper problem within these companies. If they’re lying about this, what else are they lying about? More importantly, what else will they lie about in the future?

Why we use benchmarks

This is best explained with an example. Say you’re trying to decide between the Google Pixel 2 and the Samsung Galaxy Note 9. Features aside, how does one compare the two? The phones are poles apart.

The Pixel 2 is running on Qualcomm’s Snapdragon 835 chip with 4 GB RAM and 128 GB storage. The Note 9 is running Samsung’s Exynos 9810, 6 GB RAM and 128 GB storage. One has a 5-inch FHD+ display while the other boasts of a 6.4-inch QHD+ unit.

Does Google’s use of a smaller, lower-resolution display on the Pixel 2 mean it gets better battery life despite using a smaller battery? Is Samsung’s newer Exynos platform better than last year’s Snapdragon flagship?

If two different reviewers use the phones, their experiences could be very different. I, for example, spend most of my time listening to podcasts or reading books on the Kindle app. My colleague spends his time Instagramming and Tweeting. Our experiences in relation to performance and battery life will be very different. If you were to ask us about the phones, we’d have two different experiences to describe.

This is where benchmarks come in.

Before we get into that, there’s one thing you must note: Benchmarks have never been, and never will be, an indicator of real-world performance. A good benchmark will, however, always be an indicator of relative performance.

Take the Note 9 and Pixel 2 again. Looking at our benchmark sheet, I can see that the Note 9 scored 10 hrs 42 min in the PCMark for Android Work 2.0 battery life benchmark and 837.9 in the Car Chase (on-screen) test in GFXBench. In the same tests, the Pixel 2 scored 8 hrs 51 min and 1135.

Since our testing has shown that PCMark and GFXBench are reliable metrics for battery life and gaming performance respectively, I can immediately expect the Note 9 to offer around 20 percent better battery life than the Pixel 2, while the Pixel 2 delivers roughly 35 percent better gaming performance at each phone’s native resolution.

Neither of these tests is a guarantee that you’ll see 12 hrs of battery life or 60 fps in PUBG Mobile, but both results give me a data point for reference when making recommendations.
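Those percentages fall straight out of a ratio of the raw scores. A quick sketch, using only the figures quoted above (battery life converted to minutes; the GFXBench on-screen result is a frame count):

```python
# Relative performance from the scores quoted above.
note9_battery_min = 10 * 60 + 42   # 10 hrs 42 min -> 642 min
pixel2_battery_min = 8 * 60 + 51   # 8 hrs 51 min  -> 531 min

note9_gfx = 837.9                  # GFXBench Car Chase (on-screen)
pixel2_gfx = 1135.0

# Advantage of one phone over the other, as a percentage.
battery_gain = (note9_battery_min / pixel2_battery_min - 1) * 100
gfx_gap = (pixel2_gfx / note9_gfx - 1) * 100

print(f"Note 9 battery advantage: {battery_gain:.1f}%")    # ~20.9%
print(f"Pixel 2 on-screen GFX advantage: {gfx_gap:.1f}%")  # ~35.5%
```

Note that the on-screen comparison folds in each phone’s native resolution: the Note 9 is pushing a QHD+ panel while the Pixel 2 only has to fill an FHD display, which is part of why the gap is so large.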

Most important of all, benchmarks are not meant for the average person. Unless one understands the capabilities and limitations of a benchmark, there is no point in referring to it. What does 4,000 on Geekbench 4 mean? Does it matter that an iPad Pro scores higher than a 12-inch MacBook? A score of 5,000 is better than 4,000, yes, but to most people, it’s just a number.

Reviewers care about these benchmarks because they help us better understand the limitations of a phone or device. They help inform our reviews and fine-tune our verdicts, which in turn helps our readers make intelligent buying decisions.

However, I’ve been doing this for years. It’s my job to understand what benchmarks mean. You, as the average consumer, don’t need to know or care about benchmarks. It’s great if you do, but it shouldn’t be a necessity. You don't need a PhD in aerodynamics to enjoy motorsport.

Why cheat?

Unfortunately, certain unscrupulous smartphone makers are happy to take advantage of our ignorance to try and sell us their products. A phone scoring 250,000 in AnTuTu is better than one scoring 240,000, right? You’re not an engineer. You don’t understand the nuances of chipset design and OS integration. All you see is a number and are given to understand that “bigger is better.” Why wouldn’t you take the “faster” phone?

Companies like Huawei and Oppo, the ones we know for sure are cheating on benchmarks, have advertised benchmark figures when trying to sell their phones. Oppo, for example, wants to convince us that its MediaTek P60-powered phones (like the F7) are better than Snapdragon 660-powered rivals. We now know that Oppo was cheating.

Huawei’s Kirin 970, which powers the P20 Pro and Honor 10, performs about half as well as Qualcomm’s Snapdragon 845 (found in the OnePlus 6 and Poco F1). The Snapdragon 845 in the Oppo Find X actually performs much worse than the same chip in the OnePlus 6, but because Oppo is cheating, the two appear to perform at par.

The Huawei P20 Pro and Oppo Find X are lovely phones. One set new standards for camera performance and the other is so lovely that it makes Apple’s $999 iPhones look dated. Their performance in real-world use was always more than adequate for the average user and was never a highlight of the product.

Neither company needed to cheat to sell these phones. They chose to do so anyway.

How exactly are they cheating?

Huawei and Oppo have both issued statements that they’re not targeting benchmarking apps per se. They say that they’re using ‘AI’ that intelligently scans programs and adjusts performance based on demand. This is demonstrably false.

These companies are simply targeting benchmark apps by name. If an app is called ‘AnTuTu’ or ‘3DMark’, or is named after any other targeted benchmarking tool, these phones automatically switch to an extreme performance mode. Conversely, if a benchmarking app is not named after one of these targets, the phones do not switch to this mode and perform normally.

If these phones were using ‘AI’ to intelligently manage performance, the name of the app would not matter. To top it off, the phones are driven so hard when cheating that performance reaches unsafe levels. In some cases, the phones got so hot that we couldn’t hold them. In any other situation, we’d have assumed that the phones were damaged and on the verge of blowing up.
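The behaviour we observed is consistent with a simple name-based lookup rather than any analysis of the workload itself. The sketch below is purely illustrative: the package list, function name and mode labels are hypothetical stand-ins inferred from observed behaviour, not code from any vendor’s firmware.

```python
# Illustrative sketch only: the package list and logic are hypothetical,
# inferred from observed behaviour, not taken from any vendor's firmware.
BENCHMARK_PACKAGES = {
    "com.antutu.ABenchMark",                 # AnTuTu (example entry)
    "com.futuremark.dmandroid.application",  # 3DMark (example entry)
}

def select_performance_mode(package_name: str) -> str:
    """Return the power profile applied to an app, keyed on its name."""
    if package_name in BENCHMARK_PACKAGES:
        return "extreme"   # thermal and power limits lifted
    return "normal"        # standard throttling applies

# A renamed copy of the same benchmark gets no special treatment,
# which is exactly how a privately renamed build exposes the cheat.
print(select_performance_mode("com.antutu.ABenchMark"))  # extreme
print(select_performance_mode("com.private.benchmark"))  # normal
```

This is also why UL Benchmarks’ renamed private app works as a detection tool: identical workload, different name, honest score.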

Results: A nasty surprise

We were expecting Huawei/Honor and Oppo/Realme to be cheating on the tests, and that’s exactly what we observed. However, what we didn’t expect were the lengths to which these companies would go to cheat and the utter disregard that they would demonstrate for the safety of either the user or the device.

Let’s just say that three of the phones got so hot that they could, potentially, cause first-degree burns, and maybe even a battery explosion.

If you’re interested, you’ll find a description of our test process at the end of the article. For now, here’s a quick summary of the extent of cheating observed in the five devices that were found to be cheating:

Honor 10: 92.23% boost

Huawei P20 Pro: 76.43% boost

Realme 1: 45.63% boost

Realme 2: 60.86% boost

Oppo Find X: 36.82% boost

As you can see, Huawei/Honor are pushing their phones so hard that they’re almost doubling the performance figures. As we later discovered, this extreme performance comes at an extreme price.

Scores for other phones in the test were within 5 percent of their baseline scores. These include the Samsung Galaxy Note 9, OnePlus 6, Redmi Note 5 Pro, Poco F1 and the Nokia 7 Plus.
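The boost figures above are simply the relative difference between the score recorded under the app’s real, targeted name and the baseline score from the renamed private build. A minimal sketch of that calculation; the scores used here are made-up placeholders (the raw data is with UL Benchmarks), chosen so the result matches the Honor 10’s reported figure:

```python
def boost_percent(public_score: float, private_score: float) -> float:
    """Relative gain of the publicly named app over the private baseline."""
    return (public_score / private_score - 1) * 100

# Hypothetical scores, picked only to reproduce the Honor 10's
# reported 92.23% boost; a phone that isn't cheating scores ~0%.
print(round(boost_percent(1922.3, 1000.0), 2))  # 92.23
print(round(boost_percent(1000.0, 1000.0), 2))  # 0.0
```

On this measure, the non-cheating phones in our list all landed within the roughly ±5 percent band you’d expect from normal run-to-run variance.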

The second phase of testing, which involved only the phones that were cheating, was designed to stress the thermal limits of the devices. The TL;DR version: phones get hot under load, and we wanted to see how hot these got, especially since the phones run hotter than they should when cheating.

We ran the same benchmarks in a 10-run loop and noted the temperature. In all cases, the phones ran significantly hotter when cheating.

How significant? How do 76 °C on the CPU and 60 °C on the battery sound? My overclocked PC at home doesn't hit 76 °C.

The Realme 1 and 2 didn't seem powerful enough to get too hot, but they did get unstable under load. The Realme 2 crashed several times and finally gave up on the benchmark.

The more egregious of the cheaters, the Honor 10 and the Oppo Find X, consistently ran at over 60 °C, with their batteries at well over 50 °C. I cannot overstate how dangerous this is.

At 45 °C, you can get first-degree burns on your skin. At 50 °C, the glue holding your phone together will start melting. This is also the upper limit of the safe operating temperature range for a lithium-ion battery.

Why this matters

Huawei and Oppo, and anyone else who is cheating but isn’t on this list, are doing two things:

Lying about the performance of their products.

Demonstrating that they are either comfortable with the risk to customers’ devices or, worse, ignorant of such risks.

Cheating on benchmarks has nothing to do with benchmarks; it’s all about marketing and fooling the average customer into buying an underwhelming product.

Unless you’re Apple, you sell phones on the merit of their features: “Charge your battery to 100 percent in 10 minutes”, “50 percent faster AI processing”, “3x better gaming performance.”

We’re not naïve. We know that most of this is marketing mumbo-jumbo, but it can’t be denied that the marketing has an impact. How else did AI turn into a buzzword?

It’s perfectly understandable for someone to want the very best of something, and we’ll draw comparisons on any metric that we have available to us, whether we understand them or not.

To take advantage of this ignorance and attempt to scam us into buying something is a very underhanded way of doing business. To top it off, both Huawei and Oppo have attempted to pass their actions off as “AI” at work, and even spin it into a positive. This is, frankly, an insultingly facetious excuse.

Clearly, these companies have a very low opinion of their customer base. I think it's high time we treated them with the same disdain.

Editor's note: On 21 September, we submitted our findings and a portion of our data to Huawei, Honor, Oppo and Realme in expectation of a response. We are yet to hear back from any of the companies involved. All the raw data from the tests has been handed over to UL Benchmarks, which will now conduct an independent investigation.

How we tested

First, we reached out to UL Benchmarks, the makers of the popular PCMark and 3DMark benchmarking tools. The company had recently reported on Huawei’s cheating on benchmarks (following an AnandTech report) and had already delisted all of Huawei’s phones.

UL Benchmarks was kind enough to hook us up with a private version of 3DMark for use in our tests. This private version of the app is identical to the Play Store version of 3DMark – which anyone can download for free – in every respect except in name.

Our plan was to test the performance of an assortment of phones using this private app and to determine which phone-makers were cheating on benchmarks.

In the end, we settled on a list of ten devices from seven different brands. This list is as follows:

Honor 10

Huawei P20 Pro

Nokia 7 Plus

OnePlus 6

Oppo Find X

Pocophone Poco F1

Realme 1

Realme 2

Redmi Note 5 Pro

Samsung Galaxy Note 9

We would have liked to include Vivo in this list as well, but we couldn’t get the private version of 3DMark to run on its phones and had to exclude them.

Prepping the phones: All phones were first wiped and reset to factory defaults. Once reset, we signed in to the Play Store and ran all updates. All OS updates available at the time were also installed.

We did not interfere with any bloatware that may or may not have been present on the phones since we wanted to test these phones as the manufacturer intended them to be used. All power management options were also left at defaults.

We then disabled all notifications and automatic updates and charged the phones to 100 percent.

The phones were then allowed to rest for a full hour to let them cool down to their idle temperatures.

We then installed only the private version of 3DMark and ran tests to determine the baseline performance of all the devices. All background apps were, of course, purged from memory.

Once we finished testing with the private app, I installed the Play Store version of the app and repeated the tests in precisely the same manner.

Performance testing: The first part of the test simply focussed on the raw performance of these phones. I picked the 3DMark Slingshot and Slingshot Extreme ES 3.1 modules for this bit of the testing. Slingshot and Slingshot Extreme are graphically intensive tests that simulate gaming performance.

Each test was conducted three times at intervals of 5 minutes. The average of three runs was recorded in a spreadsheet.

Thermal throttling: Smartphones can provide a burst of high performance for a few seconds, sometimes up to a minute or more. After that, their chipsets get too hot and performance is dialled back to a more sustainable level. Features like “liquid cooling”, which the likes of Samsung and Xiaomi talk up, only really help extend the duration of this burst.

This is useful for day-to-day operation as tasks like taking video, opening a browser, etc. only need a few seconds of max power to seem responsive.

In more intensive tasks, such as gaming, it’s necessary to check for performance under sustained loads. To simulate such conditions, we ran the benchmarks in a loop for 10 runs. Unlike with the performance tests, there was no gap between each run.

We ran two sets of tests, one with the private app and one with the public app. Of course, the phone was charged to 100 percent and allowed to cool down to its idle temperature before switching benchmarks.

Temperature and performance scores were noted for the duration of the runs. Battery temperature was observed but not noted unless there was something unusual.

This test was only conducted on phones that were found to be cheating because we wanted to determine how long they could sustain their extreme performance levels.

Analysis: Once we had all the data, we entered it into a spreadsheet and started our analysis. Primarily, we were concerned with comparisons against the baseline performance as measured with the private app. Wherever there was any doubt, we re-ran the tests to verify.

In total, we conducted an average of 70 tests per cheating phone and about 20 tests each on the ones that weren't cheating.

We would have loved to share the raw data with everyone, but UL Benchmarks requested that we keep the name of the private app, well, private. The raw data would reveal the name of the app.

As mentioned earlier, all the raw data has been sent to UL Benchmarks for analysis.