When a new phone is announced, manufacturers often point to improvements like new processors and GPUs as things consumers desire. We want more powerful gadgets, but what does that really mean? What do we actually get with these generational improvements?

Better benchmark scores don’t necessarily translate into performance improvements for your favorite apps and services.

This was a challenging aspect of my iPhone XS review. The newest iPhone completely demolishes my beloved iPhone SE in benchmarks. However, when rendering a video in iMovie, the SE consistently ekes out a tiny victory.

This conversation isn’t unique to iPhones.

With more phones getting updates to Android Pie (just in time for Android Q to arrive; I’ll have to save that rant for another editorial), we can now compare different generations of phone hardware on a level OS playing field.

One of the snappiest performers from 2017, and one of my all-time favorite phones, the XPERIA XZ1 Compact is rocking a Snapdragon 835.

I’ll basically hunt for ANY reason to bring this little raptor of a phone out of retirement.

Don’t mock this Mighty Mouse: even by today’s standards, few phones from 2017 have remained as snappy as the XZ1C. After updating to Pie (and getting a separate security update), the XZ1C pulls this score out of Geekbench (a synthetic benchmarking app).

By comparison, here’s the score from the top benchmarking phone in my collection, the Huawei Mate 20, with its Kirin 980 chipset.

Now, a lot of tests go into producing these scores. Geekbench times your phone performing a series of tasks, then assigns scores based on how quickly those tasks complete.

You can dig through granular individual times in the Geekbench app. Let’s be frank, though: most people want that single end grade. Whether it’s DxO talking about cameras or a benchmarking app ranking performance, people want one easy number with which to list winners and losers. Context be damned!
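To illustrate how many timed subtests get crunched into one headline number, here’s a minimal sketch of a composite score. The geometric-mean weighting and the scale factor are my own assumptions for the example, not Geekbench’s actual (more involved) formula; it just shows the general shape of “time many tasks, emit one number.”

```python
from math import prod

def composite_score(times, baseline_times, scale=4000):
    """Illustrative composite benchmark score (NOT Geekbench's real formula).

    Each subtest is scored as the ratio of a baseline device's time to the
    measured time (higher = faster), the ratios are combined with a
    geometric mean, and the result is scaled to a familiar-looking number.
    """
    ratios = [b / t for b, t in zip(baseline_times, times)]
    geo_mean = prod(ratios) ** (1 / len(ratios))
    return round(scale * geo_mean)

# Hypothetical subtest times in seconds: a phone that finishes every task
# twice as fast as the baseline scores double the scale factor.
baseline = [2.0, 1.0, 4.0]
fast_phone = [1.0, 0.5, 2.0]
print(composite_score(fast_phone, baseline))  # 8000
```

Note what the single number hides: a phone could be dramatically faster on one subtest and slower on another, and the geometric mean would quietly average that away.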

On their face, these scores look damning for the older Sony. The Kirin 980 scores almost 80% higher than the Snapdragon 835 in single-core benchmarking, and roughly 65% higher in multi-core performance. From the end of 2017 to the end of 2018, these scores would indicate a significant jump in processing power.

The Mate 20 is more powerful. No one would dispute that. But how do those advantages show up in daily operation? Are any of these gains practical? Will you feel them in real-world use? Can I squeeze in another rhetorical question before I actually compare some results?

If I replicate the same tests I performed in the iPhone video above, it’s again clear that benchmarking does not correlate with specific examples of real-world work.

Video Rendering is Hard

Rendering one minute of 4K video in PowerDirector, the Mate 20 finished in roughly 58 seconds. The XPERIA finished the same render in 72 seconds. About a 24% improvement. That’s really good for one year of hardware improvements, but it’s also pretty far from the difference in benchmarking results.

There might be some concern over comparing different chipsets here. The Sony uses a Qualcomm Snapdragon, while the Huawei has a HiSilicon Kirin. Ideally I’d test the same Kirin phone on both OS versions, but unfortunately, my Mate 10 Pro hasn’t received any updates since its first year of release.

Comparing the Mate 10 Pro (Kirin 970) on Oreo, against the Mate 20 (Kirin 980) on Pie, the Mate 10 is actually a second faster at rendering the same video. Android Pie seems to be a bit of a performance hog, even though benchmark numbers stay fairly consistent between the two operating systems.

We saw a subtle but similar performance drop when the OnePlus 6 was updated from Android 8 to Android 9. The OP6 consistently takes about 4-5 seconds longer to run the same test after installing the newest OS update.

Video Stabilization is Also Hard

Another app I like to track is Google Photos, which has an amazing software stabilization feature. It’s extremely useful for smoothing out shaky video. To test, I use the same mildly shaky minute of 4K video, and time how long Photos takes to process it.

The Mate 20 is faster than the XPERIA, but not by a lot. The Mate finished the stabilization test in roughly 78 seconds. The XPERIA in 83 seconds. Five seconds separate the two. The Huawei is about 6% faster.

Scroll back up to those synthetic benchmark scores: none of those tests come anywhere close to accounting for this difference (or lack of difference) in real-world performance.
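To make the mismatch concrete, here’s the arithmetic behind the numbers in this post, using the render and stabilization times I measured (a quick Python sketch; `pct_faster` is just a helper name for the example):

```python
def pct_faster(slow_s, fast_s):
    """Throughput improvement: how much more work per second the faster
    phone does relative to the slower one, as a percentage."""
    return round((slow_s - fast_s) / fast_s * 100, 1)

# Times measured in this post (seconds)
render = pct_faster(72, 58)      # 4K render: XPERIA 72s vs Mate 20 58s
stabilize = pct_faster(83, 78)   # Photos stabilization: 83s vs 78s

print(render)     # 24.1
print(stabilize)  # 6.4
# Geekbench suggested gaps of roughly 80% (single-core)
# and 65% (multi-core) between these two phones.
```

A ~24% gain on one workload and ~6% on another, against a benchmark gap of 65–80%: that’s the whole argument in four lines of arithmetic.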

I’m honestly still looking for other real-world measurements.

Opening a 16MB Excel test spreadsheet as another example, the Mate is consistently faster, but we’re talking a fraction of a second, which is almost unmeasurable (I had to count individual frames from a video recording of both phones opening the spreadsheet). Certainly not worth the hassle or the cost of flipping an older phone.
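For anyone who wants to replicate that frame-counting trick, converting counted frames into wall-clock time is simple. The 60 fps default below is an assumption about the recording; use whatever frame rate your camera actually shot at.

```python
def frames_to_ms(frame_count, fps=60):
    """Convert a counted frame difference from a video recording into
    milliseconds. fps must match the recording's actual frame rate."""
    return frame_count * 1000 / fps

# If one phone opens the file 10 frames ahead of the other:
print(frames_to_ms(10))          # at 60 fps: ~166.7 ms
print(frames_to_ms(10, fps=30))  # at 30 fps: ~333.3 ms
```

In other words, even a lead of ten full frames is well under half a second, which is exactly why this difference is invisible in normal use.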

Playing games, the Mate easily achieves higher frame rates in graphics-intensive titles, but the XPERIA can often hang near the full refresh rate of its display. It’s hard to get much “smoother” than the full speed of your screen.

Mobile chipsets are getting more powerful, but there are other concerns for developers extracting this power. Battery life and thermals spring to mind.

Even when we try to adjust synthetic benchmarks, when we try to quantify real-world use with a “grade” or a “score”, we still run into problems actually explaining what happens when a computer tries to chew up data.

I’m a big fan of Gary Explains, and he’s been doing some great work in trying to standardize testing. His Speed Test G runs a phone through a series of tasks, and you time how long the phone takes to complete the full run. Put those timed runs on video, do a little play-by-play commentary, and they make for fun “speed run” videos. Bragging rights!

Problems creep up when we see anomalies in phone performance, though. Running multiple short tests, then lumping individual processes under one score, means consumers might miss specifics. A phone might perform perfectly well for a consumer’s intended use, but then lose the overall race on a test that isn’t relevant to their needs.

This was demonstrated in a “Pixel 1 vs 2 vs 3” showdown. We all know the Pixel 2 is more powerful than the Pixel 1, but the Pixel 1 led the first 40 seconds of the speed run embedded above. It held that lead through a sorting test, a photo blur test, some gameplay on 2048, and a bloom test; it wasn’t until the Pixel 1 started slowing down on a database task that the Pixel 2 was able to catch and surpass the older phone.

Because this is a “speed run,” we can’t properly explain that early performance disparity. What individual tests caused the Pixel 2 to struggle? We just get to the end of the race, where the Pixel 2 wins by a significant margin because of an advantage in video game graphics processing.

Do We Get What We Pay For?

Maybe we’ve got limiters on these devices to prevent nuking our batteries. Maybe these chipsets throttle before they run too hot. Maybe app developers don’t know how, or are prevented from, extracting all of the power out of these chips. Whatever the reason, the end consumer really isn’t getting the performance gains indicated by marketing materials or benchmarking apps.

This continues to reinforce my assertion that general consumers would still be well served by significantly older hardware. Basically, anything that arrived in 2016 should be more than powerful enough for most daily (and some advanced) tasks today.

If asked, I think many consumers would balk at the idea that flipping an older phone (and spending nearly $1,000 to replace it) will only come with minor improvements to app launch times and a handful of fun new camera modes.

The main drawback to using an older phone, of course, is planned obsolescence. Nearly every Android manufacturer ends software support after about a year and a half. If Google and other developers were really coding with an eye on efficiency, we could easily achieve three-to-four-year upgrade cycles. And that’s before we get into really shady practices like battery throttling (I’ll have to revisit that for yet another cranky rant).

I’m not saying it IS greed, but it does feel a lot LIKE greed.

People enjoy riding the upgrade train, getting the latest and greatest. It’s emotional. It feels good. The industry has done a terrific job training consumers to expect something EXCITING to arrive EVERY year.

But every year, amidst promises of faster speeds, prettier bar graphs, flashier scores, BIGGER NUMBERS, what do we really get for our cash?