I readily admit that this headline could have been and probably was written around the turn of the century, when 3DMark was a dominant topic of discussion among PC gamers. But it's 2016 now and it seems we need to revisit the matter of benchmarks as they start to proliferate on mobile devices. DxOMark's cameraphone test has grown into the 3DMark of the mobile world — a multifaceted, widely used performance exam that churns out a definitive scalar score — and it's producing similar effects. Phone geeks are comparing and disputing scores while manufacturers are taking notice and actively optimizing their cameras to produce higher DxOMarks. Not better images, higher marks.

Take HTC's 10 as the most recent instructive example. It scored 88 out of 100 on DxOMark's mobile camera benchmark. If you're only reading the numbers, that rating makes the 10 equal to Samsung's Galaxy S7 Edge, the only other smartphone to score as high as 88. I've used both extensively and I can tell you that's simply not the case — the S7 is outstanding whereas the 10 is merely good. Why do they then share the same score, and why is Sony's Xperia Z5 sitting in third place with a no less impressive 87?

Empirical data can be beguiling, and often incomplete

The commonality between the 10, the S7, and the Z5 is in the excellent technical capabilities of their imaging sensors. But the divergence is in the actual results each camera produces and the experience of using the device. The Z5 is the best example of the limitations of benchmarks, because it can take brilliant pictures, but most often doesn't because of its laggy performance. If your phone's software is so bad as to discourage you from even trying to snap a fleeting moment, what difference does the sensor's quality make?

Benchmarks, like statistics, exist to inform. We just need to be wary of mistaking their numerical certitude for factual completeness. 3DMark and DxO endeavor to cover most usage scenarios, but they'll never cover all, and part of the dissatisfaction with graphics performance benchmarks stems from their inadequate reflection of actual in-game performance.

If a laptop or a phone does well in a web-browsing battery benchmark, that only gives an indication that it would probably fare decently when handling bigger workloads too. But not always. My good friend Anand Shimpi, formerly of AnandTech, once articulated this very well by pointing out how the MacBook Pro had better battery life than the MacBook Air — which was hailed as the endurance champ — when the use changed to consistently heavy workloads. The Pro was more efficient in that scenario, but most battery tests aren't sophisticated or dynamic enough to account for that nuance. It takes a person running multiple tests, analyzing the data, and adding context and understanding to achieve the highest degree of certainty.

DxOMark has become the holy metric of camera performance

As objective as benchmarks always try to be, there's inevitably a measure of subjective judgment baked into them. For example, DxO's seven technical priorities (exposure and contrast, flash, color, autofocus, artifacts, noise, and texture) might not be the same ones that you value. Or you might weight them differently. If you want a phone to help you photograph your kids playing in the park, you probably don't care about the flash, but very much care about having the fastest autofocus and image processing.

What I worry about most is that smartphone manufacturers have turned DxOMark into their holy metric for camera excellence. I've heard from numerous Android smartphone makers about the priority given to obtaining the highest possible score in DxO's benchmark. Motorola made a big deal out of it with the launch of its flagship Moto phones last year, Sony has clearly been keeping a keen eye on the rankings, and HTC is making sure everyone knows it has the (shared) best score.

When I asked HTC if the 10's camera was designed specifically to do well in DxO's benchmarks, the company gave only the boilerplate response that it's aiming for the best possible mobile photography experience. But the facts contradict that claim. Autofocus on the HTC 10 achieved a mighty 93 out of 100 from DxO, but HTC has just had to issue a software update to fix its autofocus algorithms, which were too panicky in showing a warning about obstructing the focusing laser. It's an obvious and annoying usability issue, but not something a technical test would pick up on. Another thing DxO doesn't tell you: HTC's camera software only calculates full-frame exposure and doesn't allow the option to tap an area on the viewfinder to expose specifically for it.

Doing well in benchmarks is necessary, but not sufficient

There's a strand in smartphone UX design built around "sloppy interactions," as Nokia's former design chief Marko Ahtisaari used to put it. It was the thing underpinning the Nokia N9's interface, and it's something almost entirely outside the remit of any benchmark: the goal is to make devices easy to use in a sloppy manner, as people do in the real world, which runs completely counter to using devices in the repeatable way that a test demands. The tension between those unquantifiable but still very real design considerations and the hard numbers that benchmarking produces is a challenge that every device maker has to deal with. Scoring highly in benchmarks is necessary, but not sufficient.

There's valuable information to be gleaned from benchmarks and other empirical tests. Looking at the top rankings on DxO's site, I see many of my favorite cameraphones, like the LG G4 and iPhone 6S Plus — but I also see the Nextbit Robin garnering an 81 when most who've used that phone would say the camera is probably its biggest weakness. Still, a technically inept cameraphone can't make it on the DxOMark top list, even if the benchmark sometimes misses important downsides. So the way I interpret DxOMark results is with a measure of skepticism and proportionality. DxO tells me who has technically great hardware, and then it's up to me to determine if the software and overall user experience match that quality.

Benchmarks are useful tools. You just have to know how to use them.
