Geekbench CEO Fireside Chat pt.3: Performance Variance on Android, Upcoming Device Leaks on Databases & Apple Chipsets

Geekbench 4 was just released, and it has been making waves in the tech industry. It shows substantial improvements over Geekbench 3, and has seen a highly favourable reception.

The new version of Geekbench has resulted in a considerable shakeup of where devices rank in relation to each other. We’ve seen Qualcomm Snapdragon 820 chips fall, and designs built on ARM’s own cores rise, leading to questions as to what changes were made to the benchmark to cause this shift.

We had an opportunity to sit down with the CEO of Primate Labs, John Poole, to interview him about the launch of Geekbench 4, and about the mobile ecosystem as a whole. In part 3 we talk about score variance between different runs, how Geekbench is targeted and designed, device leaks, and R&D spending.

Steven: I want to touch a little bit further on the topic of score variance. Variation of scores between different runs and different users. How much of it do you think is because of things like processor binning and manufacturer defects, or is it more because of ambient temperatures and background tasks and stuff like that?

John: I find mobile performance [to be] a lot more sensitive. On the desktop side of things, we sort of say “systems should score within about five percent of one another”, and then there’s always experimental stuff; if someone’s running Geekbench on their spyware infested machine, that’s obviously going to have a huge impact on the results. Mobile, I mean, there’s lots of different things. We saw this with the 5X and 6P, where they shipped with Android 6.0.0, and then when they shipped 6.0.1 they put in a bunch of fixes for their scheduler, and performance just went up. So it could be just a slightly different version of the operating system. The schedulers, as I said, especially for big.LITTLE systems are horrendously complicated pieces of code, so they might occasionally make the wrong call. What we find internally in our system is, you know we’ve got about 20 to 30 Android devices that we just run every single build on, and we automate everything through Calabash so that we can build these giant spreadsheets to show a variability arc, and what we find is that some phones are better than others. We’ve got, not to pick on them, but we’ve got a Samsung S5 that’s plus or minus twenty percent easily run to run. Now we’ve got a Nexus 6, we’ve got a Samsung Galaxy S6, an S7 and they’re just plus or minus two percent per run. So we can just sort of see that. We also see the other thing come up every once in a while, we’ll see one benchmark’s score at one particular place just suddenly change, and when we dig into it, it’s always like a background task or something comes up, like “Oh hey, I’m going to go download the new Chrome in the background. I’m going to go do this, I’m going to go do that.” Especially when running multi-core benchmarks where we are using all the cores. That can have an impact on performance. It’s a lot easier on desktops; background tasks don’t seem to be as big of an issue. But on phones, just with all the different sensitivities and what not that they have, if you have a background task come up at just the wrong point, that’s going to kill your score.
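
To make the “plus or minus” figures above concrete, here is a minimal sketch of one way run-to-run variability could be computed from repeated scores on a single device. The function name, the sample scores, and the choice of metric (half the range, relative to the mean) are all illustrative assumptions; this is not Primate Labs’ actual tooling, which the interview does not describe in detail.

```python
from statistics import mean


def run_to_run_spread(scores):
    """Return half the score range as a percentage of the mean score.

    A result of 2.0 reads as "plus or minus two percent run to run".
    The metric is an assumption chosen for illustration; standard
    deviation or interquartile range would be equally reasonable.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to measure variability")
    avg = mean(scores)
    return (max(scores) - min(scores)) / 2 / avg * 100


# Hypothetical scores, invented for illustration only.
steady_phone = [4000, 4080, 3920, 4000]
noisy_phone = [4000, 4800, 3200, 4000]

print(f"steady: +/-{run_to_run_spread(steady_phone):.1f}%")
print(f"noisy:  +/-{run_to_run_spread(noisy_phone):.1f}%")
```

With these made-up numbers, the first device lands at roughly plus or minus two percent and the second at roughly plus or minus twenty percent, matching the two kinds of behaviour John describes.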

Steven: Speaking of background tasks, it seems that a lot of reviewers that ran some of the first benchmarks of Geekbench 4, and I’m not going to name names here, but it seems like a lot of people have been not properly killing all background apps, not doing it from a fresh start, stuff like that. Do you guys care that some news sources aren’t really representing Geekbench properly?

John: We would like everybody to sort of take a certain amount of care. If you’re looking at hardware, if you’re trying to sort of say “We’re coming up with the platonic ideal. This is the best score for the device”, then yeah, I really care about like “Are you doing this in the right way? Are you making sure your phone is completely up to date? Maybe, to make absolutely sure, are you turning off your Wi-Fi connection?”, something like that. Are you making sure you don’t have like Spotify running in the background pumping the latest jams (or whatever it is the kids call it these days)? At the same time, people run their phones in all sorts of environments. I mean you look at, again I will name names, you look at ASUS phones, and they just have a tremendous amount of crap on them. Whereas you look at a Nexus device, and that basically is sort of nice and shiny, and you launch, it will update some things, and then you’re good to go.

Steven: We’re seeing that with the Note 7 right now. We just uploaded an article last week about the performance issues that it’s having. It’s taking twice as long to open some apps as the HTC 10, the LG G5, and the OnePlus 3, and other issues like that. There’s a bit of an uproar right now.

John: Yeah, I mean there comes a certain point where it’s like if you’re reviewing the phone, if you’re looking at Geekbench itself, maybe there’s value to running it with all the background tasks on and saying “Okay, well, this is what the phone is doing.” Unfortunately, we as a benchmark, we have no control over that. I mean, it’d be really nice to go “kill-all”, bring everything down, and do a fresh run like that, but at the same time, this is the environment in which most apps run. This is when you’re running Candy Crush or you’re running Facebook, or you’re running Instagram, or Snapchat, or whatever it is the kids run these days. I sound like the crazy old man who sits atop the mountain and shakes his cane at the kids, “Get off my hill!” But it’s all the social media, all that sort of stuff. When you’re running it, stuff will be running in the background and it’s going to be impacting your performance. If Geekbench doesn’t capture that, then we’re not a terribly useful tool for most people.

Steven: Makes sense. Jumping back a bit, you talked a bit about JIT compilation, and we saw a change two years ago where there was a bit of a shift with ART. Where people thought “Oh, ok, fine. JIT’s gone now”, at least for apps, and you had ahead of time compilation instead. Now, we find out with Android 7, JIT’s coming back, at least for the first little bit after you install an app while the app is compiling for ART, with the goal of speeding up the installation of apps, speeding up boot times, stuff like that. How does that shift in mentality affect the benchmark? Does it affect what you target?

John: We write most of our code in native code. So, we’ve got a native x86, we’ve got a native ARMv7, so the benchmark itself is more or less isolated from that. I mean, if we were just an Android benchmark, then if we didn’t have a Dalvik benchmark in there, if we didn’t have a JIT benchmark in there, then we’re not doing our job. I think the current stats are something like seventy percent of the top 100 apps are like NDK-based apps or something like that. Once you go into the long tail of apps, it drops precipitously, but I don’t think Instagram is going to be running everything in Java. There are a non-trivial number of apps that use native code, and since we are cross-platform in all of our benchmarks we sort of do that. For native performance in and of itself, that would sort of change from us being a CPU and GPU compute sub-type benchmark into more of a system benchmark. I think there’s definitely some interesting stuff we could be doing there; we just don’t quite know what that would look like yet. It’s sort of one of those things where it would be really easy for us to prototype up something quick, but then it’s sort of like “How do you present this?” It’s a tricky thing. I mean, if you look on the Google Play Store, there are a whole bunch of benchmarks on there where it’s literally somebody in their basement, which is how Geekbench started out so I mean you know…

Steven: It works.

John: Yeah, exactly. It’s sort of like “Oh, I’ve got this idea. I’m going to throw it at the wall and see what sticks.” And if you’ve got something compelling that lets people figure something out about their system, then great. But if you’ve got something that’s sort of like “Well, I just threw a bunch of stuff together and I don’t actually know what it means myself” and all that, I mean that’s kind of… right now, if we were to do a Dalvik benchmark, it would be sort of something like “Well, here’s this thing, but we’re not quite sure when that stuff happens.” That’s a much trickier thing, and 2. since the system is shifting under you… we get people reaching out to us all the time saying “Hey, we just saw this result, what does this mean?” and since we’re not… we do okay with Android, but we’re by no means a full stack expert on Android. Just like we’re not a full stack expert on anything that we do. We have to be generalists. We have to understand all the platforms to a certain degree, and it won’t be necessarily like “Oh, yeah, in 6.0.1 they changed the scheduler to do this and that and the other thing.” We just don’t have the resources to do those deep dives, which is why we tend to think of things a little bit higher level, more abstractly, and trying to think of generalities across large categories of apps. So, basically a long-winded way of saying we don’t really think about Dalvik insofar as we don’t think about any one particular application at a time.

Steven: More indirectly.

John: Yeah, exactly. Sort of a “Well, if you’re doing binary translation, if you’re taking Java and compiling that down to assembly, that’s kind of the same thing as doing JavaScript in a web browser and compiling that down to native code.” The semantics of like Java might be slightly different than JavaScript, and I mean I say this all, in a past life I actually worked on a JVM, so I mean I’ve seen how the sausage is made. There’s nothing inherently tricky about writing a JVM, the tricky bit is making it run really quickly, and then that’s when you start getting into platform-specific optimizations and all that sort of thing. Basically a long-winded way of saying “We don’t think about it insofar that we don’t think about any one particular app.” We don’t think about when we’re doing a web browser benchmark “What does Chrome do?” That’s insightful, but we also have to think about Safari, we have to think about Edge, we have to think about Opera, or Firefox, or what have you. So we have to think about “What’s the trend across all these?” I think Android right now… well, I was going to say “it’s the only”, but there really are only iOS and Android right now, I mean iOS is all native apps. If they do interpreted stuff, it’s only through WebViews or something like that.

Steven: Speaking a little bit about competition, I noticed that Geekbench 4 for Android is free to install with no ads. Is there a monetization strategy for Android, or is it just driving sales for PC?

John: We’re using Geekbench for Android as sort of marketing almost. We used to have it as a paid app, and just basically people don’t pay for apps on Android the same way they do on other platforms. So we sort of made the call to go “You know what? We’ll make this free, we’ll try to make it a good app, but we’re not going to view this like our PC or even our iOS app”, which provides a non-trivial amount of our revenue. Business licensing and all that sort of thing are another thing where someone might come to us and say “Hey, we want to use Geekbench internally”, and it’s an “Okay. Well, if you don’t want to upload all your scores to our browser, buy a site license” kind of thing.

Steven: Speaking of which, a lot of the first views of a chip seem to be scores popping up on Geekbench. How often do you think manufacturers are purposely leaking scores out? How often is it…

John: My guess is it’s entirely accidental.

Steven: How often have they contacted you about removing the accidental leaks?

John: There’s been the odd time where we’ve had a hardware manufacturer reach out to us and say “Hey, we’d really like you to remove this”, and we’ll do it that first time and say “You really need to be more careful, we don’t want to be in the rumor business.” Just because a lot of the times when you’re running on pre-release hardware, the operating system isn’t finished, there are all sorts of performance problems. Again, because I know there’s been some Samsung scores bouncing around recently, if Samsung uploads something six months ahead of time, 1. it’s bad for them because you know “Hey, the cat’s out of the bag”, and it might be an engineering sample or development device, and you’ve got lord knows what running on it, so people look at that and go “Well, the performance isn’t right”, and then other people might go and say “Well, Geekbench measured it wrong.” It’s just a huge headache that we’d rather avoid. At the same time, we sort of expect the hardware manufacturers to run a tight ship on their own, so the first time we’ll have a dialogue with them and say “Hey, we do provide licenses that don’t talk to the outside world, perhaps you would like one of these.” But even at companies where we’ve got that relationship and they’ve got access to these licenses, you can have somebody in marketing or someone who is far away from the technical side of things go “Oh, I’ve got the new S8, this is great. I’m going to go run Geekbench on it.” I know that happened, and I’ll name them because they’re not really relevant anymore, but BlackBerry. We shipped an app for BlackBerry that had the upload requirement, and basically apparently everybody in the company, despite a dialog box coming up first time “We will upload stuff to the internet”, they leaked their entire product roadmap for the next six months.

Steven: And they had some crazy lead times back in the day. I remember seeing the Z10 WAY before it was even really rumored, let alone announced.

John: Oh, yeah. It’s nuts. I mean I went to Waterloo and I used to work in Waterloo, so I knew all sorts of people who worked at RIM.

Steven: Yeah, I was at Laurier myself.

John: Perfect. So I’m sure you saw all the people heads down typing away, running into telephone poles before that was a thing. But you talk to anybody from RIM, anybody from hardware, and they talk about just the trouble they’d have, and how they’d be doing all these different SKUs for different vendors, and I know Samsung does a certain amount of that, like they’ll have the AT&T, and Verizon version, and the Sprint version, and that sort of thing, but mostly that seems to be limited to the chip and the cell radio. Whereas RIM would be doing crazy things where it’s like “It’s the Curve, but this one’s got an extra inch on the screen!” and it would be like “What are you doing?”

Steven: We saw that with the S2. There were 20 models or something like that.

John: Something like that. And they’re still doing that. I think the S7 has got about half a dozen models at least.

Steven: They’ve done a lot to cut down on the… I worked a little bit with Samsung for the launch of the S7 before I started working for XDA, and Samsung did a little bit to cut down on the number of models. They actually went and set it up so that for any one market, there would be one model for the entire market, and when you put your SIM card in for the first time, it downloads the carrier’s software, the carrier apps, and the carrier customizations. So there’s no more Wind Mobile and Bell versions, just Canada. It’s a nice improvement.

John: I think that’s one of the things that Apple did for the industry. They said “Screw you, here’s your phone.” Whereas I think RIM came up in an era where the carriers were used to calling all the shots, and they said “Oh, we want this.” AT&T wants this and then Verizon wants that, Sprint wants something else. And it just turned into this huge… they were just spending so much time chasing down these customizations that you end up with these ridiculous lead times. It’s a lot harder to make ten phones than it is one phone.

Steven: It’s also aggravating for consumers. “Do I have the right frequency bands? Can I travel?” I’m reviewing the ZTE ZMax Pro right now, and it’s a nice phone, but it only has frequency bands for North America. I can’t take it to Europe. I can’t take it anywhere. No Band 1 LTE, no nothing. I mean, it’s a $100 phone with Type-C and a 1920 by 1080 display; it’s really nice for the price, but it has the very minimum number of frequency bands. Only what the network uses. You almost can’t even take it to a different network.

John: Ok, that makes it a bit better. It’s $100. So yeah, that’s basically our Android approach. It wasn’t a huge source of revenue for us in the first place, so we’re sort of sacrificing it to get word of mouth out there. And the other great thing is that, with the more users we have…

Steven: More data.

John: More data. And I mean that’s one of the big things that we’re seeing with Geekbench 3 in the last year or so. We’re adding about half a million results a month, and a lot of this is Android traffic. And we can build up these charts and we can do these comparisons, and it makes it a lot more powerful and a lot more compelling. It would be great if we could make it free everywhere, but then at the same time…

Steven: You still need a monetization strategy.

John: We still need to monetize it somehow. I like being able to eat and still do this. It’s one of those tricky things. We’re certainly not in this for world domination. You know, we’d like to be able to keep ourselves in hardware and shoes and stuff.

Steven: There’s been some interesting shifts in the industry, especially in the mid-range, going back to the ZMax that we were talking about, with some mid-range chips even outperforming last year’s flagship chips, like the S652 and the S810, where it’s almost embarrassing. What’s your view on what’s going on there? Do you see the trend continuing?

John: I think that’s a side effect of the high-end last year being weird. I think the 810 and other chips like that were rushed. Going back of course to Apple and the 64-bit stuff, I think roadmaps got thrown away and people just had to get something out. So I think that’s one of the reasons why you’re seeing such a shift. But at the same time, if you look at Apple, they launched the SE this year, and that’s the same performance as their flagship, and that’s arguably kind of a mid-range phone now.

Steven: You can pick it up in the U.S. for $200.

John: Nice. Is that with a contract?

Steven: Off contract with some… funky deals.

John: Ok. I think you’re definitely going to have the hundred dollar ZTE phones where it’s sort of like “Well, you get a phone. It’s got a screen, and off you go.” But I think living in that mid-range spot, especially once you start getting into carrier subsidies and all that stuff. That drops to zero, but then going up to a high-end phone which may be amazingly better is only two hundred dollars. You’re going from $400 to $650 or something like that. That’s a bigger mental jump than $0 to $200, and it’s like “Oh look, it’s all shiny and new.” I think that’s why that mid-range is getting almost like previous gen flagship or something like that. Which I actually think Apple was doing an interesting job of, where they’d say “Okay, well here’s the 6S, it’s the new shiny. The 6 is now the mid-range type thing, and it was the flagship last year.”

Steven: The pricing wasn’t very midrange. It was only 100, 200 dollars less.

John: Yeah, but I think they’re rapidly coming to the realization that’s not really the way forward. But it’s an interesting idea. You’ll have good, better, best, and then you just shift everything down by one when the new best shows up. And that’s not a bad approach, but having mid-range phones that are nice phones in and of themselves, that maybe aren’t using all the latest bells and whistles, but still are competitive. I think that’s interesting. And to see the mid-range go more towards that, just sort of like “Okay, we’re going to cut some features.” It’s like “No, we’re going to do things a little bit differently.”

Steven: Well, you’re seeing Samsung and Qualcomm with 14 nanometer mid-range devices. The S625 and the 7870.

John: I think so. The last 3 or 4 months have been a blur just as we’re getting Geekbench 4 out. If it’s not on fire and someone isn’t coming to us and saying “Aaaaah!”, we’ve kind of ignored it.

Steven: And obviously with the A35 core coming out and the A73, we’ll see some interesting changes in the next little bit as well. Especially as we were talking about earlier with Intel opening up their 10 nm fabs.

John: Yeah, if people start fabbing on that, it will be interesting.

Steven: I mean, the names of the process nodes are a little misleading.

John: It’s interesting when you see people fully nerd out about this, and they’ll say “Well, the gate pitch is this, but the lithography is that”, and you start going “Oh my god, I died… this is insane.” But you know it’s sort of a… stuff is getting smaller still. We haven’t hit that wall yet, and I know everybody’s talking about… I think I’ve heard every year for the last 10 years “Moore’s law is dead in a year or two.” And they keep pushing that out back and back a little bit more.

Steven: We’re getting to the point where we’re going to have to move away from silicon if we want to keep some level of improvement going.

John: I think there’s going to be… something’s going to happen. Intel spends a ridiculous amount on process, and they’re really quite good at it.

Steven: Their R&D is like what, 10-15 billion a year?

John: Something crazy like that. I honestly am slightly surprised Apple’s not building their own fabs, but my guess is that’s just not a game they want to get into.

Steven: Well, with their cash hoard, it’s kind of surprising that they aren’t spending more money.

John: They’re building cars.

Steven: They’re spending like $6 billion a year, and it’s $6 billion across quite a wide range, whereas with Intel, it’s practically $10 billion on process node.

John: I mean, I hate to say this, because I think Apple does some really interesting stuff, but I view Intel as more of a pure science company, and I view Apple as more of a design company. And there’s nothing wrong with that, they’re just different approaches. Honestly, I mean, what they’ve done with their A series chips is quite frankly quite impressive. They’ve gone from, like I remember the A4 was kind of like “Oh we took an”… was it an A15? Or what was it?

Steven: The start for the Apple A series was kind of like “We took the Exynos and modified it”, and over the years they modified it further and further, and it became more of their own thing.

John: And then it was the A6 or something where they really came out with a custom design and said “Here we go. Isn’t this amazing!” And I mean, hey, it worked for them. And I’m impressed that they put that sort of time and energy into that, but I view that as kind of an outlier. I view a lot of other stuff as, and I mean they’re really quite good at this and I don’t want to sound dismissive, they take an idea, like with the iPod. What was CmdrTaco’s comment? It was a “Less storage, not as nice as a Nomad. Lame.” kind of thing. But the Nomad was an objectively horrible MP3 player. I knew about it for a year or two before I went out and got my iPod. And I remember the first time I saw the iPod, it was like “I want one of those.” So they’re really good at that sort of stuff, but I can see a company like Intel, or GlobalFoundries, or TSMC or what have you actually pushing that node down because they’re more willing to make that investment in the basic science of it.

Steven: To some extent Apple’s R&D costs are a little hidden though also, because they get some of it through buying other companies. I think back in the day it was Intrinsity that they bought, the company that helped Samsung reach the 1 GHz mark. Apple bought that. They bought P.A. Semi which brought them Jim Keller and the base of the team that designs their cores. They bought the manufacturer of the fingerprint sensor for the Motorola Atrix, which resulted in the fingerprint sensor being dropped from the Nexus 6. They bought Siri. They bought Beats. A bunch of stuff like that. So part of their R&D is coming from buying companies that seem to be doing well. Although that’s not unique to Apple.

John: I remember in the 90s, Microsoft got a lot of flak for that. They’re like “Oh, you’re not innovating, you’re just buying people.” So, it’s how things work.

Steven: It’s a strategy that seems to work. Acquihiring and also buying for the actual ideas themselves. I mean, there’s ups and downs to it, but…

John: Absolutely. I mean, hey, if someone’s got proven technology it’s likely cheaper to buy them than try to start off from scratch.

That’s it for part three. Stay tuned as there is a lot of in-depth information and entertaining tidbits coming in the fourth part of our fireside chat. We hope you enjoyed the chat and learned a thing or two!