SMT Solving on an iPhone

Why buy an expensive desktop computer when your iPhone is a faster SMT solver?

A few days ago, I tweeted this:

fun experiment: my new iPhone runs Z3 faster than my (rather expensive) Intel desktop! time to start doing all my formal methods research on my phone pic.twitter.com/9Faz9qNvAI

— James Bornholt (@siderealed), October 31, 2018

I’ve been seeing discussion for a while about the incredible progress Apple’s processor design team is making, and how it won’t be too long until Macs use Apple’s own ARM processors. These reports usually cite some cross-platform benchmarks like Geekbench to show that Apple’s mobile processors are at least as fast as Intel’s laptop and desktop chips. But I’ve always been a little skeptical of these cross-platform benchmarks (as are others)—do they really represent the sorts of workloads I use my Macs for?

As a formal methods researcher, the only real compute-intensive workload I run regularly is SMT solving, usually the Z3 SMT solver. At this point I’ve spent a lot of time learning about Z3’s performance characteristics, and it also has some peculiarities benchmark suites won’t capture (Z3 is generally single threaded). I recently bought a new iPhone XS featuring Apple’s latest A12 processor. So, in a fit of procrastination, I decided to cross-compile Z3 to iOS, and see just how fast my new phone (or hypothetical future Mac) is.

The first test

Cross-compiling Z3 turns out to be remarkably simple, requiring only a few lines of changes; I open-sourced the code to run Z3 on your own iOS device. For benchmarks, I drew a few queries from my recent work on profiling symbolic evaluation, extracting the SMT output generated by Rosette in each case.

As a first test, I compared my iPhone XS to one of my desktop machines, which uses an Intel Core i7-7700K—the best consumer desktop chip Intel was selling when we built the machine 18 months ago. I expected the Intel chip to win quite handily here, but that’s not how things turned out:

The iPhone XS was about 11% faster on this 23 second benchmark! This is the result I tweeted about, but Twitter doesn’t leave much room for nuance, so I’ll add some here:

- This benchmark is in the QF_BV fragment of SMT, so Z3 discharges it using bit-blasting and SAT solving.

- This result holds up pretty well even if the benchmark runs in a loop 10 times—the iPhone can sustain this performance and doesn't seem thermally limited. That said, the benchmark is still pretty short.

- Several folks asked me if this is down to non-determinism—perhaps the solver takes different paths on the two platforms, due to use of random numbers or otherwise—but I checked fairly thoroughly using Z3's verbose output, and that doesn't seem to be the case.

- Both systems ran Z3 4.8.1, compiled by me using Clang with the same optimization settings. I also tested the i7-7700K with Z3's prebuilt binaries (which are built with GCC), but those were actually slower.
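The looped run is easy to reproduce with a small harness. Here's a sketch of the one-liner idea in Python; the actual solver command (e.g. a `z3` binary plus a benchmark file) is a placeholder, so the demo below just times a trivial subprocess to show the shape:

```python
import subprocess
import sys
import time

def time_solver(cmd, runs=10):
    """Run `cmd` repeatedly, returning per-run wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return times

# In practice the command would be something like ["z3", "benchmark.smt2"];
# here we time a trivial Python subprocess just to exercise the harness.
times = time_solver([sys.executable, "-c", "pass"], runs=3)
print(f"min {min(times):.3f}s  max {max(times):.3f}s")
```

Comparing the minimum and maximum across runs is a quick check for thermal throttling: a device that can't sustain performance shows later runs drifting slower than earlier ones.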

What’s going on?

How could this be possible? The i7-7700K is a desktop CPU; when running a single-threaded workload, it draws around 45 watts of power and clocks at 4.5 GHz. In contrast, the iPhone was unplugged, probably doesn’t draw 10% of that power, and runs (we believe) somewhere in the 2 GHz range. Indeed, after benchmarking I checked the iPhone’s battery usage report, which said Slack had used 4 times more energy than the Z3 app despite less time on screen.

Apple doesn’t expose enough information to understand Z3’s performance on the iPhone, but luckily, Intel does for their desktop processor. I spent some time poking around using VTune to see where the bottlenecks were when running Z3 on the desktop. As Mate Soos observes, most SAT solving time is spent in propagation, which is very cache-sensitive. VTune agrees, and says that Z3 spends a lot of time waiting on memory while iterating through watched literals. So the key to performance here seems to be cache size and memory latency. This effect might explain why the iPhone is so strong on this benchmark—the A12 chip has a gigantic, low latency L2 cache, and also seems to have better memory latency after a cache miss compared to the 7700K.
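To see why propagation is so cache-sensitive, here is a minimal sketch of two-watched-literal unit propagation—a simplified version of what CDCL SAT solvers (including the one inside Z3) do; the clause and assignment representations are my own for illustration. Every step chases pointers through watch lists and clause memory, so the hot loop is dominated by memory latency rather than arithmetic:

```python
from collections import defaultdict

def lit_value(lit, assign):
    """True/False if the literal is assigned, None otherwise."""
    var = abs(lit)
    if var not in assign:
        return None
    return assign[var] == (lit > 0)

def build_watches(clauses):
    """Each clause watches its first two literals."""
    watches = defaultdict(list)
    for ci, clause in enumerate(clauses):
        watches[clause[0]].append(ci)
        watches[clause[1]].append(ci)
    return watches

def propagate(clauses, watches, assign, lit):
    """Assign `lit` True and propagate. Returns implied literals, or None on conflict."""
    implied = []
    queue = [lit]
    while queue:
        l = queue.pop()
        v = lit_value(l, assign)
        if v is True:
            continue
        if v is False:
            return None  # two clauses implied opposite values: conflict
        assign[abs(l)] = l > 0
        false_lit = -l
        still_watching = []
        # Walk the watch list of the newly falsified literal -- this scan
        # over scattered clause memory is where solvers spend their time.
        for ci in watches[false_lit]:
            clause = clauses[ci]
            if clause[0] == false_lit:          # keep falsified watch in slot 1
                clause[0], clause[1] = clause[1], clause[0]
            for i in range(2, len(clause)):     # look for a replacement watch
                if lit_value(clause[i], assign) is not False:
                    clause[1], clause[i] = clause[i], clause[1]
                    watches[clause[1]].append(ci)
                    break
            else:                               # no replacement: clause is unit or false
                still_watching.append(ci)
                other = clause[0]
                ov = lit_value(other, assign)
                if ov is False:
                    return None                 # conflict
                if ov is None:
                    queue.append(other)
                    implied.append(other)
        watches[false_lit] = still_watching
    return implied

# (x1 -> x2) and (x2 -> x3): asserting x1 should imply x2 and x3.
clauses = [[-1, 2], [-2, 3]]
assign = {}
implied = propagate(clauses, build_watches(clauses), assign, 1)
print("implied:", implied)
```

Note that a long propagation chain visits a different clause (and watch list) at every step, with essentially no spatial locality—exactly the access pattern that rewards the A12's large, low-latency caches.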

The rapid march of Apple silicon

To test whether that diagnosis is correct, I ran a broader experiment, gathering all the Apple devices I could get my hands on. I also chose a benchmark about 10 times slower (i.e., 4 minutes on a desktop) to mitigate any concerns about mobile burst performance.

Here are the results for the devices I gathered, graphed according to their release date, and relative to the Apple A7, which was their first 64-bit custom CPU design:

The first thing to note is that the i7-7700K desktop processor beats the iPhone XS on this different, longer benchmark. But the iPhone is incredibly competitive, falling in between the 7700K and its predecessor i7-6700K, which was the fastest consumer desktop processor until just under two years ago.

For fun, I also added the Intel Core m7-6Y75, which is the processor in my 2016 MacBook. The iPhone XS is about 50% faster than my laptop at running Z3.

The really remarkable thing here is the trend for Apple: a fairly consistent 30% year-on-year improvement on this Z3 benchmark. Obviously we shouldn’t draw too many conclusions from this one silly benchmark, but it seems like it will only take one or two more iterations of this trend for Apple CPUs to make total sense for my workloads. I honestly didn’t expect it to be this close—modern smartphone architectures are incredible!
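For a sense of what sustained 30% annual gains mean, the compounding is easy to work out (a back-of-the-envelope sketch, nothing more):

```python
# Compound a 30% year-on-year single-thread improvement.
rate = 1.30
for years in range(1, 6):
    print(f"after {years} year(s): {rate ** years:.2f}x")
```

Two more iterations of the trend is roughly a 1.7x speedup over the iPhone XS result above—more than enough to cover the gap to the 7700K on the longer benchmark.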

Thanks to Meghan Cowan, Max Willsey, and Eddie Yan for helping me track down more devices and run experiments.