Update (Jan. 23): Daniel Lidar, one of the authors of the “Defining and detecting…” paper, was kind enough to email me his reactions to this post. While he thought the post was generally a “very nice summary” of their paper, he pointed out one important oversight in my discussion. Ironically, this oversight arose from my desire to bend over backwards to be generous to D-Wave! Specifically, I claimed that there were maybe ~10% of randomly-chosen 512-qubit problem instances on which the D-Wave Two slightly outperformed the simulated annealing solver (compared to ~75% where simulated annealing outperformed the D-Wave Two), while also listing several reasons (such as the minimum annealing time, and the lack of any characterization of the “good” instances) why that “speedup” is likely to be entirely an artifact. I obtained the ~10% and ~75% figures by eyeballing Figure 7 in the paper, and looking at which quantiles were just above and just below the 100 line when N=512.

However, I neglected to mention that even the slight “speedup” on ~10% of instances, only appears when one looks at the “quantiles of ratio”: in other words, when one plots the probability distribution of [Simulated annealing time / D-Wave time] over all instances, and then looks at (say) the ~10% of the distribution that’s best for the D-Wave machine. The slight speedup disappears when one looks at the “ratio of quantiles”: that is, when one (say) divides the amount of time that simulated annealing needs to solve its best 10% of instances, by the amount of time that the D-Wave machine needs to solve its best 10%. And Rønnow et al. give arguments in their paper that ratio of quantiles is probably the more relevant performance comparison than quantiles of ratio. (Incidentally, the slight speedup on a few instances also only appears for certain values of the parameter r, which controls how many possible settings there are for each coupling. Apparently it appears for r=1, but disappears for r=3 and r=7—thereby heightening one’s suspicion that we’re dealing with an artifact of the minimum annealing time or something like that, rather than a genuine speedup.)

There’s one other important point in the paper that I didn’t mention: namely, all the ratios of simulated annealing time to D-Wave time are normalized by 512/N, where N is the number of spins in the instance being tested. In this way, one eliminates the advantages of the D-Wave machine that come purely from its parallelism (which has nothing whatsoever to do with “quantumness,” and which could easily skew things in D-Wave’s favor if not controlled for), while still not penalizing the D-Wave machine in absolute terms.

A few days ago, a group of nine authors (Troels Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei Isakov, David Wecker, John Martinis, Daniel Lidar, and Matthias Troyer) released their long-awaited arXiv preprint Defining and detecting quantum speedup, which contains the most thorough performance analysis of the D-Wave devices to date, and which seems to me to set a new standard of care for any future analyses along these lines. Notable aspects of the paper: it uses data from the 512-qubit machine (a previous comparison had been dismissed by D-Wave’s supporters because it studied the 128-qubit model only); it concentrates explicitly from the beginning on comparisons of scaling behavior between the D-Wave devices and comparable classical algorithms, rather than getting “sidetracked” by other issues; and it includes authors from both USC and Google’s Quantum AI Lab, two places that have made large investments in D-Wave’s machines and have every reason to want to see them succeed.

Let me quote the abstract in full:

The development of small-scale digital and analog quantum devices raises the question of how to fairly assess and compare the computational power of classical and quantum devices, and of how to detect quantum speedup. Here we show how to define and measure quantum speedup in various scenarios, and how to avoid pitfalls that might mask or fake quantum speedup. We illustrate our discussion with data from a randomized benchmark test on a D-Wave Two device with up to 503 qubits. Comparing the performance of the device on random spin glass instances with limited precision to simulated classical and quantum annealers, we find no evidence of quantum speedup when the entire data set is considered, and obtain inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results for one particular benchmark do not rule out the possibility of speedup for other classes of problems and illustrate that quantum speedup is elusive and can depend on the question posed.

Since the paper is exceedingly well-written, and since I have maybe an hour before I’m called back to baby duty, my inclination is simply to ask people to RTFP rather than writing yet another long blog post. But maybe there are four points worth calling attention to:

The paper finds, empirically, that the time needed to solve random size-N instances of the quadratic binary optimization (QUBO) problem on D-Wave’s Chimera constraint graph seems to scale like exp(c√N) for some constant c—and that this is true regardless of whether one attacks the problem using the D-Wave Two, quantum Monte Carlo (i.e., a classical algorithm that tries to mimic the native physics of the machine), or an optimized classical simulated annealing code. Notably, exp(c√N) is just what one would have predicted from theoretical arguments based on treewidth; and the constant c doesn’t appear to be better for the D-Wave Two than for simulated annealing. The last sentence of the abstract (“Our results … do not rule out the possibility of speedup for other classes of problems”) is, of course, the reed on which D-Wave’s supporters will now have to hang their hopes. But note that it’s unclear what experimental results could ever “rule out the possibility of speedup for other classes of problems.” (No matter how many wrong predictions a psychic has made, the possibility remains that she’d be flawless at predicting the results of Croatian ping-pong tournaments…) Furthermore, like with previous experiments, the instances tested all involved finding ground states for random coupling configurations of the D-Wave machine’s own architecture. In other words, this was a set of instances where one might have thought, a priori, that the D-Wave machine would have an immense home-field advantage. Thus, one really needs to look more closely, to see whether there’s any positive evidence for an asymptotic speedup by the D-Wave machine. Here, for D-Wave supporters, the biggest crumb the paper throws is that, if one considers only the ~10% of instances on which the D-Wave machine does best, then the machine does do slightly better on those instances than simulated annealing does. (Conversely, simulated annealing does better than the D-Wave machine on the ~75% of instances on which it does best.) Unfortunately, no one seems to know how to characterize the instances on which the D-Wave machine will do best: one just has to try it and see what happens! And of course, it’s extremely rare that two heuristic algorithms will succeed or fail on exactly the same set of instances: it’s much more likely that their performances will be correlated, but imperfectly. So it’s unclear, at least to me, whether this finding represents anything other than the “noise” that would inevitably occur even if one classical algorithm were pitted against another one. As the paper points out, there’s also a systematic effect that biases results in the D-Wave Two’s favor, if one isn’t careful. Namely, the D-Wave Two has a minimum annealing time of 20 microseconds, which is often greater than the optimum annealing time, particularly for small instance sizes. The effect of that is artificially to increase the D-Wave Two’s running time for small instances, and thereby make its scaling behavior look better than it really is. The authors say they don’t know whether even the D-Wave Two’s apparent advantage for its “top 10% of instances” will persist after this effect is fully accounted for.

Those seeking something less technical might want to check out an excellent recent article in Inc. by Will Bourne, entitled “D-Wave’s dream machine” (“D-Wave thinks it has built the first commercial quantum computer. Mother Nature has other ideas”). Wisely, Bourne chose not to mention me at all in this piece. Instead, he gradually builds a skeptical case almost entirely on quotes from people like Seth Lloyd and Daniel Lidar, who one might have thought would be more open to D-Wave’s claims. Bourne’s piece illustrates that it is possible for the mainstream press to get the D-Wave story pretty much right, and that you don’t even need a physics background to do so: all you need is a willingness to commit journalism.

Oh. I’d be remiss not to mention that, in the few days between the appearance of this paper and my having a chance to write this post, two other preprints of likely interest to the Shtetl-Optimized commentariat showed up on quant-ph. The first, by a large list of authors mostly from D-Wave, is called Entanglement in a quantum annealing processor. This paper presents evidence for a point that many skeptics (including me) had been willing to grant for some time: namely, that the states generated by the D-Wave machines contain some nonzero amount of entanglement. (Note that, because of a technical property called “stoquasticity,” such entanglement is entirely compatible with the machines continuing to be efficiently simulable on a classical computer using Quantum Monte Carlo.) While it doesn’t address the performance question at all, this paper seems like a perfectly fine piece of science.

From the opposite side of the (eigen)spectrum comes the latest preprint by QC skeptic Michel Dyakonov, entitled Prospects for quantum computing: Extremely doubtful. Ironically, Dyakonov and D-Wave seem to agree completely about the irrelevance of fault-tolerance and other insights from quantum computing theory. It’s just that D-Wave thinks QC can work even without the theoretical insights, whereas Dyakonov thinks that QC can’t work even with the insights. Unless I missed it, there’s no new scientific content in Dyakonov’s article. It’s basically a summary of some simple facts about QC and quantum fault-tolerance, accompanied by sneering asides about how complicated and implausible it all sounds, and how detached from reality the theorists are.

And as for the obvious comparisons to previous “complicated and implausible” technologies, like (say) classical computing, or heavier-than-air flight, or controlled nuclear fission? Dyakonov says that such comparisons are invalid, because they ignore the many technologies proposed in previous eras that didn’t work. What’s striking is how little he seems to care about why the previous technologies failed: was it because they violated clearly-articulated laws of physics? Or because there turned out to be better ways to do the same things? Or because the technologies were simply too hard, too expensive, or too far ahead of their time? Supposing QC to be impossible, which of those is the reason for the impossibility? Since we’re not asking about something “arbitrary” here (like teaching a donkey to read), but rather about the computational power of Nature itself, isn’t it of immense scientific interest to know the reason for QC’s impossibility? How does Dyakonov propose to learn the reason, assuming he concedes that he doesn’t already know it?

(As I’ve said many times, I’d support even the experiments that D-Wave was doing, if D-Wave and its supporters would only call them for what they were: experiments. Forays into the unknown. Attempts to find out what happens when a particular speculative approach is thrown at NP-hard optimization problems. It’s only when people obfuscate the results of those experiments, in order to claim something as “commercially useful” that quite obviously isn’t yet, that they leave the realm of science, and indeed walk straight into the eager jaws of skeptics like Dyakonov.)

Anyway, since we seem to have circled back to D-Wave, I’d like to end this post by announcing my second retirement as Chief D-Wave Skeptic. The first time I retired, it was because I mistakenly thought that D-Wave had fundamentally changed, and would put science ahead of PR from that point forward. (The truth seems to be that there were, and are, individuals at D-Wave committed to science, but others who remain PR-focused.) This time, I’m retiring for a different reason: because scientists like the authors of the “Defining and detecting” preprint, and journalists like Will Bourne, are doing my job better than I ever did it. If the D-Wave debate were the American Civil War, then my role would be that of the frothy-mouthed abolitionist pamphleteer: someone who repeats over and over points that are fundamentally true, but in a strident manner that serves only to alienate fence-sitters and allies. As I played my ineffective broken record, the Wave Power simply moved from one triumph to another, expanding its reach to Google, NASA, Lockheed Martin, and beyond. I must have looked like a lonely loon on the wrong side of history.

But today the situation is different. Today Honest Abe and his generals (Honest Matthias and his coauthors?) are meeting the Wave Power on the battlefield of careful performance comparisons against Quantum Monte Carlo and simulated annealing. And while the battles might continue all the way to 2000 qubits or beyond, the results so far are not looking great for the Wave Power. The intractability of NP-complete problems—that which we useless, ivory-tower theorists had prophesied years ago, to much derision and laughter—would seem to be rearing its head. So, now that the bombs are bursting and the spins decohering in midair, what is there for a gun-shy pampleteer like myself to do but sit back and watch it all play out?

Well, and maybe blog about it occasionally. But not as “Chief Skeptic,” just as another interested observer.