Since the first zero met the first one, people have been shrilly overestimating the effects of computers on our day-to-day lives. Most instances of wild exaggeration are eventually brought back down to earth (at least for a while). It happened with the wild estimates of economic harm done by piracy. The latest aspect of our shared interaction to be punctured is cybercrime, the extent and pervasiveness of which has been described in terms of near-Apocalyptic overstatement, according to the authors of a new report.

The report, "Sex, Lies and Cyber-crime Surveys" (PDF), by Dinei Florencio and Cormac Herley of Microsoft Research, has now been released.

In an op-ed in the New York Times that coincided with the report's release, they wrote, "One recent estimate placed annual direct consumer losses at $114 billion worldwide. It turns out, however, that such widely circulated cybercrime estimates are generated using absurdly bad statistical methods, making them wholly unreliable."

What initially attracted the authors' attention was the mismatch between the huge loss figures and the supposed ease of getting at that money through hacking.

"The demand for easy money outstrips supply. Is cybercrime an exception?" Florencio and Herley asked. "If getting rich were as simple as downloading and running software, wouldn't more people do it, and thus drive down returns?"

The problem, they discovered, was in the manner of gathering the cybercrime loss figures. They were put together via surveys.

"First, losses are extremely concentrated," they wrote in the report, "so that representative sampling of the population does not give representative sampling of the losses. Second, losses are based on unverified self-reported numbers. Not only is it possible for a single outlier to distort the result, we find evidence that most surveys are dominated by a minority of responses in the upper tail."

Because losses are so concentrated, a survey sample is not representative of them, yet each loss reported in a survey is still extrapolated to a large, and unsupported, amount across the general population.

"One unverified claim of $7,500 in phishing losses translates into $1.5 billion."

The remarkably enduring fiction that anything with an "e" or an "i" in front of it is terra incognita, a land of mystery where precedents are nonexistent and the normal rules of space and time do not hold, combined with, well, lazy thinking, has, the authors maintained, created a common wisdom wildly out of sync with simple facts and basic mathematics.

"Our assessment of the quality of cyber-crime surveys is harsh," they conclude. "They are so compromised and biased that no faith whatever can be placed in their endings."

The repetition by the media, bloggers, and others of "unreliable data that is masquerading as reliable data" sustains the echo, and these operatically exaggerated claims of Brobdingnagian cybercrime statistics take on the air of legitimacy and wind up being very hard to root out. Those who are invested in the original statistics, from the groups conducting the surveys to the analysts who build arguments on top of them, sometimes find themselves resistant to reversing their conclusions for fear of coming off foolish.

Changing your mind when confronted with new and better facts should never be considered a flaw. Hopefully, Florencio and Herley's report will make that change easier—at least when it comes to this aspect of our overblown regard for glorified calculators. And with luck, it will deter the future use of unverified survey data as building blocks for our models of cybercrime.

Listing image by Brandon Anderson