What’s wrong with the statistic? Well, let’s take an example from law enforcement. Suppose I become the target of a government investigation. The government gets a warrant and seizes a year’s worth of my email. Looking at my email patterns, that’s about 35,000 messages. About twenty percent, say 7,500, are one-off messages that I can handle with a short reply (or by ignoring the message). Either way, I’ll never hear from that person again. And maybe a quarter are from about 500 people I hear from at least once a week. The remainder are a mix: people I trade emails with for a while and then stop, or infrequent correspondents who can show up at any time. Conservatively, let’s say that about 25 people account for that last category of my annual correspondence. In sum, the total number of correspondents in my stored email is 7,500 + 500 + 25 = roughly 8,000. So the criminal investigators who seized my email have stored messages from me, their investigative target, and from over 8,000 people who aren’t targets.
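The back-of-envelope tally above can be checked in a few lines. All the figures are the rough estimates from the example, not real data:

```python
# Rough reconstruction of the correspondent count in the example above.
# Every number here is an illustrative estimate, not measured data.

total_messages = 35_000          # roughly a year of email

one_off_senders = 7_500          # ~20% of messages are one-offs: one sender each
weekly_senders = 500             # regulars heard from at least once a week
intermittent_senders = 25        # conservative guess for the irregular remainder

correspondents = one_off_senders + weekly_senders + intermittent_senders
print(correspondents)            # 8025, i.e. "roughly 8,000"

# The seized cache thus holds 1 target and ~8,000 non-target account holders:
non_target_share = correspondents / (correspondents + 1)
print(f"{non_target_share:.4%}")
```

The share of non-targets comes out a shade under 100 percent, which is exactly the kind of number the headline statistic is built from.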

Or, as the Washington Post might put it, “7,999 out of 8,000 account holders found in a large cache of communications seized by law enforcement were not the intended surveillance target but were caught in a net the investigators had cast for somebody else.”

Maybe the Post is performing some far more sophisticated calculation that it didn’t bother to explain, despite the statistic’s prominence in the story. If not, though, the inherent bias in the measure is such that it demands an acknowledgement. (After all, it allows you to say “half of all account holders in the database weren’t the target” if the agency stores just a single message sent to the target.) This is something that any halfway sentient editor should have recognized.
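The bias in the parenthetical is easy to make concrete: measured this way, the non-target share can never fall below one half, because storing even one message to the target adds one non-target account holder to the cache. A minimal sketch (the function name is mine, for illustration):

```python
# "Share of account holders who aren't the target" as a function of how
# many correspondents' messages were stored. The cache always contains
# the target plus each stored correspondent, so the share is N / (N + 1).

def non_target_share(stored_correspondents: int) -> float:
    """Fraction of account holders in the cache who are not the target."""
    return stored_correspondents / (stored_correspondents + 1)

print(non_target_share(1))     # 0.5 -- "half weren't the target" from ONE message
print(non_target_share(8000))  # approaches 1.0 -- the headline-ready figure
```

The statistic starts at 50 percent and climbs toward 100 percent no matter how narrowly targeted the collection is, which is why it says so little on its own.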

Which raises this question: I’ve heard of newspapers chasing stories that are “too good to check.” Does the Post think that Gellman’s stories are too good to edit?