Protecting Data Privacy through Hard-to-Reverse Negative Databases is an entertaining paper. Here's the problem it tries to solve: You want to store personal details but you don't want people to be able to mine the database. For example you might want to store brief personal details with a social security number so you can verify that someone is a member of some organisation, but for security reasons you don't want a hacker to be able to get a list of members even if they steal the entire database. That sounds like an impossible set of requirements. But the trick is this: instead of storing the personal details you store a list of all possible bitstrings representing personal details that members don't have. Insane right?



If the personal details are just 256 bits long, say, then to store details for 100 people, say, the database needs 2256-100 entries. That's vast. But the database of non-members is highly compressible. Let me lift an example directly from the papers. Suppose the personal information is three bits long and we wish to store the entries {000,100,101}. Then the non-entries are {001,010,011,110,111}. The non-entries can now be compressed using a wildcard scheme {001,*1*}. The *1* represents all of the entries fitting that pattern from 010 to 111. This turns out to be quite a compact representation. It's beginning to look plausible.





And here's the neat bit: SAT can be translated into the problem of finding valid database entries from the wildcard representation. So extracting entries is provably NP-complete, despite the fact that we can check entries in a reasonable amount of time (for suitable definitions of 'reasonable').





Anyway, I still have a whole slew of objections, and the actual compression scheme proposed has holes in it that can result in false positives. But what initially sounded implausible is beginning to look workable.





PS You can also read about this at the Economist but I've tried to spare you the irrelevant fluff :-)