Every Silicon Valley company wants more data. But today, tech firms are increasingly taking a paradoxical approach to feeding that endless appetite. Thanks to an emerging branch of data science called "differential privacy," they can analyze mountains of user info without breaching the privacy of any individual user. And of all the companies eager to use that science to repair a reputation marred by privacy controversies, perhaps none has more at stake than Uber.

On Thursday, the embattled ride-sharing startup announced a well-timed advance in that privacy engineering field, releasing an open-source tool designed to give the company, and any other firm that adopts its technique, a way to let engineers gather statistical results from massive datasets while remaining blindfolded to the personal details of any single user.

Elastic Start

The method, known as elastic sensitivity, was built with the help of a group of University of California, Berkeley researchers, who spent the last 18 months testing it against a collection of 8.1 million actual statistical queries Uber's staff had made to its existing database as they analyzed everything from traffic patterns to the revenue generated by different cities' drivers. The resulting system, called FLEX, uses some mathematical tricks to set a limit on how much any of those statistical queries can reveal about any individual Uber rider or driver.

"The intent is for it to be used in cases where there's authorized access to some amount of data but we want to add additional protection on top of that," says Menotti Minutillo, Uber's head of privacy engineering. Whenever possible, Minutillo says, Uber will use its elastic sensitivity tool to limit the data access of staff who spend their days probing the company's data to make the service more profitable and efficient. Thanks to the properties of its new differential privacy tool, Minutillo says Uber's analysts can perform "statistical rollups, sums, averages, counts, things like that, without needing access to the raw data."

Uber's elastic sensitivity technique works by adding a certain amount of noise to responses to database queries. The system tailors the exact amount of random "padding" to the question (the more potential for privacy invasion, the more noise it adds) to make it impossible to learn anything about a single person from the results.
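Uber hasn't published FLEX's internals in this article, but the underlying idea is the standard Laplace mechanism from the differential privacy literature, which can be sketched in a few lines of Python. The noise scale grows with the query's sensitivity (how much one person's data can change the answer) and shrinks as the privacy parameter epsilon grows. The function name here is illustrative, not Uber's API:

```python
import numpy as np

def noisy_count(true_count: int, sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism: return the count plus noise of scale sensitivity/epsilon."""
    # For a plain count, adding or removing one person changes the answer
    # by at most 1, so sensitivity = 1; a smaller epsilon means stronger
    # privacy and therefore a larger noise scale.
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)
```

With epsilon around 1, a city-scale count of thousands of riders gets perturbed by only a handful, so the aggregate answer stays useful.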

So if an Uber business analyst asks how many people are currently hailing cars in midtown Manhattan, perhaps to check whether supply matches demand, and Ivanka Trump happens to be requesting an Uber at that moment, the answer wouldn't reveal much about her in particular. But if a prying analyst starts asking the same question about the block surrounding Trump Tower, for instance, Uber's elastic sensitivity would add a certain amount of randomness to the result to mask whether Ivanka, specifically, might be leaving the building at that time. Ask about the address of Trump Tower itself, and the differential privacy system would likely add so much noise that the answer would be altogether meaningless, says Noah Johnson, one of the Berkeley researchers.
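Running the hypothetical noisy_count sketch above at the two extremes of that scenario shows why. The same noise scale that is invisible on a Manhattan-sized count swamps a count of zero or one at a single address (the rider figures below are made up for illustration):

```python
# Hypothetical figures, reusing the noisy_count sketch above.
print(noisy_count(12_000, sensitivity=1, epsilon=1.0))  # midtown: e.g. 11998.6, still useful
print(noisy_count(1, sensitivity=1, epsilon=1.0))       # one address: e.g. 2.4 or -0.7, meaningless
```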

"The idea is that if you were to remove any single person's data, the result wouldn't change very much," says Johnson. "So you can’t learn anything about individual trips, but you can learn a lot about aggregate populations of users and trips."

Privacy Trend

Those properties of Uber's differential privacy system aren't exactly unique: Companies like Google and Apple are competing to build systems that gather broad user data while similarly obscuring every individual's traits. But Johnson says efficiency is what sets Uber's elastic sensitivity work apart. By optimizing their technique for the large set of queries Uber shared with them, the researchers added just 0.03 percent of computational overhead to each query while determining how much noise should be added to any given result.
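That efficiency is possible because the noise scale can be derived from a query's structure alone, before touching any data. The sketch below is a loose illustration of that idea, not the FLEX algorithm: when a query joins in per-user records, one person can influence as many output rows as they have records, so the sensitivity bound is scaled by a precomputed per-user maximum. The real technique additionally smooths this bound, a step omitted here:

```python
def query_noise_scale(max_rows_per_user: int, epsilon: float) -> float:
    """Pick a Laplace noise scale from query metadata alone (no data access).

    max_rows_per_user: a precomputed bound on how many result rows one
    person can contribute (1 for a plain count; higher when the query
    joins in per-user records such as individual trips).
    """
    sensitivity = max_rows_per_user  # upper bound, known before the query runs
    return sensitivity / epsilon
```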