Data vendors selling to investment managers have proliferated in recent years. Where a select group of systematic or quantitative fund managers were once the vendors’ primary customers, much of the asset management industry is now taking a closer look.

Vendors offer a wide range of datasets, from shipments of commodities around the world to forecasts of crop yields derived from satellite images. Other vendors offer data that contains sensitive personal information, such as credit card records or location data.

Sensitive data of this kind is of interest to asset managers because it can potentially be analysed to reveal consumer spending trends and patterns. But it is the broad trends themselves, rather than the specific information on individuals, that may contain useful insights for financial market investment.

Outside finance, there are legitimate uses of sensitive data that may benefit the public. Urban planning, for example, could be enhanced by data revealing trends in the movement of people, as shown in this research involving ride-sharing firm Uber. Nonetheless, there is a significant risk that some of the data on sale could be misused. For this reason, Winton has a policy of not acquiring data that could be interrogated to identify private individuals without their consent.

Gaining confidence in a technique that protects people’s privacy while preserving the overarching patterns in a given dataset might be one reason to revisit this policy. To that end, we launched a differential privacy study with leading academics in privacy protection at the University of California, Berkeley.

Traditional Approaches to Protecting Privacy

Companies are generally aware of their regulatory and moral obligations to protect people’s privacy when using datasets, and most make good-faith efforts to do so. One typical measure is to remove personally identifiable information (PII) from the data in question. The table below provides a stylized example: names have been removed, zip codes obscured and ages bucketed into ranges.
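
The three steps just described (dropping names, obscuring zip codes and bucketing ages) can be sketched in a few lines of Python. This is a minimal illustration only, not a description of any vendor's actual process. The `Record` fields, the helper names, the ten-year bucket width and the choice to keep three zip code digits are all assumptions made for the example, and the sample record is invented.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical schema, assumed for illustration only.
    name: str
    zip_code: str
    age: int
    spend: float  # stands in for the non-identifying data of interest


def bucket_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def obscure_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zip_code[:keep] + "*" * max(len(zip_code) - keep, 0)


def deidentify(record: Record) -> dict:
    """Drop the name, obscure the zip code and bucket the age."""
    return {
        "zip_code": obscure_zip(record.zip_code),
        "age_range": bucket_age(record.age),
        "spend": record.spend,
    }


# Invented example record, not taken from any real dataset.
example = Record(name="Jane Doe", zip_code="94720", age=34, spend=120.5)
print(deidentify(example))
# {'zip_code': '947**', 'age_range': '30-39', 'spend': 120.5}
```

The bucket width and the number of zip code digits retained are arbitrary here. In practice they would be chosen according to how much detail the analysis needs.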

