The recent Facebook and Cambridge Analytica leak teaches us that giving data away is a high price to pay for the services provided. What if we could get the same valuable services we rely on from social media or other community platforms, but without giving the data away?

Freedom from giant data harvesting factories will take genuine data decentralization

The Cambridge Analytica incident is one of the clearest demonstrations to date that technology meant to work at an individual level is routinely repurposed and exploited once the data becomes available in a central repository. Data collected under the guise of a psychological test became one of the sources for a vote-manipulation strategy. What we were told were privacy settings were, in fact, publicity settings, with no regard for the actual privacy of data.

Centralization has its place

Sometimes, centralization is very effective, for example when a community collaborates on cleaning up and editing a common database that aims to hold some form of truth. OpenStreetMap and Wikipedia are good examples of this.

Community activity does not need centralization

Very often, however, data is only collected in a central place to enable the activities of a community or a social network, gathering individual activities, links, data, etc. As the size of such a database grows, a new purpose always emerges: mining by the collecting party, and turning the locked-in community of users into a very profitable product.

Users of social networks or giant B2B databases have a choice: stop using the service and lose access to both their community and very valuable functions, or accept the Faustian bargain, submit to the monopoly, and upload their data into a black box that will extract as much value as it can with little regard for privacy: Facebook, Waze, Google, Amazon, Apple, Booking.com, Hotels.com, Airbnb, Uber…

Sharing #deletefacebook does not free anyone; it does, however, signal to Twitter that users will react positively to ads which equate SUVs with freedom.

Data should remain local

Unless the data itself is under the control of its creator, and remains there, it is impossible to claim that each user has full control over confidentiality. What is needed is for community members to store and process their data locally, for their own purposes. Furthermore, they should derive value from it themselves by providing services, such as responding to external requests for a fee, including from advertisers.

Data nodes vs. data centers

Storing and processing data locally implies running one’s own node, or hiring one: a private data center. Container technology such as Docker makes it trivial to scale a service from a tiny $20 Raspberry Pi to the massive data centers AWS or its competitors provide. The lack of hardware dependency makes the service both cheap and easy to migrate, giving the owner total and ongoing control over where and how data is stored and processed.

Leak-proofing data

Locally stored data can still provide the same level of service, in particular to its creator, without exposing a single point of failure to attacks, hacks and breaches. When data and its processing are distributed, the risk of leakage, and therefore of repurposing, is contained. The result of a narrow query is unlikely to be reused: who else but a direct competitor would want, for example, to target women living in Memphis interested in waxing? Conversely, a suspiciously broad query that would leak raw data can be detected and prevented by the community before it runs.
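The kind of query vetting described above can be sketched in a few lines. This is a hypothetical illustration, not a real protocol: the field names, the 10% selectivity threshold, and the `is_suspicious` helper are all made up for the example.

```python
# Hypothetical sketch: a data node vets incoming queries before answering them.
# RAW_FIELDS and MAX_SELECTIVITY are illustrative policy choices.

RAW_FIELDS = {"email", "phone", "address"}  # identifiers never exported wholesale
MAX_SELECTIVITY = 0.10                      # refuse queries matching >10% of records


def is_suspicious(requested_fields, predicate, records):
    """Flag queries that ask for raw identifiers or match too broadly."""
    if RAW_FIELDS & set(requested_fields):
        return True                         # looks like raw-data exfiltration
    matched = sum(1 for r in records if predicate(r))
    return matched / max(len(records), 1) > MAX_SELECTIVITY


records = [{"city": "Memphis", "interest": "waxing"},
           {"city": "Austin", "interest": "cycling"},
           ] + [{"city": "Other", "interest": "none"}] * 98

# A narrow, targeted query passes the check:
narrow = is_suspicious(["interest"],
                       lambda r: r["city"] == "Memphis" and r["interest"] == "waxing",
                       records)
# A broad raw-data dump is refused:
broad = is_suspicious(["email"], lambda r: True, records)
print(narrow, broad)  # False True
```

Because every node applies the same cheap check locally, a raw-data harvest has to defeat the whole community rather than a single gatekeeper.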

Qualifying sources

With data solidly decentralized, processing and requests have to be distributed to where the data is. The first step in processing data for a purpose is identifying the appropriate data sources. Some of the metadata used to qualify a source is simple (sex, state…), but more complex requests (a person with 5 friends in the same city) can require extensive processing. Providing such processing for free would not be sustainable for the nodes.
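A node's two-tier qualification, cheap metadata match first, costly local processing only when needed, might look like the sketch below. The profile layout, `qualifies`, and the five-friends rule are invented for illustration.

```python
# Illustrative sketch: a node decides whether it qualifies as a source.
# Profile structure and helper names are assumptions, not a real schema.

profile = {
    "sex": "F",
    "state": "TN",
    "friends": [{"city": "Memphis"}] * 6,  # hypothetical local social graph
}


def qualifies(profile, simple_filters, complex_checks):
    # Cheap metadata match first (sex, state, ...)
    if any(profile.get(k) != v for k, v in simple_filters.items()):
        return False
    # Costlier checks (e.g. "5+ friends in the same city") run only if needed
    return all(check(profile) for check in complex_checks)


def five_friends_same_city(p):
    counts = {}
    for friend in p["friends"]:
        counts[friend["city"]] = counts.get(friend["city"], 0) + 1
    return max(counts.values(), default=0) >= 5


print(qualifies(profile, {"state": "TN"}, [five_friends_same_city]))  # True
```

The ordering matters: the free metadata filter screens out most requests, so the expensive graph computation only runs for queries a client is actually willing to pay for.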

A distributed business model

Distributing queries to the data incurs processing costs that need to be accounted for and passed on to the responsible party. Obviously, the value the data creates for its client also needs to be distributed fairly. Clients of distributed data mining need to make a large number of small payments to individual nodes instead of a single large one to a giant database.
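One way to picture the settlement, many small payments rather than one large one, is the toy function below. The fee, the value share, and the currency unit are all made-up numbers; a real marketplace would negotiate these.

```python
# Hedged sketch of per-node settlement: each responding node is owed its
# processing fee plus a share of the value the query created for the client.
# All rates are illustrative assumptions.

PROCESSING_FEE = 0.002  # flat fee per query answered, in some currency unit
VALUE_SHARE = 0.50      # fraction of the client's value paid back to data owners


def settle(responding_nodes, query_value):
    """Return the micropayment owed to each node for one distributed query."""
    per_node_value = (query_value * VALUE_SHARE) / len(responding_nodes)
    return {node: round(PROCESSING_FEE + per_node_value, 6)
            for node in responding_nodes}


payments = settle(["node-a", "node-b", "node-c", "node-d"], query_value=1.00)
print(payments)                # four small payments...
print(sum(payments.values()))  # ...instead of one large one
```

With thousands of nodes instead of four, each payment shrinks to a fraction of a cent, which is exactly the transaction size the banking system handles poorly.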

Enter blockchain

Decentralized storage and processing have existed for years, but such a business model was not possible, as the banking system is not designed to handle what needs to be a massive number of transactions. Blockchain opens up the possibility of such decentralized trading between untrusted parties. Moreover, blockchain technology ensures the immutability of data, regardless of where it is stored: digital data fingerprints committed to the blockchain ensure tampering will be detected even on distributed, untrusted storage.
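The fingerprint mechanism is simple to sketch. Here a plain dictionary stands in for the blockchain ledger; the record ids and data are invented, and only the SHA-256 hashing step reflects how committed fingerprints actually work.

```python
# Minimal sketch of tamper-evidence via committed fingerprints. A real system
# would write the digest into a blockchain transaction; a dict stands in for
# the immutable ledger here.

import hashlib

ledger = {}  # stand-in for the blockchain: record_id -> committed digest


def commit(record_id, data: bytes):
    """Commit the SHA-256 fingerprint of a record to the ledger."""
    ledger[record_id] = hashlib.sha256(data).hexdigest()


def verify(record_id, data: bytes) -> bool:
    """True iff the data still matches the fingerprint committed earlier."""
    return hashlib.sha256(data).hexdigest() == ledger.get(record_id)


commit("profile-42", b"city=Memphis;interest=waxing")
print(verify("profile-42", b"city=Memphis;interest=waxing"))  # True
print(verify("profile-42", b"city=Memphis;interest=SUVs"))    # False
```

Since only the 32-byte digest is committed, the data itself never leaves the node, yet any node, client, or auditor can later prove whether a copy has been altered.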

Local data labs vs. the global data factory

With these critical missing pieces in place, the time has come to introduce a new class of data collection and sharing community: a decentralized, blockchain-based community of data labs. Data remains local, and queries are instead distributed from the client to the data-holding nodes, the data labs, for a fee. Precise queries are unlocked by discrete payments, making access control far more granular and secure than granting authenticated API access.