The race to use Artificial Intelligence to execute policy, drive future economic infrastructure, and provide automated equipment for man-machine teaming in the U.S. Department of Defense is in full swing. What feeds the development of Artificial Intelligence is massive amounts of data. The use of data, both publicly available and private, is and will continue to be a tense conversation. If the originator of data is its owner, does that mean no one else can use it without explicit permission? Are big data companies such as Facebook, Amazon, and Google doomed to potentially devastating data privacy laws that would limit their ability to compete against companies like Baidu and Alibaba in China, where (in the eyes of the government) private data doesn’t exist? President Xi Jinping has stated his goal for China to become the world leader in AI by 2030. China will not limit AI development by preventing its AI leaders, Baidu and Alibaba, from using public, proprietary, or even private data.

If data is the new currency, and much of the United States’ data is inaccessible due to privacy laws, how can America compete? Western civilization and liberty-centric ideologies have faced this kind of question since their inception. Capitalism, liberty, and military power may have clashed over the past 250 years, but the ingenuity of free individuals has overcome those limits. Recent developments in blockchain and computer science could yet again preserve the dream of having the best of both worlds: in this case, using data while maintaining privacy, and even control. Is it really possible to use data without seeing it? Can AI provide analysis over data that isn’t fully accessible?

Source: Slator.com

Machine Learning and Artificial Intelligence are the current “buzz phrases.” Say them and Google may acquire your intellectual property before ink hits paper. They are going to be part of our future, but what do they mean? First, Machine Learning is a subset of Artificial Intelligence in the field of computer science that often uses statistical techniques to give computers the ability to “learn” from data without being explicitly programmed. Machine Learning is the process that teaches AI; Artificial Intelligence is the ability to make decisions based on learned techniques. Machine Learning can be effectively broken down into three subsets:

Supervised: Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.

Unsupervised: Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of “unlabeled” data (i.e. data that has not been classified or categorized).

Reinforcement: The idea behind Reinforcement Learning is that an agent learns from the environment by interacting with it and receiving rewards for performing actions. Reinforcement Learning is simply a computational approach to learning from action. It requires an environment the agent can interact with and a reward signal to learn from.
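As a minimal sketch of the supervised case above, the toy example below learns a line from labeled input-output pairs using a closed-form least-squares fit. The function name and the training data are illustrative, not from any real system:

```python
# Minimal supervised learning: infer y = w*x + b from labeled
# (input, output) training pairs via simple linear regression.

def fit_line(pairs):
    """Closed-form least-squares fit over labeled examples."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var                  # learned slope
    b = mean_y - w * mean_x        # learned intercept
    return w, b

# Hypothetical labeled data generated from roughly y = 2x + 1:
training_data = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2)]
w, b = fit_line(training_data)
print(round(w, 2), round(b, 2))    # close to the true 2 and 1
```

The "learning" here is exactly the inference of a function from example input-output pairs that the supervised definition describes, just at the smallest possible scale.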

Source: CleverRoad.com

Each of these subsets of Machine Learning needs a healthy amount of data. Just how much depends on the complexity of the problem you’re trying to solve. Per Caltech Prof. Yaser Abu-Mostafa, “the answer is that as a rule of thumb, you need roughly 10 times as many examples as there are degrees of freedom in your model.” In layman’s terms: an immense amount of data. If interested parties, including the U.S. Government, want to use Artificial Intelligence to govern more efficiently, the data needed to produce accurate results will certainly be in the realm of terabytes and petabytes. It would be too expensive and too time-consuming for governing bodies, healthcare data scientists, or defense intelligence specialists to process this data by themselves. Artificial Intelligence and big data tools are the answer to that problem. Even more so, trying to scrub all data that may contain personally identifiable information would prove too costly and insurmountable when competing against those who ignore privacy ethics.
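Abu-Mostafa's rule of thumb is easy to make concrete. The model sizes below are hypothetical, chosen only to show how quickly the data requirement grows with model complexity:

```python
# Rule of thumb: roughly 10 training examples per degree of
# freedom (free parameter) in the model.

def examples_needed(degrees_of_freedom, factor=10):
    return factor * degrees_of_freedom

# Hypothetical models of increasing complexity:
for name, dof in [("small linear model", 100),
                  ("modest neural network", 1_000_000)]:
    print(f"{name}: ~{examples_needed(dof):,} examples")
```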

According to Irving Wladawsky-Berger, Visiting Lecturer in Information Technology at the MIT Sloan School of Management, whether physical or digital in nature, identity is a collection of information or attributes associated with a specific entity. Identities can be assigned to three main kinds of entities: individuals, institutions, and assets. For individuals, there are three main categories of attributes:

Inherent attributes are intrinsic to each specific individual, such as date of birth, weight, height, color of eyes, fingerprints, retinal scans and other biometrics.

Assigned attributes are attached to individuals, and reflect their relationships with different institutions. These include social security ID, passport number, driver’s license number, e-mail address, telephone numbers, and login IDs and passwords.

Accumulated attributes have been gathered over time, and can change and evolve throughout a person’s lifespan. These include education, job and residential histories, health records, friends and colleagues, pets, sports preferences, and organizational affiliations.
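Wladawsky-Berger's three categories map naturally onto a simple record type. A sketch in Python, with illustrative field names and values:

```python
from dataclasses import dataclass, field

@dataclass
class Identity:
    """An individual's identity as three categories of attributes."""
    # Inherent: intrinsic to the person (birth date, biometrics, ...)
    inherent: dict = field(default_factory=dict)
    # Assigned: attached via institutions (passport, e-mail, logins, ...)
    assigned: dict = field(default_factory=dict)
    # Accumulated: gathered and evolving over a lifetime (histories, ...)
    accumulated: dict = field(default_factory=dict)

# Hypothetical individual:
person = Identity(
    inherent={"date_of_birth": "1990-01-01", "eye_color": "brown"},
    assigned={"passport_number": "X1234567"},
    accumulated={"education": ["B.S. Computer Science"]},
)
```

Notice that all three categories are exactly the kind of personally identifiable information that privacy law restricts and that scrubbing at scale would be so costly.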

If we look to the recently passed General Data Protection Regulation (GDPR), we see the need for the ability to use data within the confines of the law. The GDPR (EU) 2016/679 is a regulation in EU law on data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA). It also addresses the export of personal data outside the EU and EEA. The GDPR aims primarily to give citizens and residents control over their personal data and to simplify the regulatory environment for international business by unifying regulation within the EU. GDPR limits Artificial Intelligence’s benefits for economic development and governing. The ethical data dilemma extends to corporations and companies that are reluctant to provide data due to fears of proprietary espionage. With all of these situations in mind, we look to privacy-preserving technology to potentiate AI and comply with preexisting and future data privacy laws. Two such technologies are being developed out of the “cryptocurrency” space: Trusted Execution Environments (TEEs) and Secure Multi-Party Computation (sMPC). Both involve blockchain architecture.

A blockchain is a growing list of records, called blocks, which are linked using cryptography. Blockchains that are readable by the public are widely used by cryptocurrencies. Blockchains have been on the rise, and FINTECH companies are salivating at their potential uses. One of blockchain’s features is that information is inherently public, in order to achieve consensus. Obviously this poses a problem for any organization trying to protect private data or proprietary code while building decentralized applications on the blockchain. At least one project, MIT’s Enigma Project, is trying to solve this problem for blockchain developers. Enigma is in the process of developing what are called “Secret Smart Contracts.” The original idea of “Smart Contracts” stems from Ethereum, a decentralized platform for applications that run exactly as programmed without any chance of fraud, censorship, or third-party interference. Enigma builds on the “Smart Contracts” concept and tries to improve on it by making it usable to organizations that need to preserve data privacy. Enigma aims to develop a TEE and sMPC infrastructure that gives developers the choice between the benefits of each model.
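The phrase “linked using cryptography” reduces to each block committing to the hash of its predecessor, so tampering with any earlier block invalidates every later one. A toy sketch, not any production chain's actual block format:

```python
import hashlib
import json

def make_block(prev_hash, records):
    """A block commits to its records and the previous block's hash."""
    body = json.dumps({"prev": prev_hash, "records": records},
                      sort_keys=True)
    return {"prev": prev_hash,
            "records": records,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

genesis = make_block("0" * 64, ["genesis"])
block1 = make_block(genesis["hash"], ["tx: alice -> bob"])

# Each block's "prev" field pins the exact contents of its predecessor;
# changing genesis would change its hash and break this link.
assert block1["prev"] == genesis["hash"]
```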

TEE: A TEE provides a fully isolated environment called an enclave that prevents other applications, the operating system, and the host owner from tampering with or even learning the state of an application running in the enclave. A TEE thereby provides strong confidentiality for smart contract data that blockchains cannot. Unfortunately, a TEE alone cannot guarantee availability or provide secure networking or persistent storage. Thus, it cannot alone achieve blockchains’ authoritative transaction ordering, persistent record keeping, or resilience to network attacks.

Enigma’s TEE: MIT’s Enigma network provides a permissionless peer-to-peer network that allows executing code (secret contracts) with strong correctness and privacy guarantees. Another way to view the network is as a smart-contract platform (e.g., similar to Ethereum) that enables the development of decentralized applications (dApps), but with the key difference that the data itself is concealed from the nodes that execute computations. This enables dApp developers to include sensitive data in their smart contracts, without moving off-chain to centralized (and less secure) systems.

Enigma plans to incentivize the creation of “nodes” that will perform computation over data and execute the “Secret Smart Contracts.” Nodes will be created by individuals participating in the decentralized network by providing a proof of stake (PoS). The Proof of Stake (PoS) concept states that a person can validate block transactions according to how many coins they hold; the more Enigma tokens a stakeholder owns, the more validation power the node has. Each system would use its tokens as economic incentives for the nodes providing the computation. As of now, these TEEs require additional hardware-based security features like Intel Software Guard Extensions (SGX). Intel SGX is a set of central processing unit (CPU) instruction codes from Intel that allows user-level code to allocate private regions of memory, called enclaves, that are protected from processes running at higher privilege levels. Still, these are subject to side-channel attacks and are not a full solution. TEEs, were they perfect, would provide a fast, general-purpose computing schema that preserves privacy over data; however, subtle flaws in TEEs allow information to be pulled out through side channels. Enigma takes security one step further with the development of sMPC.
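The stake-weighted validation described above can be sketched as a weighted random draw, where a node's chance of being picked is proportional to the tokens it has staked. The node names and balances below are hypothetical:

```python
import random

# Hypothetical token balances staked by each node.
stakes = {"node_a": 500, "node_b": 300, "node_c": 200}

def pick_validator(stakes):
    """Select a node with probability proportional to its stake."""
    nodes = list(stakes)
    weights = [stakes[n] for n in nodes]
    return random.choices(nodes, weights=weights, k=1)[0]

# node_a holds half the total stake, so over many draws it should
# be selected for roughly half of the validations.
chosen = pick_validator(stakes)
```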

Enigma’s TEE and sMPC design will be blockchain-interoperable and agnostic, serving as an extension to blockchain platforms for off-chain computation. It does not need to be the 100% solution to all of blockchain’s problems, but it solves scalability and privacy issues that have limited blockchain’s adoption. To ensure that data stays secure, information can be encrypted before being sent to the network; this off-chain layer is responsible for distributing the data across Enigma’s nodes and keeping it private. The blockchain’s public ledger only stores references to this data to provide proof of storage, but none of the data itself is public; it remains obfuscated, private, and split up across the off-chain network.
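The “references only” pattern can be sketched as: encrypt the data, store the ciphertext off-chain under its content hash, and record only that hash on the public ledger. The repeating-key XOR cipher below is a deliberate stand-in for real encryption, and the storage dictionaries stand in for Enigma's actual off-chain layer:

```python
import hashlib

off_chain_store = {}   # stand-in for the off-chain storage layer
public_ledger = []     # stand-in for the blockchain's public ledger

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (repeating-key XOR) -- NOT real encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def submit(data: bytes, key: bytes) -> str:
    blob = toy_encrypt(data, key)            # encrypt before sending
    ref = hashlib.sha256(blob).hexdigest()   # content-addressed reference
    off_chain_store[ref] = blob              # ciphertext stays off-chain
    public_ledger.append(ref)                # only the hash goes public
    return ref

ref = submit(b"patient record #42", b"secret-key")
# The ledger proves the data exists without revealing it:
assert off_chain_store[ref] != b"patient record #42"
```

Applying `toy_encrypt` again with the same key recovers the plaintext, which is how the holder of the key (and only the holder) can use the referenced data.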

Source: Microsoft

Enigma is looking to use sMPC and an off-chain distributed hash table (DHT) to ensure data privacy. The sMPC protocol distributes data between nodes on the network, splitting the encrypted information into separate pieces to ensure its safety. The DHT is then responsible for storing this data in an off-chain database. The DHT stores the data while the sMPC nodes handle and retrieve it, and both ensure that the data remains completely private. According to Enigma’s whitepaper, all data “is split between different nodes, and they compute functions together without leaking information to other nodes. Specifically, no single party ever has access to data in its entirety; instead, every party has a meaningless (i.e., seemingly random) piece of it.”
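The whitepaper's “seemingly random piece” property is the essence of additive secret sharing, one of the classic building blocks of multi-party computation. A minimal sketch over a prime field, not Enigma's actual protocol:

```python
import secrets

P = 2**61 - 1  # a Mersenne prime serving as the field modulus

def share(secret: int, n_parties: int):
    """Split `secret` into n additive shares.

    Any n-1 shares are uniformly random and reveal nothing;
    only the full set reconstructs the secret.
    """
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

shares = share(1234, 3)
assert reconstruct(shares) == 1234

# Parties can even add two shared secrets without ever seeing either:
a_shares, b_shares = share(10, 3), share(20, 3)
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 30
```

The last few lines show why this matters for Machine Learning over private data: arithmetic can be carried out directly on the shares, so nodes compute on data they individually cannot read.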

Machine Learning using Enigma’s sMPC to preserve privacy comes at the cost of speed; the TEE path avoids that penalty. According to Enigma’s CEO, Guy Zyskind, “It (RL) would potentially be computationally expensive, and for this reason I believe having both TEE/MPC powered implementations is compelling for developers.” Enigma gives developers the ability to choose between TEE and sMPC depending on their situation and what best fits their applications. Other TEE projects are in the works but do not offer the flexibility that Enigma is developing.

Using Enigma’s technology, governments and organizations that value privacy, or are at least compelled by law to protect it, can use this infrastructure to train AI on private data. This technological breakthrough could solve a multitude of issues and enable progress while simultaneously preserving the rights of citizens. Developing a technology that allows someone to conduct analysis on data they cannot see is no easy feat, nor is it a quick process. Fortunately, Guy Zyskind and MIT’s Dr. Alex “Sandy” Pentland have been working on this for years. Enigma’s mainnet for their TEE is set to be released in Q3 of 2018, and the mainnet for sMPC is scheduled for release in Q1 2019. The potential of this project, and projects like it, can neither be overstated nor yet fully understood, but it could be a critical piece of technology that allows the U.S., Europe, and other privacy-valuing nations to keep pace with those who don’t value privacy.

Sources: Enigma White Paper, Enigma Developers Forum, TheCryptoRealist, https://gdpr-info.eu/

If you’re interested in the Enigma Project, consider these resources:

Telegram: t.me/EnigmaProject

Reddit: reddit.com/r/EnigmaProject

Twitter: twitter.com/enigmampc

Discord: https://discordapp.com/invite/SJK32GY