
AEIdeas

As regulators and policymakers set their agendas for 2019, privacy and the rights and obligations of the collectors of personal information will certainly be a priority. Interest is also sure to increase in how to govern the environment in which big data and artificial intelligence (AI) applications use personal information to create and distribute new wealth and welfare.

What principles should guide this endeavor?

There is a fine line to tread. On one side lies an overly cautious approach that locks up personal data so tightly that value-enhancing propositions cannot be developed or used (as has been argued of the European Union’s General Data Protection Regulation, or GDPR). On the other lies an overly liberal approach in which the owners and operators of big data and AI applications face few impediments to using those tools in ways that harm the long-term interests of the consumers whom policymakers and regulators are charged with protecting.

At the core of the regulators’ dilemma is an array of information asymmetries, in which one party to an interaction knows important relevant information that the other does not. Information asymmetries are not new; they have always been a feature of interactions between human actors. But they take on new significance in the world of AI and big data. For the most part, both the data and the algorithms used to manipulate them are opaque to consumers and to the regulators charged with protecting consumers’ interests. Unsurprisingly, this opacity has bred a proliferation of fears about potential abuses. Somewhat ironically, these fears have likely had a chilling effect on the development and use of beneficial big data and AI applications, because consumers (and their agents) cannot ascertain what those applications’ effects might be. The result is, in effect, a twenty-first century “market for lemons,” in which poor-quality (exploitative) applications can crowd high-quality (benevolent and beneficial) applications out of the market.

Arguably, if the source of the problem is information asymmetry, one solution might be to require that specific relevant information be disclosed. Information disclosure is one of the least intrusive forms of regulation (although compliance is not always easy to monitor and enforce). Classic examples include the requirement that publicly traded firms disclose financial and other performance information in a timely manner to those with a vested interest in that performance (e.g., shareholders, banks, other creditors, employees, and taxation authorities), and the obligation of medical providers to give patients full information about the risks and benefits of care options. Changing the distribution of information at least partially redresses the imbalance in the “market for lemons,” making it easier to tell beneficial applications from harmful ones.

What role then might information disclosure play in regulating big data and AI?

One possible place to start is with the algorithms used to manipulate personal data. Average consumers may lack the skills to interpret the source code of big data and AI applications, but if (some) source code were required to be revealed, markets would develop for suitably skilled independent analysts to interpret it and report on it to interested parties, much as mandatory disclosure of financial information has fueled a vibrant investment analyst industry.

However, those who have developed the algorithms are unlikely to welcome such impositions. They may claim that mandatory disclosure requirements would compromise their intellectual property. The claim has some validity: unlike disclosed financial data, the source code for big data and AI applications could be appropriated by rivals.

So if mandatory disclosure is to be required, it must come with some safeguards, reminiscent of the protections granted to those seeking patents and holding copyrights.

A solution in “source-available” agreements?

Fortunately, some precedents already exist in the broad range of governance arrangements used in software licensing. Under a classic open-source software agreement, the copyright holder grants users the right to “study, change, and distribute” the source code. Under a more restrictive source-available agreement, by contrast, the source code can be viewed, but the user has “no legal rights to use, share, modify, or even compile it.” Variations of source-available agreements have been used by a wide variety of software developers (including Microsoft) when collaborating with others on projects. The copyright remains with the original firm, and users’ rights are strictly constrained.

At best, if AI and big data developers have nothing to fear from sharing their source code because they have no intention of exploiting the information asymmetry with end consumers, they may voluntarily reveal the code under source-available agreements. Such disclosure may then function as an incentive-compatible signal of high-quality, non-exploitative applications, much as warranties attached to used cars do in the original market for lemons. In that case, regulation may not be necessary. If such voluntary arrangements do not materialize, however, regulators and policymakers may wish to consider mandating the disclosure of AI and big data algorithms under agreements of this type.