49 Pages. Posted: 7 Sep 2017

Date Written: September 4, 2017

Abstract

As automation supplants more forms of labor, creative expression still seems like a distinctly human enterprise. This may someday change: by ingesting works of authorship as “training data,” computer programs can teach themselves to write natural prose, compose music, and generate movies. Machine learning is an artificial intelligence (AI) technology with immense potential and a commensurate appetite for copyrighted works. In the United States, the copyright law mechanism most likely to facilitate machine learning’s uses of protected data is the fair use doctrine. However, current fair use doctrine threatens either to derail the progress of machine learning or to disenfranchise the human creators whose work makes it possible.

This Article addresses the problem in three parts. Using popular machine learning datasets and research as case studies, Part I describes how programs “learn” from corpora of copyrighted works and catalogs the legal risks of this practice. It concludes that fair use may not protect expressive machine learning applications, including the burgeoning field of natural language generation. Part II explains that applying today’s fair use doctrine to expressive machine learning will yield one of two undesirable outcomes: if US courts reject the fair use defense for machine learning, valuable innovation may move to another jurisdiction or halt entirely; alternatively, if courts find the technology to be fair use, sophisticated software may divert rightful earnings from the authors of input data. This dilemma shows that fair use may no longer serve its historical purpose. Traditionally, fair use is understood to benefit the public by fostering expressive activity. Today, the doctrine increasingly serves the economic interests of powerful firms at the expense of disempowered individual rightsholders. Finally, in Part III, this Article contemplates changes in doctrine and policy that could address these problems. It concludes that the United States’ interest in avoiding both prongs of AI’s fair use dilemma offers a novel justification for redistributive measures that could promote social equity alongside technological progress.