Ramifications

The legal ramifications of this precedent could be far-reaching. The decision of the 2nd Circuit Court of Appeals has given tech companies something of a green light to use copyrighted material in developing deep learning algorithms, largely because such use does not directly affect the earnings of the individual copyrighted works. If I had written one of the books that Google used to train its algorithm, I would suffer no adverse effects from the use of my book in that training.

Silicon Valley falls under the 9th Circuit, so this decision is not binding precedent there, but it does give companies that are thinking of using copyrighted data in their models additional confidence.

One could then assume that this precedent would also extend to images, songs, and potentially any other data produced by individuals that is accumulated by tech conglomerates.

Things get more interesting when we go from a search algorithm, which is a discriminative algorithm, to generative algorithms.

A discriminative algorithm takes the original data and essentially reduces it to a single result. Think of a classification algorithm taking a data point and assigning it to a certain group.

A generative algorithm takes the original data and uses this to make new data. In this sense, it is a data-generating process. Deep generative models such as generative adversarial networks and variational autoencoders are commonly used for generating and manipulating image data.
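The distinction can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the one-dimensional data is made up, the function names are hypothetical, and logistic regression and a fitted Gaussian are simply the smallest stand-ins for a discriminative and a generative model. It is not a description of how Google Book Search or any deep generative model works.

```python
import math
import random
import statistics

random.seed(0)

# Made-up 1-D training data: two groups of numbers labeled 0 and 1.
data = [(random.gauss(-2.0, 1.0), 0) for _ in range(200)]
data += [(random.gauss(2.0, 1.0), 1) for _ in range(200)]


def train_discriminative(data, lr=0.1, epochs=200):
    """Logistic regression: learns only how to map a point to a label."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(label=1 | x)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def classify(model, x):
    """Reduce a data point to a single result: its predicted group."""
    w, b = model
    return 1 if w * x + b > 0 else 0


def train_generative(data, label):
    """Fit a Gaussian to one group: learns how that group's data is distributed."""
    xs = [x for x, y in data if y == label]
    return statistics.mean(xs), statistics.stdev(xs)


def generate(model, n):
    """Draw brand-new data points from the fitted distribution."""
    mean, std = model
    return [random.gauss(mean, std) for _ in range(n)]


clf = train_discriminative(data)
gen = train_generative(data, label=1)
new_points = generate(gen, 5)

print("label for -3.0:", classify(clf, -3.0))
print("label for  3.0:", classify(clf, 3.0))
print("newly generated points:", new_points)
```

The discriminative model can only answer "which group does this point belong to?", while the generative model produces points that resemble the training data without copying any of it. That second behavior is what raises the copyright question.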

The Google Book Search algorithm is clearly a discriminative model: it searches through a database in order to find the correct book. Does this mean that the precedent extends to generative models? It is not entirely clear, and the question was most likely not discussed because the legal teams in this case were unfamiliar with the field.

This gets into some particularly complicated and dangerous territory, especially regarding images and songs. If a deep learning algorithm is trained on millions of copyrighted images, would the images it produces be subject to those copyrights? Similarly with songs: if I created an algorithm that could write songs like Ed Sheeran because I had trained it on his songs, would this infringe upon his copyright? The precedent set in this case does not settle these questions, but it does make a compelling case that such uses would also be considered acceptable.

Of course, one could take the opposite view: commercializing the output of generative models would compete directly with the copyrighted material, and could therefore be argued to infringe upon its copyright. However, due to the black-box nature of most machine learning models, it would be extremely difficult either to prove or to disprove that a given output derives from a particular copyrighted work, which leaves us in a form of limbo regarding the legality of such a case.

Until some brave soul generates movies, music, or images based on copyrighted material, tries to commercialize them, and is subsequently challenged in court, it is hard to speculate on the legality of such an action. That being said, I am absolutely sure that it is a matter not of if, but of when, such a case will arrive.

The important things to take away from this case are: