The Discriminator(s)

A discriminator is a deep learning model that classifies (discriminates) between samples produced by different generative processes. Discriminators are commonly used in Generative Adversarial Networks (GANs), where the discriminator works in concert with a generative model to iteratively improve the generation process, making its outputs more and more similar to the “real” examples. A similar type of discriminator was appealing for the purpose of identifying texts generated by GPT-2, but the process used here did not involve iteratively training a generator and discriminator together. Further, since even humans find it difficult to identify generated texts, there was much skepticism about how well such a model could perform.

To approach the problem from a different angle: if generated texts and real articles shared the same subject matter, then a measure of similarity between generated texts and their real counterparts might lend some insight into the features of generated texts. The first approach used to compare texts on the basis of “similarity” was a modified version of BERTScore, a metric that computes the cosine similarity between the contextual embeddings of the words in two texts.
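As a concrete illustration, the snippet below sketches how such a comparison might be run with the open-source bert-score package; the two example texts are placeholders rather than data from this project.

```python
# Minimal sketch of comparing a generated text to its real counterpart
# with BERTScore, using the open-source `bert-score` package
# (pip install bert-score). The example strings are placeholders.
from bert_score import score

generated = ["The scholarship winner vanished with the museum's diamonds."]
original = ["A local student was awarded a prestigious scholarship this week."]

# P, R, F1 are tensors with one entry per candidate/reference pair;
# F1 is the usual headline number.
P, R, F1 = score(generated, original, lang="en", verbose=False)
print(f"BERTScore F1: {F1.item():.3f}")
```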

Although this implementation seems promising on its surface, it does not account for the “creativity” exhibited by the generative model used. While similarity can indeed be calculated and analyzed further, the generated texts diverged dramatically from the original texts in content. A seed sentence taken from a news piece about a new scholarship recipient might result in a generated text about a jewelry heist pulled off by a young and brilliant mastermind. Evaluating similarity across such vastly different contexts proved unhelpful, as the metric simply reports that the generated texts are very different from the original articles.

In evaluating this similarity metric, it was observed that generated texts tended to use less complex vocabulary and often relied on repeating phrases. By contrast, the original news articles were more apt to follow a logical story line, passing through diverse topics and ideas along the way. This observation spurred additional research into tools like the Giant Language Model Test Room, which also appeared to hinge on complexity and diversity in generated texts.
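Two crude proxies for those observations are easy to compute: type-token ratio as a stand-in for vocabulary diversity, and the fraction of bigrams that repeat as a stand-in for phrase repetition. The sketch below is illustrative only; these particular metrics were not part of the project.

```python
# Rough proxies for the patterns observed above: type-token ratio
# (vocabulary diversity) and repeated-bigram rate (phrase repetition).
# Whitespace tokenization and the sample text are illustrative assumptions.
from collections import Counter

def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def repeated_bigram_rate(text: str) -> float:
    tokens = text.lower().split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(bigrams)

sample = "the heist was planned and the heist was executed by the mastermind"
print(type_token_ratio(sample), repeated_bigram_rate(sample))
```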

The Giant Language Model Test Room (GLTR) was created through a collaboration between Harvard NLP and the MIT-IBM Watson AI Lab. The tool uses a language model, either BERT or GPT-2, to step through a text word by word and estimate how likely each word was to be picked next at that point in the sentence. In the pictures below, green marks a word in the top 10 most likely choices, yellow the top 100, red the top 1000, and purple anything beyond the top 1000. In other words, green words are ones the model itself would likely output next; as a word becomes less likely to be selected by the model, it falls into one of the other color bins. For example, the picture below is a snippet of a New York Times article written by a human. The article contains several predictable words, but also many words that a generative model would be unlikely to choose. That is because humans do not pick the most likely next word when writing; they pick the word that best fits the context and the idea they are trying to convey.
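The underlying computation is straightforward to sketch. The snippet below approximates the GLTR idea with the Hugging Face transformers library (an approximation, not the tool's actual code): for each word, ask GPT-2 how it ranks among the model's predictions given the preceding context, then bin the rank the way GLTR colors words.

```python
# Approximate the GLTR computation: rank each actual token among GPT-2's
# predictions for that position, then bin ranks the way GLTR colors words.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    ranks = []
    for pos in range(ids.size(1) - 1):
        next_id = ids[0, pos + 1]
        # Rank of the actual next token among all vocabulary logits.
        rank = (logits[0, pos] > logits[0, pos, next_id]).sum().item() + 1
        ranks.append(rank)
    return ranks

def bin_rank(rank: int) -> str:
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"

text = "The quick brown fox jumps over the lazy dog."
for tok, r in zip(tokenizer.tokenize(text)[1:], token_ranks(text)):
    print(f"{tok:>12s}  rank={r:<6d} {bin_rank(r)}")
```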

The difference in word distributions is visible in the four images below. When using the GLTR tool with a GPT-2 discriminator, fake articles (top-left) have a much higher proportion of highly predictable words than their real counterparts (top-right). A similar pattern appears when using the BERT discriminator on the fake (bottom-left) and real (bottom-right) texts. With both discriminators there is a clear difference between the word choices of generators and humans, supporting the intuition that a generator constructs text based on word probabilities rather than on context. Although this project did not develop a numerical metric for deciding whether a text was computer-generated, looking at GLTR’s output can provide some insight.

In response to the limitations described above, a second kind of discriminator was sought. In researching fake-news discriminators, some authors reported success using a BERT encoder combined with some kind of classifier. A combination of a BERT encoder and a BERT binary classifier was chosen for their output/input compatibility, and the classifier was implemented in Python following this example. Although the author of the original Medium article expressed high hopes for this classifier, the model was computationally expensive (taking an estimated 15–21 hours to run one epoch) and, in the experiment run for this project, was unable to classify “real” and generated text at a rate better than guessing. It is possible that our implementation was too simple and should have included additional feature engineering on the inputs before passing them to the BERT encoder; alternatively, that feature augmentation could have been inserted between the BERT encoder and the binary classifier. However, the work done for this project provided no promising indication that this additional effort would yield the desired gains in accuracy.
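For reference, the sketch below shows roughly how such a BERT-based binary classifier can be wired up with Hugging Face’s BertForSequenceClassification. This is an assumption about the general shape of the approach, not the exact code this project followed, and the two training examples are placeholders.

```python
# Minimal sketch of a BERT-encoder-plus-binary-classifier discriminator.
# The labels and training texts below are placeholders for illustration.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = real, 1 = generated

texts = ["A local student was awarded a scholarship this week.",  # "real"
         "The scholarship heist stunned the sleepy town."]         # "generated"
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

# One illustrative optimization step; a real run would loop over batches.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # loss is computed internally
outputs.loss.backward()
optimizer.step()
print(f"training loss: {outputs.loss.item():.3f}")
```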

The shortcomings of the BERT classifier are an indication of how well state-of-the-art natural language generation models perform. There is potential in a discriminator that relies on measures of complexity and creativity combined with a measure of overall coherence. Models like GPT-2 could fool a discriminator that simply sets a threshold for how often words fall outside the top 10 or top 100 most-likely choices: one could raise the model’s temperature, giving it more freedom to explore words and concepts the discriminator would deem “less likely”. But if the discriminator could also measure the coherence of the entire piece, it would more easily identify such texts as generated, because the most “creative” texts also tend to make the least sense to a person reading them.
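To make the temperature mechanism concrete, the toy snippet below shows how dividing the logits by a temperature above 1 flattens the next-word distribution, spreading probability mass toward words a rank-threshold discriminator would flag as “less likely”. The logit values are made up for illustration.

```python
# Toy demonstration of temperature scaling: higher temperature flattens
# the softmax distribution over next-word candidates. Logits are made up.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for stability
    return exp / exp.sum()

logits = [5.0, 3.0, 1.0, 0.5, 0.1]
for t in (0.7, 1.0, 1.5):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
```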