Human-like creative writing is just one of three reasons why OpenAI’s latest research into AI for text is hugely significant.



What happened that’s so important?

This week, OpenAI announced their latest progress in using AI to understand and generate everyday natural language. Called GPT-2, the work gained a lot of attention because, while OpenAI published the paper and findings, they withheld some of the materials: the full trained model. They did this over concerns that the work could be abused, because it produces written prose that is, generally, indistinguishable in quality from what a human might write. Let’s see an example:

Human-authored start of story:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

GPT-2’s response:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez. Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them — they were so close they could touch their horns. While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.” Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America. While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.” However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.
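
You can reproduce this kind of continuation yourself with the released “small” model. Below is a minimal sketch using Hugging Face’s transformers library, a third-party reimplementation rather than OpenAI’s original code; the top-k setting mirrors what OpenAI reported using for their published samples, while the other parameters are illustrative choices.

```python
# Minimal sketch: generating a continuation from a prompt with the released
# "small" GPT-2, via the third-party Hugging Face transformers library.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the released "small" model
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = ("In a shocking finding, scientist discovered a herd of unicorns "
          "living in a remote, previously unexplored valley, in the Andes "
          "Mountains.")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation. OpenAI reported using top-k truncation (k = 40)
# for their published samples; max_length here is an arbitrary choice.
output = model.generate(
    input_ids,
    max_length=300,
    do_sample=True,
    top_k=40,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```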

What does state of the art look like, anyway?

CognitionX follows research into AI for speech and text closely (NLP, chatbots, etc.), and there are a few frontiers being explored at the moment:

- Structured data narratives, e.g. WordSmith (see also Arria and Narrative Science).

- Art, in the form of free-form prose, e.g. Ross Goodwin’s 1 the Road, in partnership with Google.

- General conversation chatbots (socialbots), e.g. Replika or Microsoft Zo.ai.

- Debates: IBM’s Project Debater is possibly the closest to GPT-2 in quality of text, though closed source. The debater constructs, on the fly, narratives that are not only understandable but demonstrate an understanding of the debate’s motion and the opponent’s claims, along with an ability to construct a counterclaim. This is super impressive for that particular specialised task, debating.

What that GPT-2 sample shows is astonishing. The things that particularly struck me about the prose it creates are:

- The length and fluency of the writing: free-form systems have been generating grammatically correct sentences for some time, but those sentences are rarely particularly long or well-structured.

- Coherence: stability of characters and overall theme. The piece above is typical of the writing produced by GPT-2 in its consistent reference to the central theme and particular characters. It’s about South America, it’s about unicorns, and it’s about them speaking English. Other systems typically introduce random incongruities that are confusing and disorienting, even within a single sentence.

- It makes sense: the reasoning across the generated prose holds together. I read the story above and thought: that’s a completely well-reasoned, plausible story.

This is a huge tipping point.

Yoshua Bengio, in a recent interview, played down the value of big announcements and underscored the reality that research advances through incremental progress and collaboration:

Science moves by small steps. Thanks to the collaboration and community of a large number of people interacting, all the scientists who are experts in their field kind of know what is going on, even in the industrial labs. Information flows and leaks and so on…

And yes, this work is no different: it draws upon 74 other papers, 7 of which were published this year, and nearly 60% of which were written since 2017. Source

That said, tipping points do happen. There are moments when something new, though an increment on other work, nonetheless captures the imagination of the general public in a new way, or, looking back years later, is clearly seen as a critical piece of the puzzle without which further work would not have been possible.

I would argue that, on the strength of the points highlighted above, OpenAI’s GPT-2 is hugely significant.

It’s multi-talented too!

But there’s MORE. GPT-2 is not just good at one thing: it is state of the art in many different ways. Here’s a quick recap of how AI is progressing:

- AI v1: the first wave of AI has been about getting really good at a single task (e.g. translation or summarisation; see below for examples). Benchmarking and competition are about performance against standard data for that specific task.

- AI v2: multi-task learning, where a single design works across different tasks at the same time. With this space came multi-task benchmarks in 2018: DecaNLP and GLUE.

- Future: AGI? Beyond multi-task learning it’s unclear, but the ultimate goal is artificial general intelligence, where software performs as well as, or better than, a human.

Though not the first multi-task system, this new work from OpenAI puts us squarely into AI v2. GPT-2 bettered dedicated task solutions in an incredible seven of the eight NLP tasks below, with each task induced purely by prompting rather than task-specific training (see the sketch after the list).

- Creative writing of new text (as shown above, the original focus of the research).

- Children’s Book Test: fill in the missing word in a sentence. GPT-2 scores 93.3% on common nouns and 89.1% on named entities.

- Long-range text coherence: referring to things accurately across broad stretches of text. On LAMBADA, GPT-2 improved perplexity from 99.8 to 8.6 (perplexity measures how uncertain the model is about each next word; lower is better, and 1 is perfect) and reached 63.24% accuracy.

- Understanding ambiguous statements (Winograd Schema Challenge): 70.7% accuracy, a 7-point improvement.

- Reading comprehension (Conversational Question Answering, or CoQA): an F1 of 55. BERT scores 89, but needed 127,000 manually collected examples. The key observation here is that while research into conversational dialogue is great, it may not strictly be necessary if you can derive conversational capability from a plain pool of text.

- Summarisation: progress, but not state of the art. This is hugely difficult, and while the benchmark scores are low, the examples provided in the paper are super impressive.

- Translation: surprisingly good (given that all non-English content was removed from the training data), though not state of the art yet.

- Answering factual questions about paragraphs of text: question answering (SQuAD 1.1 and 2.0).
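
Remarkably, GPT-2 was not fine-tuned for any of these tasks: each one is coaxed out of the same model purely by how the prompt is phrased, which the paper calls zero-shot task transfer. Here is a rough sketch of the idea; this is a hypothetical illustration, not OpenAI’s evaluation code, with prompt formats paraphrased from those described in the paper and helper names of my own.

```python
# Sketch of zero-shot task transfer: one language model, many tasks, each
# induced purely by the shape of the prompt, with no task-specific training.
from transformers import pipeline

# A small completion helper built on the (third-party) transformers library.
_generator = pipeline("text-generation", model="gpt2")

def generate(prompt: str, max_length: int = 200) -> str:
    return _generator(prompt, max_length=max_length)[0]["generated_text"]

def summarise(article: str) -> str:
    # The paper induces summaries by appending "TL;DR:" to the article
    # and letting the model continue.
    return generate(article + "\nTL;DR:")

def translate_en_to_fr(sentence: str) -> str:
    # Translation is induced by priming with "english = french" example
    # pairs, then leaving the final slot for the model to fill in.
    prompt = ("good morning = bonjour\n"
              "thank you = merci\n"
              f"{sentence} =")
    return generate(prompt)

def answer(passage: str, question: str) -> str:
    # Reading comprehension: condition on a passage plus a question and
    # let the model complete the answer.
    return generate(f"{passage}\nQ: {question}\nA:")
```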

One more thing… it learnt by itself!

But wait! There’s more! The third astonishing thing is that this system did not require any human involvement in labelling its training data. This is when the AI aeroplane really takes off, because learning becomes vastly more scalable. While BERT, a previous multi-task contender, could also do this, it’s no less amazing.
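
“Learnt by itself” here means self-supervised: the only training signal is predicting the next word of ordinary web text, so the labels come for free with the data. A minimal sketch of that objective, in PyTorch-style code (illustrative, not OpenAI’s actual training code):

```python
# Minimal sketch of the self-supervised language-modelling objective:
# the "label" for each position is simply the next token of the raw text,
# so no human annotation is needed.
import torch
import torch.nn.functional as F

def lm_loss(model: torch.nn.Module, token_ids: torch.Tensor) -> torch.Tensor:
    # token_ids: (batch, seq_len) integer token ids from raw web text.
    inputs = token_ids[:, :-1]    # every token except the last...
    targets = token_ids[:, 1:]    # ...predicts the same text shifted by one.
    logits = model(inputs)        # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten over positions
        targets.reshape(-1),
    )
```

The perplexity figures quoted earlier are just the exponential of this loss: a perplexity of 8.6 means the model is, on average, about as uncertain about each next word as if it were choosing uniformly among 8 to 9 equally likely options.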

Should they have held the code back? It’s up for debate.

To recap, there are three things that made this work particularly astonishing:

- Human-like performance in creative writing.

- Exceptional versatility: amazing results across 7 other natural language processing tasks (translation, summarisation, etc.).

- Learned by itself: the dirty secret of today’s AI is that most of it is seeded from crazy numbers of hours of real people labelling things: that’s a cat, that’s a dog, that is the right answer to that question, and so on. The holy grail in many respects is a system that can learn by itself.

Should they release it? Here’s why they held the full version back:

“We can anticipate how these models may be used for malicious purposes, and can conceive such systems being used to generate misleading news articles; impersonate others online; automate the production of abusive or faked content to post on social media; and other as-yet unanticipated uses,”

— Jack Clark, OpenAI Policy Director

One perspective I heard recently resonated: with the information that has been shared, all the major players can retrain the model anyway. Pay the US$50k in training costs (GPUs and other infrastructure) and get on with it. So by not sharing the full model, OpenAI arguably did the opposite of democratising their results. Food for thought.

Learn more

OpenAI original blog post

Open source demo (the “small” version; the large version is what they have held back)

Keep up to date! CognitionX’s weekly news briefings

This work is part of CognitionX’s research into the impact of AI on speech and text. We have a weekly news briefing on this and 5 other topics. Sign up here.

Learn from the pros: CogX, our annual festival of AI

CogX is our annual festival of AI, where we hold plain-English conversations of substance about the business impact of AI: 6,500 people came last year, and we are expecting 15,000 this year in London, June 10–12 (early bird ticket pricing is still available). We have secured a growing number of top speakers, and will have a strong theme on AI for speech and text.