Tools, For Good or Ill?

This month, OpenAI and IBM each demonstrated a very impressive machine learning project. Both projects are perfect exhibits of the potential power of AI tools. As with most tools, they can be used positively or negatively. A hammer can be used to drive a nail or to injure a person. Unlike a hammer, however, these tools can be scaled virtually without limit, a feature which gives them the potential for massively positive or negative effects. In this post, I use these new AI tools as examples to explore why scale makes AI tools different from most others.

The Projects

GPT-2

https://blog.openai.com/better-language-models/

OpenAI announced a powerful language modeling project which has several very impressive abilities:

Using a short prompt as a seed, generate a story that reads like it was written by a human.

Accurately answer questions about a passage of text.

Comprehend ambiguous statements.

Answer factual questions.

Predict the next word in a sentence.
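The last of these abilities, next-word prediction, is the core language-modeling task underlying GPT-2. At a vastly smaller scale, the idea can be sketched with a toy bigram model (a hypothetical illustration of the task itself, not of how GPT-2 works internally):

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(model, word):
    """Return the word most often seen after `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Tiny made-up corpus for illustration.
corpus = (
    "the train was stolen today . the train line runs downtown . "
    "the material was taken from the site ."
)
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "train" follows "the" most often here
```

GPT-2 replaces these raw counts with a large neural network that conditions on the entire preceding context rather than a single word, which is what lets it sustain a coherent story over many sentences.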

The OpenAI blog linked above includes samples for each of these abilities. I will repeat one example for the first ability.

Prompt: A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown.

The incident occurred on the downtown train line, which runs from Covington and Ashland stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal Railroad Administration to find the thief. “The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement. “Our top priority is to secure the theft and ensure it doesn’t happen again.” The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials. The Nuclear Regulatory Commission did not immediately release any information. According to the release, the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s investigation. “The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,” Hicks said. “We will get to the bottom of this and make no excuses.

Some observations:

It knows Cincinnati is in Ohio.

It quotes the same person twice, demonstrating a consistent memory.

It intersperses the article with facts and quotes like a real news article.

The quotes sound real.

The departments, commissions, and administrations are real.

At a quick glance, I could mistake this for real news.

IBM’s Project Debater

LIVE DEBATE – IBM Project Debater

Project Debater is a program that can hold a debate, in real time, with a human. In this live debate, Project Debater faces a world-class human debater, Harish Natarajan. While Harish “wins” the debate, Project Debater demonstrates abilities that rival even those of GPT-2:

Argue the position it is given.

Speak in well-formed English sentences.

Employ both factual and moral arguments.

Hold a consistent position, never presenting arguments that weaken its own position.

Rebut Harish’s arguments, making comparative statements that contrast its own arguments with those of Harish.

The topic of the debate was “We should subsidize preschools.” Project Debater was tasked with arguing for the motion, Harish with arguing against. Each side had 15 minutes to prepare arguments. Below are some choice Project Debater quotes with my own commentary added.

I sometimes listen to opponents in wonder, what do they want? Would they prefer poor people on their doorsteps begging for money? Would they live well with poor people without heating and running water? Giving opportunities to the less fortunate should be a moral obligation of any human being and it is a key role for the state. To be clear, we should find the funding for preschools and not rely on luck or market forces.

Note the use of a moral argument.

Its English is mostly well-formed (the punctuation is my own, but you can hear it in the speech).

I think that Harish Natarajan raised the following issue: there are more important things than preschools to spend money on. The state budget is a big one and there is room in it to subsidize preschools and invest in other fields; therefore, the idea that there are more important things to spend on is irrelevant because the different subsidies are not mutually exclusive.

The re-statement of Harish’s argument is accurate.

The use of non-zero sum argumentation is very impressive. I’ve had arguments with adults who failed to understand non-zero sum situations. Even more impressive is that this is a valid rebuttal of Harish’s position.

In December 2015, researchers at Duke University concluded that investing in preschool helps both students and educators. Long term, they found that students who enroll in preschool education are 39 percent less likely to be placed in special education programs as third graders.

This is a good use of evidence that supports the position Project Debater is tasked with arguing for.

Optimism

My initial reaction to both of these demonstrations was pure wonder. These are brilliant pieces of technology that have amazing potential for good. Off the cuff, I can come up with a bunch of valuable applications that hardly scratch the surface of the types of problems these tools could help solve.

The stories generated by GPT-2 read as though they were written by a human. Some of them are even entertaining. Imagine:

Assisted writing: An author comes up with ideas and GPT-2 helps write the story.

On-demand fiction: You input a prompt to GPT-2 and it writes a story for you.

Better news: A journalist gives GPT-2 data and a short description, and the system emits a news story.

GPT-2’s other abilities have amazing potential as well. Imagine:

Study: A student feeds it a book and can ask questions about the text.

Grading: A teacher feeds it an essay and can ask questions about the text.

Language comprehension: Non-native speakers can ask it to resolve ambiguous sentences.

Help chat: Dialog based help systems that find and contextualize answers from help pages instead of requiring you to find the information.

Q&A: Answer any question instantly (I guess we already have this one with search engines?)

The arguments made by Project Debater are rational, consistent, and well-formed. Imagine:

Belief Augmentation: Ask the system to debate itself from both sides of an argument. A human can decide which side they think has better arguments. The AI system lacks bias for either side, which may engender confidence in the presented arguments.

Public Debates: Project Debater could be used as a real-time fact checker for live debates. It could also bring up alternative views not already mentioned by the humans doing the debating.

Augmented debates: As with chess, a human augmented by Project Debater would likely be a more effective debater than either a human or Project Debater alone.

Pessimism

My second reaction to these technologies was to imagine them in the hands of people with malicious intent. Imagine the following scenarios. With these technologies, a single person might be able to wreak havoc.

Fake News 2.0:

GPT-2: Generate hundreds of news articles with fake news that supports their agenda, posted to multiple locations.

Project Debater: Write bots that argue rationally for these articles on social media, forums, article comments, etc.

Deep Fakes: Generate images (videos are not likely to be far off) that go with the articles/comments.

Automated Phishing:

GPT-2: Write new email scams.

Project Debater: Robo-calls that sound human and can argue to convince you it’s not a scam.

Automated Bullying:

Both: Write messages targeting individuals’ insecurities using the context of a Facebook profile or discussions with the targeted person.

Promotion of Violence:

Both: Generate convincing arguments on why a violent protest should be supported.

A Tradeoff: Trust & Access

In each scenario illustrated above, the tools affect either access to or trust in information. When used positively, these tools have the potential to increase both. Used maliciously, they could cause the opposite. This poses a question: are the benefits of these new tools worth the potential downsides?

Typically, it’s easy to evaluate whether the benefit of a tool is worth the cost. A hammer can be used to drive nails, but it can also be used to injure people. The majority of people will use it to hammer nails, with the net positive impact far outweighing the net negative impact.

In other situations, it’s more difficult to make that tradeoff. Is the benefit of nuclear energy worth the cost of nuclear weapons? Nuclear energy is one of our only hopes for preventing climate change, but nuclear weapons could end human civilization. It’s not clear that this tradeoff is worth it.

What’s the difference between these two scenarios? I believe that it is one of scale.

Scale

Scale is a buzzword but in this situation it’s apt. In the examples above, a hammer does not scale up while a nuclear weapon does. A single individual with a hammer can harm, maybe, a few people. A single individual with a nuclear weapon can erase a city.

The key feature of software is that it scales. AI tools have this feature.

Used for good, tools like GPT-2 and Project Debater can scale up so that every individual has a host of personal assistants, each helping them find the best information from the most reputable sources.

Used for ill, these tools enable a single individual to run massive disinformation campaigns or create new, more powerful, fully automated scams. The important thing is not that these tools can be abused, it’s that, with the aid of these tools, a single person can have a negative impact that used to take dozens or hundreds of individuals.

The Result

While I’m enamored of the potential for GPT-2 and Project Debater, their potential negative impact is large. The general public is already struggling with trusting online information. I’m reluctant to think that we should release tools that can make an already fraught situation worse. The possibility of Fake News 2.0 drives me to agree with OpenAI that we do not yet have a way to handle such powerful tools being generally available. We badly need a way to ensure online information is trustworthy before these types of tools are built by people with malicious intent.