How to Test a Chatbot — Part 1: Why Is It So Hard?

This article series is about chatbot testing. It describes common caveats and problems, shows common solutions, and presents common tools. The first article in the series points out the differences between testing a chatbot and testing other kinds of software.

UPDATE 2020/06/15: As chatbots grow in importance, automated testing solutions will remain critical for ensuring that chatbots actually do what their designers intend. We’ve been busy working on a product that gives testers visual insights and a deeper understanding of their chatbot’s performance, offering several ways to improve its interactions!

Botium Coach will be introduced to the market as part of our online event on the 24th of June.

Why Should I Test a Chatbot?

A picture says more than a thousand words …

From Wikipedia:

Software testing is an investigation conducted to provide stakeholders with information about the quality of the software product or service under test.

There are well-established software testing techniques and metrics, but what makes testing chatbots different? How does it differ from testing a website or a smartphone app?

What Makes Testing Chatbots Different?

There are at least four major differences:

Reason 1: Learning cloud services

Most chatbots are built on top of learning cloud services, which by definition keep changing their behaviour. NLU/NLP services (natural language understanding and processing) like Dialogflow, Wit.ai or LUIS are subject to constant training and improvement. Having a non-deterministic component in the system under test renders software testing useless as soon as you cannot tell the reason for a failed test case: a defect in the chatbot software, or a change in the cloud service.

Even more important, the test itself can and will have an impact on the cloud service as well: presenting a cloud service with the same test cases over and over again will distort the cloud service’s model of “real-life interactions”, giving the test cases higher priority than they should have.

❌ Cloud service training has impact on software tests. Deal with this dependency.
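One way to deal with this dependency is to assert on what the NLU service recognised (intent plus a confidence threshold) rather than on exact response wording, so that retraining the cloud model doesn’t break tests spuriously. The sketch below uses a stubbed `classify` function as a hypothetical stand-in for a real cloud NLU call (Dialogflow, LUIS, …); the helper names are illustrative, not from any specific framework.

```python
def classify(utterance):
    """Stubbed NLU result; a real test would call the cloud service SDK here."""
    return {"intent": "book_flight", "confidence": 0.87}

def assert_intent(utterance, expected_intent, min_confidence=0.8):
    # Assert on the recognised intent and a confidence floor, not on
    # exact response text, which the cloud service may change over time.
    result = classify(utterance)
    assert result["intent"] == expected_intent, (
        f"expected {expected_intent}, got {result['intent']}")
    assert result["confidence"] >= min_confidence, (
        f"confidence {result['confidence']} below threshold {min_confidence}")

assert_intent("Book me a flight from Vienna to Berlin", "book_flight")
```

This also mitigates the feedback problem: running such a check against a dedicated test workspace, instead of the production model, keeps repeated test utterances from distorting the service’s training data.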

Reason 2: Non-linear input

This only applies to chatbots operated with a voice interface.

There are 7.5 billion humans out there, and there are 7.5 billion different voices out there. For a website, it doesn’t matter who clicks a button; whether it is Elon Musk himself or King Louie, the website doesn’t notice a difference. But for a chatbot, it does matter whose voice is speaking.

✔️ Speech recognition technologies are evolving fast. Chatbot developers can rely on industry leaders to provide acceptable solutions.

Reason 3: Non-deterministic user interactions

Dealing with non-determinism is a critical topic in software testing. Due to the nature of human language, it is impossible for software tests to cover all possible situations.

❌ Give up the 100%-test-coverage goal. Make sure the tests cover the most common situations.
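“Cover the most common situations” can be made concrete by data-driving tests over the most frequent user phrasings per intent. The sketch below is a minimal illustration: the intent names, utterance lists, and toy keyword matcher are assumptions standing in for a real chatbot or test framework.

```python
# Most frequent phrasings per intent, e.g. mined from chat logs.
COMMON_UTTERANCES = {
    "greeting": ["hi", "hello", "hey there"],
    "book_flight": ["book a flight", "i need a plane ticket"],
}

def detect_intent(utterance):
    # Toy keyword matcher standing in for the real chatbot/NLU engine.
    text = utterance.lower()
    if any(word in text for word in ("flight", "ticket")):
        return "book_flight"
    if any(word in text for word in ("hi", "hello", "hey")):
        return "greeting"
    return "fallback"

# Data-driven check: every common phrasing must map to its intent.
for intent, utterances in COMMON_UTTERANCES.items():
    for utterance in utterances:
        assert detect_intent(utterance) == intent, (utterance, intent)
```

Growing the utterance lists from real conversation logs keeps the test suite focused on what users actually say, instead of chasing unreachable 100% coverage.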

Reason 4: No barriers for users

When using a chatbot, whether through a voice or a text interface, there are no interaction barriers for users. Websites and smartphone apps allow predefined means of interaction through common user interface components (clickable hyperlinks, buttons, text entry boxes, …). Chatbots have to handle all kinds of unexpected user input in a decent way.

User: “CleverBot, please book a flight from Vienna to Berlin.”

CleverBot: “Sure, when do you want to leave?”

User: “Apples and Bananas”

CleverBot: “…”

✔️ Design test cases with robustness for unexpected user input in mind.
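Such a robustness test asserts that off-topic input triggers a graceful fallback rather than a wrong answer or silence. The sketch below uses a toy `bot_reply` function as a hypothetical stand-in for the system under test; the reply strings mirror the dialog above.

```python
def bot_reply(utterance):
    """Toy stand-in for the chatbot under test."""
    text = utterance.lower()
    if "flight" in text:
        return "Sure, when do you want to leave?"
    # Anything unrecognised should get a polite fallback, never silence.
    return "Sorry, I didn't understand that. Could you rephrase?"

# Happy path: a recognised request gets a follow-up question.
assert "when do you want to leave" in bot_reply("Book a flight to Berlin")

# Robustness: off-topic input must hit the fallback path.
assert bot_reply("Apples and Bananas").startswith("Sorry")
```

In a real suite, the off-topic utterances would be a long, growing list; every new nonsense input found in production logs becomes another regression test.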

Next to Come: Selecting Appropriate Testing Techniques

The next article in this series will suggest appropriate testing techniques for chatbots. Spoiler: as always, there is no single source of truth; a mixture of testing techniques is key to good software quality.

Can You Name Other Differences Between Chatbot Testing and Website Testing?

If you think I missed any important differences, I would be happy to discuss them in the comments section!