Machine Learning Panacea

One of the reasons ChatBots got so much hype is that machine vision improved so much in such a short amount of time, thanks to Neural Networks, that people became very optimistic about their potential in other fields. Since there is a near-infinite amount of text available on the internet, and Neural Networks scale well and perform better on large datasets, using them to model language was a self-evident thing to do. After all, if text classification can be done with the Naive Bayes algorithm, using a machine learning model that can also learn non-linear relations should only improve on things. So the idea was that by gathering a lot of examples, one could classify text into many categories with higher accuracy, thus understanding the intent of the user. This way, different sentences could trigger the same response on a server, allowing the creation of automated help desks and such. After all, most of the issues customer service employees have to answer are repetitive (like asking for directions to the toilet).

So in theory, one can create a pizza ordering service where you just tell a ChatBot what kind of pizza you want and it will send it to you. You teach it examples like “get me a pizza”, “I want a large pizza” or “Hi, I would like to order a pizza.”, and similar sentences will also trigger the “pizza” intent on the backend. The problems with this idea are that:

Language is so flexible that there is always another way to word the same intent, one that was never present in the training set. Like “Hey, I’m hungry, what have you got?”

Users sometimes don’t know what they are looking for, so they can’t even state their actual question. “And add some of those small, green and black things please… You know…”

Users don’t always communicate with help desks in simple sentences. They usually ask about multiple complicated issues that only a human with domain knowledge can answer. “I want to order one for my friend who lives downtown, and another one for me, but I will only be home by 7 or so, so please don’t send your delivery guy before that, oh and do you accept credit cards?”

Finding named entities, dates and addresses is still an issue that is not solved by text classification alone. “It’s the second street next to the bus stop, can’t miss it. I’m on floor 7 in either room 404 or 405 depending on the time of the day.”

This approach also implies that you have labels for everything you want to classify. There will always be things someone wants to ask your ChatBot about that are out of its domain, causing it to fail miserably. “I don’t want a pizza, I’m more of a Taco guy.”

Most companies don’t have datasets consisting of millions of labelled examples for each class to load into their machine learning models. And coming up with a few keywords to match with regular expressions on the backend (like simply matching r’pizza\w*’) is much easier than somehow generating thousands of labelled example sentences, no matter how sophisticated your machine learning solution is.
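The keyword-matching approach mentioned above can be sketched in a few lines. This is a minimal illustration, assuming made-up intent names and patterns; the only pattern taken from the text is the pizza one:

```python
import re

# Minimal keyword-based intent matching: each intent is just a regular
# expression tested against the incoming message. The "weather" pattern
# is an illustrative assumption, not from any real product.
INTENT_PATTERNS = {
    "pizza": re.compile(r"pizza\w*", re.IGNORECASE),
    "weather": re.compile(r"weather|forecast", re.IGNORECASE),
}

def match_intents(message):
    """Return the names of all intents whose pattern appears in the message."""
    return [name for name, pattern in INTENT_PATTERNS.items()
            if pattern.search(message)]

print(match_intents("Hi, I would like to order a pizza."))  # ['pizza']
```

A handful of such patterns often covers the bulk of real traffic, which is exactly why this low-tech baseline is so hard to beat.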

Language constantly changes, and subcultures use languages of their own, which makes it even harder to keep a text classification model up to date. We can imagine and describe things using language that never existed before, creating the need for further classification labels. These issues weren’t really foreseen by developers when the ChatBot hype started, creating unrealistic expectations about their abilities. So even if we have better language processing, sentiment analysis, text generation and machine translation tools due to the advancements in RNN / LSTM Neural Networks, language is still nowhere near solved. And even if we were to detect known intents 100% accurately all the time, the number of possible intents is infinite, which is a problem that cannot be solved using classification alone. But even if we could find a solution to this, writing responses to all questions is still the job of the developers. Except if you are literally Google.

While no one is willing to admit it, despite all the technological advancements, most ChatBots and SmartAssistants are just engineers manually patching a spaghetti code of intent matching rules, every time something significant has been misunderstood. It’s easier to get funding for your StartUp by calling this process “Artificial Intelligence that’s almost there, it just needs to be taught a little”, so sadly, it’s become common practice. Shout-out to the team at Google Duplex, whose AI assistant makes appointments over the phone, which is beyond my understanding of the current possibilities of NLP. I do want to believe it’s actually real.

NLP in languages other than English

The main idea behind Natural Language Processing is that you can use mostly the same tools and methodology to turn a text corpus into a series of measurable units that a computer program can work with, in any language. Since words also tend to occur in inflected forms, the researcher’s first task is to find the stems of all inflected words so that “cats” and “cat” wouldn’t be two different entities when doing calculations with them (for instance, measuring the frequency of their occurrence in the text). Initially, researchers believed that by collecting all the grammatical rules of a language, text processing could be automated. What they found is that despite all languages clearly having rules, there are always exceptions to each and every one of them, and these ever-changing rules need to be applied based on context (the plural of “leaf” isn’t “leafs” but “leaves”, however the word “leaves” might also mean that someone is “leaving”, etc.). The distribution of words also follows Zipf’s law, so there will be a few words that occur in almost every sentence (these words, like “the”, “a” or “and”, are called stopwords and are partly removed from most models for this reason) and many more that occur rarely. Surprisingly, one can still do effective document classification (for instance, automatically telling which e-mail is spam based on its contents) or sentiment analysis using only this limited set of language processing rules.
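The stopword removal and stemming pipeline described above can be illustrated with a deliberately toy example. A real system would use a proper stemmer (such as the Porter stemmer); the stopword set and suffix list here are tiny assumptions for demonstration only:

```python
# Toy preprocessing pipeline: lowercase, drop stopwords, crudely strip
# suffixes so "cats" and "cat" collapse into the same token. Both lists
# are illustrative stand-ins for real resources.
STOPWORDS = {"the", "a", "an", "and", "or", "is"}
SUFFIXES = ("ing", "es", "s")

def crude_stem(word):
    # Strip the first matching suffix, but keep a minimal stem length
    # so short words like "is" or "as" aren't mangled.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The cat and the cats"))  # ['cat', 'cat']
```

Note that even this simple sketch would already mis-stem “leaves” by blindly cutting the suffix, which is precisely the kind of exception the paragraph above warns about.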

Now take a look at the following example in Hungarian. It is an agglutinative language, which means words are built from morphemes, with different prefixes and affixes attached to stems that change a word’s meaning:

Úgy kicsinállak, becsinálsz!

The sentence roughly translates to “I will crack you up so hard, you will crap your pants!”. The word “úgy” is a stopword, and both “kicsinállak” and “becsinálsz” are derived from the “csinál” stem, which simply means “to do”. So if an algorithm were to remove stopwords and get rid of the verb prefixes (“ki” and “be”) and the conjugation affixes (“lak” and “sz”), we would be left with just the verb “csinál” appearing twice, which would simply translate to “to do to do”. We are constantly losing information while turning sentences into measurable features, and the amount of information lost depends on the grammar of the actual language. The very same NLP rules cannot be applied to all languages with the same efficiency. Since machine learning models can only work with the features you provide them, it is also unreasonable to expect a model to work equally well on all languages: after all, how would you, as a human, interpret a sentence that only has “to do” in it twice? To push my point further: Google Translate actually translates this example to “I’m small, you figure it out!”. Since word order in Hungarian is pretty flexible, “Becsinálsz, úgy kicsinállak!” would mean the same, yet Google translates it differently. They seem to use models that measure which character follows which, causing them to accidentally find “kicsi” in “kicsinállak”, which indeed means “small”. So character-based models also fail on languages with really long words and lots of inflections.
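The information loss described above can be made concrete. This sketch strips exactly the prefixes and affixes named in the example; the lists are reduced to just the morphemes of this one sentence, not a general Hungarian morphology:

```python
# Demonstrate how naive affix stripping collapses the example sentence
# into "to do to do". The prefix/suffix/stopword lists cover only the
# morphemes of this single example.
PREFIXES = ("ki", "be")
SUFFIXES = ("lak", "sz")
STOPWORDS = {"úgy"}

def strip_affixes(word):
    for prefix in PREFIXES:
        if word.startswith(prefix):
            word = word[len(prefix):]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            word = word[:-len(suffix)]
    return word

tokens = "úgy kicsinállak becsinálsz".split()
stems = [strip_affixes(t) for t in tokens if t not in STOPWORDS]
print(stems)  # ['csinál', 'csinál'] — "to do to do", all nuance gone
```

Two very different words, one threatening and one embarrassing, reduce to the same featureless stem.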

Most NLP solutions require a dictionary of words, their classes and possible forms to operate. However, this is unnecessary for ChatBots, because the goal isn’t to match every single word’s every possible form, only to look for intents and entities. We are looking for the presence or absence of certain keywords and their inflected forms. If you know where to cut corners, NLP tasks are somewhat easier, faster and cheaper for ChatBots, because messages are generally shorter, less complex sentences, and one can assume that they are:

either in interrogative or imperative mood,

articulated in first-person singular/plural, with the ChatBot being the object in second-person singular,

most likely phrased in an informal way (except when your ChatBot is doing customer service).

Were the messages different, the number of possible valid affixes would also be larger. This simplification works not only in Hungarian grammar but in German, Finnish, etc. as well. So if you are working on a machine learning algorithm for finding intents in ChatBot messages that aren’t in English, make sure you create feature vectors based on the grammar used in these conversations, rather than just blindly accepting whatever the NLP toolkit of your choice automatically does. Keep in mind that online language is different from literary language, so you can’t just feed your model all kinds of example sentences hoping it will do well in chat conversations.

Designing dialogues

There are 3 tricks one can implement to make their ChatBot feel a lot smarter and more social than it actually is.

Have multiple answers for the same response. Similarly to how multiple intents invoke the very same logic in your ChatBot’s code, the same answers should also be worded in multiple ways. For example: “what’s the weather like”, “weather forecast” and “how’s the weather today” should all invoke the same weather logic, but instead of just answering “25 degrees Celsius” it should also occasionally reply with “It’s 25ºC right now” or “Currently it’s 25 degrees Celsius” or “Right now it’s 25ºC, with a clear sky” or even “It’s 25ºC but you might need an umbrella later today!”.
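Varying the wording like this is usually just a random pick from a list of templates. A minimal sketch, with templates borrowed from the examples above:

```python
import random

# Pick a random wording for the same underlying answer. The templates
# echo the weather examples above; the temperature is a parameter.
WEATHER_TEMPLATES = [
    "It's {temp}ºC right now.",
    "Currently it's {temp} degrees Celsius.",
    "Right now it's {temp}ºC, with a clear sky.",
]

def weather_answer(temp):
    return random.choice(WEATHER_TEMPLATES).format(temp=temp)

print(weather_answer(25))
```

Even three or four variants per answer go a long way toward breaking the robotic feel of repeated identical replies.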

Understand context, add memory to your ChatBot. Something as simple as remembering the very last intents will make your ChatBot way more efficient. It’s annoying to be forced to rephrase a question every time one needs additional information on a topic. “What’s the weather like today? — 25ºC, sunny.” “And what about tomorrow? — Tomorrow is Wednesday.”. To handle conversations in certain areas, I have also created a FIFO memory module that resembles how our short-term memory works: a FIFO storage of 7 items, where every new item makes you forget the oldest one if 7 items are already present in the memory. This way you could still access things in the conversation’s history that a human would remember. But you’ll usually do okay with only 1 or 2 items.
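The 7-item FIFO memory described above maps almost directly onto a bounded deque. A sketch (the class and intent names are illustrative):

```python
from collections import deque

# A 7-item FIFO short-term memory: once full, each new intent silently
# pushes out the oldest one, mimicking human short-term recall.
class ShortTermMemory:
    def __init__(self, capacity=7):
        self.items = deque(maxlen=capacity)

    def remember(self, intent):
        self.items.append(intent)  # the oldest item is dropped automatically

    def recall(self):
        return list(self.items)

memory = ShortTermMemory()
for intent in ["greet", "weather", "news", "joke", "weather",
               "time", "music", "bye"]:
    memory.remember(intent)
print(memory.recall())  # "greet", the 8th-newest item, has been forgotten
```

`deque(maxlen=…)` handles the eviction for free, so the whole module is little more than bookkeeping around it.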

Have a hierarchy-based dialogue-tree. Certain intents should be more important and be answered first. Note that caching becomes essential when relying on larger hierarchy trees, because simpler, less relevant intents will only be answered after all other intents have been tested and ignored (so if you only test for greetings like “hi” or “bye” at the very end, these simple utterances will take the longest to respond to, which feels unnatural).

Hierarchy-based dialogue-trees with memory can also be used to detect unwanted intent matching. For instance, let’s say the read news logic is above the small talk logic in the hierarchy, and both have the word “news” as a valid intent. A phrase like “well that’s news to me” would trigger such a ChatBot to talk about the news first. If the user were to repeat their question, saying “no, I’m just saying what you said was news to me”, then, knowing that the read news logic was already triggered beforehand, it can be ignored and the following small talk logic can be triggered instead. You didn’t have to write further rules, or do any additional matching (in fact, both intents were simply triggered by finding the word “news” in the user’s sentences), and the answer would still satisfy the user (although only after their second try).
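The news/small-talk example above boils down to walking the hierarchy and skipping intents that fired recently. A sketch with the two intent names from the example:

```python
# Walk the intent hierarchy top-down and skip anything that already
# fired recently, so a lower-priority match gets its turn on a repeat.
INTENT_HIERARCHY = ["read_news", "small_talk"]

def resolve_intent(matched, recent):
    """Pick the highest-priority matched intent not triggered recently."""
    for intent in INTENT_HIERARCHY:
        if intent in matched and intent not in recent:
            return intent
    return None

# First message: both intents match on the word "news"; read_news wins.
print(resolve_intent({"read_news", "small_talk"}, recent=set()))
# The user repeats themselves: read_news already fired, small talk wins.
print(resolve_intent({"read_news", "small_talk"}, recent={"read_news"}))
```

Combined with the FIFO memory above, `recent` is simply whatever the memory currently recalls.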

Context-based hierarchy-trees (dialogue-trees dynamically changing based on the previously received intents) make ChatBots feel way smarter, yet they take a lot of extra effort to make. By taking context into account and having multiple responses for the same logic, development time could rise exponentially. Testing these dialogues is also a lot harder, since on the one hand, adding anything new to them could potentially mess up the existing logic, and on the other hand, getting to trigger them and see if they work as intended takes a lot more effort. Testing whether a certain branch with a certain wording is triggered by a certain logic, in a certain context, only if other, overriding requirements that would invoke more important logics higher up in the hierarchy are not met, is a lot of extra effort and cannot be totally automated. For this reason, only work on such dialogue-trees once you have tested your ChatBot on every other level and have a solid foundation to build upon. Finding the root cause of an unwanted answer in a system where any module could work against the intentions of the programmer could take a lot of time. If you have even more time for development, you could additionally make dialogues adapt somewhat to the user: if they talk a lot, make the answers longer; if they only give your ChatBot short, formal commands, answer in a short and formal way.

As I’ve mentioned earlier, social dialogues should also be included to enable small talk (but they should be lower down in the hierarchy). People who want to feel attached to your ChatBot, as if it were somewhat sentient, should have the option to do so: allow them to thank, greet and love your ChatBot, and feel like they are somewhat educating it. Jokes are also a great way to entertain users and make them less frustrated about unwanted answers. However, all additional chit-chat should be secondary, and posted only after answering the user’s main request, in order to reduce delay. Small talk functions are double-edged swords, since they make conversations longer while not providing any real utility, yet they are requirements for lifelike social interactions. My former comic-posting Napirajz ChatBot would send quotes from the returned matches after posting their URLs, and would also post related strips as error messages (for example, characters from the comic saying something went wrong, or being unable to answer). Some of my users found them funny and even believed that the ChatBot actually understood them to some level, but refused to answer correctly just to mock them.

When no intents are found in your dialogue-tree, instead of randomly answering either “I don’t know” or “I am not so sure about that”, create excuses based on the context and the question words included in the sentences. For example, if you get a question you have no predefined answer to shortly after being asked about the news, the ChatBot could reply with “I couldn’t find further articles on that topic.”. Or if the user asks “do you know how to cook an omelet?”, the ChatBot could reply “I don’t know how to do that.”. These tricks don’t always work out correctly, but they can make your ChatBot feel a lot smarter and keep the conversation going. Users are also more forgiving if they get a proper explanation of why a feature is lacking (like “Sorry, I do not have a recipe database yet.”) than if they get “Sorry, I don’t know that.” as an answer multiple times in a row.
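One way to build such context-aware fallbacks is two lookup tables: one keyed on the last triggered intent, one on the question word. The mappings below are purely illustrative:

```python
# Context-aware fallback excuses: prefer an excuse tied to the last
# intent, then one tied to a question word, then a generic default.
# All mappings here are illustrative assumptions.
FALLBACKS_BY_CONTEXT = {
    "read_news": "I couldn't find further articles on that topic.",
}
FALLBACKS_BY_QUESTION_WORD = {
    "how": "I don't know how to do that.",
    "where": "I don't know where that is.",
}
DEFAULT_FALLBACK = "I am not so sure about that."

def fallback(message, last_intent=None):
    if last_intent in FALLBACKS_BY_CONTEXT:
        return FALLBACKS_BY_CONTEXT[last_intent]
    for word, excuse in FALLBACKS_BY_QUESTION_WORD.items():
        if word in message.lower().split():
            return excuse
    return DEFAULT_FALLBACK

print(fallback("do you know how to cook an omelet?"))
print(fallback("any more details?", last_intent="read_news"))
```

Cheap to build, yet it reads as if the bot understood why it failed.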

Getting data

Language isn’t the only reason Smart Assistants aren’t widely available outside English-speaking countries. They all work using data either from other companies or from publicly maintained datasets, both of which are harder to come by in other languages: the quality, diversity and amount of data are usually lower, and APIs are rarer. It’s so much easier to develop a music player that feels smart by connecting it to the Spotify API that does everything for you: you can jokingly ask Alexa to “play hardbass” and the Spotify API will return the XS Project track “Bochka, Bass, Kolbaser”, because the recommendation engine behind Spotify is so damn smart it even works for memes. You can also ask her to play “sad songs” and “songs to study to”, because these abstractions (moods mapped to actual playlists of popular songs, automatically created by frequently crunching large amounts of user data) are hidden from you, and therefore do not require any further development on the ChatBot’s server side. Getting your Smart Assistant to read interesting facts about your favorite movies in English is as simple as getting trivia from IMDB, but if there aren’t any similar databases available in your language, then it’s up to you to manually gather the data and create a similar database for answers. Even if you are getting data that is approved by other sources, I would still recommend you double check whether you accidentally came across some troll content, full of swear words or racist ideas, before using it as a valid answer to your user’s question.

Since only a few Hungarian online sources have their own APIs, I had to write crawlers to grab the requested information from websites. Obviously this is less reliable, because certain changes in the frontend’s DOM structure could potentially render the crawlers useless. Relying on free APIs (for example, free open weather APIs) is also somewhat of a gamble, as these servers usually take longer to respond and have a higher risk of being down, due to being free and not owing their users anything. Large IT companies have a clear, unbeatable advantage when it comes to data, by having their diverse, ever-growing databases generated partly by their extensive userbase, and by being able to strike deals with other companies that ensure their APIs remain available and hold beneficial information.
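Because a free API owes you nothing, the call site should assume failure from the start: a short timeout and a canned fallback. A sketch using only the standard library; the URL is a placeholder, not a real endpoint:

```python
import urllib.error
import urllib.request

# Defensive wrapper for a flaky free API: short timeout, and None on
# failure so the caller can fall back to a canned excuse. The URL below
# is a placeholder, not a real weather service.
def fetch_weather(url, timeout=3.0, opener=urllib.request.urlopen):
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return None

def answer_weather(url, opener=urllib.request.urlopen):
    data = fetch_weather(url, opener=opener)
    return data if data else "Sorry, the weather service is not responding."

# Simulate the free API being down, without touching the network:
def broken_opener(url, timeout):
    raise urllib.error.URLError("service down")

print(answer_weather("http://weather.example/today", opener=broken_opener))
```

Injecting the opener also makes the failure path testable without any network access, which matters for the automated tests discussed later.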


By getting data from sources that are outside of your control, you are also empowering these sources and allowing them to reach larger audiences through you. Even if you don’t know anything about Hungary, you have most likely heard about a few Hungarian political decisions that generated huge uproars. Since the political landscape is divided, even something as trivial as getting the RSS feeds of online news sites could unintentionally offend a large group of your users, because they automatically disagree with the contents of news sites that hold opposing views to theirs. People say they prefer objectivity and do their research to fight fake news, but deep inside, our primate brains still prefer clickbait headlines that make us enraged or scared — getting objective news is boring, hence the lack of a large body of objective sources. It was painful to see Stella automatically talk about headlines that I personally find harmful and untrue, and I felt uncomfortable about the fact that my future users might get them as flash news briefings. It felt unfair that Alexa could simply play a sound file from Reuters TV and feel smart, while I had to develop extra crawlers, and still end up having Stella read propaganda messages in a robotic voice.

One of the selling points of Smart Assistants is their ability to connect to a large number of IoT devices seamlessly. This obviously requires a lot of extra development, as devices and device hubs differ (and so do their preferred ways of connecting to other devices). Try as hard as you can, you will never write your own open source Chromecast alternative that automatically connects all your Android devices with all the TVs in the house. It’s also worth noting that IoT security is an issue often dismissed by, well, basically everyone. People rarely take the time to update their Smart Fridges to make them dispense better ice cubes, or just to prevent them from turning into botnets used for guided DDoS attacks. Patchy workarounds sending and receiving data from a limited set of IoT devices aren’t the same as properly connecting a huge variety of devices to your Smart Assistant with security in mind. The latter obviously demands much more work than the former. Keeping up with APIs and IoT devices’ firmware updates will also consume a lot of your resources.

You might have heard urban legends about AI casually beating human doctors in detecting cancer, despite its creators having no education in medicine. This is bullshit. Implying AI is a magical black box that one can load random data into and it will figure things out on its own is harmful and dangerous. You should not tackle machine learning tasks alone in areas you have zero domain knowledge in. Having a deeper understanding of the subject you are trying to give answers to using machine learning is crucial, since the selection and conversion of features, the handling and replacement of missing data, dimension reduction and the selection of the correct models are all tasks that depend on the data scientist’s understanding of both the subject and the available data. The same goes for designing dialogue options in areas you are unfamiliar with. You or your coworkers should be experts in the fields your ChatBot is supposed to have competence in.

Testing and presentation

As Stella and her parts became more and more robust, the implementation of automatic tests was inevitable. You wouldn’t know if a small change in intent detection, entity recognition, memory or the context-based dialogue-tree would end up opening a box of snakes somewhere else in the process. However, not all tests can be automated. I had automatic test cases for the NLP functions, but whether your ChatBot correctly identifies intents can only be verified by letting people other than you use it. Before adding any responses, I made a test site where the testers’ task was to ask the ChatBot certain questions, and it would save the list of intents identified in their messages. Although it helped me improve my regular expressions and identify typos, the testers were initially left confused when the intents were not correctly identified in their messages, thinking they’d sent the wrong questions, so some of them stopped testing after a chain of failures. I then had to tell them that the point of these tests was to see if the ChatBot would fail, so they were doing the right thing and should go back to sending more messages. Note that the more a person uses your ChatBot, the better the rate of successfully identified intents will be, not because your ChatBot became smarter on its own, but because the users eventually adapt to your ChatBot and learn how to communicate with it efficiently. For this reason, you should try to find additional testers as time goes on.
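The automatable part, regression tests for the intent rules, can be as simple as re-checking every known phrasing whenever a pattern changes. A sketch, with an illustrative pattern and the weather phrasings quoted earlier:

```python
import re

# Regression tests for intent rules: every known positive and negative
# phrasing is re-checked after each change to the pattern. The pattern
# itself is an illustrative assumption.
WEATHER_PATTERN = re.compile(r"\bweather\b|\bforecast\b", re.IGNORECASE)

KNOWN_WEATHER_PHRASINGS = [
    "what's the weather like",
    "weather forecast",
    "how's the weather today",
]
KNOWN_NEGATIVES = [
    "get me a pizza",
]

def run_intent_tests():
    for phrase in KNOWN_WEATHER_PHRASINGS:
        assert WEATHER_PATTERN.search(phrase), f"missed: {phrase}"
    for phrase in KNOWN_NEGATIVES:
        assert not WEATHER_PATTERN.search(phrase), f"false positive: {phrase}"
    return "all intent tests passed"

print(run_intent_tests())
```

Every misunderstood message a human tester surfaces can be appended to one of these lists, so the suite grows with the bot.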

Most of the Smart Assistants you can buy only come with a limited set of instructions, so most of their abilities are unknown to the customer. The user never knows whether they can expect a correct response to their question, so there is an element of surprise and exploration in it as well. It feels way better when a question you would think is too hard for a ChatBot actually gets you the expected results — it’s like opening treasure chests (or loot boxes) all the time, not knowing what lies inside. However, this strategy is indeed a gamble, because if your ChatBot fails to answer correctly 2 to 3 times in a row during the first interactions, users will just ignore it and find it terrible. They will most likely give up on trying, or stay just to troll it with even harder questions. For this reason, you should make sure that at least some of its abilities are clear to the users and that those abilities work as intended most of the time.

The same is true for test subjects. If you ask people to just ask your ChatBot questions (because you want to see if there are commonly asked intents and topics you have not covered), the ones who have little to no idea about the limitations of the technology will simply find it terrible and stop interacting with it (despite your intention being to point out its shortcomings in the first place). It’s better to tell them to test for certain intents and let them gradually stray from this task, so they can think of your ChatBot as something that just needs more testing, instead of a form they have to send random text into to get irrelevant messages in return.

Some people will try to communicate with your ChatBot using complex sentences, as if they were talking to actual humans. Some of them will make these sentences even more complicated than they need to be, either because they enjoy trolling bots or because they genuinely want to teach it. When parents talk to their children, they subliminally make their sentences slightly more complicated than what the child is able to understand at their current level, to subtly teach them. People also do this to pets if they really like them. The problem with this is that no ChatBot will automatically get better by receiving the same sentences in more and more complicated forms, as we have no way to automatically label the users’ intents and use their messages as examples for improving machine learning models. So sadly, your ChatBot is pretty much always doomed to not satisfy the parental instincts of the users who are most fond of it and would gladly teach it.

When a ChatBot fails to understand a question, a user who is sure the ChatBot is indeed capable of processing their request will rephrase the message to be less and less complex. They will go as far as phrasing the question the way they would type it into Google Search. The problem with this is that even if you improve your ChatBot’s abilities later on (either by adding an answer to the question or by elaborating on the ways the intent is extracted from it), these users will seldom retry. They have learned the limitations of your ChatBot and will try to remain within the perceived boundaries during their future interactions.

It was rather surprising to me how users tend to believe that whatever they tell a ChatBot or Smart Assistant remains between them. Only a small fraction of people are actually aware that their conversations are at least partly recorded for research purposes and are read or listened to by engineers (and also by Facebook page administrators when interacting with Messenger ChatBots) who want to improve their solutions. With all this hype around ChatBots, I have yet to come across an article that discusses the ethics of listening to someone’s conversations without them knowing it. You might think people have common sense and have at least partly read the shortened legal messages that summarize how a conversational AI has to process information in order to work, but that’s not really the case. You might think you are only developing a pizza ordering ChatBot, but you are actually opening a space for people to discuss their love life with something that will unconditionally listen to them, and maybe occasionally ask by accident whether they wanted pineapple as an extra topping. Maybe they will even send dick pics to your lovely Virtual Assistant, which will stay in the Facebook page’s messages folder forever, as deleting customer messages is not a real option. If you find this funny, you might also want to contemplate the correct way to respond to someone discussing suicide, or telling your ChatBot about illegal activities they are involved in. Even if these things are rare, it’s all a numbers game, so you will eventually run into these scenarios once enough people use your solution. Maybe you’ve got this neat StartUp idea for a ChatBot that reminds young girls to take their pills? It gets pretty cringe-worthy once you realize that a large group of hairy male engineers will have to read all of their conversations in the background.

What I’ve also learned the hard way is that no matter how well your ChatBot works, and how much you test beforehand, presenting your ChatBot live to a large audience can go really wrong, and all kinds of formerly unknown bugs can magically appear, due to Murphy’s law.

Conclusion

I have mixed feelings about Stella. Despite her working better than expected in many areas, I can clearly see the limitations that the current level of Natural Language Processing will not be able to solve. I now feel less enthusiastic about improving her, because I know that no matter how hard I try, there are limits to how smart she can ever be. Or any Smart Assistant, for that matter. Will ChatBots ever take our jobs? As much as I want to see conversational AI and Messenger Apps take off, I can’t ignore the fact that even Poncho, the best-known weather forecaster bot, is down, which marks the end of an era for Messenger ChatBots (yet they are quite popular in China, and Smart Assistants are doing surprisingly well in the West, so there is still a chance that the technology will soon find its use case). My bets are on hybrid solutions: ChatBots that don’t directly message the users every time, but rather present customer service employees with options they can choose from, to speed up their job. Unlike employees, an AI can remember all of a customer’s history and previous conversations, and offer personalized recommendations during the interaction. It’s up to the human assistant to send one of the AI’s recommended answers, or to override them and compose their own message whenever they fail. These centaurs would be more efficient than their parts alone. Designing improved ChatBot GUIs, where custom menus, buttons and cards are shown based on context, besides allowing free text queries, would also be a better solution than simply going full NLP (something similar to what Google Assistant is trying to achieve).

As you can clearly see, ChatBots and Smart Assistants are much more than just finding intents in the users’ questions and sending responses based on them. It should also be clear why only the largest IT companies are able to afford the development and maintenance of Smart Assistants (not to mention the huge task of developing custom hardware for the speakers and establishing the logistics to distribute them). Since language itself is nowhere near solved, conversational AI still mostly consists of engineers listening in on conversations, fixing intent matching and adding more features to their codebase until it inevitably becomes incomprehensible. Even if you’ve decided not to work on ChatBots for this reason, you will find that the tricks and tips I’ve shared will still be useful for improving any narrow AI solution that needs to handle social interactions while serving humans — or when working on AI for video games.

So, all that said, I’m still looking for use cases for Stella and ways to make her available to the public, as scaling a Smart Assistant (especially scaling the speech2text and text2speech modules, and the ability to get enough data from third parties) is an ever more demanding project to undertake. Making a Smart Assistant for your office, with custom code and hardware fitting your own needs, is one thing; moving her outside a sandbox environment is another. Either way, I hope my experience will help you develop better conversational AIs in the future and prevent you from doing unnecessary work in the field.

If you do anything awesome based on this article, let me know! 🤖