“One of the recent findings in the area of NLP, and AI in general, has been that as you scale, as you make these neural network models larger and larger, they tend to perform better,” Stephen Roller, a research engineer at Facebook’s AI lab (FAIR), told Engadget. “We had a number of issues when we were trying to train this thing. When you start to get that large, these things are no longer able to fit on a single GPU.”

Instead, the team had to devise a means of siloing various aspects of the neural network being trained and working them in parallel across a number of devices, while maintaining the efficiency of the network as a whole. “There are a lot of sophisticated techniques that you have to use in terms of how you chop this thing up,” Roller continued. “If you split it over different devices and chop it up the wrong way, then you're going to lose that efficiency that you have, and you're not going to be able to scale to these terabyte-sized data sets that we've been working with.”
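
Roller's point about partitioning can be illustrated with a toy example of tensor parallelism, one common way of chopping a model up: a weight matrix is split column-wise across devices, each device multiplies its shard independently, and the partial outputs are concatenated back together. This is a generic NumPy sketch, not FAIR's actual training code; the shapes and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": a single weight matrix, vastly smaller than a real model's.
x = rng.standard_normal((4, 8))   # a batch of input activations
W = rng.standard_normal((8, 6))   # the full weight matrix

# Chop the weights column-wise so each "device" holds half the parameters.
W_dev0, W_dev1 = np.split(W, 2, axis=1)

# Each device computes its own shard (in a real system, in parallel)...
y0 = x @ W_dev0
y1 = x @ W_dev1

# ...and the partial outputs are concatenated to rebuild the full result.
y_parallel = np.concatenate([y0, y1], axis=1)

# Split correctly, the parallel computation matches the unsplit one exactly.
assert np.allclose(y_parallel, x @ W)
```

Done right, the split is invisible to the rest of the network; done “the wrong way” (mismatched shards, missing re-assembly), the math simply stops agreeing, which is the trap Roller describes.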

However, size is only important up to a point. “One interesting thing we found is scale isn't everything,” Roller said. “For instance, our 2.7 billion parameter model actually slightly outperforms the 9.4 billion parameter model. What seems to be really important is that you scale these things at least until a certain point, but then it becomes super critical to imbue these things with specialized skills and behaviors [to further refine its performance].”

FAIR has focused on three specific behaviors -- the ability to display empathy, personality and knowledge -- to further humanize Blender’s responses. But it’s not just that Blender can produce those three behaviors; it can also switch seamlessly between them as the conversation progresses, thanks to its unique Blended Skill Talk feature.

“We, in the past two years of research, have designed tasks for each one of these skills,” Emily Dinan, a research engineer at FAIR, told Engadget. “This is the first time we've really shown that you can blend all of these aspects of conversation seamlessly in one. Our evaluation setup showed that models that were fine-tuned on these nice conversational skill datasets are more engaging and considered more human, more lifelike than models which were not.”

This means that Blender is emotionally smart enough to know to congratulate you if you tell it you just got a promotion at work and offer condolences when you reveal that your dog just died. FAIR has also taught it to give more than rote cursory responses when asked about a particular subject. For example, if you ask Google Assistant about Led Zeppelin, it will typically read off the first couple of lines from the band’s Wikipedia page. “So, we designed a data set that was intending to go beyond this sort of surface level of chitchat and go more in depth about a topic, and sort of imbue these models with more knowledge about the world,” Dinan said.
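
Knowledge-grounded chat datasets of the sort Dinan describes typically pair each reply with a retrieved fact that the model conditions on while generating. The snippet below sketches only that retrieval step, using plain word overlap in place of a learned retriever; the function name and the tiny fact pool are invented for illustration.

```python
def best_fact(question, facts):
    """Pick the candidate fact that shares the most words with the question."""
    q_words = set(question.lower().split())
    return max(facts, key=lambda f: len(q_words & set(f.lower().split())))

facts = [
    "led zeppelin were an english rock band formed in london in 1968.",
    "the eiffel tower is a wrought-iron tower in paris.",
]
print(best_fact("tell me about led zeppelin", facts))
# -> "led zeppelin were an english rock band formed in london in 1968."
```

A real system would then feed the selected fact into the generator along with the dialogue history, which is what lets the reply go “more in depth” than surface chitchat.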


The research team first pre-trained Blender using the Wizard of Wikipedia system to give the chatbot a broad base of general knowledge, then fine-tuned it with data that “specifically was collected by having two humans talk to each other about a given topic,” Dinan continued. This helped to normalize the bot’s speech patterns so it sounded more like a natural reply rather than a robot reading encyclopedia entries at you.

“Getting this great supervised conversational data is really expensive,” Dinan admitted. “And so we typically only have small amounts of this data, relative to the data that we use to pre-train the model.”

And if Blade Runner taught us anything, it’s that your synthetic lifeform is only as convincing as its best backstory, which is why FAIR has started adding them to its chatbots. “This is a way of imbuing the bots with a specific personality and showing more personality,” Dinan said. “And so this data set was designed by giving two humans character descriptions like… being a basketball lover from Michigan with three kids.” Keeping the bots “in character” reduces the rate of logical faults produced by the AI system, such as first saying that it owns three cats and then turning around with its next response and denying ever having lived with a trio of felines.
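
In most published persona-chat systems, the mechanism Dinan describes comes down to prepending the character description to the dialogue history before it reaches the model, so every reply is conditioned on the same backstory. Here is a rough sketch of that formatting step; the “your persona:” prefix and the helper name are assumptions for illustration, not Blender's actual preprocessing.

```python
def build_model_input(persona_lines, dialogue_history):
    """Prepend a fixed persona to the running conversation so each
    generated reply is conditioned on the same backstory."""
    persona = "\n".join(f"your persona: {line}" for line in persona_lines)
    history = "\n".join(dialogue_history)
    return persona + "\n" + history

persona = [
    "i am a basketball lover from michigan.",
    "i have three kids.",
]
history = ["Hi! What do you do for fun?"]
print(build_model_input(persona, history))
```

Because the persona text never changes from turn to turn, the model is far less likely to claim three cats in one reply and deny them in the next.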

So far, people who have interacted with Blender seem to prefer it to other open-source models. During a recent test between it and Meena using the ACUTE-Eval method, 67 percent of respondents found Blender to sound more human, and 75 percent would rather engage with Blender than with Meena for longer conversations.
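
For context, ACUTE-Eval is a pairwise protocol: human judges read two complete conversation logs side by side and vote for the bot they found more human or would rather keep talking to, and the reported percentages are simply each model's share of those votes. A minimal sketch of the tally, on made-up votes rather than the study's real data:

```python
# One entry per judge in a side-by-side comparison (hypothetical votes).
votes = ["blender", "blender", "meena", "blender"]

def preference_rate(votes, model):
    """Fraction of pairwise judgments won by `model`."""
    return votes.count(model) / len(votes)

print(f"{preference_rate(votes, 'blender'):.0%}")  # prints "75%" on this toy data
```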