In this day and age, the accessibility of information is at an all-time high. Within a couple of clicks, you can:

check the latest news;

be in the know about the most recent events;

find information about almost anything (from fun facts to legitimate scientific articles).

It is all there, and there is a media outlet for every need and demand. But with the increased availability of information comes an inevitable problem - the manipulation of data, the distortion of facts, fake news.

The thing is, facts presented in a particular manner enable the manipulation of public opinion - which is dangerous, and not how the media are supposed to function.

And this is something that can be solved with the assistance of cutting-edge technologies. Natural language processing can expose the machinations behind the words and point out the perpetrators.

But first, let’s explain why media manipulation is a serious issue.

What’s the problem with public opinion and mass media?

Public opinion is one of those forces that cast certain concepts and behaviors in a particular light, both within social groups and in society in general. As a result, it legitimizes specific ideas or actions.

Mass media is the primary shaper of public opinion - it sets the current of the opinion's river.

The main goal of the mass media is to inform on topics relevant to its target audience segments. Still, there is always a certain degree of fact interpretation, based on the values of the outlet and its target audience.

A simple example would be the topic of “smoking.” The general public opinion is that smoking is bad for health. A constant stream of articles reinforces this idea with facts and figures that seem convincing to the target audience. As a result, a particular segment of the public becomes convinced that smoking is bad for health - a conviction the mass media then comments on in turn.

It is a ping-pong loop - mass media and the public bounce off each other and, as a result, reinforce each other’s stance on the subject.

Why is it dangerous to manipulate facts?

The key thing to understand is that fact interpretation should serve the public good first and foremost. The problem is that it often serves other kinds of interest - namely, the government’s. That’s where fact distortion and downright fake news occur.

The government’s needs are easy to follow:

To maintain control over public opinion;

To justify the government’s actions;

To neutralize the opposition.

In the media, these goals are achieved through propaganda - a very particular point of view on things. Facts interpreted in the “right way” pave the way to many questionable things, such as:

Limitations and violations of civil rights;

Justification of government or corporate corruption;

Shifting blame and villainizing the other;

Justifying intervention in foreign countries’ internal affairs under the guise of national interests.

Fact manipulation and fake news occur all the time, so keeping the public aware of what is going on is vital.

How would you do that when the public is so deeply entrenched in fact-manipulated propaganda? You show both sides of the coin and let the public compare them.

That’s what the AI Versus conversational interface project aims to do.

What is AI Versus Chatbot?

AI Versus, a high-concept conversational interface, was designed to show two different points of view on the political situation in the Russian Federation.

The idea was to show the contrast of interpreting and presenting information between:

independent liberal media;

government-funded propaganda media.

As a result, it is supposed to show a vertical slice of two conflicting mindsets, formed by two drastically different ways of presenting facts, side by side.

But just showing the contrast was not enough. The experience had to be engaging to the target audience beyond a simple comparison of facts.

Our app development company was approached to make this concept a reality.

Our solution - Q&A Conversational Interface

Our primary goals on the project were to develop:

a natural language processing model based on two diametrically opposite datasets;

a conversational interface users can easily talk to.

After thorough research, the team decided that the best way to show the contrast engagingly was a question-answer system with a direct comparison of two answers to one question. This way, users are in control and get the information they want from two generalized streams of data.
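The mechanics of that side-by-side flow can be sketched as follows. This is a simplified illustration, not the production code: it ranks passages by bag-of-words cosine similarity, whereas the real system used Doc2Vec embeddings, and all function and corpus names here are ours.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_answer(question, corpus):
    """Return the passage from one corpus most similar to the question."""
    q = vectorize(question)
    return max(corpus, key=lambda passage: cosine(q, vectorize(passage)))

def answer_both(question, corpus_a, corpus_b):
    """Answer one question from both datasets, side by side."""
    return best_answer(question, corpus_a), best_answer(question, corpus_b)
```

Calling `answer_both` with the same question against the independent-media corpus and the state-media corpus yields the two contrasting answers the user sees next to each other.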

Let’s look at the major development stages of the project.

Selecting the NLP Model

The biggest challenge of the project was choosing the right NLP model. Due to the nature of the project, the primary language was Russian, which meant certain limitations in the choice of NLP libraries.

The key criteria were accessibility and efficiency. We needed to find a practical solution, and the search took the better part of the development time.

To determine the best AI chatbot solution, we tried and tested several natural language processing tools:

First option: DeepPavlov. It was resource-heavy and a bit clumsy at text generation. On its own, there was always something off about its answers.

Second option: ODQA. While it was serviceable, the results weren’t all that inspiring.

Third option: ELMo. The results were commendable, but it had trouble with scalability.

Fourth option: word2vec and, subsequently, doc2vec. While our initial experiments were unremarkable, the quality of results grew inspiring over time. Doc2vec proved to be the most suitable solution for the project in terms of performance and overall flexibility.

In the end, we combined Doc2vec with DeepPavlov, and that became the foundation of the system.

Model Training

Training the model on the dataset was another big challenge. Finding the approach itself did not take long, but the training process still took a while.

Natural Language Processing applications are like diamonds in the rough. They need high polish before they show their sheen. This project was no different.

Our primary goals were to:

Train the system based on contexts in the dataset

Train the model for question recognition on the experimental interface

Classify incoming questions

Provide tagging for additional navigation

We used a combination of DeepPavlov and Doc2Vec to train the model and explore the datasets, which resulted in a system capable of providing an answer to a given query.

For the question classification exercise, we developed a testing interface on the Telegram platform. It was used to broaden the training process and add much-needed diversity to the queries.

As a result, we trained a model capable of recognizing even the most far-fetched questions and finding relevant answers for them.
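The classification and tagging steps can be sketched like this. The real classifier was trained on queries collected through the Telegram interface; the keyword-overlap scoring, tag names, and fallback below are hypothetical stand-ins that only show the flow from incoming question to navigation tag.

```python
import re

# Hypothetical topic tags and keyword sets; the real system learned these
# mappings from user queries gathered via the Telegram test interface.
TAGS = {
    "economy": {"economy", "prices", "inflation", "sanctions"},
    "politics": {"election", "president", "parliament", "opposition"},
    "society": {"protest", "education", "healthcare"},
}

def classify_question(question):
    """Assign the tag whose keyword set overlaps the question the most."""
    tokens = set(re.findall(r"\w+", question.lower()))
    scores = {tag: len(tokens & keywords) for tag, keywords in TAGS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a catch-all tag when nothing matches.
    return best if scores[best] > 0 else "general"
```

The returned tag then drives the additional navigation mentioned above, grouping answered questions by topic.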

Model Optimization

The testing and optimization process was probably the most adventurous part of the entire development cycle (even compared with the choice of the model).

In this case, the best way to optimize the model and fix as many problems as possible was checkpoint-based testing.

The procedure looked like this:

We had a selection of questions

Multiple users generated the answers

Results were compared and analyzed for weak or otherwise problematic points.
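The comparison step of that procedure can be sketched as a small harness (the function names and the disagreement rule are our assumptions): each model checkpoint answers the same fixed question set, and questions whose answers diverge between checkpoints are flagged as candidates for review.

```python
def compare_checkpoints(questions, checkpoints):
    """Run the same questions against several model checkpoints and collect
    the ones whose answers disagree - candidate weak points for analysis.

    `checkpoints` maps a checkpoint name to an answer function
    (question -> answer string)."""
    problem_points = []
    for question in questions:
        answers = {name: answer_fn(question)
                   for name, answer_fn in checkpoints.items()}
        if len(set(answers.values())) > 1:  # checkpoints disagree
            problem_points.append((question, answers))
    return problem_points
```

In practice, the flagged questions would be exactly the ones passed back and forth between the client's team and the developers for analysis.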

To make the most of the effort, we maintained a constant back-and-forth between the client’s team and our development team. It helped identify and fix the problem points in less time.

In addition to that, we tested the system at multiple levels of corpus granularity:

On the entire corpus;

Then on the corpus broken down into separate articles;

Then broken down into paragraphs;

Then broken down into sentences.

The most effective approach proved to be testing on the paragraphs of the corpus.
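The four granularity levels can be produced mechanically from the raw corpus. A sketch, under simplifying assumptions: paragraphs split on blank lines and sentences split naively on end punctuation, whereas the real article boundaries came from the dataset itself.

```python
import re

def split_paragraphs(article):
    """Split an article into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]

def split_sentences(paragraph):
    """Naive sentence splitter - enough for an illustration."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def granularity_levels(articles):
    """Build the four retrieval units that were tested on the project."""
    paragraphs = [p for a in articles for p in split_paragraphs(a)]
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    return {
        "corpus": ["\n\n".join(articles)],
        "articles": articles,
        "paragraphs": paragraphs,  # the level that worked best
        "sentences": sentences,
    }
```

Retrieval quality can then be measured per level; on this project, indexing at the paragraph level gave the best trade-off between context and precision.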

Tech Stack for AI Chatbot

The resulting system comprises the following elements:

Google Cloud Platform

DeepPavlov & Doc2Vec/Word2Vec NLP Model

Custom API for database management

Conversational User Interface

Team Members

1 Project Manager

2 Software Developers

1 QA Engineer

Conclusion

This project is a big milestone for our team. Over the years, we have worked on different aspects of natural language processing and developed many projects involving this technology. This project gave us a chance to create a high-concept application that works with a unique kind of information for the public good.

During the development of this project, we utilized streamlined workflows that made the whole turnaround much faster. We were able to deploy an operating prototype of the system ahead of the planned date and dedicate more time to its testing and refinement.