This article presents an overview of research I have co-authored on assessing the quality of Wikipedia in different languages. I decided to share my knowledge and experience in this area with the Medium audience. I will be glad to hear comments and suggestions on this topic, and perhaps someone will be interested in collaborating. In the next articles, I plan to discuss individual methods and algorithms for analyzing the quality of articles in different languages in more detail. I also plan to post code samples (mostly in Python) that can be useful for extracting and analyzing data from Wikipedia.

Automatic assessment of the quality of Wikipedia articles in different languages

Distribution of quality scores for three Wikipedia language versions (English, German, and French) in 12 considered topics.

Although Wikipedia is often criticized for its poor quality, it is still one of the most popular knowledge bases in the world. Currently, this online encyclopedia ranks 5th among the most visited sites (after Google, YouTube, Facebook, and Baidu). Its articles are created and edited in about 300 different languages, and together the language versions currently contain more than 46 million articles on various topics.

The number of articles in Wikipedia grows every day. Articles can be created and edited even by anonymous users, and authors do not need to formally demonstrate their skills, education, or experience in a given area. Wikipedia does not have a central editorial team or a group of reviewers who could comprehensively check all new and existing texts. For these and other reasons, people often criticize the concept of Wikipedia, pointing in particular to the poor quality of its information.

Despite this, you can sometimes find valuable information in Wikipedia, depending on the language version and the subject. Practically every language version has a system of awards for its best articles. However, the number of awarded articles is relatively small (less than one percent of all articles). Some language versions also use other quality grades. Even so, the overwhelming majority of articles are unevaluated (in some languages more than 99%).
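To make the "less than one percent" figure concrete, here is a minimal Python sketch of how such a share could be checked for one language version. It is not the method used in the studies described here. It only builds a request URL for the MediaWiki API (`action=query` with `prop=categoryinfo`) and computes a percentage. The helper names `category_info_url` and `awarded_share` are hypothetical, and the category name "Featured articles" is an assumption that fits the English Wikipedia; other language versions name their award categories differently.

```python
from urllib.parse import urlencode


def category_info_url(lang: str, category: str) -> str:
    """Build a MediaWiki API URL that requests member counts for one category.

    `lang` is the language code of the Wikipedia edition (e.g. "en"),
    `category` is the category title without the "Category:" prefix.
    """
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": f"Category:{category}",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def awarded_share(awarded: int, total: int) -> float:
    """Return the percentage of articles that carry a quality award."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * awarded / total


# Assumed category name; it differs between language versions.
url = category_info_url("en", "Featured articles")
print(url)
```

Fetching this URL with any HTTP client returns JSON in which the `categoryinfo` entry for the category includes a `pages` count. Passing that count, together with the total number of articles in the same language version, to `awarded_share` gives the percentage of awarded articles.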