Artificial intelligence will be used for the translations.

Concerned that scientific topics are under-represented on Wikipedia in Indian languages other than English, the Department of Science and Technology (DST) is planning to translate scores of articles into Hindi, using a combination of artificial intelligence-based software, translators and scientists.

There are about 50 lakh (5 million) Wikipedia articles in English and only 1.25 lakh (125,000) comparable ones in Hindi, according to the DST, the nodal agency that funds civilian science research. “We’d like to increase that significantly, using machine learning and scientists,” said Ashutosh Sharma, Secretary, Department of Science and Technology.

The Wikipedia project will first involve translating a large number of science-based wikis and will eventually move on to creating original content in Indian languages. Hindi would be the starting point, but the project would branch out to other languages, Mr. Sharma added.

Machine learning

Machine learning would be used to train software to rapidly translate large tracts of text and to create new articles, and the services of scientists as well as subject experts would be employed. In the first phase of the project, about ₹7-10 crore would be invested over three years, said Nisha Mendiratta, a senior advisor in the ministry who is involved with the project.

The articles chosen for translation would be the most popular science articles. “Say quantum theory,” Mr. Sharma said. “The first article would be something that would be an accurate translation with all the citations, etc. Of course, the nature of Wikipedia is that it is editable but there are a whole lot of topics that don’t even have basic information.”

According to a Wikipedia entry on the Hindi-language edition, the Hindi Wikipedia had 55,000 unique categories. The largest shares of articles were in the “Nature” (27%) and “Science” (16%) categories. In Hindi Wikipedia, articles related to ‘Business and People’ had the highest average quality.