Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features

Version 1: Received: 2 January 2018 / Approved: 3 January 2018 / Online: 3 January 2018 (02:03:51 CET)



A peer-reviewed article of this Preprint also exists. Lewoniewski W., Węcel K., Abramowicz W. (2018) Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features. In: Damaševičius R., Vasiljevienė G. (eds) Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science, vol 920. Springer, Cham. Journal reference: Communications in Computer and Information Science 2018, 920, 546-558

DOI: 10.1007/978-3-319-99972-2_45

Cite as: Lewoniewski W., Węcel K., Abramowicz W. (2018) Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features. In: Damaševičius R., Vasiljevienė G. (eds) Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science, vol 920. Springer, Cham.

Abstract

Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of the information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high-quality articles that follow specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia are unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in the Polish language. We evaluate our models on 500,000 articles from Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.

Subject Areas

Wikipedia; Polish; information quality; linguistic features; linguistics; data mining; NLP

Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
