“On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach”

Reviewed by Maik Stührenberg

This paper[1] is well structured and combines the theory of web genres with dialogue theory to examine Wikipedia talk pages. Since Wikipedia is a web genre, “Wikicussions” (as the authors call them) form a subgenre. In this context, talk pages are examined further, including the quality of cooperation between Wikipedia users, which can be linked to social differentiation regarding roles and statuses of Wikipedians (content- vs. administration-related users). These group-related processes can be seen as a mediating layer between external parameters (system requirements for Wikipedia’s user community) and the structure and dynamics of Wikipedia’s subgenres.

The authors argue that, unlike face-to-face dialogue, Wikicussions stand out due to a publicly available common ground (a concept derived from dialogue theory), which may provide a reason for the structures they found.

The paper is enriched with a number of high-quality figures that support and underpin the findings.

Frequency distribution of talk posts over time within the German Wikipedia (blue: registered users; red: anonymous users; green: bots; black: all users). Unsigned posts (without timestamps) are excluded. Posts dated by posters outside of the valid time-frame (before the date of creation of the discussion or after the date of its download) are also excluded. (Figure 7 from the paper, by Alexander Mehler, Rüdiger Gleim, Andy Lücking, Tolga Uslu and Christian Stegbauer, CC BY-SA 4.0)

“How Sudden Censorship Can Increase Access to Information”

Reviewed by Bri and Tilman Bayer

Our intuition might tell us that government censorship causes reduced access to online information. But recent research indicates that the effect can be exactly the opposite. Using data gathered from Wikipedia page views and other sources, researchers William Hobbs and Margaret Roberts found that:

[…] citizens accustomed to acquiring this [forbidden] information will be incentivized to learn methods of censorship evasion […] millions of Chinese users acquire[d] virtual private networks, and subsequently […] began browsing blocked political pages on Wikipedia, following Chinese political activists on Twitter, and discussing highly politicized topics such as opposition protests in Hong Kong.[2]

Specifically, the authors studied the impact of a block of Instagram in China on September 29, 2014, following protests in Hong Kong, on Chinese Wikipedia pages that were already blocked in the country. (This predates the 2015 total block of the Chinese Wikipedia and the switch of all Wikimedia sites to full encryption with HTTPS around the same time, which made such per-page blocking impossible.) The censored Chinese Wikipedia pages with the largest increase in views “shows that new viewers accessed pages that had long been censored including those related to the 1989 Tiananmen Square protests”,[2] i.e. “viewing patterns that would be more typical of new users who had just jumped the firewall, rather than of old VPN users who had presumably consumed this information long ago.”[2] Here is an excerpt of the full list examined in the research, the top 10 for the second day of the block, linked here to their English Wikipedia equivalents:

The researchers propose to name this phenomenon the “gateway effect”, a “mechanism through which repression can backfire inadvertently, without political or strategic motivation”,[2] because it incentivizes people to learn how to evade censorship and thus “have more, not less, access to information and begin engaging in conversations, social media sites, and networks that have long been off-limits to them.”[2] They distinguish it from the Streisand effect, where individuals specifically seek out information that is being hidden.

The second author of the study, Margaret Roberts, is also the author of Censored: Distraction and Diversion Inside China’s Great Firewall (Princeton University Press, 2018; print ISBN 978-0-691-17886-8, e-book ISBN 978-1-400-89005-7).

Marketing, social media, and Wikipedia

Reviewed by Barbara Page

This study was able to “characterize” the interests of Wikipedia editors and the editors’ social media activity on Twitter to facilitate:

“[…] building rich user profiles, which can be conveniently used in order to provide personalized contents and offers.” and “[…], i.e., the detection of the user’s core interests and, therefore, allows for product and service recommendations far more tailored than those stemming from other (usually) extemporary actions on the Internet, like flight ticket purchases and hotel reservations. In this light, it is important to notice that such a profiling potential associated to social login remains nowadays largely unused and enabling its exploitation is one of the main goals of the present work.”[3]

A marriage between editors’ editing topics and Twitter (and possibly Facebook) will result in targeted marketing tailored just for you! (Photo: Harland Quarrington/MOD, OGL)

Conferences and events

See the community-curated research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.

WMF research showcase

Recent presentations at the monthly Research showcase hosted by the Wikimedia Foundation included the following:

“Conversations Gone Awry: Detecting Early Signs of Conversational Failure”

Antisocial behavior, including harassment and personal attacks, can occur in online social systems. A new paper[4] by seven researchers from Cornell University, Jigsaw, and the Wikimedia Foundation describes how predicting undesirable negative exchanges, possibly at the very start of a conversation, may help prevent a discussion from deteriorating. One of the authors also gave an interview published on the Wikimedia Foundation’s blog,[supp 1] and the paper was covered in popular media; see In the media § In brief.

Case studies in the appropriation of ORES

Presentation slides about the use of the ORES platform (video)

From the announcement (by Aaron Halfaker):

ORES is an open, transparent, and auditable machine prediction platform for Wikipedians to help them do their work. It’s currently used in 33 different Wikimedia projects to measure the quality of content, detect vandalism, recommend changes to articles, and to identify good faith newcomers. The primary way that Wikipedians use ORES’ predictions is through the tools developed by volunteers. These javascript gadgets, MediaWiki extensions, and web-based tools make up a complex ecosystem of Wikipedian processes – encoded into software.

The presentation covered “three key tools that Wikipedians have developed that make use of ORES”: Wikidata’s damage detection models, exposed through Recent Changes; Spanish Wikipedia’s PatruBOT; and WikiEdu tools from User:Ragesoss that incorporate article quality models.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer

“On the Effects of Authority on Peer Motivation: Learning from Wikipedia” [5] – From the abstract: “We show that lateral authority, the legitimacy to resolve task‐specific problems, is welcomed by members of an organization in the resolution of coordination conflicts, the more so (1) the fiercer the conflict to be resolved, (2) the higher the competence‐based status of the authority, (3) the lower the tenure of, and (4) the more focused the organizational members are. Analyzing the discussion behavior of members of Wikipedia between 2002 and 2014, we corroborate our allegations empirically by analyzing 642,916 article–discussion pages.”

“A Comparison of the Historical Entries in Wikipedia and Baidu Baike” [6] – From the abstract: “This research purposefully chose 6 entries and developed a framework to evaluate their performance in accuracy, breadth, depth, informativeness, conciseness and objectiveness. The result shows that: Wikipedia is superior in most cases while Baidu Baike is a little better in the entries on Chinese history. The operating mechanism is the main reason for it.”

“Sentiments in Wikipedia Articles for Deletion Discussions” [7] – From the abstract: “We performed sentiment analysis on 37,761 AfD discussions with 156,415 top-level comments and explored relationship between outcomes of the discussion and sentiments in the comments. Our preliminary work suggests: discussion that have keep or other outcomes have more than expected positive sentiment, whereas discussions that have delete outcomes have more than expected negative and neutral sentiment. This result shows that there tends to be positive sentiment in the comment when Wikipedia users suggest not to delete the article.”

“‘What are these researchers doing in my Wikipedia?’: ethical premises and practical judgment in internet-based ethnography” [8] – From the abstract: “The article reflects on the heuristics that guided the decisions of a 4-year participant observation in the English-language and German-language editions of Wikipedia. […] it interrogates the technological, social, and legal implications of publicness and information sensitivity as core ethical concerns among Wikipedia authors. The first problem area of managing accessibility and anonymity contrasts the handling of the technologically available records of activities, disclosures of personal information, and the legal obligations to credit authorship with the authors’ right to work anonymously and the need to shield their identity. The second area confronts the contingent addressability of editors with the demand to assure and maintain informed consent.” (See also the Wikipedia essay “What are these researchers doing in my Wikipedia?”)

“Digging Wikipedia: The Online Encyclopedia As a Digital Cultural Heritage Gateway and Site” [9] – From the abstract: “[…] this article introduces Wikipedia as a digital gateway to and site of an active engagement with cultural heritage. We have developed the open source and freely available analysis architecture Contropedia [website] to examine already existing volunteer user-generated participation around cultural heritage and to promote further engagement with it. Conceptually, we employ the notion of memory work, as it helps to treat Wikipedia’s articles, edit histories, and discussion pages as a rich resource to study how cultural heritage is received and (re)worked in and across languages and cultures. […] The analysis facilitated by Contropedia […] sheds light on the contentious articulation of perspectives on tangible and intangible heritage grounded by conflicting conceptions of events, ideas, places, or persons. Technologically, Contropedia combines techniques based on mining article edit histories and analyzing discussion patterns in talk pages to identify and visualize heritage-related disputes within an article, and to compare these across language versions.” (cf. earlier coverage: “‘Contropedia’ tool identifies controversial issues within articles”; “Towards better visual tools for exploring Wikipedia article development – the use case of ‘Gamergate controversy’”)

“Use of Louisiana’s Digital Cultural Heritage by Wikipedians” [10] – From the abstract: “This case study details an analysis of Wikipedia links to online resources from Louisiana cultural heritage institutions [also known among Wikimedians as GLAMs] in order to determine what types of cultural heritage resources users are citing on Wikipedia, what is the content of the Wikipedia articles with Louisiana CHI citations, and how this can influence the work of CHI. The results of the study include findings that digital library items and archival finding aids are the most cited sources from cultural heritage institutions on Wikipedia and are particularly popular for Louisiana-specific Wikipedia articles on society and the social sciences and culture and the arts.”

“The Conceptual Correspondence between the Encyclopaedia and Wikipedia” [11] – From the abstract: “This study […] focuses on the roles and attributes of both printed encyclopaedias and Wikipedia. First, we analyse the roles and attributes of an encyclopaedia by conducting a review of research related to them. Then we analyse whether or not Wikipedia fulfills the same roles and has the same attributes as the encyclopaedia by reviewing academic work that investigates and analyses Wikipedia from various perspectives. The results show that Wikipedia does not conceptually correspond to an encyclopaedia, except in cases where people use it for one-time searches. In the world of digital media, Wikipedia does not have the same status that the encyclopaedia holds in the world of print media.”

“Structural Differentiation in Social Media: Adhocracy, Entropy, and the ‘1 % Effect’” [12] – From the text: “Over the study period (2001–2010), we observed 235,701,162 edits completed by 22,792,847 unique contributors. Of these, 19,680,637 users were anonymous, identified only by their unique IP addresses. The rest (3,112,210) were registered users who were logged into their respective accounts. […] logged-in users were the clear minority group, yet they contributed far more edits than the anonymous users—all told, those logged-in individuals were responsible for almost two-thirds (68%) of the observed revisions. Even more importantly, the top 1% of all contributors were responsible for 77% of the collaborative effort based upon the extent to which the text of articles was actually changed (i.e., the contribution delta). [… The] simple answer to research question 2 (RQ2), ‘What is the social mobility (or its inverse, elite “stickiness”) of functional leaders on Wikipedia over time?’ is that on average, across the entire 9.5-year period, an individual who was a top contributor at a given point in time had a 40% probability of remaining in the top contributor group 5 weeks later. Twenty weeks later, that individual would have a 32% chance of still being a top contributor, and after 30 weeks, this figure would be at 28%.” In a press release by Purdue University, one of the authors commented: “What we saw is that a clear leadership has emerged, but it’s a leadership that cycles. We have a group of individuals who shape the content by working the hardest and clocking the most hours. The agenda is shaped by these people, and they’re driven by a sense of mission, much like political or religious movements.”[supp 2]
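The contributor figures quoted from the “1 % Effect” paper can be cross-checked with a little arithmetic. The following is only an illustrative sketch, not anything taken from the paper: the constant and function names are made up for this example, and the per-user averages are rough derived estimates because the 68% edit share is quoted as a rounded value.

```python
# Figures quoted above from the paper's text (study period 2001-2010).
TOTAL_EDITS = 235_701_162
TOTAL_CONTRIBUTORS = 22_792_847
ANONYMOUS = 19_680_637
REGISTERED = 3_112_210
# Share of revisions by logged-in users, as quoted (rounded, so the
# derived values below are approximations, not results from the paper).
REGISTERED_EDIT_SHARE = 0.68


def contributor_summary():
    """Derive rough shares and per-user averages from the quoted totals.

    Hypothetical helper written for this illustration only.
    """
    # Sanity check: the two groups should add up to the stated total.
    assert ANONYMOUS + REGISTERED == TOTAL_CONTRIBUTORS

    registered_edits = REGISTERED_EDIT_SHARE * TOTAL_EDITS
    anonymous_edits = TOTAL_EDITS - registered_edits
    return {
        "registered_share": REGISTERED / TOTAL_CONTRIBUTORS,
        "edits_per_registered": registered_edits / REGISTERED,
        "edits_per_anonymous": anonymous_edits / ANONYMOUS,
    }


summary = contributor_summary()
print(f"Registered share of contributors: {summary['registered_share']:.1%}")
print(f"Estimated edits per registered user: {summary['edits_per_registered']:.1f}")
print(f"Estimated edits per anonymous IP: {summary['edits_per_anonymous']:.1f}")
```

Under these assumptions, registered accounts make up roughly 14% of contributors but average on the order of 50 edits each, against about 4 per anonymous IP address. This says nothing about the “1 % effect” itself, which the authors measure by contribution delta rather than by edit counts.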

References

Supplementary references:



