We are pleased to announce the launch of the Wikipedia Participation Challenge, a data modeling competition to develop an algorithm that predicts future editing activity on Wikipedia. The competition is hosted by Kaggle, a platform for data modeling and prediction competitions. The Participation Challenge is open to community members and anyone else who is interested in analyzing Wikipedia data. This is the first of two data competitions the Wikimedia Foundation will sponsor this year.

The goal of this competition is to gain a better understanding of the factors that encourage or discourage people from editing Wikipedia. Increasing the number of active editors is one of our strategic priorities. Both the Wikipedia communities and the Wikimedia Foundation stand to benefit from models that quantify the factors that determine whether a Wikipedia editor is likely to continue contributing. The competition asks contestants to develop a model that predicts the number of edits a given editor will make in six months' time.

The data used in this competition comes from the publicly available English Wikipedia XML data dump. An anonymous donor has generously contributed $10,000 as prize money. There will be a Grand Prize for the best prediction, as well as special prizes awarded for the use of open source software. The Grand Prize winner will also be given the opportunity to present their prediction model at the 2011 IEEE International Conference on Data Mining. The competition starts today and will continue until September 20, 2011.

Head over to our competition portal, download the data, and start crunching! And don't forget to follow us on Twitter: #wikichallenge and @dvanliere.

Howie Fung

Senior Product Manager, Wikimedia Foundation

Diederik van Liere

Research Consultant, Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.