A Guest post to R-bloggers by Bart Blaszczyk.

* * * * * * * *

This week a new data competition for the best recommendation system begins. Similar in form to the famous Netflix Prize, it asks data scientists, algorithm geeks and statisticians to devise the most accurate algorithm for suggesting, in a personalized way, which video lectures may interest visitors to VideoLectures.net, the “YouTube for geeks” website. The algorithm will help scientists and academics navigate the vast volume of educational content gathered in the service.

The competition is a scientific event, open to everyone, featuring prizes of $8,000 in total. The winning algorithms will be publicly disclosed after the contest ends, at ECML/PKDD, one of the major scientific conferences in the field of data mining and machine learning. The contest is organized by the European Union project e-LICO and hosted by TunedIT, the first open platform for data mining competitions and a collaboration site for data scientists.

Contest website: http://tunedit.org/challenge/VLNetChallenge

VideoLectures.net is a free and open access multimedia repository of video lectures, mainly of a research and educational character. The lectures are given by distinguished scholars and scientists at the most important and prominent events, such as conferences, summer schools, workshops and science promotional events, in many fields of science. The portal aims to promote science, exchange ideas and foster knowledge sharing by providing high-quality didactic content not only to the scientific community but also to the general public. All lectures, accompanying documents, information and links are systematically selected and classified through an editorial process that also takes users’ comments into account.

The competition is organized in order to improve the website’s current recommender system. The challenge consists of two main tasks and a “side-by” contest. Due to the nature of the problem, each of the tasks has its own merit: Task 1 simulates new-user and new-item recommendation (cold-start mode), while Task 2 simulates clickstream-based recommendation (normal mode).

Data from the VL.Net website do not include any explicit or implicit user profiles. Due to privacy-preserving constraints, the implicit profiles embodied in viewing sequences (clickstreams) are transformed so that no individual viewing sequence can be revealed or reconstructed. There are, however, other viewing-related data included (co-viewing frequencies and pooled viewing sequences), as well as content-related information: the lecture category taxonomy, names, descriptions and slide titles (where available), authors, institutions, lecture events and timestamps. The dataset, including the Leaderboard and the test set, will remain publicly available for experimentation after the end of the contest.
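To give a feel for how co-viewing frequencies alone can drive a recommendation, here is a minimal Python sketch of a simple baseline: for a given lecture, rank the other lectures by how often they were viewed together with it. The record layout, the lecture IDs, the counts and the function names are all assumptions made up for illustration. They do not reflect the actual format of the contest files, nor the method the organizers use for evaluation.

```python
from collections import defaultdict

# Hypothetical co-viewing records: (lecture_a, lecture_b, times_viewed_together).
# The IDs and counts below are invented for illustration only.
CO_VIEWS = [
    (1, 2, 40),
    (1, 3, 15),
    (2, 3, 25),
    (2, 4, 5),
]


def build_index(pairs):
    """Build a symmetric lookup: lecture -> {other lecture: co-view count}."""
    index = defaultdict(dict)
    for a, b, count in pairs:
        index[a][b] = count
        index[b][a] = count
    return index


def recommend(index, lecture, k=3):
    """Return up to k lectures most frequently co-viewed with `lecture`."""
    neighbours = index.get(lecture, {})
    # Sort by descending count; break ties by lecture ID for stable output.
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:k]]


index = build_index(CO_VIEWS)
print(recommend(index, 2))
```

A baseline like this has nothing to say about a lecture that has no co-viewing history, which is the situation the cold-start task simulates. There, content-related information such as categories, descriptions and authors would have to carry the recommendation instead.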

Authors of the best algorithms will be awarded prizes worth USD 8,000 in total and will also have a chance to present their solutions at the ECML/PKDD conference. The organizers expect a very intense and interesting competition. According to Marcin Wojnarski, founder and CEO of TunedIT: “No doubt, this contest will touch a chord with all data specialists who spent long hours, days and months trying to reach the magic level of 10% improvement in the Netflix Prize. Personally, I wish I could participate myself, as the topic is still very challenging and attractive. I can’t wait to see what progress has been achieved in recommendation systems technology since Netflix Prize era”.

The contest is organized and sponsored by the European Union project “e-LICO” and hosted on the TunedIT platform for data competitions. It will run until July 8th, 2011. Everyone is welcome to participate.

Competition web page: http://tunedit.org/challenge/VLNetChallenge

e-LICO is a virtual laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences, built through an EU FP7 project. The e-LICO lab consists of three layers: the e-science layer, the data mining layer and the application domains, one of which concerns multimedia repositories and the problem of recommendation.

ECML/PKDD – The European Conference on “Machine Learning” and “Principles and Practice of Knowledge Discovery in Databases” provides an international forum for the discussion of the latest high-quality research results in all areas related to machine learning, knowledge discovery in databases, data mining and innovative application domains. ECML/PKDD 2011 will be held in Athens, Greece, on September 5-9.

TunedIT is the first online laboratory dedicated to the development of intelligent algorithms through crowdsourcing, the easiest and most cost-efficient way of performing Research & Development in a massively parallel manner. TunedIT specializes in the fields of Data Mining, Machine Learning, Computational Intelligence and Statistical Modeling. TunedIT runs online data mining competitions that address selected real-world problems of data analysis. Contests are open to the whole scientific community and attract hundreds of teams every time.