The traditional crowdsourced machine learning tournament depends on a holdout dataset. The holdout data is historical data known to the tournament organizer and unknown to the data scientists participating in the tournament. Data scientists' submissions are graded and paid based on their ability to predict this holdout dataset. This creates an incentive to predict the holdout set as closely as possible, but no incentive to build models that generalize to the future. Data scientists are rewarded for predicting the past. This incentivizes overfitting, the primary enemy of data-driven endeavors.

In data science, logloss is a standard metric for measuring how good a set of predictions is. Data science competitions use logloss to rank competitors. On each submission, a data scientist receives a public logloss indicating how well the predictions performed on the public holdout dataset. The major problem with this approach is that consistent feedback from the competition enables competitors to tailor their predictions to the feedback itself rather than to the actual problem. This enables the overfitting that holdout dataset-based rewards incentivize.
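As a minimal sketch, binary logloss penalizes confident wrong predictions far more heavily than cautious ones, which is why chasing small logloss improvements against a fixed holdout set so readily drives overfitting. The function name and clipping epsilon below are illustrative, not any competition's actual implementation:

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary logloss (cross-entropy). Lower is better;
    predicting 0.5 for everything scores ln(2) ~ 0.693."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct: low loss.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))   # ~0.145
# Confident and wrong on one example: the loss blows up.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.01]))  # ~1.605
```

Because a single overconfident miss dominates the average, submissions tuned tightly to repeated holdout feedback can look excellent on the past while remaining fragile on unseen data.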

There have been many attempts to mitigate the overfitting that data scientists are incentivized to achieve in this tournament format. Most approaches involve complicating the selection of holdout sets and diminishing the usefulness of the logloss reported to the data scientists. Numerai has no interest in bringing together thousands of data scientists to achieve a good logloss on the past; its only interest is predicting the future.

Incentivizing Generalization

To perfectly align incentives with data scientists, Numerai no longer has a holdout dataset or a leaderboard, either public or private. Rather than hide information from the data scientists, Numerai gives data scientists all known information. Instead of grading data scientists on a fixed set of past data, data scientists are graded on future data once it becomes known. Four weeks after a tournament begins, the actual outcome of what was being predicted is known. Data scientists are then ranked and paid in both USD and Numeraire based solely on their ability to predict those four weeks. This makes overfitting the direct adversary of the data scientists themselves.