Today, Numerai is open sourcing our originality, concordance, and consistency criteria. The code may be found here.

In Numerai’s tournament, data scientists compete to build machine learning algorithms that power our hedge fund. For a submission to be eligible for payout, it must be original, concordant, and consistent. These criteria are described on the help page. For example, here’s the definition of originality:

Originality is a measure of whether a set of predictions is uncorrelated with predictions already submitted. Numerai wants to encourage new models over duplicate submissions.
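To make the idea concrete, here is a minimal sketch of how a check like this could work, assuming a simple correlation threshold. The function name, the threshold value, and the use of Pearson correlation are illustrative assumptions, not the actual implementation in our repository:

```python
import numpy as np

def is_original(candidate, prior_submissions, threshold=0.95):
    """Sketch of an originality check: reject `candidate` if it is too
    correlated with any previously submitted prediction vector.

    NOTE: `threshold` and the Pearson-correlation statistic are
    hypothetical choices for illustration; the real check may differ.
    """
    for prior in prior_submissions:
        corr = abs(np.corrcoef(candidate, prior)[0, 1])
        if corr >= threshold:
            return False  # effectively a duplicate of an earlier model
    return True

# A near-duplicate of an earlier submission fails; an unrelated model passes.
rng = np.random.default_rng(0)
prior = rng.random(1000)
duplicate = prior + rng.normal(0, 0.01, 1000)  # prior plus tiny noise
independent = rng.random(1000)                 # unrelated predictions

print(is_original(duplicate, [prior]))    # False
print(is_original(independent, [prior]))  # True
```

Note that even this toy version must compare a candidate against every prior submission, which hints at why the real originality score can be slow to compute as the tournament grows.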

Rationale

We are open sourcing this for four reasons:

1. We believe transparency is critical to trust, and this is another step in that direction. You don’t have to take our word that our calculations are accurate: you can check the code yourself. It’s important that our data scientists understand what they’re being graded on.

2. If data scientists understand these checks, they will be better able to create predictions that don’t fall afoul of them, and that are therefore more useful to Numerai. In general, we would like data scientists to focus more on improving their models and less on passing these checks.

3. Our algorithms are imperfect, and sometimes gameable. We would like to improve them so that they are simultaneously more effective at filtering bad submissions and more permissive of genuinely good ones.

4. Our code isn’t as clean or performant as we would like. For example, a pain point for many of our users is that the originality score takes a long time to compute.

One objection is that cheaters may be able to game these checks now that they can read the source code. However, security through obscurity isn’t effective beyond a certain scale. Our tournament attracts enough interest that, if there’s a way to game it, cheaters will invest significant resources to find it. We believe the improvements gained by open sourcing the code will do more to thwart bad actors than any advantage they may gain from being able to analyze it.

Helping out

The number one reason organizations don’t open source code, even code they have no need to keep secret, is that most internal code isn’t “fit for public eyes”. They delay until they can clean it up, document it, and make it look respectable, which often never happens.

We’ve chosen instead to release it with essentially zero changes. We’re not claiming the code is perfect, or even very good. It has a lot of room for cleanup, documentation, and improvement. If you see something you don’t like, let us know by creating an issue in the repository. If you can fix an issue (yours or someone else’s), be bold and send in a pull request.

Numerai is built on the idea that a large group of disparate data scientists will produce better results than any small team. This applies not only to the data analysis within the tournament, but also to the rules and scoring criteria of the tournament itself. We believe you can make significant improvements to the scoring criteria.

We also believe in compensating the data scientists who improve tournament results. As a result, we will provide bounties for some tasks. Bounties will be paid out on a first-come, first-served basis, and will only be denominated in Numeraire, Numerai’s cryptocurrency.

Examples of initial bounties:

Document each check.

Improve the originality and concordance checks according to the suggestions in this document.

Create a process for easy benchmarking of speed.

Speed up the code.

Over time, new bounties will be added, and existing bounties may be modified. All issues that currently have a bounty on them will be tagged on the issues page. The rules for the bounty program are specified, in Q+A form, here.

Recap