Data

At the core of Numerai is a data science problem: predicting the stock market.

In the provided training_data, each id corresponds to a stock with a set of obfuscated features. The target represents future performance. Rows are grouped into eras that represent different points in time.
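To make the layout concrete, the sketch below builds a tiny synthetic table shaped like training_data. The era column name, the feature_x and feature_y names, and all values are illustrative assumptions; only id, target, and the "feature" naming pattern come from this page.

```python
import pandas as pd

# Synthetic stand-in for training_data; every value is made up for illustration.
toy_training_data = pd.DataFrame(
    {
        "id": ["a1", "a2", "a3", "a4"],
        "era": ["era1", "era1", "era2", "era2"],  # assumed column name
        "feature_x": [0.25, 0.75, 0.50, 0.00],    # hypothetical feature
        "feature_y": [1.00, 0.00, 0.25, 0.50],    # hypothetical feature
        "target": [0.50, 0.25, 0.75, 1.00],
    }
).set_index("id")

# Pick out the feature columns by name.
feature_names = [c for c in toy_training_data.columns if "feature" in c]

# Rows are grouped into eras, so per-era summaries are a natural first check.
rows_per_era = toy_training_data.groupby("era").size()
mean_target_per_era = toy_training_data.groupby("era")["target"].mean()
```

Grouping by era before computing any statistic keeps rows from different points in time from being mixed together.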

Your goal is to train a machine learning model to predict the target given new features.

Read the analysis and tips notebook for an in-depth exploration of the dataset.

numerai_training_data.csv

Modeling

Below is an example of how to train a model on the training_data and make predictions on the tournament_data.

Check out the example-scripts repo for more advanced examples.

    import pandas as pd
    from xgboost import XGBRegressor

    # Load the training and tournament data, indexed by stock id
    training_data = pd.read_csv("numerai_training_data.csv").set_index("id")
    tournament_data = pd.read_csv("numerai_tournament_data.csv").set_index("id")

    # Feature columns are the ones with "feature" in their name
    feature_names = [f for f in training_data.columns if "feature" in f]

    model = XGBRegressor(max_depth=5, learning_rate=0.01,
                         n_estimators=2000, colsample_bytree=0.1)
    model.fit(training_data[feature_names], training_data["target"])

    # model.predict returns a NumPy array, which has no to_csv method, so wrap
    # it in a Series first (the column name "prediction" is an assumption)
    predictions = pd.Series(model.predict(tournament_data[feature_names]),
                            index=tournament_data.index, name="prediction")
    predictions.to_csv("predictions.csv")

Submissions

Every Saturday at 18:00 UTC, a new round begins and new tournament_data is released. Submit your predictions to Numerai to enter the tournament.

The submission deadline is Monday 14:30 UTC. Late submissions will not be eligible for payouts.

Use our tools and libraries to connect with our GraphQL API.

predictions.csv

Scoring

Your submission is scored on the correlation between your predictions and the true targets. The higher the correlation, the better.

scoring_function.py

    import numpy as np

    # Rank the predictions into percentiles, then correlate with the true targets
    ranked_predictions = predictions.rank(pct=True, method="first")
    correlation = np.corrcoef(labels, ranked_predictions)[0, 1]
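As a rough worked example of the scoring step, the sketch below wraps the same two operations in a function and runs it on made-up numbers. The function name score and the sample values are hypothetical, and labels and predictions are assumed to be pandas Series aligned on the same index.

```python
import numpy as np
import pandas as pd


def score(labels: pd.Series, predictions: pd.Series) -> float:
    """Correlation between the true targets and percentile-ranked predictions."""
    ranked_predictions = predictions.rank(pct=True, method="first")
    return float(np.corrcoef(labels, ranked_predictions)[0, 1])


# Made-up targets and predictions for five stocks.
labels = pd.Series([0.0, 0.25, 0.5, 0.75, 1.0])
same_order = pd.Series([0.1, 0.2, 0.3, 0.4, 0.9])      # same ordering as labels
reverse_order = pd.Series([0.9, 0.4, 0.3, 0.2, 0.1])   # opposite ordering

best_case = score(labels, same_order)
worst_case = score(labels, reverse_order)
```

Because the predictions are ranked before the correlation is taken, only their ordering matters, not their scale.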

Your submission will also be scored on your metamodel contribution, or mmc.

See the metamodel contribution section for details.

Staking and Payouts

You can stake on your submission to start earning payouts. You can stake either on correlation alone or on correlation plus mmc.

Staking requires you to lock up NMR in an Erasure smart contract agreement. This gives Numerai the ability to burn your stake if your model performs poorly.

You earn or burn a percentage of your stake based on the score you are staking on. For example, if you stake 100 NMR on correlation and your score is +0.05, you will earn 5% of 100 NMR = 5 NMR. The maximum you can earn or burn is 25% of your stake each round.

    corr_payout = stake * clip(corr, -0.25, 0.25)
    mmc_payout = stake * clip(corr + mmc, -0.25, 0.25)
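The payout formulas can be checked with a few lines of plain Python. This is a minimal sketch: the clip helper and the function names are defined here for illustration and are not part of any Numerai library, and the inputs other than the 100 NMR example are made up.

```python
def clip(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def corr_payout(stake: float, corr: float) -> float:
    """Payout when staking on correlation only."""
    return stake * clip(corr, -0.25, 0.25)


def corr_plus_mmc_payout(stake: float, corr: float, mmc: float) -> float:
    """Payout when staking on correlation plus mmc."""
    return stake * clip(corr + mmc, -0.25, 0.25)


example_earn = corr_payout(100, 0.05)                   # the 100 NMR example from the text
example_cap = corr_payout(100, 0.40)                    # earnings capped at 25% of the stake
example_burn = corr_plus_mmc_payout(100, -0.20, -0.10)  # burn capped at 25% of the stake
```

A negative result means that amount of the stake is burned rather than earned.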

See the staking and payouts section for details.

Each submission will receive daily updated scores starting from the first Thursday after the submission deadline to the Wednesday 4 weeks after. For example, if you made the blue submission on Sun 7th, you will receive your first score on Thu 11th and your final score on Wed 7th of the next month.

If you staked on your submission, you will also receive daily updates on your payouts, but only your final score and final payout will count.
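The scoring window can be sketched with date arithmetic. This is one reading of the schedule, chosen so that it reproduces the Sun 7th example: the deadline is taken to be the Monday on or after the submission day, and the final score is assumed to arrive 27 days after the first one. The function name and the calendar date used below are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def scoring_window(submission_day: date) -> Tuple[date, date, date]:
    """Return (deadline, first_score_day, final_score_day) for a submission."""
    # Deadline: the Monday on or after the submission day (Monday is weekday 0).
    deadline = submission_day + timedelta(days=(0 - submission_day.weekday()) % 7)
    # First score: the first Thursday after the Monday deadline.
    first_score_day = deadline + timedelta(days=3)
    # Final score: assumed to be the Wednesday 27 days after the first score.
    final_score_day = first_score_day + timedelta(days=27)
    return deadline, first_score_day, final_score_day


# Hypothetical date: 7 March 2021 is a Sunday in a 31-day month.
deadline, first_score_day, final_score_day = scoring_window(date(2021, 3, 7))
```

With this date the first score falls on Thursday the 11th and the final score on Wednesday the 7th of the following month, matching the example in the text.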

submission and scoring calendar

Reputation and Leaderboard

Your rank on the leaderboard is based on your reputation, which is a weighted average of your correlation scores over the past 20 rounds.
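The weighting scheme is not given on this page, so the sketch below only illustrates the general shape of a weighted average over the most recent 20 rounds. The function name is hypothetical, the weights are left as a parameter, and the equal-weight default is a placeholder rather than Numerai's actual weighting.

```python
from typing import Optional, Sequence


def weighted_reputation(scores: Sequence[float],
                        weights: Optional[Sequence[float]] = None,
                        window: int = 20) -> float:
    """Weighted average of the most recent `window` round scores."""
    recent = list(scores)[-window:]
    if not recent:
        raise ValueError("at least one round score is required")
    if weights is None:
        weights = [1.0] * len(recent)  # placeholder: equal weights (assumption)
    if len(weights) != len(recent):
        raise ValueError("weights must match the number of scored rounds")
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)


# Made-up correlation scores: only the last 20 of these 25 rounds are used.
round_scores = [1.0] * 5 + [0.02] * 20
reputation = weighted_reputation(round_scores)
```

Passing explicit weights, for example ones that favor recent rounds, changes the result without changing the rest of the calculation.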

See the reputation section for details.

Support

Need help?

Find us on RocketChat for questions, support, and feedback!