We already know why version control matters, so let’s implement it and see it in real use.

Before going ahead, make sure that DVC is installed on the system. In this post we install DVC with pip, so first confirm that pip itself is available. This can be checked with the command:

$ pip -V

Once we are sure that pip is installed, we can install DVC with the following command:

$ pip install dvc
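
If you prefer to check the setup from Python rather than from the shell, a minimal sketch could look like the one below. It assumes that pip is installed for the same interpreter that runs the script and that installing DVC puts a dvc executable on the PATH. The function name tool_status is only illustrative.

```python
import importlib.util
import shutil


def tool_status():
    """Report whether pip and dvc look usable from this Python environment."""
    return {
        # pip is importable as a module when it is installed for this interpreter
        "pip_module": importlib.util.find_spec("pip") is not None,
        # assumption: `pip install dvc` places a `dvc` executable on PATH
        "dvc_on_path": shutil.which("dvc") is not None,
    }


if __name__ == "__main__":
    for name, ok in tool_status().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If either entry is reported as missing, install it before continuing.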

Once DVC is installed, let’s take a real-life case and see how it works.

Example with Steps

For this I am taking code from Numerai, a platform that lets any data scientist build machine learning models on its data and submit predictions that help control the capital in its hedge fund.

Numerai abstracts its financial data, so data scientists do not know what the data represents. The aim is to overcome human biases and overfitting.

They also have a unique way of rewarding contributors with their own cryptocurrency, which they call Numeraire.

In February, Numerai announced Numeraire, a cryptographic token to incentivize data scientists around the world to contribute artificial intelligence to our hedge fund (see Forbes, Wired, Smith+Crown). Earlier today, the Numeraire smart contract was deployed to Ethereum, and over 1.2 million tokens were sent to 19,000 data scientists around the world. — Source

I will not say more about Numerai here, but I will return to it in detail in my next post on cryptocurrency and blockchain technology.

Once signed in, the Numerai data can be downloaded from the Numerai website, where the data is updated every 4 days. So by the time you read this post, a new dataset may already be available.

Whichever dataset you download, the following steps will be almost the same, with a few modifications:

Steps

1. First, initialize a git repo and put the downloaded code there.

$ mkdir numerai_code # create a directory for the repo

$ cd numerai_code

$ git init # initialize git

$ git add numerai_code_downloaded # add the downloaded code to the git repo

$ git commit -m 'Numerai code added'

$ git push # works only once a remote has been configured

2. Install and initialize the DVC repository.

$ pip install dvc

$ dvc init

3. ‘numerai_training_data.csv’ and ‘numerai_tournament_data.csv’ are present in the downloaded dataset. They are used to train the model and to predict the results, respectively.

4. Now create a prediction model that makes predictions from the available dataset, and put that file in the same repository (numerai_code) as above. Let’s call it ‘prediction.py’.

For the prediction model, I used an LSTM (Long Short-Term Memory) recurrent neural network in Python with Keras.

To know more about the RNN and LSTM architecture and other deep learning terms, please have a look at this post:

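5. Run the Python code within DVC with the following command (dvc run is available in older DVC releases; it was removed in DVC 3.0 in favor of dvc stage add and dvc repro):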

$ dvc run python prediction.py

6. The model saves its checkpoints to a CSV file with the name assigned in the script, and the saved CSV can then be submitted on the Numerai submission page.
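
To give a rough idea of the shape prediction.py might take, here is a minimal sketch that uses only the Python standard library. It is not the LSTM model used in this post: it swaps in a trivial placeholder (the mean of the training target) so that the file reading and writing is easy to follow. The column names id, target and probability, and the output name predictions.csv, are assumptions. Check them against the dataset you downloaded and the current submission format.

```python
import csv
from pathlib import Path


def mean_target(train_csv, target_col="target"):
    """Average of the target column; a stand-in for real model training."""
    with open(train_csv, newline="") as f:
        values = [float(row[target_col]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


def write_predictions(tournament_csv, out_csv, probability, id_col="id"):
    """Write one constant prediction per tournament row; return the row count."""
    count = 0
    with open(tournament_csv, newline="") as src, open(out_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([id_col, "probability"])  # assumed submission header
        for row in csv.DictReader(src):
            writer.writerow([row[id_col], f"{probability:.5f}"])
            count += 1
    return count


def main(data_dir=".", out_name="predictions.csv"):
    data_dir = Path(data_dir)
    # File names as given in the downloaded dataset
    p = mean_target(data_dir / "numerai_training_data.csv")
    n = write_predictions(
        data_dir / "numerai_tournament_data.csv", data_dir / out_name, p
    )
    print(f"wrote {n} predictions to {data_dir / out_name}")


if __name__ == "__main__":
    if Path("numerai_training_data.csv").exists():
        main()
    else:
        print("numerai_training_data.csv not found in the current directory")
```

Replacing mean_target with real training code (for example a Keras model) leaves the rest of the script, and the DVC command that runs it, unchanged.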