I wanted to show the steps leading to the analysis presented in the notebook to demonstrate how easy it is to adapt (retrain) models in TensorFlow. But the sheer length of the README might suggest otherwise.

The reason is that there’s a fair amount of “bookkeeping” needed for the main analysis: ensuring all required packages are installed, downloading the data and retraining the model.

The idea is to use Anaconda Project to really “get to it” as fast as possible, simply by skimming over the bookkeeping part of the project and running everything with a single command. In this post, I’m going to show how to wrap an existing repo, without modifying it, in such a way that Anaconda Project: clones the repo, takes care of the bookkeeping and runs the final Jupyter Notebook for the User to play with. Alternatively, you might want to modify an existing repo and make it an Anaconda Project. Or, if you start a new project, it would probably be more convenient to make it an Anaconda Project and build it from the ground up, as you go along.

Initializing the Project

Let’s first create a directory for our Project, initialize it and add the repo with the PyCon presentation.

To initialize the Anaconda Project, run:

mkdir pycon_presentation_anaconda_project

cd pycon_presentation_anaconda_project

anaconda-project init

This creates the anaconda-project.yml file, which is the heart of the Project. This is the configuration file that gets updated whenever we add commands or packages with anaconda-project.

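Next, we’ll clone the repo with the PyCon presentation:

git clone --recursive https://github.com/daftcode/PyCon-motorcycle-transfer-learning.git

The --recursive flag also initializes and updates the tensorflow submodule within it.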

To ensure reproducibility, we need to check out a particular commit:

cd PyCon-motorcycle-transfer-learning

git checkout 5c00c5b

cd ..
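If you want to double-check that the checkout landed on the intended commit, a small helper along these lines could do it. This is only a sketch: commit_matches is a name I made up for illustration, and it simply compares a short prefix (here 5c00c5b) against a full hash that you pass in.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or Anaconda Project):
# succeed when the full commit hash starts with the expected short prefix.
commit_matches() {
    expected_prefix=$1
    full_hash=$2
    case "$full_hash" in
        "$expected_prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage inside the cloned repo (requires git):
#   commit_matches 5c00c5b "$(git rev-parse HEAD)" || echo "unexpected commit" >&2
```

Comparing prefixes this way avoids hard-coding the full 40-character hash in the check.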

Adding commands

But wait… we can automate the process of updating the submodules and checking out a particular commit. Let’s do that by adding the first command to our project:

anaconda-project add-command clone_repo "git clone --recursive https://github.com/daftcode/PyCon-motorcycle-transfer-learning.git ; cd PyCon-motorcycle-transfer-learning; git checkout 5c00c5b"

We’ll be asked what kind of command this is:

Is clone_repo a (B)okeh app, (N)otebook, or (C)ommand line?

and since this is a command run on the command line, we choose (C). We can also specify the command type up front with the --type option; in our case, --type unix is the equivalent of choosing (C).
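As a side note, the clone_repo command will attempt the clone even when the directory is already there. A guarded variant could look like the sketch below. The helper names needs_clone and clone_and_pin are my own, hypothetical choices (they are not part of git or Anaconda Project); the URL, directory and commit are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical helpers: clone only when the target directory is absent,
# then pin the working tree to a given commit.
REPO_URL="https://github.com/daftcode/PyCon-motorcycle-transfer-learning.git"
REPO_DIR="PyCon-motorcycle-transfer-learning"
COMMIT="5c00c5b"

needs_clone() {
    # Succeeds (exit status 0) when directory $1 does not exist yet.
    [ ! -d "$1" ]
}

clone_and_pin() {
    if needs_clone "$REPO_DIR"; then
        git clone --recursive "$REPO_URL" "$REPO_DIR" || return 1
    fi
    # Run the checkout in a subshell so the caller's directory is unchanged.
    (cd "$REPO_DIR" && git checkout "$COMMIT")
}

# Usage (requires git and network access):
#   clone_and_pin
```

Saved as a script, something like this could be registered with add-command in place of the one-liner, at the cost of one more file in the Project.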

There’s no point in running the command (as we have already taken care of the submodules “manually”), but you could do it by typing:

anaconda-project run clone_repo

Because the directory already exists (you did clone it, right?), nothing is really going to happen. But you might also notice that you get an annoying message:

Potential issues with this project:

* anaconda-project.yml: No commands run notebooks PyCon-motorcycle-transfer-learning/presentation.ipynb, ...

and if you don’t like to be reminded about it every single time you run any command, you can edit the .projectignore file and add the following line:

**.ipynb

Next, we’ll add a command that downloads images needed for retraining. For that, let’s create the following script, download_images.sh:

cd PyCon-motorcycle-transfer-learning || exit 1

for type in classic cross cruiser superbike

do

python image_download.py motorcycle "$type"

done

that calls the image_download.py script in the PyCon-motorcycle-transfer-learning repo. To add this script to the Project’s commands, type:

anaconda-project add-command --type unix download_images "bash download_images.sh"

This command will download 400 images of four classes of motorcycles (100 each): classic, cross, cruiser, and superbike. Let’s run it:

anaconda-project run download_images
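Once the download finishes, it might be worth checking that each class really got its share of images. The sketch below counts files per class. Note that the directory layout is an assumption on my part: IMAGE_ROOT and the one-sub-directory-per-class structure are hypothetical, so adjust them to wherever image_download.py actually puts the files.

```shell
#!/bin/sh
# ASSUMPTION: images land in one sub-directory per class under IMAGE_ROOT.
# The actual layout produced by image_download.py may differ.
IMAGE_ROOT="${IMAGE_ROOT:-PyCon-motorcycle-transfer-learning/images}"

count_files() {
    # Count regular files below directory $1.
    find "$1" -type f | wc -l | tr -d ' '
}

report_counts() {
    # Print one "class: count" line per motorcycle class under root $1.
    for kind in classic cross cruiser superbike
    do
        dir="$1/$kind"
        if [ -d "$dir" ]; then
            echo "$kind: $(count_files "$dir")"
        else
            echo "$kind: directory not found"
        fi
    done
}

report_counts "$IMAGE_ROOT"
```

If a class shows far fewer files than expected, re-running the download for that class before retraining is probably cheaper than debugging a skewed model later.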

Now it’s time for the workhorse of this Project, the retrain command:

anaconda-project add-command --type unix retrain "cd PyCon-motorcycle-transfer-learning; python retrain_wrap.py"

but in order to run it correctly, we need to add an additional package: tensorflow.

To add this package so that it’s loaded during the run along with other necessary packages, we’ll use the add-packages functionality:

anaconda-project add-packages tensorflow

The retrain_wrap.py script actually calls the tensorflow/tensorflow/examples/image_retraining/retrain.py script, which downloads the necessary pre-trained model (by default the largest MobileNet available), splits the data into train, validation and test sets, computes the “bottlenecks” (activations on the neurons of the last hidden layer) and then takes care of the whole retraining procedure.

However, we should inspect how the retraining went. For example, we might have overfitted the training data and could have gotten a better model by running the procedure in fewer steps. To inspect the retraining we’ll use TensorBoard (for which the retrain.py script conveniently gathers summary statistics):

anaconda-project add-command --type unix inspect_with_tensorboard "cd PyCon-motorcycle-transfer-learning; tensorboard --logdir summaries"

Note that you’ll need a browser to view the app at http://localhost:6006. Once you’re done analyzing the graphs, hit Ctrl+C to terminate TensorBoard.

If you would like to deploy a web app to a server (TensorBoard, for example), it’s possible to customize the default behavior (as long as the app communicates via HTTP); for details see the Anaconda Project reference.

For our final command, we’ll add the presentation.ipynb notebook:

anaconda-project add-command --type notebook presentation "PyCon-motorcycle-transfer-learning/presentation.ipynb"

Again, we’ll need to add one tiny package to be able to run it: lime. But this is a little bit more tricky because we’re interested in the latest version (0.1.1.23, which I used for this project), whereas the latest version on the conda-forge channel is 0.1.1.18. However, if we go to www.anaconda.org and search for lime, we’ll see that there’s a channel called “viascience” that offers the coveted, latest version:

anaconda-project add-packages --channel viascience lime

Note that this time we’re telling Anaconda where to look for it (the “viascience” channel).

We can now run the command:

anaconda-project run presentation

then open the notebook and run all the cells within it.

Putting it all together

OK, let’s list out the commands we’ve defined:

anaconda-project list-commands

The output should look something like the following:

Name                      Description
====                      ===========
clone_repo                git clone --recursive https://github.com/daftcode/PyCon-motorcycle-transfer-learning.git ; cd PyCon-motorcycle-transfer-learning; git checkout 5c00c5b
download_images           bash download_images.sh
inspect_with_tensorboard  cd PyCon-motorcycle-transfer-learning; tensorboard --logdir summaries
presentation              Notebook PyCon-motorcycle-transfer-learning/presentation.ipynb
retrain                   cd PyCon-motorcycle-transfer-learning; python retrain_wrap.py

Great, but… what now? Should we then replace our previous README with several anaconda-project run ... calls? Or can we do better?

For me, better means shorter, so that my README says literally this: “To run the Project, type: anaconda-project run all”. Frankly, I was expecting there would be a feature that lets anaconda-project run execute all or several commands in some defined order. But come to think of it, we might simply define an all command like this:

anaconda-project add-command --type unix all "anaconda-project run clone_repo; anaconda-project run download_images; anaconda-project run retrain; anaconda-project run presentation"

(I’ve left out the anaconda-project run inspect_with_tensorboard since it requires that the User hits “Ctrl + C”, which, unfortunately, kills the rest of the anaconda-project process.)
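One thing to keep in mind about the all command: steps chained with ; run one after another regardless of whether the previous one succeeded. If you’d rather stop at the first failure, a tiny runner like the sketch below could be used instead. The run_in_order function and the RUNNER variable are hypothetical names of mine; RUNNER defaults to anaconda-project run and can be overridden to try the logic without Anaconda Project installed.

```shell
#!/bin/sh
# Hypothetical runner: execute each named step in order and stop at the
# first one that fails. RUNNER can be overridden for experimentation.
RUNNER="${RUNNER:-anaconda-project run}"

run_in_order() {
    for step in "$@"
    do
        echo "==> $step"
        # $RUNNER is intentionally unquoted so that "anaconda-project run"
        # is split into the program name and its sub-command.
        $RUNNER "$step" || {
            echo "step '$step' failed; stopping" >&2
            return 1
        }
    done
}

# Example (uncomment to run the real pipeline):
# run_in_order clone_repo download_images retrain presentation
```

A lighter-weight alternative is to join the commands in all with && instead of ;, which gives the same stop-on-failure behavior in the shell without an extra script.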

We can now formulate our README.md file:

# To run the project ...

... type:

```

anaconda-project run all

```

## To inspect the retraining process ...

... type (after running `all`):

```

anaconda-project run inspect_with_tensorboard

```

That’s what I call a short README!

Uploading the Project to Anaconda Cloud

Before we upload the Project, we want to make sure it’s as tidy as possible. We can use the:

anaconda-project clean

command to get rid of the envs/ directory with all the required libraries, but notice that it leaves the PyCon-motorcycle-transfer-learning repo in place. To exclude it manually, open the .projectignore file and add the following:

PyCon-motorcycle-transfer-learning
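If you prefer not to edit .projectignore by hand, the two entries from this post could be appended from the shell. The sketch below is one way to do it without creating duplicate lines; add_ignore_pattern is a hypothetical helper name, and it relies only on standard grep and printf.

```shell
#!/bin/sh
# Hypothetical helper: append a pattern to an ignore file only if that
# exact line is not already present.
add_ignore_pattern() {
    file=$1
    pattern=$2
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF -e "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Usage from the Project directory:
#   add_ignore_pattern .projectignore "**.ipynb"
#   add_ignore_pattern .projectignore "PyCon-motorcycle-transfer-learning"
```

Running it twice with the same pattern leaves the file unchanged the second time, so it is safe to keep in a set-up script.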

Note that the clone_repo command will take care of cloning this repo (along with its tensorflow submodule) and checking out the right commit.

Now, once our project is as tidy as it can be, let’s upload it with the following, final command:

anaconda-project upload

Et voilà, our project is out there: https://anaconda.org/mdzi and once it’s downloaded, it can be unpacked with:

anaconda-project unarchive pycon_presentation_anaconda_project.tar.bz2

Pretty neat :)

For a final touch, I’ve also uploaded the presentation.ipynb notebook to Anaconda Cloud with:

anaconda upload presentation.ipynb

so that you can see the final (though static) result of the Project right away.

Summary

We now know how to use Anaconda Project to wrap up an already existing repo, add commands and dependencies, so that the User can download the Project and run everything with a single command:

anaconda-project run all

Anaconda Project is easy to work with and really allows you to simplify the flow for an end User.

However, if the set-up required for running your project isn’t complicated, you might simply export your environment (instead of defining anaconda-project commands) and write a bash script, run.sh, that packs it all up, so that the README would look something like this:

conda env create -f environment.yml

bash run.sh

But Anaconda Project isn’t just about adding commands and specifying dependencies. It also allows you to automatically download large files (that you don’t want to keep in version control), pack up web apps (that communicate over HTTP) and run them on a server, handle environment variables that contain credentials, and add services like Redis (this feature is in a demo phase, though).

It’s worth trying it out to see how you can use it for your Project!
