1) (Method one) Download the Python .whl package

If you are a noob like I was: a .whl file is basically a zip that contains the library files that pip automatically downloads and installs for you when you install things the normal way with Python.

The first step is to get the .whl package of the library you want. This can be done with one simple command. Note that the library we want is fuzzywuzzy 0.17, which is used for fuzzy string matching in NLP. Run the command in a new folder in bash:

pip download -d . fuzzywuzzy==0.17

pip = Python package installer

-d . = save to this directory (the current folder)

==0.17 = the version you would like; you can specify any version here.

Now we have downloaded the .whl, and it looks something like this:

Downloaded .whl package from pip

We also need the dependencies of this package. To check them, I found a library on pip called pkginfo.

pip install pkginfo

Now we can run the following commands inside a Python shell:
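Here is a minimal sketch of that dependency check, assuming the downloaded wheel sits in the current folder. It reads the Requires-Dist lines straight from the wheel's METADATA file using only the standard library; pkginfo's Wheel class exposes the same information as its requires_dist attribute. The helper name wheel_requires is my own and not part of any library.

```python
import glob
import zipfile
from email.parser import Parser


def wheel_requires(whl_path):
    """Return the Requires-Dist entries recorded in a wheel's METADATA file."""
    with zipfile.ZipFile(whl_path) as whl:
        # A wheel keeps its metadata at <name>-<version>.dist-info/METADATA
        meta_name = next(
            n for n in whl.namelist() if n.endswith(".dist-info/METADATA")
        )
        text = whl.read(meta_name).decode("utf-8")
    # METADATA uses email-style headers, so the email parser can read it
    return Parser().parsestr(text).get_all("Requires-Dist") or []


if __name__ == "__main__":
    # Assumes the wheel was downloaded into the current folder
    for path in glob.glob("fuzzywuzzy-*.whl"):
        print(path, wheel_requires(path))
```

A package with no dependencies simply gives back an empty list.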

As shown, we need another dependency called “python-levenshtein (≥0.12)”, so let's go ahead and download it into the same folder:

pip download -d . python-levenshtein==0.12

Dependencies for the original library

Now we can see that we also got the setuptools library. Let's see if this .whl is necessary.

2) Extract the necessary files and detective work

We do not actually need all of the files that are included inside the .whl file. If we extract the .whl using WinZip or 7-Zip, we will see that some packages include extra dist-info folders or files that we do not need.

We only need the folders that contain the essential files:

Extra dist-info folder we do not need

FuzzyWuzzy came with both a folder of the same name, containing the functions we want, and a *dist-info folder, which isn't used.

fuzzywuzzy folder
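If you would rather not do this by hand, the extraction can also be scripted. This is only a sketch using Python's built-in zipfile module: it unpacks a wheel and skips anything under a *.dist-info folder. The function name extract_package_files and the paths you pass in are my own placeholders.

```python
import zipfile


def extract_package_files(whl_path, dest_dir):
    """Extract a wheel into dest_dir, skipping its *.dist-info metadata folder."""
    extracted = []
    with zipfile.ZipFile(whl_path) as whl:
        for name in whl.namelist():
            top_level = name.split("/", 1)[0]
            if top_level.endswith(".dist-info"):
                continue  # packaging metadata, not needed at import time
            whl.extract(name, dest_dir)
            extracted.append(name)
    return extracted
```

Calling it with the fuzzywuzzy wheel and a target folder should leave you with just the fuzzywuzzy folder.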

To be sure which dependencies we actually need, let's investigate the imports in each of the fuzzywuzzy files.

fuzzy __init__.py

fuzzy fuzz.py

fuzzy StringMatcher.py
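Opening every file to read its imports gets tedious for bigger packages, so here is a rough sketch that does the same detective work with Python's ast module. It lists the top-level modules each .py file imports and ignores relative imports, since those stay inside the package. The function name imports_in_folder is just something I made up for this example.

```python
import ast
from pathlib import Path


def imports_in_folder(folder):
    """Map each .py file under folder to the top-level modules it imports."""
    folder = Path(folder)
    result = {}
    for py_file in sorted(folder.rglob("*.py")):
        tree = ast.parse(py_file.read_text(encoding="utf-8"))
        modules = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # level == 0 is an absolute import; relative ones are skipped
                modules.add(node.module.split(".")[0])
        result[py_file.relative_to(folder).as_posix()] = sorted(modules)
    return result
```

Anything in the output that is neither part of the standard library nor the package itself is a dependency you need to bundle.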

We have found a dependency! We can see that StringMatcher.py needs the Levenshtein import, so let's extract that package using WinZip again.

python Levenshtein dependency extract

We can see the extracted Levenshtein folder here, but also a setup.py file. The setup.py uses setuptools, and that is why the setuptools .whl was included when we ran pip download. If we check the Levenshtein folder and investigate the .py files, we can see that they do not depend on any other packages.

So now we are good to go! Let's create a new folder that holds only the “essential” files we need for the library. These are just the folders named after each package:

Top needed folder

Tree contents (tree /f in cmd)

3) Zip folder and send to Azure ML

Let's zip up the folders and upload the zip to Azure ML under datasets. This makes it available to us in the pipeline. Make sure the zip contains these two folders right at the top level when it is opened. No parent folder is necessary.
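If you want to build the zip from a script, here is a small sketch with the standard zipfile module. It writes each package folder so that it sits at the root of the archive, with no parent folder. The function name zip_packages and the file names in the usage note are placeholders of mine.

```python
import os
import zipfile


def zip_packages(package_dirs, zip_path):
    """Write each package folder into zip_path so it sits at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for package_dir in package_dirs:
            package_dir = os.path.normpath(package_dir)
            parent = os.path.dirname(package_dir) or "."
            for root, _dirs, files in os.walk(package_dir):
                for file_name in files:
                    full_path = os.path.join(root, file_name)
                    # Path relative to the folder's parent, e.g. "fuzzywuzzy/fuzz.py"
                    arcname = os.path.relpath(full_path, parent)
                    bundle.write(full_path, arcname)
    return zip_path
```

For this walkthrough the call would look something like zip_packages(["essential/fuzzywuzzy", "essential/Levenshtein"], "fuzzy_bundle.zip"), where both paths and the zip name are only examples.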

Notice how the dataset is connected: it goes into the third input of the Execute Python Script module.

Now, inside the Execute Python Script module, make sure you import the functions you need. Python will automatically look for the dependencies inside the zip.
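For reference, here is roughly what the script can look like. This is a sketch that assumes the module's default azureml_main entry point and a zip with fuzzywuzzy at its root; the two test strings are made up, and your own script will do something more useful with the dataframes.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    # Import from the uploaded zip (assumed to contain fuzzywuzzy/ at its root)
    from fuzzywuzzy import fuzz

    # Printed text should appear in the module's output log
    score = fuzz.ratio("this is a test", "this is a test!")
    print("fuzzywuzzy imported, ratio =", score)

    # Hand the first input back unchanged, as a one-element tuple
    return dataframe1,
```

If the import fails, the error in the log usually means the folders are not at the top level of the zip.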

4) Check log output to verify it's working

The imported library will now work and you can use its functionality inside of ML studio!

I made this because I couldn't find any help in the documentation that made sense. I used this Stack Overflow post to guide me through.