Why environments matter for Data Science

Environments are the space in which developers work, learn, and create. For example, if you plan to run Python code, you must have some local software setup to practice Python. That setup is called a programming environment. These environments contain the specified tools required for a developer to create and test code. For example, an environment might contain Python and some packages. Once an environment is set up correctly, a developer can work unimpeded and seamlessly share environment specs with others.

Newer developers often install everything at the system level due to a lack of understanding of, or experience with, virtual environments. When no virtual environment is active, packages installed with pip go into the system-wide (or user-wide) Python installation. The result of doing this for every project is a single bloated and unmanageable Python environment.

Effective environment management saves time and lets you build an isolated software product, so that collaborators or contributors can recreate your environment and run your code.

Pipenv combines package management and virtual environment control into one tool: it installs, removes, tracks, and documents your dependencies, and it creates, uses, and manages your virtual environments. Pipenv is essentially pip and virtualenv wrapped together into a single product.

Most of us have encountered this error at some point during development:

ModuleNotFoundError: No module named 'pandas'

This error means that Python cannot find the module, which is also called a dependency or package. You are sure you installed pandas at some point or another, but where is it?
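One way to start answering that question is to ask the interpreter itself. The following is a minimal sketch using only the standard library; the helper name describe_module is hypothetical and chosen for illustration. It reports which Python executable is running, whether that interpreter appears to be inside a virtual environment (a check that works for venv and recent virtualenv releases), and where a top-level module would be imported from.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# The interpreter that is actually running this code
print("Interpreter:", sys.executable)
# The two prefixes differ when a virtual environment is active
print("In a virtual environment:", sys.prefix != sys.base_prefix)
# None here means this interpreter cannot see pandas
print("pandas found at:", describe_module("pandas"))
```

If the last line prints None, pandas was most likely installed for a different interpreter or environment than the one running your code.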

The primary purpose of Python virtual environments is to create an isolated environment for each Python project. Proper isolation means that each project can have its own dependencies, regardless of what dependencies every other project has. The above error and many others can be avoided with proper maintenance of environments and dependencies. Sanitary environment management practice reduces dependency version conflicts between your projects and keeps the base development environment from becoming bloated with packages.
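To make the idea of isolation concrete, here is a small sketch that creates two separate virtual environments with the standard library venv module. The function name create_isolated_env and the directory names are assumptions made for this example, and the environments are created in a temporary directory that is deleted afterwards.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir, name="demo-env"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch fast; use with_pip=True to get pip inside the environment
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_a = create_isolated_env(tmp, "project-a")
    env_b = create_isolated_env(tmp, "project-b")
    # Each environment gets its own pyvenv.cfg and its own package directory
    print("project-a configured:", (env_a / "pyvenv.cfg").exists())
    print("project-b configured:", (env_b / "pyvenv.cfg").exists())
```

Packages installed into project-a would not be visible from project-b, which is the property that keeps dependency versions from colliding across projects.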

Data science and deployment issues

Data scientists often come from interdisciplinary backgrounds and have not been formally taught to work collaboratively or to push projects into production. As a result, good environment and module management skills are often lacking, which can cause problems with code reproducibility and make it difficult to advance or share a project. Reproducible data science projects are those that allow others to recreate and build upon your analysis and to reuse and modify your code easily.

Sanitary environment management practices therefore support reproducibility directly: with fewer version conflicts and a base environment that stays lean and manageable, others can rebuild the environment your project was developed in.

Courtesy of xkcd

Pipenv: a better workflow

By combining package management and virtual environment control in one place, Pipenv becomes a fantastic tool for data scientists and developers.

When you begin a project with Pipenv, the tool automatically creates a virtual environment, a Pipfile, and a Pipfile.lock. The Pipfile, which plays a role similar to requirements.txt, declares the project's dependencies. It is updated automatically with new dependencies whenever you run pipenv install with a package name.

To manage complex dependencies, such as old versions of packages that depend on other old versions of packages, Pipenv records the project's full dependency tree in a file called Pipfile.lock. Because Pipfile.lock pins exact versions and hashes, it also helps ensure that the same versions of dependencies are installed in production.
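Pipfile.lock is a JSON file, which makes the pinned versions easy to read programmatically. The following sketch uses a shortened, hypothetical lock file: the version numbers are illustrative, the hashes are placeholders, and the helper pinned_versions is a name invented for this example. Note that numpy appears even though only pandas was requested, because the lock file also records sub-dependencies.

```python
import json

# A trimmed, hypothetical Pipfile.lock; versions are illustrative and hashes are placeholders
SAMPLE_LOCK = """
{
    "_meta": {
        "hash": {"sha256": "<hash of the Pipfile>"},
        "pipfile-spec": 6,
        "requires": {"python_version": "3.11"},
        "sources": [{"name": "pypi", "url": "https://pypi.org/simple", "verify_ssl": true}]
    },
    "default": {
        "numpy": {"hashes": ["sha256:<placeholder>"], "version": "==1.26.4"},
        "pandas": {"hashes": ["sha256:<placeholder>"], "version": "==2.2.2"}
    },
    "develop": {}
}
"""


def pinned_versions(lock, section="default"):
    """Map each locked package to its exact version, dropping the '==' prefix."""
    return {
        name: entry["version"].removeprefix("==")
        for name, entry in lock[section].items()
    }


lock = json.loads(SAMPLE_LOCK)
for name, version in pinned_versions(lock).items():
    print(f"{name} is pinned to {version}")
```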

Finally, Pipenv gives others a standardized way to install the project's dependencies along with its testing and development requirements.

Pipenv is an environment manager and a package manager. This means that Pipenv makes it possible to create an environment with Python and then download and install packages into that environment with pipenv install.

This command looks for a Pipfile and, if one exists, creates an environment from it; if not, Pipenv creates a new Pipfile for the environment.

Any package names appended to this command, as in pipenv install pandas, are installed and added to the Pipfile.
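The variations of the install command described above can be sketched from Python as follows. The helper names pipenv_install_command and run are hypothetical. The --dev flag is a Pipenv option that records a package under the development section of the Pipfile. By default the sketch only prints each command; it executes one only when dry_run is set to False and pipenv is found on the PATH.

```python
import shutil
import subprocess


def pipenv_install_command(packages=(), dev=False):
    """Build the argument list for a `pipenv install` call."""
    command = ["pipenv", "install"]
    if dev:
        # --dev records the packages as development dependencies
        command.append("--dev")
    command.extend(packages)
    return command


def run(command, dry_run=True):
    """Print the command; execute it only if dry_run is False and pipenv is installed."""
    print(" ".join(command))
    if not dry_run and shutil.which(command[0]):
        subprocess.run(command, check=True)


# Create the environment from an existing Pipfile, or start a new Pipfile
run(pipenv_install_command())
# Install a runtime dependency and add it to the Pipfile
run(pipenv_install_command(["pandas"]))
# Install a development-only dependency
run(pipenv_install_command(["pytest"], dev=True))
```

In day-to-day use you would type these commands directly in a terminal from the project directory; wrapping them in Python is only a convenience for showing the variations side by side.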