Modin

Modin is an early-stage project at UC Berkeley’s RISELab designed to make distributed computing easier to use for data science. It is a multiprocess DataFrame library with the same API as pandas, which lets users speed up their existing pandas workflows.

Modin can accelerate pandas queries by about 4x on an 8-core machine while requiring users to change only a single line of code in their notebooks. It is designed for existing pandas users who want their programs to run faster and scale better without significant code changes. The ultimate goal of this work is to make pandas usable in a cloud setting.
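As a minimal sketch, assuming Modin is installed, the single-line change is the import: `import modin.pandas as pd` takes the place of `import pandas as pd`, and the rest of the code is left untouched. The `try`/`except` fallback to stock pandas is not part of Modin's instructions; it is added here only so the sketch also runs on a machine without Modin. The toy `city`/`sales` data is hypothetical.

```python
# The one line that changes: import Modin's pandas-compatible module.
try:
    import modin.pandas as pd  # drop-in replacement for `import pandas as pd`
except ImportError:
    import pandas as pd  # fallback for this sketch only, not part of Modin's setup

# Everything below is ordinary pandas code and is unchanged under Modin.
df = pd.DataFrame({"city": ["a", "b", "a", "c"], "sales": [10, 20, 30, 40]})
totals = df.groupby("city")["sales"].sum()  # total sales per city
print(totals)
```

Because the API is meant to match pandas, the `groupby` call and the printed result are the same whichever module ends up bound to `pd`.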

Installation

Modin is completely open-source and can be found on GitHub: https://github.com/modin-project/modin

Modin can be installed from PyPI:

pip install modin

On Windows, note that Modin depends on Ray, which is not yet supported natively on Windows. To install it, use the Windows Subsystem for Linux (WSL).