Statistics about one of the most used Python libraries

From Computer Science grads to Math majors, from data scientists to software engineers, from mechanical engineers to architecture students, Numpy is the quintessential Python library. Almost everyone who works with Python knows about it, and we all know it is popular. But after Googling for a few days, I found it tough to get a sense of how big the Numpy community actually is.

So, here’s an attempt to capture Numpy in numbers.

Numpy, being a Python package, is hosted on PyPi. PyPi, often referred to as the Cheese Shop, stands for Python Package Index. It is a repository of software for the Python programming language (think Docker Hub for container images or Maven Central for Java).

A good way of estimating the reach of a project is to go to its source and count how many times it is downloaded. There are other ways to measure a community (the number of active users, for example), and PyPi downloads have drawbacks of their own (not every download translates into a “user”). But given the data that I was able to collect, it seems like a fair assessment.

30-day Most Downloaded PyPi packages

Rank 15: numpy (45M downloads in the last 30 days)

30-day Numpy Download by Category

Which category of installer contributed the most to the numpy downloads in the last 30 days?

A SQL query that groups the downloads by installer answers exactly that.
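As a rough sketch of what such a query could look like, the snippet below builds a BigQuery Standard SQL string that counts downloads per installer. It is an illustration under assumptions, not the query behind the numbers in this post: the table name `bigquery-public-data.pypi.file_downloads` and the helper `installer_breakdown_query` are my own choices, and the column names follow the public PyPI download log schema as I understand it.

```python
import re


def installer_breakdown_query(
    project: str = "numpy",
    days: int = 30,
    # Assumed table name for the public PyPI download logs on BigQuery.
    table: str = "bigquery-public-data.pypi.file_downloads",
) -> str:
    """Build a query counting downloads of `project` per installer.

    Hypothetical helper for illustration only; it just returns a string.
    """
    # Only allow plain package-style names, since the value is inlined.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError(f"unexpected project name: {project!r}")
    return f"""
SELECT
  details.installer.name AS installer,
  COUNT(*) AS downloads
FROM `{table}`
WHERE file.project = '{project}'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
GROUP BY installer
ORDER BY downloads DESC
""".strip()


query = installer_breakdown_query()
print(query)
```

The printed string can be pasted into the BigQuery console. Each row of the result would be one installer (pip, Homebrew, bandersnatch and so on) with its download count.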

Pip is the most used installer for Numpy downloads from PyPi (43M)

Homebrew languishes at the #11 spot with a mere 2.5K downloads

Until today, I knew Bandersnatch as

Bandersnatch (Google search)

Today, I found that bandersnatch is also a PyPi mirror client.

pip install bandersnatch

Table with Heatmap (left) | Treemap (right)

Weekly Numpy downloads by Python versions

Python 2 was officially discontinued when it reached its EOL (End of Life) on January 1, 2020. For further info: https://www.python.org/doc/sunset-python-2/

Having said that, I tracked Python 2.7 alongside the Python 3 versions over the past 4 weeks. Surprisingly, there are still a lot of Numpy downloads on Python 2.7 in January 2020 (4M+ weekly).

Python 2.7 continues to be in use (hovering around the 4.5M mark)

Python 3.6 is the most common Python 3 version.
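To group downloads by Python version per week, patch releases such as 3.6.9 and 3.6.10 need to be folded into a single 3.6 bucket. The sketch below shows one way to do that in plain Python. The sample records are made up for illustration (they are not PyPI data), and `minor_version` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Made-up sample records: (download date, full Python version).
records = [
    (date(2020, 1, 6), "2.7.17"),
    (date(2020, 1, 7), "3.6.9"),
    (date(2020, 1, 8), "3.6.10"),
    (date(2020, 1, 13), "3.7.6"),
    (date(2020, 1, 14), "2.7.15"),
    (date(2020, 1, 15), "3.6.9"),
]


def minor_version(full_version: str) -> str:
    """Reduce '3.6.9' to '3.6' so patch releases are counted together."""
    return ".".join(full_version.split(".")[:2])


# Count downloads per (ISO year, ISO week, major.minor version).
weekly = Counter()
for day, version in records:
    iso_year, iso_week, _ = day.isocalendar()
    weekly[(iso_year, iso_week, minor_version(version))] += 1

for key in sorted(weekly):
    print(key, weekly[key])
```

Each key in `weekly` corresponds to one segment of one bar in a stacked column chart: the week picks the column and the version picks the segment.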

Stacked Column Chart

Geographical distribution of Numpy Downloads

To find out where Numpy downloads come from, I ran a query that groups the downloads by country. It revealed some interesting things:

The US accounted for 32M downloads, followed by IE (Ireland) with 3M. Japan and Germany (1M each) round out the “unicorns” among Numpy countries.

Australia, China, India and Singapore (the Asia Pacific countries) are in the 0.5M to 1M range.

I have to admit, I was surprised to see Ireland at #2 and China and India way below with a mere 0.5M, compared to the 32M of the United States of America.

Geomap | Numpy

One positive of Google’s Data Studio is its ability to detect a country from its country code and convert it into a value for a GeoMap. However, one big limitation of Data Studio is its limited feature set and the restricted choice of shades. Tableau, on the other hand, is much more powerful, with lots of customization and enhancements.

Temporal distribution of Numpy Downloads

To capture how Numpy downloads are spread over time, SQL’s GROUP BY construct is used.
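Here is a minimal, self-contained sketch of that GROUP BY idea, using Python’s built-in sqlite3 module and an in-memory table. The rows are made up for illustration and the table layout (`ts`, `project`) is my own simplification, so treat it as a toy version of the real download logs rather than the query used for the numbers below.

```python
import sqlite3

# Made-up sample rows: (download timestamp, project). Not real PyPI data.
rows = [
    ("2019-11-03 10:15:00", "numpy"),
    ("2019-12-01 08:00:00", "numpy"),
    ("2019-12-24 19:30:00", "numpy"),
    ("2019-12-25 09:00:00", "pandas"),
    ("2020-01-05 12:00:00", "numpy"),
    ("2020-01-19 23:59:00", "numpy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (ts TEXT, project TEXT)")
conn.executemany("INSERT INTO downloads VALUES (?, ?)", rows)

# Truncate each timestamp to year-month, then count rows per month.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', ts) AS month, COUNT(*) AS downloads
    FROM downloads
    WHERE project = 'numpy'
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()

for month, downloads in monthly:
    print(month, downloads)
```

The same pattern (truncate the timestamp to a month, then count per group) carries over to other SQL dialects, although the name of the date-truncation function differs between them.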

December was the biggest month for Numpy downloads in the data up to Jan 19, with 46M downloads.

With just 19 days of January gone, the month is poised to beat December’s number: it is already at 30M (16M to go).

Data from the last 6 months shows a positive (growing) trend
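The claim that January is on course to beat December can be sanity-checked with a simple linear extrapolation. The sketch below uses the rounded figures from this post and assumes that downloads arrive at a constant daily rate for the rest of the month, which is a simplification (weekends and holidays will bend the curve).

```python
# Rounded figures taken from the post.
december_total = 46_000_000
january_so_far = 30_000_000
days_elapsed = 19
days_in_january = 31

# Assumption: the average daily rate so far holds for the rest of the month.
daily_rate = january_so_far / days_elapsed
projected_january = daily_rate * days_in_january

# Downloads still needed to match December, and the daily rate that requires.
remaining_to_match = december_total - january_so_far
days_left = days_in_january - days_elapsed
required_daily_rate = remaining_to_match / days_left

print(f"Average so far:        {daily_rate / 1e6:.2f}M per day")
print(f"Needed to match Dec:   {required_daily_rate / 1e6:.2f}M per day")
print(f"Projected January:     {projected_january / 1e6:.1f}M")
```

Under that assumption, the average rate so far is comfortably above the rate needed over the remaining 12 days, which is what “poised to beat December” amounts to.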