Visualizing data helps build a much deeper understanding of it and speeds up analytics. There are several mature paid products on the market. Recently, I explored an open-source product named Apache Superset, which I found very promising in this space. Some prominent features of Superset are:

A rich set of data visualizations.

An easy-to-use interface for exploring and visualizing data.

Dashboards that are easy to create and share.

After reading about Superset, I wanted to try it. Because Superset is a Python project, it can easily be installed using pip, but I decided to run it as a Docker container instead. You can follow my other post on how simple it is to explore Superset using Docker. The Apache Superset GitHub repo contains code for building and running Superset as a container. I wanted to run Superset in a completely distributed manner, and in my opinion the existing code is not easy to adapt for that. So I decided to modify the code so that it can run in multiple modes. Below is a list of the specific changes and enhancements I made:

Different versions of the Superset image can be built from the same code.

The Superset configuration can be easily edited and mounted into the container, without rebuilding the image.

Asynchronous query execution through a Celery-based executor, monitored through the Flower UI.
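The mounted configuration and the Celery executor both come down to a Python config file. Here is a minimal sketch of what such a file could contain. Superset reads a `superset_config.py` module from the `PYTHONPATH` (or the file named by the `SUPERSET_CONFIG_PATH` environment variable). The environment variable names `REDIS_HOST`, `REDIS_PORT` and `SUPERSET_DB_URI`, and the default database URI, are hypothetical and are not taken from the repository. Superset's documentation also describes a `RESULTS_BACKEND` setting for storing async query results. I left it out here because it needs the `cachelib` package.

```python
# Sketch of a mountable Superset config file (not the repository's actual file).
# Hypothetical: the env variable names and the default database URI below.
import os

REDIS_HOST = os.environ.get("REDIS_HOST", "redis")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))

# Metadata database shared by every server and worker container.
SQLALCHEMY_DATABASE_URI = os.environ.get(
    "SUPERSET_DB_URI", "postgresql://superset:superset@db:5432/superset"
)

# Chart/dashboard cache through Flask-Caching's Redis backend (Redis DB 1).
# Older Flask-Caching releases spell the type "redis" instead of "RedisCache".
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/1",
}


class CeleryConfig:
    """Celery settings used by workers executing SQL Lab queries asynchronously."""

    # Redis DB 0 serves as the task queue (broker) for the workers.
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    # Module that defines Superset's SQL Lab Celery tasks.
    imports = ("superset.sql_lab",)
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    # Hand each worker one query at a time.
    worker_prefetch_multiplier = 1
    task_acks_late = False


CELERY_CONFIG = CeleryConfig
```

The file is mounted rather than baked into the image. Pointing a container at a different Redis or database therefore only means editing this file or its environment variables.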

Exploration made easy

Development mode is an excellent choice for exploring a project. However, it helps if the initial exploration covers all the features. In the case of Superset, that means running queries in async mode and storing the results in the cache. You can explore Superset smoothly with the following commands.

First, pull the docker-superset image from Docker Hub:

pull docker-superset image from docker-hub

Next, get docker-compose.yml and superset-config.py from the code base and keep the same directory structure as in the code base.

Lastly, start the Superset image as a container in `local` or `prod` mode using `docker-compose`:

start the container using docker-compose
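If you prefer to script these steps, a minimal Python sketch could look like the one below. It is a dry run by default and only prints the commands. The image name is a hypothetical placeholder. The `SUPERSET_ENV` variable, used to pass the `local`/`prod` mode to the compose file, is also an assumption. Check the repository's docker-compose.yml for how it actually selects the mode.

```python
import os
import subprocess

# Hypothetical placeholder: fill in the Docker Hub namespace and tag.
IMAGE = "<namespace>/docker-superset:<tag>"


def exploration_commands(compose_file="docker-compose.yml"):
    """Commands for pulling the image and starting the compose stack."""
    return [
        ["docker", "pull", IMAGE],
        ["docker-compose", "-f", compose_file, "up", "-d"],
    ]


def explore(mode="local", compose_file="docker-compose.yml", dry_run=True):
    """Pull the image and start Superset in `local` or `prod` mode.

    Assumes docker-compose.yml and superset-config.py are already in place
    and that the compose file reads the mode from SUPERSET_ENV (hypothetical).
    """
    if mode not in ("local", "prod"):
        raise ValueError("mode must be 'local' or 'prod'")
    env = dict(os.environ, SUPERSET_ENV=mode)
    commands = exploration_commands(compose_file)
    for cmd in commands:
        if dry_run:
            # Only show what would run; nothing is executed.
            print(f"SUPERSET_ENV={mode}", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True, env=env)
    return commands


if __name__ == "__main__":
    explore("local")
```

Set `dry_run=False` once the image placeholder is filled in. On newer Docker installations, Compose may be available only as the `docker compose` plugin, so the second command would need adjusting.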

Running Superset in a completely distributed mode

As I understand it, a Superset setup in a production environment serving thousands of end users should be distributed in nature and easy to scale as requirements grow. The image below depicts such a setup:

Superset setup in a completely distributed mode

The published Docker image of Superset can be used to build the setup depicted above:

A load balancer in front, routing each client request to one of the server containers.

Multiple containers in server mode, serving the Superset UI. A server container can be started with `docker run` as follows:

Run container as a server

Multiple containers in worker mode, executing SQL queries asynchronously through the Celery executor. A worker container can be started with `docker run` as follows:

Run the container as a worker

A centralized Redis container or Redis cluster, serving as the cache layer and as the Celery task queue for the workers.

A centralized Superset metadata database.
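To make the roles concrete, here is a minimal Python sketch that only assembles the `docker run` invocations for N server and M worker containers. It does not execute them. All containers share one metadata database and one Redis instance. The flags (`-d`, `--name`, `--network`, `-v`, `-p`) are standard `docker run` options. The image name, the in-container config path, the network name, and the positional `server`/`worker` arguments passed to the image are hypothetical placeholders, not the image's documented interface.

```python
import os
import shlex

# Hypothetical placeholders: image, config mount path, network, and the
# positional arguments given to the image are assumptions.
IMAGE = "<namespace>/docker-superset:<tag>"
CONFIG_MOUNT = "/home/superset/config"
NETWORK = "superset-net"  # user-defined Docker network shared by all containers
WEB_PORT = 8088           # Superset's default web server port


def docker_run(role, index, db_uri, redis_url, config_dir="./config"):
    """Build the `docker run` argv for one server or worker container."""
    if role not in ("server", "worker"):
        raise ValueError(f"unknown role: {role}")
    cmd = [
        "docker", "run", "-d",
        "--name", f"superset-{role}-{index}",
        "--network", NETWORK,
        # Bind-mount the shared config so no image rebuild is needed.
        "-v", f"{os.path.abspath(config_dir)}:{CONFIG_MOUNT}",
    ]
    if role == "server":
        # Each server publishes the web port on its own host port; the load
        # balancer in front distributes client requests across them.
        cmd += ["-p", f"{WEB_PORT + index}:{WEB_PORT}"]
    # Every container points at the same metadata DB and Redis instance.
    cmd += [IMAGE, role, db_uri, redis_url]
    return cmd


def plan(servers, workers, db_uri, redis_url):
    """Commands for `servers` web containers and `workers` Celery containers."""
    cmds = [docker_run("server", i, db_uri, redis_url) for i in range(servers)]
    cmds += [docker_run("worker", i, db_uri, redis_url) for i in range(workers)]
    return cmds


if __name__ == "__main__":
    for cmd in plan(2, 3, "postgresql://superset:superset@db:5432/superset",
                    "redis://redis:6379/0"):
        print(shlex.join(cmd))
```

Scaling out then means changing the two counts. Only the server containers publish ports, because the workers just consume tasks from Redis.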

I found setting up Superset as a Docker container quite easy, and the same image can be used across different environments. You can explore Superset the same way.