For a pretty decent amount of time, doing analytics with MongoDB required additional overhead compared to modern SQL RDBMS and Data Warehouses associated with aggregation pipeline and MapReduce practices. While this approach allowed one to craft advanced custom tailored aggregation algorithms it required additional knowledge of how to build and maintain it.

To fill this gap, MongoDB released the MongoDB connector for BI, which acts as a MySQL server on top of your MongoDB data. Under the hood it bridges existing aggregation mechanisms to the MySQL protocol, allowing standard MySQL clients to connect and issue SQL queries.

In this short 30-minutes tutorial we’ll setup the MongoDB connector for your local MongoDB instance and spin up a Node.js application that provides an analytics API, query cache, and orchestration using the Cube.js analytics framework. Cube.js can be used as a standalone server or it can be embedded as part of an existing Node.js web application. You can learn more about it here.

Setting up the MongoDB connector for BI

To install the MongoDB connector for BI locally you can use either quickstart guide or one of the platform dependent installation guides.

Please make sure you use the MongoDB version that supports the MongoDB connector for BI. In this tutorial we use 4.0.5.

If you don’t have a local MongoDB instance please download it here. The BI Connector can be downloaded here.

After the BI connector has been installed please start a mongod instance first. If you use the downloaded installation it can be started from its home directory like so:

$ bin / mongod

The BI connector itself can be started the same way:

$ bin / mongosqld

Please note that mongosqld resides in another bin directory. If everything works correctly, you should see a success log message in your shell for the mongosqld process:

[ initandlisten ] waiting for connections at 127.0 .0 .1 : 3307

If you’re using the MongoDB Atlas you can use this guide to enable BI connector.

Importing test dataset

You can skip this step if you already have data in your DB that can be analyzed. Otherwise you can use the zip code test dataset from MongoDB to populate your DB with some test data.

Download zips.json and run mongoimport from the MongoDB home directory:

$ bin / mongoimport --db test --collection zips --file <path/to/zips.json>

Please make sure to restart the MongoDB BI connector instance in order to generate an up-to-date MySQL schema from the just added collection.

Spinning up a Cube.js application

We’ll use Cube.js to make analytic queries to our MongoDB instance. To install its CLI run:

$ npm install - g cubejs - cli

To create a new Cube.js application with MongoBI driver run:

$ cubejs create mongo - tutorial - d mongobi

Go to the just created mongo-tutorial directory and edit the just created .env file: replace placeholders with your MongoDB BI connector credentials.

By default it should be something like:

CUBEJS_DB_HOST = localhost CUBEJS_DB_NAME = test CUBEJS_DB_PORT = 3307 CUBEJS_DB_TYPE = mongobi CUBEJS_API_SECRET = 941 ed7ad8a49bec1b3f87a9f48bff2a5e549e946fc22fcc5f18c3a17bf62c64ed3398e99b271cd66d4521a6fd7caa4bfd268dfccea59ffd9c67de660f3967338

Now generate Cube.js schema files for the zips collection from the test data set or for your own collection:

$ cd mongo - tutorial $ cubejs generate - t zips

In order to start a Cube.js dev server you’ll also need a locally running Redis instance, which is used for cache and query queue orchestration. You can download it and run it using these instructions.

If everything went smoothly, you’re able to run the Cube.js dev server:

$ npm run dev

If the server started successfully, you can now open http://localhost:4000 and navigate to the Cube.js dev environment. There you should see a working example of a pie chart.

Building a Dashboard

The Cube.js dev environment example contains all essential client pieces to build an analytics dashboard. Let’s modify it a little bit so it looks more like a dashboard and uses the zips collection.

Replace the contents of index.js in your dev environment code sandbox:

import React from "react" ; import ReactDOM from "react-dom" ; import "antd/dist/antd.css" ; import { Row , Col , Card , Layout , Spin } from "antd" ; import cubejs from "@cubejs-client/core" ; import { QueryRenderer } from "@cubejs-client/react" ; import { Chart , Axis , Tooltip , Geom , Coord , Legend } from "bizcharts" ; const renderChart = resultSet => ( < Chart height = { 400 } data = { resultSet . chartPivot ( ) } forceFit > < Coord type = "theta" radius = { 0.75 } / > < Axis name = "Zips.count" / > < Legend position = "bottom" name = "category" / > < Tooltip showTitle = { false } / > < Geom type = "intervalStack" position = "Zips.count" color = "x" / > < / Chart > ) ; const cubejsApi = cubejs ( "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1NDkzMDk3NzMsImV4cCI6MTU0OTM5NjE3M30.eXEdfUa_ek2V9MlGTpBMOd_AFfs8laaZj8ZsuM1wqqo" , { apiUrl : "http://localhost:4000/cubejs-api/v1" } ) ; const { Header , Footer , Sider , Content } = Layout ; const App = ( ) => ( < Layout > < Header > < h2 style = { { color : "#fff" } } > MongoDB Dashboard < / h2 > < / Header > < Content style = { { padding : "25px" , margin : "25px" } } > < Row type = "flex" justify = "space-around" align = "top" gutter = { 24 } > < Col span = { 24 } lg = { 12 } > < Card title = "Zip count by state" style = { { marginBottom : "24px" } } > < QueryRenderer query = { { measures : [ "Zips.count" ] , dimensions : [ "Zips.state" ] } } cubejsApi = { cubejsApi } render = { ( { resultSet , error } ) => ( resultSet && renderChart ( resultSet ) ) || ( error && error . toString ( ) ) || < Spin / > } / > < / Card > < / Col > < / Row > < / Content > < / Layout > ) ; const rootElement = document . getElementById ( "root" ) ; ReactDOM . render ( < App / > , rootElement ) ;

In the development environment, Cube.js doesn't enforce the use of the token to authorize queries, so you can use any string for your token here. You can learn more about using and generating tokens in the production environment here in the docs.

If everything worked well, you should see the following dashboard:

Why Cube.js?

So why is using Cube.js better than hitting SQL queries to MongoDB directly? Cube.js solves a plethora of different problems every production-ready analytic application needs to solve: analytic SQL generation, query results caching and execution orchestration, data pre-aggregation, security, API for query results fetch, and visualization.

These features allow you to build production grade analytics applications that are able to handle thousands of concurrent users and billions of data points. They also allow you to do analytics on a production MongoDB read replica or even MongoDB main node due to their ability to reduce the amount of actual queries issued to a MongoDB instance. Cube.js schemas also allow you to model everything from simple counts to funnels and cohort retention analysis. You can learn more about it here.

Performance considerations

In order to be able to handle a massive amount of data, Cube.js heavily relies on pre-aggregations. As of now, the MongoDB BI Connector doesn’t support Create Table as Statement, which is required to materialize query results right in your database and create pre-aggregations. If you need to analyze well over 100M of data points in MongoDB please consider using Presto with the MongoDB Connector, which is also supported by Cube.js.