Recently we announced that we were going Back to Basics in 2018 (read more here). This wasn’t only on the product front. In 2018, Kavin Bharti Mittal (our founder/CEO) decided it was also time to re-tool our systems and re-build our culture with one big loaded question:

“How do we supercharge the productivity of our teams so that they can move incredibly fast, make high-quality decisions, and build incredible experiences that our users will absolutely love?”

As a result, an incredible amount of work is ongoing at Hike, working backward from this question.

Today we’re particularly excited to talk about the incredible progress that our team has made on our Data Systems to do exactly that — Supercharge Productivity.

The Start: “Analyst Bandwidth Nahi Hai”

One of the most common phrases we heard from our PMs, EMs & DMs in 2017 was: “Analyst bandwidth nahi hai”. In English, that means “We don’t have any analyst bandwidth”. For a long time, we thought this was ok. It meant that a lot of work was happening and that we had to prioritize. Well, not so much.

Turns out, we had to completely rethink our data systems, given the complexity of our systems and the billions of rows of data being generated every day. From our OKRs, you can see we had 4 important goals:

1. Data queries should be extremely quick (seconds to minutes for most queries)
2. Everything should be automated & self-serve (anyone should be able to pull out data)
3. Reduce cost while doing all of this
4. Make sure “Analyst bandwidth nahi hai” was a phrase that was never used again :)

Our move to BigQuery (BQ) was key to achieving these goals.

Hue + Hive — Our Previous Data Processing System

Hike generates more than 10 billion analytics events a day, which comes to around 4 terabytes, and we have accumulated more than 2 petabytes of data. To process this volume of data, we were running hundreds of virtual machines.
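A quick back-of-envelope check of those numbers (a sketch assuming decimal units, i.e. 1 TB = 10¹² bytes and 1 PB = 10¹⁵ bytes):

```python
# Sanity-check the scale figures quoted above
# (assumption: decimal units, 1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
EVENTS_PER_DAY = 10 * 10**9   # 10 billion events/day
BYTES_PER_DAY = 4 * 10**12    # ~4 TB of new data/day
TOTAL_BYTES = 2 * 10**15      # ~2 PB accumulated

avg_event_size = BYTES_PER_DAY / EVENTS_PER_DAY  # bytes per event
days_of_history = TOTAL_BYTES / BYTES_PER_DAY    # days to accumulate 2 PB

print(avg_event_size)   # 400.0 -> ~400 bytes per event
print(days_of_history)  # 500.0 -> ~500 days of data at this rate
```

So each event averages roughly 400 bytes, and the 2 PB backlog represents about 500 days of data at the current ingest rate, which is consistent with the figures above.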

Cost & Performance were clearly 2 big nightmares for us.

Previously, our data storage backend was Google Cloud Storage (GCS) and the processing layer was Hive on Dataproc. We used to run HQL batch jobs on Hive. All the summarized tables were stored in Hive and BigQuery. We were using BigQuery (BQ) only for our Business Intelligence tool (Tableau). For all ad hoc analysis we used Hue.
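Those HQL batch jobs were essentially rollups: collapse billions of raw events into small summary tables keyed by day and event type. A minimal Python sketch of that kind of summarization (the event schema here is hypothetical; the post does not describe the real one):

```python
from collections import Counter
from datetime import date

# Hypothetical raw analytics events (the real schema is not shown in the post)
events = [
    {"user_id": "u1", "event": "message_sent", "day": date(2018, 1, 1)},
    {"user_id": "u2", "event": "message_sent", "day": date(2018, 1, 1)},
    {"user_id": "u1", "event": "sticker_sent", "day": date(2018, 1, 1)},
]

def summarize(events):
    """Count events per (day, event_type) pair -- the kind of daily
    rollup the nightly HQL batch jobs wrote out as summary tables."""
    return Counter((e["day"], e["event"]) for e in events)

summary = summarize(events)
print(summary[(date(2018, 1, 1), "message_sent")])  # 2
```

The equivalent HQL would be a `GROUP BY day, event` aggregation over the raw event table; the summary tables it produced are what Tableau queried from BigQuery.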