This is Part 2 of a feature story profiling the ‘Allocations’ team at GOJEK. For Part 1 of the story, please click here.

The Infinite Onion

Every onion layer you peel is accompanied by more tears. It seems like an endless problem. And just when you think it’s done, there’s another layer. For the next 3 months, it was onion after onion, layer after layer across teams at GOJEK. Downtimes were the new normal by the beginning of 2016.

Back to square one.

The ‘Broadcast algorithm’ the bid engine team was relying on was failing. But how?

Every driver was seeing the same order multiple times. The algorithm ‘broadcast’ the same order across its driver database. So if there were 100 orders in a specific area and 200 drivers, every driver would see each order, but not necessarily be able to fulfil it. The algorithm had a three-fold problem: accountability, high concurrency and unhealthy competition.

Accountability: How could we reward drivers who completed more orders with zero cancellations when they often simply couldn’t accept an order? And how could we deny a bonus when, by design, a driver might miss out on an order for a dozen reasons? There was no accountability for the driver, or for the business fundamentals.

High-concurrency: Orders were blasted across so many phones that drivers were missing out on them. Some orders went unfulfilled because of the multiple blasts and the server load they created. More orders and fewer drivers meant some orders were not fulfilled, which resulted in a poor customer experience.

Note: The location-based orders are a peculiar problem for GOJEK.

Why? Within a distance of 20 metres, you’ll spot 30+ GO-RIDE scooters, as opposed to maybe a maximum of 10 cars.

Unhealthy competition: Once you’re blasting an order to all, you’re not factoring in quality drivers for customers. We were also not getting the nearest driver for an order. This breeds unhealthy competitiveness among drivers.

There was plenty of room for doubt in the way the algorithm was designed, and there were other constraints outside its control. Who got the order became a function of the phone: better GPS, hardware, Internet and software all played a critical part. And that was unfair. So zero accountability and high congestion of drivers meant things were going awry.
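To make the scale and the unfairness of broadcasting concrete, here is a minimal Python sketch. It is an illustration only, not GOJEK’s actual code: the `Driver` fields, the response times and both function names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    driver_id: int
    distance_km: float   # hypothetical distance from the pickup point
    response_ms: float   # hypothetical delay caused by GPS, hardware and network


def broadcast_fanout(num_orders: int, num_drivers: int) -> int:
    """Every order goes to every driver, so notifications multiply."""
    return num_orders * num_drivers


def broadcast_winner(drivers: list[Driver]) -> Driver:
    """Under broadcast, the order goes to whoever responds first."""
    return min(drivers, key=lambda d: d.response_ms)


# The 100 orders and 200 drivers from the example above.
notifications = broadcast_fanout(100, 200)

# Two hypothetical drivers: one close by with a slow phone, one far away with a fast phone.
drivers = [
    Driver(driver_id=1, distance_km=0.2, response_ms=900.0),
    Driver(driver_id=2, distance_km=3.5, response_ms=120.0),
]
winner = broadcast_winner(drivers)
nearest = min(drivers, key=lambda d: d.distance_km)
```

In this toy setup the fan-out is 20,000 notifications for 100 orders, and the driver with the faster phone wins the order even though another driver is much closer to the pickup.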

10x growth, 100% failure

When Niranjan pulled a couple of all-nighters and rewrote the code, the core portion was rewritten as a SPIKE. What is a spike? You break the rules and throw caution to the wind with the objective of shipping something out to keep the company afloat. The problem with the SPIKE was that it wasn’t the end solution. And that meant more downtimes and more failures. The team was in murky waters by late 2015.

At this point, GOJEK was managing 300,000+ orders every day. Failures were routine. Again. Wherever Nadiem went, he was questioned on why the app was crashing or why drivers simply could not find customers. At this point, the tech team was made up of around 10 people, who were firefighting every day. When Shobhit, one of our star programmers, went to a Domino’s store nearby to grab a quick bite, drivers started questioning him. Anyone who wore a GOJEK T-shirt became the unofficial complaint box. Something needed to change, and fast.

This was again an underestimation of how much Indonesians relied on GOJEK. Everyone wanted to use GOJEK. It made life easier in the traffic-congested glut that was Indonesia. Importantly, jobs and lives depended on it.

Decisions…

Nadiem’s internal mail

“No project has a budget and impact as big as this in GOJEK’s history”

The big rewrite — The Perfect Allocation

The team needed a different algorithm, one that offered 1–1 personalisation and pinned accountability on drivers. That meant identifying what a perfect driver looks like and working out how to frame this persona. The big rewrite began in the middle of 2016. The ‘bid engine’ team was now rechristened the ‘Allocations’ team. At this point, we were still losing customers. There were leaky faucets that were not sealed. After all, the work of the Allocations team criss-crossed all of GOJEK’s products and services. It was time to revisit the mothership.
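A rough Python sketch shows how a 1–1 approach differs from a broadcast. This is not the Allocations team’s algorithm: the scoring formula, its weights, the `Driver` fields and the `accepts` callback are all hypothetical, chosen only to show one order being offered to one driver at a time.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Driver:
    driver_id: int
    distance_km: float       # hypothetical distance from the pickup point
    completion_rate: float   # hypothetical share of accepted orders completed, 0..1


def score(driver: Driver) -> float:
    """Hypothetical 'perfect driver' score: reward reliability, penalise distance."""
    return 2.0 * driver.completion_rate - 1.0 * driver.distance_km


def allocate(
    drivers: list[Driver],
    accepts: Callable[[Driver], bool],
) -> tuple[Optional[int], list[int]]:
    """Offer the order to one driver at a time, best score first.

    Returns the assigned driver id (or None) and the ids of drivers who
    declined, so every decision can be traced to a specific driver.
    """
    declined: list[int] = []
    for driver in sorted(drivers, key=score, reverse=True):
        if accepts(driver):
            return driver.driver_id, declined
        declined.append(driver.driver_id)
    return None, declined


drivers = [
    Driver(driver_id=1, distance_km=0.2, completion_rate=0.95),
    Driver(driver_id=2, distance_km=3.5, completion_rate=0.99),
    Driver(driver_id=3, distance_km=0.5, completion_rate=0.60),
]

# Pretend driver 1 declines and everyone else would accept.
assigned, declined = allocate(drivers, accepts=lambda d: d.driver_id != 1)
```

Because only one driver sees the order at any moment, a decline is an explicit, recorded decision rather than a lost race between phones, which is what makes accountability possible.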

Hello Clojure.

Back to square one. Back to taking risks. By now, the core team was all too familiar with handling high-pressure timelines and live codebases. Clojure was an obvious choice because of the specific complexities it was designed to handle.

“Only two in the team knew Clojure then, but it solved an important business problem. We went with it and we all had to learn. Back to school. Again.” — Niranjan Paranjape

The first task was to replicate the bid engine logic. A 6-member team got to work with Clojure. Why Clojure? Because the language makes it easier to design good abstractions for the specific problem the team needed to solve. While Golang was the modern superbike that had it all, Clojure was the cruiser: really simple, yet capable of expressing complex logic. Clojure also ushered in the idea of getting organised and ensuring good software development practices.