The Bump In the Road

Overall, the transition to a microservice architecture had proved beneficial for the company. Gemma had successfully architected a solution that broke the website into 3 distinct problem domains: the accounts service, the store service and the comic-book viewer service.

The Reduction of Downtime

After 3 months in production, the comic book company had seen an end to downtime thanks, in part, to their green-blue (more commonly called blue-green) deployment strategy. There were no more cases of a “big bang” deployment going live and subsequently dying, bringing down the entire site.

With green-blue deployments, Gemma and her team were able to deploy new versions of their systems alongside their existing, stable releases and run a range of tests, such as canary tests, to ensure that everything was working as expected when they made the final switch-over.
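To make the idea concrete, here is a minimal sketch of the routing logic behind a green-blue release with a canary phase. It is a toy model, not the team's actual tooling: the `GreenBlueRouter` class, its method names and the 5% canary share are all assumptions made up for illustration.

```python
import random


class GreenBlueRouter:
    """Toy traffic router sitting in front of two copies of one service.

    Hypothetical illustration only; a real setup would do this in a
    load balancer or DNS layer rather than in application code.
    """

    def __init__(self, live="blue", idle="green"):
        self.live = live          # environment serving real users today
        self.idle = idle          # environment holding the new release
        self.canary_share = 0.0   # fraction of requests sent to the idle side

    def route(self, rng=random):
        """Pick the environment that should handle the next request."""
        if rng.random() < self.canary_share:
            return self.idle
        return self.live

    def start_canary(self, share):
        """Send a small share of traffic to the new release."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        self.canary_share = share

    def promote(self):
        """Final switch-over: the new release becomes the live one."""
        self.live, self.idle = self.idle, self.live
        self.canary_share = 0.0

    def abort(self):
        """The canary looked unhealthy: send all traffic back to live."""
        self.canary_share = 0.0


router = GreenBlueRouter()
router.start_canary(0.05)   # assumed figure: 5% of requests hit the new release
router.promote()            # canary looked healthy, so swap the two sides
```

The old release stays deployed after `promote()`, so rolling back is just another swap rather than a fresh deployment.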

The Difficulties with Deployments

However, as Gemma and her team moved to a microservice-based approach, they found that the amount of time they spent doing deployments increased.

They were no longer performing “one” release of their monolithic application; they were deploying anywhere from 2 to 6 different service instances every time they performed a release. At the upper end, that meant both a green and a blue instance for each of their 3 services.

This obviously wasn’t ideal: what they had gained in reduced downtime and a more resilient architecture, they had lost in developer productivity. For a small team, having one developer dragged into 4 hours’ worth of deployments every 3 days really started to hurt.

The Rise of Kubernetes

After hitting this stumbling block, Gemma was forced to research further and find a way to automate releases so that the team could reclaim this lost time and focus more on delivering key business value.

This is when she discovered Kubernetes. She had a look at Kubernetes: Up and Running and started getting herself well versed in the art of using Kubernetes.

What Kubernetes essentially allowed her and her team to do was define their entire system as code. Running this on top of AWS’s new managed Kubernetes service saved them a lot of time when it came to managing their overall estate.
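As a rough illustration of what "system as code" can look like, the sketch below builds a Kubernetes Deployment manifest for each of the three services and prints them as JSON, which Kubernetes accepts alongside the more common YAML. The service names, image names, replica count and port are placeholders invented for this example, not details from Gemma's real setup.

```python
import json


def deployment_manifest(name, image, replicas=2, port=8080):
    """Build a Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical image names for the three services in the story.
SERVICES = {
    "accounts": "registry.example.com/accounts:1.0.0",
    "store": "registry.example.com/store:1.0.0",
    "comic-viewer": "registry.example.com/comic-viewer:1.0.0",
}

manifests = [deployment_manifest(name, image) for name, image in SERVICES.items()]

# Wrap everything in a single List object so it can live in one file.
bundle = {"apiVersion": "v1", "kind": "List", "items": manifests}

if __name__ == "__main__":
    print(json.dumps(bundle, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would be the usual next step. The point is that a release becomes a reviewed change to a file instead of hours of manual work.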

The only issue this raised was that the team required some time to learn the ins and outs of the managed Kubernetes service and the underlying Kubernetes technology. Given the amount of time this would save further down the line, Gemma saw no issue with this.

AWS’s EKS — Managed Kubernetes Service

The Loss of Traceability

Another somewhat troublesome issue that arose when moving to a microservice-based architecture was the loss of traceability with regard to what went wrong.

As the number of services in any application grows, tracing requests across the various microservices that make up that application becomes harder.

If something were to go wrong within one of the services, how could they easily trace it through all the downstream systems it called?

Initially, a lot of time was wasted using inefficient debugging mechanisms. Every time an engineer was called out at the weekend due to an issue, they would loathe coming online and trying to debug what had gone on. Any issue faced took a minimum of an hour to debug, even in situations where it turned out to be a non-issue.

The Answer to Traceability: Zipkin

Gemma turned to a system called Zipkin, which is based on the ideas from Google’s own tracing system, Dapper. Zipkin gives you detailed traces of all interservice calls, which allowed Gemma and her team to trace through any issue they faced within their services.
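The core idea behind this style of tracing is small enough to sketch in a few lines: every unit of work (a "span") carries the ID of the trace it belongs to plus the ID of the span that called it, so the full call tree can be rebuilt afterwards. The code below is a hand-rolled toy, not Zipkin's API; the `Span` class, the helper functions and the example request are all assumptions for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span that called us; None for the root
    service: str
    name: str


def new_id() -> str:
    """Random 64-bit identifier rendered as 16 hex characters."""
    return secrets.token_hex(8)


def start_trace(service: str, name: str) -> Span:
    """Create the root span for a brand new request."""
    return Span(new_id(), new_id(), None, service, name)


def child_span(parent: Span, service: str, name: str) -> Span:
    """Create a span for a downstream call, keeping the same trace ID."""
    return Span(parent.trace_id, new_id(), parent.span_id, service, name)


def render_tree(spans: List[Span], parent_id: Optional[str] = None,
                depth: int = 0) -> List[str]:
    """Rebuild the call tree from parent links, one indented line per span."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + f"{span.service}: {span.name}")
            lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


# Hypothetical request: the store calls the accounts service, then the viewer.
root = start_trace("store", "GET /checkout")
spans = [
    root,
    child_span(root, "accounts", "GET /user"),
    child_span(root, "comic-viewer", "GET /preview"),
]

if __name__ == "__main__":
    print("\n".join(render_tree(spans)))
```

In a real deployment the instrumentation library generates and forwards these IDs for you, and the tracing backend stores the spans along with their timings.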

Again, more time had to be sunk into learning this new tool and ensuring all the developers on her team were comfortable debugging with it.

However, once they were familiar with Zipkin they were able to utilize these pretty awesome looking call traces.

Zipkin traces typically look like this

3 Months Later — The Conclusion

6 months after moving towards a microservice-based architecture, the team had seen massive improvements in the speed at which they were able to make changes.

Resiliency had improved just as much, and the customer experience was vastly better. They were able to quickly deploy changes and improvements that users loved, and overall the company was able to dominate the online comic book market.

The team spent a lot of time learning the newer technologies, but that initial time investment paid real dividends in the end.