

Fri Mar 01 2019 by Wolfram Hempel (@wolframhempel)

Service Mesh VS API Gateway VS Message Queue - when to use what?

Let's skip the pitch for microservices - you already know what they are and why they make sense. In fact, few topics have received as much coverage in recent years as the unsurprising fact that breaking down a big thing into many small ones can make it easier to handle.

The trouble is: once we've shattered our monolith, how do we put it back together into a larger system that still makes sense? Despite what Istio, Kong or Kafka enthusiasts will tell you, there's more than one answer to this question and different solutions are differently suited for different needs.

This post aims to shed some light onto the various ways to organize communication amongst microservices and when a Service Mesh, an API Gateway or a Message Queue might be the best solution for your needs.

But before we talk about the solution, let's talk about the problem:

So - what's the problem?

To function properly, microservice-based architectures have to tackle a number of challenges specific to their distributed nature:

Resiliency

There might be dozens or even hundreds of instances of any given microservice - each of which might fail at any point in time for any number of reasons.
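One common way to cope with failing instances is to try the next one. The snippet below is a minimal sketch of that idea, not a production-grade client: `call_with_failover`, the `send` callback and the instance names are all hypothetical and exist purely for illustration.

```python
def call_with_failover(instances, send, max_attempts=3):
    """Try instances in order until one answers or attempts run out.

    `send` is a hypothetical callback that performs the actual request
    and raises ConnectionError when the instance is unreachable.
    """
    last_error = None
    for instance in instances[:max_attempts]:
        try:
            return send(instance)
        except ConnectionError as err:
            # This instance failed, remember why and move on to the next one.
            last_error = err
    raise RuntimeError("all attempted instances failed") from last_error


def flaky_send(instance):
    # Simulated network call: only instance "c" is alive.
    if instance != "c":
        raise ConnectionError(f"{instance} is down")
    return f"ok from {instance}"


result = call_with_failover(["a", "b", "c"], flaky_send)
```

Real systems usually add timeouts, backoff and circuit breaking on top of this, which is exactly the kind of repetitive work the solutions below try to take off your hands.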

Load Balancing & Auto-Scaling

With potentially hundreds of endpoints capable of fulfilling a request, routing and scaling are anything but trivial. In fact, one of the most effective cost-saving measures for large architectures is to increase the precision of routing and scaling decisions.

Service Discovery

The more complex and distributed an application, the harder it becomes to find existing endpoints and to establish a communication channel with them.

Tracing & Monitoring

A single transaction in a microservice architecture might travel through multiple services, making it hard to trace its journey.
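A widespread answer to this is to attach one identifier to the transaction at the edge and to pass it along with every hop. Here is a rough sketch of the idea, assuming a made-up header name `x-trace-id` and hypothetical service names. Real tracing systems carry more context than a single ID.

```python
import uuid

TRACE_HEADER = "x-trace-id"  # hypothetical header name, chosen for this example


def with_trace(headers):
    """Return a copy of the headers that carries a trace ID.

    An existing ID is kept, so every hop of one transaction shares it.
    """
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def call_service(name, headers, log):
    """Simulate one hop: record (trace ID, service) and pass the headers on."""
    headers = with_trace(headers)
    log.append((headers[TRACE_HEADER], name))
    return headers


log = []
headers = call_service("checkout", {}, log)       # first hop creates the ID
headers = call_service("payments", headers, log)  # later hops reuse it
headers = call_service("shipping", headers, log)
```

Filtering the log by that one ID reconstructs the journey of the transaction across all three services.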

Versioning

As systems mature it becomes paramount to update available endpoints and APIs while simultaneously ensuring that older versions remain available.

The solutions

Alright, time to meet the contenders for solving these problems: Service Meshes, API Gateways, and Message Queues. Of course, there are also a number of other approaches, ranging from simple static load balancing and fixed IPs to central orchestration servers - but for the purpose of this post, let's look at the currently most popular and in many ways most sophisticated options.

API Gateways

An API Gateway is the big brother of the good old reverse proxy for HTTP calls. It is a scalable, usually web-facing server that can receive requests from both the public internet and internal services and forward them to the best-suited microservice instance. API Gateways usually come with a number of helpful features, including load balancing and health checks, API versioning and routing, request authentication & authorization, data transformation, analytics, logging, SSL termination and more. Popular open source API Gateways include Kong and Tyk. Most cloud providers offer their own implementation as well, e.g. AWS API Gateway, Azure API Management or Google Cloud Endpoints.
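To make the routing part a bit more tangible, here is a toy sketch of what a gateway does at its core: match a request path against registered prefixes (which doubles as simple API versioning) and pick a healthy instance in round-robin fashion. The `Gateway` class, the paths and the instance names are invented for this example and do not mirror the API of Kong, Tyk or any other product.

```python
import itertools


class Gateway:
    """Toy gateway: prefix routing plus round-robin over healthy instances."""

    def __init__(self):
        self.routes = {}       # path prefix -> list of instance addresses
        self.healthy = set()   # instances currently passing health checks
        self._counters = {}    # path prefix -> round-robin counter

    def register(self, prefix, instances):
        # The single point where new APIs are registered.
        self.routes[prefix] = list(instances)
        self.healthy.update(instances)
        self._counters[prefix] = itertools.count()

    def mark_down(self, instance):
        # Called when a health check fails.
        self.healthy.discard(instance)

    def route(self, path):
        matches = [p for p in self.routes if path.startswith(p)]
        if not matches:
            raise LookupError(f"no route for {path}")
        prefix = max(matches, key=len)  # longest matching prefix wins
        candidates = [i for i in self.routes[prefix] if i in self.healthy]
        if not candidates:
            raise RuntimeError(f"no healthy instance for {prefix}")
        turn = next(self._counters[prefix])
        return candidates[turn % len(candidates)]


gateway = Gateway()
gateway.register("/v1/users", ["users-v1-a", "users-v1-b"])
gateway.register("/v2/users", ["users-v2-a"])
first = gateway.route("/v1/users/42")
second = gateway.route("/v1/users/42")
```

Everything else on the feature list, such as authentication or SSL termination, happens before or after this lookup.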

Benefits

API Gateways are powerful in features, comparatively low in complexity and easily understood by seasoned web-veterans. They provide a solid layer of defense against the public internet and offload a lot of repetitive tasks, such as user authentication or data validation.

Downsides

API Gateways are fairly centralized. They can be deployed in a horizontally scalable fashion, but unlike service meshes, they still require a single point to register new APIs or change configuration. Seen from an organizational perspective, they are likely to be maintained by a single team.

Service Meshes

Service Meshes are decentralized and self-organizing networks between microservice instances that handle load balancing, endpoint discovery, health checks, monitoring, and tracing. They work by attaching a small agent, referred to as a "sidecar", to each instance that mediates traffic and handles instance registration, metric collection, and upkeep. Whilst conceptually decentralized, most service meshes come with one or more central elements to collect data or provide admin interfaces. Popular examples include Istio, Linkerd or HashiCorp's Consul.
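The sidecar idea can be sketched in a few lines. In the toy model below, every instance gets its own `Sidecar` that registers the instance on startup, resolves endpoints for outgoing calls and counts requests. The `Registry` class stands in for the central element mentioned above. All names and addresses are made up, and none of this reflects how Istio, Linkerd or Consul are actually implemented.

```python
import random
from collections import Counter


class Registry:
    """Stand-in for the mesh's central element: knows who is where."""

    def __init__(self):
        self.endpoints = {}  # service name -> set of addresses

    def register(self, service, address):
        self.endpoints.setdefault(service, set()).add(address)

    def lookup(self, service):
        return sorted(self.endpoints.get(service, ()))


class Sidecar:
    """Toy agent attached to one instance; mediates its outgoing traffic."""

    def __init__(self, service, address, registry):
        self.registry = registry
        self.metrics = Counter()             # calls per target service
        registry.register(service, address)  # instance registration

    def call(self, target_service, rng=random):
        endpoints = self.registry.lookup(target_service)  # endpoint discovery
        if not endpoints:
            raise LookupError(f"no endpoint for {target_service}")
        chosen = rng.choice(endpoints)       # naive load balancing
        self.metrics[target_service] += 1    # metric collection
        return chosen


registry = Registry()
cart = Sidecar("cart", "10.0.0.1", registry)
Sidecar("payments", "10.0.0.2", registry)
Sidecar("payments", "10.0.0.3", registry)
target = cart.call("payments")
```

Note that the cart service never needs to know a payments address itself. It only ever talks to its own sidecar.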

Benefits

Service meshes are more dynamic and can easily shift shape and accommodate new functionalities and endpoints. Their decentralized nature makes it easier to work on microservices within fairly isolated teams.

Downsides

Service meshes can be quite complex and require a lot of moving parts. Fully utilizing Istio, for instance, requires the deployment of a separate traffic manager, a telemetry gatherer, a certificate manager and a sidecar process for each service instance. They are also a fairly recent development, making something that constitutes the very backbone of your architecture worryingly young.

Message Queues

At first glance, comparing service meshes to message queues seems like comparing apples to oranges: they are completely different things, yet they solve the same problem in very different ways.

Message Queues allow you to establish complex communication patterns amongst services by decoupling sender and receiver. They achieve this using a number of concepts, such as topic-based routing or publish-subscribe messaging, as well as buffered task queues that make it easy for multiple instances to process different aspects of a task over time.
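As a rough illustration of these concepts, the in-memory sketch below combines topic-based routing with buffered queues. The `Broker` class, the topic and the queue names are hypothetical. A real broker such as Kafka or RabbitMQ adds persistence, acknowledgements and networking on top.

```python
from collections import deque


class Broker:
    """Toy broker: topics fan out to bound queues, queues buffer messages."""

    def __init__(self):
        self.queues = {}    # queue name -> buffered messages
        self.bindings = {}  # topic -> names of queues bound to it

    def bind(self, topic, queue_name):
        self.bindings.setdefault(topic, []).append(queue_name)
        self.queues.setdefault(queue_name, deque())

    def publish(self, topic, message):
        # The sender only knows the topic, never the receivers.
        for queue_name in self.bindings.get(topic, []):
            self.queues[queue_name].append(message)

    def consume(self, queue_name):
        # Workers sharing a queue compete: each message goes to one of them.
        queue = self.queues.get(queue_name)
        return queue.popleft() if queue else None


broker = Broker()
broker.bind("video.uploaded", "transcode")
broker.bind("video.uploaded", "thumbnails")
broker.publish("video.uploaded", "a.mp4")
broker.publish("video.uploaded", "b.mp4")
job_for_worker_1 = broker.consume("transcode")
job_for_worker_2 = broker.consume("transcode")
```

Both queues receive every upload event (publish-subscribe), while the two transcode workers split the jobs between them (task queue).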

Message Queues have been around for ages, resulting in a wide selection to choose from: popular open source options include Apache Kafka and AMQP brokers like RabbitMQ or HornetQ, while cloud providers offer their own versions such as AWS SQS or Kinesis, Google Pub/Sub or Azure Service Bus.

Benefits

Simply decoupling sender and receiver is a potent concept that makes a number of other concepts such as health checks, routing, endpoint discovery or load balancing unnecessary. Instances can pick relevant tasks from a buffered queue as and when they are ready to do so. This becomes especially powerful when auto-orchestration and scaling decisions are based on the message count in each queue, leading to highly resource efficient systems.
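A scaling decision based on message count can be as simple as the calculation below. This is a sketch under stated assumptions: `desired_workers` and all of its parameters are made up for this example, and it assumes every worker processes messages at the same steady rate.

```python
import math


def desired_workers(queue_depth, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=50):
    """Return how many workers are needed to drain the queue in time.

    queue_depth:          messages currently waiting in the queue
    per_worker_rate:      messages one worker handles per second (assumed constant)
    target_drain_seconds: how quickly the backlog should be gone
    """
    needed = math.ceil(queue_depth / (per_worker_rate * target_drain_seconds))
    # Clamp to sane bounds so we neither scale to zero nor run away.
    return max(min_workers, min(max_workers, needed))


# 1200 waiting messages, 2 messages/s per worker, drain within a minute.
workers = desired_workers(1200, 2.0, 60)
```

Here, 1200 messages divided by 120 messages per worker per minute yields 10 workers. An empty queue falls back to `min_workers`.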

Downsides

Message Queues are not good at request/response communication. Some allow this to be shoehorned on top of existing concepts, but it's not really what they are made for. Due to their buffered nature, they can also add significant latency to a system. They are also fairly centralized (though horizontally scalable) and can be quite costly to run at scale.
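For the curious, this is roughly what shoehorning request/response onto queues tends to look like: every request carries a correlation ID and the name of a reply queue, and the caller has to match replies to requests itself. The sketch below is a simplified, single-process illustration. All function and queue names are hypothetical.

```python
import uuid
from collections import deque

requests = deque()  # shared request queue
replies = {}        # reply queue name -> deque of reply messages


def send_request(body, reply_to):
    """Enqueue a request and return the correlation ID to wait for."""
    correlation_id = uuid.uuid4().hex
    requests.append({"correlation_id": correlation_id,
                     "reply_to": reply_to,
                     "body": body})
    return correlation_id


def worker_step(handler):
    """Process one request and put the result on the caller's reply queue."""
    message = requests.popleft()
    reply = {"correlation_id": message["correlation_id"],
             "body": handler(message["body"])}
    replies.setdefault(message["reply_to"], deque()).append(reply)


def receive_reply(reply_to, correlation_id):
    """Pick the matching reply from the reply queue, or None if not there yet."""
    queue = replies.get(reply_to, deque())
    for message in list(queue):
        if message["correlation_id"] == correlation_id:
            queue.remove(message)
            return message["body"]
    return None


first_id = send_request(2, "client-a")
second_id = send_request(5, "client-a")
worker_step(lambda x: x * x)
worker_step(lambda x: x * x)
second_result = receive_reply("client-a", second_id)
first_result = receive_reply("client-a", first_id)
```

It works, but the bookkeeping the caller has to do shows why a plain HTTP call is usually the simpler choice for request/response.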

So - when to choose which?

Actually - this is not necessarily an either/or decision. In fact, it can make perfect sense to front one's public-facing API with an API gateway, run a service mesh to handle inter-service communication and back things with a message queue for asynchronous task scheduling.

But if we reduce the focus purely to inter-service communication, one possible answer could be:

If you already run an API Gateway for your public-facing API, you might as well keep complexity low and reuse it for inter-service communication.

If you work within a large organization with siloed teams and poor communication, a service mesh can give you the highest degree of independence, making it easy to add new services over time.

If you are designing a system where individual steps are spaced out over time, e.g. a YouTube-like service where upload, processing, and publishing of videos can take a couple of minutes, use a message or task queue.

What the future holds

Despite all the hype, service meshes are a fairly young concept. Istio, for example, the most popular option, only reached its 1.0 version in July 2018. My prediction would be that these concepts increasingly merge, resulting in a more decentralized mesh of services providing both external API access and internal communication - maybe even in a buffered, queue-like fashion.