This article describes the increasingly popular Microservice architecture pattern. The big idea behind microservices is to architect large, complex and long-lived applications as a set of cohesive services that evolve over time. The term microservices strongly suggests that the services should be small.

Some in the community even advocate building 10-100 LOC services. However, while it’s desirable to have small services, that should not be the main goal. Instead, you should aim to decompose your system into services to solve the kinds of development and deployment problems discussed below. Some services might indeed be tiny where as others might be quite large.

The essence of the microservice architecture is not new. The concept of a distributed system is very old. The microservice architecture also resembles SOA.

Related Sponsored Content 3 Common Pitfalls in Microservice Integration – And How to Avoid Them

It has even been called lightweight or fine-grained SOA. And indeed, one way to think about microservice architecture is that it’s SOA without the commercialization and perceived baggage of WS* and ESB. Despite not being an entirely novel idea, the microservice architecture is still worthy of discussion since it is different than traditional SOA and, more importantly, it solves many of the problems that many organizations currently suffer from.

In this article, you will learn about the motivations for using the microservice architecture and how it compares with the more traditional, monolithic architecture. We discuss the benefits and drawbacks of microservices. You will learn how to solve some of the key technical challenges with using the microservice architecture including inter-service communication and distributed data management.

The (sometimes evil) monolith

Since the earliest days of developing applications for the web, the most widely used enterprise application architecture has been one that packages all the application’s server-side components into a single unit. Many enterprise Java applications consist of a single WAR or EAR file. The same is true of other applications written in other languages such as Ruby and even C++.

Let’s imagine, for example, that you are building an online store that takes orders from customers, verifies inventory and available credit, and ships them. It’s quite likely that you would build an application like the one shown in figure 1.

Figure 1 - the monolithic architecture

The application consists of several components including the StoreFront UI, which implements the user interface, along with services for managing the product catalog, processing orders and managing the customer’s account. These services share a domain model consisting of entities such as Product, Order, and Customer.

Despite having a logically modular design, the application is deployed as a monolith. For example, if you were using Java then the application would consist of a single WAR file running on a web container such as Tomcat. The Rails version of the application would consist of a single directory hierarchy deployed using either, for example, Phusion Passenger on Apache/Nginx or JRuby on Tomcat.

This so-called monolithic architecture has a number of benefits. Monolithic applications are simple to develop since IDEs and other development tools are oriented around developing a single application. They are easy to test since you just need to launch the one application. Monolithic applications are also simple to deploy since you just have to copy the deployment unit – a file or directory – to a machine running the appropriate kind of server.

This approach works well for relatively small applications. However, the monolithic architecture becomes unwieldy for complex applications. A large monolithic application can be difficult for developers to understand and maintain. It is also an obstacle to frequent deployments. To deploy changes to one application component you have to build and deploy the entire monolith, which can be complex, risky, time consuming, require the coordination of many developers and result in long test cycles.

A monolithic architecture also makes it difficult to trial and adopt new technologies. It’s difficult, for example, to try out a new infrastructure framework without rewriting the entire application, which is risky and impractical. Consequently, you are often stuck with the technology choices that you made at the start of the project. In other words, the monolithic architecture doesn’t scale to support large, long-lived applications.

Decomposing applications into services

Fortunately, there are other architectural styles that do scale. The book, The Art of Scalability, describes a really useful, three dimension scalability model: the scale cube, which is shown in Figure 2.

Figure 2 - the scale cube

In this model, the commonly used approach of scaling an application by running multiple identical copies of the application behind a load balancer is known as X-axis scaling. That’s a great way of improving the capacity and the availability of an application.

When using Z-axis scaling each server runs an identical copy of the code. In this respect, it’s similar to X-axis scaling. The big difference is that each server is responsible for only a subset of the data. Some component of the system is responsible for routing each request to the appropriate server. One commonly used routing criteria is an attribute of the request such as the primary key of the entity being accessed, i.e. sharding. Another common routing criteria is the customer type. For example, an application might provide paying customers with a higher SLA than free customers by routing their requests to a different set of servers with more capacity.

Z-axis scaling, like X-axis scaling, improves the application’s capacity and availability. However, neither approach solves the problems of increasing development and application complexity. To solve those problems we need to apply Y-axis scaling.

The 3rd dimension to scaling is Y-axis scaling or functional decomposition. Where as Z-axis scaling splits things that are similar, Y-axis scaling splits things that are different. At the application tier, Y-axis scaling splits a monolithic application into a set of services. Each service implements a set of related functionality such as order management, customer management etc.

Deciding how to partition a system into a set of services is very much an art but there are number of strategies that can help. One approach is to partition services by verb or use case. For example, later on you will see that the partitioned online store has a Checkout UI service, which implements the UI for the checkout use case.

Another partitioning approach is to partition the system by nouns or resources. This kind of service is responsible for all operations that operate on entities/resources of a given type. For example, later on you will see how it makes sense for the online store to have a Catalog service, which manages the catalog of products.

Ideally, each service should have only a small set of responsibilities. (Uncle) Bob Martin talks[PDF] about designing classes using the Single Responsible Principle (SRP). The SRP defines a responsibility of class as a reason to change, and that a class should only have one reason to change. It make sense to apply the SRP to service design as well.

Another analogy that helps with service design is the design of Unix utilities. Unix provides a large number of utilities such as grep, cat and find. Each utility does exactly one thing, often exceptionally well, and can be combined with other utilities using a shell script to perform complex tasks. It makes sense to model services on Unix utilities and create single function services.

It’s important to note that the goal of decomposition is not to have tiny (e.g. 10-100 LOC as some argue) services simply for the sake of it. Instead, the goal is to address the problems and limitations of the monolithic architecture described above. Some services could very well be tiny but others will be substantially larger.

If we apply Y-axis decomposition to the example application we get the architecture shown in figure 3.

Figure 3 - the microservice architecture

The decomposed application consists of various frontend services that implement different parts of the user interface and multiple backend services. The front-services include the Catalog UI, which implements product search and browsing, and Checkout UI, which implements the shopping cart and the checkout process. The backend services include the same logical services that were described at the start of this article. We have turned each of the application’s main logical components into a standalone service. Let’s look at the consequences of doing that.

Benefits and drawbacks of a microservice architecture

This architecture has a number of benefits. First, each microservice is relatively small. The code is easier for a developer to understand. The small code base doesn’t slow down the IDE making developers more productive. Also, each service typically starts a lot faster than a large monolith, which again makes developers more productive, and speeds up deployments

Second, each service can be deployed independently of other services. If the developers responsible for a service need to deploy a change that’s local to that service they do not need to coordinate with other developers. They can simply deploy their changes. A microservice architecture makes continuous deployment feasible.

Third, each service can be scaled independently of other services using X-axis cloning and Z-axis partitioning. Moreover, each service can be deployed on hardware that is best suited to its resource requirements. This is quite different than when using a monolithic architecture where components with wildly different resource requirements – e.g. CPU intensive vs. memory intensive – must be deployed together.

The microservice architecture makes it easier to scale development. You can organize the development effort around multiple, small (e.g. two pizza) teams. Each team is solely responsible for the development and deployment of a single service or a collection of related services. Each team can develop, deploy and scale their service independently of all of the other teams.

The microservice architecture also improves fault isolation. For example, a memory leak in one service only affects that service. Other services will continue to handle requests normally. In comparison, one misbehaving component of a monolithic architecture will bring down the entire system.

Last but not least, the microservice architecture eliminates any long-term commitment to a technology stack. In principle, when developing a new service the developers are free to pick whatever language and frameworks are best suited for that service. Of course, in many organizations it makes sense to restrict the choices but the key point is that you aren’t constrained by past decisions.

Moreover, because the services are small, it becomes practical to rewrite them using better languages and technologies. It also means that if the trial of a new technology fails you can throw away that work without risking the entire project. This is quite different than when using a monolithic architecture, where your initial technology choices severely constrain your ability to use different languages and frameworks in the future.

Drawbacks

Of course, no technology is a silver bullet, and the microservice architecture has a number of significant drawbacks and issues. First, developers must deal with the additional complexity of creating a distributed system. Developers must implement an inter-process communication mechanism. Implementing use cases that span multiple services without using distributed transactions is difficult. IDEs and other development tools are focused on building monolithic applications and don't provide explicit support for developing distributed applications. Writing automated tests that involve multiple services is challenging. These are all issues that you don’t have to deal with in a monolithic architecture.

The microservice architecture also introduces significant operational complexity. There are many more moving parts – multiple instances of different types of service – that must be managed in production. To do this successful you need a high-level of automation, either home-grown code or a PaaS-like technology such as Netflix Asgard and related components, or an off the shelf PaaS such as Pivotal Cloud Foundry.

Also, deploying features that span multiple services requires careful coordination between the various development teams. You have to create a rollout plan that orders service deployments based on the dependencies between services. That’s quite different than when using a monolithic architecture where you can easily deploy updates to multiple components atomically.

Another challenge with using the microservice architecture is deciding at what point during the lifecycle of the application you should use this architecture. When developing the first version of an application, you often do not have the problems that this architecture solves. Moreover, using an elaborate, distributed architecture will slow down development.

This can be a major dilemma for startups whose biggest challenge is often how to rapidly evolve the business model and accompanying application. Using Y-axis splits might make it much more difficult to iterate rapidly. Later on, however, when the challenge is how to scale and you need to use functional decomposition, then tangled dependencies might make it difficult to decompose your monolithic application into a set of services.

Because of these issues, adopting a microservice architecture should not be undertaken lightly. However, for applications that need to scale, such as a consumer-facing web application or SaaS application, it is usually the right choice. Well known sites such as eBay [PDF], Amazon.com, Groupon, and Gilt have all evolved from a monolithic architecture to a microservice architecture.

Now that we have looked at the benefits and drawbacks let’s look at a couple of key design issues within a microservice architecture, beginning with communication mechanisms within the application and between the application and its clients.

Communication mechanisms in a microservice architecture

In a microservice architecture, the patterns of communication between clients and the application, as well as between application components, are different than in a monolithic application. Let’s first look at the issue of how the application’s clients interact with the microservices. After that we will look at communication mechanisms within the application.

API gateway pattern

In a monolithic architecture, clients of the application, such as web browsers and native applications, make HTTP requests via a load balancer to one of N identical instances of the application. But in a microservice architecture, the monolith has been replaced by a collection of services. Consequently, a key question we need to answer is what do the clients interact with?

An application client, such as a native mobile application, could make RESTful HTTP requests to the individual services as shown in figure 4.

Figure 4 - calling services directly

On the surface this might seem attractive. However, there is likely to be a significant mismatch in granularity between the APIs of the individual services and data required by the clients. For example, displaying one web page could potentially require calls to large numbers of services. Amazon.com, for example, describes how some pages require calls to 100+ services. Making that many requests, even over a high-speed internet connection, let alone a lower-bandwidth, higher-latency mobile network, would be very inefficient and result in a poor user experience.

A much better approach is for clients to make a small number of requests per-page, perhaps as few as one, over the Internet to a front-end server known as an API gateway, which is shown in Figure 5.

Figure 5 - API gateway

The API gateway sits between the application’s clients and the microservices. It provides APIs that are tailored to the client. The API gateway provides a coarse-grained API to mobile clients and a finer-grained API to desktop clients that use a high-performance network. In this example, the desktop clients makes multiple requests to retrieve information about a product, where as a mobile client makes a single request.

The API gateway handles incoming requests by making requests to some number of microservices over the high-performance LAN. Netflix, for example, describes how each request fans out to on average six backend services. In this example, fine-grained requests from a desktop client are simply proxied to the corresponding service, whereas each coarse-grained request from a mobile client is handled by aggregating the results of calling multiple services.

Not only does the API gateway optimize communication between clients and the application, but it also encapsulates the details of the microservices. This enables the microservices to evolve without impacting the clients. For examples, two microservices might be merged. Another microservice might be partitioned into two or more services. Only the API gateway needs to be updated to reflect these changes. The clients are unaffected.

Now that we have looked at how the API gateway mediates between the application and its clients, let’s now look at how to implement communication between microservices.

Inter-service communication mechanisms

Another major difference with the microservice architecture is how the different components of the application interact. In a monolithic application, components call one another via regular method calls. But in a microservice architecture, different services run in different processes. Consequently, services must use an inter-process communication (IPC) to communicate.

Synchronous HTTP

There are two main approaches to inter-process communication in a microservice architecture. One option is to a synchronous HTTP-based mechanism such as REST or SOAP. This is a simple and familiar IPC mechanism. It’s firewall friendly so it works across the Internet and implementing the request/reply style of communication is easy. The downside of HTTP is that it doesn’t support other patterns of communication such as publish-subscribe.

Another limitation is that both the client and the server must be simultaneously available, which is not always the case since distributed systems are prone to partial failures. Also, an HTTP client needs to know the host and the port of the server. While this sounds simple, it’s not entirely straightforward, especially in a cloud deployment that uses auto-scaling where service instances are ephemeral. Applications need to use a service discovery mechanism. Some applications use a service registry such as Apache ZooKeeper or Netflix Eureka. In other applications, services must register with a load balancer, such as an internal ELB in an Amazon VPC.

Asynchronous messaging

An alternative to synchronous HTTP is an asynchronous message-based mechanism such as an AMQP-based message broker. This approach has a number of benefits. It decouples message producers from message consumers. The message broker will buffer messages until the consumer is able to process them. Producers are completely unaware of the consumers. The producer simply talks to the message broker and does not need to use a service discovery mechanism. Message-based communication also supports a variety of communication patterns including one-way requests and publish-subscribe. One downside of using messaging is needing a message broker, which is yet another moving part that adds to the complexity of the system. Another downside is that request/reply-style communication is not a natural fit.

There are pros and cons of both approaches. Applications are likely to use a mixture of the two. For example, in the next section, which discusses how to solve data management problems that arise in a partitioned architecture, you will see how both HTTP and messaging are used.

Decentralized data management

A consequence of decomposing the application into services is that the database is also partitioned. To ensure loose coupling, each service has its own database (schema). Moreover, different services might use different types of database – a so-called polyglot persistence architecture. For example, a service that needs ACID transactions might use a relational database, whereas a service that is manipulating a social network might use a graph database. Partitioning the database is essential, but we now have a new problem to solve: how to handle those requests that access data owned by multiple services. Let’s first look at how to handle read requests and then look at update requests.

Handling reads

For example, consider an online store where each customer has a credit limit. When a customer attempts to place an order the system must verify that the sum of all open orders would not exceed their credit limit. It would be trivial to implement this business rule in a monolithic application. But it’s much more difficult to implement this check in a system where customers are managed by the CustomerService and orders are managed by the OrderService. Somehow the OrderService must access the credit limit maintained by the CustomerService.

One solution is for the OrderService to retrieve the credit limit by making an RPC call to the CustomerService. This approach is simple to implement and ensures that the OrderService always has the most current credit limit. The downside is that it reduces availability because the CustomerService must be running in order to place an order. It also increases response time because of the extra RPC call.

Another approach is for the OrderService to store a copy of the credit limit. This eliminates the need to make a request to the CustomerService and so improves availability and reduces response time. It does mean, however, that we must implement a mechanism to update the OrderService’s copy of the credit limit whenever it changes in the CustomerService.

Handling update requests

The problem of keeping the credit limit up to date in OrderService is an example of the more general problem of handling requests that update data owned by multiple services.

Distributed transactions

One solution, of course, is to use distributed transactions. For example, when updating a customer’s credit limit, the CustomerService could use a distributed transaction to update both its credit limit and the corresponding credit limit maintained by the OrderService. Using distributed transactions would ensure that the data is always consistent. The downside of using them is that it reduces system availability since all participants must be available in order for the transaction to commit. Moreover, distributed transactions really have fallen out of favor and are generally not supported by modern software stacks, e.g. REST, NoSQL databases, etc.

Event-driven asynchronous updates

The other approach is to use event-driven asynchronous replication. Services publish events announcing that some data has changed. Other services subscribe to those events and update their data. For example, when the CustomerService updates a customer’s credit limit it publishes a CustomerCreditLimitUpdatedEvent, which contains the customer id and the new credit limit. The OrderService subscribes to these events and updates its copy of the credit limit. The flow of events is shown in Figure 6.

Figure 6 - replicating the credit limit using events

A major benefit of this approach is that producers and consumers of the events are decoupled. Not only does this simplify development but compared to distributed transactions it improves availability. If a consumer isn’t available to process an event then the message broker will queue the event until it can. A major drawback of this approach is that it trades consistency for availability. The application has to be written in a way that can tolerate eventually consistent data. Developers might also need to implement compensating transactions to perform logical rollbacks. Despite these drawbacks, however, this is the preferred approach for many applications.

Refactoring a monolith

Unfortunately, we don’t always have the luxury of working on a brand new, greenfield project. There is a pretty good chance that you are on the team that’s responsible for a huge, scary monolithic application. And, every day you are dealing with the problems described at the start of this article. The good news is that there are techniques that you can use to decompose your monolithic application into a set of services.

First, stop making the problem worse. Don’t continue to implement significant new functionality by adding code to the monolith. Instead, you should find a way to implement new functionality as a standalone service as shown in Figure 7. This probably won’t be easy. You will have to write messy, complex glue code to integrate the service with the monolith. But it’s a good first step in breaking apart the monolith.

Figure 7 - extracting a service

Second, identify a component of the monolith to turn into a cohesive, standalone service. Good candidates for extraction include components that are constantly changing, or components that have conflicting resource requirements, such as large in-memory caches or CPU intensive operations. The presentation tier is also another good candidate. You then turn the component into a service and write glue code to integrate with the rest of the application. Once again, this will probably be painful but it enables you to incrementally migrate to a microservice architecture.

Summary

The monolithic architecture pattern is a commonly used pattern for building enterprise applications. It works reasonable well for small applications: developing, testing and deploying small monolithic applications is relatively simple. However, for large, complex applications, the monolithic architecture becomes an obstacle to development and deployment. Continuous delivery is difficult to do and you are often permanently locked into your initial technology choices. For large applications, it makes more sense to use a microservice architecture that decomposes the application into a set of services.

The microservice architecture has a number of advantages. For example, individual services are easier to understand and can be developed and deployed independently of other services. It is also a lot easier to use new languages and frameworks because you can try out new technologies one service at a time. A microservice architecture also has some significant drawbacks. In particular, applications are much more complex and have many more moving parts. You need a high-level of automation, such as a PaaS, to use microservices effectively. You also need to deal with some complex distributed data management issues when developing microservices. Despite the drawbacks, a microservice architecture makes sense for large, complex applications that are evolving rapidly, especially for SaaS-style applications.

There are various strategies for incrementally evolving an existing monolithic application to a microservice architecture. Developers should implement new functionality as a standalone service and write glue code to integrate the service with the monolith. It also makes sense to iteratively identify components to extract from the monolith and turn into services. While the evolution is not easy, it’s better than trying to develop and maintain an unwieldy monolithic application.

About the Author

Chris Richardson is a developer and architect. He is a Java Champion, a JavaOne rock star and the author of POJOs in Action, which describes how to build enterprise Java applications with POJOs and frameworks such as Spring and Hibernate. Chris is also the founder of the original Cloud Foundry, an early Java PaaS for Amazon EC2. He consults with organizations to improve how they develop and deploy applications using technologies such as cloud computing, microservices, and NoSQL. Twitter @crichardson.