Brian Kelley

Creating a Microservice? Answer these 10 Questions First

Microservices appear simple to build on the surface, but there’s more to creating them than just launching some code running in containers and making HTTP requests between them. Here are 10 important questions that you should answer about any new microservice before development begins on it – and certainly before it gets deployed into production.

1. How will it be tested?

Microservices have an interesting set of benefits and drawbacks when it comes to testing. On one hand, unit testing a small service that represents a well-defined piece of functionality is likely a lot easier than testing an entire monolithic application. On the other hand, verifying the quality of a whole application that is composed of many microservices can involve significant testing complexity: instead of running a single command to test the code running in one process, a large number of integrated, dependent components must first be running, be verified as healthy, and stay running throughout the tests.

Will the new microservice be tested both in isolation (with unit tests or mock dependencies) and in a more realistic “integration” or “staging” environment where it is connected to the same kinds of services it will touch in production? Will the tests incorporate performance verification and failure modes? Will all of the testing be automated, or will humans have to get involved to run and review the test results? Making your microservice testable in a simple, fast, and automated way will encourage developers to maintain it and prevent the “broken windows” problem.
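As one possible illustration of testing in isolation, the sketch below uses Python's standard unittest.mock module to stand in for a dependency. PriceService, its catalog client, and the get_price method are hypothetical names invented for this example; they are not part of any real system.

```python
from unittest import mock


class PriceService:
    """Hypothetical microservice logic that depends on a remote catalog service."""

    def __init__(self, catalog_client):
        self.catalog_client = catalog_client

    def price_with_tax(self, sku, tax_rate):
        base = self.catalog_client.get_price(sku)  # a network call in production
        return round(base * (1 + tax_rate), 2)


def test_price_with_tax_uses_catalog():
    # The mock replaces the real catalog client, so no other service must be running.
    catalog = mock.Mock()
    catalog.get_price.return_value = 100.0
    service = PriceService(catalog)
    assert service.price_with_tax("sku-1", 0.2) == 120.0
    catalog.get_price.assert_called_once_with("sku-1")


def test_catalog_timeout_is_not_swallowed():
    # Failure modes can be simulated as easily as the happy path.
    catalog = mock.Mock()
    catalog.get_price.side_effect = TimeoutError("catalog timed out")
    service = PriceService(catalog)
    try:
        service.price_with_tax("sku-1", 0.2)
    except TimeoutError:
        pass  # expected: the caller decides how to handle the failure
    else:
        raise AssertionError("expected TimeoutError")
```

Tests like these run in milliseconds and need no infrastructure, which is what makes them cheap to automate. They complement, rather than replace, tests in an integration or staging environment.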

2. How will it be configured?

Once your new microservice is live in production, how can its internal behaviors be influenced? This includes both infrastructural changes (for example, changing the minimum number of threads within a pool), and some application-level changes (for example, enabling a new feature by flipping a feature flag). For all of these changes, understanding if the service will need to be restarted for them to take effect is vital.

Of course, building a service that doesn’t need to have configuration changed at all during its lifetime will be the most desirable approach, if that’s possible.
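A minimal sketch of the two kinds of settings described above, assuming configuration arrives through environment variables. The variable names MIN_POOL_THREADS and FEATURE_NEW_CHECKOUT, and the choice of which setting requires a restart, are illustrative assumptions only.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    min_pool_threads: int        # infrastructural setting (assumed to need a restart)
    new_checkout_enabled: bool   # feature flag (assumed safe to change at runtime)


# Settings that only take effect after the process is restarted (an assumption here).
RESTART_REQUIRED = {"min_pool_threads"}


def load_config(env=None):
    """Build a config from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    flag = env.get("FEATURE_NEW_CHECKOUT", "false").lower()
    return ServiceConfig(
        min_pool_threads=int(env.get("MIN_POOL_THREADS", "4")),
        new_checkout_enabled=flag in ("1", "true", "yes"),
    )


def changes_requiring_restart(old, new):
    """Return the names of changed settings that will not apply until a restart."""
    return sorted(
        name for name in RESTART_REQUIRED
        if getattr(old, name) != getattr(new, name)
    )
```

Writing down which settings need a restart, even in a structure this small, gives operators a direct answer to that question before they make a change in production.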

3. How will it be consumed by other parts of the system?

There’s not much point in building a microservice unless other components of your system will make use of it, and so understanding just how they will use it is critical.

Will those other components interact with the new microservice synchronously or asynchronously? Should they be encouraged to cache responses from it for some time? What about retries and idempotency? Will the uptime SLA of the new microservice match that of the other components in the system?

There should be a clear expectation of the response latency to be provided by the new microservice, and components that consume it should be aware of those expectations. That way, when those expectations are not met, the other parts of the system can decide to fire a timeout, trip a circuit breaker, or fail over to another instance of the service.
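To make the circuit breaker idea concrete, here is a small sketch of one way a consumer could wrap calls to the new microservice. The class name, the thresholds, and the injectable clock are all assumptions made for illustration; a production system would more likely rely on a maintained resilience library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial ("half-open") call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A consumer would pass its own request function, for example breaker.call(fetch_customer, customer_id), where fetch_customer is whatever client call it already makes with a timeout. While the circuit is open, callers fail immediately instead of waiting for a timeout on every request.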

4. How will it be secured?

Unless they will be within a high security environment, most microservices deployed behind a firewall don’t need to go overboard with inter-service security. Adding a large number of security checks between microservices can add significant operational complexity and make production issues very hard to debug and fix. Even using HTTPS instead of HTTP for inter-service communication can be a significant maintenance overhead due to the work required to maintain, deploy, and secure properly signed certificates. It’s typically a better approach to allow traffic to flow unimpeded between your microservices while still applying sensible levels of application-level authentication and authorization, and of course to maintain a very secure perimeter.

Therefore, it’s likely that other components in the system will be able to send requests to the microservice without issue, but they may still need to pass along some amount of authentication data representing the initiating outside user for the request to be actually approved and processed. This should never be cleartext password data, but it could use a technology like JWT, OAuth, or SAML, or a hosted identity service such as Auth0. Regardless of approach, the technique must be documented very clearly, and preferably captured in a client library or sample code to make it easy for other developers to consume the new microservice.
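As a rough sketch of what passing user identity between services can look like, the code below signs and verifies an HS256 JWT using only the Python standard library. The shared secret, the claim names, and the InvalidToken exception are assumptions for illustration; in practice a maintained JWT library and properly managed keys would usually be the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


class InvalidToken(Exception):
    pass


def _b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims, secret):
    """Create a compact JWT: base64url(header).base64url(claims).base64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the claims if the signature and expiry are valid, else raise InvalidToken."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:  # wrong part count, bad base64, or bad JSON
        raise InvalidToken("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    expected = _sign(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise InvalidToken("bad signature")
    now = time.time() if now is None else now
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, (int, float)) or exp <= now:
        raise InvalidToken("token expired or missing exp")
    return claims
```

The receiving microservice calls verify_token on each request and uses the returned claims (for example, a "sub" claim naming the user) for its authorization decision. No password ever travels between services.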

5. How will it be discovered?

When a new microservice gets launched, how will other components in the system find it? The simpler the discovery process, the less flexibility it may have, and the more problems will be faced later. For example, the absolutely simplest method (while also being a brittle hack) would be to hardcode the address of the microservice into the code or configuration of the other components that depend on it. This might work until the address of the service has to change, or until multiple instances of the service become available in other regions. It’s certainly not a recommended approach.

Using indirection techniques like DNS names that hide the microservice’s address is somewhat better, but that can have its own set of drawbacks: finding an appropriate TTL value, forcing name resolutions to be redone, making DNS caches behave consistently, etc. By design, DNS doesn’t take into account a service’s availability, and that can cause application components to follow a path to an IP address where nothing is listening, which wastes time and causes operational noise as they try to find a working instance. It can also make the developer experience very difficult, because using DNS as a routing mechanism usually leads to a lot of ad hoc modification of developers’ /etc/hosts (http://bencane.com/2013/10/29/managing-dns-locally-with-etchosts/) files.

On the sophisticated end of the spectrum, a highly-available datastore or data synchronization service (such as ZooKeeper) might be used as a registry of microservices that are currently alive and well. This requires more technical investment, and it should also be done carefully to make sure that the discovery service itself does not become a Single Point of Failure (SPOF). As microservices are launched, they would register themselves with this registry service, and as they shut down they’d remove themselves. If they unexpectedly terminate or become deadlocked, they’d also have to be removed from the registry automatically. Remember, discovery isn’t just about finding what’s running – it’s also important to know what isn’t available.
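The register, deregister, and automatic-removal behavior can be sketched with a tiny in-memory registry that expires entries whose heartbeats stop arriving. This is only a model of the idea; it does not use ZooKeeper or any real registry API, and the class, method names, and TTL value are assumptions for illustration.

```python
import time


class ServiceRegistry:
    """In-memory model of a registry where instances expire without heartbeats."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._last_seen = {}  # (service name, address) -> time of last heartbeat

    def register(self, name, address):
        self._last_seen[(name, address)] = self.clock()

    def heartbeat(self, name, address):
        # A heartbeat simply refreshes the instance's timestamp.
        self.register(name, address)

    def deregister(self, name, address):
        # Clean shutdown: the instance removes itself.
        self._last_seen.pop((name, address), None)

    def lookup(self, name):
        # Crashed or deadlocked instances stop sending heartbeats, so they are
        # dropped here once their last heartbeat is older than the TTL.
        cutoff = self.clock() - self.ttl_seconds
        expired = [key for key, seen in self._last_seen.items() if seen < cutoff]
        for key in expired:
            del self._last_seen[key]
        return sorted(addr for (svc, addr) in self._last_seen if svc == name)
```

The TTL is the trade-off to watch: a short TTL removes dead instances quickly but risks evicting healthy ones during a brief pause, while a long TTL does the opposite.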

6. How will it scale with increasing load?

If a microservice has real value within a growing application, increasing numbers of developers will consume it, and along with that growing adoption will come a proportional growth in traffic. Having a well-understood scaling plan for the new microservice will be hugely valuable to your operational team.

Will the microservice auto-scale? Is there state held in memory that will make that auto-scaling and request routing difficult (for example, user session state)? What is the sharding strategy, if any?

It would be advantageous to have some advance knowledge of what part of the microservice will fail first when it is significantly scaled out in its current form. For services that are backed by a database, the compute capacity (say, EC2 instances within an auto-scaling group) can usually keep scaling out long after the database has become the bottleneck, so the database is typically the first thing to become overloaded. For truly stateless services (for example, computational microservices that don’t read or write to any database), the first thing to run out of resources might be the load balancer that is sitting in front of the cluster of instances. Both of these scenarios have solutions, but those solutions don’t necessarily have to be in place before the first version of a microservice gets deployed. However, it is good to know the limitations of your new microservice so that you can be aware of where the scalability ceiling exists before it is reached in production.

7. How will it handle failures of its dependencies?

Even microservices built with a very small bounded context might be dependent on other existing microservices or monoliths present in the system. For example, it’s quite common to have most application transactions be able to look up customer information, so the service that is used to access customer records will typically be a dependency of most other services that provide business value.

If your new microservice depends on any of these other services, it is crucial to know what should happen when those dependencies fail. Using consistent request timeouts would be a good start, but adding circuit breaking would be even better. The owners of those services might also want their consumers to use techniques like exponential backoff when things fail, to prevent a thundering herd scenario.
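Exponential backoff is easy to get subtly wrong, so here is a short sketch of one common variant, often called "full jitter", in which each delay is drawn at random between zero and an exponentially growing ceiling. The function names, default values, and the set of retryable exceptions are assumptions chosen for the example.

```python
import random
import time


def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=5, rng=random.random):
    """Yield one randomized delay per retry, with an exponentially growing ceiling."""
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)
        yield rng() * ceiling  # random spread keeps clients from retrying in unison


def call_with_retries(func, attempts=5, sleep=time.sleep,
                      retry_on=(TimeoutError, ConnectionError), **backoff):
    """Call func, retrying on the given exceptions with backoff between attempts."""
    delays = backoff_delays(attempts=attempts - 1, **backoff)
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(next(delays))
```

The randomness is what addresses the thundering herd: without it, every consumer that failed at the same moment would retry at the same moments too. Retries like this are only safe for requests that the dependency treats as idempotent.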

This is thankfully one of the easier scenarios to test, since testing it simply requires the absence of dependencies. However, it’s important to keep in mind that there are many ways for a call to a dependency’s API to fail, and those failures don’t all manifest the same way.

8. How will the rest of the system handle the failure of the new microservice?

Depending on how much investment has been made in the new microservice’s ability to be highly available, and also depending on what kind of transactions it supports, this might be of minimal concern. For example, a simple operational logging microservice that is sent data asynchronously over UDP could fail for minutes at a time without causing any interruption to the primary business transactions in the application. But a microservice that processes credit card transactions in a synchronous fashion would cause havoc to an e-commerce system if it totally failed, and that should be a failure scenario that is tested and prepared for.

So even though a well-scoped microservice (or its developers) shouldn’t necessarily care about what other parts of the system will do with their new component, system-level awareness of how each service depends on others can only help with preventing cascading failures, and will also help ensure that application performance remains acceptable.

9. How will it be upgraded?

It might be tempting to believe that container technology like Docker and deployment automation tools like Ansible make upgrades trivial, but there’s much more to microservice maintenance than what those tools provide out of the box.

Defining an upgrade strategy and deciding what level of deployment sophistication your microservice will support is important. Techniques like canary testing, blue/green deployments, feature flags, and response diffing all require time and effort above and beyond what would be required for a plain rolling upgrade where the new version replaces the old.

Defining boundaries and policies for upgrading the microservice’s API is especially critical for components that depend on it. For example, allowing only additive changes to an API’s JSON schema can be effective in allowing continual improvement of a service without requiring its consumers to follow each upgrade in lockstep. Adding new fields to an XML response payload when its consumers are all doing XML schema validation will cause havoc, however. So if the new microservice will be regularly upgraded to add more and more fields to its API objects, make that explicitly clear to the consumers of the service through its documentation.
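On the consumer side, additive changes only stay harmless if clients ignore fields they don't recognize. The sketch below shows one way to write such a tolerant reader in Python; the Customer type and its fields are hypothetical and stand in for whatever objects the API returns.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Customer:
    id: int
    name: str
    email: str = ""  # imagined as added in a later API version; the default keeps old payloads valid


def parse_customer(payload):
    """Parse a JSON object, keeping known fields and ignoring any others."""
    data = json.loads(payload)
    known = {f.name for f in fields(Customer)}
    return Customer(**{key: value for key, value in data.items() if key in known})
```

A consumer written this way keeps working when the service adds a field it has never heard of, whereas a consumer that rejects unknown fields turns every additive change into a breaking one.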

Finally, it’s also important to know how the new microservice can be rolled back if there are issues, and what would be considered “rollback worthy criteria”.

10. How will it be monitored and measured?

If your organization already has standards and tools for application monitoring, it would be wise to leverage them and to play nicely in the monitoring ecosystem already in use. Certainly don’t ignore them, or — even worse — integrate with a new monitoring tool that your operational team doesn’t even use yet.

If your organization doesn’t already use a quality application monitoring system, the addition of a new microservice to your application could serve as a good forcing function for putting one in place. This is especially true for organizations that are used to monitoring a large monolithic application and are beginning to make the move towards a microservices architecture: the operational monitoring requirements for a set of interconnected microservices are far more complex than for a single large monolith.

Whatever monitoring solution is chosen, be it homegrown, open-source, or commercial, it’s most important that the developers of the microservice have full access to the monitoring and measurement data for their component. Without that visibility, there’ll be no way to complete the feedback loop back to the developers for them to know how to improve their service in the production environment, nor will they be easily able to help diagnose issues with it when they arise in the dead of night.
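Whichever tool ends up storing the data, the service itself has to emit it. As a small, tool-agnostic sketch, the decorator below records call counts, error counts, and latencies in memory. The Metrics class and metric names are assumptions for illustration; a real service would forward these values to the monitoring system already in use.

```python
import time
from collections import defaultdict
from functools import wraps


class Metrics:
    """In-memory stand-in for a client of a real monitoring system."""

    def __init__(self):
        self.counts = defaultdict(int)       # metric name -> number of occurrences
        self.latencies = defaultdict(list)   # operation name -> durations in seconds


def measured(metrics, name, clock=time.perf_counter):
    """Decorator that records the outcome and duration of every call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                result = func(*args, **kwargs)
            except Exception:
                metrics.counts[name + ".error"] += 1
                raise
            else:
                metrics.counts[name + ".ok"] += 1
                return result
            finally:
                # Runs on both success and failure, so slow errors are visible too.
                metrics.latencies[name].append(clock() - start)
        return wrapper
    return decorator
```

Applied to a request handler, for example with @measured(metrics, "lookup_customer"), this produces the request rate, error rate, and latency figures that developers need in order to see how their service behaves in production.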

Summing up

While it might not be necessary to have very sophisticated answers to each of the 10 questions above, it is important to consider each one and have awareness of any architectural limitations your microservice may have. For example, your new microservice might first be deployed without any disaster recovery or region failure tolerance, and then upgraded later to include that kind of resilience. Being aware of what your microservice both can and cannot currently do is crucial, and knowing the answer to each of these questions will help you keep improving it until it evolves into a mature, resilient, and reliable system component.