Many resources on service tests omit the supportive tooling and plumbing required to achieve reliable, repeatable, easy-to-maintain service-level tests. Without the ability to initialize dependencies in a single command, or to put the service and its dependent resources in a known good state, service tests end up flaky, unreliable, unused, and a massive time sink.

While there is a lot written about service tests, I have found surprisingly little about what they look like in practice. I intend to cover what service tests are, their supporting tooling (the service test stack), and why service tests are so valuable. We'll walk through a step-by-step example of how to build a full service test stack, write a test, and execute it in a manner that maximizes reliability, repeatability, debuggability, and maintainability.

What are Service Level Tests and Service Test Stacks?

Service-level tests and their supporting tooling (service-level test stacks) span both white-box and black-box testing. They enable high confidence in a service's fulfillment of its business value. Service tests exercise high-value business transactions in the same manner other clients access the service, using the same protocols.

The combination of supporting white-box tooling with black-box testing enables successful service testing outcomes. I have found, from multiple years of failed projects, that without careful resource management, tests can be a huge time sink because of slow executions and flaky false positives. These poor outcomes result in wasted time debugging, re-running, and finally retiring service tests. The following technique is the approach I have taken with many services and continue to use for each new service I work on. It has evolved from years of failures and test suites with negative ROIs.

Service tests exercise a handful of key service transactions using actual production service protocols; they test against a service in the same way other clients do. Successfully executed service tests position a service so that it can be directly exercised in a reliable, repeatable, hermetic, controlled manner. Service tests consist of the scaffolding, dependencies, and actual tests necessary to isolate a service and assert that it functions.

Service Test Stack

Service tests themselves operate at a “higher level” than unit tests, meaning they are executed from a client perspective, using client-facing protocols, with a focus on the business domain. They are more about what is happening (business domain) and less about implementation details (how it is happening). Because they use client-facing protocols they are traditionally slower and more expensive. The hybrid white-box/black-box strategy described above allows service tests to mitigate the flakiness, false positives, and costs traditionally associated with them.

Service-level tests are super valuable because they show a service actually works from the perspective of a client. But because client-facing protocols (the network) are involved, poorly executed service tests can easily become flaky and frequently result in a negative ROI and/or test retirement (or, even worse, continually wasting time and money on low/negative-value tests!). Because of this, service-level tests are often glossed over or not attempted. As we'll see throughout this post, a white-box/black-box service test stack model helps mitigate and solve these issues, and has repeatedly resulted in a positive ROI.

A positive ROI on service tests is achieved by the use of white-box service provisioning. This is reflected in the lower levels of the stack diagram above. The test level (top of the stack) is the level that actually performs the test and exercises the System Under Test (SUT, the target service). A key aspect of the test stack is that this top test level applies input and output in a black-box manner, using the same protocols as public clients:

Applying Blackbox Input/Output

In order to achieve successful service tests, tests should have perfect control over their environments: knowing how to bring up the service and its dependencies, configure the service, provision the service, and initialize state. Achieving these things has amazing benefits for correctness, velocity, and implementation flexibility, while minimizing flaky tests, false positives, and the time required to debug failures and understand the state of tests.

The service test stack must manage a service and its dependencies, and since these run in separate processes, doing so becomes a concurrent operation. As a result, the service tests themselves (the black-box/top level) often require concurrent programming.

Now that we’ve briefly described what service tests are and what’s required to succeed with them, we’ll cover why we’d use them.

Why Have Service Level Tests?

Grant Velocity/Efficiency — ENGINEER EMPOWERMENT

Service tests significantly shorten feedback loops while empowering developers to verify their services locally. Relying on a complex production-like environment to verify a service is extremely expensive:

Scenario: Verification in a production-like environment anti-pattern

Many organizations I’ve worked with have a concept of verifying a feature once it is in a production-like environment (qa, staging, canary, etc.). While I think this is generally valuable, this is often where the majority of the verification takes place:

Verification in this anti-pattern encompasses a full build and deploy and has a number of problems:

Expensive (Time): There is the overhead of the builds and deploys, waiting for each stage, and monitoring the service, potentially sifting through unrelated logs/metrics.

Indirect: Running in a production-like environment involves interacting with all the other services; the service is exercised indirectly and observed indirectly, through logging and metrics.

Difficult Interactions (Overhead): The position of some services within an architecture doesn’t lend itself well to being exercised directly, and in some cases there may not even be a way to do so! Take a queue-based service that sits in the middle of a pipeline, for example. If the only path to the service is through its upstream dependencies, testing becomes much harder: it requires indirect observation, and the only way to exercise the service may be to send a message to the beginning of the pipeline.

I hope this illustrates how expensive this is: have you ever been caught in the dev cycle of deploy, notice something is broken at some point in the future, make a bug fix, go through review, deploy, find something is still broken, and repeat? Contrast this with service tests. Because service tests enable a full local environment, engineers can interact with their service and gain good confidence in their features BEFORE builds and deployments.

Service level tests enable this feedback loop to be massively shortened resulting in huge time savings, reduced complexity, reduced context switches and closer, more direct, human service interaction.

Service Tests Verification Cycle

With service tests the costly build-deploy-verify cycle is avoided.

Enable Flexibility

Service tests allow implementations to change. Even though the service test framework requires white-box privileges, the actual test input and output can be decoupled to treat the service as a black box. This allows the test itself to be ignorant of implementation details.

Since the test functions as a client, the implementation of the service can be rapidly iterated on while engineers retain confidence that the service builds and fulfills its key capabilities. Because of this, most projects that I start get service tests as early as possible (even before unit tests). Unit tests are closely tied to implementations, which may be in flux very early in projects. As long as the client-facing protocols and request/response data structures are known, service tests provide much stronger feedback at lower cost than unit tests. Growing Object-Oriented Software, Guided by Tests covers this in a little more depth.

Enable Self Documentation (through executable conventions)

Since the service test stack creates hermetic, repeatable environments, it is trivial to use the same commands for both local development and CI. Since configuration is kept up to date by being enforced in CI, this creates a virtuous documentation cycle:

This cycle keeps tooling up to date by executing it as part of CI. Because the tooling is enforced in CI, there is automated feedback whenever it changes. Service tests enable this cycle by exposing commands that can be executed both locally and in CI; these bring up the service in a known good state and execute the tests. Since the same commands are executed locally as in CI, they are kept up to date.
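As a sketch, a CI configuration can simply invoke the same commands an engineer runs locally (the pipeline syntax here is generic and the target names are assumptions, not from the example repo):

```yaml
# Hypothetical CI job: identical commands to local development,
# so the tooling is exercised (and kept honest) on every commit.
service-tests:
  script:
    - make start-service-stack   # dependencies up, in a known state
    - make start-service &       # service under test
    - make test-service          # execute the service tests
```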

Have you ever worked on a project where the README was out of date, or where there were stale wikis? That happens because there is no cycle enforcing updates when a change occurs. Service tests address this by declaring dependencies in docker-compose and through executable scripts.

Enforce Correctness

Service-level tests craft and apply input to a service in a way that is equivalent to the way it is applied in production, except the service-level test stack enables less variance and more control than will be present in production. This makes it a subset of production functionality, but doesn’t dilute the strength of the test assertions.

Since the tests mirror production protocols, few variables change between test and production. Those that do are usually at the fringes of the application and surface as integration issues.

Service tests are critical for maintaining a high level of quality and efficiency in microservice-based architectures, where a team may have to interact with many different services. Ramp-up time is minimal because any engineer with docker and the language runtime can get a full service environment set up with a few commands, allowing them to verify their services and prevent regressions locally.

Smaller services can cover enough business use-case surface area that the remaining sources of error are reduced to environment-specific integration issues. If those integration issues have already been ironed out, a new deploy’s failure surface area becomes really small.

How

In order to illustrate what a service test stack looks like, we’ll use an example service. This service is responsible for saving deposit transactions in a relational database and then submitting them to a queue for further downstream processing. Input is received asynchronously and output is enqueued asynchronously as well, both using Amazon’s SQS.

All example code is available on github.

The result will be the ability to start a service from a build artifact and then execute tests against it.

Choose a transaction to test.

Focused microservices can make this decision easy, but if the service has many responsibilities and supports many transactions, one approach is to choose a dimension (i.e. money earned, “value”, number of clients, number of downstream clients, etc.) and use that to determine importance.

For our test case we only have a single transaction: persisting a deposit to a datastore and enqueuing the deposit, with its identifier, to an output queue.

List out all the dependencies.

Dependencies are all out-of-process calls (i.e. databases, caches, APIs, and all downstream services in separate processes).

Deposits has a dependency on SQS for input/output messages and on Postgres for persistence.

Determine how to make all dependencies first degree dependencies.

If the service being tested has dependencies that aren’t under your control, or dependencies that themselves have further dependencies, the responses should be stubbed out. Mountebank is a great, easy-to-use option for this. And since go makes writing HTTP servers so simple, writing custom stub servers is easy, effective, and efficient.

In one of my initial service-level test attempts I thought it would be a good idea to have each service know how to bring itself up along with its dependencies; this way a service could recurse through its dependency tree and initialize each one. While we invested the time to do this, it ended up being a dumpster fire.

Codify Dependencies in Docker-compose.

Docker-compose is the MVP of the service test stack. It’s the foundation the rest of the stack is built on. It makes codifying services and configuration trivial.

Below is the docker-compose for the deposits service:

version: '3'
services:
  postgres:
    image: postgres
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
    volumes:
      - ./config:/docker-entrypoint-initdb.d

  localstack:
    image: localstack/localstack
    ports:
      - "4567-4583:4567-4583"
      - "${PORT_WEB_UI-8080}:${PORT_WEB_UI-8080}"
    environment:
      - SERVICES=${SERVICES- }
      - HOSTNAME_EXTERNAL=${HOSTNAME_EXTERNAL- }
      - HOSTNAME=${HOSTNAME- }
      - DOCKER_HOST=unix:///var/run/docker.sock

One of the most difficult parts is determining what should be stubbed versus what should be tested against directly. A good rule of thumb is that all service tests should support local execution without an internet connection. Protocols should remain faithful even if stubs have to be used: the strength of asserting that a service works with production protocols outweighs the risk of contracts drifting out of sync.

Create Tooling for Service Dependency Management

One of the key properties of the service test stack is that it can be started in a clean, known state. This command should start all dependent processes and yield when they are fully initialized. I usually put schema creation into this step, since most docker-ized databases have some sort of hook to initialize a db with a schema. All service configuration should be in this step. Service configuration is anything required to start the service being tested, plus everything required to create logical resources, i.e. queues or users.

The deposits service uses the Postgres docker image hook to create its schema on load and then defines a make rule:

start-service-stack:
	docker-compose down && SERVICES=sqs docker-compose up -d
	wait-for --poll-interval 1s postgres \
		--connection-string="postgresql://root:root@localhost/deposits?sslmode=disable"
	wait-for net --address="localhost:4576"

Starting these dependencies looks like:

Implement Logical Resource Management

Logical resource management creates all necessary resources on top of the service dependencies codified in the previous step. Because the application relies on these resources, I usually couple logical resource management to starting the service. This is a little fuzzy, because schemas are usually defined and created in the previous step. Having schemas in the previous step has historically worked for me, but I could easily see it moving into this step if that proves easier to maintain or reason about.

In order to start the service the deposits service needs some SQS queues created:

start-service:
	aws --endpoint-url=$(TEST_SQS_ENDPOINT_URL) sqs delete-queue --queue-url $(TEST_SQS_ENDPOINT_URL)/queue/deposits-in || true
	aws --endpoint-url=$(TEST_SQS_ENDPOINT_URL) sqs create-queue --queue-name deposits-in

	aws --endpoint-url=$(TEST_SQS_ENDPOINT_URL) sqs delete-queue --queue-url $(TEST_SQS_ENDPOINT_URL)/queue/deposits-out || true
	aws --endpoint-url=$(TEST_SQS_ENDPOINT_URL) sqs create-queue --queue-name deposits-out

If localstack supports it (I think it does, through elasticmq), this could easily be statically defined and created in the docker-compose step.

Codify Starting the Service

Start the service pointing at the test stack. The 12 Factor App defines a series of principles that allow an application to be environment agnostic. This command will have lots of configuration pointing at the local resources created above.

The deposits service is a go service and compiles to an executable.

start-service:
	... AWS_SECRET_ACCESS_KEY=x \
	AWS_ACCESS_KEY_ID=x \
	AWS_REGION=us-west-2 \
	./$(BIN_DIR)$(BINARY_NAME) \
		--sqs-endpoint-url=$(TEST_SQS_ENDPOINT_URL) \
		--db-connection-string="postgresql://root:root@localhost/deposits?sslmode=disable" \
		--sqs-input-url=$(TEST_SQS_ENDPOINT_URL)/queue/deposits-in \
		--sqs-output-url=$(TEST_SQS_ENDPOINT_URL)/queue/deposits-out

The service stack is now capable of starting all dependencies, putting them in known states, and starting the service!

I personally think this by itself is worth the investment. There is a clean slate for development, for bug fixes, or just for experimentation.

Create a Test

Implement logical resource state management. This should make calls to put the dependencies into a known good state. This is often destructive and results in service tests not being able to be executed concurrently. I haven’t found this to be a big deal, because of the small number of service tests and the benefit gained from having service tests execute against a known good state.

The test is where the service test stack begins to interact with the service application. The first thing the test needs to do is put all required (logical) resources into the known good state the test requires.

// InitializeState purges all messages in the queues and
// truncates all data from the DB tables.
func (dt *DepositsTest) InitializeState() error {
	if err := sqsin.PurgeQueues(dt.svc, []string{
		"http://localhost:4576/queue/deposits-in",
		"http://localhost:4576/queue/deposits-out",
	}); err != nil {
		return err
	}
	deposits.Truncate()
	return nil
}

This test expects both the output and the input queues to be completely empty before exercising the service.

Tests then need to apply input in order to exercise the system under test (the service), and then assert on the output. Because our service is queue based and the queue buffers responses, we can apply the input and then set up a handler:

var msg *sqs.Message
receiveloop:
for {
	select {
	case msg = <-messageChan:
		fmt.Printf("Received Message, %+v\n", msg)
		break receiveloop

	case <-time.After(5 * time.Second):
		t.Errorf("Timeout %s reached", 5*time.Second)
		break receiveloop
	}
}

var d deposits.Deposit
if err := json.Unmarshal([]byte(*msg.Body), &d); err != nil {
	t.Error(err)
}

// the presence of the auto-incremented primary key ID is a proxy
// for the deposit having been persisted to postgres
if d.ID == 0 {
	t.Errorf("expected postgres ID, received: %d. %+v", d.ID, d)
}

The test is a vanilla go test. I usually partition the test suites using build tags ( // +build service,!unit ). This allows unit tests to be executed independently of service tests, and vice versa (e.g. go test -tags service ./...).

We’re finally able to execute the tests:

Since they are active white-box tests, they are uniquely positioned to support clean test environments (by deleting data, or by initializing tests with a clean/known data state) while still asserting along the lines of the functional business value a service produces. This is extremely powerful and can result in very stable, easy-to-debug tests.

Why GO?

Go is extremely well suited for service-level tests: timeouts, concurrency, and asynchronous operations all have first-class primitives in go. Go has a number of properties that make it a great environment to write service tests in:

Concurrency is a first class citizen. This is necessary since queue based service tests require sending input and receiving output from the same process:

func Test(t *testing.T) {
	go applyInput()
	listenForOutput()
}

It is trivial to support global timeouts using context. A global timeout can be enforced using a context, so that each network call uses the global context and will be preempted and closed when the timeout is reached. The following example is from the docs:

func slowOperationWithTimeout(ctx context.Context) (Result, error) {
	ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
	defer cancel() // releases resources if slowOperation completes before timeout elapses
	return slowOperation(ctx)
}

Distribution is easy: go compiles to a static binary. While this isn’t a deal breaker, it’s just so much easier to work with for building and distribution than interpreted languages.

Because go is so well suited for service tests, I often use it even for non-go projects.

Conclusion

Service level tests can truly EMPOWER engineers to take control of their services by allowing them to interact with a complete service locally and verify that service is functional. Correctly crafted tooling is easily adaptable to be executed in CI, as well as locally. Service level tests can completely remove the expensive deploy step to give good confidence that the artifact functions BEFORE the deploy.

Service-level tests enable very high confidence and are executable locally. Service tests and the service test stack have had a massive impact on timelines, engineering productivity, and quality. They contribute executable documentation and enable fast ramp-up.

Service level tests are not magical or difficult to understand but they do require a lot of supportive tooling.

As always I appreciate you taking the time to read this and welcome any feedback. Thank you.