Rack Middleware is based on a powerful and flexible 'nested handlers' architecture that has broad application in many kinds of data processing pipelines, not just web apps. We've extracted it as a design pattern and identified some non-web use cases, but can you help us name it?

One of the projects the software engineers at Simply Business have been working on recently is a large data import of our customers' insurance policy records. We're decommissioning a legacy system originally built with Java and Oracle, so it was critical that we migrate our customers' insurance policies from there onto our new Rails-based system.

A number of factors made the job more complex:

the data schema differed significantly between the two systems

some policy records were far less straightforward than others; for example, where customers had changed their insurance requirements mid-year or at renewal time

not all policies were needed (or permitted) in the new system, and

we needed to perform extra lookups and translations on the data as we imported it

Our original code to do this was quite imperative, with many special cases, but after some refactoring and some hammock time, we settled on a reasonably clean design inspired by Rack's "middleware" interface.

What's in a name?

Our design is modelled on Rack Middleware, but it doesn't use Rack, and 'middleware' alone is a very broad term that covers many different kinds of software. Having identified similar designs in several different application spaces and three different programming languages, we think this is or should be a design pattern, but we haven't found it named in the Design Patterns literature.

Therefore, we're writing it up as a design pattern in the hope that somebody will recognise it. Can you name this pattern?

A pattern for composing request processing pipelines by nesting simple handler functions

Context

You have some code that accepts a request to perform an action and returns a response. The action might be, for example, an API request or a background job submission, and the response is typically either the result of that request or some form of job submission ID.

In addition to performing the action itself, you have other requirements associated with it.

You need to augment the request data with data from other sources before the action can be performed. For example, you may have a customer number which needs the corresponding customer to be retrieved, or a build job number which you can use to test that a required package build has completed successfully.

You might refuse to perform the action, depending on the original request, on any extra data you have fetched, or on the context. In a client-server system, this could be because the requesting user is not authorised; in a migration pipeline, this might be because not all records are being migrated.

You want to instrument the action in some way, such as logging whether it happened or measuring how long it took, or notifying some downstream system when records have been transferred.

You want to change the context in which the action is executed; for example, by running it inside an exception handler so that unforeseen errors are handled rather than aborting the whole run.

These additional tasks are often independent of each other and don't interact much. You wish to be able to test each task independently of the overall flow, and your code structure should make it clear that these concerns are separate.

Therefore

Break down the computation into a handler that performs the action, and a chain of middlewares, each of which performs one of the ancillary tasks. Each middleware is responsible for calling the next middleware in the chain, with the request as a parameter, before passing the response back to its caller. The final middleware in line calls the handler itself. Additionally, a middleware may:

augment the request it receives in some way, before calling the next middleware

after calling the next middleware, change the response before returning it to its own caller

decide not to call the next middleware and instead return a response indicating that the action was 'skipped' or 'failed'

perform other side-effects, such as logging the request or notifying other interested systems that a request was made or actioned.
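Here is a minimal Ruby sketch of a single middleware that uses all four of these options. It is illustrative only: requests and responses are assumed to be plain Hashes, and CustomerLookup, its directory and log collaborators, and the :customer_id key are hypothetical.

```ruby
# Hypothetical middleware showing the four options listed above.
# Requests and responses are assumed to be plain Hashes.
class CustomerLookup
  def initialize(next_handler, directory, log)
    @next_handler = next_handler # next middleware, or the final handler
    @directory = directory       # hypothetical lookup: customer id => customer
    @log = log                   # anything that responds to <<
  end

  def call(request)
    # Side effect: note that a request arrived.
    @log << "lookup #{request[:customer_id]}"

    customer = @directory[request[:customer_id]]
    # Skip: return a response without calling the next middleware.
    return { status: :skipped, reason: "unknown customer" } if customer.nil?

    # Augment the request before passing it on...
    response = @next_handler.call(request.merge(customer: customer))

    # ...and change the response before returning it to our caller.
    response.merge(looked_up: true)
  end
end

final_handler = ->(request) { { status: :ok, name: request[:customer] } }
app = CustomerLookup.new(final_handler, { 42 => "Ada" }, [])

app.call({ customer_id: 42 }) # status :ok, name "Ada", looked_up true
app.call({ customer_id: 7 })  # status :skipped; final_handler is never called
```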

Middleware combined with a handler is itself also a handler

The central principle here is that you can make a new handler by putting middleware in front of a handler you already have, and you can do this recursively by putting another middleware in front of the one you just made.

This works because a handler is a function that accepts a request and returns a response. When you put middleware in front of that handler, you get a function that accepts a request, optionally does some preprocessing, calls the handler, optionally does some post-processing on the handler's return value, and then returns it as the response. If you ignore the implementation details and look only at the interface, you call it with a request and it returns a response: it has the same interface as the handler it wraps.
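The same idea can be sketched with plain Ruby lambdas instead of classes. with_timing and with_rescue are hypothetical middleware factories: each accepts a handler and returns a new lambda with the same request-in, response-out interface, which is why the result can itself be wrapped again.

```ruby
# A handler: takes a request, returns a response.
greet = ->(request) { { status: :ok, body: "hello #{request[:name]}" } }

# Middleware as a function: takes a handler and returns a new handler.
def with_timing(inner)
  lambda do |request|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = inner.call(request)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    response.merge(elapsed_seconds: elapsed)
  end
end

# Another middleware: runs the inner handler inside an exception handler.
def with_rescue(inner)
  lambda do |request|
    begin
      inner.call(request)
    rescue StandardError => e
      { status: :failed, error: e.message }
    end
  end
end

# with_timing(greet) is a handler, so it can be wrapped again, and so on.
app = with_rescue(with_timing(greet))
app.call({ name: "world" })[:body] # => "hello world"
```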

Examples of this pattern

Rack

Ruby programmers may be familiar with this pattern as the basis of Rack Middleware. Rack is the specification of the interface between Ruby web applications (typically built with Rails or Sinatra) and the web application server that they run on (e.g. Puma, Unicorn or WEBrick).

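A Rack application (the "handler" in our nomenclature) is specified as an object that implements a call method, which accepts a "request" Hash (Rack calls it the environment) and returns a "response": an array containing three values, namely the HTTP status code, a Hash of headers and a body (typically an array of strings).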

Consider this very simple Rack handler, implemented as an object:

class MyApp
  def call(request)
    [200, {"Content-type" => "text/plain"}, ["hello world"]]
  end
end

app = MyApp.new

Web applications often have non-functional requirements for authentication, authorisation or logging, which we want to run irrespective of which resource is requested. We could add those concerns into the handler itself, but the middleware pattern allows us to separate them out. For example, here is a middleware that randomly refuses requests:

class FeelingLuckyAuth
  def initialize(app)
    @app = app
  end

  def call(request)
    # "Did I fire six shots or only five?"
    authenticated = (rand * 6).to_i
    if authenticated.zero?
      @app.call(request.merge(:lucky? => true))
    else
      [401, {}, ["ACCESS DENIED!"]]
    end
  end
end
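Because a Rack middleware is just an object with a call method, it can be tested without a web server. FeelingLuckyAuth calls rand directly, which makes that awkward, so the sketch below is a variant of our own (InjectableLuckyAuth and its rng: keyword are not part of Rack) that takes the random-number source as a dependency and is then exercised against a stub application.

```ruby
# A variant of FeelingLuckyAuth (our own, for illustration) that takes the
# random-number source as a dependency so that tests can control it.
class InjectableLuckyAuth
  def initialize(app, rng: Random.new)
    @app = app
    @rng = rng
  end

  def call(request)
    # rand(6) returns an Integer from 0 to 5; only 0 gets through.
    if @rng.rand(6).zero?
      @app.call(request.merge(:lucky? => true))
    else
      [401, {}, ["ACCESS DENIED!"]]
    end
  end
end

# A stub "random" source that always rolls the same number.
FixedRoll = Struct.new(:value) do
  def rand(_max)
    value
  end
end

# A stub application that reports whether the request was marked lucky.
echo_app = ->(request) { [200, {}, [request[:lucky?].to_s]] }

InjectableLuckyAuth.new(echo_app, rng: FixedRoll.new(0)).call({}) # => [200, {}, ["true"]]
InjectableLuckyAuth.new(echo_app, rng: FixedRoll.new(3)).call({}) # => [401, {}, ["ACCESS DENIED!"]]
```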

We can use this middleware with any Rack application that we choose:

app = FeelingLuckyAuth.new(MyApp.new)

and because app is itself an application, we can wrap other middlewares around it:

app = LogRequestAsJson.new(FeelingLuckyAuth.new(MyApp.new))

and so on. As the number of steps increases, we might look at more succinct ways of writing it:

app = [FeelingLuckyAuth, LogRequestAsJson, GetCustomerById]
  .reduce(MyApp.new) { |inner, middleware| middleware.new(inner) }

In fact, the Rack gem provides a class, Rack::Builder, which implements a DSL (the use and run methods familiar from config.ru files) for composing middlewares in this manner.
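To show how little machinery such a DSL needs, here is a toy builder with the same use and run vocabulary. MiniBuilder and the Stamp middleware are hypothetical; this is a sketch in the spirit of Rack::Builder, not its implementation, although like Rack::Builder it treats the first middleware declared as the outermost.

```ruby
# A toy builder in the spirit of Rack::Builder; not its real implementation.
class MiniBuilder
  def initialize(&block)
    @middlewares = []
    @app = nil
    instance_eval(&block) if block
  end

  # Record a middleware class and any extra constructor arguments.
  def use(middleware, *args)
    @middlewares << [middleware, args]
  end

  # Set the innermost application.
  def run(app)
    @app = app
  end

  # Wrap from the inside out, so the first `use` ends up outermost.
  def to_app
    raise ArgumentError, "no application given to run" unless @app

    @middlewares.reverse.reduce(@app) do |inner, (middleware, args)|
      middleware.new(inner, *args)
    end
  end
end

# A hypothetical middleware that appends its name to the response body.
class Stamp
  def initialize(app, name)
    @app = app
    @name = name
  end

  def call(request)
    status, headers, body = @app.call(request)
    [status, headers, body + [@name]]
  end
end

builder = MiniBuilder.new do
  use Stamp, "outer"
  use Stamp, "inner"
  run(->(_request) { [200, {}, ["hello"]] })
end

builder.to_app.call({}) # => [200, {}, ["hello", "inner", "outer"]]
```

The reverse in to_app matters: reducing over the list as written, as in the reduce one-liner above, makes the last middleware in the array the outermost one instead.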

The advantages of a composable interface are easy to see. Entire products have grown up to provide pluggable functionality around HTTP requests. For example, Devise provides authentication (building on the Warden middleware), Airbrake provides error reporting, and New Relic provides performance monitoring.
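LogRequestAsJson and GetCustomerById, used in the composition examples earlier in this section, aren't defined here. As an illustration of this kind of pluggable functionality, here is one plausible shape for the logging middleware. It assumes the CGI-style REQUEST_METHOD and PATH_INFO keys that the Rack specification places in the request Hash; the io: keyword and the choice of fields to log are our own.

```ruby
require "json"
require "stringio"

# Hypothetical LogRequestAsJson: writes one JSON line per request, then
# returns the inner application's response unchanged.
class LogRequestAsJson
  def initialize(app, io: $stdout)
    @app = app
    @io = io
  end

  def call(request)
    status, headers, body = @app.call(request)
    # Log a few scalar fields rather than the whole env, which also
    # carries objects such as the rack.input stream.
    @io.puts JSON.generate({
      "method" => request["REQUEST_METHOD"],
      "path" => request["PATH_INFO"],
      "status" => status
    })
    [status, headers, body]
  end
end

log = StringIO.new
app = LogRequestAsJson.new(->(_request) { [200, {}, ["ok"]] }, io: log)
app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
log.string # => "{\"method\":\"GET\",\"path\":\"/\",\"status\":200}\n"
```

Because io: defaults to $stdout, the class can also be used exactly as in LogRequestAsJson.new(FeelingLuckyAuth.new(MyApp.new)).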

Python Web Server Gateway Interface

WSGI is to Python as Rack is to Ruby. In fact, it would be fairer to write this the other way around; my understanding is that WSGI (specified in PEP 333) not only predates Rack but probably inspired it. I'm no Python programmer, but it looks very familiar.

Ring

Ring is the analogous standard in Clojure. A Ring handler is a function that accepts a request map as its argument and returns a response map with the keys :status, :headers and :body. A simple example:

(defn what-is-my-ip [request]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body (str "Your IP address is " (:remote-addr request))})

The middleware pattern in Ring differs from what we've previously seen in that it's implemented using a function, not a class. To create a Ring middleware, we write a function that accepts a handler as its argument, and returns a new function that wraps the handler. Here's an example of Ring middleware that adds a Content-type header to responses from the handler it's applied to:

(defn wrap-content-type [handler content-type]
  (fn [request]
    ;; call the next handler, then add a header to the response
    ;; before returning it to our caller
    (let [response (handler request)]
      (assoc-in response [:headers "Content-Type"] content-type))))

(def wrapped-app
  (wrap-content-type what-is-my-ip "text/plain"))

This is neat, if you're the kind of person who finds this stuff neat, because it means you can build pipelines of middleware using Clojure's standard comp function. No need to invent DSLs.
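Ruby has no comp, but since Ruby 2.6 Proc#<< and Proc#>> compose callables in much the same way. Here is a sketch translating the Ring example into Ruby lambdas; the :remote_addr request key and the extra wrap_upcase middleware are our own inventions for illustration.

```ruby
# A Ring-style handler as a Ruby lambda: request Hash in, response Hash out.
what_is_my_ip = lambda do |request|
  { status: 200, headers: {}, body: "Your IP address is #{request[:remote_addr]}" }
end

# Ring-style middleware: takes an option and a handler, returns a new handler.
wrap_content_type = lambda do |content_type, handler|
  lambda do |request|
    response = handler.call(request)
    response.merge(headers: response[:headers].merge("Content-Type" => content_type))
  end
end

# A second, hypothetical middleware that upper-cases the body.
wrap_upcase = lambda do |handler|
  ->(request) { handler.call(request).then { |r| r.merge(body: r[:body].upcase) } }
end

# (f << g).call(x) is f.call(g.call(x)), like Clojure's (comp f g).
# curry fixes the content type, leaving a one-argument middleware.
wrap_all = wrap_upcase << wrap_content_type.curry["text/plain"]

app = wrap_all.call(what_is_my_ip)
app.call({ remote_addr: "127.0.0.1" })[:body] # => "YOUR IP ADDRESS IS 127.0.0.1"
```

As with comp, the leftmost middleware ends up outermost; >> composes in the opposite reading order.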

Boot tasks

The Ring middleware system also inspired the design of Boot, the Clojure build system that makes JARs (and other artifacts) for Clojure applications. In Boot, a function that returns middleware is known as a task, and you can compose tasks with comp:

(ns demo.boot-build
  (:require [boot.core :as core]
            [boot.task.built-in :as task]))

(core/deftask build
  "Build my project."
  []
  ;; for reasons of Java, we need to create a POM before
  ;; making a jar file
  (comp (task/pom) (task/jar) (task/install)))

Migration pipeline at Simply Business

We used this pattern to build a data migration pipeline for a large and complex project to move insurance policy data from a legacy system to its replacement.

Our system implements middleware as objects, following the Rack approach. We've also written a small helper class to construct pipelines from our migration steps. Without giving away too much of our unpublished proprietary code, this means we can define a processing pipeline like this:

Tools::Pipeline.new
  .add_step(Tools::GetPolicyHistory)
  .add_step(Tools::WithLoggedOutcome, log)
  .add_step(Tools::RejectIfUnsupportedCheckPolicyHistory)
  .add_step(Tools::FlagWithinRetentionIfEL)
  .add_step(Tools::RejectIfNotFlaggedWithinRetention)
  .add_step(Tools::RejectIfActiveChainNotYetMigrated)
  .add_step(Tools::GetCustomerFromRails, rolodex_api)
  .add_step(Tools::MakeChain, pimms_api)
  .add_step(Tools::AssociateCustomerWithChain, rolodex_api)
  .add_step(Tools::MarkCompletedAfterwards, log, pimms_api)
  .add_step(Tools::MigratePoliciesAndNotes, log, pimms_api, notes_api)
  .call(old_policy_number: old_policy_number)
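Since the helper itself is unpublished, here is only a guess at how a helper with this interface could work. Everything in it is an assumption for illustration: the class name Pipeline, that each step class is constructed as StepClass.new(next_step, *dependencies), that steps run in the order they are added, and that the innermost handler simply returns the request it receives. The two steps are hypothetical too.

```ruby
# A sketch of a pipeline-building helper, written from the usage above.
# Not Simply Business's code: names and behaviour are assumptions.
class Pipeline
  # Innermost handler: returns the (possibly augmented) request as the result.
  FINAL = ->(request) { request }

  def initialize
    @steps = []
  end

  # Register a step class and the dependencies its constructor needs.
  # Returning self allows the chained .add_step(...).add_step(...) style.
  def add_step(step_class, *dependencies)
    @steps << [step_class, dependencies]
    self
  end

  # Nest the steps (the first one added becomes the outermost) and run them.
  def call(request)
    handler = @steps.reverse.reduce(FINAL) do |inner, (step_class, deps)|
      step_class.new(inner, *deps)
    end
    handler.call(request)
  end
end

# Hypothetical step: stop early unless the policy is active.
class RejectIfInactive
  def initialize(next_step)
    @next_step = next_step
  end

  def call(request)
    return request.merge(outcome: :rejected) unless request[:active]

    @next_step.call(request)
  end
end

# Hypothetical step with a dependency: look up the customer, then continue.
class AddCustomer
  def initialize(next_step, customers)
    @next_step = next_step
    @customers = customers
  end

  def call(request)
    @next_step.call(request.merge(customer: @customers[request[:policy_number]]))
  end
end

Pipeline.new
  .add_step(RejectIfInactive)
  .add_step(AddCustomer, { "P1" => "Ada" })
  .call({ policy_number: "P1", active: true })
# returns the request with customer: "Ada" added
```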

To add the requirement that "all policies with previous claims must be imported", for example, we create a step called Tools::FlagWithinRetentionIfClaims and add it to the appropriate place in the pipeline. No changes anywhere else are needed. Moreover, the step is functional and free of side-effects, meaning it's simpler to test and simpler to reason about.
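A hypothetical version of such a step shows why being free of side-effects helps with testing. The :claims and :within_retention keys are invented for illustration, since the real step's data isn't described here. The step only derives a new request and passes it on, so a test needs nothing more than a lambda standing in for the rest of the pipeline.

```ruby
# Hypothetical implementation: if the policy has any claims, flag it as
# within retention, then pass the new request on. Nothing is mutated and
# no external service is called.
class FlagWithinRetentionIfClaims
  def initialize(next_step)
    @next_step = next_step
  end

  def call(request)
    claims = request.fetch(:claims, [])
    flagged = claims.empty? ? request : request.merge(within_retention: true)
    @next_step.call(flagged)
  end
end

# A lambda stands in for the rest of the pipeline and records its input.
passed_on = nil
rest_of_pipeline = ->(request) { passed_on = request; :continued }

step = FlagWithinRetentionIfClaims.new(rest_of_pipeline)
step.call({ claims: ["C1"] })     # => :continued
passed_on[:within_retention]      # => true
step.call({ claims: [] })
passed_on.key?(:within_retention) # => false
```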

Related patterns

The Chain of Responsibility pattern is structurally similar, but arises from a different set of forces. Whereas a middleware pipeline will usually run all of the steps on each request, Chain of Responsibility expects that the first handler that knows how to handle a request will do so and return early without calling the rest of the chain.
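For contrast, here is a minimal Chain of Responsibility sketch with hypothetical handlers. Each handler either returns a response or returns nil to defer, and the first one to respond ends the walk; in a middleware pipeline, by contrast, each step normally does its own work and then calls the next.

```ruby
# Chain of Responsibility: each handler returns a response if it can deal
# with the request, or nil to let the next handler try.
handlers = [
  ->(req) { "cached #{req[:id]}" if req[:cached] },
  ->(req) { "from database #{req[:id]}" if req[:id] },
  ->(_req) { "not found" }
]

def handle(handlers, request)
  handlers.each do |handler|
    response = handler.call(request)
    return response unless response.nil? # first taker wins; the rest never run
  end
  nil
end

handle(handlers, { id: 1, cached: true }) # => "cached 1"
handle(handlers, { id: 2 })               # => "from database 2"
handle(handlers, {})                      # => "not found"
```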

The Decorator pattern allows behaviour to be added to an individual object dynamically, without affecting the behaviour of other objects from the same class. A middleware pipeline could perhaps be viewed as a nested set of decorators: each middleware wraps an object with the same interface and adds behaviour around calls to it.

Rack has been described as an application of the Pipeline pattern, but that pattern itself seems to be defined very differently by different authors, and I haven't found a write-up that describes it recursively.

Conclusion

By adopting this pattern, we achieved more easily testable code, greater code reuse where different policies required different import strategies, and the ability to iterate on the migration. This last point is significant, as it enabled us to start migrating the more straightforward policies sooner, and add support for more complex scenarios as we went along, with the confidence that we were unlikely to break our simpler cases as we added more difficult ones.

So, does it work? To date, we've imported over 400,000 policies from the legacy system, representing the vast majority of customer policy data, and as a result have been able to disable access to the legacy system for most of our staff. The time is fast approaching when we will be able to unplug it completely.