We’re currently building a product for masking sensitive data in relational databases. One of our goals is to keep the interface relatively simple while enabling intelligence and performance optimizations under the hood. To that end, the user configures how the database should be masked through a simple declarative interface, and we convert that configuration into a series of masking steps which can be run in sequence or in parallel as appropriate.

This turns out to be another problem that suits a similar pipeline structure: the declarative list of masks and columns is the input, and the list of masking steps is the output. Individual pieces of middleware handle different features and concerns along this conversion process. For example, some masks can be converted into a whole-column UPDATE statement, while others need to be implemented row by row. Once the basic list of steps has been created, it can be modified to add synchronization between tables or conditionality to certain masks. A side benefit is that we can swap out behaviors or enable features behind feature flags simply by inserting and removing middlewares from the step conversion pipeline.
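As a rough illustration of that structure, here is a minimal Python sketch of a step conversion pipeline whose middleware list depends on a feature flag. Everything in it is an assumption made for the example rather than our actual code: the mask dictionaries, the string steps, the `table_sync` flag, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List

Masks = List[Dict[str, str]]   # hypothetical shape: {"table", "column", "mask"}
Steps = List[str]              # steps are plain strings in this sketch
Handler = Callable[[Masks], Steps]
Middleware = Callable[[Masks, Handler], Steps]


def _bind(middleware: Middleware, next_handler: Handler) -> Handler:
    # Fix the "next" argument so the middleware looks like a plain handler.
    return lambda masks: middleware(masks, next_handler)


def compose(middlewares: List[Middleware]) -> Handler:
    """Chain middlewares so each one receives the following one as next_handler."""
    def end_of_pipeline(masks: Masks) -> Steps:
        return []

    handler: Handler = end_of_pipeline
    for middleware in reversed(middlewares):
        handler = _bind(middleware, handler)
    return handler


def basic_steps(masks: Masks, next_handler: Handler) -> Steps:
    # Converts every mask into one step and never delegates.
    return [f"mask {m['table']}.{m['column']} with {m['mask']}" for m in masks]


def add_table_sync(masks: Masks, next_handler: Handler) -> Steps:
    # Delegates first, then modifies the finished list of steps.
    steps = next_handler(masks)
    tables = sorted({m["table"] for m in masks})
    return steps + [f"sync {table}" for table in tables]


def build_pipeline(flags: Dict[str, bool]) -> Handler:
    # Enabling a feature is just a matter of inserting its middleware.
    middlewares: List[Middleware] = []
    if flags.get("table_sync"):
        middlewares.append(add_table_sync)
    middlewares.append(basic_steps)
    return compose(middlewares)
```

Calling `build_pipeline({"table_sync": True})` yields a handler that appends the sync steps, while `build_pipeline({})` produces only the basic masking steps. The conversion code itself is identical in both cases.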

Interestingly, most of our standard middlewares in this case differ from the web request middlewares above. For web requests, the standard pattern is that a middleware either handles the request entirely or delegates it down the pipeline. When we’re creating masking steps, however, most middlewares handle part of the request and delegate the rest to whatever is next in the pipeline. The two results are then concatenated before being passed back up the pipeline.
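The "handle some, delegate the rest, concatenate" pattern can be sketched as follows. This is an illustrative example only: the `(column, kind)` tuples, the set of mask kinds that qualify for a whole-column update, and the step strings are assumptions, not our real data model.

```python
from typing import Callable, List, Tuple

Mask = Tuple[str, str]   # hypothetical: (column, mask_kind)
Handler = Callable[[List[Mask]], List[str]]

# Assumption for this sketch: these kinds can be done in a single UPDATE.
WHOLE_COLUMN_KINDS = {"null", "constant"}


def whole_column_middleware(masks: List[Mask], next_handler: Handler) -> List[str]:
    # Take only the masks this middleware knows how to handle...
    mine = [m for m in masks if m[1] in WHOLE_COLUMN_KINDS]
    rest = [m for m in masks if m[1] not in WHOLE_COLUMN_KINDS]
    my_steps = [f"UPDATE whole column {column} ({kind})" for column, kind in mine]
    # ...delegate the remainder, and concatenate the two results.
    return my_steps + next_handler(rest)


def row_by_row_middleware(masks: List[Mask], next_handler: Handler) -> List[str]:
    # Last resort: handles whatever is left and does not delegate.
    return [f"row-by-row {column} ({kind})" for column, kind in masks]


def convert(masks: List[Mask]) -> List[str]:
    return whole_column_middleware(
        masks, lambda rest: row_by_row_middleware(rest, lambda _: [])
    )
```

Note that in this sketch the output is grouped by which middleware handled each mask, so the steps do not necessarily come back in the same order as the input masks.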

Pipelines with middleware are a very powerful tool, but they aren’t without downsides. One source of confusion I’ve found is that the order of execution isn’t very clear or intuitive, because execution happens both forwards and backwards across the middleware functions: the code before each delegation runs in list order, and the code after it runs in reverse order as the calls return. This can be a problem when the list of middlewares is defined somewhere in code, because it is easy to assume that the listed order is the order in which all of the middleware logic will execute.
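A small tracing example makes the forwards-then-backwards order visible. The middleware names `A`, `B`, and `C` and the helper functions are made up for this sketch.

```python
from typing import Callable, List

Handler = Callable[[list], list]


def make_tracing_middleware(name: str, log: List[str]):
    def middleware(request: list, next_handler: Handler) -> list:
        log.append(f"{name}: before")   # runs on the way down
        result = next_handler(request)
        log.append(f"{name}: after")    # runs on the way back up
        return result
    return middleware


def bind(middleware, next_handler: Handler) -> Handler:
    return lambda request: middleware(request, next_handler)


def compose(middlewares) -> Handler:
    handler: Handler = lambda request: []
    for middleware in reversed(middlewares):
        handler = bind(middleware, handler)
    return handler


log: List[str] = []
pipeline = compose([make_tracing_middleware(name, log) for name in ("A", "B", "C")])
pipeline([])
print(log)
# ['A: before', 'B: before', 'C: before', 'C: after', 'B: after', 'A: after']
```

Any logic placed after the call to `next_handler` in `A` therefore runs last, even though `A` is first in the list.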

In one case, we avoided using the full power of middlewares because the team decided that the resulting code was too complicated. We were implementing a feature called “conditional masking”, where a conditional mask composes a set of other masks and chooses one of them to apply to each row based on some condition. The initial solution had a conditional middleware near the top of the pipeline, which unwrapped the conditional mask into its underlying masks, used the rest of the pipeline to turn those underlying masks into steps, and then reassembled the conditional mask on the way back up. This worked well, but in the end we refactored that conditional middleware to use a separate solution that didn’t delegate down the pipeline.
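For a sense of what that initial unwrap, delegate, and reassemble approach involves, here is a hedged reconstruction in Python. It is not our actual implementation: the class names, the SQL-like condition strings, and in particular the assumption that the rest of the pipeline returns exactly one step per mask in the same order are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SimpleMask:
    column: str
    kind: str


@dataclass(frozen=True)
class ConditionalMask:
    # Each branch pairs a condition (hypothetical SQL-like text) with a mask.
    branches: Tuple[Tuple[str, SimpleMask], ...]


@dataclass(frozen=True)
class Step:
    description: str


@dataclass(frozen=True)
class ConditionalStep:
    branches: Tuple[Tuple[str, Step], ...]


Handler = Callable[[list], list]


def conditional_middleware(masks: list, next_handler: Handler) -> list:
    conditional_steps = []
    plain_masks = []
    for mask in masks:
        if isinstance(mask, ConditionalMask):
            conditions = [condition for condition, _ in mask.branches]
            inner_masks = [inner for _, inner in mask.branches]
            # Unwrap: let the rest of the pipeline convert the inner masks.
            # Assumes one step per mask, returned in the same order.
            inner_steps = next_handler(inner_masks)
            # Reassemble: pair each condition with its converted step.
            conditional_steps.append(
                ConditionalStep(tuple(zip(conditions, inner_steps)))
            )
        else:
            plain_masks.append(mask)
    return conditional_steps + next_handler(plain_masks)


def simple_middleware(masks: list, next_handler: Handler) -> list:
    # Stand-in for "the rest of the pipeline" in this sketch.
    return [Step(f"mask {m.column} with {m.kind}") for m in masks]
```

Even in this reduced form, the middleware calls down the pipeline several times and depends on how later middlewares order their output, which gives some flavor of the complexity involved.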