At Boxed, from the beginning, our engineers built our core web services using the practices and principles of Service-Oriented Architecture. This upfront investment and foresight helped form the foundation powering our core technology today. Our API architecture is built using domain-driven design for model partitioning, and hexagonal architecture for layered separation of concerns. Lately, we decided to hop on the hype train and join the GraphQL movement. We’re making a huge investment in it and we’re super excited about it, so we’d like to share where we are today.

We’ve been able to leverage the abstractions provided by GraphQL to reduce the boilerplate required to expose new functionality to our clients, improve performance by limiting data returned to only that which was requested, and best of all, expose richer functionality by leveraging the object graph to provide context-aware functionality deep within the domain. I’ll expand upon these points later within this post.

For our GraphQL adoption, there’s been minimal cost. One of my colleagues, Holden McGinnis, wrote an excellent article detailing how, because of our existing infrastructure, we were able to seamlessly expose new functionality into our API using GraphQL. Our path to migration was crystal clear, with little to no rewriting required. This was all enabled by our use of Ports and Adapters from Hexagonal Architecture. The majority of our application business logic is organized with a layered architecture with clear service boundaries, so exposing a new port into the business domain was trivial and allowed us to get up and running quickly.

In this post, I’d like to provide an overview of all of the components that make up our GraphQL server, touching a bit on the layers supporting ports and adapters, and a few of the optimization tools we’ve built around this architecture.

Repositories

Starting from the inside out, the repository component is your data access layer. Repositories are responsible for executing queries against databases (of any type) and, in the context of object-oriented programming, reconstituting plain JavaScript objects into domain entities for use within your domain layer.

A Repository provides a data-store agnostic interface for the business domain to access data. There are many great things that come from using this pattern, but, among them, my favorites are the ability to add higher-order caching functions over expensive database calls, the ease of stubbing out database queries to facilitate reaching specific edge cases in testing, and lastly, being able to view all related database queries in the same place.

Leveraging a repository abstraction over a direct database query in your business logic makes it easy to understand the boundaries where your application logic starts and ends, because infrastructure concerns (how your methods are being called, and where the data is written) are abstracted away. I’ve written another blog post detailing repositories which can be found here, but, for the purposes of this article, the closest thing to any data in the database is a repository, and any database access has to go through the repository interface. Knowing this, we’ve created countless utilities and helpers that make working with the repository abstraction quite easy.
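As a minimal sketch of the pattern (all names here are illustrative, not from our codebase), a repository exposes domain-friendly methods while hiding the data store behind them:

```javascript
// A minimal repository sketch: callers only ever see findById/save,
// never the underlying data store. Names are illustrative.
class UserRepository {
  constructor(db) {
    this.db = db; // could be a SQL client, a Mongo collection, or an in-memory map
  }

  async findById(id) {
    // Reconstitute a raw record into a plain domain object.
    const row = await this.db.get(id);
    return row ? { id: row.id, name: row.name } : null;
  }

  async save(user) {
    await this.db.set(user.id, user);
    return user;
  }
}

// An in-memory "database" makes it trivial to stub queries in tests.
const inMemoryDb = {
  rows: new Map(),
  async get(id) { return this.rows.get(id); },
  async set(id, row) { this.rows.set(id, row); },
};

const userRepository = new UserRepository(inMemoryDb);
```

Swapping `inMemoryDb` for a real database client requires no change to any caller, which is exactly what makes the testing and caching benefits above cheap to get.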

Because we have such an affinity for functional programming, it’s been incredibly easy for us to build performance-tuning abstraction primitives into our core infrastructure, which I’ll touch on a bit later.

Services

When writing almost any web application, you need to execute business logic between your request handler (or event handler for an async operation) and your repository layer. If your repository only handles “reading or writing to the database”, then almost any conditional logic, control flow, or orchestration belongs in the service layer; subject to the caveat that “adapters” related to the ports (db or trigger specific mappings) should be colocated with their respective layers.

Having a service layer effectively allows you to share code between request handlers (in the case of a web API) and asynchronous operations such as a Kafka or queue consumer. It also encourages code re-use because functions can be broken down into smaller pieces and composed together to perform larger operations. Virtually all of our business logic lives in services in one form or another. It’s for this reason that exposing our functionality through GraphQL was just a matter of implementing resolvers to parse request arguments and call existing service methods.

Our services always receive arguments from controllers (for HTTP request handling), a GraphQL resolver, or another service; and very often they’re communicating downstream to other services, repositories for data access, or with third parties. As our services evolved we found that there was a ton of boilerplate involved in the reconstitution of complex objects, so we favor functional patterns, which means sending and receiving plain JSON objects as inputs and outputs.
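A hypothetical service following that shape might look like this (the names and the injected repositories are illustrative, not our real API):

```javascript
// A service sketch: plain objects in, plain objects out, with repositories
// injected so the business logic never touches the data store directly.
function createOrderService({ orderRepository, userRepository }) {
  return {
    // Orchestration and conditional logic live here, not in the repository.
    async getOrderSummary({ orderId }) {
      const order = await orderRepository.findById(orderId);
      if (!order) return { error: 'ORDER_NOT_FOUND' };
      const user = await userRepository.findById(order.userId);
      return {
        orderId: order.id,
        placedBy: user ? user.name : 'unknown',
        total: order.items.reduce((sum, item) => sum + item.price, 0),
      };
    },
  };
}
```

The same `getOrderSummary` can then be called from an HTTP controller, a GraphQL resolver, or a queue consumer without modification.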

This has made leveraging performance infrastructure components incredibly easy. Three primary components we use are batching, caching, and DataLoaders.

Caching

Caching is where the results of a slow data source are stored in a much faster data store, to improve retrieval performance when data is allowed to be stale. Caching is done at many levels, from browser to DNS to individual database records in memory within a container. In the context of this article, we leverage heavy caching between our data access layer and our service layer, which has been enabled in large part by our heavy adoption of the repository design pattern. Because the business logic has no reliance on the underlying data store, we can seamlessly cache the results of almost any repository call, dramatically improving runtime performance.
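One way to express this as a higher-order function (a sketch under assumed names; a production store would be Redis or similar rather than a `Map`):

```javascript
// A higher-order caching sketch: wrap any repository method in a TTL cache
// without the business logic knowing anything changed.
function withCache(fn, { ttlMs = 60000, keyFn = JSON.stringify } = {}) {
  const cache = new Map();
  return async (...args) => {
    const key = keyFn(args);
    const hit = cache.get(key);
    if (hit && Date.now() - hit.at < ttlMs) return hit.value; // fresh enough
    const value = await fn(...args);
    cache.set(key, { value, at: Date.now() });
    return value;
  };
}

// An "expensive" lookup that counts how often it really runs.
let dbCalls = 0;
const slowFind = async (id) => { dbCalls += 1; return { id }; };
const cachedFind = withCache(slowFind);
```

Because the wrapper has the same signature as the wrapped function, callers are unaware of the cache entirely.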

Sometimes, caching is just not enough. Sometimes, queries take a really long time. Some are from downstream or external services which you have no control over. What happens when your cache is cleared, and you receive two concurrent requests for the same data? That’s when the next layer comes in — batching.

Batching

To illustrate with contrast, caching solves the problem where you’ve received a request for information that you’ve recently fetched and you can return it without another round trip to the database. Batching solves the problem where you’ve received more than one request for the same data (which you don’t yet have), and you’re still receiving additional requests as you execute the first call. In a caching-only scenario, the underlying cached function will be called twice. When the first call resolves, the information can be cached and returned immediately to requestors. With batching, both requestors will be queued up to receive the results of the original call — so when the first call resolves, every queued call gets its results all at once. The net result is that every batched function consumer piggybacks off of the initial requestor and receives their results in less time than they would otherwise. This abstraction also dramatically reduces the load on downstream services.
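The mechanism above can be sketched as in-flight de-duplication: while a call for a given key is outstanding, later callers receive the same promise instead of triggering a new call (names are illustrative):

```javascript
// A batching sketch: concurrent calls for the same key share one underlying
// request instead of each hitting the slow source.
function withBatching(fn, keyFn = JSON.stringify) {
  const inFlight = new Map();
  return (...args) => {
    const key = keyFn(args);
    if (inFlight.has(key)) return inFlight.get(key); // piggyback on the first call
    const promise = Promise.resolve()
      .then(() => fn(...args))
      .finally(() => inFlight.delete(key)); // later calls start fresh
    inFlight.set(key, promise);
    return promise;
  };
}

let upstreamCalls = 0;
const fetchPrice = async (sku) => { upstreamCalls += 1; return { sku, price: 9.99 }; };
const batchedFetchPrice = withBatching(fetchPrice);
```

Unlike the cache wrapper, nothing is retained after the call resolves, so batching never serves stale data.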

Data Loaders

Loaders are instances of the DataLoader. DataLoader is a “generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.” It sounds a lot like the caching and batching we’ve discussed above, but it has a subtle but significant difference. The DataLoader (created by Facebook) is an in-memory cache object which sits at the request level. As a request moves throughout the middleware pipeline, request context can be extracted (auth, device, user agent, etc.) and, as data is fetched, it’s stored within a loader for a specific API. Subsequent requests can then query the DataLoader rather than the API and retrieve in-memory results if they’ve been previously queried.

So, in short, use the DataLoader for caching highly specific contextual information which is stored in memory at the request level. Use generic batching and caching (outlined above) for more generic between-the-requests caching.
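To make the behavior concrete, here is a deliberately simplified loader in the spirit of DataLoader (the real library is far more robust — this toy version exists only to show the collect-then-batch-then-memoize cycle):

```javascript
// A simplified sketch of what a DataLoader does: collect keys requested in
// the same tick, issue ONE batch call, and memoize results for the lifetime
// of the request.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values in the same order
    this.cache = new Map(); // per-request memoization
    this.queue = [];        // keys collected during the current tick
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key); // already requested
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

let batchCalls = 0;
const userLoader = new TinyLoader(async (ids) => {
  batchCalls += 1;
  return ids.map((id) => ({ id, name: `user-${id}` }));
});
```

Because the cache lives on the loader instance, creating a fresh loader per request gives you memoization that can safely incorporate request context.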

Controllers

Controllers are port handlers that parse incoming requests into method parameters to call service methods. A port can be any event trigger; a Kafka consumer reading a message, a queue consumer consuming an event, an HTTP request, a GraphQL resolver — anything that needs to access the domain — will come into your application through a port. A controller method is a handler meant for parsing that request in whatever form it takes and adapting it into something consumable by the service layer. The controller is also responsible for taking the results from the service call, adapting it, and sending it back to the client (where applicable, such as an HTTP request). A controller is just a central place for entry points into the application architecture.
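A hypothetical HTTP controller in this style (Express-flavored, with illustrative names) parses the request, calls the service, and adapts the result back to the client:

```javascript
// A controller sketch: parse transport-specific input, call the service,
// adapt the result. The same service method could just as easily be called
// from a GraphQL resolver or a queue consumer.
function createOrderController(orderService) {
  return {
    // Express-style HTTP handler
    async getOrder(req, res) {
      const summary = await orderService.getOrderSummary({
        orderId: Number(req.params.orderId), // adapt: path param -> number
      });
      if (summary.error) return res.status(404).json(summary);
      return res.status(200).json(summary);
    },
  };
}
```

Note that the controller knows about HTTP status codes and path parameters, while the service it calls knows about neither — that is the port/adapter boundary in practice.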

These are the main components that make up our GraphQL API. To further illustrate, here’s the general flow.

Application Flow

1. Repositories & Services are constructed during application bootstrapping. They are singletons.

2. A request comes in from express.

3. Middleware functions are executed, and the request context is constructed.

4. DataLoaders are configured and bound to the request context (res.locals).

5. GraphQL resolvers are executed.

6. DataLoaders are queried.

7. Services execute business logic.

8. Repositories fetch the required data.
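The middleware step that binds loaders to the request context can be sketched like this (names are illustrative, and a tiny memoizing function stands in for a real DataLoader):

```javascript
// Express-style middleware that binds fresh, request-scoped loaders to
// res.locals. Repositories/services are singletons created at bootstrap;
// loaders are NOT — a new set per request keeps one user's memoized data
// from leaking into another user's request.
function createLoaderMiddleware(repositories) {
  return function loaderMiddleware(req, res, next) {
    res.locals = res.locals || {};
    res.locals.loaders = {
      userById: makeSimpleLoader((id) => repositories.users.findById(id)),
    };
    next();
  };
}

// Minimal per-request memoizing loader (a stand-in for DataLoader).
function makeSimpleLoader(fetchFn) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) cache.set(key, Promise.resolve(fetchFn(key)));
    return cache.get(key);
  };
}
```

Resolvers further down the pipeline then read `res.locals.loaders` from the request context instead of calling repositories directly.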

There are many layers here and it can be very daunting at first to implement them all at once, especially if unfamiliar. It’s important to remember that your mileage may vary and more often than not, most of the layers here aren’t required for the feature that you’re building. Each layer has a specific utility towards optimization or indirection that may or may not help your project. For our mature domains, we’ve invested heavily in these abstractions and they’ve been instrumental in building our platform.

As promised at the start of this article, I’d like to address how GraphQL has helped us.

Reduction of Boilerplate

GraphQL exposes all of your API functionality through a single endpoint which is automatically documented from the GraphQL schema. The schema also provides input type validation automatically. This functionality alone adds a ton of productivity, but that isn’t all. By exposing functionality at points in the graph instead of by use case (as with a REST endpoint), that functionality is systematically available to anything with access to that node within the graph. This opens up a ton of possibilities from a product perspective, but also makes your code very maintainable since it’s DRY.

Performance Improvements

GraphQL forces clients to explicitly define which fields they want returned from the API. In turn, the API only returns the requested fields over the network, which in some cases reduces the size of the payload and thus improves performance. Oftentimes though, the data on the server isn’t simply sitting in memory waiting for a request. Objects are sometimes spread across many tables in a database, or across disparate systems.

One of the huge benefits of using GraphQL is that on the server, you’re given tooling which dramatically reduces resource consumption to the minimum required to satisfy the request. GraphQL has a concept called resolvers, which are handlers for determining the values of specific fields within the graph, and they only run if needed. When working with a RESTful call, usually your REST API has a pre-set response type and fetches all the data on every request. With GraphQL, the query determines which code paths actually run, and they run lazily. This means that requests for tiny bits of data can be optimized to fetch only a few fields from the database, while heavy requests can use alternative strategies.
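That laziness can be illustrated with a toy field executor (a real GraphQL server derives the requested fields from the parsed query document; this sketch just takes them as a list, and all names are hypothetical):

```javascript
// Why resolvers are lazy: only fields named in the query ever run.
const orderResolvers = {
  id: (order) => order.id,
  total: (order) => order.items.reduce((sum, item) => sum + item.price, 0),
  // An "expensive" field: in a real API this might call another service.
  shipmentStatus: async (order, context) => {
    context.expensiveCalls += 1;
    return 'IN_TRANSIT';
  },
};

// Toy executor: walk only the requested fields against the resolver map.
async function resolveFields(resolvers, parent, requestedFields, context) {
  const result = {};
  for (const field of requestedFields) {
    result[field] = await resolvers[field](parent, context); // runs only if requested
  }
  return result;
}
```

A query that never asks for `shipmentStatus` never pays for it, which is exactly the property that lets light queries stay cheap.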

Where RESTful APIs might advocate breaking up endpoints into single responsibilities, GraphQL enables you to ask for whatever you need in a single request. This allows the API author to leverage caching and re-use to minimize the resources required.

Context Awareness

One of the more interesting benefits of GraphQL that we’ve seen is that where a RESTful call generally has a “top-down” approach towards resolving a request, GraphQL cuts through the layers because a resolver is invoked at virtually every point within the call stack and grants access to request-level stateful objects.

Wherever needed, logic can be written in an elegant way to access contextual properties set up by middleware, such as user information (auth), client information (user-agent), application locals (dependency injection), the query, or parent objects within the graph.

The result is that you have more granular control in your ports layer, with field-level role-based access, memoization (DataLoader), and request-specific properties. This is handled by the framework, so you don’t need to explicitly define a “context” object and manually pass it around your codebase — it’s built into GraphQL, so your service functions can accept only what they need.

Conclusion

Introducing GraphQL into our ecosystem has been transformative and we’re really excited about it. I’ve covered some of the core building blocks of our API, but it’s only the beginning. We’ve got a lot of really exciting stuff in the pipeline and we’d love to hear feedback from the community. If it sounds like something you’re interested in working on — we’re hiring!

Stay tuned to our blog to find out more! Thank you for reading.