Compressing GraphQL response

Using basic principles of compression to reduce the response size

If you are reading this you probably don’t need an intro to GraphQL – you chose GraphQL because it is:

Compositional — combine multiple resource queries into a single request.

Declarative — declare exactly which resource fields you want to get.

Strongly typed — inspect the API to learn about the available resources.

Having all of these interactions defined by a standard is superior to what is loosely referred to as a “REST API” (GraphQL itself can be seen as a specification of a RESTful API); the latter is an inevitable journey of reinventing the wheel.

I started using GraphQL in 2017 and have since used it for every project without exception. The learning curve was steep, but it has been an enjoyable journey. However, I soon hit a limitation: the response size can become huge. This happens when you normalize your resources (as you should).

Normalized schema

Consider a resource that describes events (an event being a screening of a movie in a cinema). You would query the events resource to obtain a list of events matching your criteria (date, location, etc.), e.g.

type Query {
  events (
    after: String
    coordinates: CoordinatesInput
    first: Int
    fromDate: String
    movieId: ID
    # Used to restrict results to the set number of nearest venues (relative to the coordinates).
    # Defaults to 5. Maximum 10.
    nearest: Int
    toDate: String
    venueId: ID
  ): EventsConnection!
}

From the perspective of normalization, this resource makes a lot of sense. However, the kicker is that most of the time you will want to return additional information about the event, e.g. ID, name, [..] of the movie and venue associated with that event.

Here is a real-life example illustrating the latter scenario:

query getEvents($coordinates: CoordinatesInput, $fromDate: String, $toDate: String, $movieId: ID, $venueId: ID) {
  events(coordinates: $coordinates, fromDate: $fromDate, toDate: $toDate, movieId: $movieId, venueId: $venueId, nearest: 10) {
    edges {
      node {
        id
        date
        time
        timestamp
        url
        displayTicketPrice
        reservationIsEnabled
        auditorium {
          id
          name
          seatingPlan {
            id
            seatCount
            __typename
          }
          __typename
        }
        movie {
          id
          name
          runtime
          synopsis
          credits(first: 5) {
            edges {
              node {
                creditOrder
                character {
                  id
                  name
                  __typename
                }
                person {
                  id
                  name
                  headshotImageUrl
                  __typename
                }
                __typename
              }
              __typename
            }
            __typename
          }
          directors {
            id
            person {
              id
              name
              headshotImageUrl
              __typename
            }
            __typename
          }
          posterImageUrl
          genres {
            id
            name
            __typename
          }
          releaseYear
          trailerYoutubeId
          __typename
        }
        venue {
          id
          name
          cinema {
            id
            name
            url
            logo {
              svgUrl
              __typename
            }
            __typename
          }
          coordinates {
            latitude
            longitude
            __typename
          }
          address {
            id
            street1
            postcode
            __typename
          }
          __typename
        }
        attributes {
          abbreviatedName
          id
          name
          nid
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

This example is taken from the front page of https://go2cinema.com/ (a cinema showtimes discovery platform). The query retrieves all of the information required to render the list of movies showing in cinemas together with their events. It works the other way around from how the results are displayed, though: it starts with the events and retrieves the movie and venue for each event.

UI used by GO2CINEMA.com to list the obtained results. The user navigates by first discovering the movie, then identifying the venue, and finally picking the event.

You should already start seeing the issue: if the query responds with 500 events for 10 different movies and 10 different venues, the response will repeat more or less the same information about the movies and venues 500 times. I captured an example response for 279 events. The response size is massive: 1008KB.
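To get a feel for how quickly the repetition adds up, here is a rough sketch using made-up data (10 movies and 500 events; the placeholder names and the 300-character synopsis are assumptions, not GO2CINEMA data). It compares the serialized size of the full response with the size of the unique movie records.

```javascript
// Hypothetical data: 10 movies, each with a 300 character stand-in synopsis.
const movies = Array.from({ length: 10 }, (_, index) => ({
  __typename: 'Movie',
  id: index + 1,
  name: `Movie ${index + 1}`,
  synopsis: 'x'.repeat(300),
}));

// 500 events, each embedding one of the 10 movies.
const events = Array.from({ length: 500 }, (_, index) => ({
  __typename: 'Event',
  id: index + 1,
  movie: movies[index % movies.length],
}));

// JSON.stringify writes out the full movie for every event that embeds it.
// For ASCII-only data the string length equals the size in bytes.
const responseLength = JSON.stringify({ data: { events } }).length;
const uniqueMoviesLength = JSON.stringify(movies).length;

console.log({ responseLength, uniqueMoviesLength });
```

In this sketch almost all of the serialized response consists of movie records that have already been sent earlier in the same document.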

GZIP

GZIP is not going to be of much help here. True, the gzipped response drops to 105KB. However, upon receiving the response the browser still needs to parse the entire JSON document. Parsing a 1MB JSON string is (a) time-consuming and (b) memory-expensive.

Deduplication

The good news is that we do not need to return the body of duplicate records. Every GraphQL resource can be identified by a combination of the resource path (the same resource requested from a different context can have a different shape), __typename and id. This means that we can strip all of the repeated instances of a resource when the server builds the response and reconstruct the intended response on the client side.

Let's simplify the original example to illustrate the intended input and output.

events(fromDate: "2018-02-14", toDate: "2018-02-14") {
  id
  date
  time
  movie {
    id
    name
    synopsis
    __typename
  }
  __typename
}

(Note: This is not a valid GO2CINEMA query. For illustration purposes only.)

The response of this query will be something along the lines of:

{
  "data": {
    "events": [
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 1,
        "movie": {
          "__typename": "Movie",
          "id": 1,
          "name": "The Shape of Water",
          "synopsis": "An other-worldly story, set against the backdrop of Cold War era America circa 1962, where a mute janitor working at a lab falls in love with an amphibious man being held captive there and devises a plan to help him escape."
        },
        "time": "17:05"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 2,
        "movie": {
          "__typename": "Movie",
          "id": 1,
          "name": "The Shape of Water",
          "synopsis": "An other-worldly story, set against the backdrop of Cold War era America circa 1962, where a mute janitor working at a lab falls in love with an amphibious man being held captive there and devises a plan to help him escape."
        },
        "time": "18:50"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 3,
        "movie": {
          "__typename": "Movie",
          "id": 2,
          "name": "Three Billboards Outside Ebbing, Missouri",
          "synopsis": "After seven months have passed without a culprit in her daughter's murder case, Mildred Hayes makes a bold move, painting three signs leading into her town with a controversial message directed at Bill Willoughby, the town's revered chief of police. When his second-in-command Officer Jason Dixon, an immature mother's boy with a penchant for violence, gets involved, the battle between Mildred and Ebbing's law enforcement is only exacerbated."
        },
        "time": "17:45"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 4,
        "movie": {
          "__typename": "Movie",
          "id": 2,
          "name": "Three Billboards Outside Ebbing, Missouri",
          "synopsis": "After seven months have passed without a culprit in her daughter's murder case, Mildred Hayes makes a bold move, painting three signs leading into her town with a controversial message directed at Bill Willoughby, the town's revered chief of police. When his second-in-command Officer Jason Dixon, an immature mother's boy with a penchant for violence, gets involved, the battle between Mildred and Ebbing's law enforcement is only exacerbated."
        },
        "time": "19:25"
      }
    ]
  }
}

Here again the movie information is repeating for every event. What we want is to strip the repeating information and keep only the information required to identify the resource, i.e.

{
  "data": {
    "events": [
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 1,
        "movie": {
          "__typename": "Movie",
          "id": 1,
          "name": "The Shape of Water",
          "synopsis": "An other-worldly story, set against the backdrop of Cold War era America circa 1962, where a mute janitor working at a lab falls in love with an amphibious man being held captive there and devises a plan to help him escape."
        },
        "time": "17:05"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 2,
        "movie": {
          "__typename": "Movie",
          "id": 1
        },
        "time": "18:50"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 3,
        "movie": {
          "__typename": "Movie",
          "id": 2,
          "name": "Three Billboards Outside Ebbing, Missouri",
          "synopsis": "After seven months have passed without a culprit in her daughter's murder case, Mildred Hayes makes a bold move, painting three signs leading into her town with a controversial message directed at Bill Willoughby, the town's revered chief of police. When his second-in-command Officer Jason Dixon, an immature mother's boy with a penchant for violence, gets involved, the battle between Mildred and Ebbing's law enforcement is only exacerbated."
        },
        "time": "17:45"
      },
      {
        "__typename": "Event",
        "date": "2018-02-14",
        "id": 4,
        "movie": {
          "__typename": "Movie",
          "id": 2
        },
        "time": "19:25"
      }
    ]
  }
}

Every event is unique, so none of the events are stripped. The associated movies, however, repeat, and this time only the first instance of each movie contains the attributes describing it. Notice that all instances of a movie after the first no longer define the name and synopsis fields.

GZIP achieves a similar result at a lower level. The difference is that removing the duplicate data from the JSON document means that (a) a smaller document needs to be parsed and (b) duplicate objects can be assigned by reference (therefore reducing the memory footprint).

Deduplication can be achieved using the deflate method of graphql-deduplicator.
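For readers who want to see the idea in code, the following is a minimal sketch of the deduplication step described above. It is not the graphql-deduplicator source: the names deflateSketch and deflateNode are hypothetical, and the sketch assumes that every resource to be deduplicated exposes both __typename and id.

```javascript
// Hypothetical sketch of server-side deduplication (not the library's code).
const deflateNode = (node, seen, path) => {
  if (Array.isArray(node)) {
    // Array indices are not part of the path, so list items share one path.
    return node.map((item) => deflateNode(item, seen, path));
  }
  if (node === null || typeof node !== 'object') {
    return node; // scalars are returned unchanged
  }
  const identifiable = node.__typename !== undefined && node.id !== undefined;
  if (identifiable) {
    // A resource is identified by its path, __typename and id.
    const key = `${path.join('.')}|${node.__typename}|${node.id}`;
    if (seen.has(key)) {
      // Repeated instance: keep only what is needed to identify it.
      return { __typename: node.__typename, id: node.id };
    }
    seen.add(key);
  }
  const result = {};
  for (const [field, value] of Object.entries(node)) {
    result[field] = deflateNode(value, seen, [...path, field]);
  }
  return result;
};

const deflateSketch = (response) => deflateNode(response, new Set(), []);

// Usage with a shortened version of the example response.
const movie = { __typename: 'Movie', id: 1, name: 'The Shape of Water' };
const deflated = deflateSketch({
  data: {
    events: [
      { __typename: 'Event', id: 1, time: '17:05', movie: { ...movie } },
      { __typename: 'Event', id: 2, time: '18:50', movie: { ...movie } },
    ],
  },
});

console.log(JSON.stringify(deflated.data.events[1].movie));
// prints {"__typename":"Movie","id":1}
```

Including the path in the key is what keeps the same resource, requested with a different shape elsewhere in the query, from being stripped by mistake.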

Reduplication

If you have implemented the deduplication logic in the backend, then the client needs to be able to reconstruct the original response.

The original response can be reconstructed using the inflate method of graphql-deduplicator.
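The reverse step can be sketched in the same spirit. Again, this is an illustration rather than the library's implementation: inflateSketch and inflateNode are hypothetical names, and the sketch assumes the response was deduplicated in document order, so the first instance of each resource is the complete one.

```javascript
// Hypothetical sketch of client-side reduplication (not the library's code).
const inflateNode = (node, index, path) => {
  if (Array.isArray(node)) {
    return node.map((item) => inflateNode(item, index, path));
  }
  if (node === null || typeof node !== 'object') {
    return node; // scalars are returned unchanged
  }
  const identifiable = node.__typename !== undefined && node.id !== undefined;
  const key = identifiable
    ? `${path.join('.')}|${node.__typename}|${node.id}`
    : null;
  if (key !== null && index.has(key)) {
    // Repeated instance: reuse the first, complete instance by reference.
    return index.get(key);
  }
  const result = {};
  if (key !== null) {
    index.set(key, result);
  }
  for (const [field, value] of Object.entries(node)) {
    result[field] = inflateNode(value, index, [...path, field]);
  }
  return result;
};

const inflateSketch = (response) => inflateNode(response, new Map(), []);

// Usage with a shortened, already deduplicated response.
const inflated = inflateSketch({
  data: {
    events: [
      {
        __typename: 'Event',
        id: 1,
        movie: { __typename: 'Movie', id: 1, name: 'The Shape of Water' },
      },
      { __typename: 'Event', id: 2, movie: { __typename: 'Movie', id: 1 } },
    ],
  },
});

console.log(inflated.data.events[1].movie.name);
// prints The Shape of Water
console.log(inflated.data.events[0].movie === inflated.data.events[1].movie);
// prints true
```

Because repeated instances point at the same object, the reconstructed response does not pay the memory cost of the duplicates. The trade-off is that mutating one instance changes all of them, so treat the result as read-only.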

graphql-deduplicator

In this article, I have described a compression method that can be used to reduce the GraphQL response size and reduce the amount of time and memory it takes to parse the response. You can either implement this logic yourself or use an abstraction that I have created – https://github.com/gajus/graphql-deduplicator.

graphql-deduplicator is designed to work with any GraphQL backend and client. Instructions for using graphql-deduplicator with the Apollo stack are included in the project documentation.

Going back to the original GO2CINEMA.com example: the equivalent deduplicated response is just 193KB (28KB gzipped). That is an 81% reduction compared to the pre-GZIP response size.

Compatibility with the GraphQL specification

Remember that deduplication is not part of the GraphQL specification. Therefore, you should not enable deduplication by default; otherwise, clients that do not reduplicate the response will produce unexpected results. Instead, enable deduplication conditionally, e.g. using a GET parameter deduplicate.
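One way to make this opt-in is to check the request URL before formatting the response. The sketch below is an assumption-laden illustration: the value 1 for the deduplicate parameter, the /graphql path, and the shouldDeduplicate and formatResponse helpers are all hypothetical, and the deduplicate argument stands in for whatever deduplication routine you use.

```javascript
// Hypothetical helper: opt in only when the client sends ?deduplicate=1.
const shouldDeduplicate = (requestUrl) => {
  // The base URL is only there so that relative request paths can be parsed.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  return searchParams.get('deduplicate') === '1';
};

// Hypothetical helper: `deduplicate` is the routine applied to opted-in
// requests; every other request receives the unmodified response.
const formatResponse = (response, requestUrl, deduplicate) =>
  shouldDeduplicate(requestUrl) ? deduplicate(response) : response;

// Usage with a stand-in routine that only marks the response.
const markDeduplicated = (response) => ({ ...response, deduplicated: true });
const response = { data: { events: [] } };

console.log(formatResponse(response, '/graphql?deduplicate=1', markDeduplicated));
console.log(formatResponse(response, '/graphql', markDeduplicated));
```

Clients that know how to reduplicate add the parameter to their requests; all other clients keep receiving a response that conforms to the specification.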