What is the N+1 Problem in GraphQL?

A crash course on a surprisingly common problem

Once you get beyond the basics of GraphQL, you’ll likely hear people talk about the “N+1 problem.” This might seem scary. It does sound like big O notation, which is usually the last thing you hear before your whiteboard interview implodes. But rest assured, this is a simple concept hiding behind a computer science-y name.

The Situation in Question

Let’s say I have a DB of authors and their books, a simple “has many” relationship. Now, I want to get all my authors, and all their books. In REST, you’d make a route that uses your ORM of choice to do something along the lines of:

{
  route: '/authors/books',
  method: 'GET',
  handler: async () => ORM.getAuthors().getTheirBooks(),
}

Under the hood, it would execute 2 queries: one to get all the authors, and one to get all their books. To use pseudo SQL it would be like:

SELECT *
FROM authors;
-- pretend this returns 3 authors

SELECT *
FROM books
WHERE author_id IN (1, 2, 3); -- the list of the authors' ids

2 queries. Boom. Done. Since the ORM gets all the ids from the first query, matching all the relationships is easy with the second.
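The matching step that happens after those two queries can be sketched in a few lines of JavaScript. This is only an illustration, not how any particular ORM does it: the `authors` and `books` arrays stand in for the rows the two queries would return, and `attachBooks` is a hypothetical helper.

```javascript
// Hypothetical rows, standing in for the results of the two queries above.
const authors = [
  { id: 1, name: 'Author A' },
  { id: 2, name: 'Author B' },
  { id: 3, name: 'Author C' },
];
const books = [
  { id: 10, title: 'Book One', author_id: 1 },
  { id: 11, title: 'Book Two', author_id: 1 },
  { id: 12, title: 'Book Three', author_id: 3 },
];

// Group the books by author_id once, then hand each author its own list.
function attachBooks(authorRows, bookRows) {
  const booksByAuthor = new Map();
  for (const book of bookRows) {
    if (!booksByAuthor.has(book.author_id)) {
      booksByAuthor.set(book.author_id, []);
    }
    booksByAuthor.get(book.author_id).push(book);
  }
  return authorRows.map((author) => ({
    ...author,
    books: booksByAuthor.get(author.id) || [], // authors with no books get []
  }));
}

const result = attachBooks(authors, books);
console.log(result.map((a) => `${a.name}: ${a.books.length} book(s)`));
```

The grouping only works because every book row is available at the same time, which is exactly what the second query provides.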

Why GraphQL has trouble with this

Here’s the issue: this only works because your second query already had a list of every author_id. GraphQL doesn’t work that way, since each resolver function really only knows about its own parent object (don’t worry about context right now). That means your ORM won’t have the luxury of a list of author IDs anymore.

So if we took that same request from above and put it into a GraphQL query:

query {
  authors {
    name
    books {
      title
    }
  }
}

The first layer could have a resolver that hits the DB once and gets all the authors, but that’s it. In the next layer, the books resolver can’t use all those results at once to find all the books. Each call to the books resolver only gets its own parent author. This means our ORM has to hit the DB from one resolver call at a time. Here’s some pseudo code for the GraphQL version:

const schema = `
  type Query {
    authors: [Author]
  }

  type Author {
    id: Int
    name: String
    books: [Book]
  }

  type Book {
    id: Int
    title: String
  }
`

const resolvers = {
  Query: {
    // runs once: a single query for every author
    authors: async () => {
      return ORM.getAllAuthors()
    },
  },
  Author: {
    // runs once per author, and only sees that one parent author
    books: async (authorObj, args) => {
      return ORM.getBooksBy(authorObj.id)
    },
  },
}

And that would create pseudo SQL like this:

SELECT *
FROM authors;

SELECT *
FROM books
WHERE author_id IN (1);

SELECT *
FROM books
WHERE author_id IN (2);

SELECT *
FROM books
WHERE author_id IN (3);

Remember when we used to be efficient? That was nice. This is where the name comes from, by the way. We will always make 1 initial query to the DB and return N results, which means we will have to make N additional DB queries. Personally, I think that means it should be called “1+N” but starting formulas with variables is what all the cool kids do.
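To see the count add up, here is a small simulation. Everything in it is made up for illustration: `fakeORM` is an in-memory stand-in that just counts calls, and the `map` at the bottom only mimics the order in which a GraphQL server would call the resolvers. It is synchronous to keep it short, unlike real resolvers.

```javascript
// A fake in-memory "DB" that counts how many queries it receives.
// All names here are hypothetical and only simulate the resolver flow.
const rows = {
  authors: [{ id: 1 }, { id: 2 }, { id: 3 }],
  books: [
    { id: 10, author_id: 1 },
    { id: 11, author_id: 2 },
    { id: 12, author_id: 3 },
  ],
};
let queryCount = 0;

const fakeORM = {
  getAllAuthors() {
    queryCount += 1; // SELECT * FROM authors
    return rows.authors;
  },
  getBooksBy(authorId) {
    queryCount += 1; // SELECT * FROM books WHERE author_id IN (authorId)
    return rows.books.filter((book) => book.author_id === authorId);
  },
};

// Mimic the execution order: one call for the authors list,
// then the books resolver runs once per author that came back.
const authorList = fakeORM.getAllAuthors();
const resolved = authorList.map((author) => ({
  ...author,
  books: fakeORM.getBooksBy(author.id),
}));

console.log(`${authorList.length} authors took ${queryCount} queries`);
```

With three authors this reports four queries: the 1 for the list plus N = 3 for the books.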

What’s the solution?

Is this the Achilles’ heel of GraphQL? Is the cost of a nice interface all our efficiency? Of course not. There’s a really handy tool that came out right alongside GraphQL called DataLoader. Essentially, it waits while your resolvers each hand over their individual keys. Once it has them, it calls a batch function you provide with all the keys at once, so the DB gets hit once, and that function returns a promise that resolves to an array of the values. DataLoader then passes each value back to the resolver that asked for it. It batches our queries instead of making one at a time.
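The batching idea is small enough to sketch by hand. To be clear, this is not the real DataLoader source: `TinyLoader` and `batchGetBooks` are hypothetical names, the "DB" is an in-memory array, and the sketch assumes Node.js (it uses `process.nextTick`) and skips caching and key de-duplication.

```javascript
// NOT the real DataLoader: a stripped-down sketch of the batching idea only.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // takes an array of keys, resolves to an array of values
    this.queue = [];
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key queued in this tick schedules a single flush.
      if (this.queue.length === 1) {
        process.nextTick(() => this.flush());
      }
    });
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      // Values must come back in the same order as the keys.
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

let queryCount = 0;
const allBooks = [
  { id: 10, title: 'Book One', author_id: 1 },
  { id: 11, title: 'Book Two', author_id: 2 },
  { id: 12, title: 'Book Three', author_id: 3 },
];

// One "query" for every requested author: WHERE author_id IN (...authorIds)
async function batchGetBooks(authorIds) {
  queryCount += 1;
  return authorIds.map((id) => allBooks.filter((b) => b.author_id === id));
}

const bookLoader = new TinyLoader(batchGetBooks);

// Three load calls in the same tick, like the books resolver would make.
const pending = Promise.all([1, 2, 3].map((id) => bookLoader.load(id)));
pending.then((lists) => {
  console.log(`${lists.length} book lists from ${queryCount} query`);
});
```

The real library has the same basic shape: you create a loader with `new DataLoader(batchFn)` and call `loader.load(key)` inside the resolver. It also caches results per key, which is why loaders are usually created fresh for each request.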

New solutions often have new problems, but as long as you learn about all your tools, there’s nothing you won’t be able to fix. So on that note, go check out DataLoader!

Happy coding everyone,

Mike
