Gatsby queries each page's data during the build step & stores it in a JSON file. When a user hovers over a `<Link>`, Gatsby pre-fetches its JSON file & the next page loads almost instantly.

These JSON files are stored in `public/page-data/[page-name]/page-data.json`. For example, the data for the home page of this website lives at:

page-data/index/page-data.json

```json
{
  "componentChunkName": "component---src-pages-index-tsx",
  "path": "/",
  "results": {
    "data": { "posts": { "nodes": { ... } } }
  },
  "pageContext": { "isCreatedByStatefulCreatePages": true }
}
```

There are a few metadata fields (`componentChunkName`, `path`, `pageContext`), and the rest is the query result. The more data your query returns, the larger this file becomes. Because of this, querying only the data your page needs may save you a bit of data.

If you have a page with a considerable amount of data, say, 10,000 items, it might make sense to turn that page into a template & query the data in chunks of 1,000 instead. It might even be possible to implement infinite scrolling by fetching the page-data.json of the generated pages.
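As a sketch of that idea: since the page-data URL layout is predictable, the browser could compute it from a page path and fetch it directly. The `pageDataUrl` helper below is a hypothetical name, assuming the `public/page-data/[page-name]/page-data.json` layout described above:

```javascript
// Build the page-data URL for a given Gatsby page path.
// Hypothetical helper; the layout follows public/page-data/[page-name]/page-data.json.
const pageDataUrl = (pagePath) => {
  // The index page lives under 'index'; other paths drop their surrounding slashes.
  const name = pagePath === '/' ? 'index' : pagePath.replace(/^\/|\/$/g, '')
  return `/page-data/${name}/page-data.json`
}

// In the browser, the next chunk of items could then be loaded like:
// fetch(pageDataUrl('/blog/2/')).then((res) => res.json())

console.log(pageDataUrl('/'))        // /page-data/index/page-data.json
console.log(pageDataUrl('/blog/2/')) // /page-data/blog/2/page-data.json
```

Note that this path scheme is an internal detail of Gatsby's build output, so it could change between versions.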

Gatsby newcomers might make the mistake of querying all the data and then filtering it in the browser, or doing calculations in the browser that could have been done during the build step.

Let's explore a few ways to query leaner data.

This post got too long, so I cut it into a few shorter ones. If you're interested, follow me on Twitter or subscribe.

Filter, Sort, Skip, Limit

An easy win is to apply filtering & sorting. The official docs on this topic are short, sweet, and armed with live examples. The ones I find a bit hard to digest are `in` and `elemMatch`, which I'll demonstrate below.

Let's say we have a node of type `Pet`:

```graphql
type Food {
  brand: String!
  name: String!
}

type Pet {
  id: String!
  foods: [Food!]!
  nicknames: [String!]!
}
```

`[String!]!` means `pet.nicknames` must be an array of strings, though it can be an empty one. I used to think that declaring the content type as non-null (note the `!` after `String`) prevents an empty array; that's wrong.

The `in` operator gives us the intersection of two arrays, e.g., `[3]` is the intersection of `[1,2,3]` and `[3,4,5]`.
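In plain JavaScript, that intersection idea looks like this (a throwaway sketch, not Gatsby code):

```javascript
// Intersection of two arrays: the elements of `a` that also appear in `b`.
const intersection = (a, b) => a.filter((x) => b.includes(x))

console.log(intersection([1, 2, 3], [3, 4, 5])) // [3]
```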

This query matches all pets with the nickname 'Pull':

```graphql
query {
  allPet(filter: { nicknames: { in: ["Pull"] } }) {
    nodes {
      ...
    }
  }
}
```

This query matches all pets with the nickname 'Pull' or 'Push':

```graphql
query {
  allPet(filter: { nicknames: { in: ["Pull", "Push"] } }) {
    nodes {
      ...
    }
  }
}
```

The confusing part is that `in` only accepts an array, so even if we only want to match a single name, we still need to pass in an array.

Wait, what if we want to match a pet whose nicknames include both 'Pull' and 'Push'? Even though sift (the package that powers this query syntax) has a helpful operator for that, named `$all`, it is not currently implemented in Gatsby. Please open an issue if you need it.
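Until `$all` is available, one workaround is to query with `in` (matching either nickname) and then narrow the result down in JavaScript. `pets` below is a hypothetical array shaped like the query result:

```javascript
// Hypothetical query result: pets matched by in: ["Pull", "Push"].
const pets = [
  { id: 'a', nicknames: ['Pull'] },
  { id: 'b', nicknames: ['Pull', 'Push'] },
  { id: 'c', nicknames: ['Push', 'Fetch'] },
]

// Keep only pets that have *both* nicknames, i.e. the $all semantics.
const wanted = ['Pull', 'Push']
const both = pets.filter((pet) => wanted.every((n) => pet.nicknames.includes(n)))

console.log(both.map((p) => p.id)) // ['b']
```

Doing this narrowing in `gatsby-node.js` (rather than the browser) keeps the extra data out of page-data.json.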

The `elemMatch` operator allows you to filter on fields of objects inside an array.

This query matches all pets whose foods belong to the brand 'A':

```graphql
query {
  allPet(filter: { foods: { elemMatch: { brand: { eq: "A" } } } }) {
    nodes {
      ...
    }
  }
}
```

Inside `{ brand: ... }`, we can make use of all the other operators, not just `eq`.
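Conceptually, `elemMatch` behaves like `Array.prototype.some` with a predicate built from the inner operators. A minimal sketch of that mental model (not Gatsby's actual implementation):

```javascript
// Rough model of elemMatch: does any element of the array satisfy the predicate?
const elemMatch = (arr, pred) => arr.some(pred)

const pet = {
  id: 'a',
  foods: [
    { brand: 'A', name: 'Kibble' },
    { brand: 'B', name: 'Treats' },
  ],
}

// Equivalent of filter: { foods: { elemMatch: { brand: { eq: "A" } } } }
console.log(elemMatch(pet.foods, (food) => food.brand === 'A')) // true
```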

Unsortable Field

What if the field we want to sort by is not sortable, or requires additional calculation?

Say we're building a simple e-commerce site where, each day, the most attractive items are featured on the front page. For an item to be featured, it has to be 'trending' and within a certain price range. The site owner gives us this formula:

```js
score = weeklyPageView * 1000 / Math.max(price, 10)
```
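Extracted as a plain function (with hypothetical sample numbers), the formula behaves like this; the `Math.max(price, 10)` floor stops very cheap items from getting runaway scores:

```javascript
// The site owner's scoring formula: views weighted against price,
// with the price clamped to a floor of 10.
const score = (weeklyPageView, price) =>
  (weeklyPageView * 1000) / Math.max(price, 10)

console.log(score(500, 5))  // 50000 — a $5 item is scored as if it cost $10
console.log(score(500, 50)) // 10000
```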

We know that we can add new fields with `createResolvers`. Since the fields `weeklyPageView` and `price` are already available on `Product`, we can extract them from `src`.

```js
exports.createResolvers = ({ createResolvers }) => {
  createResolvers({
    Product: {
      score: {
        type: 'Float!',
        resolve: function (src, args, ctx) {
          const { weeklyPageView, price } = src
          return (weeklyPageView * 1000) / Math.max(price, 10)
        },
      },
    },
  })
}
```

Now the score field is added to our product; we can go ahead & sort by it, right?

```
Errors: Expected type ProductFieldsEnum, found score.
```

False. It turns out `createResolvers` runs last in schema generation & Gatsby doesn't generate input types for fields modified or added with it. We'd have to do this with `createTypes` instead.

In my last post, I used this action to redefine the typing of a field with the GraphQL SDL. Today we'll use the custom type builders.

```js
exports.createSchemaCustomization = ({ actions, schema }) => {
  const { createTypes } = actions
  createTypes([
    schema.buildObjectType({
      name: 'Product',
      interfaces: ['Node'],
      extensions: { infer: true },
      fields: {
        score: {
          type: 'Float!',
          resolve: function (src, args, ctx) {
            const { weeklyPageView, price } = src
            return (weeklyPageView * 1000) / Math.max(price, 10)
          },
        },
      },
    }),
  ])
}
```

Except for the `resolve` function, the above is the equivalent of the following SDL:

```graphql
type Product implements Node @infer {
  score: Float!
}
```

Also, note that `createTypes` takes an array, so we can add additional SDL strings to it.

With this, our Product type can now be sorted by the score field.

```graphql
query FeaturedItem {
  allProduct(limit: 10, sort: { fields: score, order: DESC }) {
    nodes {
      ...
    }
  }
}
```

Conclusion

Some might argue that it doesn't matter: the saved data is minuscule compared to the valuable development time. Generally, I agree; however, Gatsby has made customizing GraphQL so effortless that there's no reason not to do it!