Say you have a Rails app (just for the sake of example; this article applies just as well to pure Ruby apps) and want to build a JSON API. Simple enough: set up an api/v1/my_resources endpoint, write a serializer for MyResource, then serve that through a controller.

Perhaps you're one of those party people. Let's build a really simple app with a Party model as an example:

$ rails new faster-json-api --database=postgresql

$ cd faster-json-api

$ rails g migration CreateParties name:string description:string starts_at:datetime ends_at:datetime

Pretty simple: a name, a description, start and end times.
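The generated migration should look roughly like this (a sketch; details such as the Migration version number vary with your Rails version):

```ruby
# db/migrate/xxxxxxxx_create_parties.rb
class CreateParties < ActiveRecord::Migration[7.0]
  def change
    create_table :parties do |t|
      t.string :name
      t.string :description
      t.datetime :starts_at
      t.datetime :ends_at
    end
  end
end
```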

Now onto the API. Let's keep things super simple: there will only be a single endpoint, api/parties , which responds with a JSON array of all the parties in the database.

A typical Ruby-based serializer could look something like this:

module PartySerializer
  def ruby_to_json
    all.to_json(only: %i(name description starts_at ends_at))
  end
end

A few notes: we could (should, actually) use pluck in the example above, but we're going to go with the :only option of to_json just to highlight the differences between doing this kind of processing on the Ruby side and doing it on the database side. Also note that ActiveRecord has its own serialization helper that can achieve the same result.
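For reference, here's what a pluck-based variant could look like (a sketch; the module name and the hash-building step are my own):

```ruby
require 'json'

# Alternative serializer using pluck: rows come back as plain arrays,
# so no Party models are instantiated along the way.
module PluckPartySerializer
  COLUMNS = %i[name description starts_at ends_at].freeze

  def ruby_to_json
    pluck(*COLUMNS).map { |row| COLUMNS.zip(row).to_h }.to_json
  end
end
```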

Okay, then we can use PartySerializer like so:

class Api::PartiesController < ApplicationController
  def index
    render json: Party.extend(PartySerializer).ruby_to_json
  end
end
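The matching entry in config/routes.rb can be a one-liner inside an api namespace (a minimal sketch, matching the controller above):

```ruby
Rails.application.routes.draw do
  namespace :api do
    # GET /api/parties => Api::PartiesController#index
    resources :parties, only: :index
  end
end
```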

Fair enough. Let's do a quick test with curl to see what we get (once we populate the database with seed data and set up the endpoint in config/routes.rb):

$ curl localhost:3000/api/parties | jq .


[
  {
    "name": "My awesome party",
    "description": "Incredible awesomeness",
    "starts_at": "2022-01-01T23:00:00.000Z",
    "ends_at": "2022-01-02T02:00:00.000Z"
  }
]

(jq, by the way, is a command-line JSON processor, used here as a pretty-printer)

Seems to work just fine. You might even choose to ditch Ruby core-based JSON formatting and use some fancy super-fast gem to speed things up.

Regardless of whether you use a gem to serialize the model into JSON or not, here’s a rough breakdown of how this Ruby-based serialization approach works. Again, we're using Rails as an example but something very similar will happen regardless of the framework (or lack thereof) that you use:

1. ActiveRecord translates your query to SQL. In our example it was a pretty simple query: Party.all , which translates to SELECT * FROM parties ;
2. ActiveRecord queries the database with that SQL statement and waits for the response;
3. The database runs the query and responds with something;
4. ActiveRecord receives that data and builds ActiveRecord::Base models (in our example it builds Party models);
5. PartySerializer#ruby_to_json calls #to_json for each of the Party models returned in the previous step;
6. Finally, the controller returns that JSON string to whoever requested it.

Whew! Looks like a lot of work. What if we could skip those intermediate steps entirely and just have the database do all the work for us? Here's what that would look like:

1. We provide a SQL statement to ActiveRecord::Base.connection.exec_query;
2. The database runs the query and responds with JSON;
3. The controller returns that JSON string to the client.

Looks much simpler, right?

Let's analyze the differences between those lists. In the second list we’re running a SQL statement right from the start, essentially skipping step 1 from the first list, so there’s a little less overhead in translating ActiveRecord syntax to SQL.

But the biggest difference is that we skipped steps 4 and 5 from the first list, which load the query results into memory and build ActiveRecord models. In the end we're delivering JSON, so there's really no need for that (unless we actually need something from the Ruby world, e.g. some kind of ETL where the Transform step runs in the Ruby process).

As you might guess, the magic happens in steps 1 and 2: we need to build a SQL query that the RDBMS will run and respond to with JSON. Postgres has some pretty awesome built-in JSON functions, and they're just perfect for this use case.

json_build_object to the rescue

json_build_object takes a list of keys and values and returns a JSON object. In our case, the keys are the JSON keys in the API response, and the values are their respective column names in the database. So we can expect this:

json_build_object(
  'id', id,
  'name', name,
  'description', description,
  'starts_at', starts_at,
  'ends_at', ends_at
)

To return objects like this:

{
  "id": 1,
  "name": "My awesome party",
  "description": "Incredible awesomeness",
  "starts_at": "2022-01-01T23:00:00.000Z",
  "ends_at": "2022-01-02T02:00:00.000Z"
}

But we want to return a list of objects, right? That's the cue for json_agg.

json_agg aggregates values as a JSON array. Those values are, in our case, the JSON objects we just built. So let's glue the two together.

json_agg(
  json_build_object(
    'id', id,
    'name', name,
    'description', description,
    'starts_at', starts_at,
    'ends_at', ends_at
  )
)

Those expressions need to be SELECTed, of course. Here's the final query:

SELECT
  json_agg(
    json_build_object(
      'id', id,
      'name', name,
      'description', description,
      'starts_at', starts_at,
      'ends_at', ends_at
    )
  )
FROM parties

Aside from the extra id field, that produces exactly the same output as our Ruby-based JSONification.
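Wiring this into the controller is then just a matter of sending the statement through exec_query. A sketch (the constant name and the empty-array fallback are my own additions; json_agg returns NULL when the table is empty, hence the fallback):

```ruby
class Api::PartiesController < ApplicationController
  # The SQL we just built, sent as-is to Postgres
  PARTIES_JSON_SQL = <<~SQL.freeze
    SELECT
      json_agg(
        json_build_object(
          'id', id,
          'name', name,
          'description', description,
          'starts_at', starts_at,
          'ends_at', ends_at
        )
      )
    FROM parties
  SQL

  def index
    # exec_query returns an ActiveRecord::Result; our query yields a single
    # row with a single column holding the JSON string built by Postgres
    json = ActiveRecord::Base.connection.exec_query(PARTIES_JSON_SQL).rows.dig(0, 0)
    render json: json || "[]"
  end
end
```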

We should JOIN forces

The examples we used are nice enough, but they're awfully simple: entirely flat, with no nesting at all.

If you want to nest values that are already part of the model, that is pretty simple: just use json_build_object again. So something like:

SELECT
  json_agg(
    json_build_object(
      'id', id,
      'name', name,
      'description', description,
      'dates', json_build_object(
        'starts_at', starts_at,
        'ends_at', ends_at
      )
    )
  )
FROM parties

would produce what you'd expect:

[
  {
    "id": 1,
    "name": "My awesome party",
    "description": "Incredible awesomeness",
    "dates": {
      "starts_at": "2022-01-01T23:00:00",
      "ends_at": "2022-01-02T02:00:00"
    }
  }
]

But what if you want to nest another table inside that JSON? Say our Parties also have many Sweepstakes, each with a name and a description (both strings), and we want the API to return those as a JSON array inside each Party JSON. So we want the final result to look like this:

[
  {
    "id": 1,
    "name": "My awesome party",
    "description": "Incredible awesomeness",
    "dates": {
      "starts_at": "2022-01-01T23:00:00",
      "ends_at": "2022-01-02T02:00:00"
    },
    "sweepstakes": [
      {
        "name": "My awesome Sweepstake",
        "description": "Pure incredibleness"
      },
      {
        "name": "My second awesome Sweepstake",
        "description": "Purer incredibleness"
      }
    ]
  }
]

The thing is, you can't nest aggregate functions such as json_agg in Postgres, so you can't just JOIN on Sweepstakes and then aggregate them inside the ongoing Party aggregation.

What we need to do is run json_agg inside a subquery we JOIN on, and then put those values inside the Party aggregation:

SELECT
  json_agg(
    json_build_object(
      'id', parties.id,
      'name', parties.name,
      'description', parties.description,
      'dates', json_build_object(
        'starts_at', starts_at,
        'ends_at', ends_at
      ),
      'sweepstakes', s.sweepstakes_json
    )
  )
FROM parties
LEFT JOIN (
  SELECT
    party_id,
    json_agg(
      json_build_object(
        'name', name,
        'description', description
      )
    ) AS sweepstakes_json
  FROM sweepstakes
  GROUP BY party_id
) s ON s.party_id = parties.id

I know, I know: it seems awfully complicated compared to a regular ActiveRecord joins . But it will be worth it, I promise. One caveat: with the LEFT JOIN, parties that have no sweepstakes get null instead of an empty array; wrapping the joined aggregate in COALESCE(..., '[]'::json) fixes that.

Give me numbers!

You promised me more performance, give me some numbers! We're going to do some benchmarking now using this sample app.

First we need to understand a few things about Ruby's Benchmark module. As far as we're concerned, there are two important numbers in action here: total and real. The former is the total CPU time your code took (user plus system time), and the latter is the "wall clock" time, i.e. how much time actually went by during the execution.

Since in our optimization we’re essentially offloading work from Ruby's process to Postgres' process, then we should expect the Ruby approach to have greater CPU time relative to real time, while the Postgres approach should have less CPU time relative to real time (Postgres does, of course, use CPU cycles, but they aren't counted in Ruby's benchmark because they're a part of another process). In other words, the Ruby approach should spend most of real time on CPU operations, while the Postgres approach should spend most of real time in IO operations.
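To see total versus real in isolation, here's a tiny standalone demonstration (my own example, not from the app; sleep stands in for waiting on the database):

```ruby
require 'benchmark'

# CPU-bound work: total (CPU) time tracks real (wall-clock) time closely
cpu_bound = Benchmark.measure { 200_000.times { |i| i.to_s.reverse } }

# IO-like work: almost no CPU time is spent, but real time still passes
io_bound = Benchmark.measure { sleep 0.05 }

puts format("cpu-bound  total=%.4f real=%.4f", cpu_bound.total, cpu_bound.real)
puts format("io-like    total=%.4f real=%.4f", io_bound.total, io_bound.real)
```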

With that said, here's sample benchmark output measuring both approaches (pure Ruby x Postgres) with 100 Parties in our database:

               user     system      total        real
ruby       0.128114   0.000183   0.128297 (  0.129308)
postgres   0.000642   0.000046   0.000688 (  0.015426)

As we expected, the Postgres approach consumes considerably less CPU time (on the order of 200x less) than the Ruby approach. Also as expected, IO time (real minus total) is much more relevant in the Postgres approach than in the Ruby one: ~95% of real time was spent on IO, against ~0.8% with the Ruby approach.
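Those shares can be recomputed straight from the benchmark table above (a quick sanity check on the figures reported there):

```ruby
# Numbers taken from the benchmark output above
ruby_total, ruby_real = 0.128297, 0.129308
pg_total,   pg_real   = 0.000688, 0.015426

cpu_ratio     = ruby_total / pg_total                 # roughly 186x less CPU time
pg_io_share   = (pg_real - pg_total) / pg_real        # roughly 0.955, i.e. ~95% IO
ruby_io_share = (ruby_real - ruby_total) / ruby_real  # under 1% IO

puts format("CPU ratio: %.1fx, PG IO share: %.1f%%, Ruby IO share: %.1f%%",
            cpu_ratio, pg_io_share * 100, ruby_io_share * 100)
```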

So was the optimization worth it? It definitely seems so: the sample indicates something on the order of a 10x improvement in real time spent.

Nice, but a benchmark with only 100 elements in the database seems too small to matter. Let's try measuring the same code from 1k up to 10k elements: