group_concat as an alternative for array_agg

For quite some time I have been hardcore PostgreSQL user. I like to leverage database capabilities when possible to get faster response times in my Ruby On Rails applications. Even though PostgreSQL is still by far my favourite open source RDBMS, I’m currently working with MySQL on daily basis and I often find myself searching for MySQL equivalent for some PostgreSQL features.

For the sake of this post let’s assume following table schema

CREATE TABLE orders ( id SERIAL , user_id integer , total_amount numeric , CONSTRAINT orders_pkey PRIMARY KEY ( id ) );

The requirement is to return all users with total amount of money they spent purchasing items. The implementation is pretty straightforward and query looks the same in both database engines.

SELECT user_id , sum ( total_amount ) FROM orders GROUP BY user_id

Having that we are presented with new requirement:

for users that made purchases for more than $1000 we want to get their order ids for further processing.

First part of the task is easily solvable in both databases using HAVING

SELECT user_id , sum ( total_amount ) FROM orders GROUP BY user_id HAVING sum ( total_amount ) > 1000

The next part we can satisfy from application code on couple different ways, for example triggering separate queries for each user

Order . group ( :user_id ). having ( 'sum(total_amount) > 1000' ). pluck ( 'user_id, sum(total_amount)' ). each do | user_id , _ | # separate query to get order ids for each qualified user order_ids = Order . where ( user_id: user_id ). pluck ( :id ) end

This represents classical N+1 queries issue that will most likely have negative impact on performance. Lucky for us it can be solved using built-in features of both MySQL and PostgreSQL.

array_agg in PostgreSQL

In PostgreSQL we can get aggregated values for the group using array_agg function

-- PostgreSQL SELECT user_id , sum ( total_amount ), array_agg ( id ) as order_ids FROM orders GROUP BY user_id HAVING sum ( total_amount ) > 1000

We got all the information we want with just 1 database round-trip, awesome!

group_concat in MySQL

When working with MySQL there is also a possibility to achieve similar results using group_concat. There are two main differences though:

in ruby code we don’t get array but string of ids separated by some character - by default it will be comma

there is a length limit applied, after reaching it the value will be quietly truncated (I admit that I did not read MySQL’s documentation carefully enough and was surprised by this)

First bullet is easily solvable but how to get around the second one? It turns out that we can increase max length that is allowed for group_concat results! Consider this code:

with_increased_group_concat_max_len do Order . group ( :user_id ). having ( 'sum(total_amount) > 1000' ). pluck ( 'user_id, sum(total_amount), group_concat(id)' ). each do | user_id , _ , order_ids | # ... end end

def with_increased_group_concat_max_len ( value = 1045576 ) original = get_group_concat_max_len set_group_concat_max_len ( value ) yield ensure set_group_concat_max_len ( original ) end

def get_group_concat_max_len row = ActiveRecord :: Base . connection . select_one <<- SQL SHOW VARIABLES like 'group_concat_max_len' SQL row [ 'Value' ]. to_i end

def set_group_concat_max_len ( value ) ActiveRecord :: Base . connection . execute <<- SQL SET SESSION group_concat_max_len = #{ value } ; SQL end

We wrapped the query in a block that remembered original value of group_concat_max_len , increased it for the time of executing query and once the query got executed set it back to previous value.

Just as with PostgreSQL, we got all the information we want with just 1 database round-trip, awesome!

WARNING

The max value that can be set for group_concat_max_len depends on other server variable - max_allowed_packet. I strongly recommend to read documentation and verify what’s the maximum value for your installation.