In almost any application it’s common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully, it’s not a surprise that there’s a much better way to do this with PostgreSQL.

Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of the functions to get the functionality you need. Lets take a look at an example schema and use case.

An example

Given a project management application, you may have users who have projects that have tasks . An example piece of functionality might be to send an email with a list of all projects that have tasks that are past their due dates of completion. Your schema might look something like this:

# \d users Table "public.users" Column | Type | Modifiers ------------+-----------------------------+----------- id | integer | not null email | character varying(255) | ... # \d projects Table "public.projects" Column | Type | Modifiers ------------+-----------------------------+----------- id | integer | not null user_id | integer | not null name | character varying(255) | not null ... # \d tasks Table "public.tasks" Column | Type | Modifiers --------------+-----------------------------+----------- id | integer | not null project_id | integer | not null completed_at | timestamp without time zone | due_at | timestamp without time zone | ...

To get a list of all projects that have tasks that haven’t been completed, you would start with something like:

SELECT projects.name FROM projects, tasks WHERE projects.id = tasks.project_id AND tasks.due_at > tasks.completed_at AND tasks.due_at > now()

This would give you a list of projects which you could then easily join this with users:

SELECT users.email projects.name FROM projects, tasks, users WHERE projects.id = tasks.project_id AND tasks.due_at > tasks.completed_at AND tasks.due_at > now() AND users.id = projects.user_id

At this point you’ve got everything you need to pull this up into Ruby, Python, or other language of your choice and then build the full set. However if this is thousands or even hundreds of results you’ll be spending more time than necessary, grouping this data for a sensible email. With 3 other small changes you can have this already formatted for you to immediately send of in an email. The first is using a handy function called array_agg which will aggregate items and then you can format them how you wish. The second is just ensuring you’re grouping correctly. Finally you’ll want to unnest the array so it formats the data in a clean way for you.

Looking at it all put together:

SELECT users.email, array_to_string(array_agg(projects.name), ',')) as projects FROM projects, tasks, users WHERE projects.id = tasks.project_id AND tasks.due_at > tasks.completed_at AND tasks.due_at > now() AND users.id = projects.user_id GROUP BY users.email

This would give you a nice clean result of projects that have overdue tasks that you could then send to the user in an email:

email | projects ---------------------------+------------------- craig.kerstiens@gmail.com | blog, timetracker craig@heroku.com | foo, bar, baz

Find this article helpful? I welcome your support on Patreon to be able to devote more time to articles like this that others find useful.