I got asked on irc to show some examples how to use recursive CTE. Apparently my previous post wasn't good enough 🙂

I think that most of the users will use recursive cte to deal with trees I decided to show how to use it, even though it's not my favorite approach to dealing with trees in SQL.

Before we'll go to queries, first we need to know our data. I downloaded copy of directory structure for DMOZ, and converted it to this format:

$ \d dmoz TABLE "public.dmoz" COLUMN | TYPE | Modifiers -----------+---------+----------- id | INTEGER | NOT NULL parent_id | INTEGER | codename | text | Indexes: "dmoz_pkey" PRIMARY KEY , btree ( id ) "dmoz_parent_id" btree ( parent_id ) Foreign - KEY constraints: "dmoz_parent_id_fkey" FOREIGN KEY ( parent_id ) REFERENCES dmoz ( id ) Referenced BY : TABLE "dmoz" CONSTRAINT "dmoz_parent_id_fkey" FOREIGN KEY ( parent_id ) REFERENCES dmoz ( id )

$ SELECT * FROM dmoz LIMIT 10 ; id | parent_id | codename ----+-----------+------------------------- 1 | [ NULL ] | AOL 2 | [ NULL ] | Arts 3 | 2 | Animation 4 | 3 | Anime 5 | 4 | Characters 6 | 4 | Clubs_and_Organizations 7 | 4 | Collectibles 8 | 7 | Cels 9 | 7 | Models_and_Figures 10 | 9 | Action_Figures ( 10 ROWS )

Looks simple enough. Records with parent_id NULL are top level nodes. All other point to something below.

Basic things like:

get list of top level nodes

get list of nodes directly below given node

Are too simple to even describe, so let's move to something a bit more interesting.

First, as a warm-up – how to get “path" to given node. Let's assume we have node id ( 294010 ), and we want all parents, in order.

So, this is how the record looks:

$ SELECT * FROM dmoz WHERE id = 294010 ; id | parent_id | codename --------+-----------+---------- 294010 | 294009 | Maronite ( 1 ROW )

Basic recursive approach is:

$ WITH RECURSIVE parents AS ( SELECT * FROM dmoz WHERE id = 294010 UNION ALL SELECT dmoz .* FROM dmoz JOIN parents ON dmoz . id = parents . parent_id ) SELECT * FROM parents; id | parent_id | codename --------+-----------+--------------------- 294010 | 294009 | Maronite 294009 | 294008 | Eastern_Rites 294008 | 294003 | Catholicism 294003 | 294002 | Christianity 294002 | 293993 | Religion 293993 | 293815 | Society_and_Culture 293815 | 293793 | Brooklyn 293793 | 293622 | New_York_City 293622 | 290084 | N 290084 | 288505 | Localities 288505 | 172145 | New_York 172145 | 157689 | United_States 157689 | 90161 | North_America 90161 | [ NULL ] | Regional ( 14 ROWS )

The important part is – recursive query is union (union all actually) of two queries:

SELECT * FROM dmoz WHERE id = 294010 – this is our starting point

SELECT dmoz.* FROM dmoz join parents on dmoz.id = parents.parent_id – this is the magic

Please note that the “magic" part of query, even though it's inside

of “parents" CTE, it references this CTE (in join condition). This is the

recursive behaviour which can do a lot of cool things.

of course – the way I returned this data is pretty much useless – I have all the information I need, but it would be good to sort it properly.

In here, it just so happens that ids are ascending, so I could just “ORDER BY id", but let's assume it's not always the case, and we want proper sorting by nest level. How to do it?

$ WITH RECURSIVE parents AS ( SELECT *, 0 AS level FROM dmoz WHERE id = 294010 UNION ALL SELECT dmoz .*, parents . level + 1 AS level FROM dmoz JOIN parents ON dmoz . id = parents . parent_id ) SELECT * FROM parents ORDER BY level DESC ; id | parent_id | codename | level --------+-----------+---------------------+------- 90161 | [ NULL ] | Regional | 13 157689 | 90161 | North_America | 12 172145 | 157689 | United_States | 11 288505 | 172145 | New_York | 10 290084 | 288505 | Localities | 9 293622 | 290084 | N | 8 293793 | 293622 | New_York_City | 7 293815 | 293793 | Brooklyn | 6 293993 | 293815 | Society_and_Culture | 5 294002 | 293993 | Religion | 4 294003 | 294002 | Christianity | 3 294008 | 294003 | Catholicism | 2 294009 | 294008 | Eastern_Rites | 1 294010 | 294009 | Maronite | 0 ( 14 ROWS )

As you can see in the starting point query I added “0 as level". And in the magic part of the query, I incremented level from previous row. This, plus ORDER BY level DESC solved the problem. Even if ids wouldn't be in order, I would get the path in correct way.

How efficient it is?

$ EXPLAIN ANALYZE WITH RECURSIVE parents AS ( SELECT *, 0 AS level FROM dmoz WHERE id = 294010 UNION ALL SELECT dmoz .*, parents . level + 1 AS level FROM dmoz JOIN parents ON dmoz . id = parents . parent_id ) SELECT * FROM parents ORDER BY level DESC ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Sort ( cost = 851.54 .. 851 . 79 ROWS = 101 width = 44 ) ( actual TIME = 0.125 .. 0 . 128 ROWS = 14 loops = 1 ) Sort KEY : parents . level Sort Method: quicksort Memory: 26kB CTE parents -> Recursive UNION ( cost = 0.00 .. 846 . 15 ROWS = 101 width = 25 ) ( actual TIME = 0.010 .. 0 . 100 ROWS = 14 loops = 1 ) -> INDEX Scan USING dmoz_pkey ON dmoz ( cost = 0.00 .. 8 . 32 ROWS = 1 width = 21 ) ( actual TIME = 0.009 .. 0 . 010 ROWS = 1 loops = 1 ) INDEX Cond: ( id = 294010 ) -> Nested Loop ( cost = 0.00 .. 83 . 58 ROWS = 10 width = 25 ) ( actual TIME = 0.005 .. 0 . 005 ROWS = 1 loops = 14 ) -> WorkTable Scan ON parents ( cost = 0.00 .. 0 . 20 ROWS = 10 width = 8 ) ( actual TIME = 0.000 .. 0 . 000 ROWS = 1 loops = 14 ) -> INDEX Scan USING dmoz_pkey ON dmoz ( cost = 0.00 .. 8 . 32 ROWS = 1 width = 21 ) ( actual TIME = 0.003 .. 0 . 004 ROWS = 1 loops = 14 ) INDEX Cond: ( id = parents . parent_id ) -> CTE Scan ON parents ( cost = 0.00 .. 2 . 02 ROWS = 101 width = 44 ) ( actual TIME = 0.012 .. 0 . 114 ROWS = 14 loops = 1 ) Total runtime: 0.177 ms ( 13 ROWS )

No magic in here – 15 index scans on dmoz_pkey (one to get the starting point, and 14 loops to get parents). The benefit is that you don't have to run every query yourself, so you save time for roudtrips to server and back.

That was simple. So let's talk about something else. How to get all nodes from given point, down the tree?

Full dmoz tree is a bit over half a million records, so I will do it for a subset of the tree. Let's say, I want all nodes in the Christianity from above table. It's id is 294003. So how can we get all the data?

We start with basic:

$ WITH RECURSIVE c AS ( SELECT * FROM dmoz WHERE id = 294003 ) SELECT * FROM c; id | parent_id | codename --------+-----------+-------------- 294003 | 294002 | Christianity ( 1 ROW )

Please note that I used “RECURSIVE" keyword, while the query is not recursive at all. It's OK, I can do it, it doesn't cause any problems.

Based on what I showed with previous queries, I can quickly add part to get all kids:

$ WITH RECURSIVE c AS ( SELECT * FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .* FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT * FROM c; id | parent_id | codename --------+-----------+------------------------------------ 294003 | 294002 | Christianity 294004 | 294003 | Anabaptist 294006 | 294003 | Anglican 294007 | 294003 | Baptist 294008 | 294003 | Catholicism 294011 | 294003 | Christian_and_Missionary_Alliance 294012 | 294003 | Christian_Church 294014 | 294003 | Churches_of_God 294017 | 294003 | Congregational 294018 | 294003 | Evangelical_Free_Church_of_America 294019 | 294003 | Lutheran 294020 | 294003 | Methodist 294023 | 294003 | Orthodox 294024 | 294003 | Pentecostalism 294026 | 294003 | Presbyterian 294027 | 294003 | Reformed_Church 294028 | 294003 | Religious_Society_of_Friends 294029 | 294003 | Seventh - day_Adventists 294030 | 294003 | United_Church_of_Christ 294005 | 294004 | Mennonites 294009 | 294008 | Eastern_Rites 294013 | 294012 | Disciples_of_Christ 294015 | 294014 | Church_of_God_in_Christ 294016 | 294014 | Worldwide_Church_of_God 294021 | 294020 | African_Methodist_Episcopal 294022 | 294020 | United_Methodist 294025 | 294024 | Assemblies_of_God 294010 | 294009 | Maronite ( 28 ROWS )

What's important – I can also add level information, for example if I'd want to limit the depth of shown information to “2 levels below current node" or something like this:

WITH RECURSIVE c AS ( SELECT *, 0 AS level FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .*, c . level + 1 AS level FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT * FROM c; id | parent_id | codename | level --------+-----------+------------------------------------+------- 294003 | 294002 | Christianity | 0 294004 | 294003 | Anabaptist | 1 294006 | 294003 | Anglican | 1 294007 | 294003 | Baptist | 1 294008 | 294003 | Catholicism | 1 294011 | 294003 | Christian_and_Missionary_Alliance | 1 294012 | 294003 | Christian_Church | 1 294014 | 294003 | Churches_of_God | 1 294017 | 294003 | Congregational | 1 294018 | 294003 | Evangelical_Free_Church_of_America | 1 294019 | 294003 | Lutheran | 1 294020 | 294003 | Methodist | 1 294023 | 294003 | Orthodox | 1 294024 | 294003 | Pentecostalism | 1 294026 | 294003 | Presbyterian | 1 294027 | 294003 | Reformed_Church | 1 294028 | 294003 | Religious_Society_of_Friends | 1 294029 | 294003 | Seventh - day_Adventists | 1 294030 | 294003 | United_Church_of_Christ | 1 294005 | 294004 | Mennonites | 2 294009 | 294008 | Eastern_Rites | 2 294013 | 294012 | Disciples_of_Christ | 2 294015 | 294014 | Church_of_God_in_Christ | 2 294016 | 294014 | Worldwide_Church_of_God | 2 294021 | 294020 | African_Methodist_Episcopal | 2 294022 | 294020 | United_Methodist | 2 294025 | 294024 | Assemblies_of_God | 2 294010 | 294009 | Maronite | 3 ( 28 ROWS )

This is all great, but how can I sort it so that the result will be sensible?

We can add “paths" – generated by taking all codenames for given element and its parent (up to the starting point), and sort with it:

WITH RECURSIVE c AS ( SELECT *, 0 AS level , codename AS path FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .*, c . level + 1 AS level , c . path || '/' || dmoz . codename AS path FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT * FROM c ORDER BY path; id | parent_id | codename | level | path --------+-----------+------------------------------------+-------+------------------------------------------------------ 294003 | 294002 | Christianity | 0 | Christianity 294004 | 294003 | Anabaptist | 1 | Christianity / Anabaptist 294005 | 294004 | Mennonites | 2 | Christianity / Anabaptist / Mennonites 294006 | 294003 | Anglican | 1 | Christianity / Anglican 294007 | 294003 | Baptist | 1 | Christianity / Baptist 294008 | 294003 | Catholicism | 1 | Christianity / Catholicism 294009 | 294008 | Eastern_Rites | 2 | Christianity / Catholicism / Eastern_Rites 294010 | 294009 | Maronite | 3 | Christianity / Catholicism / Eastern_Rites / Maronite 294011 | 294003 | Christian_and_Missionary_Alliance | 1 | Christianity / Christian_and_Missionary_Alliance 294012 | 294003 | Christian_Church | 1 | Christianity / Christian_Church 294013 | 294012 | Disciples_of_Christ | 2 | Christianity / Christian_Church / Disciples_of_Christ 294014 | 294003 | Churches_of_God | 1 | Christianity / Churches_of_God 294015 | 294014 | Church_of_God_in_Christ | 2 | Christianity / Churches_of_God / Church_of_God_in_Christ 294016 | 294014 | Worldwide_Church_of_God | 2 | Christianity / Churches_of_God / Worldwide_Church_of_God 294017 | 294003 | Congregational | 1 | Christianity / Congregational 294018 | 294003 | Evangelical_Free_Church_of_America | 1 | Christianity / Evangelical_Free_Church_of_America 294019 | 294003 | Lutheran | 1 | Christianity / Lutheran 294020 | 294003 | Methodist | 1 | Christianity / Methodist 294021 | 294020 | African_Methodist_Episcopal | 2 | Christianity / Methodist / African_Methodist_Episcopal 294022 | 294020 | United_Methodist | 2 | Christianity / Methodist / United_Methodist 294023 | 294003 | Orthodox | 1 | Christianity / Orthodox 294024 | 294003 | Pentecostalism | 1 | Christianity / Pentecostalism 294025 | 294024 | Assemblies_of_God | 2 | Christianity / Pentecostalism / Assemblies_of_God 294026 | 294003 | Presbyterian | 1 | Christianity / Presbyterian 294027 | 294003 | Reformed_Church | 1 | Christianity / Reformed_Church 294028 | 294003 | Religious_Society_of_Friends | 1 | Christianity / Religious_Society_of_Friends 294029 | 294003 | Seventh - day_Adventists | 1 | Christianity / Seventh - day_Adventists 294030 | 294003 | United_Church_of_Christ | 1 | Christianity / United_Church_of_Christ ( 28 ROWS )

And finally, explain:

EXPLAIN ANALYZE WITH RECURSIVE c AS ( SELECT *, 0 AS level , codename AS path FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .*, c . level + 1 AS level , c . path || '/' || dmoz . codename AS path FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT * FROM c ORDER BY path; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Sort ( cost = 980.07 .. 982 . 74 ROWS = 1071 width = 76 ) ( actual TIME = 0.331 .. 0 . 333 ROWS = 28 loops = 1 ) Sort KEY : c . path Sort Method: quicksort Memory: 28kB CTE c -> Recursive UNION ( cost = 0.00 .. 904 . 75 ROWS = 1071 width = 57 ) ( actual TIME = 0.011 .. 0 . 164 ROWS = 28 loops = 1 ) -> INDEX Scan USING dmoz_pkey ON dmoz ( cost = 0.00 .. 8 . 32 ROWS = 1 width = 21 ) ( actual TIME = 0.009 .. 0 . 010 ROWS = 1 loops = 1 ) INDEX Cond: ( id = 294003 ) -> Nested Loop ( cost = 0.00 .. 87 . 50 ROWS = 107 width = 57 ) ( actual TIME = 0.007 .. 0 . 034 ROWS = 7 loops = 4 ) -> WorkTable Scan ON c ( cost = 0.00 .. 0 . 20 ROWS = 10 width = 40 ) ( actual TIME = 0.000 .. 0 . 002 ROWS = 7 loops = 4 ) -> INDEX Scan USING dmoz_parent_id ON dmoz ( cost = 0.00 .. 8 . 51 ROWS = 11 width = 21 ) ( actual TIME = 0.003 .. 0 . 003 ROWS = 1 loops = 28 ) INDEX Cond: ( parent_id = c . id ) -> CTE Scan ON c ( cost = 0.00 .. 21 . 42 ROWS = 1071 width = 76 ) ( actual TIME = 0.013 .. 0 . 191 ROWS = 28 loops = 1 ) Total runtime: 0.405 ms ( 13 ROWS )

Now, one can ask: OK, but I want custom ordering – for example I want Catholicism as the first element in Christianity. Solution: simply add column like “priority", and use this column (+codename to handle duplicate priorities):

$ ALTER TABLE dmoz ADD COLUMN priority int4 DEFAULT 500 ; ALTER TABLE $ ALTER TABLE dmoz ADD CHECK ( priority BETWEEN 100 AND 999 ) ; ALTER TABLE

Now – all priorities will be 3 digit integers, with default value being 500.

So, now I can increase priority of Catholicism (record in database, not religion!):

$ UPDATE dmoz SET priority = 250 WHERE id = 294008 ; UPDATE 1

With priority change in place, I can change the query to:

$ WITH RECURSIVE c AS ( SELECT *, 0 AS level , codename AS path , priority || codename AS priority_path FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .*, c . level + 1 AS level , c . path || '/' || dmoz . codename AS path , c . priority_path || '/' || dmoz . priority || dmoz . codename AS priority_path FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT * FROM c ORDER BY priority_path; id | parent_id | codename | priority | level | path | priority_path --------+-----------+------------------------------------+----------+-------+------------------------------------------------------+--------------------------------------------------------------- 294003 | 294002 | Christianity | 500 | 0 | Christianity | 500Christianity 294008 | 294003 | Catholicism | 250 | 1 | Christianity / Catholicism | 500Christianity / 250Catholicism 294009 | 294008 | Eastern_Rites | 500 | 2 | Christianity / Catholicism / Eastern_Rites | 500Christianity / 250Catholicism / 500Eastern_Rites 294010 | 294009 | Maronite | 500 | 3 | Christianity / Catholicism / Eastern_Rites / Maronite | 500Christianity / 250Catholicism / 500Eastern_Rites / 500Maronite 294004 | 294003 | Anabaptist | 500 | 1 | Christianity / Anabaptist | 500Christianity / 500Anabaptist 294005 | 294004 | Mennonites | 500 | 2 | Christianity / Anabaptist / Mennonites | 500Christianity / 500Anabaptist / 500Mennonites 294006 | 294003 | Anglican | 500 | 1 | Christianity / Anglican | 500Christianity / 500Anglican 294007 | 294003 | Baptist | 500 | 1 | Christianity / Baptist | 500Christianity / 500Baptist 294011 | 294003 | Christian_and_Missionary_Alliance | 500 | 1 | Christianity / Christian_and_Missionary_Alliance | 500Christianity / 500Christian_and_Missionary_Alliance 294012 | 294003 | Christian_Church | 500 | 1 | Christianity / Christian_Church | 500Christianity / 500Christian_Church 294013 | 294012 | Disciples_of_Christ | 500 | 2 | Christianity / Christian_Church / Disciples_of_Christ | 500Christianity / 500Christian_Church / 500Disciples_of_Christ 294014 | 294003 | Churches_of_God | 500 | 1 | Christianity / Churches_of_God | 500Christianity / 500Churches_of_God 294015 | 294014 | Church_of_God_in_Christ | 500 | 2 | Christianity / Churches_of_God / Church_of_God_in_Christ | 500Christianity / 500Churches_of_God / 500Church_of_God_in_Christ 294016 | 294014 | Worldwide_Church_of_God | 500 | 2 | Christianity / Churches_of_God / Worldwide_Church_of_God | 500Christianity / 500Churches_of_God / 500Worldwide_Church_of_God 294017 | 294003 | Congregational | 500 | 1 | Christianity / Congregational | 500Christianity / 500Congregational 294018 | 294003 | Evangelical_Free_Church_of_America | 500 | 1 | Christianity / Evangelical_Free_Church_of_America | 500Christianity / 500Evangelical_Free_Church_of_America 294019 | 294003 | Lutheran | 500 | 1 | Christianity / Lutheran | 500Christianity / 500Lutheran 294020 | 294003 | Methodist | 500 | 1 | Christianity / Methodist | 500Christianity / 500Methodist 294021 | 294020 | African_Methodist_Episcopal | 500 | 2 | Christianity / Methodist / African_Methodist_Episcopal | 500Christianity / 500Methodist / 500African_Methodist_Episcopal 294022 | 294020 | United_Methodist | 500 | 2 | Christianity / Methodist / United_Methodist | 500Christianity / 500Methodist / 500United_Methodist 294023 | 294003 | Orthodox | 500 | 1 | Christianity / Orthodox | 500Christianity / 500Orthodox 294024 | 294003 | Pentecostalism | 500 | 1 | Christianity / Pentecostalism | 500Christianity / 500Pentecostalism 294025 | 294024 | Assemblies_of_God | 500 | 2 | Christianity / Pentecostalism / Assemblies_of_God | 500Christianity / 500Pentecostalism / 500Assemblies_of_God 294026 | 294003 | Presbyterian | 500 | 1 | Christianity / Presbyterian | 500Christianity / 500Presbyterian 294027 | 294003 | Reformed_Church | 500 | 1 | Christianity / Reformed_Church | 500Christianity / 500Reformed_Church 294028 | 294003 | Religious_Society_of_Friends | 500 | 1 | Christianity / Religious_Society_of_Friends | 500Christianity / 500Religious_Society_of_Friends 294029 | 294003 | Seventh - day_Adventists | 500 | 1 | Christianity / Seventh - day_Adventists | 500Christianity / 500Seventh - day_Adventists 294030 | 294003 | United_Church_of_Christ | 500 | 1 | Christianity / United_Church_of_Christ | 500Christianity / 500United_Church_of_Christ ( 28 ROWS )

The query looks different, but that's just because of indentation – the only real change is addition of priority_path column, and usage of it (instead of normal path) for ORDER BY.

Of course – depending on your case you might not calculate “path" at all, or return just a subset of columns. Or even do voodoo magic, and return the data as html:

$ WITH RECURSIVE c AS ( SELECT *, 0 AS level , codename AS path , priority || codename AS priority_path FROM dmoz WHERE id = 294003 UNION ALL SELECT dmoz .*, c . level + 1 AS level , c . path || '/' || dmoz . codename AS path , c . priority_path || '/' || dmoz . priority || dmoz . codename AS priority_path FROM dmoz JOIN c ON dmoz . parent_id = c . id ) SELECT repeat ( '<ol>' , level - COALESCE ( lag ( level ) OVER ( ORDER BY priority_path ) , - 1 ) ) || '<li><a href="http://www.dmoz.org/Regional/North_America/United_States/New_York/Localities/N/New_York_City/Brooklyn/Society_and_Culture/Religion/' || c . path || '">' || c . codename || '</a></li>' || repeat ( '</ol>' , level - COALESCE ( lead ( level ) OVER ( ORDER BY priority_path ) , - 1 ) ) FROM c ORDER BY priority_path;

Which returns HTML that renders like this:

So, I think that's about it. If something is not clear – please let me know. I will try to answer all questions/problems, aside from the final voodoo magic, which I will not explain – it's voodoo after all.