Blog

Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!

We’ll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.

With Recursive

Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.

Recursive CTEs are more powerful – they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a for loop in other programming languages.

These CTEs have two parts — an anchor member and a recursive member. The anchor member selects the starting rows for the recursive steps.

The recursive member generates more rows for the CTE by first joining against the anchor rows, and then joining against rows created in previous recursions. The recursive member comes after a union all in the CTE definition.

Here’s a simple recursive CTE that generates the numbers 1 to 10. The anchor member selects the value 1, and the recursive member adds to it up to the number 10:

with recursive incrementer(prev_val) as ( select 1 -- anchor member union all select -- recursive member incrementer.prev_val + 1 from incrementer where prev_val < 10 -- termination condition ) select * from incrementer

The first time the recursive CTE runs it generates a single row 1 using the anchor member. In the second execution, the recursive member joins against the 1 and outputs a second row, 2. In the third execution the recursive step joins against both rows 1 and 2 and adds the rows 2 (a duplicate) and 3.

Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.

Notice how the CTE specifies its output as the named value prev_val. This lets us refer to the output of the previous recursive step.

And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!

Under the hood, the database is building up a table named after this recursive CTE using unions:

Recursive CTEs can also have many parameters. Here’s one that takes the sum, double, and square of starting values of 1, 2 and 3:

with recursive cruncher(inc, double, square) as ( select 1, 2.0, 3.0 -- anchor member union all select -- recursive member cruncher.inc + 1, cruncher.double * 2, cruncher.square ^ 2 from cruncher where inc < 10 ) select * from cruncher

With recursive CTEs, we can solve the Traveling Salesman Problem.

Finding the Shortest Path

There are many algorithms for finding the shortest round-trip path through several cities. We’ll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We’ll then sort to find the shortest.

First, a list of cities with our customers, along with their latitudes and longitudes:

create table places as ( select 'Seattle' as name, 47.6097 as lat, 122.3331 as lon union all select 'San Francisco', 37.7833, 122.4167 union all select 'Austin', 30.2500, 97.7500 union all select 'New York', 40.7127, 74.0059 union all select 'Boston', 42.3601, 71.0589 union all select 'Chicago', 41.8369, 87.6847 union all select 'Los Angeles', 34.0500, 118.2500 union all select 'Denver', 39.7392, 104.9903 )

And we’ll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):\

create or replace function lat_lon_distance( lat1 float, lon1 float, lat2 float, lon2 float ) returns float as $$ declare x float = 69.1 * (lat2 - lat1); y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3); begin return sqrt(x * x + y * y); end $$ language plpgsql

Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:

with recursive travel(places_chain, last_lat, last_lon, total_distance, num_places) as ( select -- anchor member name, lat, lon, 0::float, 1 from places where name = 'San Francisco' union all select -- recursive member -- add to the current places_chain travel.places_chain || ' -> ' || places.name, places.lat, places.lon, -- add to the current total_distance travel.total_distance + lat_lon_distance(last_lat, last_lon, places.lat, places.lon), travel.num_places + 1 from places, travel where position(places.name in travel.places_chain) = 0 )

The parameters in the CTE are:

places_chain : The list of places visited so far, which will be different for each instance of the recursion

: The list of places visited so far, which will be different for each instance of the recursion last_lat and last_lon : The latitude and longitude of the last place in the places_chain

: The latitude and longitude of the last place in the total_distance : The distance traveled going from one place to the next in the places_chain

: The distance traveled going from one place to the next in the num_places: The number of places in places_chain — we’ll use this to tell which routes are complete because they visited all cities

In the recursive member, the where clause ensures that we never repeat a place. If we’ve already visited Denver, position(…) will return a number greater than 0, invalidating this instance of the recursion.

We can see all possible routes by selecting all 8-city chains:

select * from travel where num_places = 8

We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco’s lat/lon, but a join is more elegant. Once that’s done we sort by distance and show the smallest:

select travel.places_chain || ' -> ' || places.name, total_distance + lat_lon_distance( travel.last_lat, travel.last_lon, places.lat, places.lon) as final_dist from travel, places where travel.num_places = 8 and places.name = 'San Francisco' order by 2 -- ascending! limit 1

Even though this query is significantly more complicated than the incrementer query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE’s rows, the bottom branch is the final join and sort:

Run this query and you’ll see the shortest route takes 6671 miles and visits the cities in this order:

San Francisco -> Seattle -> Denver -> Chicago -> Boston -> New York -> Austin -> Los Angeles -> San Francisco

Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!