Google’s BigQuery has a very interesting SQL language feature, which I’ve missed many times in other databases:

select: SELECT [{ ALL | DISTINCT }] { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] [ REPLACE ( expression [ AS ] column_name [, ...] ) ] | expression [ [ AS ] alias ] } [, ...] [ FROM from_item [, ...] ] [ WHERE bool_expression ] ...

Notice the two keywords EXCEPT and REPLACE that can be used along with an asterisked expression.

An Example

For example, when running a query like this one (which fetches the longest film(s) every actor in the Sakila database played in):

SELECT * FROM ( SELECT a.*, f.*, RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk FROM film f JOIN film_actor fa USING (film_id) JOIN actor a USING (actor_id) ) t WHERE rk = 1 ORDER BY first_name, last_name

This is one way to apply TOP-N per category filtering in SQL, which works with most modern databases, including MySQL 8.0. Essentially, we’re calculating the rank of a film per actor ordered by the film’s length.

The result looks like this:

actor_id |first_name |last_name |..|title |length|..|rk | ---------|------------|-------------|..|-----------------------|------|--|---| 71 |ADAM |GRANT |..|GLADIATOR WESTWARD | 173|..|1 | 71 |ADAM |GRANT |..|BALLROOM MOCKINGBIRD | 173|..|1 | 132 |ADAM |HOPPER |..|TORQUE BOUND | 179|..|1 | 165 |AL |GARLAND |..|JACKET FRISCO | 181|..|1 |

Let’s assume for a moment that we really need the entire projection of the ACTOR and FILM tables (so, SELECT * is fine), but we certainly don’t need the useless RK column, which is always 1.

Sometimes, having some excess columns is not going to be a problem, but sometimes it is. How to remove it? We can’t reference the ACTOR and FILM tables anymore in the outer query:

SELECT a.*, f.* -- Would be great, but wrong syntax FROM ( SELECT a.*, f.*, RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk FROM film f JOIN film_actor fa USING (film_id) JOIN actor a USING (actor_id) ) t WHERE rk = 1 ORDER BY first_name, last_name

The outer query only has one table, and that’s the (derived) table T .

How to Solve This

In BigQuery syntax, we could now simply write

SELECT * EXCEPT rk FROM (...) t WHERE rk = 1 ORDER BY first_name, last_name

Which is really quite convenient! We want to project everything, except this one column. But none of the more popular SQL databases support this syntax.

Luckily, in PostgreSQL, we can use a workaround: Nested records:

SELECT (a).*, (f).* -- Unnesting the records again FROM ( SELECT a, -- Nesting the actor table f, -- Nesting the film table RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk FROM film f JOIN film_actor fa USING (film_id) JOIN actor a USING (actor_id) ) t WHERE rk = 1 ORDER BY (a).first_name, (a).last_name;

Notice how we’re no longer projecting A.* and F.* inside of the derived table T , but instead, the entire table (record). In the outer query, we have to use some slightly different syntax to unnest the record again (e.g. (A).FIRST_NAME ), and we’re done.

How Does This Work?

Informix, Oracle, PostgreSQL, and maybe a few lesser known ones, have implemented the SQL standard’s ORDBMS features to various degrees. ORDBMS attempted to combine relational and object oriented features in the SQL language (and in the storage model). For all practical purposes, this essentially just means that we can nest records and collections.

For instance, in PostgreSQL, we can write:

-- Explicit ROW constructor SELECT 1, ROW(2, ROW(3, 4)) -- Implicit ROW constructor SELECT 1, (2, (3, 4))

And we’ll get:

x |row | ---------|----------| 1 |(2,(3,4)) |

Along with ordinary “scalar” values, we can have nested rows (or nested tuples) constructed very easily. Conveniently, we can also reference a table without its column names in the projection, such as:

SELECT a, f FROM film f JOIN film_actor fa USING (film_id) JOIN actor a USING (actor_id)

Which produces the aforementioned result:

a |f | -----|-----| (...)|(...)| (...)|(...)| (...)|(...)| ...

Similar things are possible in Oracle as well, except that Oracle doesn’t support structural row/tuple types, only nominal ones. We’d have to create some types first, prior to being able to use them:

CREATE TABLE film_t AS OBJECT (...);

Bonus

Of course, if you’re using SQL Server or Oracle, you wouldn’t have this problem, because then you could use the totally underrated WITH TIES clause along with CROSS APPLY :

SQL Server

SELECT * FROM actor a CROSS APPLY ( SELECT TOP 1 WITH TIES f.* FROM film f JOIN film_actor fa ON f.film_id = fa.film_id AND fa.actor_id = a.actor_id ORDER BY length DESC ) f ORDER BY first_name, last_name;

Oracle

(Do check performance on this!)

SELECT * FROM actor a CROSS APPLY ( SELECT f.* FROM film f JOIN film_actor fa ON f.film_id = fa.film_id AND fa.actor_id = a.actor_id ORDER BY length DESC FETCH FIRST ROW WITH TIES ) f ORDER BY first_name, last_name;

PostgreSQL and DB2 support the LATERAL keyword, which could be used with FETCH FIRST ROW ONLY semantics (so, no ties are selected).

For more details about TOP N per category queries, see this blog post