One of Oracle 12c’s coolest features was the introduction of the SQL standard OFFSET .. FETCH clause, as we can now write things like:

SELECT * FROM film ORDER BY film_id FETCH FIRST 1 ROW ONLY

This is querying the Sakila database. Most other databases have had this clause (or a non-standard version of it) for ages, e.g. MySQL with LIMIT. For all the different LIMIT syntaxes, check out the jOOQ manual.
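In MySQL, for example, the equivalent of the above query uses the non-standard LIMIT clause:

```sql
-- MySQL's non-standard equivalent of FETCH FIRST 1 ROW ONLY
SELECT *
FROM film
ORDER BY film_id
LIMIT 1
```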

Implementation-wise, the Oracle folks chose to rewrite this clause as a simple window function filter. In principle, the following is executed behind the scenes:

Teradata syntax

SELECT *
FROM film
QUALIFY row_number() OVER (ORDER BY film_id) = 1
ORDER BY film_id

Standard syntax

SELECT * -- Except rn
FROM (
  SELECT
    film.*,
    row_number() OVER (ORDER BY film_id) rn
  FROM film
) t
WHERE rn = 1
ORDER BY film_id

This definitely looks much better than the “old” approach using ROWNUM filtering, which many of us have written for years:

Legacy Oracle syntax

SELECT t.*
FROM (
  SELECT *
  FROM film
  ORDER BY film_id
) t
WHERE ROWNUM = 1

What I don’t like about this “old” approach is that we’re relying on the ORDER BY clause of a derived table, which, in my opinion, should not be guaranteed to be retained in the outermost query (although it is, in Oracle, in this case).

So, having the SQL standard syntax is definitely good.

What’s the problem?

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  COUNT STOPKEY                |         |       |
|   2 |   VIEW                        |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

--------------------------------------------------
| Id  | Operation                | Name | Rows  |
--------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |
|*  1 |  VIEW                    |      |     1 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000 |
|   3 |    TABLE ACCESS FULL     | FILM |  1000 |
--------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)
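Plans like the ones shown here can be reproduced with Oracle's EXPLAIN PLAN facility and DBMS_XPLAN (the exact columns displayed depend on your format options):

```sql
EXPLAIN PLAN FOR
SELECT * FROM film ORDER BY film_id FETCH FIRST 1 ROW ONLY;

-- Display the plan most recently explained in this session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```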

The cardinality estimates are both correct, but the FETCH FIRST query traverses the entire table to find the top FILM_ID, which the ROWNUM query found in the index directly. A workaround would be to hint the number of rows to the FETCH FIRST query:

SELECT /*+FIRST_ROWS(1)*/ * FROM film ORDER BY film_id FETCH FIRST 1 ROW ONLY;

… in which case we get a plan similar to that of the ROWNUM filtering query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  VIEW                         |         |     1 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

Measuring this using our measurement technique yields quite devastating results:

Run 1, Statement 1 :  1.11230 -- ROWNUM
Run 1, Statement 2 :  1.15508 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 46.92781 -- FETCH FIRST

Run 2, Statement 1 :  1.68449
Run 2, Statement 2 :  1.99465
Run 2, Statement 3 : 47.32620

Run 3, Statement 1 :  1.10428
Run 3, Statement 2 :  1.13904
Run 3, Statement 3 : 68.06417

Run 4, Statement 1 :  1
Run 4, Statement 2 :  6.00535
Run 4, Statement 3 : 44.88235

The above results don’t show time measurements, but numbers relative to the fastest execution (1).
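A minimal sketch of such a relative benchmark could look like the following PL/SQL block, assuming a simple loop timed with DBMS_UTILITY.GET_TIME (which returns elapsed time in centiseconds); the harness actually used may differ:

```sql
-- Hypothetical benchmark sketch, not the original measurement harness
SET SERVEROUTPUT ON
DECLARE
  v_ts    NUMBER;
  v_count NUMBER;
BEGIN
  v_ts := DBMS_UTILITY.GET_TIME;

  -- Run the statement under test many times to amortise noise
  FOR i IN 1 .. 10000 LOOP
    SELECT COUNT(*) INTO v_count FROM (
      SELECT * FROM film ORDER BY film_id FETCH FIRST 1 ROW ONLY
    );
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('Elapsed (cs): ' || (DBMS_UTILITY.GET_TIME - v_ts));
END;
/
```

Dividing each statement's elapsed time by the fastest one yields relative numbers like the ones above.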

There is a 40x performance difference between the approaches, with ROWNUM based filtering being the fastest, FETCH FIRST plus +FIRST_ROWS hint being slightly slower, and “naked” FETCH FIRST being terribly slow, when repeating the measurement 5x and running each query 10000x on my machine on Oracle 12.2.0.1.0 in Docker.

Things get worse when joining tables. Let’s run a query that fetches the first customer and their full address information:

-- Legacy Oracle syntax
SELECT t.*
FROM (
  SELECT *
  FROM customer
  JOIN address USING (address_id)
  JOIN city USING (city_id)
  JOIN country USING (country_id)
  ORDER BY customer_id
) t
WHERE ROWNUM = 1;

-- Standard syntax with hint
SELECT /*+FIRST_ROWS(1)*/ *
FROM customer
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

-- Standard syntax without hint
SELECT *
FROM customer
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

The three queries are equivalent: they all produce the same result. Yet, the plans are very different.

Oracle’s legacy syntax

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  COUNT STOPKEY                    |             |       |
|   2 |   VIEW                            |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |    TABLE ACCESS BY INDEX ROWID    | COUNTRY     |     1 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

We’re seeing tons of nested loop joins, because that’s what we’d expect given the low cardinalities imposed by the COUNT STOPKEY operation.

SQL standard syntax with hint

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  VIEW                             |             |     1 |
|*  2 |   WINDOW NOSORT STOPKEY           |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |    TABLE ACCESS BY INDEX ROWID    | COUNTRY     |     1 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Quite a similar plan.

SQL standard syntax without hint

---------------------------------------------------------------
| Id  | Operation                        | Name       | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                 |            |       |
|*  1 |  VIEW                            |            |     1 |
|*  2 |   WINDOW SORT PUSHED RANK        |            |   599 |
|*  3 |    HASH JOIN                     |            |   599 |
|   4 |     TABLE ACCESS FULL            | CUSTOMER   |   599 |
|*  5 |     HASH JOIN                    |            |   603 |
|   6 |      MERGE JOIN                  |            |   600 |
|   7 |       TABLE ACCESS BY INDEX ROWID| COUNTRY    |   109 |
|   8 |        INDEX FULL SCAN           | PK_COUNTRY |   109 |
|*  9 |       SORT JOIN                  |            |   600 |
|  10 |        TABLE ACCESS FULL         | CITY       |   600 |
|  11 |      TABLE ACCESS FULL           | ADDRESS    |   603 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
   3 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
   5 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
   9 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
       filter("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Oops. Many HASH JOIN, MERGE JOIN, and TABLE ACCESS FULL operations, as well as a WINDOW SORT operation rather than a WINDOW NOSORT operation. That cannot be good. Let’s measure this again.

Run 1, Statement 1 :  1.26157 -- ROWNUM
Run 1, Statement 2 :  1.32394 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 66.97384 -- FETCH FIRST

Run 2, Statement 1 :  1.31992
Run 2, Statement 2 :  1.76459
Run 2, Statement 3 : 72.76056

Run 3, Statement 1 :  1
Run 3, Statement 2 :  1.36419
Run 3, Statement 3 : 74.06439

Run 4, Statement 1 :  1.08451
Run 4, Statement 2 :  1.64990
Run 4, Statement 3 : 66.83702

The difference is even worse: a factor of 60x. And make no mistake, these are trivial data set sizes. As we can see in the last execution plan, the cardinality of the CUSTOMER table is 599. This can get much worse for larger tables.

Why even use this syntax?

The SQL standard syntax is much nicer to write, and it allows for nice TOP-N style queries using CROSS APPLY or LATERAL, e.g. to find the TOP 3 longest film titles per actor:

SELECT actor_id, first_name, last_name, title
FROM actor a
OUTER APPLY (
  SELECT /*+FIRST_ROWS(1)*/ title
  FROM film f
  JOIN film_actor fa USING (film_id)
  WHERE fa.actor_id = a.actor_id
  ORDER BY length(title) DESC
  FETCH FIRST 3 ROWS ONLY
) t
ORDER BY actor_id, length(title) DESC;

This would have been much harder with the ROWNUM approach. In older Oracle versions, it was even impossible, because we could not reference A.ACTOR_ID from doubly nested derived tables / correlated subqueries. Luckily, this is no longer the case. So, syntactically, it is definitely a much better way to do paginated queries or TOP-N queries, but the price is very high.
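For comparison, a pre-12c way to express the same TOP-N-per-group query avoids APPLY entirely and filters on a partitioned ROW_NUMBER() in a derived table. This is a sketch, using a LEFT JOIN to mirror the OUTER APPLY semantics of keeping actors without films:

```sql
-- Window-function alternative to OUTER APPLY + FETCH FIRST
SELECT actor_id, first_name, last_name, title
FROM (
  SELECT
    a.actor_id, a.first_name, a.last_name, f.title,
    row_number() OVER (
      PARTITION BY a.actor_id
      ORDER BY length(f.title) DESC
    ) rn
  FROM actor a
  LEFT JOIN film_actor fa ON fa.actor_id = a.actor_id
  LEFT JOIN film f ON f.film_id = fa.film_id
) t
WHERE rn <= 3
ORDER BY actor_id, length(title) DESC;
```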

Disclaimer

The optimiser might make much better choices when:

The base data set is much bigger than the above tables of 600 to 1000 rows.

Indeed, when fetching the first row from the PAYMENT table (with ~16000 rows), the difference becomes smaller or even nonexistent:
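The PAYMENT query is presumably the single-table equivalent of the earlier FILM example, ordered by its primary key:

```sql
-- Assumed shape of the PAYMENT benchmark query
SELECT *
FROM payment
ORDER BY payment_id
FETCH FIRST 1 ROW ONLY;
```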

Run 1, Statement 1 : 1       -- ROWNUM
Run 1, Statement 2 : 1.72246 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 1.76165 -- FETCH FIRST

Run 2, Statement 1 : 1.03919
Run 2, Statement 2 : 1.78284
Run 2, Statement 3 : 1.75742

Run 3, Statement 1 : 1.2553
Run 3, Statement 2 : 1.86441
Run 3, Statement 3 : 2.39089

Run 4, Statement 1 : 2.28814
Run 4, Statement 2 : 3.02436
Run 4, Statement 3 : 2.39407

Run 5, Statement 1 : 1.31462
Run 5, Statement 2 : 2.27225
Run 5, Statement 3 : 1.70975

As can be seen, the measurement error starts to outweigh the performance difference, so the difference is no longer really significant.

The limit is not 1 or 3, but 10 or 50

When fetching the top 50 rows from the joined customer/address query, the measurements actually changed quite a bit. Suddenly, the ROWNUM query wasn’t optimal anymore and behaved like the un-hinted FETCH FIRST query. Adding a /*+FIRST_ROWS(1)*/ hint (not /*+FIRST_ROWS(50)*/ !) did help:

Run 1, Statement 1 : 1.00545 -- ROWNUM +FIRST_ROWS
Run 1, Statement 2 : 7.24842 -- ROWNUM
Run 1, Statement 3 : 1.35691 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 4 : 7.15264 -- FETCH FIRST

Run 2, Statement 1 : 1.08054
Run 2, Statement 2 : 6.51922
Run 2, Statement 3 : 1.35960
Run 2, Statement 4 : 7.94527

Run 3, Statement 1 : 1.02824
Run 3, Statement 2 : 7.16228
Run 3, Statement 3 : 1.19702
Run 3, Statement 4 : 7.55008

Run 4, Statement 1 : 1.08364
Run 4, Statement 2 : 6.66652
Run 4, Statement 3 : 1.18559
Run 4, Statement 4 : 7.36938

Run 5, Statement 1 : 1
Run 5, Statement 2 : 6.89051
Run 5, Statement 3 : 1.24211
Run 5, Statement 4 : 7.15167

Conclusion

What we’ve seen here is a bit unfortunate. For some cases, one approach is better than the other, performance-wise. For others, it’s vice versa. Paginated queries are still a bit tricky for Oracle to get right, and we have to measure things explicitly.

Workaround in jOOQ