WHAT ==== Support SQL:99 feature "lateral derived tables". WHY === SQL Server has it (named CROSS APPLY), PG too, Oracle, DB2, and it's not too hard to implement. It gives "one more SQL feature" to MySQL 8.0. Ref: https://bugs.mysql.com/bug.php?id=78930 WHAT, EXACTLY ============= See slides 3-11 in https://www.slideshare.net/MarkusWinand/modern-sql and/or read my examples below. If I have a table of salespeople (one row describes a salesperson), and a table of all_sales (one row describes a sale: salesperson, customer, amount, date), and I want to know, for each salesperson, the biggest deal ever (size & customer), I can try two approaches. 1) First approach: for each salesperson, calculate the maximum (gives the size) and then find the customer which provided this maximum. In MySQL this can be done like this: select salespeople.name, -- find max deal size for this salesperson ( select max(amount) as amount from all_sales where all_sales.salesperson_id = salespeople.id ) as amount, -- find customer for this max size ( select customer_name from all_sales where all_sales.salesperson_id = salespeople.id and all_sales.amount = -- the max size, again ( select max(amount) as amount from all_sales where all_sales.salesperson_id = salespeople.id ) ) as customer_name from salespeople; That calculates the max size twice per salesperson, once in the first subquery and once in the second, which is inefficient. We would like to calculate it once per salesperson and "cache" it. Usually to cache results we can use a derived table, so we try: select salespeople.name, max_sale.amount, customer_of_max_sale.customer_name from salespeople, -- calculate max size and cache it in transient tmp table "max_sale" ( select max(amount) as amount from all_sales where all_sales.salesperson_id = salespeople.id ) as max_sale, -- find customer, reusing cached max size ( select customer_name from all_sales where all_sales.salesperson_id = salespeople.id and all_sales.amount = -- the cached max size max_sale.amount ) as customer_of_max_sale; This is illegal in SQL92 because a derived table cannot depend on other tables in the same FROM clause (it has to be constant over the query's duration). In SQL99 it becomes legal if one adds the keyword LATERAL: select salespeople.name, max_sale.amount, customer_of_max_sale.customer_name from salespeople, -- calculate max size and cache it in transient tmp table "max_sale" LATERAL ( select max(amount) as amount from all_sales where all_sales.salesperson_id = salespeople.id ) as max_sale, -- find customer, reusing cached max size LATERAL ( select customer_name from all_sales where all_sales.salesperson_id = salespeople.id and all_sales.amount = max_sale.amount ) as customer_of_max_sale; which means "this derived table depends on previous tables on its side" (which are... lateral). 2) Second approach: another solution to achieve our goal could be: select salespeople.name, -- find max size and customer in one go ( select amount, customer_name from all_sales where all_sales.salesperson_id = salespeople.id order by amount desc limit 1 ) from salespeople; it is efficient but illegal as a subquery in the SELECT list may not return more than one column, in SQL. We can rewrite to a derived table, which can have two columns: select salespeople.name, max_sale.amount, max_sale.customer_name from salespeople, -- find max size and customer in one go ( select amount, customer_name from all_sales where all_sales.salesperson_id = salespeople.id order by amount desc limit 1 ) as max_sale; but then our derived table is dependent on the salespeople table, which requires support for LATERAL. In short, LATERAL is the efficient solution to all drawbacks in the two tried approaches.

F-1 Support LATERAL keyword, reserved in SQL:99, in front of a derived table's definition. A derived table with a LATERAL keyword can only occur in FROM, either in the comma-separated list of tables, or inside a JOIN part: SELECT ... FROM t1, ..., LATERAL (SELECT ... WHERE t1.a=...) AS alias, t2 LEFT JOIN (t3 JOIN LATERAL (SELECT ...) AS alias2 ON ...) ON ...; There can be several derived tables with a LATERAL keyword for each. F-2 if a derived table is prefixed with the LATERAL keyword, in its definition it can reference columns from tables which precede it in the FROM clause. F-3 if a derived table is prefixed with the LATERAL keyword, it is re-evaluated so that its content is always up-to-date with the referenced outer columns. For example, in F-1, the content of 'alias' has to change every time a new row of t1 is processed by the top query. F-4 if a lateral derived table 'alias' depends on table t1, it is forbidden that there be a LEFT JOIN with 'alias' on left and t1 on right (or a symmetric RIGHT JOIN). For example these should return error: SELECT ... FROM t1 RIGHT JOIN LATERAL (SELECT ... WHERE t1.a=...) AS alias ON 1; SELECT ... FROM LATERAL (SELECT ... WHERE t1.a=...) AS alias LEFT JOIN t1 ON 1; F-5 This feature should not affect views and CTEs, only derived tables: there is no SQL syntax to specify that a view/CTE is "lateral"; but using LATERAL for a derived table inside the definition of a CTE/view is ok. F-6 in EXPLAIN and in optimizer trace it should be visible at which stage in execution a lateral derived table is rematerialized. F-7 in optimizer trace the specific cost calculations for a lateral derived table should be visible. NF-1 depending on its definition, a lateral derived table may be merged in outer query or materialized into a tmp table. NF-2 if a lateral derived table turns out, after analysis, to not contain references to other tables, the query should execute as fast as if the LATERAL word had not been specified, and as fast as MySQL 8.0. NF-3 in agreement with the SQL standard, a table function has an implicit LATERAL, so it behaves as in previous 8.0 versions. However, in agreement with the standard, the LATERAL word isn't allowed before JSON_TABLE, even though it's implicit.

==Syntax specification== The syntax of a lateral derived table is the same as a non-lateral derived table except that the keyword LATERAL is specified before the derived table specification. Example: SELECT ( SELECT 1 FROM t1 JOIN LATERAL (SELECT * FROM t2 WHERE t1.x = t2.x AND t2.y = t0.z ) lateral_derived_table ON true ) FROM t0; Notice the WHERE clause, which references a column from the table t1 in the same query block as where the lateral derived table is defined. Note that a derived table 'dt' may have two types of outer references: 1) outer ref to a table in the same FROM clause as 'dt', if 'dt' has LATERAL; this is the case of t1.x. We'll call them "FROM-clause outer refs". 2) outer ref to a table in any query outer to the query which owns the FROM clause where 'dt' is; this is the case of t0.z. We'll call them "non-FROM-clause outer refs", and supporting them is being done in WL#461. ==Limitations== * Per the SQL standard, if a lateral derived table is in the right operand of a JOIN clause and contains a reference to the left operand, then the join operation must be an INNER JOIN, CROSS JOIN or LEFT OUTER JOIN. * If a lateral derived table 'dt' references an aggregate function, this function's aggregation query cannot be that which owns the FROM clause where 'dt' is. ==Implementation== * For a lateral derived table, the outer name resolution context of the subquery must be the context of the outer query block, so that search in FROM-clause tables is possible; for a non-lateral derived table, the outer context must be the outer query block of the outer query block. * After resolving a lateral derived table, check whether it contains FROM-clause outer refs. It is doesn't, consider it's not lateral. * For a lateral derived table, make sure that the join type with respect to tables of FROM-clause outer refs are correct (inner join and left join allowed). * Both materialization and merging can be used to evaluate lateral derived tables. The same rules as for non-lateral derived tables apply when deciding whether to merge or materialize. * A lateral derived table is marked as "dependent on" the tables of its FROM-clause outer refs (by setting TABLE_LIST::dep_tables); this makes sure that it is put at a right place in the optimizer's plan. * A lateral derived table cannot use join buffering, as it depends on the row of a previous table. * A lateral derived table cannot be marked as "const" by the optimizer. * The lateral derived table is evaluated for each row of tables it has references to. * For a materialized lateral derived, make sure to empty and materialize it just before it is read by the join executor (in QEP_TAB::prepare_scan()). For a materialized derived table 'dt' having only non-FROM-clause outer references, empty it only when the subquery owning 'dt' starts executing, and then materialize (thus, once per subquery execution) it in QEP_TAB::prepare_scan(). For a materialized derived table having no outer ref, never empty it, and materialize it (thus, once for all) in QEP_TAB::prepare_scan().