Historically we've always materialized the full output of a CTE query,

treating WITH as an optimization fence (so that, for example, restrictions

from the outer query cannot be pushed into it). This is appropriate when

the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE

query is non-recursive and side-effect-free, there's no hazard of changing

the query results by pushing restrictions down.



Another argument for materialization is that it can avoid duplicate

computation of an expensive WITH query --- but that only applies if

the WITH query is called more than once in the outer query. Even then

it could still be a net loss, if each call has restrictions that

would allow just a small part of the WITH query to be computed.



Hence, let's change the behavior for WITH queries that are non-recursive

and side-effect-free. By default, we will inline them into the outer

query (removing the optimization fence) if they are called just once.

If they are called more than once, we will keep the old behavior by

default, but the user can override this and force inlining by specifying

NOT MATERIALIZED. Lastly, the user can force the old behavior by

specifying MATERIALIZED; this would mainly be useful when the query had

deliberately been employing WITH as an optimization fence to prevent a

poor choice of plan.



Andreas Karlsson, Andrew Gierth, David Fetter



Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk





23 files changed:

b/contrib/postgres_fdw/expected/postgres_fdw.out diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out (file)

index 029552b .. 42108bd 100644 --- a/ contrib/postgres_fdw/expected/postgres_fdw.out +++ b/ contrib/postgres_fdw/expected/postgres_fdw.out @@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t -- join in CTE EXPLAIN (VERBOSE, COSTS OFF) -WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; +WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Limit @@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 Output: t.c1_1, t.c2_1, t.c1_3 (12 rows) -WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; +WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; c1_1 | c2_1 ------+------ 101 | 101

b/contrib/postgres_fdw/sql/postgres_fdw.sql diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql (file)

index bab8479 .. eb9d1ad 100644 --- a/ contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/ contrib/postgres_fdw/sql/postgres_fdw.sql @@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE; -- join in CTE EXPLAIN (VERBOSE, COSTS OFF) -WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; -WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; +WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; +WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10; -- ctid with whole-row reference EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;

b/doc/src/sgml/queries.sgml diff --git a/doc/src/sgml/queries.sgml (file)

index 88bc189 .. 2225255 100644 --- a/ doc/src/sgml/queries.sgml +++ b/ doc/src/sgml/queries.sgml @@ -2199,22 +2199,94 @@ SELECT n FROM t LIMIT 100; </para> <para> - A useful property of <literal>WITH</literal> queries is that they are evaluated - only once per execution of the parent query, even if they are referred to - more than once by the parent query or sibling <literal>WITH</literal> queries. + A useful property of <literal>WITH</literal> queries is that they are + normally evaluated only once per execution of the parent query, even if + they are referred to more than once by the parent query or + sibling <literal>WITH</literal> queries. Thus, expensive calculations that are needed in multiple places can be placed within a <literal>WITH</literal> query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. - However, the other side of this coin is that the optimizer is less able to - push restrictions from the parent query down into a <literal>WITH</literal> query - than an ordinary subquery. The <literal>WITH</literal> query will generally be + However, the other side of this coin is that the optimizer is not able to + push restrictions from the parent query down into a multiply-referenced + <literal>WITH</literal> query, since that might affect all uses of the + <literal>WITH</literal> query's output when it should affect only one. + The multiply-referenced <literal>WITH</literal> query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.) </para> + <para> + However, if a <literal>WITH</literal> query is non-recursive and + side-effect-free (that is, it is a <literal>SELECT</literal> containing + no volatile functions) then it can be folded into the parent query, + allowing joint optimization of the two query levels. By default, this + happens if the parent query references the <literal>WITH</literal> query + just once, but not if it references the <literal>WITH</literal> query + more than once. You can override that decision by + specifying <literal>MATERIALIZED</literal> to force separate calculation + of the <literal>WITH</literal> query, or by specifying <literal>NOT + MATERIALIZED</literal> to force it to be merged into the parent query. + The latter choice risks duplicate computation of + the <literal>WITH</literal> query, but it can still give a net savings if + each usage of the <literal>WITH</literal> query needs only a small part + of the <literal>WITH</literal> query's full output. + </para> + + <para> + A simple example of these rules is +<programlisting> +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w WHERE key = 123; +</programlisting> + This <literal>WITH</literal> query will be folded, producing the same + execution plan as +<programlisting> +SELECT * FROM big_table WHERE key = 123; +</programlisting> + In particular, if there's an index on <structfield>key</structfield>, + it will probably be used to fetch just the rows having <literal>key = + 123</literal>. On the other hand, in +<programlisting> +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +</programlisting> + the <literal>WITH</literal> query will be materialized, producing a + temporary copy of <structname>big_table</structname> that is then + joined with itself — without benefit of any index. This query + will be executed much more efficiently if written as +<programlisting> +WITH w AS NOT MATERIALIZED ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +</programlisting> + so that the parent query's restrictions can be applied directly + to scans of <structname>big_table</structname>. + </para> + + <para> + An example where <literal>NOT MATERIALIZED</literal> could be + undesirable is +<programlisting> +WITH w AS ( + SELECT key, very_expensive_function(val) as f FROM some_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f; +</programlisting> + Here, materialization of the <literal>WITH</literal> query ensures + that <function>very_expensive_function</function> is evaluated only + once per table row, not twice. + </para> + <para> The examples above only show <literal>WITH</literal> being used with <command>SELECT</command>, but it can be attached in the same way to

b/doc/src/sgml/ref/select.sgml diff --git a/doc/src/sgml/ref/select.sgml (file)

index 4db8142 .. 06d611b 100644 --- a/ doc/src/sgml/ref/select.sgml +++ b/ doc/src/sgml/ref/select.sgml @@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac <phrase>and <replaceable class="parameter">with_query</replaceable> is:</phrase> - <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> ) + <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS [ [ NOT ] MATERIALIZED ] ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> ) TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] </synopsis> @@ -93,7 +93,8 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] These effectively serve as temporary tables that can be referenced in the <literal>FROM</literal> list. A <literal>WITH</literal> query that is referenced more than once in <literal>FROM</literal> is - computed only once. + computed only once, + unless specified otherwise with <literal>NOT MATERIALIZED</literal>. (See <xref linkend="sql-with" endterm="sql-with-title"/> below.) </para> </listitem> @@ -272,9 +273,18 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] that are earlier in the <literal>WITH</literal> list. </para> + <para> + The primary query and the <literal>WITH</literal> queries are all + (notionally) executed at the same time. This implies that the effects of + a data-modifying statement in <literal>WITH</literal> cannot be seen from + other parts of the query, other than by reading its <literal>RETURNING</literal> + output. If two such data-modifying statements attempt to modify the same + row, the results are unspecified. + </para> + <para> A key property of <literal>WITH</literal> queries is that they - are evaluated only once per execution of the primary query, + are normally evaluated only once per execution of the primary query, even if the primary query refers to them more than once. In particular, data-modifying statements are guaranteed to be executed once and only once, regardless of whether the primary query @@ -282,12 +292,35 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] </para> <para> - The primary query and the <literal>WITH</literal> queries are all - (notionally) executed at the same time. This implies that the effects of - a data-modifying statement in <literal>WITH</literal> cannot be seen from - other parts of the query, other than by reading its <literal>RETURNING</literal> - output. If two such data-modifying statements attempt to modify the same - row, the results are unspecified. + However, a <literal>WITH</literal> query can be marked + <literal>NOT MATERIALIZED</literal> to remove this guarantee. In that + case, the <literal>WITH</literal> query can be folded into the primary + query much as though it were a simple sub-<literal>SELECT</literal> in + the primary query's <literal>FROM</literal> clause. This results in + duplicate computations if the primary query refers to + that <literal>WITH</literal> query more than once; but if each such use + requires only a few rows of the <literal>WITH</literal> query's total + output, <literal>NOT MATERIALIZED</literal> can provide a net savings by + allowing the queries to be optimized jointly. + <literal>NOT MATERIALIZED</literal> is ignored if it is attached to + a <literal>WITH</literal> query that is recursive or is not + side-effect-free (i.e., is not a plain <literal>SELECT</literal> + containing no volatile functions). + </para> + + <para> + By default, a side-effect-free <literal>WITH</literal> query is folded + into the primary query if it is used exactly once in the primary + query's <literal>FROM</literal> clause. This allows joint optimization + of the two query levels in situations where that should be semantically + invisible. However, such folding can be prevented by marking the + <literal>WITH</literal> query as <literal>MATERIALIZED</literal>. + That might be useful, for example, if the <literal>WITH</literal> query + is being used as an optimization fence to prevent the planner from + choosing a bad plan. + <productname>PostgreSQL</productname> versions before v12 never did + such folding, so queries written for older versions might rely on + <literal>WITH</literal> to act as an optimization fence. </para> <para> @@ -2087,6 +2120,12 @@ SELECT distributors.* WHERE distributors.name = 'Westward'; <para> <literal>ROWS FROM( ... )</literal> is an extension of the SQL standard. </para> + + <para> + The <literal>MATERIALIZED</literal> and <literal>NOT + MATERIALIZED</literal> options of <literal>WITH</literal> are extensions + of the SQL standard. + </para> </refsect2> </refsect1>

b/src/backend/optimizer/plan/planner.c diff --git a/src/backend/optimizer/plan/planner.c (file)

index ddb86bd .. 4e52c42 100644 --- a/ src/backend/optimizer/plan/planner.c +++ b/ src/backend/optimizer/plan/planner.c @@ -646,8 +646,8 @@ subquery_planner(PlannerGlobal *glob, Query *parse, root->partColsUpdated = false; /* - * If there is a WITH list, process each WITH query and build an initplan - * SubPlan structure for it. + * If there is a WITH list, process each WITH query and either convert it + * to RTE_SUBQUERY RTE(s) or build an initplan SubPlan structure for it. */ if (parse->cteList) SS_process_ctes(root);

b/src/backend/optimizer/plan/subselect.c diff --git a/src/backend/optimizer/plan/subselect.c (file)

index 915c6d0 .. c721054 100644 --- a/ src/backend/optimizer/plan/subselect.c +++ b/ src/backend/optimizer/plan/subselect.c @@ -57,6 +57,14 @@ typedef struct finalize_primnode_context Bitmapset *paramids; /* Non-local PARAM_EXEC paramids found */ } finalize_primnode_context; +typedef struct inline_cte_walker_context +{ + const char *ctename; /* name and relative level of target CTE */ + int levelsup; + int refcount; /* number of remaining references */ + Query *ctequery; /* query to substitute */ +} inline_cte_walker_context; + static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, List *plan_params, @@ -75,6 +83,10 @@ static Node *convert_testexpr_mutator(Node *node, static bool subplan_is_hashable(Plan *plan); static bool testexpr_is_hashable(Node *testexpr); static bool hash_ok_operator(OpExpr *expr); +static bool contain_dml(Node *node); +static bool contain_dml_walker(Node *node, void *context); +static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); +static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -804,10 +816,13 @@ hash_ok_operator(OpExpr *expr) /* * SS_process_ctes: process a query's WITH list * - * We plan each interesting WITH item and convert it to an initplan. + * Consider each CTE in the WITH list and either ignore it (if it's an + * unreferenced SELECT), "inline" it to create a regular sub-SELECT-in-FROM, + * or convert it to an initplan. + * * A side effect is to fill in root->cte_plan_ids with a list that * parallels root->parse->cteList and provides the subplan ID for - * each CTE's initplan. + * each CTE's initplan , or a dummy ID (-1) if we didn't make an initplan . */ void SS_process_ctes(PlannerInfo *root) @@ -838,6 +853,44 @@ SS_process_ctes(PlannerInfo *root) continue; } + /* + * Consider inlining the CTE (creating RTE_SUBQUERY RTE(s)) instead of + * implementing it as a separately-planned CTE. + * + * We cannot inline if any of these conditions hold: + * + * 1. The user said not to (the CTEMaterializeAlways option). + * + * 2. The CTE is recursive. + * + * 3. The CTE has side-effects; this includes either not being a plain + * SELECT, or containing volatile functions. Inlining might change + * the side-effects, which would be bad. + * + * Otherwise, we have an option whether to inline or not. That should + * always be a win if there's just a single reference, but if the CTE + * is multiply-referenced then it's unclear: inlining adds duplicate + * computations, but the ability to absorb restrictions from the outer + * query level could outweigh that. We do not have nearly enough + * information at this point to tell whether that's true, so we let + * the user express a preference. Our default behavior is to inline + * only singly-referenced CTEs, but a CTE marked CTEMaterializeNever + * will be inlined even if multiply referenced. + */ + if ((cte->ctematerialized == CTEMaterializeNever || + (cte->ctematerialized == CTEMaterializeDefault && + cte->cterefcount == 1)) && + !cte->cterecursive && + cmdType == CMD_SELECT && + !contain_dml(cte->ctequery) && + !contain_volatile_functions(cte->ctequery)) + { + inline_cte(root, cte); + /* Make a dummy entry in cte_plan_ids */ + root->cte_plan_ids = lappend_int(root->cte_plan_ids, -1); + continue; + } + /* * Copy the source Query node. Probably not necessary, but let's keep * this similar to make_subplan. @@ -934,6 +987,127 @@ SS_process_ctes(PlannerInfo *root) } } +/* + * contain_dml: is any subquery not a plain SELECT? + * + * We reject SELECT FOR UPDATE/SHARE as well as INSERT etc. + */ +static bool +contain_dml(Node *node) +{ + return contain_dml_walker(node, NULL); +} + +static bool +contain_dml_walker(Node *node, void *context) +{ + if (node == NULL) + return false; + if (IsA(node, Query)) + { + Query *query = (Query *) node; + + if (query->commandType != CMD_SELECT || + query->rowMarks != NIL) + return true; + + return query_tree_walker(query, contain_dml_walker, context, 0); + } + return expression_tree_walker(node, contain_dml_walker, context); +} + +/* + * inline_cte: convert RTE_CTE references to given CTE into RTE_SUBQUERYs + */ +static void +inline_cte(PlannerInfo *root, CommonTableExpr *cte) +{ + struct inline_cte_walker_context context; + + context.ctename = cte->ctename; + /* Start at levelsup = -1 because we'll immediately increment it */ + context.levelsup = -1; + context.refcount = cte->cterefcount; + context.ctequery = castNode(Query, cte->ctequery); + + (void) inline_cte_walker((Node *) root->parse, &context); + + /* Assert we replaced all references */ + Assert(context.refcount == 0); +} + +static bool +inline_cte_walker(Node *node, inline_cte_walker_context *context) +{ + if (node == NULL) + return false; + if (IsA(node, Query)) + { + Query *query = (Query *) node; + + context->levelsup++; + + /* + * Visit the query's RTE nodes after their contents; otherwise + * query_tree_walker would descend into the newly inlined CTE query, + * which we don't want. + */ + (void) query_tree_walker(query, inline_cte_walker, context, + QTW_EXAMINE_RTES_AFTER); + + context->levelsup--; + + return false; + } + else if (IsA(node, RangeTblEntry)) + { + RangeTblEntry *rte = (RangeTblEntry *) node; + + if (rte->rtekind == RTE_CTE && + strcmp(rte->ctename, context->ctename) == 0 && + rte->ctelevelsup == context->levelsup) + { + /* + * Found a reference to replace. Generate a copy of the CTE query + * with appropriate level adjustment for outer references (e.g., + * to other CTEs). + */ + Query *newquery = copyObject(context->ctequery); + + if (context->levelsup > 0) + IncrementVarSublevelsUp((Node *) newquery, context->levelsup, 1); + + /* + * Convert the RTE_CTE RTE into a RTE_SUBQUERY. + * + * Historically, a FOR UPDATE clause has been treated as extending + * into views and subqueries, but not into CTEs. We preserve this + * distinction by not trying to push rowmarks into the new + * subquery. + */ + rte->rtekind = RTE_SUBQUERY; + rte->subquery = newquery; + rte->security_barrier = false; + + /* Zero out CTE-specific fields */ + rte->ctename = NULL; + rte->ctelevelsup = 0; + rte->self_reference = false; + rte->coltypes = NIL; + rte->coltypmods = NIL; + rte->colcollations = NIL; + + /* Count the number of replacements we've done */ + context->refcount--; + } + + return false; + } + + return expression_tree_walker(node, inline_cte_walker, context); +} + + /* * convert_ANY_sublink_to_join: try to convert an ANY SubLink to a join *

b/src/backend/parser/gram.y diff --git a/src/backend/parser/gram.y (file)

index 7bb2b0b .. a68f78e 100644 --- a/ src/backend/parser/gram.y +++ b/ src/backend/parser/gram.y @@ -479,7 +479,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> row explicit_row implicit_row type_list array_expr_list %type <node> case_expr case_arg when_clause case_default %type <list> when_clause_list -%type <ival> sub_type +%type <ival> sub_type opt_materialized %type <value> NumericOnly %type <list> NumericOnly_list %type <alias> alias_clause opt_alias_clause @@ -11344,17 +11344,24 @@ cte_list: | cte_list ',' common_table_expr { $$ = lappend($1, $3); } ; -common_table_expr: name opt_name_list AS '(' PreparableStmt ')' +common_table_expr: name opt_name_list AS opt_materialized '(' PreparableStmt ')' { CommonTableExpr *n = makeNode(CommonTableExpr); n->ctename = $1; n->aliascolnames = $2; - n->ctequery = $5; + n->ctematerialized = $4; + n->ctequery = $6; n->location = @1; $$ = (Node *) n; } ; +opt_materialized: + MATERIALIZED { $$ = CTEMaterializeAlways; } + | NOT MATERIALIZED { $$ = CTEMaterializeNever; } + | /*EMPTY*/ { $$ = CTEMaterializeDefault; } + ; + opt_with_clause: with_clause { $$ = $1; } | /*EMPTY*/ { $$ = NULL; } @@ -16237,6 +16244,7 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query) /* create common table expression */ cte->ctename = relname; cte->aliascolnames = aliases; + cte->ctematerialized = CTEMaterializeDefault; cte->ctequery = query; cte->location = -1;

b/src/include/nodes/parsenodes.h diff --git a/src/include/nodes/parsenodes.h (file)

index 2fe14d7 .. 6d035a0 100644 --- a/ src/include/nodes/parsenodes.h +++ b/ src/include/nodes/parsenodes.h @@ -1402,11 +1402,19 @@ typedef struct OnConflictClause * * We don't currently support the SEARCH or CYCLE clause. */ +typedef enum CTEMaterialize +{ + CTEMaterializeDefault, /* no option specified */ + CTEMaterializeAlways, /* MATERIALIZED */ + CTEMaterializeNever /* NOT MATERIALIZED */ +} CTEMaterialize; + typedef struct CommonTableExpr { NodeTag type; char *ctename; /* query name (never qualified) */ List *aliascolnames; /* optional list of column names */ + CTEMaterialize ctematerialized; /* is this an optimization fence? */ /* SelectStmt/InsertStmt/etc before parse analysis, Query afterwards: */ Node *ctequery; /* the CTE's subquery */ int location; /* token location, or -1 if unknown */

b/src/test/regress/expected/rowsecurity.out diff --git a/src/test/regress/expected/rowsecurity.out (file)

index 1d12b01 .. 2e17049 100644 --- a/ src/test/regress/expected/rowsecurity.out +++ b/ src/test/regress/expected/rowsecurity.out @@ -2179,7 +2179,7 @@ EXPLAIN (COSTS OFF) EXECUTE plancache_test; Filter: (((a % 2) = 0) AND f_leak(b)) (2 rows) -PREPARE plancache_test2 AS WITH q AS (SELECT * FROM z1 WHERE f_leak(b)) SELECT * FROM q,z2; +PREPARE plancache_test2 AS WITH q AS MATERIALIZED (SELECT * FROM z1 WHERE f_leak(b)) SELECT * FROM q,z2; EXPLAIN (COSTS OFF) EXECUTE plancache_test2; QUERY PLAN ------------------------------------------------- @@ -2192,7 +2192,7 @@ EXPLAIN (COSTS OFF) EXECUTE plancache_test2; -> Seq Scan on z2 (7 rows) -PREPARE plancache_test3 AS WITH q AS (SELECT * FROM z2) SELECT * FROM q,z1 WHERE f_leak(z1.b); +PREPARE plancache_test3 AS WITH q AS MATERIALIZED (SELECT * FROM z2) SELECT * FROM q,z1 WHERE f_leak(z1.b); EXPLAIN (COSTS OFF) EXECUTE plancache_test3; QUERY PLAN ----------------------------------------------------- @@ -2826,7 +2826,7 @@ ALTER TABLE t1 ENABLE ROW LEVEL SECURITY; GRANT ALL ON t1 TO regress_rls_bob; INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(0,20) x); SET SESSION AUTHORIZATION regress_rls_bob; -WITH cte1 AS (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; +WITH cte1 AS MATERIALIZED (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; NOTICE: f_leak => cfcd208495d565ef66e7dff9f98764da NOTICE: f_leak => c81e728d9d4c2f636f067f89cc14862c NOTICE: f_leak => a87ff679a2f3e71d9181a67b7542122c @@ -2853,7 +2853,8 @@ NOTICE: f_leak => 98f13708210194c475687be6106a3b84 20 | 98f13708210194c475687be6106a3b84 (11 rows) -EXPLAIN (COSTS OFF) WITH cte1 AS (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; +EXPLAIN (COSTS OFF) +WITH cte1 AS MATERIALIZED (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; QUERY PLAN ------------------------------------------------- CTE Scan on cte1

b/src/test/regress/expected/rowtypes.out diff --git a/src/test/regress/expected/rowtypes.out (file)

index 6ff2fd3 .. ffccaa5 100644 --- a/ src/test/regress/expected/rowtypes.out +++ b/ src/test/regress/expected/rowtypes.out @@ -1147,7 +1147,7 @@ from (values (1,row(1,2)), (1,row(null,null)), (1,null), (6 rows) explain (verbose, costs off) -with r(a,b) as +with r(a,b) as materialized (values (1,row(1,2)), (1,row(null,null)), (1,null), (null,row(1,2)), (null,row(null,null)), (null,null) ) select r, r is null as isnull, r is not null as isnotnull from r; @@ -1160,7 +1160,7 @@ select r, r is null as isnull, r is not null as isnotnull from r; Output: "*VALUES*".column1, "*VALUES*".column2 (5 rows) -with r(a,b) as +with r(a,b) as materialized (values (1,row(1,2)), (1,row(null,null)), (1,null), (null,row(1,2)), (null,row(null,null)), (null,null) ) select r, r is null as isnull, r is not null as isnotnull from r;

b/src/test/regress/expected/rules.out diff --git a/src/test/regress/expected/rules.out (file)

index b31594a .. 95b8a95 100644 --- a/ src/test/regress/expected/rules.out +++ b/ src/test/regress/expected/rules.out @@ -3110,7 +3110,7 @@ explain (costs off) INSERT INTO hats VALUES ('h8', 'forbidden') RETURNING *; (5 rows) -- ensure upserting into a rule, with a CTE (different offsets!) works -WITH data(hat_name, hat_color) AS ( +WITH data(hat_name, hat_color) AS MATERIALIZED ( VALUES ('h8', 'green'), ('h9', 'blue'), ('h7', 'forbidden') @@ -3124,7 +3124,8 @@ RETURNING *; h9 | blue (2 rows) -EXPLAIN (costs off) WITH data(hat_name, hat_color) AS ( +EXPLAIN (costs off) +WITH data(hat_name, hat_color) AS MATERIALIZED ( VALUES ('h8', 'green'), ('h9', 'blue'), ('h7', 'forbidden')

b/src/test/regress/expected/subselect.out diff --git a/src/test/regress/expected/subselect.out (file)

index 6e238e8 .. cc3f5f3 100644 --- a/ src/test/regress/expected/subselect.out +++ b/ src/test/regress/expected/subselect.out @@ -1154,3 +1154,159 @@ fetch backward all in c1; (2 rows) commit; +-- +-- Tests for CTE inlining behavior +-- +-- Basic subquery that can be inlined +explain (verbose, costs off) +with x as (select * from (select f1 from subselect_tbl) ss) +select * from x where f1 = 1; + QUERY PLAN +---------------------------------- + Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1 + Filter: (subselect_tbl.f1 = 1) +(3 rows) + +-- Explicitly request materialization +explain (verbose, costs off) +with x as materialized (select * from (select f1 from subselect_tbl) ss) +select * from x where f1 = 1; + QUERY PLAN +------------------------------------------ + CTE Scan on x + Output: x.f1 + Filter: (x.f1 = 1) + CTE x + -> Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1 +(6 rows) + +-- Stable functions are safe to inline +explain (verbose, costs off) +with x as (select * from (select f1, now() from subselect_tbl) ss) +select * from x where f1 = 1; + QUERY PLAN +----------------------------------- + Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, now() + Filter: (subselect_tbl.f1 = 1) +(3 rows) + +-- Volatile functions prevent inlining +explain (verbose, costs off) +with x as (select * from (select f1, random() from subselect_tbl) ss) +select * from x where f1 = 1; + QUERY PLAN +---------------------------------------------- + CTE Scan on x + Output: x.f1, x.random + Filter: (x.f1 = 1) + CTE x + -> Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, random() +(6 rows) + +-- SELECT FOR UPDATE cannot be inlined +explain (verbose, costs off) +with x as (select * from (select f1 from subselect_tbl for update) ss) +select * from x where f1 = 1; + QUERY PLAN +-------------------------------------------------------------------- + CTE Scan on x + Output: x.f1 + Filter: (x.f1 = 1) + CTE x + -> Subquery Scan on ss + Output: ss.f1 + -> LockRows + Output: subselect_tbl.f1, subselect_tbl.ctid + -> Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, subselect_tbl.ctid +(10 rows) + +-- Multiply-referenced CTEs are inlined only when requested +explain (verbose, costs off) +with x as (select * from (select f1, now() as n from subselect_tbl) ss) +select * from x, x x2 where x.n = x2.n; + QUERY PLAN +------------------------------------------- + Merge Join + Output: x.f1, x.n, x2.f1, x2.n + Merge Cond: (x.n = x2.n) + CTE x + -> Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, now() + -> Sort + Output: x.f1, x.n + Sort Key: x.n + -> CTE Scan on x + Output: x.f1, x.n + -> Sort + Output: x2.f1, x2.n + Sort Key: x2.n + -> CTE Scan on x x2 + Output: x2.f1, x2.n +(16 rows) + +explain (verbose, costs off) +with x as not materialized (select * from (select f1, now() as n from subselect_tbl) ss) +select * from x, x x2 where x.n = x2.n; + QUERY PLAN +---------------------------------------------------------------------------- + Result + Output: subselect_tbl.f1, now(), subselect_tbl_1.f1, now() + One-Time Filter: (now() = now()) + -> Nested Loop + Output: subselect_tbl.f1, subselect_tbl_1.f1 + -> Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3 + -> Materialize + Output: subselect_tbl_1.f1 + -> Seq Scan on public.subselect_tbl subselect_tbl_1 + Output: subselect_tbl_1.f1 +(11 rows) + +-- Check handling of outer references +explain (verbose, costs off) +with x as (select * from int4_tbl) +select * from (with y as (select * from x) select * from y) ss; + QUERY PLAN +----------------------------- + Seq Scan on public.int4_tbl + Output: int4_tbl.f1 +(2 rows) + +explain (verbose, costs off) +with x as materialized (select * from int4_tbl) +select * from (with y as (select * from x) select * from y) ss; + QUERY PLAN +------------------------------------- + CTE Scan on x + Output: x.f1 + CTE x + -> Seq Scan on public.int4_tbl + Output: int4_tbl.f1 +(5 rows) + +-- Ensure that we inline the currect CTE when there are +-- multiple CTEs with the same name +explain (verbose, costs off) +with x as (select 1 as y) +select * from (with x as (select 2 as y) select * from x) ss; + QUERY PLAN +------------- + Result + Output: 2 +(2 rows) + +-- Row marks are not pushed into CTEs +explain (verbose, costs off) +with x as (select * from subselect_tbl) +select * from x for update; + QUERY PLAN +---------------------------------------------------------------- + Seq Scan on public.subselect_tbl + Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3 +(2 rows) +

b/src/test/regress/sql/rowsecurity.sql diff --git a/src/test/regress/sql/rowsecurity.sql (file)

index 38e9b38 .. 52da276 100644 --- a/ src/test/regress/sql/rowsecurity.sql +++ b/ src/test/regress/sql/rowsecurity.sql @@ -840,10 +840,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM z1 WHERE f_leak(b); PREPARE plancache_test AS SELECT * FROM z1 WHERE f_leak(b); EXPLAIN (COSTS OFF) EXECUTE plancache_test; -PREPARE plancache_test2 AS WITH q AS (SELECT * FROM z1 WHERE f_leak(b)) SELECT * FROM q,z2; +PREPARE plancache_test2 AS WITH q AS MATERIALIZED (SELECT * FROM z1 WHERE f_leak(b)) SELECT * FROM q,z2; EXPLAIN (COSTS OFF) EXECUTE plancache_test2; -PREPARE plancache_test3 AS WITH q AS (SELECT * FROM z2) SELECT * FROM q,z1 WHERE f_leak(z1.b); +PREPARE plancache_test3 AS WITH q AS MATERIALIZED (SELECT * FROM z2) SELECT * FROM q,z1 WHERE f_leak(z1.b); EXPLAIN (COSTS OFF) EXECUTE plancache_test3; SET ROLE regress_rls_group1; @@ -1071,8 +1071,9 @@ INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(0,20) x); SET SESSION AUTHORIZATION regress_rls_bob; -WITH cte1 AS (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; -EXPLAIN (COSTS OFF) WITH cte1 AS (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; +WITH cte1 AS MATERIALIZED (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; +EXPLAIN (COSTS OFF) +WITH cte1 AS MATERIALIZED (SELECT * FROM t1 WHERE f_leak(b)) SELECT * FROM cte1; WITH cte1 AS (UPDATE t1 SET a = a + 1 RETURNING *) SELECT * FROM cte1; --fail WITH cte1 AS (UPDATE t1 SET a = a RETURNING *) SELECT * FROM cte1; --ok

b/src/test/regress/sql/rowtypes.sql diff --git a/src/test/regress/sql/rowtypes.sql (file)

index ea93347 .. 45f60e8 100644 --- a/ src/test/regress/sql/rowtypes.sql +++ b/ src/test/regress/sql/rowtypes.sql @@ -461,12 +461,12 @@ from (values (1,row(1,2)), (1,row(null,null)), (1,null), (null,row(1,2)), (null,row(null,null)), (null,null) ) r(a,b); explain (verbose, costs off) -with r(a,b) as +with r(a,b) as materialized (values (1,row(1,2)), (1,row(null,null)), (1,null), (null,row(1,2)), (null,row(null,null)), (null,null) ) select r, r is null as isnull, r is not null as isnotnull from r; -with r(a,b) as +with r(a,b) as materialized (values (1,row(1,2)), (1,row(null,null)), (1,null), (null,row(1,2)), (null,row(null,null)), (null,null) ) select r, r is null as isnull, r is not null as isnotnull from r;

b/src/test/regress/sql/rules.sql diff --git a/src/test/regress/sql/rules.sql (file)

index f4ee30e .. 6666951 100644 --- a/ src/test/regress/sql/rules.sql +++ b/ src/test/regress/sql/rules.sql @@ -1132,7 +1132,7 @@ SELECT tablename, rulename, definition FROM pg_rules explain (costs off) INSERT INTO hats VALUES ('h8', 'forbidden') RETURNING *; -- ensure upserting into a rule, with a CTE (different offsets!) works -WITH data(hat_name, hat_color) AS ( +WITH data(hat_name, hat_color) AS MATERIALIZED ( VALUES ('h8', 'green'), ('h9', 'blue'), ('h7', 'forbidden') @@ -1140,7 +1140,8 @@ WITH data(hat_name, hat_color) AS ( INSERT INTO hats SELECT * FROM data RETURNING *; -EXPLAIN (costs off) WITH data(hat_name, hat_color) AS ( +EXPLAIN (costs off) +WITH data(hat_name, hat_color) AS MATERIALIZED ( VALUES ('h8', 'green'), ('h9', 'blue'), ('h7', 'forbidden')