Hi Ask Tom Team,

Recently I came across MATCH_RECOGNIZE clause. I checked the Oracle documentation but it's not easy to digest.

Looking at OLL webinars, youtube videos and some of Ketih's article on oracle blog helped me understanding a little of it.

In oracle training as well there seems nothing on it so i thought to as here.



Is it possible for you share some light on it by giving less complex examples or any series of videos like you did on Analytical functions.



Oracle documentation syntax:



table_reference ::=

{only (query_table_expression) | query_table_expression }[flashback_query_clause]

[pivot_clause|unpivot_clause|row_pattern_recognition_clause] [t_alias]



row_pattern_recognition_clause ::=

MATCH_RECOGNIZE (

[row_pattern_partition_by ]

[row_pattern_order_by ]

[row_pattern_measures ]

[row_pattern_rows_per_match ]

[row_pattern_skip_to ]

PATTERN (row_pattern)

[ row_pattern_subset_clause]

DEFINE row_pattern_definition_list

)



row_pattern_partition_by ::=

PARTITION BY column[, column]...



row_pattern_order_by ::=

ORDER BY column[, column]...



row_pattern_measures ::=

MEASURES row_pattern_measure_column[, row_pattern_measure_column]...



row_pattern_measure_column ::=

expression AS c_alias



row_pattern_rows_per_match ::=

ONE ROW PER MATCH

| ALL ROWS PER MATCH



row_pattern_skip_to ::=

AFTER MATCH {

SKIP TO NEXT ROW

| SKIP PAST LAST ROW

| SKIP TO FIRST variable_name

| SKIP TO LAST variable_name

| SKIP TO variable_name}



row_pattern ::=

row_pattern_term

| row_pattern "|" row_pattern_term



row_pattern_term ::=

row_pattern_factor

| row_pattern_term row_pattern_factor



row_pattern_factor ::=

row_pattern_primary [row_pattern_quantifier]



row_pattern_quantifier ::=

*[?]

|+[?]

|?[?]

|"{"[unsigned_integer ],[unsigned_integer]"}"[?]

|"{"unsigned_integer "}"



row_pattern_primary ::=

variable_name

|$

|^

|([row_pattern])

|"{-" row_pattern"-}"

| row_pattern_permute



row_pattern_permute ::=

PERMUTE (row_pattern [, row_pattern] ...)



row_pattern_subset_clause ::=

SUBSET row_pattern_subset_item [, row_pattern_subset_item] ...



row_pattern_subset_item ::=

variable_name = (variable_name[ , variable_name]...)



row_pattern_definition_list ::=

row_pattern_definition[, row_pattern_definition]...



row_pattern_definition ::=

variable_name AS condition



create table t ( run_date date, time_in_s int, distance_in_miles int ); insert into t values (date'2017-01-01', 420, 1); insert into t values (date'2017-01-02', 2400, 5); insert into t values (date'2017-01-03', 2430, 5); insert into t values (date'2017-01-04', 2350, 5); insert into t values (date'2017-01-05', 410, 1); insert into t values (date'2017-01-06', 400, 1); insert into t values (date'2017-01-08', 2300, 5); insert into t values (date'2017-01-09', 425, 1); insert into t values (date'2017-01-10', 422, 1); commit; select * from t; RUN_DATE TIME_IN_S DISTANCE_IN_MILES 01-JAN-2017 00:00:00 420 1 02-JAN-2017 00:00:00 2,400 5 03-JAN-2017 00:00:00 2,430 5 04-JAN-2017 00:00:00 2,350 5 05-JAN-2017 00:00:00 410 1 06-JAN-2017 00:00:00 400 1 08-JAN-2017 00:00:00 2,300 5 09-JAN-2017 00:00:00 425 1 10-JAN-2017 00:00:00 422 1

select * from t match_recognize ( order by run_date pattern ( anything ) define anything as run_date = run_date ); SQL Error: ORA-30732: table contains no user-visible columns

select * from t match_recognize ( order by run_date measures match_number() as mno, classifier() as cls pattern ( anything ) define anything as run_date = run_date ); MNO CLS 1 ANYTHING 2 ANYTHING 3 ANYTHING 4 ANYTHING 5 ANYTHING 6 ANYTHING 7 ANYTHING 8 ANYTHING 9 ANYTHING

select * from t match_recognize ( order by run_date measures match_number() as mno, classifier() as cls pattern ( anything* ) define anything as run_date = run_date ); MNO CLS 1 ANYTHING

select * from t match_recognize ( order by run_date measures match_number() as mno, classifier() as cls all rows per match pattern ( anything* ) define anything as run_date = run_date ); RUN_DATE MNO CLS TIME_IN_S DISTANCE_IN_MILES 01-JAN-2017 00:00:00 1 ANYTHING 420 1 02-JAN-2017 00:00:00 1 ANYTHING 2,400 5 03-JAN-2017 00:00:00 1 ANYTHING 2,430 5 04-JAN-2017 00:00:00 1 ANYTHING 2,350 5 05-JAN-2017 00:00:00 1 ANYTHING 410 1 06-JAN-2017 00:00:00 1 ANYTHING 400 1 08-JAN-2017 00:00:00 1 ANYTHING 2,300 5 09-JAN-2017 00:00:00 1 ANYTHING 425 1 10-JAN-2017 00:00:00 1 ANYTHING 422 1

pattern ( faster+ )

define faster as time_in_s < prev(time_in_s)

select * from t match_recognize ( order by run_date measures match_number() as mno, classifier() as cls all rows per match pattern ( faster+ ) define faster as time_in_s < prev(time_in_s) ); RUN_DATE MNO CLS TIME_IN_S DISTANCE_IN_MILES 04-JAN-2017 00:00:00 1 FASTER 2,350 5 05-JAN-2017 00:00:00 1 FASTER 410 1 06-JAN-2017 00:00:00 1 FASTER 400 1 09-JAN-2017 00:00:00 2 FASTER 425 1 10-JAN-2017 00:00:00 2 FASTER 422 1

select * from t match_recognize ( partition by distance_in_miles order by run_date measures match_number() as mno, classifier() as cls all rows per match pattern ( faster+ ) define faster as time_in_s < prev(time_in_s) ); DISTANCE_IN_MILES RUN_DATE MNO CLS TIME_IN_S 1 05-JAN-2017 00:00:00 1 FASTER 410 1 06-JAN-2017 00:00:00 1 FASTER 400 1 10-JAN-2017 00:00:00 2 FASTER 422 5 04-JAN-2017 00:00:00 1 FASTER 2,350 5 08-JAN-2017 00:00:00 1 FASTER 2,300

select * from t match_recognize ( partition by distance_in_miles order by run_date measures match_number() as mno, classifier() as cls all rows per match pattern ( anything faster+ ) define faster as time_in_s < prev(time_in_s) ); DISTANCE_IN_MILES RUN_DATE MNO CLS TIME_IN_S 1 01-JAN-2017 00:00:00 1 ANYTHING 420 1 05-JAN-2017 00:00:00 1 FASTER 410 1 06-JAN-2017 00:00:00 1 FASTER 400 1 09-JAN-2017 00:00:00 2 ANYTHING 425 1 10-JAN-2017 00:00:00 2 FASTER 422 5 03-JAN-2017 00:00:00 1 ANYTHING 2,430 5 04-JAN-2017 00:00:00 1 FASTER 2,350 5 08-JAN-2017 00:00:00 1 FASTER 2,300

alter session set nls_date_format = 'DD-MON-YYYY'; select * from t match_recognize ( partition by distance_in_miles order by run_date measures match_number() as mno, classifier() as cls, first(run_date) as frun_date, last(run_date) as lrun_date, count(*) as c all rows per match pattern ( anything faster+ ) define faster as time_in_s < nvl(prev(time_in_s), time_in_s) ); DISTANCE_IN_MILES RUN_DATE MNO CLS FRUN_DATE LRUN_DATE C TIME_IN_S 1 01-JAN-2017 1 ANYTHING 01-JAN-2017 01-JAN-2017 1 420 1 05-JAN-2017 1 FASTER 01-JAN-2017 05-JAN-2017 2 410 1 06-JAN-2017 1 FASTER 01-JAN-2017 06-JAN-2017 3 400 1 09-JAN-2017 2 ANYTHING 09-JAN-2017 09-JAN-2017 1 425 1 10-JAN-2017 2 FASTER 09-JAN-2017 10-JAN-2017 2 422 5 03-JAN-2017 1 ANYTHING 03-JAN-2017 03-JAN-2017 1 2,430 5 04-JAN-2017 1 FASTER 03-JAN-2017 04-JAN-2017 2 2,350 5 08-JAN-2017 1 FASTER 03-JAN-2017 08-JAN-2017 3 2,300

select * from t match_recognize ( partition by distance_in_miles order by run_date measures match_number() as mno, classifier() as cls, first(run_date) as frun_date, final last(run_date) as lrun_date, final count(*) as c all rows per match pattern ( anything faster+ ) define faster as time_in_s < nvl(prev(time_in_s), time_in_s) ); DISTANCE_IN_MILES RUN_DATE MNO CLS FRUN_DATE LRUN_DATE C TIME_IN_S 1 01-JAN-2017 1 ANYTHING 01-JAN-2017 06-JAN-2017 3 420 1 05-JAN-2017 1 FASTER 01-JAN-2017 06-JAN-2017 3 410 1 06-JAN-2017 1 FASTER 01-JAN-2017 06-JAN-2017 3 400 1 09-JAN-2017 2 ANYTHING 09-JAN-2017 10-JAN-2017 2 425 1 10-JAN-2017 2 FASTER 09-JAN-2017 10-JAN-2017 2 422 5 03-JAN-2017 1 ANYTHING 03-JAN-2017 08-JAN-2017 3 2,430 5 04-JAN-2017 1 FASTER 03-JAN-2017 08-JAN-2017 3 2,350 5 08-JAN-2017 1 FASTER 03-JAN-2017 08-JAN-2017 3 2,300

select * from t match_recognize ( partition by distance_in_miles order by run_date measures match_number() as mno, classifier() as cls, first(run_date) as frun_date, last(run_date) as lrun_date, count(*) as c pattern ( anything faster+ ) define faster as time_in_s < nvl(prev(time_in_s), time_in_s) ); DISTANCE_IN_MILES MNO CLS FRUN_DATE LRUN_DATE C 1 1 FASTER 01-JAN-2017 06-JAN-2017 3 1 2 FASTER 09-JAN-2017 10-JAN-2017 2 5 1 FASTER 03-JAN-2017 08-JAN-2017 3

Match_recognize in SQL is awesome. But does take a bit of explaining to understand as you say.So here goes:Imagine you've started running. You're keeping track of your progress in the following table:You want to know how many days in a row you ran faster than the previous day. Pattern matching is ideal for this kind of problem. Let's see how.Before we do that, let's start with the simplest thing possible: matching anything.To do this look for a pattern of "anything". Next you need to define this variable. To always have a match, you need something that is always true. One way to do this is looking for rows where the run_date equals itself:Hmmm. What's going on there?The query has no columns to display!To resolve this you either need to add:- A measures clause specifying output columns- A partition by clause. These columns are included in your output- The "all rows per match" clause, which includes the source table's columnsLet's add measures with a couple of predefined functions:- match_number() - Assigns a number to each row in the match. All rows in the same match will have the same value. This starts at 1 and increases for each new match.- classifier() - Which pattern variable the current row matchesThese are handy for debugging. In they go and you get:So that gives 9 different matches with match numbers 1-9. All match the variable anything, so this is the classifier.Why's that? Surely if you're matching "anything", it should all be one match?Well the pattern is a regular expression. Anything on its own means match one row. To match zero or more rows, use the asterisk operator:Now you have just a single match. But where have all the other rows gone?!By default, match_recognize only returns one row per match. There's only one match, so you only get one row. To see them all, specify "all rows per match":Note doing this also shows you all the columns in the source table. So I'd recommend using "all rows per match" while you're developing your queries.OK, that's the basics. Let's move onto the original problem: Finding days you ran faster than the previous one.The pattern you're looking for is one or more days where time_in_s is less than this value for the previous row. The patterns use regular expressions. The + is for one or more. So your pattern variable will be:To get the value for a column in the previous row, pass it to the prev() function. So the definition of this variable is:Plug this into your query and you get:So you've two periods where you had a series of days faster than the previous. Unsurprisingly when you ran one mile the day after a five mile run, you were quicker!These probably aren't the results you want. It's more useful to know whether you were quicker when you ran thedistance. To split the results up like this, add a "partition by distance_in_miles" clause:So now you have three periods when you were faster over the same distance. Two for one mile and one for 5 miles. Note the match_number() resets back to 1 for the group of 5 miles.This is good. But you probably want to include the day before you were faster in your results. i.e. your very first run or the last one you were slower. To get this, add an "anything" match again:Note you don't actually need to define the "anything" variable! Also the rows now have different classifiers, depending on which variable they matched in the pattern clause.OK, this is looking better. But in your final results you may want to see:- The first and last days in each series of improvements- The number of consecutive days you were fasterTo get these, add them to the measures clause. The first() and last() functions will, as you might expect, return the first and last values in the match for the expression. You can get the number of rows in the match with your trusty old count(*):Note: count(*) and last() return the relevant values up to that point. i.e. the running count and current date. To overcome this and show the actual last values in the group, add the final keyword before them:At this point you're nearly done!If you only want one row/series of faster days, remove the "all rows per match" clause. You can also take out the "final" keywords before last and count. This is because "one row per match" gives the final value:Hopefully that gives you enough of the basics to get started :)Pattern matching is a fantastic feature that makes it easy to solve problems that were previously thorny using SQL. For example, bin packing style problems:If you need more examples, check out these by Tim Hall:And some from Tom Kyte at:You've already mentioned Keith Laker's match_recognize series. Here's the link to the first in the series for those interested:NOTE: Edited explanation of SQL Error: ORA-30732 to include all ways to resolve this.