In this third installment of the regular expression functions added to Db2 for i is going to be about REGEXP_SUBSTR . The SQL SUBSTRING and RPG %SUBST built in function need a fixed start and number of positions. The REGEXP_SUBST allows me to use regular expressions patterns to find and substring the pattern I desire.

All of these Regular Expressions functions were added to Db2 for i as part of the 7.3 TR2 and 7.2 TR6 updates. Rather than repeat what I have in the earlier posts, I will refer you to them for more details.

The format for this function is:

REGEXP_SUBSTR(source-string, search-pattern, starting-position, occurrence, flags, group)

source-string What is searched, it can be a string or a variable. search-pattern What I am looking for in the search string. starting-position Where I want to start my search for the search pattern. flags These are the same flags I can use with the other regexp functions. group Capture group, as with REGEXP_INSTR I was not able to find a reason why I would use this.

In these examples I am going to use the same table, PERSON , I used in the post I wrote about REGEXP_LIKE , therefore, I am not going to waste your time to describe the table or its contents again.

Let me start with my example, to find all the people who have an e in their last name followed by another character.

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'e.',1,1,'i') AS RESULT 03 FROM PERSON

Line 2: The parameters in the REGEXP_SUBSTR are:

LNAME The short name of the last name column in the table. 'e.' The dot ( . ) following the e indicates that I want an E followed by some other character. 1 I want my search to start in the first position of the last name. 1 First occurrence of the letter E only. 'i' The lower case i means that my search is not case specific, in other words it will match both lower and upper case forms of the letter.

The results are:

Last RESULT name ALLEN EN Austin - bennett en BRUCK - Caine - curtis - DALEY EY Framer er harvey ey HEWITT EW Hill - hill - HOLDER ER Imlach - jones es KEARNS EA Kletzenbauer et lightening en MYERSCOUGH ER Nicholas - ryan - SATCHWELL EL Stiffle - straw - WESSON ES

Note: The dashes ( - ) indicate a null, not found, value.

This has worked like a scan, to find the E, and a substring, to get the result. Notice that with the last name Stiffle null was returned as the E at the end of the name is not followed by another character.

I only know of one way to remove the rows that return null from the results by adding a where statement like line 4, see below.

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'e.',1,1,'i') AS RESULT 03 FROM PERSON 04 WHERE REGEXP_SUBSTR(LNAME,'e.',1,1,'i') IS NOT NULL

If you know of an easier or better way to stop null appearing in the result set please let me know in the Comments below.

So what do I get if I change my Select to look for the second occurrence of E.

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'e.',1,2,'i') AS RESULT 03 FROM PERSON 04 WHERE REGEXP_SUBSTR(LNAME,'e.',1,2,'i') IS NOT NULL

Line 2: I change the occurrence value, the fourth parameter, to 2 to indicate I only want to include in my results a second E in the last names.

Line 4: I have repeated the REGEXP_SUBSTR in the Where to ignore any null results.

And what do I get?

Last RESULT name bennett et Kletzenbauer en

Only those names with a second E in them, followed by a character.

How about any name with a vowel in it, followed by N.

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'[aeiou]n',1,1,'i') AS RESULT 03 FROM PERSON 04 WHERE REGEXP_SUBSTR(LNAME,'[aeiou]n',1,1,'i') IS NOT NULL

Line 2: The square brackets ( [ ] ) allow me to put in a list of range of values. In my example I have listed all the vowels. What this will do is return in the results any last name with a vowel followed by a letter N.

Last RESULT name ALLEN EN Austin in bennett en Caine in jones on Kletzenbauer en lightening en ryan an WESSON ON

How about a vowel followed by another vowel?

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'[aeiou]+[aeiou]',1,1,'i') AS RESULT 03 FROM PERSON 04 WHERE REGEXP_SUBSTR(LNAME,'[aeiou]+[aeiou]',1,1,'i') IS NOT NULL

Line 2: in the search pattern I have used a plus ( + ), in regular expressions it means "match preceding element one or more times".

The best way to understand what that means is to see the results.

Last RESULT name Austin Au Caine ai KEARNS EA Kletzenbauer aue MYERSCOUGH OU

The interesting result is for the last name "Kletzenbauer" as it returns three vowels because we have the first vowel, A, and an ending vowel, E, but also any other vowels that precede the last vowel. If there was a last name with four vowels next to each other, " Baaaat " is would appear in the results as " aaaa ", because the vowels are all next to one another.

Let me make this more complicated… I want to extract strings from these last names starting with the first vowel and ending with the last vowel.

01 SELECT LNAME, 02 REGEXP_SUBSTR(LNAME,'[aeiou]+[a-z]+[aeiou]',1,1,'i') AS RESULT 03 FROM PERSON

My results are just what I desired:

Last RESULT name ALLEN ALLE Austin Austi bennett enne BRUCK - Caine aine curtis urti DALEY ALE Framer ame harvey arve HEWITT EWI Hill - hill - HOLDER OLDE Imlach Imla jones one KEARNS - Kletzenbauer etzenbaue lightening ighteni MYERSCOUGH ERSCOU Nicholas ichola ryan - SATCHWELL ATCHWE Stiffle iffle straw - WESSON ESSO

I included the null results this time so you could see why some of the last names did not return a result.

BRUCK , Hill , hill , ryan , straw do not return results as they only have one vowel.

, , , , do not return results as they only have one vowel. KEARNS does not return a vowel as the two vowels are next to one another, with no other character between.

What I have been impressed with is the ease I have been able to get substrings from strings without having to give a hard starting position and a length, try creating a line of RPG code to do the same thing. I can definitely see myself using this function in the future.

Other Db2 for i regular expressions:

You can learn more about the REGEXP_SUBSTR command from the IBM website here.

This article was written for IBM i 7.3 TR2 and 7.2 TR6.