On this post I will be covering how to use regex to add custom delimiters to a string. The code referenced in this post was written to work in Postgres 10.3. The code will not work as expected in Postgres 9.6 or below. The function that I created as an example for this post takes a string as input and add a ‘_’ as the delimiter between digits and letters. Below is the output of the function.



SELECT add_delimiter('test12foo34');

add_delimiter

----------------

test_12_foo_34



Below is the complete code for the add_delimiter function. I will cover each section in detail throughout this post.



CREATE OR REPLACE FUNCTION add_delimiter(str VARCHAR) RETURNS text AS $$

DECLARE

new_string text;

split_char varchar;

regex varchar;

prefixes varchar[];

regex_parts varchar[];

regex_parts_length int;

i int;

BEGIN

split_char := '_';

regex := '([a-zA-Z]{1}\d{1}|\d{1}[a-zA-Z]{1})';

prefixes := regexp_split_to_array(str,regex);

regex_parts := (SELECT ARRAY(select array_to_string(regexp_matches(str,regex,'g'),'')));

regex_parts_length := array_upper(regex_parts,1);

new_string := '';

FOR i IN 1..array_upper(prefixes,1)

LOOP

new_string := new_string || prefixes[i];

IF i <= regex_parts_length THEN

new_string := new_string || substring(regex_parts[i],1,1)||split_char||substring(regex_parts[i],2,1);

END IF;

END LOOP;

RETURN new_string;

END; $$

LANGUAGE PLPGSQL;



In the first section of the function we define our delimiter and regex.



split_char := '_';

regex := '([a-zA-Z]{1}\d{1}|\d{1}[a-zA-Z]{1})';



The regex in my example matches any occurrence of a digit next to a letter and vice-versa.

The next statement:



prefixes := regexp_split_to_array(str,regex);



Returns the parts of the string which do not match the regular expression. From the Postgres docs: “Split string using a POSIX regular expression as the delimiter”. Here is what regexp_split_to_array returns with my example:



select regexp_split_to_array('test12foo34','([a-zA-Z]{1}\d{1}|\d{1}[a-zA-Z]{1})');

regexp_split_to_array

-----------------------

{tes,"",o,4}



The following section:



regex_parts := (SELECT ARRAY(select array_to_string(regexp_matches(str,regex,'g'),'')));



returns all the sections of the string that match the regex.



SELECT ARRAY(select array_to_string(regexp_matches('test12foo34','([a-zA-Z]{1}\d{1}|\d{1}[a-zA-Z]{1})','g'),''));

array

------------

{t1,2f,o3}



The outer function ARRAY() is necessary to return the result of regexp_matches as a single array, because regexp_matches will return multiple array rows as seen here:



SELECT regexp_matches('test12foo34','([a-zA-Z]{1}\d{1}|\d{1}[a-zA-Z]{1})','g');

regexp_matches

----------------

{t1}

{2f}

{o3}



The next code section loops over the sections of the string that do not match the regex and creates a new string by concatenating the parts that do not match the regex with the parts that do. The function array_upper() is used to get the number of elements in the array. What is important to note is that in PL/pgSQL arrays start from index 1. In my example the sections that match the regex must be 2 characters in length, therefore it is trivial to add a delimiter where it is required.

Creating this function was my first introduction to the power of PL/pgSQL, and I’m sure now that I am more familiar with it I will be using it again in the future. Here are the links to the PostgreSQL documentation that I used for this post: String Functions, Array Functions, and PL/pgSQL Tutorial. I hope you enjoyed! Thanks for reading.