Support for regular expressions was added to ECMAScript 3 in 1999.

Sixteen years later, ES6/ES2015 introduced Unicode mode (the u flag), sticky mode (the y flag), and the RegExp.prototype.flags getter.

This article highlights what’s happening in the world of JavaScript regular expressions right now. Spoiler: it’s quite a lot. There are more RegExp-related proposals currently advancing through the TC39 standardization process than there have been updates to RegExp in the history of ECMAScript!

We’ll discuss the following ES2018 features and ECMAScript proposals:

dotAll mode (the s flag)

By default, . matches any character except for line terminators:

/foo.bar/u.test('foo\nbar');

// → false

(It doesn’t match astral Unicode symbols either, but we fixed that by enabling the u flag.)

ES2018 introduces dotAll mode, enabled through the s flag. In dotAll mode, . matches line terminators as well.

/foo.bar/su.test('foo\nbar');

// → true
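
As a minimal sketch of what the s flag changes, the snippet below runs the same pattern against each of the four ECMAScript line terminators, with and without dotAll mode. The variable names are illustrative only; the `[\s\S]` character class is a widely used pre-ES2018 workaround, shown here for comparison.

```javascript
// The four ECMAScript line terminators:
// LF, CR, LINE SEPARATOR, and PARAGRAPH SEPARATOR.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

// Without the s flag, `.` rejects every one of them.
const withoutDotAll = lineTerminators.map((lt) => /foo.bar/u.test(`foo${lt}bar`));
// → [false, false, false, false]

// With the s flag, `.` accepts all of them.
const withDotAll = lineTerminators.map((lt) => /foo.bar/su.test(`foo${lt}bar`));
// → [true, true, true, true]

// Pre-ES2018 workaround: a character class that matches any character.
const workaround = /foo[\s\S]bar/u.test('foo\nbar');
// → true

// The flag is reflected on the RegExp instance.
const regex = /foo.bar/su;
const isDotAll = regex.dotAll;
// → true
const flags = regex.flags;
// → 'su'
```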

Lookbehind assertions

Lookarounds are zero-width assertions that match a string without consuming anything. ECMAScript currently supports lookahead assertions that do this in the forward direction. Positive lookahead ensures a pattern is followed by another pattern:

const pattern = /\d+(?= dollars)/u;

const result = pattern.exec('42 dollars');

// → result[0] === '42'

Negative lookahead ensures a pattern is not followed by another pattern:

const pattern = /\d+(?! dollars)/u;

const result = pattern.exec('42 pesos');

// → result[0] === '42'

ES2018 adds support for lookbehind assertions. Positive lookbehind ensures a pattern is preceded by another pattern:

const pattern = /(?<=\$)\d+/u;

const result = pattern.exec('$42');

// → result[0] === '42'

Negative lookbehind ensures a pattern is not preceded by another pattern:

const pattern = /(?<!\$)\d+/u;

const result = pattern.exec('€42');

// → result[0] === '42'
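
The following sketch applies both kinds of lookbehind to a string containing several amounts, and illustrates a pitfall worth knowing: a negative lookbehind only inspects what directly precedes the match, so the match can begin in the middle of a number. The sample string and variable names are made up for illustration.

```javascript
const prices = '$42 €17 $8';

// Positive lookbehind: digits directly preceded by a dollar sign.
const dollarAmounts = prices.match(/(?<=\$)\d+/gu);
// → ['42', '8']

// Pitfall: in '$42', the '2' is preceded by '4' (not by '$'),
// so the negative lookbehind succeeds there and '2' is matched.
const partial = /(?<!\$)\d+/u.exec('$42');
// → partial[0] === '2'

// Also rejecting a preceding digit avoids the partial match.
const strict = /(?<![\$\d])\d+/u.exec('$42');
// → null

const nonDollarAmounts = prices.match(/(?<![\$\d])\d+/gu);
// → ['17']
```

Whether the stricter pattern is needed depends on the input; for the single-amount strings used earlier in this section, the simpler pattern behaves as shown.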

Named capture groups

Currently, each capture group in a regular expression is numbered and can be referenced using that number:

const pattern = /(\d{4})-(\d{2})-(\d{2})/u;

const result = pattern.exec('2017-01-25');

// → result[0] === '2017-01-25'

// → result[1] === '2017'

// → result[2] === '01'

// → result[3] === '25'

This is useful, but not very readable or maintainable. Whenever the order of capture groups in the pattern changes, the indices need to be updated accordingly.

ES2018 adds support for named capture groups, enabling more readable and maintainable code.

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

const result = pattern.exec('2017-01-25');

// → result.groups.year === '2017'

// → result.groups.month === '01'

// → result.groups.day === '25'
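
Named groups can also be used outside of `exec` results. The sketch below (variable names are illustrative) shows destructuring of `groups`, the `$<name>` syntax in replacement strings, and `\k<name>` backreferences inside a pattern:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Destructure the named groups directly from the match result.
const { year, month, day } = datePattern.exec('2017-01-25').groups;
// → year === '2017', month === '01', day === '25'

// Named groups remain accessible by number as well.
const sameYear = datePattern.exec('2017-01-25')[1];
// → '2017'

// In a replacement string, $<name> refers to a named group.
const reordered = '2017-01-25'.replace(datePattern, '$<day>/$<month>/$<year>');
// → '25/01/2017'

// Inside the pattern, \k<name> is a backreference to a named group.
const repeatedWord = /\b(?<word>\w+) \k<word>\b/u;
const hasRepeat = repeatedWord.test('it is is broken');
// → true
```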

Unicode property escapes

The Unicode Standard assigns various properties and property values to every symbol. For example, to get the set of symbols that are used in the Greek script, search the Unicode database for symbols whose Script_Extensions property is set to Greek.

Unicode property escapes make it possible to access these Unicode character properties natively in ECMAScript regular expressions. For example, the pattern \p{Script_Extensions=Greek} matches every symbol that is used in the Greek script.

const regexGreekSymbol = /\p{Script_Extensions=Greek}/u;

regexGreekSymbol.test('π');

// → true

Previously, developers wishing to use equivalent regular expressions in JavaScript had to resort to large run-time dependencies or build scripts, both of which lead to performance and maintainability problems. With built-in support for Unicode property escapes, creating regular expressions based on Unicode properties couldn’t be easier.
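
A few more forms of property escapes, sketched under the assumption of an ES2018-capable engine: negation with an uppercase `\P`, short General_Category values such as `Lu`, and binary properties such as `Alphabetic`. The sample strings are made up for illustration.

```javascript
// Property escapes can be quantified like any other atom.
const greekRun = 'abc αβγ def'.match(/\p{Script_Extensions=Greek}+/u)[0];
// → 'αβγ'

// \P{…} (uppercase P) matches symbols that do NOT have the property.
const notGreek = /\P{Script_Extensions=Greek}/u.test('a');
// → true

// General_Category values can be used on their own: Lu is "uppercase letter".
// '\u03A9' is Ω (GREEK CAPITAL LETTER OMEGA).
const startsUppercase = /^\p{Lu}/u.test('\u03A9mega');
// → true

// Binary properties take no value. '\u00EF' is ï.
const allAlphabetic = /^\p{Alphabetic}+$/u.test('na\u00EFve');
// → true

// Property escapes require the u flag. Without it, \p is treated as
// an escaped letter 'p' and the pattern matches the literal text 'p{Lu}'.
const withoutUFlag = /\p{Lu}/.test('A');
// → false
```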

Unicode sequence property escapes

A separate proposal extends Unicode property escapes functionality to Unicode properties that expand to sequences of characters, such as RGI_Emoji (which encompasses all emoji recommended for general interchange, regardless of whether they consist of a single code point or a sequence of code points):

const regexRgiEmoji = /\p{RGI_Emoji}/u;

// Note: although 4️⃣ looks like a single symbol, it consists

// of multiple Unicode code points.

regexRgiEmoji.test('4️⃣');

// → true



// Flag emojis consist of multiple code points.

regexRgiEmoji.test('🇧🇪');

// → true

This proposal would make it easier to match emoji (which can consist of multiple code points) and hashtags (which can contain emoji) using regular expressions. As the Unicode Standard defines more sequence properties over time, JavaScript regular expressions could support those as well.

Note: This proposal is still in the process of being standardized, and as such, its syntax is subject to change. The descriptions and code examples in this article match the latest versions of the proposal at the time of writing. This proposal is currently at stage 2 and can make it into ES2020, at the earliest.

String.prototype.matchAll

A common use case for global (g) or sticky (y) regular expressions is applying them to a string and iterating through all of the matches, including capturing groups. The String.prototype.matchAll proposal makes this easier than ever before.

const string = 'Magic hex numbers: DEADBEEF CAFE 8BADF00D';

const regex = /\b[0-9a-fA-F]+\b/g;

for (const match of string.matchAll(regex)) {

  console.log(match);

}

The match object for each loop iteration is equivalent to what regex.exec(string) would return.

// Iteration 1:

[

  'DEADBEEF',

  index: 19,

  input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'

]



// Iteration 2:

[

  'CAFE',

  index: 28,

  input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'

]



// Iteration 3:

[

  '8BADF00D',

  index: 33,

  input: 'Magic hex numbers: DEADBEEF CAFE 8BADF00D'

]
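
For comparison, here is a sketch of the manual `exec` loop that `matchAll` replaces, run against the same string. The names `text`, `hexPattern`, `viaExec`, and `viaMatchAll` are illustrative. Note that the `exec` loop relies on the regex's mutable `lastIndex` state, whereas `matchAll` iterates over an internal copy of the regex.

```javascript
const text = 'Magic hex numbers: DEADBEEF CAFE 8BADF00D';
const hexPattern = /\b[0-9a-fA-F]+\b/g;

// Manual iteration: each exec() call advances hexPattern.lastIndex.
// Once there are no more matches, exec() returns null and resets
// lastIndex to 0.
const viaExec = [];
let match;
while ((match = hexPattern.exec(text)) !== null) {
  viaExec.push({ value: match[0], index: match.index });
}

// matchAll() works on a copy of the regex, so hexPattern.lastIndex
// is not modified by the iteration.
const viaMatchAll = Array.from(text.matchAll(hexPattern), (m) => ({
  value: m[0],
  index: m.index,
}));

// Both approaches yield the same matches:
// DEADBEEF at 19, CAFE at 28, and 8BADF00D at 33.
```

A common bug with the manual loop is forgetting the g flag, which makes `exec` return the first match forever; `matchAll` avoids that class of mistake.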

String.prototype.matchAll is especially useful for regular expressions with capture groups:

const string = 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262';

const regex = /\b(?<owner>[a-z0-9]+)\/(?<repo>[a-z0-9\.]+)\b/g;



for (const match of string.matchAll(regex)) {

  console.log(`${match[0]} at ${match.index} with '${match.input}'`);

  console.log(`→ owner: ${match.groups.owner}`);

  console.log(`→ repo: ${match.groups.repo}`);

}



// Output:

//

// tc39/ecma262 at 23 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'

// → owner: tc39

// → repo: ecma262

// v8/v8.dev at 36 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'

// → owner: v8

// → repo: v8.dev

// tc39/test262 at 46 with 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262'

// → owner: tc39

// → repo: test262
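
Because `matchAll` returns an iterator, the matches can also be collected into plain data structures instead of being logged. The following is one possible sketch; the names `repoList`, `repoPattern`, `repos`, and `byOwner` are illustrative.

```javascript
const repoList = 'Favorite GitHub repos: tc39/ecma262 v8/v8.dev tc39/test262';
const repoPattern = /\b(?<owner>[a-z0-9]+)\/(?<repo>[a-z0-9\.]+)\b/g;

// Array.from accepts any iterable plus a mapping function, so each
// match can be converted to a plain object in a single pass.
const repos = Array.from(repoList.matchAll(repoPattern), (match) => ({
  owner: match.groups.owner,
  repo: match.groups.repo,
  index: match.index,
}));
// → [
//   { owner: 'tc39', repo: 'ecma262', index: 23 },
//   { owner: 'v8', repo: 'v8.dev', index: 36 },
//   { owner: 'tc39', repo: 'test262', index: 46 },
// ]

// Group the repository names by owner.
const byOwner = new Map();
for (const { owner, repo } of repos) {
  if (!byOwner.has(owner)) {
    byOwner.set(owner, []);
  }
  byOwner.get(owner).push(repo);
}
// → Map { 'tc39' => ['ecma262', 'test262'], 'v8' => ['v8.dev'] }
```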

Legacy RegExp features