Table of contents

Unicode

XRegExp.matchRecursive

XRegExp.build

Addons

If you want, you can download XRegExp bundled with all addons as xregexp-all.js. Alternatively, you can download the individual addon scripts from GitHub. XRegExp's npm package uses xregexp-all.js.

Unicode

The Unicode Base script adds base support for Unicode matching via the \p{…} syntax. À la carte token addon packages add support for Unicode categories, scripts, blocks, and other properties. All Unicode tokens can be inverted using \P{…} or \p{^…}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored. You can omit the braces for token names that are a single letter.

Example

```javascript
// Categories
XRegExp('\\p{Sc}\\pN+'); // Sc: currency symbol, N: number

// Scripts
XRegExp('\\p{Cyrillic}');
XRegExp('[\\p{Latin}\\p{Common}]');

// Blocks (use 'In' prefix)
XRegExp('\\p{InLatinExtended-A}');
XRegExp('\\P{InPrivateUseArea}'); // Uppercase \P for negation
XRegExp('\\p{^InMongolian}'); // Alternate negation syntax

// Properties
XRegExp('\\p{ASCII}');
XRegExp('\\p{Assigned}');

// In action...
var unicodeWord = XRegExp("^\\pL+$"); // L: Letter
unicodeWord.test("Русский"); // -> true
unicodeWord.test("日本語"); // -> true
unicodeWord.test("العربية"); // -> true
XRegExp("^\\p{Katakana}+$").test("カタカナ"); // -> true
```
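To make the naming rules above concrete, here is a minimal sketch of how a token such as \P{InPrivateUseArea} could be split into a normalized name and a negation flag. `parseUnicodeToken` is a hypothetical helper written only for illustration; it is not part of XRegExp's API, and it only normalizes the name (it does not look the name up in any Unicode data). Rejecting the combination \P{^…} is an assumption of this sketch.

```javascript
// Hypothetical helper (NOT part of XRegExp): applies the token rules described above.
function parseUnicodeToken(token) {
  // "\p" or "\P", followed by either {optional ^ + name} or a single letter (braces omitted)
  const match = /^\\([pP])(?:\{(\^?)([^}]+)\}|([A-Za-z]))$/.exec(token);
  if (match === null) {
    throw new SyntaxError('Not a Unicode token: ' + token);
  }
  const [, letter, caret, bracedName, singleLetterName] = match;
  // Assumption of this sketch: \P{^...} (double negation) is rejected.
  if (letter === 'P' && caret === '^') {
    throw new SyntaxError('Double negation: ' + token);
  }
  const rawName = bracedName !== undefined ? bracedName : singleLetterName;
  // Names are case insensitive; spaces, hyphens, and underscores are ignored.
  const name = rawName.replace(/[ _-]/g, '').toLowerCase();
  // Either \P{...} or \p{^...} inverts the token.
  return { name, negated: letter === 'P' || caret === '^' };
}

console.log(parseUnicodeToken('\\p{In Latin-Extended_A}'));
// { name: 'inlatinextendeda', negated: false }
console.log(parseUnicodeToken('\\p{^InMongolian}'));
// { name: 'inmongolian', negated: true }
```

Because both spellings of a block name normalize to the same string, `\p{InLatinExtended-A}` and `\p{in latin extended a}` would resolve to the same lookup key under these rules.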

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e., code points up to U+FFFF). You can opt in to full 21-bit Unicode support (code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A to all new regexes by running XRegExp.install('astral'). In astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

```javascript
// Using flag A to match astral code points
XRegExp('^\\pS$').test('💩'); // -> false
XRegExp('^\\pS$', 'A').test('💩'); // -> true
XRegExp('(?A)^\\pS$').test('💩'); // -> true

// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
XRegExp('(?A)^\\pS$').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\pS$').test('💩'); // -> true
```
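The code unit versus code point distinction behind astral mode can be seen without XRegExp. The sketch below uses native JavaScript (ES2018+) and its `u` flag, which is a different mechanism from XRegExp's flag A; it only illustrates why U+1F4A9 occupies two UTF-16 code units (the surrogate pair U+D83D U+DCA9) and why matching it as one character requires code-point-aware matching. `toSurrogatePair` is a helper written for this example.

```javascript
// Native JavaScript, not XRegExp: code units vs code points.
const pileOfPoo = '\u{1F4A9}';

// Helper for this example: split an astral code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // 20-bit offset above the BMP
  const high = 0xd800 + (offset >> 10);    // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff);   // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f4a9);
console.log(high.toString(16), low.toString(16));          // d83d dca9
console.log(String.fromCharCode(high, low) === pileOfPoo); // true

console.log(pileOfPoo.length);      // 2 (UTF-16 code units)
console.log([...pileOfPoo].length); // 1 (code point)

// Without the u flag, "." consumes one code unit, so "^.$" cannot match the pair.
console.log(/^.$/.test(pileOfPoo));      // false
console.log(/^.$/u.test(pileOfPoo));     // true
console.log(/^\p{S}$/u.test(pileOfPoo)); // true (U+1F4A9 is a symbol, category So)
```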

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use for example (\pL|[0-9_])+ instead of [\pL0-9_]+.
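Native JavaScript regexes with the `u` flag do accept \p{…} inside character classes, so they can be used to spot-check that the alternation rewrite above matches the same strings as the class form. This is a quick sketch on a handful of sample strings (chosen for this example), not a proof of equivalence, and it does not exercise XRegExp itself.

```javascript
// Native JavaScript (ES2018+), not XRegExp.
const classForm = /^[\p{L}0-9_]+$/u;           // class form: disallowed in XRegExp astral mode
const alternationForm = /^(\p{L}|[0-9_])+$/u;  // suggested rewrite using alternation

const samples = ['Русский_42', '日本語', 'snake_case', 'with space', '💩', ''];

for (const s of samples) {
  // Both forms should agree on every sample.
  console.log(JSON.stringify(s), classForm.test(s), alternationForm.test(s));
}
```

Letters, digits, and underscores pass under both forms, while the space, the emoji (a symbol, not a letter), and the empty string fail under both.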

XRegExp.matchRecursive

See API: XRegExp.matchRecursive.

XRegExp.build

See API: XRegExp.build.
