The Regex Coach - interactive regular expressions





Abstract

The Regex Coach is a graphical application for Windows which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:

It shows whether a regular expression matches a particular target string.

It can also show which parts of the target string correspond to captured register groups or to arbitrary parts of the regular expression.

It can "walk" through the target string one match at a time.

It can simulate Perl's split and s/// (substitution) operators.

It tries to describe the regular expression in plain English.

It can show a graphical representation of the regular expression's parse tree.

It can single-step through the matching process as performed by the regex engine.

Everything happens in "real time", i.e. as soon as you make a change somewhere in the application all other parts are instantly updated.

The program hasn't changed since 2008 and this page is also essentially still the same. But I can confirm that in September 2019 the program still works fine for me on Windows 10.

The Regex Coach together with this documentation can be downloaded from http://weitz.de/files/regex-coach.exe. The current version is 0.9.2 - see the changelog for what's new. The file (an installer) is about 2MB in size.

You should use Windows 2000 or Windows XP with all updates and service packs installed. The program might work with older or unpatched Windows versions, but don't expect support for these configurations. See also below.

You also must have the Microsoft runtime library msvcr80.dll installed. If you don't have it or if you aren't sure, you can get it from http://www.microsoft.com/downloads/details.aspx?familyid=32BC1BEE-A3F9-4C13-9C99-220B62A191EE&displaylang=en.

If you have a previous version (0.8.5 or earlier) of The Regex Coach installed, uninstall it before you install the new version! If you haven't done this and the new application won't start, remove the file The Regex Coach.exe.manifest from the application directory.

If you have an older version of Windows and the current version of The Regex Coach doesn't work for you, you can try the last release which was built with LispWorks 4.4.6 - it is at http://weitz.de/files/regex-coach-0.8.5.exe. If that works for you - fine. Don't expect support or updates, though.

There is no Mac version and I have no plans to release one. Sending me email and begging for it won't change that. And, no, I don't want to open source the application or send the source code to you privately - no need to ask...

Jeremy Rayner has written a "homage" to The Regex Coach in Java - see here for more details.





The Regex Coach is free for private or non-commercial use. The Regex Coach is also free for commercial use but you are not allowed to re-distribute it and/or charge money for it without written permission by the author - email me at edi@weitz.de for details.

The program is provided 'as is' with no warranty - use at your own risk.





m//

s///

split

Of course, this application should also be useful to programmers using Perl-compatible regex toolkits like PCRE (which is used by projects like Python, Apache, and PHP) or CL-PPCRE. Also, Java's regular expressions and those of XML Schema are very similar to Perl's.

The following descriptions will use the notions introduced by this annotated screenshot. The screenshot itself is an imagemap - click on any part of it to go directly to the relevant section of the docs.





GNU Emacs

bash

TAB

The upper pane is the regex pane. Here you'll type the regular expression you want to investigate.

The second pane is the target pane. Here you'll type the text (the target string) the regular expression will try to match.

If there's a match, the part of the target string that matched will be emphasized by a yellow background. (If you also check the ' g ' modifier checkbox all matches will be emphasized - the "current" one in yellow, the others in green.)

The target message area will show the extent of the match (or notify you that there isn't a match at all). This is particularly useful if there's a zero-length match because you won't see any highlighted characters in the target pane in this case. The message "Match from n to m" means that the match covers the characters from position n up to, but not including, position m. The first character of the string is character 0 (zero) as usual.
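The position convention of the target message area can be reproduced in any Perl-style regex engine. The following is a minimal sketch in Python, which is not the language The Regex Coach is written in; Python's re module stands in for CL-PPCRE, and the helper name describe_match is made up for illustration.

```python
import re


def describe_match(regex, target):
    """Return a message in the style of the target message area."""
    m = re.search(regex, target)
    if m is None:
        return "No match"
    # m.start() is the first matched position, m.end() is one past the last,
    # so a zero-length match has start == end.
    return f"Match from {m.start()} to {m.end()}"


print(describe_match(r"b+", "aabbbc"))  # Match from 2 to 5
print(describe_match(r"x*", "abc"))     # Match from 0 to 0 (zero-length match)
print(describe_match(r"z", "abc"))      # No match
```

The second call shows the zero-length case: nothing would be highlighted, but the message still reports where the empty match was found.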

b

b

If you've made an invalid selection the selection highlight button is disabled. You'll also see a message about your selection being invalid in the info pane.

If you have no idea what a "valid subexpression" of the regular expression could be consider the following rule of thumb: Every part of the regular expression which can be wrapped in a non-capturing group - i.e. with (?:...) - without altering the meaning of the expression is valid.

(A more precise description of this would be: Consider the parse tree of the regular expression and assume that every leaf of the tree which is a string is further divided into the single characters which together constitute the string. Now, every contiguous part of the regular expression which can be completely and exactly covered by nodes of the parse tree is a valid subexpression.)
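The rule of thumb can be tried out empirically. The sketch below is in Python and is only an assumption-laden illustration: it wraps a selected part of the regex in (?:...) and compares the match extents on a handful of sample strings. It is not how The Regex Coach decides validity (that is based on the parse tree), agreement on a few samples proves nothing in general, and the name wrap_preserves_matches is hypothetical.

```python
import re


def wrap_preserves_matches(regex, start, end, samples):
    """Wrap regex[start:end] in (?:...) and compare match extents on samples."""
    wrapped = regex[:start] + "(?:" + regex[start:end] + ")" + regex[end:]
    try:
        original_re = re.compile(regex)
        wrapped_re = re.compile(wrapped)
    except re.error:
        # The wrapped regex (or the original) does not even parse.
        return False

    def span(compiled, text):
        m = compiled.search(text)
        return m.span() if m else None

    return all(span(original_re, s) == span(wrapped_re, s) for s in samples)


samples = ["ac", "abc", "abbc", "ababc", "xyz"]
# Selecting "b*" in "ab*c" gives "a(?:b*)c": same meaning.
print(wrap_preserves_matches("ab*c", 1, 3, samples))  # True
# Selecting "ab" in "ab*c" gives "(?:ab)*c": the star now applies to "ab".
print(wrap_preserves_matches("ab*c", 0, 2, samples))  # False
```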

Press the "nothing" button to disable highlighting.

g

split

The headline above the scan buttons which usually says "Scan from 0" will change accordingly showing a message like "Scan #n from m" which means that the regex engine is trying to find the nth match starting at character m of the target string. The target message area will be changed as well - it'll say "Match #n from k to l" instead of "Match from k to l" (or it'll say "No further match" instead of "No match" if you've pressed the scan forward button too often).
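Walking through a target string one match at a time can be sketched as follows. This is Python, not the application's own code, and it rests on two assumptions: that each scan starts where the previous match ended, and that after an empty match it is good enough to advance by one character (Perl's actual rule for empty matches is more subtle). The name scan_matches is made up for illustration.

```python
import re


def scan_matches(regex, target):
    """Yield (n, scan_start, match_start, match_end) for successive matches."""
    compiled = re.compile(regex)
    pos = 0
    n = 1
    while pos <= len(target):
        m = compiled.search(target, pos)
        if m is None:
            return  # corresponds to "No further match"
        yield n, pos, m.start(), m.end()
        # Assumption: after an empty match, step one character forward
        # so that the scan cannot loop forever on the same position.
        pos = m.end() if m.end() > m.start() else m.end() + 1
        n += 1


for n, scan_start, k, l in scan_matches(r"\d+", "a12b345"):
    print(f"Scan #{n} from {scan_start}: Match #{n} from {k} to {l}")
# Scan #1 from 0: Match #1 from 1 to 3
# Scan #2 from 3: Match #2 from 4 to 7
```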

s///

Note that you'll have to use "\&", "\`", "\'" and "\n" instead of Perl's "$&", "$`", "$'" and "$n" - see the CL-PPCRE documentation for the gory details.
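To illustrate what these four placeholders stand for (the whole match, the text before it, the text after it, and the contents of register n), here is a small sketch in Python. Python uses yet another notation, so this is not the syntax to type into The Regex Coach; the helper name with_context is made up, and the "before" and "after" parts are computed by hand because Python's replacement strings have no placeholder for them.

```python
import re

target = "foo=bar"

# Whole match: Perl's $& corresponds to \g<0> in Python.
print(re.sub(r"(\w+)=(\w+)", r"[\g<0>]", target))  # [foo=bar]

# Register n: Perl's $1, $2 correspond to \1, \2 in Python.
print(re.sub(r"(\w+)=(\w+)", r"\2=\1", target))    # bar=foo


def with_context(m):
    """Build a replacement from the text before and after the match."""
    before = m.string[:m.start()]  # what Perl calls $`
    after = m.string[m.end():]     # what Perl calls $'
    return f"<{before}|{m.group(0)}|{after}>"


print(re.sub(r"=", with_context, target))          # foo<foo|=|bar>bar
```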

split

You can use the radio buttons below the pane to select another divider if the vertical line happens to be a part of your target string. But note that choosing the "block" option might significantly slow down the program if your target strings are long.

You can type a non-negative integer into the "Limit" field. This corresponds to the optional third argument to Perl's split operator.
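The effect of the limit can be sketched in Python, whose re.split counts differently: Perl's limit is the maximum number of fields, while Python's maxsplit is the maximum number of cuts. The function below is a rough approximation under stated assumptions, not a faithful port: it covers only the common cases (an empty target yields no fields, a limit of zero strips trailing empty fields, a positive limit keeps them), and it ignores Perl's special rules for leading empty fields and zero-width matches. The name perl_like_split is hypothetical.

```python
import re


def perl_like_split(regex, target, limit=0):
    """Approximate Perl's split(/regex/, target, limit) for limit >= 0."""
    if target == "":
        return []  # Perl: splitting an empty string yields no fields
    if limit == 1:
        # One field means no cut at all; maxsplit=0 would mean "unlimited"
        # in Python, so this case needs its own branch.
        return [target]
    if limit > 1:
        return re.split(regex, target, maxsplit=limit - 1)
    # limit == 0: split everywhere, then drop trailing empty fields.
    fields = re.split(regex, target)
    while fields and fields[-1] == "":
        fields.pop()
    return fields


print(perl_like_split(r",", "a,b,c,,,"))     # ['a', 'b', 'c']
print(perl_like_split(r",", "a,b,c,,,", 2))  # ['a', 'b,c,,,']
print(perl_like_split(r",", "a,b,c,,,", 1))  # ['a,b,c,,,']
```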

Note that many of the optimizations done by the CL-PPCRE engine are turned off here for pedagogical reasons. (For example, when trying to match the regex a*abc against the target string aaaabd the "real" engine wouldn't even start because it'll first use a Boyer-Moore-Horspool search to check if the constant string abc is somewhere in the target.) Some of them remain, however: The engine will only try to match from position 0 if the regex starts with .* and is in single-line mode. Also, as you'll see, the stepper tries to match constant strings as a whole (instead of single characters which would be quite boring).
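The idea behind the constant-string check can be sketched in Python. This is only an illustration under assumptions: the required literal is passed in by hand (a real engine derives it from the regex itself), Python's substring test is used in place of an explicit Boyer-Moore-Horspool search, and the name search_with_literal_prefilter is made up.

```python
import re


def search_with_literal_prefilter(regex, required_literal, target):
    """Run the regex only if a literal it requires occurs in the target."""
    # Cheap test first: if the constant string is missing, no match is
    # possible and the regex engine does not need to start at all.
    if required_literal not in target:
        return None
    return re.search(regex, target)


# "abc" never occurs in "aaaabd", so the regex is not even attempted.
print(search_with_literal_prefilter(r"a*abc", "abc", "aaaabd"))  # None

m = search_with_literal_prefilter(r"a*abc", "abc", "xaaabcx")
print(m.span())  # (1, 6)
```

With such a check in place, a stepper would have nothing to show for the first example, which is presumably why the optimization is switched off in the application.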

i

g

Ctrl-s

Ctrl-x Ctrl-s

Note: Due to the way Motif works, the file menu can't be used like this on Linux. Instead you can use the Emacs key sequences Ctrl-x Ctrl-w and Ctrl-x i .

No automatic scrolling occurs while the target pane has the input focus.





aa...abb...b

(?:aa...a)(?:bb...b)

Also, there seem to be problems with Eastern European versions of Windows, specifically with "character set 1250" or similar. Sorry, I currently don't have the time and resources to investigate this any further.

If you encounter any other bugs or problems please send them to the mailing list.





It might be worthwhile to note that, due to the dynamic nature of Lisp, The Regex Coach could be written without changing a single line of code in the CL-PPCRE engine itself, although the application has to track information and query the engine while the regular expression is parsed and the scanners are built. All this could be done 'after the fact' by using facilities like defadvice and :around methods. Imagine writing this application in Perl without touching Perl's regex engine... :)

Also, thanks to LispWorks' cross-platform CAPI toolkit the code for the Windows and Linux versions is nearly identical without any platform-specific parts (except for some lines regarding different fonts and keybindings).





Brigitte Bovy from LispWorks ("Xanalys" at that time) support helped with the tricky interaction between the editor panes. I also got a couple of helpful tips from the LispWorks mailing list, specifically from Jeff Caldwell, John DeSoi, David Fox, and Nick Levine.

Thanks to the guys at "Café Olé" in Hamburg where I wrote most of the code.

Development of The Regex Coach has been supported by Euphemismen.de.







