More resources

Books:

MIT professor Michael Sipser's Introduction to the Theory of Computation covers regular languages and automata in exhaustive detail in the very first chapter.

Derivatives of regular expressions

The derivative of a regular expression with respect to a character is a new regular expression that matches whatever the original expression could still match, assuming it had just matched that character. In other words, a string s matches the derivative exactly when the character followed by s matches the original.

For example, the derivative of the expression (foo|frak)* with respect to the character f is the expression (oo|rak)(foo|frak)*.

On the other hand, the derivative of the expression (foo|frak)* with respect to the character c is the null pattern ∅. The null pattern is not allowed in most regular expression implementations, but it is necessary in order to make the derivative a total function. By definition, no string can match the null pattern.
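One way to build intuition is to apply the same idea to a language written out as a plain set of strings: keep the strings that start with the character, then strip that character off. The sketch below does this in Python for a small hand-picked sample of strings matched by (foo|frak)*. The name derive_language and the sample set are illustrative assumptions, not part of any regex library.

```python
from __future__ import annotations


def derive_language(language: set[str], c: str) -> set[str]:
    """Derivative of a finite set of strings with respect to character c.

    Keeps the strings that begin with c and drops that first character.
    """
    return {s[1:] for s in language if s.startswith(c)}


# A finite sample of strings matched by (foo|frak)*, chosen for illustration.
sample = {"", "foo", "frak", "foofrak"}

after_f = derive_language(sample, "f")  # tails of the strings starting with "f"
after_c = derive_language(sample, "c")  # no string in the sample starts with "c"
```

Here after_f holds oo, rak, and oofrak, each of which matches (oo|rak)(foo|frak)*, while after_c is the empty set, the set-level counterpart of ∅.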

The matching strategy with derivatives is straightforward:

If the string to match is empty and the current pattern matches the empty string, then the match succeeds. If the string is empty and the current pattern does not match the empty string, then the match fails. If the string to match is non-empty, the new pattern is the derivative of the current pattern with respect to the first character of the current string, and the new string to match is the remainder of the current string.
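The loop itself does not depend on how patterns are represented. As a rough sketch in Python, the function below takes the derivative and the empty-string test as parameters. The names matches, derive, and accepts_empty are hypothetical, and the demonstration stands in a finite set of strings for the pattern purely to keep the example self-contained.

```python
from typing import Callable, TypeVar

P = TypeVar("P")  # whatever type is used to represent a pattern


def matches(pattern: P, text: str,
            derive: Callable[[P, str], P],
            accepts_empty: Callable[[P], bool]) -> bool:
    """Match text by taking one derivative per character."""
    for c in text:
        # New pattern: the derivative with respect to the first character.
        # The loop then moves on to the remainder of the string.
        pattern = derive(pattern, c)
    # The string is now empty: succeed only if the pattern matches empty.
    return accepts_empty(pattern)


# Demonstration with a pattern modelled as a finite set of strings
# (an assumption made only for this sketch).
def set_derive(language: frozenset, c: str) -> frozenset:
    return frozenset(s[1:] for s in language if s.startswith(c))


def set_accepts_empty(language: frozenset) -> bool:
    return "" in language


words = frozenset({"foo", "frak"})
found = matches(words, "frak", set_derive, set_accepts_empty)
missed = matches(words, "fra", set_derive, set_accepts_empty)
```

With these definitions found is True and missed is False: after consuming fra the remaining set is {"k"}, which does not contain the empty string.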

To define the derivative, we first need a function δ that returns the empty string ε if its argument accepts the empty string, and the null pattern ∅ otherwise:

δ(∅) = ∅

δ(ε) = ε

δ(c) = ∅

δ(re1 re2) = δ(re1) δ(re2)

δ(re1 | re2) = δ(re1) | δ(re2)

δ(re*) = ε
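The rules for δ map onto a short recursive function. The sketch below, in Python, assumes a hypothetical encoding of patterns as nested tuples; the constants NULL and EPS and the tags "chr", "seq", "alt", and "star" are made up for this example. Because concatenating anything with ∅ gives ∅, and an alternation containing ε accepts the empty string, the result is always one of the two constants.

```python
NULL = ("null",)  # the null pattern ∅
EPS = ("eps",)    # the empty string ε


def delta(re):
    """Return EPS if re accepts the empty string, NULL otherwise."""
    tag = re[0]
    if tag == "null":
        return NULL
    if tag == "eps":
        return EPS
    if tag == "chr":   # ("chr", c)
        return NULL
    if tag == "seq":   # ("seq", re1, re2)
        # δ(re1) δ(re2): ε only when both sides are ε.
        both = delta(re[1]) == EPS and delta(re[2]) == EPS
        return EPS if both else NULL
    if tag == "alt":   # ("alt", re1, re2)
        # δ(re1) | δ(re2): ε when either side is ε.
        either = delta(re[1]) == EPS or delta(re[2]) == EPS
        return EPS if either else NULL
    if tag == "star":  # ("star", re)
        return EPS
    raise ValueError(f"unknown pattern: {re!r}")


# (fo|ε) accepts the empty string through its second branch.
fo_or_empty = ("alt", ("seq", ("chr", "f"), ("chr", "o")), EPS)
```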

Let D_c(re) denote the derivative of the regular expression re with respect to the character c; then the derivative can be defined recursively:

D_c(∅) = ∅

D_c(ε) = ∅

D_c(c) = ε

D_c(c') = ∅ if c ≠ c'

D_c(re1 re2) = δ(re1) D_c(re2) | D_c(re1) re2

D_c(re1 | re2) = D_c(re1) | D_c(re2)

D_c(re*) = D_c(re) re*

Code
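The recursive rules for the derivative translate almost line for line into a function. What follows is a sketch in Python, separate from the Scheme implementation, using a hypothetical nested-tuple encoding of patterns. The helpers seq, alt, and word are additions of this sketch: seq and alt fold away ∅ and ε so that results stay small, which the rules themselves do not require.

```python
NULL = ("null",)  # the null pattern ∅
EPS = ("eps",)    # the empty string ε


def nullable(re):
    """True when re accepts the empty string, i.e. δ(re) = ε."""
    tag = re[0]
    if tag in ("null", "chr"):
        return False
    if tag in ("eps", "star"):
        return True
    if tag == "seq":
        return nullable(re[1]) and nullable(re[2])
    if tag == "alt":
        return nullable(re[1]) or nullable(re[2])
    raise ValueError(f"unknown pattern: {re!r}")


def seq(r1, r2):
    """Concatenation, simplified: ∅ absorbs, ε is dropped."""
    if r1 == NULL or r2 == NULL:
        return NULL
    if r1 == EPS:
        return r2
    if r2 == EPS:
        return r1
    return ("seq", r1, r2)


def alt(r1, r2):
    """Alternation, simplified: ∅ | r = r | ∅ = r."""
    if r1 == NULL:
        return r2
    if r2 == NULL:
        return r1
    return ("alt", r1, r2)


def derive(re, c):
    """D_c(re): the derivative of re with respect to the character c."""
    tag = re[0]
    if tag in ("null", "eps"):
        return NULL
    if tag == "chr":
        return EPS if re[1] == c else NULL
    if tag == "seq":
        # δ(re1) D_c(re2) | D_c(re1) re2
        left = derive(re[2], c) if nullable(re[1]) else NULL
        return alt(left, seq(derive(re[1], c), re[2]))
    if tag == "alt":
        return alt(derive(re[1], c), derive(re[2], c))
    if tag == "star":
        return seq(derive(re[1], c), re)
    raise ValueError(f"unknown pattern: {re!r}")


def matches(re, text):
    """Derive once per character, then test for the empty string."""
    for c in text:
        re = derive(re, c)
    return nullable(re)


def word(s):
    """Pattern matching exactly the string s (helper for the demo)."""
    re = EPS
    for ch in reversed(s):
        re = seq(("chr", ch), re)
    return re


foo_frak_star = ("star", alt(word("foo"), word("frak")))
```

With this encoding, derive(foo_frak_star, "f") comes out as the tuple form of (oo|rak)(foo|frak)*, and derive(foo_frak_star, "c") is NULL, in line with the two examples given earlier.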

I've implemented the derivative in Scheme as a quick demonstration. Without much code, you can actually create a functioning regular-expression matcher: