This was originally intended to be a CUFP, but I thought that the story behind the problem was at least as interesting as the solution; and, the more I wrote of the story, the more meditative it seemed. (I think that, even if the specific code doesn't catch your attention much, at least one of the linked topics might!) I hope that this was the right place for it to end up.

The question has (implicitly) rung out, on Perlmonks, Proggit, and, one must presume, the haunted nightmares of many: Are Perl regexes Turing complete?

This question caught the eye of yours truly, except that I confused the Turing completeness of regexes with the Turing completeness of substitutions, which led to an apparently unmotivated question and which required ikegami to set me straight.

Now I think that I've got the question: Can Perl regexes be used to recognise an arbitrary recursively enumerable language? The stirring answer: No, sort of *.

Why not? Well, a regex can be simulated by a linear-bounded automaton (that's the hyphenation, and I'll stick by it), and these can recognise only the context-sensitive languages (not all the recursively-enumerable ones, as a Turing machine can). Another way to say this is that regexes can only recognise languages in NSPACE(n). Thought of in this way, it's not so surprising: A regex has finitely many capturing groups, say k of them, each of which can be stored in space equal to the length, n , of the input (since they will, after all, only contain substrings of the input); so we only need space k*n to perform the ‘super-DFA’ part of Perl regex matching **. I suspect this upper bound is far too from sharp, and that Perl-5.8 regexes can't recognise balanced parentheses, which are only context free. What if we fix this by upgrading to Perl-5.10 regexes, which allow recursion? Well, then we can obviously at least recognise balanced parentheses, but I suspect that we're still not Turing-complete—indeed, still only context sensitive. Maybe someone more versed in CS than am I can confirm this.

Well, so how about the question which I understood in the first place: Can Perl substitutions be used to perform any computation that a Turing machine can, viewing the string on which they act as the input tape? Again: No, sort of *. After all, a substitution consists just of a match—which can be completed by an LBA, as above—followed by the assembly of the substitution string—which, again, involves only constant overhead (the literal, or constant, part of the match) over a constant multiple of the input (the part coming from interpolation of capture variables—any other variables are not affected by the match, and so may as well be regarded as constants). Thus, substitutions can still handle only CSLs.

Well, that's a little anti-climactic. So, what is true? Well, recursive Perl substitutions are Turing complete. As I pointed out elsewhere in the linked thread, that's known; but, hey, if I can figure out in a month's work what's obvious to MJD, then I figure that I'm doing all right. :-)

OK, so, that was a post full of negatives followed by a bold assertion; let's substantiate it. The obvious way to show Turing completeness is to simulate a Turing machine, but, man, that's been done to death. I heard the plea of cdiggins on LtU, who wondered why everyone prefers the λ calculus over the SKI combinator-calculus, and determined that I would not be among those hip and trendy kids who think only of the former. I quickly came up with:

perl -pe ' BEGIN { my $term = qr/(S|K|\((?-1)(?-1)\))/ } 1 while s/\(\(\(S($term)\)($term)\)($term)\)/(($1$3)($2$3))/ or s/\(\(K($term)\)($term)\)/$2/; ' [download]

which does a great job of reducing (fully parenthesised) terms from the calculus, but is a little unsatisfying because it requires recursive regexes, which I've just said that I don't know aren't already Turing complete. All right, so, moving on.

Well, we know that Conway's game of life is Turing complete, so maybe that's a good place to start; but it's played out on a 2-dimensional grid, and translating that 2-dimensional grid into the 1-dimensional ‘tape’ for a regex seemed to be a little hairy. So, a 1-dimensional version—well, that sounds like cellular automata. Everybody's favourite new kind of scientist, who studied these extensively, has a favourite—which we know, thanks to Matthew Cook, is Turing complete. So, let's simulate that! The problem is that there are 8 different rules for cell propagation, and running 8 different substitutions seems sort of profligate; can we get it down to 2, as we did for the SKI combinator-calculus above? If (as I did) we get stuck on that, can we at least show that, if we could do it, we'd have got an optimal result—that is, that a 1-regex substitution can never be Turing complete? I spent a few sleepless nights on this; I was sure that it wasn't possible—after all, one of Turing's criteria for a universal computer is that it should be able to make choices; but our sub-Perl-5.10 regexes can't, so we should have to build in at least one choice “manually”, on the level of multiple regexes. How to prove it, though? I thought that I could come up with some argument from Kolmogorov complexity, but no luck—because, it finally occurred to me, I was guilty of the same blinkered thinking as the designers of the early, “fixed-program” computers, and was trying to shove the entire program into the architecture. Why not instead let the program be recorded in the data?

Out of that idea came the following code for implementing rule 110 (or, rather, rule 124, which propagates to the right instead of the left):

perl -Mre=eval -nE <<CODE chomp; $_ = <<RULE . "---

$_"; ^111 0^11 ^110 1^10 ^11 1^1 ^01 1^1 ^10 1^0 ^00 0^0 ^1 1 ^0 1 1^1 0 0^0 RULE BEGIN { my $nc_rule = qr/^ *\S+ *\S* *$/m; my $c_rule = qr/^ *(\S+) *(\S*) *$/m; } /\^.*$/ or say( ( split /

/ )[-1] ) while s/$c_rule(?:

$nc_rule)*

---

.*?\K\1/$2/; CODE [download]

bash

perl

No code specified for -e

0

1

0

1

...0001000...

1 11 111 1011 11111 100011 1100111 11101011 101111111 1110000011 10110000111 111110001011 1000110011111 11001110100011 111010111100111 ... [download]

(The here-doc syntax is fake—at least for me, undercomplains—but I assume that someone out there knows how to fix it.) This accepts input a string ofs ands, which is regarded as infinitely padded withs on both sides, and prints out its successive evolutes under rule 124. For example, here's a sample run, showing the evolution of the input(which stands for):(It goes on forever, so don't wait for the surprise finish. :-) See the list of projects below!)

Emboldened by this success, I thought that perhaps I could tackle the original problem of reduction for the SKI combinator-calculus. To keep things simple (well, for me), I decided to use the Unlambda syntax, whereby ` is used as the notation for application, so that (SK) becomes `SK . I also wanted to include support for variables; again to keep life simple, I stuck with the austere CSer's set x , x' , …. The code is as follows:

perl -Mre=eval -nE <<CODE chomp; $_ = <<RULE . "---

$_"; + ^'>'+ ^ + ^> ^ . ^`>`.. ^ . ^S>S ^ . ^K>K ^ . ^x>x+ ^ - ^'>- ^ - ^> ^ ! ^`>!! ^ ! ^S> ^ ! ^K> ^ ! ^x>- ^ , ; ^'>', '; ^ , ; ^> ^ @ * ^`>`@@ `** ^ @ * ^S>S S ^ @ * ^K>K K ^ @ * ^x>x, x; ^ ^> ```S>``.@`.*^ ``K>.!^ RULE BEGIN { my $vs = qr/[^ >

]*/; my $ss = qr/ *?/; my $c_rule = qr/(?>($vs)$ss($vs)$ss($vs)$ss>$ss($vs)$ss($vs)$ss($v + s)$)/m; my $nc_rule = qr/(?>(?:$vs$ss){3}>(?:$ss$vs){3}$)/m; } 1 while s/$c_rule(?:

+$nc_rule)*

---

.*?\K\1(.*?)\2(.*?)\3/$4$7$5$8$6/; say ( ( split /

/ )[-1] ); CODE [download]

t ::= S t ::= K t ::= xp p ::= 'p p ::= t ::= `tt [download]

t

p

S

K

x

'

`

`````S`KSKxx'x'' # B = S(KS)K `x`x'x'' # Bxyz = x(yz) `````S``S`K``S`KSKS`KKxx'x'' # C = S(S(K(S(KS)K))S)(KK) = S(S(KB)S)(KK + ) ``xx''x' # Cxyz = xzy ````SS`K``SKKxx' # W = SS(K(SKK)) ``xx'x' # Wxy = xyy [download]

```SKKx

x

``SK``SKK

``SKK

This accepts strings according to the grammarwhereis the start symbol;is another non-terminal; and, andare terminals. It then reduces them according to the rules of the SKI calculus. Here is a sample run, showing that it correctly reduces the basic combinators in the BCKW calculus (the comments are just that, comments; they are not actually typed):This handles only textual, not functional, equivalence; for example, it correctly reducesto, but it does not recognise thatcan be replaced by

Note, incidentally, that, if we had genuine variable-width look-behind (instead of simulating it with \K ), then we could get rid of Perl control structures and just hoist this into a single substitution, at the expense of considerably more wasted space; 1 while s/$orig/$repl/ would become s/(?<=0$orig)/0$repl/g , where 0 could be any symbol that does not occur in the input or any of our productions.

Some projects:

Formalise the idea that, even with recursion, regexes aren't Turing complete.

Give the cellular automaton a way to halt, probably by allowing the user to enter a “stop pattern” and halting when that stop pattern is an initial segment of the tape.

I'm actually using very little of the power of Perl regexes in the recursive substitutions. Do we need anything more than honest-to-goodness regular expressions? We can get rid of \K as described above. I think that we can get rid of capturing groups by ‘sliding’ each proposed match, together with its proposed replacement, along the text until we find an actual match (kind of like the Boyer–Moore algorithm?).

as described above. I think that we can get rid of capturing groups by ‘sliding’ each proposed match, together with its proposed replacement, along the text until we find an actual match (kind of like the Boyer–Moore algorithm?). A related one: As things stand, it is essential to use non-greedy quantifiers and to rely on the semantics of Perl regex matching in some places, to ensure that the right rule is chosen, and applied in the proper place. (That is, the ordering of replacement rules, and the particular spot at which the replacement is applied, are both significant.) Choose a better set of replacement rules, so that we do not have this dependence. (This is probably mostly a matter of using more symbols, so that we can tell, for example, the first . in the template for ```Sttt apart from the second.)

in the template for apart from the second.) Set up a recursive substitution to recognise functional equivalence of combinators (so that it could identify ``SKK and ``SKS as functionally equivalent).

and as functionally equivalent). Actually implement a universal Turing machine this way!