Sorry, I don't buy this proof. In the last code block:

    sub halts {
        my $machine = shift;
        my $input   = shift;
        is_whatever_nullary( qq{
            run_turing_equivalent("\Q$machine\E", "\Q$input\E");
            sub whatever() {};
        } )
    }

is_whatever_nullary() can certainly easily return a true value without having to execute run_turing_equivalent(). The whatever prototype is a compile-time property, and the run-time impact of whatever run_turing_equivalent() does will not change the fact that whatever() is "nullary" at compile time.

But it is quite obvious that Perl code can't be reliably parsed without running Perl code, by the simple example of:

    BEGIN {
        if( 0.5 < rand() ) {
            eval "sub whatever() { }; 1" or die $@;
        } else {
            eval "sub whatever { }; 1"  or die $@;
        }
    }
    whatever / 25 ; # / ;
    die "this dies!";

You can, of course, replace the above conditional with some high-falutin' comp sci construct of your choosing. But I find that such just detracts from the obviousness. You have to run " 0.5 < rand() " in order to parse the last line of the above Perl code (which perl can parse differently each time that it is run, as shown below).

    > perl -MO=Deparse above.pl
    # ...
    whatever / 25;
    - syntax OK
    > perl -MO=Deparse above.pl
    # ...
    whatever(/ 25 ; # /);
    die 'this dies!';
    - syntax OK

- tye
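To make the prototype's effect on parsing concrete, here is a minimal runnable sketch (the sub name is made up for illustration); with the empty prototype in place, the "/" that follows the bareword can only be division:

```perl
use strict;
use warnings;

# The empty prototype tells the parser that nullary() takes no
# arguments, so the "/" after it can only be the division operator.
sub nullary() { 10 }

my $x = nullary / 2;   # parsed as nullary() / 2
print "$x\n";          # prints 5

# Without the prototype, the parser would instead treat "/ 2..." as
# the start of a regex-match argument -- tye's ambiguity exactly.
```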

You spotted an unclarity in my presentation. I was vague about what "time" the code strings are to be run. Perhaps this is better:

    sub halts {
        my $machine = shift;
        my $input   = shift;
        is_whatever_nullary( qq{
            BEGIN {
                run_turing_equivalent("\Q$machine\E", "\Q$input\E");
                sub whatever() {};
            }
        } )
    }

Using rand() is indeed more obvious, and it was how I tested my code snippets. (I lent my Halting Oracle to a friend and she never returned it ... but don't get me started.) I did not use it in the presentation because, while it makes the proof more obvious, it's a proof of a weaker theorem. If you interpret rand() as truly random (and not as pseudo-random), then we're dealing with non-deterministic programs. Someone might then say, "as long as there is no non-determinism while compiling, Perl 5 is statically parseable." But it's not, and a proof using the Turing machine simulator shows it is not. That Perl 5 is unparseable even when the code is completely deterministic is a much stronger result, and the distinction makes a difference in practice. The Turing simulation in this case is brought in for very practical reasons.

I can't claim the credit for that. Adam Kennedy's hint took me in this direction. I suspect he also knew the business about rand(). But with his many hours of real-life experience trying to statically parse Perl, he focused on the stronger proof -- the one that would give him the most information about what he was up against in creating PPI.




You pretty much nailed it exactly.



From a practical perspective, there are about 5-10 things that, in various combinations, make Perl impossible to parse.



However, there's a big difference between what we believe and what we can prove.



And so from the perspective of knowing absolutely that Perl 5 can't be parsed, the Halting Problem approach was by far the most straightforward way to do it (at least to a layman like me).



This realization let me finally put the possibility of a complete parser out of my mind, and once I wasn't distracted by completeness any more, I had the chance to evaluate and actually define what "good enough" would mean for a "Perl 5 parser".



And that perspective on the problem was the one that took the code in the direction of the working solution.

I don't really find "If you don't try to solve the halting problem, then you can parse Perl" to be a "stronger" conclusion than "If you don't do something random, then you can parse Perl", other than, of course, the presence of high-falutin' comp sci lingo in the former. It is a rather simple fact that you can't reliably parse Perl without running Perl code (arbitrary Perl code). It reminds me of showing my kids a clock face and asking them to list the properties of the numbers on it. They never mention that they are sorted, because that just seems too obvious. The fact that you can't parse Perl is obvious. Adding useless complication to the conjecture makes it sound more like a rigorous proof, but I don't believe it actually helped your proof. A proof that you can't parse Perl is:

    BEGIN {
        my $proto = "";
        $proto = "()" if run_some_perl_code();
        eval "sub whatever$proto { }";
    }
    whatever / 6; # /;

The fact that run_some_perl_code() can be any arbitrary Perl code is quite clear. I guess this then boils down to the argument that somehow static analysis could be used to predict the results of run_some_perl_code() without actually running it. But the rand example clearly shows that not to be the case. As does using "if @ARGV", or "if <STDIN> =~ /^y/i", or "if LWP::Simple::get('http://perlmonks.org/secretAlgorithm.pl') =~ /true/", or "if unlink '/etc/passwd'", or "if find_prime_of_length( ~0 ) % 3 == 1". Sure, the fact that run_some_perl_code() could be trying to solve the halting problem also demonstrates something. Of course, if I'm writing Perl support for an IDE, I'm not seriously concerned that the declaration of function prototypes is being based on answers to instances of the halting problem, so I'm likely to just dismiss your proof (based on your own reaction to the rand case). I find "You might have to run arbitrary Perl code" to be both a stronger statement of the result and a more interesting and useful one.
I also think there is a flaw in your proof in that it confuses solving the halting problem with running code that may or may not halt. So your conclusion would need to be more like "Here is some code that will either determine that whatever() is nullary or that will never terminate." Just because the code has been running for 400 years doesn't mean that you can determine that it will never halt, so your code doesn't even have the potential of solving the halting problem. So the lemma that is needed here is that "Static analysis of a program in a Turing-complete language (along with its 'input') can be insufficient to determine its output". Frankly, this is where it becomes uninteresting to me. (Minor updates applied during the 15 minutes after first posting.)

- tye

A rather simpler demonstration is:

    eval $ARGV[ 0 ];

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

    eval $ARGV[ 0 ];

The output from perl -MO=Deparse disagrees with you. eval isn't the "problem", BEGIN is.

- tye
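The eval-versus-BEGIN distinction can be sketched concretely (the sub name here is made up): a run-time string eval cannot alter how the file was already parsed, but a string eval inside a BEGIN block runs before the rest of the file is tokenized, so it can.

```perl
use strict;
use warnings;

BEGIN {
    # This string eval runs at compile time, before the lines below
    # are tokenized, so the prototype it installs changes their parse.
    eval 'sub answer() { 42 } 1' or die $@;
}

my $x = answer / 2;   # parsed as answer() / 2 because of the prototype
print "$x\n";         # prints 21
```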

That's some mighty fine left-brain thinking there (especially for a Monday morning), but does it in any way affect any practical aspect of Perl? Like, can it be used to show that Perl is more or less reliable/secure? This isn't a criticism of your node, but I left college 35 years ago, and this sort of analysis seems very ivory-tower-ish to me now. It's sort of like saying "one cannot prove self-existence". Is the fact that Perl cannot parse itself a good or bad thing, or can other languages do it? Does that make them superior? Should it be a design criterion for Perl 6? What am I missing?

I'm not really a human, but I play one on earth. Cogito ergo sum a bum

As Jeffrey pointed out, it means that you can't reliably parse Perl code without executing it. This means that things like static code analysis, code transformation and syntax highlighting will never be reliable. This is a drawback indeed, but on the other hand it means that modules can extend Perl's syntax, and that other nifty stuff can be accomplished. So I understand this node as a proof of a property that is seldom fully understood.

Let's see: You can't statically analyse a dynamic language. (Ig)Nobel award material if ever I heard it.

I don't think the fact that Perl is dynamic has anything to do with it. Lisp (of any dialect) is a dynamic language and is trivially easy to parse. The grammar of the language and its type system are two entirely different things.

It means Perl is alive, ALIVE!!

You ever try parsing a CAT? Hell, dogs just go right in the parser with very little coaxing -- but cats? Oh man! You need thick rubber gloves and a spotter (usually a dog).

This totally makes my day. Would you mind if I converted this to POD and included it in the documentation for PPI?

More to the point here... what should be done about your findings? Perl-5 obviously is “in production,” thousands of times over, and so cannot be changed. The only thing that can be influenced is therefore Perl-6 ... if that. We have had “unresolvable ambiguities” over the years in a great many languages that have nonetheless proved to be practically successful. Ergo... what?

In the sense I think you might mean, nothing should be done. I use Perl 5, I like Perl 5. I don't see a need to regard static unparsability as a problem and "solve" it. Larry apparently is going to static parsability in Perl 6. I don't have a firm opinion about whether that is bad or good. I do think that it's proved wise in the past to listen to Larry in these matters. The reason for a proof is that there had been considerable question about whether Perl 5 was statically unparsable just for practical purposes, or just given present parsing techniques. Adam Kennedy's PPI man page says that the module name Parse::Perl had been reserved in case someone invented a way to statically parse Perl. Opinion shifted to the point where most people felt certain that no static parser for Perl existed. But I think it's nice to get the issue completely nailed down. Lack of static parsability is an obstacle for many utilities that deal with Perl 5 code. That's definitely a downside. But it's not the whole equation, and the ability to do Turing-equivalent fiddling at BEGIN time is an upside.

"Larry apparently is going to static parsability in Perl 6." You seem to be missing a "not" in that sentence. The last time I brought this up with Larry, which was admittedly a year ago, he wasn't going to budge on either the dual nature of "/" or the ability to prototype a subroutine to influence the number of parameters. Indeed, part of the difficulty in having implemented Perl 6 so far is not being able to have a clean lexical phase, thanks to this wonderful "perfect storm" of desirable features. You must execute Perl code at compile time, even in Perl 6, in order to even tokenize the remainder of the file. The only concession I heard from him is that the results of that executed code would not be promised to be carried forward into the result of the compilation, which seems a bit odd, as it would break things like:

    my $x;
    BEGIN { $x = time }

Unless the BEGIN block is executed both at compile time and once at pre-run time. Of course, I might be a few months out of date. I got bored following all the Perl 6 stuff, so I'm working on Smalltalk for a while until Perl 6 gets a bit closer to a release.

-- Randal L. Schwartz, Perl hacker

Actually, Perl 6 is even harder to (statically) parse than Perl 5, because it is much easier to change the syntax. You can, for example, introduce new operators very easily:

    multi sub prefix:<foo> (Str $a) {
        return $a eq "foo" ?? "bar" !! "foo";
    }

This will define a unary prefix operator that takes one string as an argument. When somebody writes

    foo "foo";

you can't know if that's a sub call or a call to an operator. You'd think it makes no difference - but wait until precedence comes into play:

    BEGIN {
        eval ' multi sub prefix:<foo> (Str $a) is tighter(&infix:<~>) {
                   return $a eq "foo" ?? "bar" !! "foo";
               } ';
    }
    # and later in the code:
    foo "fo" ~ "o";

If that foo is parsed as a sub call, the precedence is lower than that of the concatenation ~, and the other way round if it's parsed as an operator. This is quite a complicated example, but there are much easier ones with "real" macros. (But since no implementation really implements macros yet, I don't really know much about them.)
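The precedence point has a runnable Perl 5 analogue (a sketch; the sub names are made up): a ($) prototype turns a sub into a named unary operator, whose precedence differs from an ordinary list operator's, so the same source text parses differently depending on a declaration elsewhere.

```perl
use strict;
use warnings;

# An ordinary list operator: it gobbles the whole list to its right.
sub listop { return join '|', @_ }

# The ($) prototype makes this parse as a named unary operator instead.
sub unary ($) { return "u($_[0])" }

my @x = ( listop "a", "b" );   # parsed as listop("a", "b")
my @y = ( unary "a", "b" );    # parsed as ( unary("a"), "b" )

print scalar(@x), " ", scalar(@y), "\n";   # prints "1 2"
```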

Just wondering: does this same idea apply to any language that allows constructs within the code itself to affect the syntactic structure of that code? Take, for example, the following theoretical Haskell expression:

    foo =+= bar //|/ baz =+= qux

The parsing of this expression depends on the relative precedence of the two operators and possibly, but not necessarily, on the associativity of =+=. It may also result in an error, if //|/ has higher precedence and =+= is declared as non-associative. However, the parsing of this does not depend on executing any code in the sense that the OP means, I think. That is, there must be a static declaration somewhere else in the code that says what the properties of these two operators should be. It need not lexically precede the usage, but it has to appear somewhere in the relevant scope, and it cannot be dynamically generated in any way. How would that fit in?

As with Perl, you cannot make sense of an individual source file without having already seen its dependencies. And, if you are correct about Haskell allowing use before definition (which I didn't think it did, but...), then multiple passes may be required.

Correctly parsing a Haskell file requires parsing its dependencies, but it doesn't require running any Haskell code, since infix declarations are quite static. What it does mean is that to parse Haskell, you need to figure out the imports and infix declarations before actually parsing expressions. That's not very hard to do. You could, for example, initially parse expressions just into either lists of identifiers or rose trees regardless of precedence. Once you have the AST for a source file, it's easy to see the declarations that you need, and then use the information to fix up the expressions based on precedence.
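The fix-up pass described above can be sketched in Perl, for consistency with the rest of the thread (a toy, with a hypothetical fixity table for the two operators from the Haskell example): first the expression is kept as a flat token list, then rebuilt into a tree once precedences are known.

```perl
use strict;
use warnings;

# Hypothetical fixity table; both operators treated as
# left-associative for brevity.
my %prec = ( '=+=' => 6, '//|/' => 7 );

# Precedence climbing over an already-tokenized, flat expression:
# the "fix-up" pass that runs once the fixity table is known.
sub build {
    my ( $toks, $min ) = @_;
    my $lhs = shift @$toks;
    while ( @$toks and $prec{ $toks->[0] } >= $min ) {
        my $op  = shift @$toks;
        my $rhs = build( $toks, $prec{$op} + 1 );
        $lhs = [ $op, $lhs, $rhs ];
    }
    return $lhs;
}

my @flat = ( 'foo', '=+=', 'bar', '//|/', 'baz', '=+=', 'qux' );
my $tree = build( \@flat, 0 );

# With //|/ binding tighter than =+=, the result is
# [ '=+=', [ '=+=', 'foo', [ '//|/', 'bar', 'baz' ] ], 'qux' ]
```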

If I caught it all correctly, halts() returns true only _after_ the Turing machine $machine halts. And, consequently, halts() runs for as long as $machine runs...

Any language which has any sort of an eval construct “cannot be parsed” statically, because “that which is to be parsed” is determined at runtime. So it really doesn't take computer science to prove that. Furthermore, let the record show that when you input any Perl program into the Perl compiler, it will be parsed, and you'll either get results or error-messages. Bottom line is: what's the point here? What are you saying about Perl, and about what (if anything) should be done to address this “issue?” What does this formality actually do for us as practitioners who make our daily bread from this tool?

This is nonsense. Eval does not preclude parsing, unless the results of evaluating something can alter the parse of another part.

There are many languages that have "eval" and yet have unambiguous parse trees for the actual code of the program (as opposed to the code evaluated at runtime). If you run this Python program, you'll see it prints out a complete parse tree of the program BEFORE it runs eval. Before you jump down my throat, I'm not saying that this makes Python a better language than Perl. It's just a fact that a language can be constructed so that "eval" is totally handled at runtime and does not affect the parsability of the program.

    import sys
    import parser

    print parser.suite(open(__file__).read()).totuple()

    x = eval(sys.stdin.readline())
    if x:
        print "yes", x
    else:
        print "no", x

To bring this back to the original post, the "parser" module is the Python equivalent of what Parse::Perl would be if Perl parsing were possible.

He is saying that in order to statically parse an eval block, you must know when it halts - or in other words, what it does; this is impossible without actually running the eval block, and is clearly related to the halting problem.

Acme is the 'joke' department for cpan modules, a place to prank and have fun. I wouldn't use it as part of a paper.

I know my post is a little late...



Maybe there is a difference in our definitions of "static parsing", but I think that no language can actually be statically unparseable as long as there is a finite number of parsing rules. Under this condition, there should be a finite number of possible derivations for a certain number of generated symbols. In Kennedy's example:



    whatever / 25 ; # / ; die "this dies!";



Two interpretations are possible. There should not be any problem if the compiler finds both ways and produces a check branching to the two interpretations. Something like this:

    if (is_nullary(whatever)) {
        whatever / 25;
    } else {
        whatever (/ 25 ; # /);
        die "this dies!";
    }

When we think about it, the Perl interpreter already does this. So technically, parsing Perl is not an undecidable problem. You only proved that there could not be a single way to parse Perl - and yes, that is a problem for text editors' syntax highlighting.

It's never too late. Your workaround assumes that somehow is_nullary can determine, if perhaps only at run time, whether whatever is nullary or not. But an arbitrary function's prototype is not in general decidable, even at run time. Here's some hand-waving, which I hope can render the basic idea plausible: Suppose you can determine whether a piece of arbitrary code establishes (or not) a nullary prototype for whatever. Since the code is arbitrary, it can contain lots of stuff, so you've said you can do more or less a complete analysis of what Perl code does. That of course includes whether it ever finishes or not. So you've also solved the Halting Problem. But that you cannot do. The two issues -- the Halting Problem and undecidability -- are intimately connected.

Perl is unusual in allowing Turing-equivalent computing before the parse is determined, and in allowing this pre-parse computing to do things which will affect the parse. That's why Perl parsing is undecidable, while the parses for most languages are decidable.

I've just finished the second of a series of articles in The Perl Review on undecidability in the Perl context. In TPR I lay the proof out more slowly than I did here in perlmonks. My discussion in TPR is aimed at Perl programmers who may not have had any interest in Theory for its own sake. Eventually, I'll put that series of articles on the Internet. This is a genuinely difficult area. Even those academics who've mastered the notation and memorized the results have a very hard time with the ideas. This is why IMHO these topics so often are poorly explained.

"Your workaround assumes that somehow is_nullary can determine, if perhaps only at run time, whether whatever is nullary or not." But this is an entirely different problem. No one claims that calling halts() doesn't make any sense just because it is statically undecidable. This is true for arbitrary programs (functions) in a Turing-complete language, and no one claims that function composition is useless because we can never know whether all of them terminate. The implicit assumption of your article is that the parser derives one and only one parse tree (solves all ambiguities) from an expression, and only this assumption has to be given up. The assertion then becomes much weaker: a Perl parser can't resolve all ambiguities of a Perl program by context-sensitive analysis, and the solution is to derive multiple parse trees for an expression and wrap them into a common node that can represent a conditional expression, for example.

Type inference (and, as a consequence, parsing?) in ML, and I suspect many other functional languages, is complete in deterministic exponential time. For any practical purpose, that's as close to undecidability as possible.

No, you can parse SML without doing type inference. The (only) tricky part of SML parsing is operator fixity, which doesn't depend on types at all. And yes, SML type inference is superexponential, but only for contrived examples. So for "any practical reasons", it's not even remotely close to undecidability.

I'm confused by this proof and I don't buy it, especially since the definitions, including that of "parsing," are too imprecise to warrant some reduction proof. You also don't need reduction to show the existence of an example, which is what you're essentially trying to do. You seem to be making the claim that when you have code

    BEGIN { x(); sub foo { } }

then foo() will be declared only when x() terminates. But that's like writing this Java code

    for(;;) { }
    int x;

and asking whether "x" will ever be declared, which says absolutely nothing about how this program is parsed (and it's parsed fine). Cheers.

In a series of three articles in The Perl Review, now available online, I laid this proof out much more carefully, in three different versions. The most bullet-proof version derives the result directly from Rice's Theorem. You are right that the proof would not go through if the only way of establishing a prototype for a Perl function was via a function definition. As I said in Perl Review (4.3, p. 28): "I must show that I can use Turing-complete Perl code to determine the prototype of the dunno subroutine at compile time. A function definition won't work." Summarizing from that article, there are at least two ways that the prototype of a subroutine can be established using Turing-complete Perl code. One is with symbol table manipulation, for example:

    BEGIN { *dunno = sub () { 3 } }

A second way is to put the function definition into a string which is eval'd in a BEGIN block. I believe clever monks will be able to think of others.
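A runnable sketch of the symbol-table route (essentially the technique constant.pm uses): because the glob assignment happens at compile time, the empty prototype is already visible to the parser when the next line is tokenized.

```perl
use strict;
use warnings;

BEGIN {
    # Install a nullary sub through the symbol table at compile time;
    # this is essentially what constant.pm does for constants.
    *dunno = sub () { 3 };
}

my $x = dunno + 4;   # the empty prototype makes this dunno() + 4
print "$x\n";        # prints 7; without the prototype it would be
                     # parsed as dunno(4) and print 3
```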

"You seem to be making the claim that when you have code BEGIN { x(); sub foo { } } then foo() will be declared only when x() terminates." That particular example has a flaw, but it's easy to demonstrate that the argument holds:

    use Modern::Perl;

    sub x { say "In x!" }
    BEGIN { x() }

    sub foo!bar { say 'In foo!bar!' }
    BEGIN { foo!bar() }

For extra fun, run it through B::Deparse.

Sorry, I didn't get this at all. A binary executable called perl is called so when it can execute programs written in the Perl programming language. If perl (the binary so named) cannot parse Perl programs, it's not perl at all! perl exists, so it can parse every Perl program ever written, or perl doesn't exist at all... And we all know perl exists.