395 lines
9.6 KiB
Markdown
395 lines
9.6 KiB
Markdown
---
|
||
created_at: '2015-06-14T07:21:10.000Z'
|
||
title: 'Perl Cannot Be Parsed: A Formal Proof (2008)'
|
||
url: http://www.perlmonks.org/?node_id=663393
|
||
author: __Joker
|
||
points: 57
|
||
story_text: ''
|
||
comment_text:
|
||
num_comments: 12
|
||
story_id:
|
||
story_title:
|
||
story_url:
|
||
parent_id:
|
||
created_at_i: 1434266470
|
||
_tags:
|
||
- story
|
||
- author___Joker
|
||
- story_9714046
|
||
objectID: '9714046'
|
||
year: 2008
|
||
|
||
---
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
sub halts {
|
||
my $machine = shift;
|
||
my $input = shift;
|
||
is_whatever_nullary(
|
||
qq{
|
||
run_turing_equivalent("\Q$machine\E", "\Q$input\E");
|
||
sub whatever() {};
|
||
}
|
||
)
|
||
}
|
||
[download]
|
||
```
|
||
|
||
is\_whatever\_nullary() can certainly easily return a true value without
|
||
having to execute run\_turing\_equivalent(). The whatever prototype is a
|
||
compile-time property and the run-time impact of whatever
|
||
run\_turing\_equivalent() does will not change the fact that whatever()
|
||
is "nullary" at compile time.
|
||
|
||
But it is quite obvious that Perl code can't be reliably parsed without
|
||
running Perl code by the simple example of:
|
||
|
||
``` code
|
||
BEGIN {
|
||
if( 0.5 < rand() ) {
|
||
eval "sub whatever() { }; 1" or die $@;
|
||
} else {
|
||
eval "sub whatever { }; 1" or die $@;
|
||
}
|
||
}
|
||
whatever / 25 ; # / ; die "this dies!";
|
||
[download]
|
||
```
|
||
|
||
You can, of course, replace the above conditional with some
|
||
high-falutin' comp sci construct of your choosing. But I find that such
|
||
just detracts from the obviousness. You have to run "`0.5 < rand()`" in
|
||
order to parse the last line of the above Perl code (which perl can
|
||
parse differently each time that it is run, as shown below).
|
||
|
||
``` code
|
||
> perl -MO=Deparse above.pl
|
||
# ...
|
||
whatever / 25;
|
||
- syntax OK
|
||
|
||
> perl -MO=Deparse above.pl
|
||
# ...
|
||
whatever(/ 25 ; # /);
|
||
die 'this dies!';
|
||
- syntax OK
|
||
[download]
|
||
```
|
||
|
||
\- [tye](?node=tye)` `
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
sub halts {
|
||
my $machine = shift;
|
||
my $input = shift;
|
||
is_whatever_nullary(
|
||
qq{
|
||
BEGIN {
|
||
run_turing_equivalent("\Q$machine\E", "\Q$input\E");
|
||
sub whatever() {};
|
||
}
|
||
}
|
||
)
|
||
}
|
||
[download]
|
||
```
|
||
|
||
Using *rand()* is indeed more obvious and it was how I tested my code
|
||
snippets (I lent my Halting Oracle to a friend and she never returned it
|
||
... but don't get me started.) I did not use it in the presentation
|
||
because while it makes the proof more obvious, it's a proof of a weaker
|
||
theorem.
|
||
|
||
If you interpret *rand()* as truly random (and not as pseudo-random),
|
||
then we're dealing with non-deterministic programs. Someone might then
|
||
say, "as long as there is no non-determinism while compiling, Perl 5 is
|
||
statically parseable." But it's not and a proof using the Turing machine
|
||
simulator shows it is not. That Perl 5 is unparseable even when the code
|
||
is completely deterministic is a much stronger result, and the
|
||
distinction makes a difference in practice.
|
||
|
||
The Turing simulation in this case is brought in for very practical
|
||
reasons. I can't claim the credit for that. Adam Kennedy's hint took me
|
||
in this direction. I suspect he also knew the business about *rand()*.
|
||
But with his many hours of real life experience trying to statically
|
||
parse Perl, he focused on the stronger proof -- the one that would give
|
||
him the most information about what he was up against in creating
|
||
[PPI](http://search.cpan.org/perldoc?PPI).
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
BEGIN {
|
||
my $proto= "";
|
||
$proto= "()"
|
||
if run_some_perl_code();
|
||
eval "sub whatever$proto { }";
|
||
}
|
||
whatever / 6; # /;
|
||
[download]
|
||
```
|
||
|
||
The fact that run\_some\_perl\_code() can be any arbitrary Perl code is
|
||
quite clear. I guess this then boils down to the argument that somehow
|
||
static analysis could be used to predict the results of
|
||
run\_some\_perl\_code() without actually running it. But the
|
||
[rand](?node=rand) example clearly shows that not to be the case. As
|
||
does using `if @ARGV` or `if <STDIN> =~ /^y/i` or `if
|
||
LWP::Simple::get("http://perlmonks.org/secretAlgorithm.pl") =~ /true/`
|
||
or `if unlink "/etc/passwd"` or `if find_prime_of_length( ~0 ) % 3
|
||
== 1`.
|
||
|
||
Sure, the fact that run\_some\_perl\_code() could be trying to solve the
|
||
halting problem also demonstrates something. Of course, if I'm writing
|
||
Perl support for an IDE, I'm not seriously concerned that the
|
||
declaration of function prototypes is being based on answers to
|
||
instances of the halting problem, so I'm likely to just dismiss your
|
||
proof (based on your own reaction to the [rand](?node=rand) case). I
|
||
find "You might have to run arbitrary Perl code" to be both a stronger
|
||
statement of the result and a more interesting and useful one.
|
||
|
||
I also think there is a flaw in your proof in that it confuses solving
|
||
the halting problem with running code that may or may not halt. So your
|
||
conclusion would need to be more like "Here is some code that will
|
||
either determine that whatever() is nullary or that will never
|
||
terminate." Since just because the code has been taking 400 years
|
||
doesn't mean that you can determine that it will never halt, so your
|
||
code doesn't even have the potential of solving the halting problem.
|
||
|
||
So the lemma that is needed here is that "Static analysis of a program
|
||
in a Turing-complete language (along with its 'input') can be
|
||
insufficient to determine its output". Frankly, this is where it becomes
|
||
uninteresting to me.
|
||
|
||
(Minor updates applied during the 15 minutes after first posting.)
|
||
|
||
\- [tye](?node=tye)` `
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
eval $ARGV[ 0 ];
|
||
[download]
|
||
```
|
||
|
||
Examine what is said, not who speaks -- Silence betokens consent -- Love
|
||
the truth but pardon error.
|
||
|
||
"Science is about questioning the status quo. Questioning authority".
|
||
|
||
In the absence of evidence, opinion is indistinguishable from prejudice.
|
||
|
||
["Too many \[\] have been sedated by an oppressive environment of
|
||
political correctness and risk
|
||
aversion."](http://news.bbc.co.uk/1/hi/education/6202877.stm)
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
my $x;
|
||
BEGIN { $x = time }
|
||
[download]
|
||
```
|
||
|
||
Unless the BEGIN block is executed both at compile time and once at
|
||
pre-run time.
|
||
|
||
Of course, I might be a few months out of date. I got bored following
|
||
all the Perl 6 stuff, so I'm working on
|
||
[Smalltalk](http://methodsandmessages.vox.com) for a while until Perl6
|
||
gets a bit closer to a release.
|
||
|
||
\-- [Randal L. Schwartz, Perl hacker](http://www.stonehenge.com/merlyn/)
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
multi sub prefix:<foo> (Str $a) {
|
||
return $a eq "foo" ?? "bar" !! "foo";
|
||
}
|
||
[download]
|
||
```
|
||
|
||
This will define an unary prefix operator that takes one string as an
|
||
argument. When somebody writes
|
||
|
||
``` code
|
||
foo "foo";
|
||
[download]
|
||
```
|
||
|
||
You can't know if that's a method call or a call to an operator. You'd
|
||
think it makes no difference - but wait until precedence comes into the
|
||
play:
|
||
|
||
``` code
|
||
BEGIN {
|
||
eval
|
||
"
|
||
multi * prefix:<foo> (Str $a) is tighter(&infix:<~>) {
|
||
return $a eq 'foo' ?? 'bar' !! 'foo';
|
||
";
|
||
}
|
||
|
||
# and later in the code:
|
||
foo "fo" ~ "o";
|
||
[download]
|
||
```
|
||
|
||
If that `foo` parsed as a sub call the precedence is lower than that of
|
||
the concatenation `~`, and the other way round if it's parsed as an
|
||
operator.
|
||
|
||
This is quite a complicated example, but there are much easier ones with
|
||
"real" macros. (But since no implemenation really implements macros by
|
||
now, I don't really know much about them).
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
foo =+= bar //|/ baz =+= qux
|
||
[download]
|
||
```
|
||
|
||
The parsing of this expression depends on the relative precedence of the
|
||
two operators and possibily, but not necessarily, on the associativity
|
||
of =+= It may also result in an error, if //|/ has higher precedence and
|
||
=+= is declared as non-associative.
|
||
|
||
However, the parsing of this does not depend on executing any code in
|
||
the sense that the OP means, I think. That is, there must be a static
|
||
declaration somewhere else in the code that says what the properties of
|
||
these two operators should be. It need not lexically precede the usage,
|
||
but it has to appear somewhere in the relevant scope, and it cannot be
|
||
dynamically generated in any way. How would that fit in?
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
if (is_nullary(whatever))
|
||
{
|
||
whatever / 25;
|
||
}
|
||
else
|
||
{
|
||
whatever (/ 25 ; # /);
|
||
die "this dies!";
|
||
}
|
||
|
||
When we think about it, the Perl interpreter already does this. So
|
||
technicaly, parsing Perl is not a undecidable problem. You only proved
|
||
that there could not be a single way to parse Perl - and yes, it is a
|
||
problem for text editor's syntax-highlighting.
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
import sys
|
||
import parser
|
||
|
||
print parser.suite(open(__file__).read()).totuple()
|
||
|
||
x = eval(sys.stdin.readline())
|
||
|
||
if x:
|
||
print "yes", x
|
||
else:
|
||
print "no", x
|
||
[download]
|
||
```
|
||
|
||
To bring this back to the original post, the "parser" module is the
|
||
Python equivalent of what Parse::Perl would be if Perl parsing were
|
||
possible.
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
BEGIN {
|
||
x();
|
||
sub foo { }
|
||
}
|
||
[download]
|
||
```
|
||
|
||
then foo() will be declared only when x() terminates.
|
||
|
||
But that's like writing this Java code
|
||
|
||
``` code
|
||
for(;;) { }
|
||
int x;
|
||
[download]
|
||
```
|
||
|
||
and asking whether "x" will ever be declared, which says absolutely
|
||
nothing about how this program is parsed (and it's parsed fine).
|
||
|
||
Cheers.
|
||
|
||
<!-- end list -->
|
||
|
||
<!-- end list -->
|
||
|
||
``` code
|
||
use Modern::Perl;
|
||
|
||
sub x { say "In x!" }
|
||
|
||
BEGIN { x() }
|
||
|
||
sub foo!bar { say 'In foo!bar!' }
|
||
|
||
BEGIN { foo!bar() }
|
||
[download]
|
||
```
|
||
|
||
For extra fun, run it through
|
||
[B::Deparse](http://search.cpan.org/search?mode=module&query=B%3A%3ADeparse).
|
||
|
||
<!-- end list -->
|