hn-classics/9714046.md at master

9.6 KiB

Raw Permalink Blame History

created_at

title

url

author

points

story_text

comment_text

num_comments

story_id

story_title

story_url

parent_id

created_at_i

_tags

objectID

year

2015-06-14T07:21:10.000Z

Perl Cannot Be Parsed: A Formal Proof (2008)

http://www.perlmonks.org/?node_id=663393

__Joker

1434266470

story

author___Joker

story_9714046

9714046

2008

sub halts {
    my $machine = shift;
    my $input = shift;
    is_whatever_nullary(
       qq{
          run_turing_equivalent("\Q$machine\E", "\Q$input\E");
          sub whatever() {};
       }
    )
}
[download]

is_whatever_nullary() can certainly easily return a true value without having to execute run_turing_equivalent(). The whatever prototype is a compile-time property and the run-time impact of whatever run_turing_equivalent() does will not change the fact that whatever() is "nullary" at compile time.

But it is quite obvious that Perl code can't be reliably parsed without running Perl code by the simple example of:

BEGIN {
    if(  0.5 < rand()  ) {
        eval "sub whatever() { }; 1" or die $@;
    } else {
        eval "sub whatever { }; 1" or die $@;
    }
}
whatever  / 25 ; # / ; die "this dies!";
[download]

You can, of course, replace the above conditional with some high-falutin' comp sci construct of your choosing. But I find that such just detracts from the obviousness. You have to run "0.5 < rand()" in order to parse the last line of the above Perl code (which perl can parse differently each time that it is run, as shown below).

 > perl -MO=Deparse above.pl
# ...
whatever / 25;
- syntax OK

 > perl -MO=Deparse above.pl
# ...
whatever(/ 25 ; # /);
die 'this dies!';
- syntax OK
[download]

- tye

sub halts {
    my $machine = shift;
    my $input = shift;
    is_whatever_nullary(
       qq{
          BEGIN {
               run_turing_equivalent("\Q$machine\E", "\Q$input\E");
               sub whatever() {};
          }
       }
    )
}
[download]

Using rand() is indeed more obvious and it was how I tested my code snippets (I lent my Halting Oracle to a friend and she never returned it ... but don't get me started.) I did not use it in the presentation because while it makes the proof more obvious, it's a proof of a weaker theorem.

If you interpret rand() as truly random (and not as pseudo-random), then we're dealing with non-deterministic programs. Someone might then say, "as long as there is no non-determinism while compiling, Perl 5 is statically parseable." But it's not and a proof using the Turing machine simulator shows it is not. That Perl 5 is unparseable even when the code is completely deterministic is a much stronger result, and the distinction makes a difference in practice.

The Turing simulation in this case is brought in for very practical reasons. I can't claim the credit for that. Adam Kennedy's hint took me in this direction. I suspect he also knew the business about rand(). But with his many hours of real life experience trying to statically parse Perl, he focused on the stronger proof -- the one that would give him the most information about what he was up against in creating PPI.

BEGIN {
    my $proto= "";
    $proto= "()"
        if  run_some_perl_code();
    eval "sub whatever$proto { }";
}
whatever / 6; # /;
[download]

The fact that run_some_perl_code() can be any arbitrary Perl code is quite clear. I guess this then boils down to the argument that somehow static analysis could be used to predict the results of run_some_perl_code() without actually running it. But the rand example clearly shows that not to be the case. As does using if @ARGV or if <STDIN> =~ /^y/i or if LWP::Simple::get("http://perlmonks.org/secretAlgorithm.pl") =~ /true/ or if unlink "/etc/passwd" or if find_prime_of_length( ~0 ) % 3 == 1.

Sure, the fact that run_some_perl_code() could be trying to solve the halting problem also demonstrates something. Of course, if I'm writing Perl support for an IDE, I'm not seriously concerned that the declaration of function prototypes is being based on answers to instances of the halting problem, so I'm likely to just dismiss your proof (based on your own reaction to the rand case). I find "You might have to run arbitrary Perl code" to be both a stronger statement of the result and a more interesting and useful one.

I also think there is a flaw in your proof in that it confuses solving the halting problem with running code that may or may not halt. So your conclusion would need to be more like "Here is some code that will either determine that whatever() is nullary or that will never terminate." Since just because the code has been taking 400 years doesn't mean that you can determine that it will never halt, so your code doesn't even have the potential of solving the halting problem.

So the lemma that is needed here is that "Static analysis of a program in a Turing-complete language (along with its 'input') can be insufficient to determine its output". Frankly, this is where it becomes uninteresting to me.

(Minor updates applied during the 15 minutes after first posting.)

- tye

eval $ARGV[ 0 ];
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

my $x;
BEGIN { $x = time }
[download]

Unless the BEGIN block is executed both at compile time and once at pre-run time.

Of course, I might be a few months out of date. I got bored following all the Perl 6 stuff, so I'm working on Smalltalk for a while until Perl6 gets a bit closer to a release.

-- Randal L. Schwartz, Perl hacker

multi sub prefix:<foo> (Str $a) {
     return $a eq "foo" ?? "bar" !! "foo";
}
[download]

This will define an unary prefix operator that takes one string as an argument. When somebody writes

foo "foo";
[download]

You can't know if that's a method call or a call to an operator. You'd think it makes no difference - but wait until precedence comes into the play:

BEGIN {
    eval 
    "
multi * prefix:<foo> (Str $a) is tighter(&infix:<~>) {
     return $a eq 'foo' ?? 'bar' !! 'foo';
    ";
}

# and later in the code:
foo "fo" ~ "o";
[download]

If that foo parsed as a sub call the precedence is lower than that of the concatenation ~, and the other way round if it's parsed as an operator.

This is quite a complicated example, but there are much easier ones with "real" macros. (But since no implemenation really implements macros by now, I don't really know much about them).

foo =+= bar //|/ baz =+= qux
[download]

The parsing of this expression depends on the relative precedence of the two operators and possibily, but not necessarily, on the associativity of =+= It may also result in an error, if //|/ has higher precedence and =+= is declared as non-associative.

However, the parsing of this does not depend on executing any code in the sense that the OP means, I think. That is, there must be a static declaration somewhere else in the code that says what the properties of these two operators should be. It need not lexically precede the usage, but it has to appear somewhere in the relevant scope, and it cannot be dynamically generated in any way. How would that fit in?

if (is_nullary(whatever))
{
  whatever / 25;
}
else
{
  whatever (/ 25 ; # /);
  die "this dies!";
}

When we think about it, the Perl interpreter already does this. So technicaly, parsing Perl is not a undecidable problem. You only proved that there could not be a single way to parse Perl - and yes, it is a problem for text editor's syntax-highlighting.

import sys
import parser

print parser.suite(open(__file__).read()).totuple()

x = eval(sys.stdin.readline())

if x:
        print "yes", x
else:
        print "no", x
[download]

To bring this back to the original post, the "parser" module is the Python equivalent of what Parse::Perl would be if Perl parsing were possible.

  BEGIN {
    x();
    sub foo { }
  }
[download]

then foo() will be declared only when x() terminates.

But that's like writing this Java code

  for(;;) { }
  int x;
[download]

and asking whether "x" will ever be declared, which says absolutely nothing about how this program is parsed (and it's parsed fine).

Cheers.

use Modern::Perl;

sub x { say "In x!" }

BEGIN { x() }

sub foo!bar { say 'In foo!bar!' }

BEGIN { foo!bar() }
[download]

For extra fun, run it through B::Deparse.

9.6 KiB Raw Permalink Blame History

9.6 KiB

Raw Permalink Blame History