[10] Notes for the compiler, part 2

Nemo 2020-07-07 03:37:24 +05:30
parent b0aa03980c
commit a6d3e1f6bd
3 changed files with 62 additions and 15 deletions


@@ -107,7 +107,64 @@ I can definitely solve it, but I want to do it properly. I've also realized why
4. Static Typing
Update: A draft for this is at https://github.com/captn3m0/captn3m0.github.com/blob/master/_drafts/what-php-gets-right.md
## Compilation Engine
I'm hard-coding stuff a lot, with a lot of asserts
would be nice once I have structure to actually generate the rules from the GRAMMAR
This took a long time. I attempted to write something that used only pure Python builtins for the grammar declaration, and that stumped me for a while. Important insights:
1. Extending native classes (Enum, list, dict) is very easy in Python
2. Defining the right data structures for a parser is very important (things like Any/Many/OneOrMore)
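As a rough illustration of both insights, the sketch below subclasses `list` and `enum.Flag` to get grammar building blocks. The names `Token`, `ZeroOrMore`, and `OneOrMore` are made up for this example and are not the classes used in the actual engine.

```python
from enum import Flag, auto


class Token(Flag):
    """Hypothetical token enum; Flag members can be OR-ed into a mask."""
    LET = auto()
    IF = auto()
    WHILE = auto()


class ZeroOrMore(list):
    """A list subclass: same storage, but the type itself carries meaning."""


class OneOrMore(list):
    """Another marker type; a parser can tell it apart via isinstance."""


rule = ZeroOrMore([Token.LET, Token.IF])
keywords = Token.LET | Token.WHILE  # a mask meaning "LET or WHILE"

print(isinstance(rule, list), len(rule))
print(Token.LET in keywords, Token.IF in keywords)
```

Because the subclasses add no behaviour, all of the meaning lives in the type, which is what makes an `isinstance` check in the parser enough to decide how to treat a rule.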
I'm using dictionaries for lookahead, which looks perfect for an LL(1) grammar, but it gets much trickier once recursive terms are involved. The entire parser uses a `matchOnly` mode for various calls, which lets me check whether something matches without advancing the cursor.
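The idea behind a match-only mode can be sketched as follows. This is a simplified stand-in, not the engine's code: the `Cursor` class and its `match` method are hypothetical, and tokens are plain strings.

```python
class Cursor:
    """Hypothetical stand-in for the tokenizer: a token list plus an index."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, expected, match_only=False):
        """Return True if the current token equals `expected`.

        With match_only=True the cursor stays put, so the caller can
        probe the input before committing to a rule.
        """
        if self.pos >= len(self.tokens) or self.tokens[self.pos] != expected:
            return False
        if not match_only:
            self.pos += 1
        return True


cursor = Cursor(["if", "(", "x", ")"])
peeked = cursor.match("if", match_only=True)  # probe only
pos_after_peek = cursor.pos
consumed = cursor.match("if")                 # now actually consume
pos_after_consume = cursor.pos
```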
How do you handle recursive declarations in a Pythonic grammar? I cheat by using lambdas:
```python
IF_STATEMENT = Element('ifStatement', Sequence([
    Atom.PAREN_OPEN,
    EXPRESSION,
    Atom.PAREN_CLOSE,
    Atom.BRACE_OPEN,
    lambda: STATEMENTS,
    Atom.BRACE_CLOSE,
    # This is the tricky one
    (Atom.ELSE, Atom.BRACE_OPEN, lambda: STATEMENTS, Atom.BRACE_CLOSE)
]))
```
```
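Why the lambda helps: a name inside a lambda body is only looked up when the lambda is called, so a rule can mention another rule that is defined further down the file. A minimal sketch, using plain strings and lists instead of the real `Atom`, `Element`, and `Sequence` types (the `resolve` helper is hypothetical):

```python
# A rule that refers to a name defined later must not evaluate it eagerly.
# Wrapping the reference in a lambda postpones the lookup until call time.
BLOCK = ["{", lambda: STATEMENTS, "}"]  # STATEMENTS does not exist yet

STATEMENTS = ["let", "if", BLOCK]       # defined after BLOCK, refers back


def resolve(item):
    """Call zero-argument lambdas to obtain the rule they stand for."""
    return item() if callable(item) else item


resolved = resolve(BLOCK[1])
```

Writing `BLOCK = ["{", STATEMENTS, "}"]` instead would raise a `NameError` at import time, since `STATEMENTS` is not bound yet at that point.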
Each kind of Python value means something different in the grammar:
- A token (denoted by a member of the Atom enum)
- A bitwise mask of Atom enum members, to denote multiple possibilities
- A plain list, to denote zero-or-more of an inner sequence
- A tuple, to denote zero-or-one of an inner sequence
- A lambda, to denote a recursive reference, which is followed when it is reached
- An instance of the Element class, which is handled recursively
- An instance of the Sequence class, which is exactly the grammar sequence given in the list (Sequence extends list)
- A dictionary whose keys must be tuples. The keys are used for the lookup, and the matching value gives the follow-up sequence.
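The list above amounts to a dispatch on the Python type of each grammar item. A minimal sketch of that dispatch, assuming stand-in types (`Tok` for the Atom enum, `Seq` for Sequence; the Element case is left out for brevity, and `kind` is a hypothetical helper):

```python
from enum import Flag, auto


class Tok(Flag):  # hypothetical stand-in for the Atom enum
    ELSE = auto()
    BRACE_OPEN = auto()


class Seq(list):  # hypothetical stand-in for Sequence
    pass


def kind(rule):
    """Name the grammar construct a Python value stands for."""
    if isinstance(rule, Tok):
        return "token or mask"
    # Order matters: Seq is also a list, so test the subclass first.
    if isinstance(rule, Seq):
        return "sequence"
    if isinstance(rule, list):
        return "zero-or-more"
    if isinstance(rule, tuple):
        return "zero-or-one"
    if isinstance(rule, dict):
        return "lookahead table"
    if callable(rule):
        return "deferred rule"
    raise TypeError(f"unsupported grammar item: {rule!r}")
```

The one subtle point is the ordering of the `isinstance` checks: since a Sequence is a list, checking for a plain list first would misclassify it as zero-or-more.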
As another example, this is how I define statement:
```python
STATEMENT = {
    (Atom.LET,): LET_STATEMENT,
    (Atom.IF,): IF_STATEMENT,
    (Atom.WHILE,): WHILE_STATEMENT,
    (Atom.DO,): DO_STATEMENT,
    (Atom.RETURN,): RETURN_STATEMENT
}
```
I'm not very happy with this, since it results in a stupid edge case where the `let` keyword is parsed before the let statement is opened, and I have to deal with it.
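One possible way around that edge case (not what the engine currently does) is to use the dictionary purely for peeking, so that the chosen rule still sees its own keyword. A sketch under that assumption, with strings standing in for Atom members and rules, and `choose` as a hypothetical helper:

```python
def choose(table, tokens, pos):
    """Pick the follow-up rule whose key tuple matches the upcoming tokens.

    Nothing is consumed here: `pos` is only read, so the chosen rule still
    sees its leading keyword and can open its element before consuming it.
    """
    for key, rule in table.items():
        if tuple(tokens[pos:pos + len(key)]) == key:
            return rule
    return None


# Hypothetical table with strings standing in for Atom members and rules.
TABLE = {
    ("let",): "LET_STATEMENT",
    ("if",): "IF_STATEMENT",
}

picked = choose(TABLE, ["if", "(", "x", ")"], 0)
missing = choose(TABLE, ["return", ";"], 0)
```

The trade-off is that each statement rule would then have to list its keyword itself, instead of relying on the dictionary to have consumed it.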
If I get time, I'd like to improve on the following:
- [ ] Create a proper `Any` class, and use that. I attempted this a bit, but didn't get too far
- [ ] Remove the MatchDict implementation (it isn't nice) and replace it with Any
- [ ] Implement ZeroOrMany and ZeroOrOne as classes, and define their behaviour within the Compile method
- [ ] Write a converter from BNF to the Pythonic-flavored grammar (what I've described above).
- [ ] Better exceptions and forceful errors, instead of failing quietly. If the parser expects an atom and doesn't find it, it should error out
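For the last item, a minimal sketch of what failing loudly could look like. The `UnexpectedToken` exception and the `expect` helper are hypothetical names, and tokens are plain strings here rather than Atom members.

```python
class UnexpectedToken(Exception):
    """Hypothetical error type for a required atom that is missing."""

    def __init__(self, expected, found, position):
        super().__init__(
            f"expected {expected!r} but found {found!r} at token {position}"
        )
        self.expected = expected
        self.found = found
        self.position = position


def expect(tokens, pos, expected):
    """Return the next position, or raise if the required atom is absent."""
    found = tokens[pos] if pos < len(tokens) else None
    if found != expected:
        raise UnexpectedToken(expected, found, pos)
    return pos + 1
```

Optional constructs (the tuple and list cases) would still need a quiet, match-only probe first; only atoms that are required at that point should go through something like `expect`.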
I could have made this a lot easier by allowing "rewind" and treating the entire token stream as a list (so I could do `tokens[current-1]`, for example), but I was trying to avoid that.
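For comparison, the list-based alternative would look roughly like this. It is a sketch of the approach that was avoided, with a hypothetical `TokenList` class and string tokens:

```python
class TokenList:
    """Hypothetical buffered token stream with a movable cursor."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.current = 0

    def advance(self):
        """Consume and return the token under the cursor."""
        token = self.tokens[self.current]
        self.current += 1
        return token

    def previous(self):
        """Look at the token just consumed, i.e. tokens[current - 1]."""
        if self.current == 0:
            raise IndexError("no token has been consumed yet")
        return self.tokens[self.current - 1]

    def rewind(self, steps=1):
        """Move the cursor back so tokens can be parsed again."""
        self.current = max(0, self.current - steps)


stream = TokenList(["let", "x", "=", "1", ";"])
stream.advance()
```

The cost is that the whole token stream has to be held in memory, which a purely forward-moving tokenizer avoids.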


@@ -148,6 +148,6 @@ Final hack instruction set count in brackets as before.
 ### Parser (Compilation Engine)
-- [ ] Square
-- [ ] ExpressionLessSquare
-- [ ] TestArray
+- [x] Square
+- [x] ExpressionLessSquare
+- [x] TestArray


@@ -27,7 +27,6 @@ class Engine:
        self.jt.advance()
    def ZeroOrMany(self, grammarList, matchOnly):
        # print("ZOM called")
        ret = self.compile(grammarList[0], matchOnly)
        if matchOnly:
            return ret
@@ -78,10 +77,6 @@ class Engine:
        self.advance()
        print(lookup_keys)
        print("grammar inside matchDict ")
        print(grammar)
        # Grammar can be none
        if grammar:
            self.compile(grammar)
@@ -116,7 +111,6 @@ class Engine:
            self.advance()
            return True
        else:
            print("%s != %s" % (current, expected))
            return False
    def open(self, el):
@@ -137,10 +131,6 @@ class Engine:
        elif isinstance(grammar, Element):
            ret = self.compile(grammar.grammar, True)
            if grammar.name == 'term':
                print(ret)
                print(self.atom())
            if (matchOnly == False and ret) or grammar.empty:
                self.open(grammar)
                # Avoid useless compilation