Adds NOTES for compiler implementation

Nemo 2020-06-16 03:20:29 +05:30
parent 19f0d670ac
commit cece143368
3 changed files with 36 additions and 9 deletions


@ -81,3 +81,27 @@ Learnt quite a lot. Interesting gotchas:
1. Stack manipulation is hard. Keeping track of registers is hard. I was going by the diagrams, which always show "arguments" going from 0..n. That breaks down in the one case where a function has no arguments, because ARG then points to the same location where the return address is stored. If the VM writes the return value to ARG[0] and you have zero arguments, it also overwrites the return address, and your whole stack goes haywire (I got cool designs on my screen because of this).
2. I got into a weird state where the FibonacciElement test was passing, but SimpleFunction was failing for me. Ended up wasting a lot of time reverting back and forth to figure out the differences. If you're stuck here, check the [project page](https://www.nand2tetris.org/project08) for details on the intermediate `NestedCall.vm` test case, which comes with a [detailed survival guide](https://www.nand2tetris.org/copy-of-hdl-survival-guide) and RAM states at various points in the call history: https://www.nand2tetris.org/copy-of-nestedcall.
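The zero-argument gotcha from item 1 can be reproduced in a few lines. This is only a sketch of the call/return frame handling in Python, not the PHP translator from this repository: the list-based RAM model, the helper names (`call`, `do_return`, `run`) and the return address `1234` are all made up for illustration.

```python
SP, LCL, ARG, THIS, THAT = 0, 1, 2, 3, 4  # Hack register addresses

def push(ram, value):
    ram[ram[SP]] = value
    ram[SP] += 1

def pop(ram):
    ram[SP] -= 1
    return ram[ram[SP]]

def call(ram, return_address, n_args):
    # Save the caller's frame: return address first, then the four pointers.
    push(ram, return_address)
    for reg in (LCL, ARG, THIS, THAT):
        push(ram, ram[reg])
    ram[ARG] = ram[SP] - 5 - n_args  # with n_args == 0 this is the return-address slot
    ram[LCL] = ram[SP]

def do_return(ram, read_return_address_first):
    frame = ram[LCL]
    if read_return_address_first:
        ret = ram[frame - 5]      # safe: read before ARG[0] is written
    ram[ram[ARG]] = pop(ram)      # return value goes to ARG[0]
    if not read_return_address_first:
        ret = ram[frame - 5]      # too late with zero args: the slot was just overwritten
    ram[SP] = ram[ARG] + 1
    ram[THAT] = ram[frame - 1]
    ram[THIS] = ram[frame - 2]
    ram[ARG] = ram[frame - 3]
    ram[LCL] = ram[frame - 4]
    return ret

def run(read_first):
    ram = [0] * 300
    ram[SP] = 256
    call(ram, return_address=1234, n_args=0)
    push(ram, 42)  # the callee's return value
    return do_return(ram, read_first)

print(run(True))   # jumps back to the real return address
print(run(False))  # "returns" to the return value instead
```

Reading the return address into a temporary before writing to ARG[0] is what keeps the zero-argument case working.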
# Writing Jack (Chapter 9)
I thought of writing an Ultimate Tic-Tac-Toe game in Jack, but decided against it in the interest of time. The book specifically asks you to treat the project as a learning exercise to get a feel for Jack, and not try to become an expert Jack programmer. Writing UTT would have resulted in a lot of yak shaving, which I wanted to avoid. I instead spent the time working on [other projects](https://github.com/captn3m0/modernart).
# Compiler - Tokenizer
Writing the Tokenizer was fun. The hardest part was, surprisingly, comments. Since we're removing comments before we have a complete tokenization, parsing the following line becomes super-hard:
```java
/*Open a comment */ let s = "/** This is not a comment */"; /* But this is */ do a.b;
```
I decided to ignore such edge cases, and focus on getting the base ideas correct. I haven't corrected for either of the two issues:
- multiple multi-line comments on the same line
- multi-line comments inside of strings
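One way to handle both cases is to stop stripping comments line by line and instead walk the source once, tracking whether the current position is inside a string constant. The following is a minimal sketch in Python, not the tokenizer from this repository: `strip_comments` is a hypothetical helper, and it assumes Jack string constants contain no escaped quotes and never span lines.

```python
def strip_comments(source):
    """Remove // and /* */ comments, leaving string constants intact."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy the whole string constant verbatim, comment markers included.
            end = source.find('"', i + 1)
            end = n - 1 if end == -1 else end
            out.append(source[i:end + 1])
            i = end + 1
        elif source.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            # Skip past the closing */ (an unterminated comment swallows the rest).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep the tokens on either side separated
        else:
            out.append(ch)
            i += 1
    return "".join(out)


line = '/*Open a comment */ let s = "/** This is not a comment */"; /* But this is */ do a.b;'
print(strip_comments(line))
```

Run on the example line above, it should leave only the `let` and `do` statements, with the string constant untouched. Because it works on the whole source rather than one line at a time, there is no `insideMultiLineComment` state to carry between lines.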
I can definitely solve it, but I want to do it properly. I've also realized why I love PHP, and Python not so much:
1. The standard library is much _easier to use_. PHP is built for developer productivity first, and terseness doesn't matter. Example: creating a directory recursively in [PHP](https://www.php.net/manual/en/function.mkdir.php) vs [Python](https://stackoverflow.com/a/600612).
2. PHP's language documentation is aimed at users, while Python's throws a lot of useless stuff at you. I'm yet to find language documentation that rivals PHP's, to be fair, but Python gets so much wrong. Searching for "condition" in the Python docs gets you a page on something called Condition Objects, Conditional Expressions, and "More on conditions", none of which actually details what the conditional statements are and how they work. Look at the [control structures](https://www.php.net/manual/en/language.control-structures.php) page on the PHP website instead. The Python docs also like talking about language implementation details too much. For example, BNF notation is peppered throughout the docs. PHP, on the other hand, uses only one language in its docs: PHP.
3. Lack of examples in documentation. You're left to figure out so many things. PHP gets this correct, for every function in the standard library. If examples are missing, the comments will usually have them.
4. Static Typing


@ -1,11 +1,19 @@
# nand2tetris ![Status Badge](https://img.shields.io/badge/status-in%20progress-red)
Working my way through the [Nand to Tetris Course](https://www.nand2tetris.org/)
- Download the latest `nand2tetris.zip` from the book website, and overwrite everything in the `projects` and `tools` directories.
- Remember to run `chmod +x tools/*.sh` if you're on \*nix.
## High level implementation notes
1. Projects 1-5 as is
2. Project 6 (Assembler) done in Ruby, with a port to Rust in progress
3. Projects 7-8 (VM) done in PHP
4. Project 9 - Wrote a small two-player Tic Tac Toe game. Plan to upgrade it to Ultimate Tic Tac Toe when I get time.
5. Projects 10-11 - Writing the compiler in Python
Detailed notes documenting progress updates are in [NOTES.md](NOTES.md).
## [Project 1: Boolean Logic](https://www.nand2tetris.org/project01)


@ -83,16 +83,13 @@ class JackTokenizer:
        # If this line has a single line comment anywhere
        # strip the line to start of //
        if line.find("//") != -1:
            # print("Starting single line comment on %s" % line)
            line = line[:line.find("//")].strip()
        if self.insideMultiLineComment:
            if line.find("*/") == -1:
                # print("Still inside multi line comment, continuing %s" % line)
                # The comment doesn't end in this line
                return []
            else:
                # print("Closing multi line comment, continuing %s" % line)
                self.insideMultiLineComment = False
                # The comment ends here, huzzah! Keep only the code after the closing */
                line = line[line.find("*/") + 2:].strip()
@ -102,13 +99,10 @@ class JackTokenizer:
        elif line.find("/*") != -1:
            # The comment ends on the same line
            if line.find("*/") != -1:
                # TODO: this also breaks on /* inside strings :(
                # TODO: This also breaks on multiple multi-line comments on the same line
                line = line[:line.find("/*")] + line[line.find("*/") + 2:].strip()
                # print("This line has a /* and */ %s" % line)
                # print("This line has a /* and */ %s" % len(line))
            else:
                # print("Starting multi line comment on %s" % line)
                line = line[:line.find("/*")].strip()
                self.insideMultiLineComment = True
@ -120,6 +114,7 @@ class JackTokenizer:
        # 1. Keywords
        # 2. Symbols
        # 3. Identifiers
        # 4. Strings
        # Raw string so that regex escapes such as \( are not treated as (invalid) Python string escapes
        regex = re.compile(r"(class|constructor|function|method|field|static|var|int|char|boolean|void|true|false|null|this|let|do|if|else|while|return|\(|\)|\[|\]|,|\+|-|;|<|>|=|~|&|{|}|\*|\/|\||\.|[a-zA-Z_]+\w*|\".*\")")
        return [e.strip() for e in regex.split(line) if e is not None and e.strip() != ""]
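
The split-on-a-capturing-group trick above works because `re.split` returns the captured matches alongside the text between them, so integer constants fall out as the leftover pieces. Two things look fragile, though. The alternation tries keywords first, so an identifier such as `classy` appears to come back as `class` followed by `y`. The greedy `\".*\"` would also merge two string constants on the same line. The following is a hedged variant, not the code in this commit: `KEYWORDS`, `TOKEN_RE` and `split_tokens` are names invented for this sketch, and it assumes string constants never contain a double quote or newline.

```python
import re

KEYWORDS = (
    "class|constructor|function|method|field|static|var|int|char|boolean|"
    "void|true|false|null|this|let|do|if|else|while|return"
)

# A single capturing group overall, so re.split() returns the matched tokens
# along with whatever text sat between them (e.g. integer constants).
TOKEN_RE = re.compile(
    r'("[^"\n]*"'                   # string constants first, so they stay whole
    r"|(?:" + KEYWORDS + r")\b"     # keywords, but not as a prefix of an identifier
    r"|[a-zA-Z_]\w*"                # identifiers
    r"|[()\[\]{},;.+\-*/&|<>=~])"   # single-character symbols
)

def split_tokens(line):
    # Drop the empty and whitespace-only pieces that split() leaves behind.
    return [part.strip() for part in TOKEN_RE.split(line) if part and part.strip()]


print(split_tokens('let classy = "do it" + 42;'))
```

The `\b` after the keyword group is what keeps `classy`, `integer` or `double` from being cut at a keyword prefix. Classifying each piece (keyword, symbol, identifier, string or integer constant) is still left to a later step.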