diff --git a/content/post/2023-01-15-advent-of-code.md b/content/post/2023-01-15-advent-of-code.md
index 6cc670d..4723fb1 100644
--- a/content/post/2023-01-15-advent-of-code.md
+++ b/content/post/2023-01-15-advent-of-code.md
@@ -1,8 +1,8 @@
---
title: "Advent of Parsers"
-date: 2023-01-15T16:30:26+01:00
+date: 2023-03-18T16:30:26+01:00
hero: /content/images/2023/01/2023-01-15-banner.jpg
-excerpt: If the Advent of Code is mainly about parsing inputs, why not solve it only using parsers?
+excerpt: If the Advent of Code is mainly about parsing inputs, why not solve it only using parsers? A slightly too detailed introduction to compiler frontends.
slug: "advent-of-parsers"
tags: ["programming", "compiler"]
authors: ["felix"]
@@ -11,7 +11,7 @@ draft: false
For the past few years, I've been challenging myself to solve the [Advent of Code](https://adventofcode.com/2022/about) challenges in the most cumbersome ways: Using a different language each day or by using only C/C++[^php].
-This year, I originally didn't want to participate at all as I had a lot to do at work and little time to spare.
+Last year, I originally didn't want to participate at all as I had a lot to do at work and little time to spare.
But as I left for the Christmas holidays, I finally had some time to unwind from thinking about compilers all day.
And then it hit me: Couldn't I just [solve every task by writing a compiler](https://twitter.com/mycoliza/status/824809235632447492)?
@@ -35,7 +35,7 @@ Both are by now tried and tested tools for writing parsers and lexers, respectiv
And, true to the spirit of the challenge, they generate C or C++ code, even though the latter doesn't compile.

But how does my self-inflicted tech stack work?
-Let me give you a quick (and maybe over-simplified -- please don't roast me, dear colleagues) primer on parsing input the _proper_ way (using C and flex+bison).
+Let me give you an overly long (but still overly simplified -- please don't roast me, dear colleagues) primer on parsing input the _proper_ way (using C and flex+bison).

### A Running Example

@@ -48,7 +48,7 @@ The CPU has exactly one internal register `X` and features 2 operations:
- `addx val`: modifies the internal register by adding the value to it. This operation takes 2 cycles to complete.
- `noop`: sleeps for one CPU cycle.

-The first task is then to take your puzzle input in the form of a bit over 100 lines of instructions and parse it.
+The first task is then to take your [puzzle input](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/input.txt) in the form of a bit over 100 lines of instructions and parse it.
You are supposed to take the register contents during the 20th cycle, multiply it by the cycle count and repeat the process every 40 cycles.
All results then have to be summed to form your solution.
@@ -77,7 +77,7 @@ noop
```

Every input possibly encountered by the lexer must be described using a rule; otherwise we'll run into errors.
-First, we define a few shorthands which we can then use in our lexing rules (for small examples like this it's a bit overkill but at least it's good style):
+First, we define a few shorthands ([full source file here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/lexer.l)) which we can then use in our lexing rules (for small examples like this it's a bit overkill but at least it's good style):

```flex
NOP "noop"
@@ -127,7 +127,7 @@ Note, that `yytext` is a variable exposed by `flex` itself.
It contains the character string that matched the rule on the left-hand side.
We then assign the parsed number to `yylval`.
But where are the tokens we emit and the `yylval` variable defined?
-Well, they're both coming from the parser file in `bison` and must come from there.
+Well, they're both coming from the parser file in `bison` and must be defined there.
This is, in my opinion, one of the great hurdles when starting out with these tools: They are so deeply intertwined that learning them can result in a lot of headaches as things usually don't work as you expect them to at first.
For example: I originally tried to use C++ this time for at least some Quality of Life improvements over pure C.
@@ -138,12 +138,12 @@ The last rules in our lexer declare that we ignore any spaces (we're not in Pyth
The final rule matches on all remaining lexemes and emits an error message to inform the user of a syntax error occurring.
With that, we defined the lexer appropriately.
-The full file (containing all set-up instructions and options) can be found [here]().
+The full file (containing all set-up instructions and options) can be found [here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/lexer.l).


### Bring in the Grammar!

-If reading that section heading gave you bap flashbacks to your language lessons, don't worry.
+If reading that section heading gave you bad flashbacks to your language lessons, don't worry.
If it gave you flashbacks to your formal systems lectures, I have bad news.

The job of a parser is to assign semantics to the tokens we produced in the previous step, and these semantics are expressed using a formal _grammar_.
@@ -175,7 +175,7 @@ Now we can obviously start writing an absolute unit of an if-else chain matching
But, as you might have guessed, the problem of "parsing things" has been solved a long time ago and there are solutions that won't make you pull out your hair in agony.
So, we lay our eyes upon the holy grail that has been gifted to us by [Richard M. Stallman](https://rms.sexy) himself[^rms]: `bison`.
[GNU Bison](https://en.wikipedia.org/wiki/GNU_Bison) is a [parser generator](https://en.wikipedia.org/wiki/Compiler-compiler) that allows us to give it a grammar definition written in BNF, from which a parser is generated.
-The choice for this tool was mainly motivated by the fact that I used both `flex` and `bison` to implement my own shell[^shell].
+The choice for this tool was mainly motivated by the fact that I used both `flex` and `bison` in the past to implement my own shell[^shell].

[^actually]: For actual programming languages, these grammars are [quite complex](https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4). In our case, however, the individual grammars for each day's tasks are rather simple.
[^rms]: Actually, `bison` was written by Robert Corbett. Richard Stallman just made it compatible with another parser generator named `yacc`.
@@ -298,7 +298,7 @@ instruction
As you can see, the rules look similar to the ones we defined before.
And that's it already: there is our grammar to parse the whole task and solve part one automagically while processing the input.

-We of course need some more setup code which you'll find in the complete `parser.y` file [here]().
+We of course need some more setup code, which you'll find in the complete `parser.y` file [here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/parser.y).
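
To make the idea of "solving the task while parsing" a bit more concrete, here is a minimal sketch of what such a grammar with semantic actions could look like for the `addx`/`noop` language. This is purely illustrative: the token names (`ADDX`, `NOP`, `NUMBER`, `EOL`), the global state and the `tick` helper are assumptions made for this sketch and not necessarily what the linked `parser.y` uses.

```bison
%{
#include <stdio.h>

int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "parse error: %s\n", msg); }

/* illustrative global state: current cycle, register X, accumulated signal strength */
static int cycle = 1;
static int x = 1;
static long total = 0;

/* advance the clock by one cycle and sample the signal strength
   during cycles 20, 60, 100, ... as the task demands */
static void tick(void) {
    if (cycle >= 20 && (cycle - 20) % 40 == 0)
        total += (long)cycle * x;
    cycle++;
}
%}

%union { int num; }
%token ADDX NOP EOL
%token <num> NUMBER

%%

program:
    /* empty */
  | program instruction EOL
  ;

instruction:
    NOP           { tick(); }                   /* one cycle passes, X stays untouched */
  | ADDX NUMBER   { tick(); tick(); x += $2; }  /* two cycles pass, then X is updated  */
  ;

%%
```

The sketch keeps its state in globals for brevity; printing `total` once the parse has finished would directly yield the answer for part one. The actual `parser.y` linked above differs in the details, for instance `yyparse` is invoked with arguments there, as the next paragraph shows.
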
The last part we have to take care of is invoking the parser.
But now, that's as easy as pie.
We just write a quick main function at the end of our `bison` file that initializes our state and the lexer and then runs the parser by invoking `yyparse` with the necessary arguments[^args]:
@@ -369,21 +369,14 @@ The result is a derivation of our grammars' rules, a so-called _parse tree_ (ple
By doing this seemingly stupid challenge, I realized that what I was doing was actually not too different from what you'd normally do in the Advent of Code.
The parser framework just gives you the right set of tools to reason about the input (in most cases; for some tasks it felt very forced).
+So in a sense, using the parser infrastructure produces much cleaner code.
+Also, writing the grammar rules was rather easy, as each day's task is usually just a textual description of the grammar.
+Not everything can be solved during the parse, though.
+A notable example of this is the challenge of [day 8](https://adventofcode.com/2022/day/8), which requires you to compute some properties on a square full of trees.
+Here, most of the legwork has to be done after the parse, as reasoning over the complete data structure is necessary.
-- It's a lot cleaner (you got a grammar and all)
+Overall, it was a very fun experience to do this challenge (once I discarded the idea of doing it in C++, of course).
+It refreshed a lot of what I learned in the Compiler Construction lecture and was a nice break from solving the puzzles the conventional way.
+I'm looking forward to the next (i.e., this) year!
-What I learned:
-- C++ parsers are a major PITA
-- especially if not even the official example compiles
-- (almost) everything can be parsed
- - sometimes it takes a bit of extra code
- - natural limitations: day 3
-- Left-associative parsers: take care (LR(1))
-- input is mostly a representation of data -> something parsers are for
-
-
-## Structure
-- explain parser/lexer structure briefly
-- walk through 1-2 examples?
- - assembly interpreter
- - shell output parser
diff --git a/static/content/images/2023/01/parse-tree.png b/static/content/images/2023/01/parse-tree.png
index 7bff3d1..13120f9 100644
Binary files a/static/content/images/2023/01/parse-tree.png and b/static/content/images/2023/01/parse-tree.png differ