mirror of
https://github.com/Feliix42/dummyco.de.git
synced 2024-11-23 10:36:30 +00:00
finalize AoC post
parent
71c2dd6a62
commit
729f06b254
2 changed files with 19 additions and 26 deletions
|
@ -1,8 +1,8 @@
|
||||||
---
|
---
|
||||||
title: "Advent of Parsers"
|
title: "Advent of Parsers"
|
||||||
date: 2023-01-15T16:30:26+01:00
|
date: 2023-03-18T16:30:26+01:00
|
||||||
hero: /content/images/2023/01/2023-01-15-banner.jpg
|
hero: /content/images/2023/01/2023-01-15-banner.jpg
|
||||||
excerpt: If the Advent of Code is mainly about parsing inputs, why not solve it only using parsers?
|
excerpt: If the Advent of Code is mainly about parsing inputs, why not solve it only using parsers? A slightly too detailed introduction to compiler frontends.
|
||||||
slug: "advent-of-parsers"
|
slug: "advent-of-parsers"
|
||||||
tags: ["programming", "compiler"]
|
tags: ["programming", "compiler"]
|
||||||
authors: ["felix"]
|
authors: ["felix"]
|
||||||
|
@ -11,7 +11,7 @@ draft: false
|
||||||
|
|
||||||
For the past years, I've been challenging myself to solve the [Advent of Code](https://adventofcode.com/2022/about) challenges in the most cumbersome ways:
|
For the past years, I've been challenging myself to solve the [Advent of Code](https://adventofcode.com/2022/about) challenges in the most cumbersome ways:
|
||||||
Using a different language each day or by using only C/C++[^php].
|
Using a different language each day or by using only C/C++[^php].
|
||||||
This year, I originally didn't want to participate at all as I had a lot to do at work and little time to spare.
|
Last year, I originally didn't want to participate at all as I had a lot to do at work and little time to spare.
|
||||||
But as I left for the christmas holidays, I finally had some time to unwind from thinking about compilers all day.
|
But as I left for the christmas holidays, I finally had some time to unwind from thinking about compilers all day.
|
||||||
And then it hit me: Couldn't I just [solve every task by writing a compiler](https://twitter.com/mycoliza/status/824809235632447492)?
|
And then it hit me: Couldn't I just [solve every task by writing a compiler](https://twitter.com/mycoliza/status/824809235632447492)?
|
||||||
|
|
||||||
|
@ -35,7 +35,7 @@ Both are by now tried and tested tools for writing parsers and lexers, respectiv
|
||||||
And, true to the spirit of the challenge, they generate C or C++ code, even though the latter doesn't compile.
|
And, true to the spirit of the challenge, they generate C or C++ code, even though the latter doesn't compile.
|
||||||
|
|
||||||
But how does my self-inflicted tech stack work?
|
But how does my self-inflicted tech stack work?
|
||||||
Let me give you a quick (and maybe over-simplified -- please don't roast me, dear colleagues) primer on parsing input the _proper_ way (using C and flex+bison).
|
Let me give you an overly long (but still overly simplified -- please don't roast me, dear colleagues) primer on parsing input the _proper_ way (using C and flex+bison).
|
||||||
|
|
||||||
|
|
||||||
### A Running Example
|
### A Running Example
|
||||||
|
@ -48,7 +48,7 @@ The CPU has exactly one internal register `X` and features 2 operations:
|
||||||
- `addx val`: modifies the internal register by adding the value to it. This operation takes 2 cycles to complete.
|
- `addx val`: modifies the internal register by adding the value to it. This operation takes 2 cycles to complete.
|
||||||
- `noop`: sleeps for one CPU cycle.
|
- `noop`: sleeps for one CPU cycle.
|
||||||
|
|
||||||
The first task is then to take your puzzle input in the form of a bit over 100 lines of instructions and parse it.
|
The first task is then to take your [puzzle input](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/input.txt) in the form of a bit over 100 lines of instructions and parse it.
|
||||||
You are supposed to take the register contents during the 20th cycle, multiply it with the cycle count and repeat the process every 40 cycles.
|
You are supposed to take the register contents during the 20th cycle, multiply it with the cycle count and repeat the process every 40 cycles.
|
||||||
All results then have to be summed to form your solution.
|
All results then have to be summed to form your solution.
|
||||||
|
|
||||||
|
@ -77,7 +77,7 @@ noop
|
||||||
```
|
```
|
||||||
|
|
||||||
Every input possibly encountered by the lexer must be described using a rule, otherwise we'll run into errors.
|
Every input possibly encountered by the lexer must be described using a rule, otherwise we'll run into errors.
|
||||||
First, we define a few shorthands which we can then use in our lexing rules (for small examples like this it's a bit overkill but at least it's good style):
|
First, we define a few shorthands ([full source file here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/lexer.l)) which we can then use in our lexing rules (for small examples like this it's a bit overkill but at least it's good style):
|
||||||
|
|
||||||
```flex
|
```flex
|
||||||
NOP "noop"
|
NOP "noop"
|
||||||
|
@ -127,7 +127,7 @@ Note, that `yytext` is a variable exposed by `flex` itself.
|
||||||
It contains the character string that matched the rule on the left-hand side.
|
It contains the character string that matched the rule on the left-hand side.
|
||||||
We then assign the parsed number to `yylval`.
|
We then assign the parsed number to `yylval`.
|
||||||
But where are the tokens we emit and the `yylval` variable defined?
|
But where are the tokens we emit and the `yylval` variable defined?
|
||||||
Well, they're both coming from the parser file in `bison` and must come from there.
|
Well, they're both coming from the parser file in `bison` and must be defined there.
|
||||||
|
|
||||||
This is, in my opinion, one of the great hurdles when starting out with these tools: They are so deeply intertwined that learning them can result in a lot of headaches as things usually don't work as you expect them to at first.
|
This is, in my opinion, one of the great hurdles when starting out with these tools: They are so deeply intertwined that learning them can result in a lot of headaches as things usually don't work as you expect them to at first.
|
||||||
For example: I originally tried to use C++ this time for at least some Quality of Life improvements over pure C.
|
For example: I originally tried to use C++ this time for at least some Quality of Life improvements over pure C.
|
||||||
|
@ -138,12 +138,12 @@ The last rules in our lexer declare that we ignore any spaces (we're not in Pyth
|
||||||
The final rule matches on all remaining lexemes and emits an error message to inform the user of a syntax error occuring.
|
The final rule matches on all remaining lexemes and emits an error message to inform the user of a syntax error occuring.
|
||||||
|
|
||||||
With that, we defined the lexer appropriately.
|
With that, we defined the lexer appropriately.
|
||||||
The full file (containing all set-up instructions and options) can be found [here](). <!-- TODO -->
|
The full file (containing all set-up instructions and options) can be found [here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/lexer.l).
|
||||||
|
|
||||||
|
|
||||||
### Bring in the Grammar!
|
### Bring in the Grammar!
|
||||||
|
|
||||||
If reading that section heading gave you bap flashbacks to your language lessons, don't worry.
|
If reading that section heading gave you bad flashbacks to your language lessons, don't worry.
|
||||||
If it gave you flashbacks to your formal systems lectures, I have bad news.
|
If it gave you flashbacks to your formal systems lectures, I have bad news.
|
||||||
Job of a parser is to assign semantics to the tokens we produced in the previous step, and these semantics are expressed using a formal _grammar_.
|
Job of a parser is to assign semantics to the tokens we produced in the previous step, and these semantics are expressed using a formal _grammar_.
|
||||||
|
|
||||||
|
@ -175,7 +175,7 @@ Now we can obviously start writing an absolute unit of an if-else chain matching
|
||||||
But, as you might have guessed, the problem of "parsing things" has been solved a long time ago and there exist solutions that don't want to make you pull out your hair in agony.
|
But, as you might have guessed, the problem of "parsing things" has been solved a long time ago and there exist solutions that don't want to make you pull out your hair in agony.
|
||||||
So, we lay our eyes upon the holy grail that has been gifted to us by [Richard M. Stallman](https://rms.sexy) himself[^rms]: `bison`.
|
So, we lay our eyes upon the holy grail that has been gifted to us by [Richard M. Stallman](https://rms.sexy) himself[^rms]: `bison`.
|
||||||
[GNU Bison](https://en.wikipedia.org/wiki/GNU_Bison) is a [parser generator](https://en.wikipedia.org/wiki/Compiler-compiler) that allows us to give it a grammar definition written in BNF, from which a parser is generated.
|
[GNU Bison](https://en.wikipedia.org/wiki/GNU_Bison) is a [parser generator](https://en.wikipedia.org/wiki/Compiler-compiler) that allows us to give it a grammar definition written in BNF, from which a parser is generated.
|
||||||
The choice for this tool was mainly motivated by the fact that I used both `flex` and `bison` to implement my own shell[^shell].
|
The choice for this tool was mainly motivated by the fact that I used both `flex` and `bison` in the past to implement my own shell[^shell].
|
||||||
|
|
||||||
[^actually]: For actual programming languages, these grammars are [quite complex](https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4). In our case however, the individual grammars for each day's tasks are rather simple.
|
[^actually]: For actual programming languages, these grammars are [quite complex](https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4). In our case however, the individual grammars for each day's tasks are rather simple.
|
||||||
[^rms]: Actually, `bison` was written by Robert Corbett. Richard Stallman just made it compatible to another parser generator named `yacc`.
|
[^rms]: Actually, `bison` was written by Robert Corbett. Richard Stallman just made it compatible to another parser generator named `yacc`.
|
||||||
|
@ -298,7 +298,7 @@ instruction
|
||||||
|
|
||||||
As you can see, the rules look similar to the ones we defined before.
|
As you can see, the rules look similar to the ones we defined before.
|
||||||
And that's it already, there is our grammar to parse the whole task and solve part one automagically while processing the input.
|
And that's it already, there is our grammar to parse the whole task and solve part one automagically while processing the input.
|
||||||
We of course need some more setup code which you'll find in the complete `parser.y` file [here](). <!-- TODO -->
|
We of course need some more setup code which you'll find in the complete `parser.y` file [here](https://code.dummyco.de/feliix42/aoc-2022/src/commit/841b9ec20bdf828f721c12675564e571cc51d9ad/day_10/parser.y).
|
||||||
The last part we have to take care of is invoking the parser.
|
The last part we have to take care of is invoking the parser.
|
||||||
But now, that's as easy as pie.
|
But now, that's as easy as pie.
|
||||||
We just write a quick main function at the end of our `bison` file that initializes our state and the lexer and runs the parser to end by invoking `yyparse` with the necessary arguments[^args]:
|
We just write a quick main function at the end of our `bison` file that initializes our state and the lexer and runs the parser to end by invoking `yyparse` with the necessary arguments[^args]:
|
||||||
|
@ -369,21 +369,14 @@ The result is a derivation of our grammars' rules, a so-called _parse tree_ (ple
|
||||||
|
|
||||||
By doing this seemingly stupid challenge, I realized that what I was doing was actually not too different from what you'd normally do in the advent of code.
|
By doing this seemingly stupid challenge, I realized that what I was doing was actually not too different from what you'd normally do in the advent of code.
|
||||||
The parser framework just gives you the right set of tools to reason about the input (in the most cases, for some tasks it felt very forced).
|
The parser framework just gives you the right set of tools to reason about the input (in the most cases, for some tasks it felt very forced).
|
||||||
|
So in a sense, using the parser infrastructure produces code that is a lot cleaner.
|
||||||
|
Also, writing the grammar rules was rather easy, as the tasks for each day are usually just a textual representation of the grammar.
|
||||||
|
|
||||||
|
Not all things can be solved during the parse, though.
|
||||||
|
A notable example for this is the challenge of [day 8](https://adventofcode.com/2022/day/8) which requires you to compute some properties on a square full of trees.
|
||||||
|
Here, most of the leg work has to be done after the parse as reasoning over the complete data structure is necessary.
|
||||||
|
|
||||||
- It's a lot cleaner (you got a grammar and all)
|
Overall, it was a very fun experience to do this challenge (once I discarded the idea of doing it in C++, of course).
|
||||||
|
It refreshed a lot of things I learned in the Compiler Construction lecture and was unconventional.
|
||||||
|
I'm looking forward to the next (i.e., this) year!
|
||||||
|
|
||||||
What I learned:
|
|
||||||
- C++ parsers are a major PITA -- especially if not even the official example compiles
|
|
||||||
- (almost) everything can be parsed
|
|
||||||
- sometimes it takes a bit of extra code
|
|
||||||
- natural limitations: day 3
|
|
||||||
- Left-associative parsers: take care (LR(1))
|
|
||||||
- input is mostly a representation of data -> something parsers are for
|
|
||||||
|
|
||||||
|
|
||||||
## Structure
|
|
||||||
- explain parser/lexer structure briefly
|
|
||||||
- walk through 1-2 examples?
|
|
||||||
- assembly interpreter
|
|
||||||
- shell output parser
|
|
||||||
|
|
Binary file not shown.
Before Width: | Height: | Size: 33 KiB After Width: | Height: | Size: 32 KiB |