Awesome content! This is what Tech YouTube needs more of.
I started reading Crafting Interpreters in December 2023 and I've been working on my language off and on ever since. I've parted ways with Lox ever since I read that all numbers are floats.
Still, it's super helpful to see you approach the challenge and see how you do things differently. It's also amazing to see you do something perfectly from the beginning that took me multiple attempts to get right.
Very cool! You inspired me to try the CodeCrafters challenge myself, and just completed the whole thing... Alas, the CodeCrafters site only goes up through basic statement evaluation with global variables, and doesn't cover the rest of the book. I suppose nothing is stopping me from continuing on myself, though. Thanks for doing this! It's fun to see how different our approaches were.
Small correction: CPython (the standard Python interpreter) is indeed a bytecode interpreter, so there is a compilation step that goes from AST to a bespoke bytecode instruction set. Old versions of Ruby and Perl are better examples of tree-walking interpreters, as they run directly on the parsed AST (even though I believe modern versions also compile to bytecode for efficiency reasons).
Very small.
Super cool, thank you for making this! Can't wait to get through the video. I took a crack at Crafting Interpreters in Rust earlier this year; it was a lot of fun. Now I'm setting an early New Year's resolution to build a Hindley-Milner, LLVM-based language (in Rust, of course).
You are such an amazing teacher. Thanks so much for everything you have done.
This was really fun. Personally, I'd love it if you added more "best practices" / convenience crates like camino etc., as I feel a lot of the educational value comes from seeing how to do things well.
E.g. all the work on the Iterator that uses references to the source string is much more useful to see than just cloning the string as you go character by character.
I've been playing around with a parser style where the state machine is more explicit, e.g. with immutable state and functions LexerState -> Result, and where you combine and compose those functions with monad language. Which is really cool, but then I took uncomfortably long to write out a sexp parser, and I decided I wanted to get back to the basics for a bit. Good old ~linear parsing, no big lookahead/backtracking, but still with good type safety, healthy memory idioms and good error reporting.
Plus I haven't touched Rust in a while, so this video is the perfect excuse. Thanks for the videos as always, they rock -- hope we get a part 2 for this one!
What you are doing is called a parser combinator.
It works, and it's probably the standard way to build your own parser these days. Like most generic interfaces, it can be tough to understand what's going on at some point, and painful to maintain, especially when it comes to error messages. Bare function calls with a side-effecting parser, like Jon did, are the opposite; they're dual to each other. I have no real preference tbh; both work equally well and have advantages and disadvantages. Using a parser generator like yacc/bison is also a solution.
@@arialpew5248 I know, I've been looking into them recently. But yeah, I felt like I wanted to explore the other side too.
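The combinator style described above can be sketched in a few lines. This is a toy illustration, not anyone's production design: a "parser" here is just a function from input to an optional (value, remaining input) pair, and `literal` and `pair` are names invented for this sketch.

```rust
// Toy parser-combinator sketch (all names invented for illustration):
// a parser is a function from input to Option<(value, remaining input)>.
type Parser<'a, T> = Box<dyn Fn(&'a str) -> Option<(T, &'a str)> + 'a>;

// The most basic parser: match an exact string at the front of the input.
fn literal<'a>(expected: &'static str) -> Parser<'a, &'static str> {
    Box::new(move |input| input.strip_prefix(expected).map(|rest| (expected, rest)))
}

// A combinator: run `p1`, then `p2` on what's left. This sequential
// composition is the "monad language" mentioned above.
fn pair<'a, A: 'a, B: 'a>(p1: Parser<'a, A>, p2: Parser<'a, B>) -> Parser<'a, (A, B)> {
    Box::new(move |input| {
        let (a, rest) = p1(input)?;
        let (b, rest) = p2(rest)?;
        Some(((a, b), rest))
    })
}

fn main() {
    let kw = pair(literal("var"), literal(" x"));
    assert_eq!(kw("var x = 1"), Some((("var", " x"), " = 1")));
    assert_eq!(kw("print x"), None);
}
```

The upside is that small parsers compose into big ones without shared mutable state; the downside, as noted, is that error reporting through layers of boxed closures takes real effort.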
Hahaha wow my favorite rust explainer person did the exact thing I'm organizing in a local rust meetup, but apparently in one 8-hour livestream. Incredible, kudos! As a complete beginner to implementing interpreters before this, I will be skimming this video to see what you did to complete it that quickly.
Ah I see, you parse but do not evaluate yet.
I love the 'Implementation' videos from you Jon, keep up the good work! 💪🏾
Just saw that you were one of the magicians who wrote The Missing Semester Course too. You have put out some incredibly useful resources.
what's the missing semester course?
@@resb64 youtube keeps deleting my comment, I think perhaps because I was linking to it. It's a short course of supplementary skills that one might not develop during an ordinary CS education but which are very useful.
@@stretch8390 thanks, it is indeed a nice little course!
Please make more such projects building in Rust with industry practices.
Regarding the first Q&A part (about 20 minutes in), the way I'd explain bytecode is that it's machine code for an ISA that doesn't physically exist, but instead is implemented in software, and that implementation can then be through e.g. an interpreter or a JIT compiler.
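That "ISA implemented in software" framing can be made concrete with a tiny sketch. The two-opcode instruction set and the `run` loop below are invented for illustration, not Lox's bytecode: the enum is the "machine code" format, and the match loop is the "hardware".

```rust
// A made-up two-instruction "ISA"; the software loop below is its only
// implementation, which is exactly what makes it bytecode.
#[derive(Clone, Copy)]
enum Op {
    Push(i64), // push a constant onto the value stack
    Add,       // pop two values, push their sum
}

// An interpreter for the ISA: fetch, decode (match), execute.
fn run(program: &[Op]) -> Option<i64> {
    let mut stack = Vec::new();
    for op in program {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
        }
    }
    stack.pop()
}

fn main() {
    // 1 + 2, expressed as instructions for the imaginary machine.
    let program = [Op::Push(1), Op::Push(2), Op::Add];
    assert_eq!(run(&program), Some(3));
}
```

A JIT would take the same `[Op]` program and emit real machine code for it instead of looping; the bytecode format itself stays unchanged.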
The reason that the book includes the original string in every token is basically only for error reporting. With miette you handle all of the necessary state to track that during iteration, so it would probably be reasonable to only save anything for idents, numbers, and strings. With that you could keep everything in TokenKind.
Actually on second thought, that doesn't really work until you get to the compile-as-you go bytecode interpreter, you lose your original place with runtime errors in the tree-walk interpreter. It could be an interesting exercise in how to make it more "rusty" if you get to the bytecode section, I guess.
this is elite content
Can't help but notice "rustige muziek" was one of your suggestions at 11:48. Are you a secret Dutchie?
Pratt parsing for statements seems like huge overkill. Just match on the first item: if it's a keyword, handle that particular form; if it's not, go to (Pratt) parsing expressions. Pratt parsing's win is handling all the nesting and recursion, and you have none of that at the statement or block level. Notably, there is no binding power at the statement or block level, so you don't need to pass around r_bp everywhere.
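The dispatch being suggested is a single match on the leading token. This is a minimal sketch with invented token and statement names, not the stream's actual types:

```rust
// Hypothetical token and statement types, invented for illustration.
#[derive(Debug, PartialEq)]
enum Token {
    Print,       // the `print` keyword
    Var,         // the `var` keyword
    Number(f64), // start of a bare expression statement
}

#[derive(Debug, PartialEq)]
enum Stmt {
    Print,   // would go on to parse `print <expr> ;`
    VarDecl, // would go on to parse `var <ident> = <expr> ;`
    Expr,    // falls through to the Pratt expression parser
}

// One match on the leading token picks the statement form; no binding
// power is involved until we reach expression territory.
fn parse_stmt(tokens: &[Token]) -> Option<Stmt> {
    match tokens.first()? {
        Token::Print => Some(Stmt::Print),
        Token::Var => Some(Stmt::VarDecl),
        _ => Some(Stmt::Expr),
    }
}

fn main() {
    assert_eq!(parse_stmt(&[Token::Var]), Some(Stmt::VarDecl));
    assert_eq!(parse_stmt(&[Token::Number(1.0)]), Some(Stmt::Expr));
}
```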
Programming while reading the documentation is definitely the best thing.
Hi Jon. What are you using for code actions? It's lit. 🎉
Yeah. To switch text from upper to lower case or vice versa you can use gu or gU; optionally, you select using a motion and apply it to the selection.
At 5:21:00 he implements peek, but he only implements the peeking mechanism and not the reset after peeking. When calling next, it should clear the self.peeked value. Otherwise, if you call .peek(), then .next(), and then .peek() again, the second .peek() returns the value that .next() just returned instead of the following token.
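A minimal sketch of the fix being described, with a placeholder `Lexer` over string tokens rather than the stream's actual code. The key detail is that `next` uses `Option::take`, which returns and clears `self.peeked` in one step:

```rust
// Placeholder lexer: the token type and fields are invented for this sketch.
struct Lexer {
    tokens: std::vec::IntoIter<&'static str>,
    peeked: Option<&'static str>,
}

impl Lexer {
    fn next(&mut self) -> Option<&'static str> {
        // `take()` both returns and clears `self.peeked`, so a later
        // `peek()` sees the following token rather than a stale one.
        if let Some(tok) = self.peeked.take() {
            return Some(tok);
        }
        self.tokens.next()
    }

    fn peek(&mut self) -> Option<&'static str> {
        if self.peeked.is_none() {
            self.peeked = self.tokens.next();
        }
        self.peeked
    }
}

fn main() {
    let mut lx = Lexer { tokens: vec!["a", "b"].into_iter(), peeked: None };
    assert_eq!(lx.peek(), Some("a"));
    assert_eq!(lx.next(), Some("a"));
    // Without the `take()` in next(), this second peek would still return "a".
    assert_eq!(lx.peek(), Some("b"));
}
```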
1:39:45 - Convert text to uppercase and lowercase with U or u in visual select.
Around 5:00:00; parsing things like ‘for’ and function calls as a special type of operator reminds me of a more general idea from some functional programming languages - particularly interactive theorem provers like Agda and Coq - called *mixfix* operators. In that setting, you can have an operator like ‘if_then_else_’ which is a prefix operator with two ‘internal arguments’, with the positions of the arguments specified by the _’s. This isn’t something that can be handled by basic Pratt parsing though, and most of the implementations out there go the extra step of allowing user defined operators too, so it’s definitely overkill for this project!
A really small query: when Jon said, "... as you get characters from the input you produce tokens from the output...", at ~52:22 did he mean you produce tokens for the output? Just new to this and want to make sure I haven't missed something about how input/output interact.
Yep. Tokens could be enum variants, references to the original input, etc. Maybe he meant “from” as a pipe side. Characters are taken _from_ the input and tokens are produced _from_ the output in an iterator.
So you do not parse negative numbers now, you do that in a later stage of interpretation?
Crafting Interpreters is one of my all-time favorite books. Bob Nyström spent about 4 years (!) writing it and polished it to a level seldom seen.
In that light, it was funny to see even a superprogrammer like Jon going gung-ho and making a scanner that looks, in comparison, like a vomit-spaghetti 😂.
(I only watched the scanner part, it looked horrible enough for me).
The link to matklad's article in the description seems broken; I get a 404 on GitHub Pages. I think YouTube parses the parenthesis as part of the link. The irony!
YouTube's video descriptions are truly a mess 😅 How's it now?
@@jonhoo It works now! Thanks :)
1:00:29 I always thought you couldn't have multicursor in nvim? Is this a plugin? I've been using multiline selection with :s/find/replace in these cases.
It's a standard vim feature: type Ctrl-V -> select lines -> insert mode -> change the text -> Esc, after that the change is applied to all the selected lines.
He's not doing multicursor, he's just going into Visual Block mode. Visual Block mode lets you edit a whole column at once; it's not actual multicursor (Ctrl+D).
@@beyondcatastrophe_ Thank you both, I shall try this.
Just got to this part and wondered the same.
hi, jon, it seems like the parser isn't fully complete, will there be another stream about this?
Yeah, I think this warrants a part two!
👍
Always learning more concepts about Rust. Much appreciated from Kenya. Though, could I ask for a copy of your book 😊? I'd appreciate it a lot.
I think this kind of book could be given for free in PDF after a couple of years. Money is hard to come by.
Kenyans 😂. The web version of the book is free. Just go to the website.
I think of the syntax tree as the pure representation of the program.
I don't think that's a representation of the semantics: the interpreter is the representation of a semantics for that syntax tree.
The essential characteristic of the syntax tree is that it’s source code purified.
Python's CPython doesn't do JIT, btw, though there are alternate implementations that do (PyPy).
The next version, 3.13 (to be released soon), has an experimental JIT.
@@ilyapopov823 Though it looks like it will be disabled by default for now and currently doesn't do a whole lot of optimizations yet. But definitely opens up a lot of possible performance improvements.
Did you ever find out who was committing to your repo?
Why don't you use some darkmode extensions?
Because I like black text on white background for websites :)
Why `(echo | psub)` rather than `
Fish doesn't support the latter syntax :)
@jonhoo okay, so it's just another reason to avoid fish 😜
PS: I'm enjoying your videos! Thank you for them!
Did he actually do this in one 8-hour sitting?
Yep, can confirm!
An interesting observation to me is that he just doesn't test. Not even a standard input just to make sure nothing broke, or some unit tests. Nothing. I usually get very lost if I'm 3 hours in without at least sanity-checking that my code runs.
I wonder if that's because I have to use Python at ${job} and he can use Rust, or if it's just that he's that much better than I am. Or maybe both.
You're killing me with the light mode, Jon. Great content though.
00:00 my eyes are toasted from that light theme
Not showing a website because it’s bright and “people don’t like that” is the STUPIDEST thing I’ve heard all week. At what point did we start caring about what some small group of small-minded people prefer when they’re watching videos at 3am?
Why are you adding so much unnecessary boilerplate and being so fancy? One of the reasons I started liking Zig more than Rust is that I've seen a lot of Rust projects with crazy amounts of abstraction, unnecessary type wrappers, and indirection where I can barely follow the code.
Which part? I missed the livestream, but looking at the resulting code on GitHub, everything seems straightforward.
4 hours in, and I don't see what you were talking about... I mean, I personally disliked the ginormous TokenKind: I thought it might've been useful to break it up (e.g. have the MaybeEqual tokens as a separate subtype, which in turn is a variant of TokenKind), but that would actually be an _additional_ level of abstraction that I thought was missing. So what exactly are you talking about?
Wtf do you even mean? There's no abstraction here, just separation of concerns. Instead of just blindly complaining, give us an explanation so we can try to figure things out. Now for the part where I state my opinion: I don't think you're at the level where you can follow this kind of code, and that's why you couldn't follow it.
3:14:26 I think we need ...or_else(|| self.rest.len() - 1).
Otherwise my code crashed when the comment was on the last line.
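The crash being described is the classic edge case of a `//` comment with no trailing newline. A minimal sketch of the shape of the fix, where `rest` stands in for the lexer's remaining-input field and `skip_line_comment` is a name invented here; whether the right fallback is `len()` or `len() - 1` depends on how the surrounding code advances:

```rust
// Skip a `//` line comment: search for the next newline, and fall back
// to the end of the input when the comment is the last line of the file.
fn skip_line_comment(rest: &str) -> &str {
    // With no '\n' to find, `unwrap_or` supplies the end-of-input offset
    // instead of panicking on a None.
    let end = rest.find('\n').unwrap_or(rest.len());
    &rest[end..]
}

fn main() {
    assert_eq!(skip_line_comment("// hi\nvar x;"), "\nvar x;");
    // Comment on the last line: no newline, no panic.
    assert_eq!(skip_line_comment("// trailing"), "");
}
```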