I'm so glad that you are still uploading! So many of these series online are only like one or two videos before the creator stops uploading. Excited to see where this goes!
i know right, or they upload 1 video and then don't upload for 6 months or a year, and then randomly upload a new video
OMG! The Dark Side! You are the guy who uploaded the full first part of Notch coding Minicraft! Didn't really expect to see you here :)
@@RenderDragon Oh no, this is what being a celebrity feels like lol 😅. As I'm sure you could guess, I end up getting recommended loads of programming related stuff, so you'll probably see me around in plenty of these sorts of comment sections if you look hard enough.
In 2077: Hydrogen 71.2 update. Creating OS in Hydrogen
you spoke too soon... :(
“Premature optimization is the root of all evil”
“We need an arena allocator”
😂
Still peak content! Loving the series
So, about approaching operator precedence in your compiler. In our "programming languages and compilers" class we were shown how you should do it, but it was never explained why. Luckily I had enough experience with parsers after that to understand it myself.
Instead of writing prec = 0 or something while describing your grammar, use this way of describing it:
BinExpr -> ExprPrec1 + ExprPrec1 | ExprPrec1 | // add subtractions here
ExprPrec1 -> ExprPrec2 * ExprPrec2 | ExprPrec2 | // add division here
ExprPrec2 -> Expr | // add any higher precedence operators here
It's not self-explanatory why you would do that and add so many additional (and kind of ugly when you have like 7+ of them) non-terminals, but just drawing an AST for this would probably be enough to get the idea.
In this way, higher precedence means the operator ends up lower in the resulting AST. That means that walking the tree from the leaves to the root would result in correct evaluation without any smart precedence resolution algorithms
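If it helps, here is a rough sketch of how those layered rules can turn into recursive-descent functions in C++. All the names and the Node type below are made up for illustration (this is not the code from the video), and I turned the binary productions into loops so chains like 1+2+3 come out left-associative:

#include <cstddef>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical AST node: op == 0 marks a leaf holding an integer literal.
struct Node {
    char op = 0;
    int value = 0;
    std::unique_ptr<Node> lhs, rhs;
};

// ExprPrec2 -> int_lit (single digits only, to keep the sketch short)
std::unique_ptr<Node> parsePrec2(const std::string& s, std::size_t& i) {
    auto n = std::make_unique<Node>();
    n->value = s[i++] - '0';
    return n;
}

// ExprPrec1 -> ExprPrec2 ('*' ExprPrec2)*
std::unique_ptr<Node> parsePrec1(const std::string& s, std::size_t& i) {
    auto lhs = parsePrec2(s, i);
    while (i < s.size() && s[i] == '*') {
        ++i;
        auto n = std::make_unique<Node>();
        n->op = '*';
        n->lhs = std::move(lhs);
        n->rhs = parsePrec2(s, i);
        lhs = std::move(n);
    }
    return lhs;
}

// BinExpr -> ExprPrec1 ('+' ExprPrec1)*
std::unique_ptr<Node> parseBinExpr(const std::string& s, std::size_t& i) {
    auto lhs = parsePrec1(s, i);
    while (i < s.size() && s[i] == '+') {
        ++i;
        auto n = std::make_unique<Node>();
        n->op = '+';
        n->lhs = std::move(lhs);
        n->rhs = parsePrec1(s, i);
        lhs = std::move(n);
    }
    return lhs;
}

int eval(const Node* n) {
    if (n->op == 0) return n->value;
    int l = eval(n->lhs.get()), r = eval(n->rhs.get());
    return n->op == '+' ? l + r : l * r;
}

int main() {
    std::string src = "1+2*3";
    std::size_t i = 0;
    auto tree = parseBinExpr(src, i);
    std::cout << eval(tree.get()) << "\n"; // prints 7, because '*' ended up lower in the tree
}

Each precedence level only ever calls the next-higher one, which is exactly why the lower-precedence operators end up closer to the root.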
Glad to know Pratt parsers trivialize expression parsing :D
@@SimGunther never heard the term before actually)
@@MrVladoCC Chidi Williams and Matklad have great writings on the topic that make me feel warm and fuzzy inside 😊
i commented this on part 2 as well but you can put the commands on one line if you separate them with a semicolon like this: ./out; echo $?
The reason ./out && echo $? didn't work is that the right hand side of a && in bash only executes if the left hand side has exit code 0.
Thanks!
it’s so exciting to see the whole process, i really appreciate the fact that there are no cuts and that we get to see every stage, from confusion to accomplishment. i love this format, looking forward to seeing the next episodes
I'm enjoying your content,
I'm enjoying the "thinking out loud" as you problem solve
I'm enjoying just how good a programmer you are
quick thoughts (on the parsing):
1. parse the whole text through linear passes first (not recursively)
2. use indexes to sections in the string (no need for pointer magic)
3. collect the keywords and variables (then you can easily check if an identifier has already been declared or not) in a vector of indexes (e.g. the type can be {index int, length int}; see the sketch below)
4. after collecting all identifiers and expressions, apply simple rules to check if the syntax rules are obeyed
(e.g. a math expression has an lhs and rhs, expression precedence, identifiers preceded by let, type system obeyed, etc.)
i think biggest benefits:
1. easy to reason algorithmically
2. no need for allocator magic (which looks really cool btw)
3. parsing becomes a linguistics problem not a data structure problem
4. parse through function calls that check if expected expression is found at expected location
because I'm seeing your project already has so many different types (hence needing allocator magic when they recurse),
when there's no need to split them out from the file/string (which already has all that information encoded)
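To make points 2 and 3 concrete, here's a minimal sketch of what I mean by the index/length spans (the struct and function names are just placeholders I made up):

#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A "token" is just a view into the source string: where it starts and how long it is.
struct Span {
    std::size_t index;
    std::size_t length;
};

// One linear pass that records identifier/keyword spans without copying anything.
std::vector<Span> collectWords(const std::string& src) {
    std::vector<Span> words;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isalpha(static_cast<unsigned char>(src[i]))) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            words.push_back({start, i - start});
        } else {
            ++i;
        }
    }
    return words;
}

// Later passes just compare spans against the source, e.g. to check for the "let" keyword:
bool isLet(const std::string& src, Span s) {
    return src.compare(s.index, s.length, "let") == 0;
}

No allocations besides the one vector, and a declared-or-not check is just a linear scan over the earlier spans.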
I'm not a pro, learning by watching,
just some thoughts
wouldn't this make parsing (as well as further steps such as semantic analysis) horrendously complex? Types are used to abstract concepts, such as the production rules of the formal grammar in this project.
Also, I'm not sure I understand your point fully. Do you mean there would be no tokenization, only a parser that does multiple passes?
This is a great series. I'm 10+ years into programming, and I wanted to learn about interpreters and compilers by just building them since it's hard to retain that much info by just reading. I followed another series where the dude built an interpreter in python, and now I have the bones of my language working well. But his videos only implemented an interpreter, so I'm really glad you're getting into compilation, since it's apparently just interpretation with one extra step lol.
I'm gonna try to use your videos to make my language compile to LLVM bitcode!
Instead of parsing the math this way, you should take a look at something called the 'shunting yard algorithm'.
I've used it in my own compiler, and it saved me a bunch of headaches.
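If it helps anyone, here's a bare-bones sketch of the idea (only + and * with single-digit numbers, no parentheses, and not code from the video or from my compiler):

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Convert infix to postfix (RPN): operators wait on a stack until an operator of
// lower or equal precedence forces them into the output.
std::string toPostfix(const std::string& infix) {
    auto prec = [](char op) { return op == '*' ? 2 : 1; };
    std::string output;
    std::vector<char> ops;
    for (char c : infix) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            output += c;
        } else if (c == '+' || c == '*') {
            while (!ops.empty() && prec(ops.back()) >= prec(c)) {
                output += ops.back();
                ops.pop_back();
            }
            ops.push_back(c);
        }
    }
    while (!ops.empty()) { output += ops.back(); ops.pop_back(); }
    return output;
}

int main() {
    std::cout << toPostfix("1+2*3") << "\n"; // prints 123*+, which evaluates to 7
}

Parentheses and right-associative operators need a couple of extra rules, but once you have postfix, evaluation (or codegen) is just a stack walk.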
Great videos btw, can't wait to see the result!
23:17
Just a quick note:
I usually use char pointers (char*) for pointers that I do arithmetic with, since it's exactly one byte long; std::byte is just a fancy feature that makes you type more :)
Also, it's safer to do a reinterpret_cast for pointers, as it explicitly says you are reinterpreting the pointer instead of casting the value of the variable, which is a pointer anyway.
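To show the two spellings side by side (toy functions I made up, not your allocator code):

#include <cstddef>

// Both functions add a byte offset to a pointer; only the spelling differs.
void* offsetWithChar(void* base, std::size_t bytes) {
    char* p = reinterpret_cast<char*>(base); // char is exactly one byte
    return p + bytes;
}

void* offsetWithByte(void* base, std::size_t bytes) {
    std::byte* p = reinterpret_cast<std::byte*>(base); // same arithmetic, just more typing
    return p + bytes;
}

Either way, the reinterpret_cast makes it obvious at the call site that you're reinterpreting the pointer rather than converting a value.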
I'm typically a lurker on many vids and never comment... But just wanted to say how much I'm enjoying the content! Thanks and keep it up!
Really great series! I can't wait to have all the steps clear before attempting my own compiler in Rust!
This is amazing, please don't stop this series!
Tip for when you're parsing expressions. TL;DR: if you perform parsing in order of operator precedence, you get order of precedence for free.
For instance, when parsing a term, first look to see if it's an integer literal, then look to see if it's a "MulOp" (that is, multiplication or division), then check to see if it's an "AddOp" (addition or subtraction), and generate your AST nodes as you discover each. In this case, expression nodes with left/right always come out in the right order. For a far better explanation than I can give in a YouTube comment, see Jack Crenshaw's "Let's Build a Compiler" series, the parts on expression parsing.
curious to see how you're gonna handle functions and variable scopes
Loving this series, can't wait to see what comes next!
I love this series. Keep it up!
Hey, awesome work there buddy! Just so you know, in formal grammars you can enforce precedence in operations like this:
[Prog] -> [Stmt]*
[Stmt] -> exit([Expr]); | let ident = [Expr]
[Expr] -> [Term] | [BinSumExpr]
[BinSumExpr] -> [Expr] + [Expr] | [BinMultExpr]
[BinMultExpr] -> [Expr] * [Expr]
[Term] -> int_lit | ident
This way, you're defining the grammar rules to reflect the desired operator precedence without the need for additional variables ("prec"). It will be especially helpful when you deal with more complex operations like subtraction, division, parentheses, or exponentiation... just follow the same rule!
I'm new to language design and I always skip material covering formal grammars, but this series and this comment are very helpful to me. Thank you
@@gnarusg8708 you're welcome! that's actually very intuitive once you figure out all the rules ;)
wait is it like this:
[BinSumExpr] -> ([Expr] + [Expr]) | [BinMultExpr]
or like this:
[BinSumExpr] -> [Expr] + ([Expr] | [BinMultExpr])
Looking forward to the rest of the journey
Great series so far! Just wanted to warn you that your ArenaAllocator might be unsafe since you're not checking whether the buffer has enough space left in it to allocate the requested type and could go out of bounds. I haven't used C++ myself in a decade so I'm not 100% sure that it'll cause an issue, but it made me nervous lol
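Roughly what I mean is a check like this; the member names are made up since I don't know what the real ArenaAllocator looks like inside:

#include <cstddef>
#include <cstdlib>
#include <new>

// Minimal arena with a bounds check in alloc(); this is a sketch, not the video's class.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : m_size(bytes), m_buffer(static_cast<char*>(std::malloc(bytes))), m_offset(0) {}
    ~Arena() { std::free(m_buffer); }

    template <typename T>
    T* alloc() {
        if (m_offset + sizeof(T) > m_size) {
            return nullptr;            // or assert / grow the buffer instead
        }
        void* ptr = m_buffer + m_offset;
        m_offset += sizeof(T);
        return new (ptr) T();          // placement-new, still ignoring alignment
    }

private:
    std::size_t m_size;
    char* m_buffer;
    std::size_t m_offset;
};

Whether alloc() returns nullptr, asserts, or grows the buffer is a design choice; the point is just that it knows when the arena is full.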
Looking forward to part 5!
17:27 he mentioned it here
-- Mom, can we get Tsoding, pleeeeease?
-- No, we already have Tsoding at home.
Tsoding at home:
Adding adding to my language
Loving this series
Cannot wait for the next part. The series is so great!
Great vid, just one nitpick: "mb" would be millibits
Thanks for content! I really love to watch these videos
This series is really cool
This series is great! Please keep it up!
the door is open!
36:18 Sounds like he started to sing “All Star”
I genuinely get excited to watch these videos from this series, thank you for making em!
Also first comment 😎 GET REKT
Another excellent episode :D
This is amazing stuff!
underrated!
This is awesome!
This series is amazing !!!
Awesome - keep up the good work mate ❤🫵
If you did everything right, then this would work also, right?
let x = 1 + 2 + 3;
let y = x + 5;
exit(y);
BTW loving this series. My C++ is very rusty and not up to current standards and I'm learning some tricks. Great stuff!
It does work, I tested it and it exited with 11 :)
I love these videos ❤❤
Can’t wait till you subtract subtraction from the language 👍
keep going!
Very good videos! What about implementing reassignment before continuing with other mathematical operations?
30:16 Why was that so funny?
Nice Thumbnail
Instead of returning an optional of a pointer, why not just return the pointer directly and replace empty optionals with return nullptr?
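i.e. something like this, where NodeExpr is just a placeholder for the real node type:

#include <optional>

struct NodeExpr { int value; }; // placeholder node type

// The two signatures being compared; both encode "no expression here" without exceptions.
std::optional<NodeExpr*> parse_expr_opt() { return std::nullopt; } // stub
NodeExpr* parse_expr_ptr() { return nullptr; }                     // stub

void caller() {
    if (auto e = parse_expr_opt()) { (void)*e; }     // *e is the NodeExpr*
    if (NodeExpr* e = parse_expr_ptr()) { (void)e; } // use e directly
}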
30:10 I am following along, could you by any chance tell me what you changed?
Great video! Keep it up!
Why do you use CLion for your coding but VSCode for looking at MD files?
Great work 👍 Can you make content on the Solidity compiler?
Is it a problem that your arena allocator ignores alignment for the types that are allocated?
You are right that it ignores alignment but considering our AST nodes only use simple types, this is unlikely to be an issue. If I was making a library for the allocator, I would be more thorough. If the issue arises, it will be addressed.
I totally understand why constantly checking whether current pointer + potential size of the new node < arena size limit is annoying, but then again, so is iterating this kind of array on a complex set of nodes allocated in this linear fashion without resorting to post-order traversal.
@SimGunther That's not what we are talking about. We're talking about alignment, i.e. that every type has a requirement that its address has to be a multiple of some number. For basic types like int32_t it's their size, in this case 4 bytes. So the two lowest bits of the address have to be zero such that it is a multiple of 4.
@pixeled-yt In rust we have the following: "Raw pointers can be unaligned or null. However, when a raw pointer is dereferenced (using the * operator), it must be non-null and aligned."
I couldn't find out if the same holds for cpp. It seems to be true for C though.
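Either way, if alignment ever does become a problem, rounding the offset up before each allocation is a small fix. A sketch (alignUp is just a helper name I made up; the standard library's std::align does the same bookkeeping):

#include <cstddef>

// Round an offset up to the next multiple of `alignment` (which must be a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Inside a hypothetical arena alloc<T>(), the only change would be doing
//     m_offset = alignUp(m_offset, alignof(T));
// before handing out m_buffer + m_offset and bumping the offset by sizeof(T).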
when discord?
love the vids!!
As others have mentioned, a really nice series of videos. Are there any plans to make the language Turing-complete so that its own compiler can be written in it? xd :D
I think he mentioned that in part 1?
Just for clarification: Turing-completeness is the property of (1) a programming language, meaning it can express all Turing-computable functions, or (2) a machine, meaning it is an implementation of the Turing machine. The concept you meant is called self-hosting.
Just Keep On Bro You Can Do It :D
Keep it up!
Why not just make BinaryExpression a pointer? It would greatly simplify things.
Would love to see if it handles
let x = 3
let y = 5 + x
I’m sure it will work but kinda curious lol
Just tried it, it works correctly for exiting with both x and y :)
@@pixeled-yt nice good job! Btw amazing series keep it up
Ah, yes. The PEMDAS
Adding adding XDDD
That "4mb" should actually be "4mib"
Dangit, I just came here to see how you allocated registers for binary expressions and you just used the stack instead 😢
me too lol
VS Community + JetBrains ReSharper is better than CLion.
Slow down, uploading these vids too fast lol, give me time to finish the last one
I don't wanna hate on you; everyone starts somewhere and I was there too. Your C++ code is bad, though. I think using a language you are actually familiar with would've been better for this series, because a big part is you figuring out how C++ works, and that's not very entertaining, for me at least.
But it is entertaining to me. 😊
By the way, you can do: ./out; echo $? 🙂