Types 20+ lines of Haskell program in a single breath, then gets 0 warnings or errors from GHC, then goes on as if it is totally normal...
Profile picture checks out.
Because it is normal. He's done this kind of code many times before.
And no StackOverflow screen up!
The Graham Haskell Compiler
That's totally normal
So glad my university teaches Haskell. Even if you dont ever use it in your work, learning to program in that language makes you a better programmer and writing a compiler is usually a thing that Haskell will be used for.
Yup, compilers are one of the more functionally pure things you can program. They are VERY complex, but they just are fancy "String input, string output" machines when you look at the big picture
Now go learn apl
The key (which IMHO was not so well explained in the video) is that you have a function (parser combinator) that takes other functions (parsers) as input, and produces a function (more complex parser) as a result.
This is rather similar to "normal" recursive-descent parsing, except you don't do the combining yourself, and instead let the parser combinator do it for you.
That's one of the key takeaways of Functional programming, "Higher-Order Functions", or functions that use other functions as parameters.
A short and sweet paper is the Functional Pearl about Sixth-Order Functions. Why would you need a Sixth-Order Function?
When you are using parser combinators!
Doesn't sound like something I would need a video for. It's clear from the name "parser combinator". It's clearly something that combines parsers into a parser, and then it's obvious those are then combined by further combinators and that's how you can build the whole root parser.
- This video shows how to do that with monadic combinators, and the (<|>), which I guess is just a concat function for the lists.
The <|> operator is most likely a custom operator defined in the package.
@@cparks1000000 It's an alternative operation: if the first parser succeeds, it returns the result of the first; otherwise the result of the second; or nothing if both fail.
say it takes a string as an input and returns a tree as an output one...more...time
... and then the example parser doesn't even return a tree, but just an integer.
@@PauxloE Agreed - I guess the video couldn't get much longer though, but I'd have loved to see his parser return an AST and then implement the logic to interpret it!
@@PauxloE It evaluates the tree as it goes along, and it is walking the tree even though we're not seeing it. To get a tree, you'd modify the grammar so that the do block returns a tree, rather than the result of evaluating the math expression. The biggest problem I see with this toy example is that there's no error handling: malformed input can just get lost without affecting the tree or warning the user.
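A minimal sketch of that change (assuming the video's Parser monad, char, nat and (<|>); the Expr type is made up for illustration):

data Expr = Num Int | Add Expr Expr | Mul Expr Expr
  deriving Show

expr :: Parser Expr
expr = do t <- term
          char '+'
          e <- expr
          return (Add t e)
       <|> term

term :: Parser Expr
term = do f <- factor
          char '*'
          t <- term
          return (Mul f t)
       <|> factor

factor :: Parser Expr
factor = do char '('
            e <- expr
            char ')'
            return e
         <|> do n <- nat
                return (Num n)

With this, parse expr "2+3*4" would give [(Add (Num 2) (Mul (Num 3) (Num 4)),"")] instead of [(14,"")].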
gi70st I can fix that for you... It takes an array of chars and gives back an AST
It was so confusing how he did that switcheroo. I was there thinking "how does `return (x+y)` make it produce a node `"+"` with branches `x` and `y`?"
And even after acknowledging that this is really evaluation, he kept calling it a parser, even though at the beginning of the video he insisted that what a parser is, to him, is something that produces a tree as output.
Had to pause the video to say, after studying parsers a lot recently, I think the beginning explanation is one of the best of what a parser actually is.
I think it sounds like an ideal explanation when you already know the subject well beforehand. Interesting.
Regarding the "list of results" remark, maybe a fun example would be a parser that can handle all possible interpretations of "time flies like an arrow, fruit flies like a banana"
That's a cool one.
"It doesn't matter if you know any Haskell, because I'll be explaining everything as I go along. So you won't understand a thing either way."
> claims we don't need to know Haskell
> isn't explicit about which words are imported from his module and which are keywords.
As of the finished module in 19:35:
The import declaration allows use of the exported definitions from module Parsing. There's also a standard library module Prelude which is imported by default if not mentioned.
Equals (=) defines values.
Do and <- are sugar for the bind (>>=) operator, i.e.: do { b <- p; c } means p >>= (\b -> c).
Module Prelude exports typeclass methods add (+), multiply (*), alternate (<|>), bind (>>=), and return, and also exports the implementations of (+) and (*) for whichever of the various number types the end program might use. The defaulting rules in the REPL session seem to select the Integer type.
Module Parsing defines type Parser, function char, and method implementations for (<|>), (>>=), and return. The bind (>>=) method does all the heavy lifting of combining two parsers into one larger parser (sketched below).
The main module defines functions expr, term, and factor.
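For the curious, the heavy lifting in (>>=) is roughly this (a sketch from memory of the singleton-list version; the real Parsing.hs may differ in details):

instance Monad Parser where
  return v = P (\inp -> [(v, inp)])
  p >>= f  = P (\inp -> case parse p inp of
                          []         -> []
                          [(v, out)] -> parse (f v) out)

That is: run the first parser; if it fails, the combined parser fails; otherwise feed its result to f and run the parser that comes back on the leftover input.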
Graham Hutton. Your Haskell book was incredibly useful!! Never clicked so fast on a computerphile video. Parsing in Haskell was a bit hard in the beginning though.
what is the name of the book? okok i just google it. =D
@@keestv3116 Programming in Haskell :)
I, too, used the book. Excellent resource for starting Haskell.
> Parsing in Haskell was a bit hard in the beginning though.
Getting used to all those new operators (<|>, <$>, <*>, among others) was brain-breaking, initially.
Oh he wrote that book. That book got me through first year in uni. I bought it in desperation for my resit. It was the only module I had to resit and I still harbour a special hatred of Haskell.
The videos of this guy are a treasure, thanks a lot!
Thanks, that was very illuminating. And it’s nice to see Haskell being used in a practical setting.
One thing here is that the parser he implemented does binary operator evaluation right-to-left. Left-to-right (left-associative) binary operators are more involved.
This should be a whole series on its own channel... I think a channel dedicated to programming the simple things. Things like this video, but also things like... graphics: calculate a line, circle, plasma, transparency, sprites; there's heaps and heaps of things like that; 3D rotation, vertex and bitmap... Processing raw data from a microphone is another 100 videos. I could go on...
Maybe
Having trouble parsing Graham's accent. Sounds like it's from everywhere in the UK at once! Fascinating
He sounds unambiguously Irish to me, but as far as I know, he is actually Scottish.
I did this with Java, I wrote a postfix expression parser that could do symbolic differentiation and simplification. I used polymorphism to implement the tree structure
WOW! Did anyone see the beautiful recursion in the definition of these expression parsers? I don't have the kind of brain to quickly think recursively like this
What an excellent complement to the material found in his book! Hope to see more
It is a bit annoying that the camera switches from head to laptop to screen constantly
Where was this video for my Language Engineering unit at university?! This would've been so useful to refer to!
Excellent explanation, very interesting to watch!
Wow! So cool how much you can do in a few lines of Haskell. Like defining a grammar, and a parser for it!
Why don't you use Maybe monad instead of List monad?
I think it would be better and more intuitive.
I mean:
type Parser a = String -> Maybe (a, String)
That’s because what he already has uses Lists, and lists can handle multiple outputs (like from ambiguous parsing).
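A sketch of what that Maybe-based variant could look like (hypothetical names, not the video's library):

newtype ParserM a = PM { runPM :: String -> Maybe (a, String) }

instance Functor ParserM where
  fmap f (PM p) = PM (\inp -> fmap (\(v, out) -> (f v, out)) (p inp))

instance Applicative ParserM where
  pure v = PM (\inp -> Just (v, inp))
  PM pf <*> PM px =
    PM (\inp -> do (f, out)  <- pf inp
                   (v, out') <- px out
                   return (f v, out'))

instance Monad ParserM where
  PM p >>= f =
    PM (\inp -> do (v, out) <- p inp
                   runPM (f v) out)

The do-blocks here run in the Maybe monad, so the first failing step makes the whole parse Nothing; you get failure for free but lose the ability to return several alternative parses.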
"4 and half screen fulls and is full fledged and can basically implement any parser you like" - mic drop
It's funny, it feels like Haskell makes a lot more sense to me when used in this context? I don't know, I usually cringe a bit when seeing the syntax but this example falls beautifully into place and feels very elegant? Also, the stdlib Parsing library being likes a hundred lines is pretty interesting too, probably both this and the example in the video play very well with the strengths of the language?
It's not just you - there's a reason quite a few experimental compilers are implemented in Haskell.
@@JobvanderZwan Thanks for the info - makes a lot more sense to me than people pushing Haskell as a general-purpose programming language. At least that's how I feel right now (I do Python for a living), but I might revise my judgement in the future :-)
Agreed. In my current lecture we're implementing a parser in OCaml (close enough I guess). The thing is that parser/compiler need a lot of things that are close to theory (recursion, trees etc...) and pattern matching, all of which is at home in any functional language.
More videos with prof Hutton and functional programming. My favorite topic
More of this, please! Graham is an outstanding teacher.
Sorry, but back in the day, I used Lex and YACC. This takes all the fun out of it
Thanks for such a visual and enlightening video. Next time, however, I'd rather see this kind of video (coding) as a screencast, rather than the camera jumping back and forth. The professor's talking head just obscured the point he wanted to make about the primitives library. Overall, it was surprising how neatly the parsing library applied to the problem.
Excellent tutorial.
Great video! I've written some parsers in Haskell, but Prof. Hutton makes it look easy.
Pleased to observe that he uses the vi editor - a REAL man :-)
That's not a sign of manhood. It's a sign that he has a clinical case of needing to be different. ;-)
@@lepidoptera9337Yeah, "different". Because he uses the text editor that comes with every computer ever. He is so quirky :/
@@sebastiangudino9377 It doesn't come with Windows, child. Why would it? Windows has specialized IDEs of all sorts for all kinds of R&D. I am currently using Eclipse with an embedded microcontroller environment. The hardware debugger is fully integrated with the code editor and I can step through my program line by line. I could never do that with vi.
OK, now you got your two minutes of attention. I hope your basement has gotten a little warmer. ;-)
@@lepidoptera9337 WSL is a POSIX system that comes with windows from Windows 10 onwards. It includes vi
Neat but you lost me when you started implementing the grammar.
Also would have liked to see the tree before it was evaluated.
The implementation reflects the grammar. Alternative parse paths are separated by (<|>) in the code, just as alternate expressions of the same term are separated by (|) in the grammar.
Shouldn't (some digit) return multiple results, parsing 1, 2 or 3 digits? Otherwise the parser wouldn't return all possibilities right?
I already know Haskell and quite a number of monads, but I haven't had time to learn the parser as a monad. So this video is perfect for me.
It's effectively the State Monad.
@@Dan-gs3kg If all parsers always produce singleton list, then parser monad = state monad as you said. So parser monad could be used as state monad. *But can you show the converse?* because that is what it means for me to say 'parser monad is effectively state monad' and I don't yet see how.
@@ancbi I'm a bit fast and loose here, but you can see the correspondence between say:
> Parser a = P (String -> (a, String))
> State s a = S (s -> (a, s))
Where (s ~ String), and (~) is Type Equality
The more general statement:
> Parses a = P (String -> [(a, String)])
Is a bit different, but then we can also make a more general statement about what the State effect is.
> StateT s m a = S (s -> m (a, s))
Where (s ~ String, m ~ List), we get (Parses)
Where (s ~ String, m ~ Identity), we get (Parser)
If we want zero or one parse results, we replace List with Maybe, if we want literate parse errors, then we use Either.
All of this hinges off of the initial State Effect.
@@Dan-gs3kg Hey. Thanks for making it clear. It was my ignorance of StateT making me say what I said.
@@ancbi you're welcome.
If you want to look at how "stacked" a parser combinator library gets look for the packages: megaparsec, parser-combinators, and yoctoparsec.
The first is a fully featured library that generally dominates in performance, and in expressiveness in both parsing and error reporting. Yet it operates on the same principle seen previously.
The second is a library that can interface with any parser combinator library. This is possible because of the algebraic structure of parser combinators. In other words, those principles are very permissive.
And the third is very reductive of what a parser combinator library is. It's effectively a pile of free theorems (invariants) and one primitive: The single token matching primitive. At some point you figure out that a principled outlook is the most constructive.
very nice and informative video, thank you.
Beautiful! Respect!
A thousand upvotes! I'm building a simple parser in SCSS of all languages. Let's say it's an interesting adventure. Especially considering that my programming skills are rusty.
WOW... I just had to write a parser and evaluator in Kotlin, and I had 3 classes working in conjunction to get the final answer. Meanwhile this feels like magic.
Nice video. Would like it if you'd switch screens less to make it feel more "dynamic" while there's code being shown.
So, your definition of parser includes both the tokenizer and the syntax checker?
You can say that a tokeniser is a parser from a String to a List of Tokens. Rinse and repeat with the List of Tokens until you have what you want.
Awesome and excellent video as usual
Looks rather like recursive descent parsing to me.
It's a variation of it: basically a functional/monadic version. It solves some problems, like running out of stack. It also makes backtracking a lot easier and the syntax smoother (but more complex). It also means that you can skip the lexer part.
Recursive descent rarely works well in imperative languages and has a lot of boilerplate. It works a lot better in functional languages and is a lot more compact.
EDIT: The correct term is "parser combinator" when you want to look up more on Google/Wikipedia.
@@_aullik Backtracking?
Wouldn't one want his grammar deterministic? So that, you know, parsing time is linear in the length of the input.
@@Ceelvain Yes and no. Especially if you are context-sensitive (which is completely possible here), backtracking can give you some nice advantages. Obviously at the cost of performance.
That being said, if you keep the parts where you use backtracking short and/or use memoisation, the performance overhead is pretty slim. If you make mistakes with this, it will cost you performance.
The big advantage, however, is fast prototyping, relatively easy unit testing, and that you can do things you can't with regular parser generators.
Best-case performance is close to parser generators, as you can skip lexing. What I have seen in the real world, however, is usually quite a bit worse. But if I only have to build parsers for smaller inputs (max 1k lines), I prefer this option.
One way that parser combinators differ from recursive descent parsers is that they are declarative, rather than imperative. This means that, for example, you might choose to implement the resulting parser as a LR(1) parser, or maybe using CYK, because the parser itself does not describe the implementation, but only the grammar. Of course, all the actual parser combinator libraries I know use recursive descent in the end :)
@@_aullik Recursive descent has worked very well in imperative languages for many decades. As for skipping lexing, that is a matter of optimisation, as I understand it.
More haskell please, I love it
There is a channel on YouTube called low-level-javascript, and the guy has a series building a parser combinator from scratch. It is super interesting and I highly recommend it; he's helping me a lot with my parsing studies.
Yes, that guy is a genius
Where did the tree go? At 5:02 he deletes the tree and I didn’t see it again!
He replaced the Tree type with an arbitrary type 'a', so a parser can now produce anything you want. If you want to keep building a tree, then 'a' would be equal to Tree; or you can produce something different, e.g. an integer (a = Int).
To be specific, the 'a' is a type variable. In Java, C++, Rust, or similar syntaxes it would be the T in Parser<T>. In Haskell syntax, types and their parameters are written one after the other without brackets, just like Haskell values. Some examples and their C-like equivalents:
List (Pair Int Char) = List<Pair<Int, Char>>
Pair (List Int) Char = Pair<List<Int>, Char>
List Pair Int Char = List<Pair, Int, Char> (this is probably an error.)
Parser a = Parser<T>
In C-like syntax type variables have to be introduced explicitly, such as with the template keyword in C++, but in Haskell you know any name that starts with a lowercase letter is a variable. Again, just like in term-level expressions.
Just wrote an abstract syntax tree in my utils package for tokenizing symbolic math expressions, would be cool if you guys went over how ASTs are used in compilers / interpreters as well
That gets into recursion schemes. Instead of consuming strings, you consume and rebuild trees. There's quite a bit involved there as different schemes have drastically different capabilities.
The important bit is that those pieces compose well, and have some very nice optimisations.
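The kernel of that idea fits in a few lines (a generic sketch, not tied to the video; ExprF and the eval algebra are made-up illustrations):

data ExprF a = NumF Int | AddF a a

instance Functor ExprF where
  fmap _ (NumF n)   = NumF n
  fmap f (AddF x y) = AddF (f x) (f y)

newtype Fix f = Fix (f (Fix f))

-- a catamorphism: fold a tree by collapsing it one layer at a time
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix x) = alg (fmap (cata alg) x)

eval :: Fix ExprF -> Int
eval = cata alg
  where alg (NumF n)   = n
        alg (AddF x y) = x + y

Here the tree shape (ExprF), the traversal (cata), and the meaning (alg) are separate pieces, which is what makes them compose.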
I wrote a text adventure game in the mid 80's, so I guess the programming that interprets the English commands is a parser :)
It is, but more of a stone tool than this kind of weapon.
This is neat but a little too simple to be useful, imo. For example, how do you also parse subtraction ("-") and division ("/"), and also allow any term to be negated (for example "2 + 3 * 4 / -(-10/5)" -- note the "-" in front of the compound term)? Just a bit more complexity would let us see how the parser source code scales as the number of terms increases.
Of course it's not useful. It's an example for explanation, not an industry-standard parser. Subtraction and division are incredibly similar to their counterparts. Negation would be a different rule like "-" expr, but this introduces a parse conflict, since the "-" can be subtraction or negation (a so-called shift/reduce conflict). That needs to be handled, but suddenly the video would be 1 hr long and way harder to follow.
> parse expr "2 + 3 * 4 / -(-10/5)"
*Drake nah meme
> 2 + 3 * 4 / -(-10/5)
*Drake yeah meme
@@JNCressey I just realized that I want all my compiler errors to be presented as memes
I think you're referring to the difference between a binary and unary minus operator. / has essentially the same precedence as multiplication so it shouldn't be hard.
@@chud-dot-us-dot-gov Well partially, yes, I'd like to see how both unary and binary operators get handled. But I'm also not seeing exactly how operators with equal precedence (like + and -, or * and /) get handled with this scheme, so an example of that would be helpful.
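One way to sketch the grammar extension (an illustration, not from the video):

expr   ::= term '+' expr | term '-' expr | term
term   ::= factor '*' term | factor '/' term | factor
factor ::= '(' expr ')' | '-' factor | nat

Operators of equal precedence simply become alternatives in the same rule, and unary minus gets its own alternative in factor so it binds tightest. One caveat: the right recursion makes '-' and '/' associate to the right, which gives wrong answers for inputs like 8-4-2, so a real version would use a left-associative scheme such as the chainl1 combinator sketched further down.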
So if we wrote some code that ran parsers in an infinite loop and then buried that computer would it produce infinite trees and thus solve deforestation?
Cutting back and forth between the computer, the screen, and the speaker is not the best decision. It's very frustrating.
Strange instance of Applicative where he uses only the head of the list
So interesting! Thank you so much! That is quality teaching.
I very much like using functional programming for translator writing. It's explicit and elegant.
I feel like I need to take a ten week course on this.
I felt that way when I first saw Haskell, too. I've been learning it as a hobby since August and it's totally revitalized my interest in computer science on the whole.
Why doesn't this sir have a single playlist at all? Did Computerphile forget to make one? "The functional programming series", maybe?
I'm trying to write a functional parser in Scheme!
Is Hascal some sort of hybrid language?
Brilliant! Thank you!
How would one use this with left recursion and left associativity?
One of the limitations of top-down parsing methods such as functional parsing is that left recursion results in parsers that loop. One possible solution is to transform the grammar first to eliminate the left recursion. Another is to define special-purpose combinators to parse operators that associate to the left.
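The standard special-purpose combinator is usually called chainl1 (it goes back to the Hutton/Meijer monadic parsing paper); a sketch in the style of the video's library:

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = do x <- p
                  rest x
  where
    rest x = (do f <- op
                 y <- p
                 rest (f x y))
             <|> return x

With it, expr = term `chainl1` (char '-' >> return (-)) parses left-associatively without looping, because the recursion consumes at least one token each time around.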
Maybe it's because I don't know Haskell, but it's unclear to me how expr chooses whether to do the '+' branch or not
It backtracks. It tries to do the '+' case and if it can't it backtracks and does just the term case.
Nooo, the list scares me 😭
Why couldn't you just use a normal Maybe
oh, do more haskell please :DDDD
Okay, so expr ::= term + expr | term. Right. Now, let's add minus, shall we?
expr ::= term + expr | term - expr | term. Whoops, now what? Oh. And, btw, is the order of operands significant? How about left recursion?
13:39 What happens if there is no "-"? How does Haskell know that it now has to use the nat parser?
instance Alternative Parser where
  empty = P (\inp -> [])
  p <|> q = P (\inp -> case parse p inp of
                         [] -> parse q inp
                         [(v, out)] -> [(v, out)])
This special 'or' operator (<|>) runs the left-hand parser (in this case a '-' char followed by a natural number); if it succeeds, its result is returned, otherwise the result of the right-hand parser is returned (in this case just a natural number).
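For instance, assuming the video's library is loaded, a session would go (hypothetical but straightforward):

> parse (char '-' <|> char '+') "-5"
[('-',"5")]
> parse (char '-' <|> char '+') "+5"
[('+',"5")]
> parse (char '-' <|> char '+') "5"
[]

The right-hand parser is only tried when the left-hand one produces the empty list.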
this looks like DCG in Prolog, but DCG is more declarative
I'm wondering, is a parser a function that takes a string as an input and gives a tree as an output?
List of results? Wouldn't an Either be better? Then you either get the tree, or an error back
Vim is based
I know it is a simplification of what a parser is.
A parser is a map that takes some sequence of "ordered elements" and returns some transformation of the sequence into another structure, given that the sequence adheres to the parsing scheme.
To dumb it down a little, a parser can also be used on other types of input, both simple and complex types, as long as the data can be ordered.
12:56 "And then I'm gonna *simply* return it"
Ah yes, that's what happens there, no magic at play at all :v
Yeees :v
It looks like a complex grammar and a carefully crafted example could cause the parser to generate a list of trees exponential in the string length.
Which would be very bad.
Either don't return a list of possible results, return a proper stream of possible results, or something else.
Eg "Parser a = String -> (a, String)" which only returns the one.
The grammar is very literate, though the snag is learning what the combinators mean initially.
That’s only a problem if you need to read the entire list. If you only look at the first couple of items, there’s no problem.
Thank you for this video. but why do I get this exception?
I'm running it on GHCi 7.6.3
*Parsing> parse (some digit) "123"
*** Exception: Parsing.hs:38:10-21: No instance nor default method for class operation GHC.Base.return
I also get this warning when I load Parsing.hs
[1 of 1] Compiling Parsing ( Parsing.hs, interpreted )
Parsing.hs:38:10: Warning:
No explicit method or default declaration for `return'
In the instance declaration for `Monad Parser'
Ok, modules loaded: Parsing.
Apple, definitely upgrade your GHC. Try not to use the versions of GHC that come with the package managers of various OS choices as they tend to be quite out of date. Use the curl (or wget) install scripts that you’ll find on the websites for either of the things I’m about to mention in quotes. Either install the Haskell “Stack” tool for managing isolated sandboxed versions of GHC on your system (so you can have different projects operating with different GHC versions) or use “ghcup” to install Cabal for you to do something similar. But yeah, avoid whatever is in your package manager like the plague. And GHC 7.6.3 is pretty out of date with how monads are implemented. There’s been some significant language shifts since then.
Apple also add the line “return = pure” within the Monad instance declaration (...just above the definition for (>>=) is where you’d normally put it).
NWN thank you very much! Unfortunately I need to use specifically this version for my university coursework
Apple that’s a shame. What university/subject is this? Lecturers should really update their course materials + backend. In GHC 7.10 the “Functor-Applicative-Monad Proposal” was implemented, which made all Monads have Applicative as a superclass, which is a pretty big change and something that should be part of your learning ecosystem.
amazing speaker
Idear
parzer
Graham Hutton wasn't even born when the early papers on parsing were written. Is this dude even real?????
Thanks for sharing the link to the library, I will drink a beer now, and try to write my own parser.
I had to google the pronunciation of "parsing". I didn't know that there were words pronounced differently in en-US and en-GB.
There are all kinds of words that are pronounced differently in each dialect
So Parser is just ReadS?
Did prof Hutton's code check for leading zeroes being invalid for an integer? Input like "000123" would be kind of invalid for an integer. Even "022" would be invalid in the traditional sense.
My parser in PHP is just a function with str_replace() and preg_replace_callback() in it. I think a parser is pretty easy stuff until you have nested matches.
I'm not going to lie, this seems like a terrible idea. Using repeated string replace is a great way to turn your parser (anything that fills the ROLE of a parser, no matter the mechanism) into a Turing machine that treats malformed/malicious input as a Turing-complete programming language.
I'd recommend looking into the LangSec approach to parsing. The computational power of the automaton you use to parse input should match the complexity of the grammar required to describe the set of valid inputs. No more and no less.
That was awesome. Thank you!
Professor Hutton looks like Moby.
This video inspired me to write my own parser, compiler, and executor in C++ for a JS-like language I just invented
> has got a Mac
> codes on Vi
vdisj ::= "either" vp "or" vp
ndisj ::= "either" np "or" np
"An expression can either be a term plus an expression or a term."
Parse error!
Okay so now let's build this in C
I thought I was having a stroke at 1:40!
Using C to implement a separate lexer and parser, using the shunting-yard algorithm for infix expressions, is more didactic IMO
You should make a video. I would like to see how it works
Haskell can be a bit of a 'culture shock' to people like me who are used to procedural programming, but I definitely want to learn more about it.
Interesting. How do combinator parsers compare with table parsers like Earley parsers? Is it possible yo combine approaches?
s/yo/to/ !
So if you have a structure of sorts with leveled information and break out things that meet specific conditions, is that parsing? Your explanation is so filled with lingo.
I hear you talk about returning trees and pairs; is that a requirement for being a parser?
What about if you split up a string consisting of a leveled record tree, maybe first into substrings, knowing which type of information is in each string, and then iterate over the record structures within them using the record brackets and identifiers? Isn't that parsing top-down?
What is the main definition of a parser? I thought it was breaking out information from structured data using rules.
@@JmanNo42 Yep, and the structured data is always a tree when calculating expressions. Operations always return a single result.
I have no idea how this worked...
The tricky part for me is how it knew how to sequence the execution. How is the priority of * over + understood?
It uses recursion. Higher-precedence operators appear lower in the stack. It does multiplication before addition because in the grammar and parser, multiplication has to be parsed and evaluated before addition. Look up Jack Crenshaw's series Let's Build A Compiler for another explanation. His approach uses recursive descent parsing instead of parser combinators but it is the same idea
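Concretely, the video's grammar encodes the precedence in its layering:

expr   ::= term '+' expr | term       (loosest: addition)
term   ::= factor '*' term | factor   (tighter: multiplication)
factor ::= '(' expr ')' | nat         (tightest: parens and literals)

Parsing "2+3*4", expr can only split at the '+', and each side must be a term, so the 3*4 gets grouped (and therefore evaluated) before the addition ever happens.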
At the last example, I was expecting either to have something parsed or to get the "unconsumed" string... but the result was nothing, with nothing "unconsumed".
It's giving a list of all the ways the input can match an expr with something left over. The last example gives the answer "it can't be an expr with anything left over". In a context where you could have an expr or something else, it would first pass the whole string to the expr parser, then pass the whole string to the other parser; choices remember the original string, and pass that to each parser, rather than passing the unconsumed part to the next parser like sequences do. If the expr parser gave [(nothing,"(2+4")], that would mean it was okay for an expr to consist of nothing.
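A couple of concrete cases, assuming the grammar from the video:

> parse expr "2+4)"
[(6,")")]     (an expr was found, with ")" left unconsumed)
> parse expr "(2+4"
[]            (no way to read an expr here, so no result and no leftover)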
No Tripod?
can anyone tell me a practical use for parsers? like I watched the entire video and know what they do, but I feel a little unguided, in what common situation you'd use specifically a parser and not just a for loop
I mean, a parser could be built on a for-loop. Parsing is basically just the process of reading some data, like a mathematical expression, and understanding it.
Math expressions can't always be calculated left-to-right (i.e. in the order you or your program would read them), like (1 + 3 * (7 - 1)), so one approach you may take is to decode the expression into objects -- a tree structure -- and then, after you've read and understood the whole expression, you'd use that data to calculate it in the right order.
The video's approach is to use functions that are chained in sequence, and those functions could in theory use a for loop. For example, we may have a parser function for variable names, which would want to use a for-loop to read until it sees a character that can't be a variable name. We may have a parser function for whole numbers rather than single digits, which would take the same approach.
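In the combinator style, that loop would look something like this (a sketch assuming primitives named sat and many, as in Hutton's Parsing library):

import Data.Char (isAlpha, isAlphaNum)

-- one letter, then zero or more letters or digits
ident :: Parser String
ident = do c  <- sat isAlpha
           cs <- many (sat isAlphaNum)
           return (c:cs)

many keeps applying its parser until it fails, which plays the role of the "read until the character can't be part of a name" for-loop.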
Programming language interpreters and compilers. JSON parsers, syntax highlighters, linters, transpilers (like Babel), web browsers that parse HTML and CSS, web scrapers: all use parsers. Whether the average working programmer writes them is another story. Another example would be if you find yourself making a DSL for adding content to your video game and would rather not write the content in C#, you might consider writing a parser. Anytime you take input and transform it to produce some form of structured output, you are in essence writing or using a parser.
According to LangSec, all input should be parsed into a well-defined structure (i.e. a type that represents only the set of valid input) before being processed. That way you aren't making assumptions about your input that can be exploited by attackers. The type of parser used should match the complexity of the grammar used to define valid input, following Chomsky's hierarchy of grammars.
What is the Haskell parsing library without IO, Maybe, Option and Exceptions?
Well, the same I guess.
@@bingloveskoki Nah, this one is just a toy :)
painful
That's an interesting topic and it was well explained, for as much as I watched, but ultimately I had to stop because of the mouth noises. Give that poor man a glass of water.
For those interested in functional parsing in Python, check the parsy library.
This video is confusing.
It's confusing how he doesn't put `Parsing.` before the things imported from that library. Which words are keywords of the language and which words are imported?
It's confusing how he has words differing only in case. `char` and `Char`.
The library has `P` quite a few times. What is `P`? In some lines it looks like it's the name of an argument to a function being defined, but some other functions don't have `P` in the place where I'd expect arguments to be named.
It's confusing how he started with a bit of defining the parser, then jumped to high-level use of his already finished parser library, then briefly showed us a flash of the library for a few seconds, then went back to using it again. How does it even work?
18:60 It's confusing that he keeps referring to this as parsing, giving us the idea of the tree mentioned earlier being the output. "How does `return (x+y)` build up that tree?" I asked. - It doesn't, it's evaluating the expression.
`char` is a function, `Char` is a type. Types and constructors start with upper case.
`P` is a data constructor, which is a function. If you think of `Parser a` as a type wrapping a `String -> [(a,String)]`, then `P` is the constructor that has type `P :: (String -> [(a,String)]) -> Parser a`.
In one place (the definition of function `parse`), `P` is on the left-hand side, to pattern match and extract the `String -> [(a,String)]` from `Parser a`.
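Put together, that part of the library is roughly:

newtype Parser a = P (String -> [(a, String)])

-- parse just unwraps: pattern matching on P extracts the function
parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp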
You don't have to prefix unqualified imports.
The bit about the parsing primitives is uninteresting, as it's pretty much bit masking on character literals.
The combinators are... Probably the most magical and important part of it.
To make it construct an AST, replace "+, *" with AST constructors.
import argparse