This video would've been super helpful 3 years ago in college. A professor had us make a spellchecker. It did not go well
serious question:
when you have to make a program or a piece of code, whatever,
how "original" does it need to be before it's acceptable to you?
i mean, how many lines of code can you copy without having a guilty conscience? (not literally, but you get it, i think)
also keeping in mind that you don't know that specific algorithm but you have to do it anyway
@@NoGentle It really does depend on what you are trying to achieve.
If it's for learning purposes, why would you use someone else's solution to a problem, why not make it yourself? That implies that by copying you mean literally copying the code line by line. But if by copying you mean that someone just has the idea of the solution,
you solve x by doing z thing and y thing,
you still have to code the z thing and the y thing even though you know in what way you should. i think these are what you call patents
@@NoGentle I would say that if you are programming something and you copy code because you know it but don't want to type it all out, then it's fine. Alternatively, you could also copy code to try and pick it apart and learn it better. There are no rules though, so do what you think is best for your situation
it wouldn't help you at all... you can't do basic research
@@krolmuch bro what's with the attack? I'm a visual learner. I struggle to read. Video education is just easier for me to understand. It was mostly a joke anyway.
It took 20 years to solve the Edit Distance problem for the first time, but they want us to solve it in a 1-hour interview.
honestly this does open some interesting philosophical ideas about how genius solutions and algorithms come to be. the best ideas are those that, even though they took a while to come up with, are comparatively easy to teach after they've been discovered.
you’d better consider yourself lucky if you get edit distance asked in an interview. It’s popular, intuitive and fairly moderate in complexity. I mean, it’s solving a real world problem and I’m all for it. There can be many DP problems that are just bad for interviews.
Excellent explanation. Modern spell checkers also use other techniques. One is transposition, because that is one of the most common spelling mistakes. Another is nearness of letters on the keyboard, because people can mistype letters that are close to each other.
excellent showcasing of tranpsosition and nearnesd
it seems like the modern ones bridged the difference between actual spelling errors and what we might call typos
I see whay yuo did there
@@arandomguy9669 hwat a mitzure
This algorithm actually paved the way for a lot of modern bioinformatics algorithms used to align two DNA sequences together, some of the most famous being Smith-Waterman and Needleman-Wunsch! It’s so cool to see the overlap!
do you know where i could find more about bioinformatics algorithms?
@@aswinsnair1702^^
@@aswinsnair1702 try looking up the terms that @ScienceSuds commented in Google Scholar, as well as terms such as "Sequence Alignment". There's really a ton of work in this field!
@@aswinsnair1702 search for the FASTA and BLAST methods
Oh yeah, don't biologists check for mutations and differences in a genome by pasting it into Word and spellchecking it, when the original is in the spellchecker's dictionary?
I always thought spellcheckers would incorporate the keyboard layout into their suggestions, as in correcting "worls" to "world", because s is one key away from d
I'm sure some do.
Same! I’m always like “why can’t you tell that I just missed one letter!!!”
Keep in mind many different keyboard layouts exist. You could also have a case where a written file is OCR'd in which case that wouldn't be relevant.
This and parts of speech.
Maybe track the most common errors based on vocab and document length, like the YouTube algorithm recommending videos based on age, gender, etc.
Of course it does now. There is a video from Enrico Tartarotti released recently "The LIES That Make Your Tech ACTUALLY Work" where you can learn more about your idea and how it is implemented!
Would have loved YouTube 30 yrs ago. In my day (yeah, I'm old) I had a class in which the last assignment was an assembly program for the Intel 8086 that implemented a spellchecker. Prof said it would take 40 hrs if we knew what we were doing. No mention of Levenshtein, Gorin, or any known algos. I took a 0, as I was behind in other things.
That's insane
What is an interesting addition to the algorithm is actually providing a list of the changes between the two, like for a typewriter.
I read this like 7 times and I can’t tell what you are trying to say
@@maker0824 not sure what they meant w/ the typewriter mention, but i read this as meaning implementing something that provides a diff-like output (character-by-character instead of by line, tho)
You should absolutely make more videos like this! You're extremely good at explaining things and this video was genuinely so interesting. Well done :)
that was the best explanation of Dynamic programing ive ever heard
It's very nice to discover dev channels with quality content and interesting topics, keep up the good work!
Love your work ❤. Make a series on Programming Algorithms 🙌
I used a Levenshtein program to match software names and categories to a list that I scraped from somewhere or other. Worked nicely, but took a while to run, even running on 12 cores because it was running 140,000 unsorted items against 40,000 items with a category and type. Still, 5mins isn’t bad compared to how long it’d have taken to do it manually.
im using it in my application to actively read text boxes and compare them to a script im using.
Levenshtein algorithm can also be extended to calculate the Damerau-Levenshtein distance. Simply put, this means that you get another operation that switches two neighbouring letters. E.g. the words "world" and "wordl" have Levenshtein distance 2, but Damerau-Levenshtein distance 1 since it is enough to switch the last 2 letters. Especially in keyboard typing, such errors are common. It is also possible to fine tune even more by giving weights different from 1 to the operations.
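For anyone curious, a minimal sketch of that transposition extension (the restricted, "optimal string alignment" variant), assuming unit costs for every operation:

def osa_distance(a: str, b: str) -> int:
    # Levenshtein's three operations plus swapping two adjacent letters.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(osa_distance("world", "wordl"))  # 1, as described above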
Essentially what they're proposing is that the weight of 2 substitutions actually has the weight of 1 substitution. And this goes into a much deeper topic, which is that realistically the weight of each mistake (insertion, deletion or substitution) is actually not equal. There are things like phonetic mistakes where two similar-sounding letters are interchanged, which happens a lot in French for example. Common spelling errors have roots, and generally it's because of phonetics: double consonants sound the same as single consonants, etc.
In the Deep Learning approach, you could build a model which would in fact be able to extract these features, including but not limited to insertion, deletion, substitution, phonetic difference, and common spelling mistakes, that would determine the true distance between spelling errors and better determine your intent when writing.
@@stt.9433 It's not a proposal, it's an algo from 1965, a base for all search engines. Levenshtein distance in its pure form is very insufficient for real applications and has been used since forever without any need for ML (doh)
i like your videos, because they dive deep into tiny important stuff which really helps a lot
This was a great video, this explanation was made so intuitive and I have wondered in the past how spell checkers work
levenshtein distances is basically a pathfinding algorithm.
what??? its not even remotely close to that
@@Zaary i agree
Yes, it's working out the unknown path (there could be more than one) from one word to another, that's true.
For anyone confused: he's saying that because the Levenshtein distance is a "metric", which basically means that if you imagine all strings as points in a space, the Levenshtein distance works much the same as distance in real space.
It sounds kind of meaningless at first, but if you use it that way it actually unlocks certain properties of strings that enable some other clever algorithms for searching text.
pathfinding makes it possible to backtrack, this does not. it has only 1 thing in common with pathfinding - finding the shortest path. this algorithm however works completely differently from pathfinding and has nothing in common with it.
Oh my god! This was so simple to understand.
Thank you so much. Please keep these coming :)
This is awesome, I learned a lot, thank you! I heard that some spell checkers use tries (prefix trees) for better auto-completion. I'd love to see a video on those as well, I adore your way of explaining!!
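For reference, a tiny trie sketch in Python (nested dicts, with "$" as a hypothetical end-of-word marker), just to illustrate the prefix idea:

def make_trie(words):
    # Nested dicts; "$" marks the end of a complete word.
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def completions(trie, prefix):
    # Walk down to the prefix node, then collect every word below it.
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found = []
    def walk(n, suffix):
        for key, child in n.items():
            if key == "$":
                found.append(prefix + suffix)
            else:
                walk(child, suffix + key)
    walk(node, "")
    return found

trie = make_trie(["boat", "boar", "float"])
print(completions(trie, "boa"))  # ['boat', 'boar']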
Great video! You should make more where you explain interesting algorithms. Maybe you can do Bresenham's line drawing algorithm next. Keep it up. 😃
Thank you for this! The visuals are great!
Great video! Love the username. I will definitely be watching more and subscribed. I'm a python novice and don't code but love to see how the sausage is made. Maybe one day I'll get into the sausage biz.
Although I was familiar with the algorithms presented in the video, the visualizations were great and helped me understand them much better. Thank you.
Great video! Thanks for the excellent explanation. I found it really friendly and easy to understand.
I would love to see more of these history of algorithms videos.
It was extremely wonderful. Thanks for your great explanations 😍
I saw "spell checker alogrithm" I subbed. thanks for the video and hoping to see moreeee!❤
Wow, this is also what i learned in uni, but in the context of DNA sequences, because they can also have deletions, insertions, and substitutions (i studied bioinformatics)
Wow this video is amazing and now I have learnt the core concept behind the spell check :)
i was hoping to learn about the modern algorithms, but well, now i know the history behind it. hope to see a part 2
Great video! Thanks for making it!
I would love to see you expand this video/topic to include the use of different types of edits having different probabilities and/or 'costs', which is a useful and interesting application for things like calculating the 'distance' between two things which have different physical/theoretical processes for causing different kinds of edits.
For example, in DNA sequences, nucleotide substitutions might be much more common than deletions or insertions. And perhaps deletions are more common than insertions -- or vice versa.
One way to model this is to have less-common types of edits 'cost' more than more-common ones. Another way to model this is to go by actual probabilities (aka likelihoods).
There are algorithms which incorporate such ideas, and can be solved in a similar way to the Wagner & Fischer method, but unfortunately I can't recall the name(s) off the top of my head.
But still, it is a really interesting question with really interesting and instructive solutions, so IMHO it would make a great topic for a follow-up video. What do you think?
Cheers!
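For reference, a minimal sketch of the cost-based version described above. The operation costs here are made-up placeholders; real ones would come from observed probabilities (e.g. cost = -log(probability), so minimizing total cost maximizes likelihood):

def weighted_edit_distance(a, b, ins_cost=1.0, del_cost=1.5, sub_cost=1.0):
    # Wagner-Fischer, but each operation pays its own (unequal) price.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,   # deletion
                          d[i][j - 1] + ins_cost,   # insertion
                          d[i - 1][j - 1] + sub)    # substitution / match
    return d[m][n]

print(weighted_edit_distance("boat", "float"))  # 2.0 with these costs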
As others also pointed out, this is so similar to Needleman-Wunsch and/or Smith-Waterman, and it's insane to me. I learned about bioinformatics algorithms and now end up in a situation in which I can think of sequence alignment as being responsible for my spellchecking
Nice video, very pedagogical. If you ever get tempted to make a follow-up, there is an optimization where instead of computing the entire matrix you only compute the distance below a threshold d; this corresponds to computing a wide diagonal in the middle of the dynamic programming matrix.
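A rough sketch of that banded idea, assuming a threshold d (a cell (i, j) can only lie on a path of cost <= d when |i - j| <= d, since every off-diagonal step costs at least 1):

def banded_distance(a, b, d):
    # Compute only the diagonal band of width 2d+1; treat everything
    # outside it as "already more than d edits away".
    m, n = len(a), len(b)
    if abs(m - n) > d:
        return None  # the distance definitely exceeds d
    BIG = d + 1
    prev = [j if j <= d else BIG for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [BIG] * (n + 1)
        if i <= d:
            cur[0] = i
        for j in range(max(1, i - d), min(n, i + d) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n] if prev[n] <= d else None  # None means "> d"

print(banded_distance("boat", "float", 2))  # 2

This runs in O(d * min(m, n)) instead of O(m * n), which matters when you scan a whole dictionary.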
Great explanation, and your voice is pretty soothing
What an exceptionally good and well researched video
Two points: it's actually the Damerau-Levenshtein algorithm; and the implementation given is O(n^2), which is unnecessary. You can use a moving window into the grid that is a diagonal stripe wide enough to hold the maximum acceptable edit distance. That makes the algorithm O(n).
I meant that the commonly used algorithm is Damerau-Levenshtein.
really helpful ngl, didn't know much about spell checkers! but now i understand we really need NNs in this area because of how bad the functions are
I've done some work with Levenshtein distances in the past, but it's cool to see what's actually going on under the hood. Thanks for this!
This was a great video, MAKE ANOTHER ONEEEE !❤❤❤
Thank you so much for this video!
6:36 The Wagner-Fischer algorithm looks a lot like the Needleman-Wunsch algorithm (it is also a dynamic programming algorithm, used for alignment of nucleotide, protein and other genetic sequences). It's possibly the same algorithm but repurposed for alignment in genetic sequences.
This is very interesting! Great video.
i really thought it was just ai, didn't realize it was already this old, good video quality!
Just amazing explanation
Can you upload DSA contents with visualizations? It would really help. Enjoyed this video, will try implementing it myself.
Omg could this be a channel about algorithms? 🤩
Amazing video extremely interesting, simple and high quality
Algorithms with historical context videos are the best.
Wonderful video!
Amazing video! Well done!
I thought it'd be obvious to incorporate the distance between two letters on a keyboard into the calculation, but I was surprised that after so many iterations it's still not there!!!
I love the kingsman reference
This was awesome!! I think I just found an awesome new channel :D
super informative .. thanksss
Best way to teach Dynamic Programming is just simple hashmap memoization of the recursive function, and only teaching the 2D matrix after solving multiple DP problems with memoization.
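For instance, a minimal memoized version of the recursive Levenshtein definition (a sketch; the cache is what turns the exponential recursion into O(m*n)):

from functools import lru_cache

def lev(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # go(i, j) = edit distance between the suffixes a[i:] and b[j:]
        if i == len(a):
            return len(b) - j   # insert the rest of b
        if j == len(b):
            return len(a) - i   # delete the rest of a
        if a[i] == b[j]:
            return go(i + 1, j + 1)  # match: free
        return 1 + min(go(i + 1, j),       # deletion
                       go(i, j + 1),       # insertion
                       go(i + 1, j + 1))   # substitution
    return go(0, 0)

print(lev("boat", "float"))  # 2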
this video is causing flashbacks to the time I wrote autocorrect for bash
Great video. How did you create these animations? Did you use manim or something else, or have you done it with After Effects? I'm just curious
I'm curious too!
Excellent!!! Thank you very much!!!
Dynamic Programming is one of the coolest design techniques in computer science. The first time I learned it I was amazed. Kudos to Richard Bellman, who first developed the idea for it
This algorithm is very similar to the one you use for finding the longest common subsequence between 2 strings, a very popular LeetCode question
As well as Edit Distance (closely related to LCS) - this is the first thing I noticed when he started explaining the video, as it felt as if I had solved this before.
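For reference, the LCS table is built the same way; a minimal sketch (and for the restricted edit distance that only allows insertions and deletions, distance = m + n - 2*LCS):

def lcs_length(a: str, b: str) -> int:
    # Same (m+1) x (n+1) table shape as Wagner-Fischer, different recurrence.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1   # extend the common subsequence
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[m][n]

print(lcs_length("boat", "float"))  # 3 ("oat")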
my head hurts, crazy sudoku
The matrix is similar to an action table used to determine the symmetry of a group under an operation. Basically the math of DP which, if you think about it, is fractal
Very good video, but I guess using a binary search tree on a pre-sorted list of words would be more efficient, giving a worst case of roughly O(log n) in the above example. It would also both check whether the word is correct and find the suggested words by traversing the tree only one time. Correct me if I'm wrong, please
That's very interesting. I'm wondering if perhaps this algorithm could be implemented more efficiently in an array programming language like APL or J?
Wow, I really want to write the Levenshtein algo!
One thing I've always wondered is how to find the distance between strings from a dictionary when the strings contain a close substring starting at an ambiguous index.
It's not very intuitive, but thanks for the video.
How do you do these animations? Are you using some kind of library like Manim?
Whenever I see matrices, I think GPU. GPU accelerated spell checker?
now this is awesome
This algorithm is also used in the field of bioinformatics, to solve sequence alignment problem.
I wrote this version of the Levenshtein formula in C just now. It's recursive, but I optimized the two length checks so they only happen once: we decrement the length values as we advance the string pointers.

/* Levenshtein distance, recursive version */
#include <string.h>

#define min(a,b) (((a) < (b)) ? (a) : (b))

static int _lev(const char *s1, int sl1, const char *s2, int sl2);

int lev(const char *s1, const char *s2)
{
    /* Measure the lengths once; the helper maintains them incrementally. */
    return _lev(s1, (int)strlen(s1), s2, (int)strlen(s2));
}

static int _lev(const char *s1, int sl1, const char *s2, int sl2)
{
    /* Base cases: distance to an empty string is the other string's length. */
    if (sl2 == 0)
        return sl1;
    if (sl1 == 0)
        return sl2;

    /* Matching first characters cost nothing: advance past them. */
    if (s1[0] == s2[0])
        return _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1);

    int a = _lev(s1 + 1, sl1 - 1, s2, sl2);         /* delete from s1 */
    int b = _lev(s1, sl1, s2 + 1, sl2 - 1);         /* insert into s1 */
    int c = _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1); /* substitute */
    return 1 + min(min(a, b), c);
}
Reminds me of my 1st semester in uni, but ZIEGE and TIGER instead of FLOAT and BOAT
what a great video. food for curiosity
great video!
Your explanation helped me a lot! But I think I identified a little mismatch in your explanation: I think that when m[0][j] == m[i][0] (the two characters match) we should copy the value in m[i-1][j-1] instead of selecting the minimum of the three neighboring positions. In some tests your method works, but sometimes it fails. Sorry for my English...
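For reference, the standard cell update as a sketch. Taking the minimum of all three is still correct on a match, because the substitution term then costs 0, and adjacent table cells never differ by more than 1, so the diagonal always wins; copying the diagonal and taking the min are equivalent. A failure would only appear if a match still added 1 to the neighbors:

def cell(d, s1, s2, i, j):
    # One Wagner-Fischer cell (1-indexed into the table, 0-indexed strings).
    cost = 0 if s1[i - 1] == s2[j - 1] else 1
    return min(d[i - 1][j] + 1,         # deletion
               d[i][j - 1] + 1,         # insertion
               d[i - 1][j - 1] + cost)  # substitution, free on a match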
Great video!
what's the font? looks good
overall nice video
Something that I have always found interesting and amazing, and have wondered about, is how Windows defrag works: how it goes through everything, sorts, puts things aside that are in the wrong place, deletes data that is no longer needed, and then reassembles everything in the correct order. I have a hard time getting my head around how it does that! 😵‍💫
00:01 Spellcheckers rely on a sophisticated algorithm for accuracy
01:33 The Lenin distance algorithm was crucial for enhancing spell checkers.
03:10 The algorithm follows guard clauses and recursive comparisons.
04:54 Lenin distance algorithm is not practical due to its recursive nature
06:37 Wagner-Fischer algorithm uses dynamic programming for efficient spell checking.
08:23 Explanation of operations involved in transforming strings.
10:02 Wagner-Fischer approach calculates edit distance efficiently
11:42 Spell checkers use edit distance to suggest correct words.
Crafted by Merlin AI.
not only is he a communist, he's also a computer scientist!
crafted by a meatbag
Oh no, communists are back to destroy computer science with the Lenin algorithm 😂
interesting, my quip about the Lenin distance is deleted? Did I offend a communist?
@@michaeldula462 I guess YouTube took it personally lol
Quite an interesting history.
Could add a cache layer so we never have to check a misspelled word more than once. That counts as an easy improvement.
What theme do you use for VSCode? It looks so good
I tried thinking of a way to check against a dictionary faster. while Levenshtein distance is computable in O(nm), using it repeatedly would lead to O(nmk) if the dictionary has k words. The string space sort of behaves like a metric space, with stuff like the triangle inequality. I believe in computational geometry we know how to efficiently find the “k nearest neighbors” in Euclidean space, but Idk how to do that for the space of strings . I was curious if there’s a way to use Levenshtein distance smartly to only perform something like log k queries. If that were possible, the running time would effectively be O(log k) since the lengths of individual words are much smaller than the length of an entire dictionary.
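One known structure for roughly this is the BK-tree, which uses exactly that triangle inequality to prune whole subtrees. It doesn't give a guaranteed O(log k), but in practice it touches a small fraction of the dictionary; a minimal sketch:

class BKTree:
    def __init__(self, dist):
        self.dist = dist      # any metric on strings, e.g. Levenshtein
        self.root = None      # node = [word, {distance: child_node}]

    def add(self, word):
        if self.root is None:
            self.root = [word, {}]
            return
        node = self.root
        while True:
            d = self.dist(word, node[0])
            if d in node[1]:
                node = node[1][d]   # descend along the matching edge
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tol):
        # Triangle inequality: a subtree hanging off an edge labelled c
        # can only contain matches when |c - d| <= tol.
        matches, stack = [], [self.root] if self.root else []
        while stack:
            node = stack.pop()
            d = self.dist(word, node[0])
            if d <= tol:
                matches.append((node[0], d))
            for c, child in node[1].items():
                if abs(c - d) <= tol:
                    stack.append(child)
        return matches

# usage, assuming a lev(a, b) function like the ones above:
#   tree = BKTree(lev)
#   for w in dictionary: tree.add(w)
#   tree.query("helo", 1)  -> all words within edit distance 1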
You make the Levenshtein algorithm look so easy, thanks so much
can we have a keyboard/setup tour
Nice video 👍👍❤
Could you weigh the edit distance to favor letter substitutions that are physically close in the keyboard?
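You could. A hypothetical sketch of such a substitution cost (the layout coordinates and the 0.5/0.25 scaling are made up; plug it in as the substitution cost in a weighted Wagner-Fischer like the one sketched further up):

# Approximate (x, y) positions on a QWERTY layout; rows are offset a bit.
KEY_POS = {}
for row_y, row in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for col_x, ch in enumerate(row):
        KEY_POS[ch] = (col_x + 0.5 * row_y, float(row_y))

def keyboard_sub_cost(a, b):
    # Substituting physically close keys is cheaper than distant ones.
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
        gap = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        return min(1.0, 0.5 + 0.25 * gap)  # adjacent keys cost ~0.75
    return 1.0

print(keyboard_sub_cost("s", "d"))  # cheap: neighbours on the home row
print(keyboard_sub_cost("s", "p"))  # full cost: far apart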
9:30 why so? Is this equivalent to the square bracket in the Levenshtein formula? If yes, which box stands for which formula in the square bracket?
Or perhaps this is left as an exercise for the reader lmao. I'm a bit lazy, I'll look over it one more time 😅
this is some good content
11:29
In your wagner_fischer implementation, why are you incrementing change? (line 17) If "previous_row[j-1]" was guaranteed to always be the smallest value, and none others shared that value, maybe it would work. Why not choose the minimum first and then add 1 to it after checking if the two letters are not the same? Or am I misunderstanding something?
Why does this awesome channel have such a low number of views??
I presume this Wagner-Fischer algorithm is also what is behind the edit distance (file diffing) in git
Do modern spell checkers take into account likely errors due to typing? I.e. "onky" is probably "only": it's not only one edit distance away, but that edit is also only one key away.
u r amazing bro, u r directly helping me do my Ph.D.
Awesome video
I was also thinking there is some kind of trie based solution
Edit distance is a famous problem asked in software engineering interviews!
really cool video
If you have enough data, maybe create a map with all wrongly typed words as keys, where the values would be arrays of corrections?
Which tool are you using for the slides and transitions?
Great video, joined as a sub.