This video would've been super helpful 3 years ago in college. A professor had us make a spellchecker. It did not go well
serious question:
when you have to make a program or a piece of code, whatever,
how "original" does it need to be before it's acceptable to you?
i mean, how many lines of code can you copy without having a guilty conscience? (not literally, but you get it, i think)
also keeping in mind that you don't know that specific algorithm but you have to do it anyway
@@NoGentle It really does depend on what you are trying to achieve.
If it's for learning purposes, why would you use someone else's solution to a problem, why not make it yourself? That implies that by copying you mean literally copying the code line by line. But if by copying you mean that someone just has the idea of the solution,
you solve x by doing z thing and y thing,
you still have to code the z thing and the y thing even though you know in what way you should. i think these are what you call patents
@@NoGentle I would say that if you are programming something and you copy code because you know it but don't want to type it all out, then it's fine. Alternatively, you could also copy code to try and pick it apart and learn it better. There are no rules though, so do what you think is best for your situation
it wouldn't help you at all... you can't do basic research
@@krolmuch bro what's with the attack? I'm a visual learner. I struggle to read. Video education is just easier for me to understand. It was mostly a joke anyway.
It took 20 years to solve the Edit Distance problem for the first time, but they want us to solve it in a 1-hour interview.
honestly this does open some interesting philosophical ideas about how genius solutions and algorithms come to be. the best ideas are those that, even though they took a while to come up with, are comparatively easy to teach after they've been discovered.
you’d better consider yourself lucky if you get edit distance asked in an interview. It’s popular, intuitive and fairly moderate in complexity. I mean, it’s solving a real world problem and I’m all for it. There can be many DP problems that are just bad for interviews.
Excellent explanation. Modern spell checkers also use other techniques. One is transposition, because that is one of the most common spelling mistakes. Another is nearness of letters on the keyboard, because people can mistype letters that are close to each other.
excellent showcasing of tranpsosition and nearnesd
it seems like the modern ones bridged the difference between actual spelling errors and what we might call typos
I see whay yuo did there
@@arandomguy9669 hwat a mitzure
This algorithm actually paved the way for a lot of modern bioinformatics algorithms used to align two DNA sequences together, some of the most famous being Smith-Waterman and Needleman-Wunsch! It’s so cool to see the overlap!
do you know where i could find more about bioinformatics algorithms?
@@aswinsnair1702^^
@@aswinsnair1702 try looking up the terms that @ScienceSuds commented in Google Scholar, as well as terms such as "Sequence Alignment". There's really a ton of work in this field!
@@aswinsnair1702 search for the FASTA and BLAST methods
Oh yeah, don't biologists check for mutations and differences in a genome by pasting it into Word and spellchecking it, when the original is in the spellchecker's dictionary?
I always thought spellcheckers would incorporate the keyboard layout into their suggestions, as in correcting "worls" to "world", because s is one key away from d
I'm sure some do.
Same! I’m always like “why can’t you tell that I just missed one letter!!!”
Keep in mind many different keyboard layouts exist. You could also have a case where a written file is OCR'd in which case that wouldn't be relevant.
This and parts of speech.
Maybe track the most common errors based on vocab and document length, like the YouTube algorithm recommending videos based on age, gender, etc.
Of course it does now. There is a video from Enrico Tartarotti released recently "The LIES That Make Your Tech ACTUALLY Work" where you can learn more about your idea and how it is implemented!
Would have loved YouTube 30 yrs ago. In my day (yeah, I'm old) I had a class in which the last assignment was an assembly program for the Intel 8086 that implemented a spellchecker. Prof said it would take 40 hrs if we knew what we were doing. No mention of Levenshtein, Gorin, or any known algos. I took a 0, as I was behind in other things.
That's insane
What is an interesting addition to the algorithm is actually providing a list of the changes between the two, like for a typewriter.
I read this like 7 times and I can’t tell what you are trying to say
@@maker0824 not sure what they meant w/ the typewriter mention, but i read this as meaning implementing something that provides a diff-like output (character-by-character instead of by line, tho)
You should absolutely make more videos like this! You're extremely good at explaining things and this video was genuinely so interesting. Well done :)
that was the best explanation of Dynamic programing ive ever heard
It's very nice to discover dev channels with quality content and interesting topics, keep up the good work!
Love your work ❤. Make a series on Programming Algorithms 🙌
I used a Levenshtein program to match software names and categories to a list that I scraped from somewhere or other. Worked nicely, but took a while to run, even running on 12 cores because it was running 140,000 unsorted items against 40,000 items with a category and type. Still, 5mins isn’t bad compared to how long it’d have taken to do it manually.
im using it in my application to actively read text boxes and compare them to a script im using.
Levenshtein algorithm can also be extended to calculate the Damerau-Levenshtein distance. Simply put, this means that you get another operation that switches two neighbouring letters. E.g. the words "world" and "wordl" have Levenshtein distance 2, but Damerau-Levenshtein distance 1 since it is enough to switch the last 2 letters. Especially in keyboard typing, such errors are common. It is also possible to fine tune even more by giving weights different from 1 to the operations.
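For anyone curious, a minimal sketch of that transposition extension (the restricted, "optimal string alignment" variant), assuming unit costs for every operation:

def osa_distance(a: str, b: str) -> int:
    # Levenshtein's three operations plus swapping two adjacent letters.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(osa_distance("world", "wordl"))  # 1, as described above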
Essentially what they're proposing is that the weight of 2 substitutions actually has the weight of 1 substitution. And this goes into a much deeper topic, which is that realistically the weight of each mistake (insertion, deletion or substitution) is actually not equal. There are things like phonetic mistakes where two similar-sounding letters are interchanged, which happens a lot in French for example. Common spelling errors have roots, and generally it's because of phonetics: double consonants sound the same as single consonants, etc.
In the Deep Learning approach, you could build a model which would in fact be able to extract these features, including but not limited to insertion, deletion, substitution, phonetic difference, and common spelling mistakes, that would determine the true distance between spelling errors and better determine your intent when writing.
@@stt.9433 It's not a proposal, it's an algo from 1965, a base for all search engines. Levenshtein distance in its pure form is very insufficient for real applications and has been used since forever without any need for ML (doh)
i like your videos, because they dive deep into tiny important stuff which really helps a lot
This was a great video, this explanation was made so intuitive and I have wondered in the past how spell checkers work
levenshtein distances is basically a pathfinding algorithm.
what??? its not even remotely close to that
@@Zaary i agree
Yes, it's working out the unknown path (there could be more than one) from one word to another, that's true.
For anyone confused: he's saying that because the Levenshtein distance is a "metric", which basically means that if you imagine all strings as points in a space, the Levenshtein distance works much the same as distance in real space.
It sounds kind of meaningless at first, but if you use it that way it actually unlocks certain properties of strings that enable some other clever algorithms for searching text.
pathfinding makes it possible to backtrack, this does not. it has only 1 thing in common with pathfinding - finding the shortest path. this algorithm however works completely differently from pathfinding and has nothing in common with it.
Oh my god! This was so simple to understand.
Thank you so much. Please keep these coming :)
This is awesome, I learned a lot, thank you! I heard that some spell checkers use tries (prefix trees) for better auto-completion. I'd love to see a video on those as well, I adore your way of explaining!!
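For reference, a tiny trie sketch in Python (nested dicts, with "$" as a hypothetical end-of-word marker), just to illustrate the prefix idea:

def make_trie(words):
    # Nested dicts; "$" marks the end of a complete word.
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def completions(trie, prefix):
    # Walk down to the prefix node, then collect every word below it.
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found = []
    def walk(n, suffix):
        for key, child in n.items():
            if key == "$":
                found.append(prefix + suffix)
            else:
                walk(child, suffix + key)
    walk(node, "")
    return found

trie = make_trie(["boat", "boar", "float"])
print(completions(trie, "boa"))  # ['boat', 'boar']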
Great video! You should make more where you explain interesting algorithms. Maybe you can do Bresenham's line drawing algorithm next. Keep it up. 😃
Thank you for this! The visuals are great!
Great video! Love the username. I will definitely be watching more and subscribed. I'm a python novice and don't code but love to see how the sausage is made. Maybe one day I'll get into the sausage biz.
Although I was familiar with the algorithms presented in the video, the visualizations were great and helped me understand them much better. Thank you.
Great video! Thanks for the excellent explanation. I found it really friendly and easy to understand.
I would love to see more of these history of algorithms videos.
It was extremely wonderful. Thanks for your great explanations 😍
I saw "spell checker alogrithm" I subbed. thanks for the video and hoping to see moreeee!❤
Wow, this is also what i learned in uni, but in the context of DNA sequences, because they can also have deletions, insertions, and substitutions (i studied bioinformatics)
Wow this video is amazing and now I have learnt the core concept behind the spell check :)
i was hoping to learn about the modern algorithms, but well, now i know the history behind it. hope to see a part 2
Great video! Thanks for making it!
I would love to see you expand this video/topic to include the use of different types of edits having different probabilities and/or 'costs', which is a useful and interesting application for things like calculating the 'distance' between two things which have different physical/theoretical processes for causing different kinds of edits.
For example, in DNA sequences, nucleotide substitutions might be much more common than deletions or insertions. And perhaps deletions are more common than insertions -- or vice versa.
One way to model this is to have less-common types of edits 'cost' more than more-common ones. Another way to model this is to go by actual probabilities (aka likelihoods).
There are algorithms which incorporate such ideas, and can be solved in a similar way to the Wagner & Fischer method, but unfortunately I can't recall the name(s) off the top of my head.
But still, it is a really interesting question with really interesting and instructive solutions, so IMHO it would make a great topic for a follow-up video. What do you think?
Cheers!
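For reference, a minimal sketch of the cost-based version described above. The operation costs here are made-up placeholders; real ones would come from observed probabilities (e.g. cost = -log(probability), so minimizing total cost maximizes likelihood):

def weighted_edit_distance(a, b, ins_cost=1.0, del_cost=1.5, sub_cost=1.0):
    # Wagner-Fischer, but each operation pays its own (unequal) price.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,   # deletion
                          d[i][j - 1] + ins_cost,   # insertion
                          d[i - 1][j - 1] + sub)    # substitution / match
    return d[m][n]

print(weighted_edit_distance("boat", "float"))  # 2.0 with these costs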
As others also pointed out, this is so similar to Needleman-Wunsch and/or Smith-Waterman, and it's insane to me. I learned about bioinformatics algorithms and now end up in a situation in which I can think of sequence alignment as being responsible for my spellchecking
Nice video, very pedagogical. If you ever get tempted to make a follow-up, there is an optimization where instead of computing the entire matrix you only compute the distance below a threshold d; this corresponds to computing a wide diagonal in the middle of the dynamic programming matrix.
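A rough sketch of that banded idea, assuming a threshold d (a cell (i, j) can only lie on a path of cost <= d when |i - j| <= d, since every off-diagonal step costs at least 1):

def banded_distance(a, b, d):
    # Compute only the diagonal band of width 2d+1; treat everything
    # outside it as "already more than d edits away".
    m, n = len(a), len(b)
    if abs(m - n) > d:
        return None  # the distance definitely exceeds d
    BIG = d + 1
    prev = [j if j <= d else BIG for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [BIG] * (n + 1)
        if i <= d:
            cur[0] = i
        for j in range(max(1, i - d), min(n, i + d) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n] if prev[n] <= d else None  # None means "> d"

print(banded_distance("boat", "float", 2))  # 2

This runs in O(d * min(m, n)) instead of O(m * n), which matters when you scan a whole dictionary.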
Great explanation, and your voice is pretty soothing
What an exceptionally good and well researched video
Two points: it's actually the Damerau-Levenshtein algorithm; and the implementation given is O(n^2), which is unnecessary. You can use a moving window into the grid that is a diagonal stripe wide enough to hold the maximum acceptable edit distance. That makes the algorithm O(n).
I meant that the commonly used algorithm is Damerau-Levenshtein.
really helpful ngl, didn't know much about spell checkers! but now i understand we really need NNs in this area because of how bad the functions are
I've done some work with Levenshtein distances in the past, but it's cool to see what's actually going on under the hood. Thanks for this!
This was a great video, MAKE ANOTHER ONEEEE !❤❤❤
Thank you so much for this video!
6:36 The Wagner-Fischer algorithm looks a lot like the Needleman-Wunsch algorithm (it is also a dynamic programming algorithm, used for alignment of nucleotide, protein and other genetic sequences). It's possibly the same algorithm but repurposed for alignment in genetic sequences.
This is very interesting! Great video.
i really thought it was just ai, didn't realize it was already this old, good video quality!
Just amazing explanation
Can you upload DSA contents with visualizations? It would really help. Enjoyed this video, will try implementing it myself.
Omg could this be a channel about algorithms? 🤩
Amazing video extremely interesting, simple and high quality
Algorithms with historical context videos are the best.
Wonderful video!
Amazing video! Well done!
I thought it'd be obvious to incorporate the distance between two letters on a keyboard into the calculation, but I was surprised that after so many iterations it's still not there!!!
I love the kingsman reference
This was awesome!! I think I just found an awesome new channel :D
super informative .. thanksss
Best way to teach Dynamic Programming is just simple hashmap memoization of the recursive function, and only teaching the 2D matrix after solving multiple DP problems with memoization.
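For instance, a minimal memoized version of the recursive Levenshtein definition (a sketch; the cache is what turns the exponential recursion into O(m*n)):

from functools import lru_cache

def lev(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # go(i, j) = edit distance between the suffixes a[i:] and b[j:]
        if i == len(a):
            return len(b) - j   # insert the rest of b
        if j == len(b):
            return len(a) - i   # delete the rest of a
        if a[i] == b[j]:
            return go(i + 1, j + 1)  # match: free
        return 1 + min(go(i + 1, j),       # deletion
                       go(i, j + 1),       # insertion
                       go(i + 1, j + 1))   # substitution
    return go(0, 0)

print(lev("boat", "float"))  # 2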
this video is causing flashbacks to the time I wrote autocorrect for bash
Great video. How did you create these animations? Did you use manim or something else, or have you done it with After Effects? I'm just curious
I'm curious too!
Excellent!!! Thank you very much!!!
Dynamic Programming is one of the coolest design techniques in computer science. The first time I learned it I was amazed. Kudos to Richard Bellman, who first developed the idea for it
This algorithm is very similar to the one you use for finding the longest common subsequence between 2 strings, a very popular LeetCode question
As well as Edit Distance (closely related to LCS) - this is the first thing I noticed when he started explaining the video, as it felt as if I had solved this before.
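For reference, the LCS table is built the same way; a minimal sketch (and for the restricted edit distance that only allows insertions and deletions, distance = m + n - 2*LCS):

def lcs_length(a: str, b: str) -> int:
    # Same (m+1) x (n+1) table shape as Wagner-Fischer, different recurrence.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1   # extend the common subsequence
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[m][n]

print(lcs_length("boat", "float"))  # 3 ("oat")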
my head hurts, crazy sudoku
The matrix is similar to an action table used to determine the symmetry of a group under an operation. Basically the math of DP which, if you think about it, is fractal
Very good video, but I guess using a binary search tree on a pre-sorted list of words would be more efficient, giving a worst case of roughly O(log n) in the above example. It would also both check whether the word is correct and find the suggested words by traversing the tree only one time. Correct me if I'm wrong, please
That's very interesting. I'm wondering if perhaps this algorithm could be implemented more efficiently in an array programming language like APL or J?
Wow, I really want to write the Levenshtein algo!
One thing I've always wondered is how to find the distance between strings from a dictionary when the strings contain a close substring starting at an ambiguous index.
It's not very intuitive, but thanks for the video.
How do you do these animations? Are you using some kind of library like Manim?
Whenever I see matrices, I think GPU. GPU accelerated spell checker?
now this is awesome
This algorithm is also used in the field of bioinformatics, to solve sequence alignment problem.
I wrote this version of the Levenshtein formula in C just now. It's recursive, but I optimized the two length checks so they only happen once: we decrement the length values as we advance the string pointers.

/* Levenshtein distance, recursive version */
#include <string.h>

#define min(a,b) (((a) < (b)) ? (a) : (b))

static int _lev(const char *s1, int sl1, const char *s2, int sl2);

int lev(const char *s1, const char *s2)
{
    /* Measure the lengths once; the helper maintains them incrementally. */
    return _lev(s1, (int)strlen(s1), s2, (int)strlen(s2));
}

static int _lev(const char *s1, int sl1, const char *s2, int sl2)
{
    /* Base cases: distance to an empty string is the other string's length. */
    if (sl2 == 0)
        return sl1;
    if (sl1 == 0)
        return sl2;

    /* Matching first characters cost nothing: advance past them. */
    if (s1[0] == s2[0])
        return _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1);

    int a = _lev(s1 + 1, sl1 - 1, s2, sl2);         /* delete from s1 */
    int b = _lev(s1, sl1, s2 + 1, sl2 - 1);         /* insert into s1 */
    int c = _lev(s1 + 1, sl1 - 1, s2 + 1, sl2 - 1); /* substitute */
    return 1 + min(min(a, b), c);
}
Reminds me of my 1st semester in uni, but ZIEGE and TIGER instead of FLOAT and BOAT
what a great video. food for curiosity
great video!
Your explanation helped me a lot! But I think I identified a little mismatch in your explanation: I think that when m[0][j] == m[i][0] (the two characters match) we should copy the value in m[i-1][j-1] instead of selecting the minimum of the three neighboring positions. In some tests your method works, but sometimes it fails. Sorry for my English...
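For reference, the standard cell update as a sketch. Taking the minimum of all three is still correct on a match, because the substitution term then costs 0, and adjacent table cells never differ by more than 1, so the diagonal always wins; copying the diagonal and taking the min are equivalent. A failure would only appear if a match still added 1 to the neighbors:

def cell(d, s1, s2, i, j):
    # One Wagner-Fischer cell (1-indexed into the table, 0-indexed strings).
    cost = 0 if s1[i - 1] == s2[j - 1] else 1
    return min(d[i - 1][j] + 1,         # deletion
               d[i][j - 1] + 1,         # insertion
               d[i - 1][j - 1] + cost)  # substitution, free on a match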
Great video!
what's the font? looks good
overall nice video
Something that I have always found interesting and amazing, and have wondered about, is how Windows defrag works: how it goes through everything, sorts, puts things aside that are in the wrong place, deletes data that is no longer needed, and then reassembles everything in the correct order. I have a hard time getting my head around how it does that! 😵‍💫
00:01 Spellcheckers rely on a sophisticated algorithm for accuracy
01:33 The Lenin distance algorithm was crucial for enhancing spell checkers.
03:10 The algorithm follows guard clauses and recursive comparisons.
04:54 Lenin distance algorithm is not practical due to its recursive nature
06:37 Wagner-Fischer algorithm uses dynamic programming for efficient spell checking.
08:23 Explanation of operations involved in transforming strings.
10:02 Wagner-Fischer approach calculates edit distance efficiently
11:42 Spell checkers use edit distance to suggest correct words.
Crafted by Merlin AI.
not only is he a communist, he's also a computer scientist!
crafted by a meatbag
Oh no, communists are back to destroy computer science with the Lenin algorithm 😂
interesting, my quip about the Lenin distance is deleted? Did I offend a communist?
@@michaeldula462 I guess YouTube took it personally lol
Quite an interesting history.
Could add a cache layer so we never have to check a misspelled word more than once. That counts as an easy improvement.
What theme do you use for VSCode? It looks so good
I tried thinking of a way to check against a dictionary faster. while Levenshtein distance is computable in O(nm), using it repeatedly would lead to O(nmk) if the dictionary has k words. The string space sort of behaves like a metric space, with stuff like the triangle inequality. I believe in computational geometry we know how to efficiently find the “k nearest neighbors” in Euclidean space, but Idk how to do that for the space of strings . I was curious if there’s a way to use Levenshtein distance smartly to only perform something like log k queries. If that were possible, the running time would effectively be O(log k) since the lengths of individual words are much smaller than the length of an entire dictionary.
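One known structure for roughly this is the BK-tree, which uses exactly that triangle inequality to prune whole subtrees. It doesn't give a guaranteed O(log k), but in practice it touches a small fraction of the dictionary; a minimal sketch:

class BKTree:
    def __init__(self, dist):
        self.dist = dist      # any metric on strings, e.g. Levenshtein
        self.root = None      # node = [word, {distance: child_node}]

    def add(self, word):
        if self.root is None:
            self.root = [word, {}]
            return
        node = self.root
        while True:
            d = self.dist(word, node[0])
            if d in node[1]:
                node = node[1][d]   # descend along the matching edge
            else:
                node[1][d] = [word, {}]
                return

    def query(self, word, tol):
        # Triangle inequality: a subtree hanging off an edge labelled c
        # can only contain matches when |c - d| <= tol.
        matches, stack = [], [self.root] if self.root else []
        while stack:
            node = stack.pop()
            d = self.dist(word, node[0])
            if d <= tol:
                matches.append((node[0], d))
            for c, child in node[1].items():
                if abs(c - d) <= tol:
                    stack.append(child)
        return matches

# usage, assuming a lev(a, b) function like the ones above:
#   tree = BKTree(lev)
#   for w in dictionary: tree.add(w)
#   tree.query("helo", 1)  -> all words within edit distance 1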
You make the Levenshtein algorithm look so easy, thanks so much
can we have a keyboard/setup tour
Nice video 👍👍❤
Could you weigh the edit distance to favor letter substitutions that are physically close in the keyboard?
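You could. A hypothetical sketch of such a substitution cost (the layout coordinates and the 0.5/0.25 scaling are made up; plug it in as the substitution cost in a weighted Wagner-Fischer like the one sketched further up):

# Approximate (x, y) positions on a QWERTY layout; rows are offset a bit.
KEY_POS = {}
for row_y, row in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for col_x, ch in enumerate(row):
        KEY_POS[ch] = (col_x + 0.5 * row_y, float(row_y))

def keyboard_sub_cost(a, b):
    # Substituting physically close keys is cheaper than distant ones.
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
        gap = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        return min(1.0, 0.5 + 0.25 * gap)  # adjacent keys cost ~0.75
    return 1.0

print(keyboard_sub_cost("s", "d"))  # cheap: neighbours on the home row
print(keyboard_sub_cost("s", "p"))  # full cost: far apart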
9:30 why so? Is this equivalent to the square bracket in the Levenshtein formula? If yes, which box stands for which formula in the square bracket?
Or perhaps this is left as an exercise for the reader lmao. I'm a bit lazy, I'll look over it one more time 😅
this is some good content
11:29
In your wagner_fischer implementation, why are you incrementing change? (line 17) If "previous_row[j-1]" was guaranteed to always be the smallest value, and none others shared that value, maybe it would work. Why not choose the minimum first and then add 1 to it after checking if the two letters are not the same? Or am I misunderstanding something?
Why does this awesome channel have such a low number of views??
I presume this Wagner-Fischer algorithm is also what is behind the edit distance (file diffing) in git
Do modern spell checkers take into account likely errors due to typing? I.e. "onky" is probably "only": it's not only one edit distance away, but that edit is also only one key away.
u r amazing bro, u r directly helping me do my Ph.D.
Awesome video
I was also thinking there is some kind of trie based solution
Edit distance is a famous problem asked in software engineering interviews!
really cool video
If you have enough data, maybe create a map with all wrongly typed words as keys, where the values would be arrays of corrections?
Which tool are you using for the slides and transitions?
Great video, joined as a sub.