Rolling hash | Rabin karp algorithm | Pattern searching
Вставка
- Опубліковано 11 жов 2024
- This video explains the rolling hash technique with the help of rabin karp algorithm which is used for searching a given pattern in text. I have explained both rabin karp algorithm as well as rolling hash by taking suitable examples. I have shown how to create a stronger hash in order to match pattern in given text efficiently. This is a very frequently asked interview question as well as a frequently faced question in the coding round as well. CODE LINK for rabin karp algorithm is present below. If you find any difficulty or have any query then do COMMENT below. PLEASE help our channel by SUBSCRIBING and LIKE our video if you found it helpful...CYA :)
CODE LINK: gist.github.co...
For every confusing topic in a textbook or a course, there is a video by an Indian guy on UA-cam explaining it much more clearly. You sir, are a legend.
🤣 Thanks
Hilarious!
@@techdose4u Pls make a vedio on Approximation Algorithm. And it's various Types..pls bana dhijye khi nhi available hai
And then there are the guys who troll indian
no bhaiya didi bs, no time wasting, no overexplaining.
short and concise, to the point video.
Love you videos, perfect for an individual who values time.
Thanks
Coming from day 21 may challenge. He referred me to this video for rolling has. Now he referring to another video. He truly is a programmer!
🤣 hahahaaa
He is using rolling technique :p
Who is the 'he' y'all are talking about?
@@AyushMo The channel they mean. In other videos, he has asked his viewers to also refer to this video.
@@rohansamanta7819 ah okay, thanks
okay the rap in the end was icing on the cake, the explanation was clear, concise and to the point.
😂 Thanks
If I see a video by you, I heave a sigh of relief because I know that it will be superbly explained and that I need to look no further.
sir i have been continuously watching you videos. since then , i started loving programming ...a very big thumbs up
Thanks :) Recommend your friends if possible as well.
sir , does there no such hash function exists so that no collision occur as well as the size limit also not exceeds ?
We can implement hash using trie. That will be very simple. If you don't want collision then take larger hash size. If size is limited then resolve collision by linked list at each node. You can also use trie for hash implementation. Will put a video in future for it as well.
When it comes to learning new algorithm, I first check if Tech Dose has any video related to it, if its there then it becomes my first choice automatically
i was surching for a tutorial on how to roll hash joints but i watched the whole video in the end
:)
what a great job you are doing for programming community !!
Thanks :)
starting with trivial approach was a really nice move.
Thanks :)
how do they even come up with such algorithms!!
Really amazing explanation, Thanks a lot
You are my number 1 youtuber on algorithms and dsa. You simplify life sir
Wah Wah Wah !!!!! No words to express feelings. Really Awesome
Thanks :)
Thank you very much sir for saving me hours of pain and frustration
Welcome :)
Why UA-cam provide only 1 like button 😞....
btw best channel for placements🤗
😁
bhaisahab tumhare hash function ne to dimaag hi uuda diya bhaiiiiii
galat kehl gaye vro app to
i never thought of taking modulus at every addition, demmmm
kryptologisht101 :D :]
😅
I understand that in order to make sure that the hash function strong, we need to factor in the position of the alphabet in the pattern, instead of taking base value as the alphabet size =26, can we consider the size of the pattern itself, like for e.g. for pattern BAA, pattern size is 3, and the hash will be 2*(3^2) + 1*(3^1) + 1*(3^0)
For every DSA problem, there is a solution video by Tech dose. 🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾🙏🏾
😅
This is a very nice explanation which helped me understand the benefit of a rolling hash (reusing the parts of the window you already calculated), and how to generate a stronger hash.
One minor comment: I don't think it is technically correct, in the case of base 26, to say you are shifting "bits". A bit is a binary digit, which by definition is base 2. You can just say "shift . . . by one digit".
In fact, to encode something with 26 characters, you'd need at least 5 bits per character. Thus, you need to shift at least 5 bits, each time. 🙂
I'm trying to roll hash for smoking but I still watched anyway 💯
Excellent explanation. No one explained the algorithm better than you. Good job
Concise and Exact explanation.Thank you.
Welcome :)
Communication skills are very Good 🔥👍
You are my hero...........best explanation
Thanks 😊
I don't understand why at 11:05 we are subtracting only (1 * 26^2)%p ? While creating hash for the 3 character long window we added it with two other characters' hash but after each addition we modded the result with p. i.e. hash[A] = (1 * 26^2)%p; hash[AA] = (hash[A] + 1*26^2)%p; hash[AAB] = (hash[AA] + 2*26^2)%p. The value of hash[A] has been modded twice in the final hash so how come while removing it we subtract it without any consideration for the mods. i.e. how come hash[AB] = hash[AAB] - hash[A] ? Shouldn't the double modding come into the equation somewhere?
Bhaiya ur videos r very helpful
Please upload more videos
Sure
Hello guy's welcome back to tech dose and in this video we will discuss.......so let's look at problem statement.... 🤣 🤣 🤣
These are my goto lines 😂
Superb explaination.
Thanks 😊
there are many things to remember in this ....1) collision(for X1 ,X2 if h(X1)==h(X2) and X1!=X2) , collision frequency 1/p where p is some big prime number . 2) an interesting mod property we r using here i.e for a[1......n] there exist a^-1[1......n] such that a*b*a^-1= b mod p, where p is some prime number .
ALGO: (general)
our class has 3 funnction
1. hash() -> return Hk;
2. pop()-> pop last value;
3. append() -> append new value;
constructor() -> take base , p {base in our case 26 , prime number p}
basic opperation including append and pop() // its intuitive...
Hw= (n-old*base^size-1)*base +new
__init__(base ,p){
hash=0;
constant=1 // constant= base^size mod p
}
append(new ){
hash= (hash*base + new ) mod p;
constant= (constant*base ) mod p;
}
pop(old){
hash=(hash- old*constant + old*p ) mod p // old*p is correction factor since hash-old*constant
Precise explanation.
Helps develop good intuition …
Great
The explanation I found so far
Beautifully Explained
Thanks mate! That's great explanation!
very nice and complete explanation!!!!keep making such videos
Sure :)
@@techdose4u i am implementing my own code as per your video but getting integer overflow at one place despite using mod operator!!! If i share my code will you be able to help?? It will be highly appreciated
it seems to be quite easier than it look like.
the video was very helpful, thankyou for saving my time.
Welcome :)
I really love your video, it really helped me understand!!
Nice :)
really??
While subtracting (11:08 min), shouldn't we subtract 1x26^2%p, we only subtracted 1x26^2, is this correct?
since 26^2 is already lower than 5381, so he didn't bother, but yeah in code you have to.
Amazing explanation
thank you for this beautiful explanation. sir, you are gain new subscribe...
Welcome :)
Great explanation sir!!!!!!!!!!!!!!!!!
Thanks
Awesome Explaination....can you please provide the code also for the above, the link you have given in description redirects to the GFG solution which is quite confusing
The best explanation.. Thank you
Welcome
Thanks a lot for the video!
Bhaiya after the hash improvement , why the worst case t.c is O(N.m)
Because we atmost have to search for O(N - L +1) bcs we don't have to compare the pattern bcs we will be having the different ans for diff. Pattern and when the hash ans matches then we can guarantee that the pattern is exactly same , and in the constant time we are able to find the hash value , then it should be O(N -M +1) time.
Very Nice Explanation. Thank you :)
thanks for making these videos😇
The code which u have provided is from gfg right... But still the explanation was great
Yes. Even if I would have written, the code would have been the same 😅
I am not able to understand the code given in description,
does the code vary from given hash formula and if it is then what formula has been used in the code.
please help.
what is the use of rabin karp Algo..when we have to compare letters again(in the code that you provide)?
and when i tried to solve this by your method ...that you've explained...it's giving me negative values for text hash value
very nice explanation
try to explain algo also in video..keep up with good work
👍 What was missing?
This is brilliant!
😁
First of all big thanks for making these videos, you are doing a great job!
I have a question , I am kind of beginner, so please correct me if I am wrong.
For this pattern matching brute force approach , instead of matching all the substrings of the window size ( size of pattern string) , if I first check if the first character of the pattern string is same as the first character of the current substring of text , and then do the matching operation , I will not really compare with all the substrings. Will the time complexity still be o(n m ) ? If so, is that the case when will not find any substring?
Late reply, but in case you still need an answer.
Yes, the time complexity still be O(n*m) in worst case like no substring were found of the matching is in the last position. But in real practice, it may perform just a little bit better than the brute force approach, and even not as effective as the basic hashing formula in rolling hash
Hi....what if we use a hashing function like we convert the pattern into its corresponding decimal number
You can use any hash fn but the goal should be to minimize the collisions when two strings are different.
simply awesome!!
Thanks :)
What problems could we encounter if we don't shift and simply add the value of the next character? A rigorous math approach would be nice.
Thanks in advance.
flawless
The code provided is doing a different hash value calculation than that is being mentioned in the video. Can you please explain it?
Please watch my video on Longest duplicate substring. I have implemented the hash there. This code was taken from geeks and so it can slightly differ.
since we are taking exponential values, won't it lead to overflow for very large strings?
We will use MOD for that. ex. 1e9+7
Great explanation, 11:20 you forgot to put the mod P right?
Maybe 😅 I will have to check.
Yea but it is not a issue
Thanks so much it helped me a lot
Welcome :)
thannku so so muchhhh
Please could you explain with smaller prime number , so that I understand compute when mod p comes into picture, here the prime number is very big and so it is equivalent to not having mod p operation
yes how calculated hash value came out to be 704 if we are using prime number 5381
when you calculating hash for the next window you are substracting first character and that can give you a negative result which will break all calculation
You need to perform a mod function after the subtraction to bring it back positive. In his example it wasn't required, but you are correct that sometimes it will be.
Is this algorithm obsolete like the bubble sort.
Since you have other efficient algorithms?
It's useful atleast when you want to implement something. You can't implement harder and efficient codes easily. So this is useful for interview prep atleast.
@@techdose4u This is something looking like, whatever I want I can implement for searching questions. Harder the hashing function, faster the algorithm.
Nice Video💘💘
nibar prak : 1:00 - 1:51
gnorts hsah : 6:40 - 9:25
thnx a lot
Welcome
In stronger hash shouldn't you be subtracting 1x26^2%p?
I have the same question
Pls jaldi ak ques karwa dijia hash with set mhujhe bahut dikkat aa rhi hai
Thanks
Welcome 😊
Got it
nice explanation :)
Thanks
its 13 min. long but it takes 1 hr to note it down in notebook along the video
:O
Hey, why the complexity is O(n*m)? We have a stronger hash that matches only if the pattern and the window are the same no? We can make it O(n)?
Yes. If you can guarañtee that hash will match only when required then it will be O(N). But you need to have a proof for this assumption.
lets say for hash we took 1mb data as argument and return something of word size say 32bit /64bit integers .
HASH(1mb)---->word(32bits) //our hash function.
for simplicity lets talk just for ASCII(256) so our possible hashes can be 256^1million and for 32 bits i can only have 2^32 values .
256^1million ------------to match------------> 2^32
therefore for hash functions UNIVERSE size is always bigger then output size we are bound to have collisions .
but collision frequency is 1/p {where p is some prime number}
so u might want to have some very big number as p......but that will increase your operations time . so its a trade off .
@@sharinganuser1539 Thanks Uchiha.
How can we match two string ij 0(1) time ? I read it in editorial please explain sir
I'm doing this in a question and i got TLE and in the editorial is written that we can compare the string in 0(1)
Can you explain the procedure .
Thanks in advance
If you have preprocessed and stored as integer then you can compare in O(1)
one can compute hash in O(1) time and that's the beauty of rolling hashes . But in order to see string u need to traverse at least ones so O(n) is best any string matching algorithm can perform .
Bhai ak ques asa karwa dijia jisma hash ka sath (set) bhi use ho rha ho
Sir why p[i] - 'A' at 2.11(at time )? why subtraction is happening?and Why 'A'?
Subtraction to get relative ASCII value
@@techdose4u why not 'B'? Or any other Alphabet like 'B', 'C'?
Because A is the first letter hence counting should start from A. Doing A-A will give 0 and B-A will give 1 and so on.
I think taking power of an integer can cause trouble if we have a larger pattern to match
you can its mod value with 1e9+7
Can someone explain why this hashing work. Or where will it fail. How to come up with a hashing strategy under different scenarios. Much appreciated. Thanks.
hashing may not be perfect or collision less, cause,
lets say we have a 100 length string then there would be 26^100 diff strings to be hashed [for 26 diff characters] so it is impossible to store 26^100 different hashvalues, thus collisions are always a possibility. here in hashing we focus on probability of failure, thus aiming at minimising the probability of any such collisions.
Thank you!
Welcome
cool
Time complexity of the video closing is O(1)
sir please make a video on Z Algo
Okay sure
Sir please explain the code too for this solution because i understand the logic but not able to frame the code even seeing from gfg.
I had explained the code in a video but don't remember now 😅 It was some leetcode problem
@@techdose4u Longest Duplicate Substring | TRIE | Rolling Hash | Binary Search | Leetcode This one Sir😅
Thanks anushka :)
Bro what could we do if they are providing some of the special character also 😶😶....how do we provide the hash function values for special character and some of the capital alphabet tooo
Modify the hash function as per your wish. The goal is to minimize same hash values for different patterns :)
Mmm i have an idea....value=primenumber%asciivalue(pattern[i])%(primenumber%asciivalue(pattern[i+1])%.....%
SIR, AT 7:11 WHY ARE YOU TAKING PRIME NO. ??
can u explain clearly the concept of taking prime no. only not any other no.
The worst case time complexity of this algo can be O(n*m) so how is it efficient?
The average time complexity is O(n), which is more likely to occur based on how well you defined your hash function
Hi
// Calculate the hash value of pattern and first
// window of text
for (i = 0; i < M; i++)
{
p = (d * p + pat[i]) % q;
t = (d * t + txt[i]) % q;
}
how this is transforming into the explanation in video where we use base^(m-1) * charAt(i)+ base^(m-2) *charAt(i) ...... + base^0*charAt(i)
Watch today's video. You will get more clarity on this topic.
this is intuitive think in base 10 for example ....how u add . i.e lets window be [12345] and we r at 1234 so how u'll add last char is 1234*10+5 i.e p*(base)+ char[i].....nd mod it to prevent overflow.
can we use strstr?.... if(strstr(originalstring,pat)!=NULL) printf("Yes")....
You can use find as well as strstr. But when this question comes then surely you won't be allowed libraries.
@@techdose4u ohkk.... thanku sir
Welcome :)
Strstr internally traverses and it doesn't make sense to use it during interviews.
@@ankuradhey thank u sir
Can you explain this from your code? What is this h ?
for (i = 0; i < M - 1; i++)
h = (h * d) % q;
check this, you will get the answer medium.com/swlh/rabin-karp-algorithm-using-polynomial-hashing-and-modular-arithmetic-437627b37db6
what is the name of the software you are using?
Techsmith sketch.io
I’m sure this guy rolled a hash before inventing this algorithm 😝
💕✌🏼😶🌫️ This is not the hash I was hoping to learn how to roll...
before this i was thinking I know maps, no need to learn hashing. I was wrong.
Code doesn't work!!!
Those who watch on 2024 assemble here guys 🫂🎉
Sorry sorry my mistake it's not that rolling 😀