Bloom Filters | Algorithms You Should Know #2 | Real-world Examples
Вставка
- Опубліковано 17 лис 2024
- Subscribe to our weekly system design newsletter: bit.ly/3tfAlYD
Checkout our bestselling System Design Interview books:
Volume 1: amzn.to/3Ou7gkd
Volume 2: amzn.to/3HqGozy
Digital version of System Design Interview books: bit.ly/3mlDSk9
The Secret Sauce Behind NoSQL: • The Secret Sauce Behin...
Animation tools: Illustrator and After Effects
ABOUT US:
Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.
Asking/answering the right questions, information dense and technical. I love this channel.
Levelling the playing field by making info like this easily accessible to anyone
Yesterday I binge watched all your videos 😂 to prepare for an interview, and then had nothing left to watch!
I’m glad that you uploaded again, you’re videos are awesome well explained, your channel will grow fast. Keep the good work! 👏👏
How did the interview go?
@@TheAyushSomani sorry I didn’t see your reply. I was hired on September! After 4 months of job hunting and countless applications. I did more than 100 applications 😂 and I received 2 offers. Practicing interviews and getting rejections definitely made me improve a lot. It’s been a great experience so far on my job.
@@MrX-nc8cm Wow! Congrats from Korea:) What country do you live in? and I guess it's pretty competitive to get into a software engineering job there.
If you like Bloom Filters - I highly recommend reading the SuRF : Practical Range Query Filters from CMU. It's a little more complicated but shows how you can take the same concept and apply it to eliminate arbitrary size ranges.
For someone who does not like to read there is a presentation ua-cam.com/video/OD29hZww-DM/v-deo.html
Definitely more exciting to watch than many Netflix series 👌🏼
Comparing a DSA tutorial to a netflix series doesn't make sense.
Stop the cap
This channel is an absolute gold mine of knowledge for software devs
Learned this great data structure when I was at grad school, which completely blew my mind back then. I recall the number of hash functions can also be tuned to control the probability of false positive, in addition to the length of the buckets. Great video as always! Thanks!
I love how you make any topic so easy to understand. I tried reading the same topic on wikipedia and got utterly confused but the way you explained it, it clicked immediately.
Very good realization, the animations are remarkable and very useful.
I feel smarter everyday after watching ur design videos.
This video is awesome. I read about Bloom filters a long time ago in wikipedia but I didn't get it at that moment. This video explains it so clearly. I love it.
This channel is a gem !
I've looked up bloom filters before and knew what they were used for but this actually made them make sense to me. Thank you so much!
thank you, easy to learn, this videos should get million viewers
I don't know why but my every morning starts with one of these short videos
This was beautifully explained!
Short and clear ! I love it.
Thanks for your easy-to-understand explanation! Amazing to see a data structure applied to so many different uses.
Been fascinated with this data structure for some time. Hearing the Akamai perspective is very interesting, particularly since I manage a large proxy installation that does subscribe to an advanced rating service. Imagine needing a quick, memory efficient data structure that can give you a no/maybe answer very quickly. Sure, on the 16 proxy servers we have 128GB of RAM each. But imagine the memory requirements for virtually every FQDN on the planet.
but kinda look for false positives
@@yatinarora1252 It's been a while since I was taught about this, but I'm pretty sure that was said to be one of the weaknesses. It essentially caches positive findings, but lookups are called for otherwise (or maybe it's the other way around). And, collisions can accumulate (or something like that), so there has to be some way of clearing things when they get stale. So, it's not that's it perfect. It's fast and compact, but you need extra algorithmic handling to manage it's weaknesses.
Great stuff, never heard of bloom filter before.
I was confused until you explained it with the food example. And, suddenly it clicked for me. It was Ahha moment for me.
I was actually more confused... loving pork chops must always be true!
03:45 Using three hash functions is the key. It would be helpful if its highlighted in the video. Thanks for explaining the concept.
2 years after the video release, but browers are using cascading bloom filters to do SSL revocation checks now! Supper interesting when I learnt that.
Steve Gibson on a podcast called security now (ep989) did a great chat on it
Amazing video... Very Well Explained. !!
Today I learned that the Bloom Filter relies on using several different hash functions (in this video the example uses 3 different hash functions). This is a very creative idea for an algorithm!
This is actually a simplification for this video. Hashing from scratch is expensive. In most actual bloom filters I've seen use two hashes to produce polynomial coefficients A and B. You can then produce as many "hashes" as you want by using these coefficients without running through a whole hash function.
For me, I always learn new things from this channel😅
awesome, truely knowledge delivered in byte size :)
Big Fan of your books, Alex. These Videos are very much useful. One concern I have is.. these videos are playing like 2x fast :). You can slowdown, while explaining stuff. So our mind will get time to understand what you are saying.
Just to provide a differing opinion, I found the pace of the video perfect personally
The animation at 4:21 is wrong. Ribeye hashes to indexes 1, 3, and 4, but indexes 0, 3, and 4 are highlighted. The explanation is still correct, though.
Good catch!
I really enjoyed that, thanks a lot
Very useful, thank you!
Wow that was very interesting!
Another great video
Thanks very clear explanation, but I like lemons.
Great video. Thanks
Great content as usual.
Just little pun on title of video.. 😉 What else should i know before i die ?
Good one!😀
Techically, you could make the bloom filter store integer numbers instead of one and zero and use increment/decrement approach when adding/removing elements.
Nice videos you're putting out. Would have been good mentioning cascading bloom filters. Maybe out of scope for video, but just as a mention for people to do further research on their own.
Thank you so much.
Thank you so much !
thanks. interesting stuff
good, good stuff right here!
I was out for 2 weeks and when I come back, bytebytego has like 10 new videos to watch 🙂
😂 You sure you were only gone for 2 weeks?
@@ByteByteGo to be honest, no, not so sure 😅
The original bloom filters don't allow removals but there are a lot of proposed improvements over the original idea and one of them allows removals by using integer buckets instead of bit buckets and incrementing/decrementing the value of each bucket (trade-off of more space to allow removals)
What if you remove a false positive? You break the assumption that the data store cannot give false negatives.
@@ZorakWars yes, that is true. Another trade-off you have to make.
Excellent video
Thank alot, very helpfull
I love this channel
Awesome videos
3:51 explanation starts
How do you create such a nice animation?
From description : Animation tools: Illustrator and After Effects
This would be a good first step to slim down a huge data set, before more complex processing on the "probably Yes" subset.
Great video! can I ask something how do you make this lovely animations?
It takes a team. We have some talented editors for illustration and animation, with the help of tools like Adobe After Effects, Adobe Illustrator, etc. Each video takes many hours to make.
You can delete if you make the buckets numbers and whe adding you increment the bucket and when deleting you decrement it
Bloom filters are also used in the Blockchain (EVM-based) context to query for smart contract events.
How is this more efficient in terms of memory access? - HashTable returns 1 location and thus O(3), but BF can return n locations, thus access time is O(n). I guess it gives huge advantage in terms memory required - to store n elements, BF will only need - log(n) locations.
Implemented with buckets and fast hash function
awesome !
Why are multiple hash functions required? A single one would be enough no? Having 3 seems to reduce the capacity by x3?
IIRC, this prevents patterns from producing false positives too frequently without using much more expensive hash functions. You get to use fast, naive hash functions and still get decent space savings. Plus, it’s not really 3x storage, as elements can reassign bits and often do as the bloom filter fills up.
Love that series.
I'm interested in doing video form teaching and I wanted to know : how do you make your animation ?
It takes a team. We have some talented editors for illustration and animation, with the help of tools like Adobe After Effects, Adobe Illustrator, etc. Each video takes many hours to make.
In 1:17, it should be trade off between less memory and more accurate.
If the entries are not booleans but integer accumulators, then the deletion of keys can go away. This would be useful for deleting old entries.
Keep up the good work! your content is just amazing and one of the best that could be found on the internet. Though the narration has some room for improvement.
Tiny error in the animation. The ribeye lookup looks at 0 3 and 4 (not 1 3 and 4) according to the red highlights. But nicely done!
1:30 that was personal
Please make video on Raft amd Paxos Algorithms as well.. pleaee
Ooof... Raft is doable, but Paxos...
Can we just do raft, please? LOL
@@ByteByteGo yes please you may opt with raft, i will give it my name of paxos LOL,
@@ByteByteGo and Ehsan, there an excellent paper on Paxos Made Simple 01 Nov 2001 by Leslie Lamport. It's only 14 pages. Great read.
If it never forgets, will bloom filter full? eventually it will return might be true to any query.
@byte byte go, can you do a video on token bucket algorithm ?
Thanks for this info. There is also something called cuckoo filter. Can you make a video on that as well with usecases.
Can I please know how you made this presentation? I'd very much like to do a final presentation this semester that looks this good. 🙏
I'm interested in how to make presentations as pretty as this one too.
It takes a team. We have some talented editors for illustration and animation, with the help of tools like Adobe After Effects, Adobe Illustrator, etc. Each video takes many hours to make.
Can I know how to make video such like this style? What tools does Alex use?
Amazing, AMAIZINGNGNGNGNGNGN
I don't understand. in the give buckets, if he would have added 5-6 more foods of his liking then entire range would have become 1.. then it doesn't matter what you search it will always be positive. What am I missing ? is the bucket size always much larger than the dataset ?
what I am not getting is that consider the scenario when you have run 1000 such queries, there may arise a case when all the 10 bits are set to 1, so from the 1001th query every query will return a true. Am I missing something ?
How can it be firm no and probably yes? I mean, if it returns probably yes, and the element is not there, then that no cannot be so firm.
It’s all in the hash function, since it can have collisions then it’s a probably yes, if no collision is found is a definitely no
Which software is used for these presentations
It takes a team. We have some talented editors for illustration and animation, with the help of tools like Adobe After Effects, Adobe Illustrator, etc. Each video takes many hours to make.
No false negative but probably false positive
One animation is wrong, when you check the indexes for "ribeye" you're highlighting in red 0, 3 and 4 but then you make the arrows point to 1, 3 and 4
🤩
If anyone has worked with mongoose implemeting bloom filters for full text search it will be nice! Thank you in advance
Sorry, what? How is a bloom filter more space efficient than a simple set when the filter has to store 3x the bits AND give extra buffer zeroes to prevent false positives?
A simple set's memory requirement scales linearly with the number of elements.
A bloom filter's space requirement is capped at the size of the bit-set for the bloom filter itself, which is usually a fraction what a simple set would need.
Let's go through an example.
Say there is a set with 1 million keys, with each key hashes to a 64-bit number.
A simple set would need 8-million bytes to just hold all the keys, ignoring collision handling and other details.
A bloom filter with about a 1 in 100 chance of collision would need a bit-set with 10 million bits and 7 hash functions. 10 million bits is 1.25-million bytes.
This is the main purpose of a bloom filter. We trade space for sometimes slightly inaccurate answers.
@@ByteByteGo Thank you for taking the time! I finally figured it out on Stack-O when I read a comment saying basically, "But. A. Set. Stores. The. Keys." lol. That's just NOT how I think about a set. Why do sets store the keys anyway? Is it only for dynamic sizing? (not saying that's unimportant, rather that it's not inherent to a set - you COULD have a fixed-size set that doesn't do that)
I barely managed to solve box blur and now theres more... 😂💀
Came for my Fujifilm left with a ?coding degree
1:45
Shows that you never truly know someone. I used to love this guy and think he's an amazing teacher, but now that he came out as a lemon hater I can't watch his videos in good conscience anymore. /s
Trade off animation doesn't exactly make sense, but great video otherwise. The trade off is between memory and accuracy, not between a lack of both.
This is exactly how you create boring videos with much efforts
no one explains a single question with bloom filter
How does a hash function return multiple values? A hash function should only return one value for a function. It doesn't make sense. What hash functions are typically used for Bloom filters.
Is it so hard to put proper description what is that filter and why we need it??