Consistent Hashing | Algorithms You Should Know #1
Вставка
- Опубліковано 29 вер 2024
- Weekly system design newsletter: bit.ly/3tfAlYD
Checkout our bestselling System Design Interview books:
Volume 1: amzn.to/3Ou7gkd
Volume 2: amzn.to/3HqGozy
Other things we made:
Digital version of System Design Interview books: bit.ly/3mlDSk9
Twitter: bit.ly/3HqEz5G
LinkedIn: bit.ly/39h22JK
Animation tools: Illustrator and After Effects
ABOUT US:
Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.
Subscribe and leave a comment, we read them all :)
hope you can update more requently 😄
Yes! I already did. Difficult concepts explained in such an easy way.
Amazing video! Can you make a video comparing this to Rendezvous Hashing please.
Service mesh.
dude your content is simply outstanding
you synthesize very complex subjects in the amount of explanation that is just right for understanding without getting lost
how come I've not seen you earlier?
One of the best videos on this topic
Hats down. This channel is really a blessing for the software engineers. This is the most simplified video on consistent hashing i have seen till now 👍
Thanks @ByteByteGo team, for explaining an important concept like Consistent Hashing in such an easy manner.
Super nice presentation, thank you!
Very graphical and simple explanation! thanks
Awesome video. Love the way you explain concepts in so simple manner using such a nice animation.
Which tool do you use to create animations?
We have some talented editors for illustration and animation, with the help of tools like Adobe After Effects, Adobe Illustrator.
Please make a full playlist for system design
And please examples of system design full
Thank you
this is so beautiful!
Thanks, this was great. But I still have gaps in my knowledge here. So we have an object that maps to some spot within the consistent hash-ring. Now what does it mean to walk the ring clock-wise to "find" a server? Is it a ring of integer values? How do I walk this ring and come to a spot that contains a server? It looks like we need to maintain a lot of metadata about this ring (which wasn't mentioned in the video) so that as we're walking the ring clockwise, we check at each point if there's a server that maps to that point in the ring.
Good stuff
I love the easy to understand content with interactive videos. Can you please tell me what software you use to create these videos? Thank you.
You read all comments, but no reply?
distributed hash table
Talk about "Bloom Filters"
Are all these part of your System Design book or these video releases are totally different series?
the ring is imaginary, but still works
Thanks
Great content. The same hashing func is used even in case of we want to replicate data? Precisely let’s say we have 4 servers and want to replicate data to one more server. With hashing Object is stored in S0. Do we use same hashing or something else to figure out where to store replication of data.
🤔my answer disappeared, may be because of a link there? anyhow. Cassandra just chooses next server in a ring as a replica. Simple, reliable, consistent.
As Aleksandr mentioned, next nodes in the ring is a good strategy.
You could think that just hashing the object key with a different salt would also work, but this introduces the situation where with a replication factor of 3, any 3 servers going down would statistically mean some objects were replicated in just those 3 servers, and are thus not available. Using next servers ”clusters” the keys which has several benefits, including it being much less probable that 3 random servers crashing results in data loss or unavailability.
Using consecutive servers also allows giving the highest priority to efficiently bulk-re-replicate all objects on servers with several nearby failures, while re-replication from servers with no nearby failures can remain a lower priority.
What software do u use to make these videos
Placing virtual node means creating replica of each server owned by other servers i believe
That's not true, at least for Cassandra. There it's just as described in a video - more uniform distribution
decentralize the keys.
Watching your channel should be a requirement for anyone in software engineering. Love the variety of topics
This is one of the best teaching styles I've personally encountered on UA-cam.
Would be absolutely amazing if you could make a playlist for beginner software engineers starting with the core concepts. I would definitely pay for such a playlist/course.
Thank you for all the time and effort you put into these videos, the visualisations definitely help to drive the point home like no others.
Which tools are you using to create such a cool amimation?
You know when you try hard to understand something that just doesn't make sense to you, then you find an explanation so clear and simple that it actually makes you chuckle... For me, that's this video.
What algorithms should we talk about next?
Can you please cover architectural design patterns like Saga etc?
cryptographic algorithms? Huffman, RSA etc
Quad tree
Bloom Filter
Bloom Filter or hash function under the hoods
I really liked that explanation. I didn't know that server are also using the same hash function. And this workaround with virtual nodes is awesome
If a node crashes, how does the system know which entries need to be transferred? And how do we retrieve the data that belonged to a crashed node to redistribute it to a new node?
You mentioned two NoSQL DBs use consistent hashing for data partioning. What about other SQL and NoSQL DBs? What are some tradeoffs with those?
You explain each concept so well with just the right amount of detail. An added bonus is your way of speaking which is so calming and peaceful.
Thanks for this amazing content you put out and wishing you the very best for the future!
I am not sure whether I should comment on the excellent explanation or mind-blowing presentation.
Awesome content like always. Again please share the tools you use for creating such cool animations. I'm a BIG fan!!
Thanks. Illustrator and After Effects
I cant get my head around the fact that this channel is having less than a million subscribers. It deserves much more than that!!!
Thanks for your video, it's very clear and high quality. But I still have some questions. Does the hash space need to be 360 ( as in 360 degree ) ? My understanding ( including reading en.wikipedia.org/wiki/Consistent_hashing ) is that to put a server or object on the ring, we do Hash(id) and then take a modulus of 360, so we have a "degree angle value" to place it on the ring. But given the 360 discrete slots, if we have way more than 360 servers wouldn't that not fit on the 360-slot ring? So what's an appropriate size of the ring ( hash space ) we should choose ? does it need to be much larger than the servers we expect to have ?
Best explanation of consistem hashing I have found so far! Keep it up!
Great video! Superb explanation 👌
Which software did you use for making this video?
Great Videos !! Can you create a video on multitenant design
Awesome video 👏loved the animation and explanation.
A side question: Animation on your channel is just awesome, which tool you are using?
I read the chapter on consistent hashing in System Design Interview, and didn’t feel like I understood it completely. This video helped so much. I’d love to see walk throughs of some of the systems on this channel.
Thank you for the feedback. We agree videos augment text and they could be very useful.
The chapter length videos take a lot of efforts to make. We will try to make one once a month, or maybe every two months.
Can someone explain why it is necessary to use the same hash function for the servers and object keys? Would any problems happen if we used a different hash function but with the same hash space?
A hash function has a 1:1 mapping to a hash space. Multiple hash functions do not map to the same hash space because each hash function generates a unique hash space.
What an amazing explanation.
Please keep producing more content
Thank you so much! Very elaborate explanation. Great work!😌
I was never able to properly understand what consistent hashing is, but after watching this video it seems so clear now. Thank you so much for such videos. Please keep it up and upload more videos . We are here to support. 🎉
Such clarity and brevity to this and all your videos. Many of these topics are of little use to me, but I watch and learn anyway due to your teaching style. Fantastic! Thankyou ! 🙏
This is the best and most concise explanation !! Very impressive !! 👏
Amazing content, I am gonna recommend these books be bought by the company and put to premises where people can read them to understand these concepts. I have a suggestion for a video btw: can you explain CPU metrics and throttling on container systems? There is lots of misunderstanding especially with Kubernetes resources like "millicores" about what it actually means and how the usage is calculated and when throttling is performed on Kubernetes nodes and how. Also what are the implications of these metrics and quotas on CPUs with many cores and the difference between nodes running on hypervisor vs bare metal. It's a hole can of worms of course, but I think just starting with the quotas and throttling would already clear things up a lot.
If the number of virtual nodes = (size of hash space)/(number of nodes) and the virtual nodes are all distributed in a regular fashion in the hashspace (ie. ,s1_0, s2_0, ..., sm_0, s1_1, s2_1, ..., sm_1, s1_n, s2_n, ..., sm_n), does consistent hashing become equivalent to simple hashing?
It seems like consistent hashing, in general, aims to increase the lengths of "contiguous regions" that are mapped to the same node in the hashspace. With more virtual nodes, the lengths of these contiguous regions that are mapped to the same node decreases on average. Does that mean there is a tradeoff between uniformly balancing objects onto nodes (more virtual nodes) and lesser remapping when number of nodes changes?
Great way of simplifying and explaining the concepts. Great work! Will recommend this channel to every software engineers
Is this concept can apply to just temparal, recreatable data? When it comes to RDBMS like MySQL INNODB, It doesn't make sense if one server goes down, all existing data in it will recreate to next server.
This video was extremely useful just when i was about to implement self routing varnish cluster with consistent hashing. Awesome video
why do we use Consistent Hashing rather than just keep tacking for the f(sever_1) = {a,b,c,..} mappings and allocate to the server with the least connections.
if a server is lost, its connections get re distributed
if a server is added, it'll get all new connections till its no longer the server with the least connections.
01:01 Consistent hashing evenly distributes data in distributed systems.
02:02 Consistent hashing ensures even distribution of hashes and allows flexibility with server changes.
03:03 Consistent hashing ensures objects stay assigned to the same server despite changes
04:04 Consistent hashing uses a hash ring to map servers and objects efficiently.
05:05 Consistent hashing minimizes key redistribution when adding or removing servers.
06:06 Consistent hashing can lead to uneven distribution of objects on servers.
07:07 Using virtual nodes for load balancing in consistent hashing
08:04 Consistent hashing is used in real-world systems like NoSQL databases, high-end delivery networks, and load balancers.
Crafted by Merlin AI.
what if there was a hash collision (it's not impossible) of the virtual nodes? meaning 1 virtual node belongs to 2 physical nodes? which physical DB would the data end up written to?
Awesome content as always. I'm a paid blog subscriber as well as have the annual pass for the System design course. I'm amazed by all the graphics used in this video. Is there a particular tool you use that you can share for our own presentations?
The amount of genius in the ideas here, the ring, someone did an insane job there at some point. also great explanation
Mind = blown content and visuals. One question though. What are vnodes exactly? Can I say it has map of lists with physical node as key and list of vnodes as value. And I hash each vnode of list on hash ring ?
Kindly someone clarify
If s0 is down then s0_1,s0_2 etc. all will be removed. In this case how the virtual server is going to solve the problem of balanced distribution?
Hi Alex,
I am from India and want to purchase your book but I am not getting a proper link where i could get authentic one...so many seller is selling as a duplicate with bad quality typing.
Content is so awesome and well explained, that every new video release sticks to the mind until you watch it. Please keep up the great content delivery.🙌
At 6:41 can you elaborate more or provide the source when you are saying labeled S0 are managed by Server 0. How labeled 0 is defined and how someone knows which proxy server will be used. What are the techniques to overcame the tradeoff of having more virtual nodes. How we can make a balance what are the resources from where I get the references about these?
All the video is good...the part where hash table should be considered as ring is little not well...just a few seconds part ...others channel got it easy....but virtual servers part is awesomely explained...better than other channels and neat...thank you
Woow, thank you for the explanation. Can you also please explain about rendezvous hashing ?
> Rendezvous hashing is both much simpler and more general than consistent hashing, which becomes a special case (for k = 1 {\displaystyle k=1} k=1) of rendezvous hashing.
en.wikipedia.org/wiki/Rendezvous_hashing
Hello, big fan of the book here.
What i didnt get is, when the servers are also hashed with the function ( at 3:20 in the video) - how do you ensure, that they are spread ( evenly)? cause there might be a scenario where they are very close to each other leading that one server takes most of the objects. right?
best regards, Kevin
Thank you, very simple and understandable explanation for me.
In distributed system cache are not in sync that will arise cache stampede if we move, remove or delete the server from the network. But if cache were in sync, then do we still need Consistent Hashing and go along with traditional hashing?
Thank you! I came across consistent hashing while reading the chapter on Metrics Monitoring from System Design Vol. 2. I had never heard the concept, so I searched it on youtube, and there you were as the 1st result! Great explanation. Now back to the book 👋
Wow, super clean animations and the explanations are on point!
Another amazing video ! Thank you so much 👍
So in the end, consistent hashing no longer "hashing" server key, but instead just uses a table to record the locations on the ring for each server (multiple virtual locations), is that correct?
hm. What’s the overhead for it? There’s obviously some drawbacks
Amazing video, especially the uneven sizes and virtual nodes part.
Hey, awesome video! One suggestion: The text in the top left corner is a bit confusing to me. I know it's meant as the title of the video, but I often interpreted it as "the title of the current slide" which was confusing. I say either remove the text (because it's already stated in the video title) or make it act as a heading for the current subtopic.
When a server goes down how do we move the objects in the server to another server? Or does every server have a copy of all the objects and we are just using the hashing to limit where a certain object can be found?
Thanks a lot! I just realize that I didn't understand how does it work before this video
can you make an lecture for key-value store which you explained in the book
Typically in a distributed system, who initiates the re-distribution?
You just earned a subscribe from me :) for awesome explaination
I really like the way you explain different topics in your videos. The images and animations are very helpfull to visualise and understand the concepts. Waiting for new ones!:)
Very concise video in explaining the concept. Keep it coming, you've earned yourself a new sub!
I learned something new. Thank you very much.
That's the kind of quality content that I was looking for!
Wow, this presentation on consistent hashing was absolutely beautiful 🏆💯❤. Even though you didn't' write any code, the principle was so clear that most people could probably implement such an algorithm. Bravissimo!
Do we need consistent hashing even after separating cache layer and session data (stateless architecture)
Man, you should bring out an entire series, It takes too much effort to come up with a video where you show too many concepts in a short amount of time. you are a saviour for legion like us.
I agree, and furthermore, I am willing to pay if it helps bring a series with this level of quality and information density. Especially your videos on systems design are a blessing.
Loved the video, really helped me in a recent interview! Just wondering, can consistent hashing be used to redistribute load across hot shards?
I don't know where I would be without this video, thank you. So much effort was clearly put into this and it was so worth it. Very easy to understand.
The best system design video ever. I wish you could make more videos like this available on youtube. Thank you!
Thank you for the feedback. We’ll definitely make more videos like this. Let us know if you have specific topics you’d like to see.
This is called Open Chord network
This isn’t just used in servers. Many peer-to-peer networks, like Kademlia, use a similar “hash ring” to determine which peers should hold which keys/values. They usually put in quite a bit of redundancy, spreading data among multiple “nearby” peers, since peers come and go so frequently.
2:40 Subtitles and what's spoken don't match
The question - if server is down, who and how moves the keys to another servers?
Awesome video, thanks for sharing this.
The illustration is the best I have seen for the virtual nodes. Thank you so much for the clear explanation.
Great channel and content. How do we generate the hashed values that represent a single server’s virtual nodes?
1) Great explanation
2) Its oddly satisfying when he says "the other keys are unaffected"
Stumbled on your videos recently. I really like your style. Animations are great and the summaries are clear and concise. Keep up the great work!
Could u suggest me the link from where u could order these two book
If the server is down, are all the virtual servers down as well?
cant you ever come to a decision, you always forcibly have to create something "new", come on boiis!
if you claim to know, dont believe in you