Nice, keep up the good work! I think there is a slight issue in your LRU code: you have misplaced a bracket. The correct implementation would look like this:
private void updateCache(int val) {
    if (!hSet.contains(val)) {
        if (q.size() == cacheSize) {
            int last = q.removeLast();
            hSet.remove(last);
        }
    } else {
        q.removeLast();
    }
    q.push(val);
    hSet.add(val);
}
Yes, right.
This part of the code doesn't seem right:
else {
    q.removeLast();
}
Instead, this should be used:
else {
    // value is already present in the hashSet, so remove it from its current
    // position in the linked list before re-adding it at the front
    // (LinkedList has no removeLast(val); remove(Object) unlinks the value itself)
    q.remove(Integer.valueOf(val));
}
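For reference, here is the whole thing put together as a compilable sketch, with the corrected else branch. The class name LruCache and the contents() helper are mine for illustration, not from the video:

```java
import java.util.HashSet;
import java.util.LinkedList;

// Minimal LRU sketch based on the snippet in this thread: the front of the
// list is the most recently used value, the back is the least recently used.
class LruCache {
    private final int cacheSize;
    private final LinkedList<Integer> q = new LinkedList<>();
    private final HashSet<Integer> hSet = new HashSet<>();

    LruCache(int cacheSize) {
        this.cacheSize = cacheSize;
    }

    void updateCache(int val) {
        if (!hSet.contains(val)) {
            if (q.size() == cacheSize) {
                int last = q.removeLast(); // evict least recently used
                hSet.remove(last);
            }
        } else {
            // already cached: unlink from its current position first
            q.remove(Integer.valueOf(val));
        }
        q.push(val); // push to the front as most recently used
        hSet.add(val);
    }

    LinkedList<Integer> contents() {
        return q;
    }
}
```

Note that LinkedList.remove(Object) is a linear scan; production LRU caches keep a hash map of node references so the unlink is O(1).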
Basically we have a hashing mechanism for mapping any value to be stored. Suppose we have 10 servers to store the cache and we want to store some value 17: we apply a hash function (modulo here) to map 17 to 7, and the cached value is stored at server 7. When some other value maps to the same server (say 27), we store it in the linked list corresponding to server 7 and manipulate that list to track the least recently used value. That works fine until a server goes down: then we have fewer servers, and the same value will be looked up on a different server. This is resolved by consistent hashing, which arranges the servers in a circular fashion, so that when a server goes down its requests move to the immediate next server on the circle.
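The circular arrangement described above can be sketched with a sorted map: each server is hashed onto a ring, and a key is routed to the first server clockwise from the key's hash. Class and method names here (ConsistentRing, serverFor) are illustrative, not from the video, and the hash is a toy one:

```java
import java.util.TreeMap;

// Illustrative consistent-hashing ring: servers sit at hashed positions on a
// circle of 65536 slots; a key goes to the first server at or after its hash,
// wrapping around. Removing a server only moves the keys it owned to the next
// server clockwise; keys owned by other servers are untouched.
class ConsistentRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addServer(String name) {
        ring.put(hash(name), name);
    }

    void removeServer(String name) {
        ring.remove(hash(name));
    }

    String serverFor(String key) {
        Integer pos = ring.ceilingKey(hash(key)); // first server clockwise
        if (pos == null) {
            pos = ring.firstKey(); // wrap around the circle
        }
        return ring.get(pos);
    }

    private int hash(String s) {
        // toy hash onto [0, 65536); real systems use e.g. MD5/murmur
        return Integer.remainderUnsigned(s.hashCode(), 1 << 16);
    }
}
```

A real deployment would also place several virtual nodes per server on the ring so load stays balanced when a server drops out.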
I think we can use a singly linked list as well by keeping 2 pointers, head & tail: we can add elements at the tail and remove from the head.
The problem comes when you need to retrieve an element from the middle of the list, remove it, and reconnect it at the end. That's why a doubly linked list is used.
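In Java, that middle-of-the-list bookkeeping is exactly what LinkedHashMap already provides: it threads a doubly linked list through its entries. A common sketch (not from the video), using accessOrder=true so every access moves the entry to the tail:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap keeps a doubly linked list through its entries. With
// accessOrder=true, each get/put moves the touched entry to the tail, and
// removeEldestEntry lets us evict from the head once capacity is exceeded.
class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruMap(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // called after every put; returning true evicts the least recently used entry
        return size() > capacity;
    }
}
```

Because the hash map entry itself carries the list links, unlinking from the middle is O(1), which is the property the doubly linked list buys over the singly linked head/tail approach.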
It would be better to write an article/blog as well, so it's easier to understand.
Can you please let me know which part is not clear, so that I can help with it?
@@TheTechGranth I mean if you write an article on the same topic, it becomes easier to understand.
@@GauravKawatrakir Ok, will try that. Thanks for the suggestion :)
Recommended to watch at 1.25x or 1.5x as per convenience
Consistent hashing was not clear, and you didn't touch upon availability. I came here looking for a Redis cluster with replication using containers. What got me to listen till the end was curiosity about how bad it could be; well, it was not all that bad, just a few bits and pieces need to improve. Please read up on the earliest distributed systems, such as the architectures used for CDNs. One important factor is distributed consensus, e.g. Raft, and consistency when designing such a cache. Usually it is a multi-node, multi-partition setup with replication, often geographically distributed. Anyway, it is an interesting topic.
Do check out other videos on HLD and LLD and let me know your views 🙂 Do like and subscribe and share with others 🙂
We have not talked much about the availability aspects of the distributed cache.
One more suggestion: can we have a series giving a basic overview of technologies like Redis, RabbitMQ, and Eureka, to name a few? As you said in the video, the candidate should be clear about the available tech stack. What do you say?
Yes, I would like to do that, some sort of POC on these; let's see. Regarding availability: like I mentioned in the BookMyShow video, each of the design videos is created with a specific problem in mind. BMS was about concurrency, this one is about scalability, so the availability angle got missed. But if you compare this with the Amazon video for availability, you can see that similar concepts apply. Plus, here we are using ZooKeeper, which makes our job a little easier from the availability point of view as well.
Keep going, excellent work!
Thank you. Do like, share and subscribe.
Nicely explained!!!
This type of implementation works for a single system; explain how LRU would be implemented in a distributed system. You cannot use a doubly linked list for a 100-node cluster where 100k users access it in parallel.
Please double-check the LRU implementation; that's incorrect. It calls q.removeLast() even when the queue is not full, which means you are removing '27', and then q.push means 52 will be present twice in the linked list.
Yes, I have pinned a comment; the bracket position is wrong.
Such a gold mine!
There was something to learn, but there is still a lot of room for improvement.
How is this a distributed cache? How is the cache of one node made available to another node?
I have the same question. If the data is not available on all the nodes, how is it distributed?
Consistent Hashing could have been described in a better way.
Noted, will come up with a separate video on it 🙂
Thanks again :)
Thanks
This is not exactly consistent hashing.