Thanks for the informative video... I have one scenario, could you please go through it and provide your suggestions if any? 1. The application fetches configuration from multiple configuration databases, and the actual data is fetched from big data storage based on that configuration. But the configuration differs for every user and their respective roles... it is like an "access level", something dynamic. Here we want to reduce network calls... We can think of tag-based distributed caching, but at some level we need a cache we can also run queries against.
One approach I use for consistency is lazy updates. On a DB write, instead of pushing the data back to the caches (which may never get read if a second update comes in), the DB writes the ID to invalidate to a message queue that all caches subscribe to. Then you can implement query-then-cache-on-miss semantics. This way load throughout the system is reduced, with some double queries occurring if the cache was cleared after a good query due to latency. (This can be eliminated by using versioning: use the current timestamp in milliseconds at the time of write and broadcast it, so that a cache only clears itself if the cached version differs from the broadcasted version.)
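The versioned-invalidation idea described in the comment above can be sketched roughly like this. This is a minimal single-process illustration, not a real message-queue consumer; the names (`CacheNode`, `on_invalidate`) are illustrative, not from any particular library.

```python
import time

class CacheNode:
    """One cache node that subscribes to (key, version) invalidation broadcasts."""

    def __init__(self):
        self.entries = {}  # key -> (value, version)

    def put(self, key, value, version=None):
        # Record the write's version (here a timestamp) alongside the value.
        self.entries[key] = (value, version if version is not None else time.time())

    def get(self, key):
        entry = self.entries.get(key)
        return entry[0] if entry else None  # miss -> caller queries the DB

    def on_invalidate(self, key, version):
        # Clear the entry only if the broadcast refers to a newer write
        # than the one we cached; a stale or reordered message is ignored.
        entry = self.entries.get(key)
        if entry and entry[1] < version:
            del self.entries[key]
```

With this guard, a delayed invalidation message that arrives after the cache already holds a fresher value does not wrongly evict it.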
Awesome overview thanks. One other possible issue with write-through - it's possible to make the update to the cache then the DB update itself fails. Now your cache and db will be inconsistent.
Gaurav nice video. One comment. Writeback cache refers to writing to cache first and then the update gets propagated to db asynchronously from cache. What you're describing as writeback is actually write-through, since in write through, order of writing (to db or cache first) doesn't matter.
Ah, thanks for the clarification!
Yes, would be great if you can add a comment saying correction about the 'Write back cache'. Thanks for the great video!
I agree.. a comment in the video correcting this would be good update to this.
So Gaurav was also wrong in saying "write-back" is a good policy for distributed systems?
@Gaurav Yes that would be great. That part was confusing, had to read about that separately.
Write-through: data is written in cache & DB; I/O completion is confirmed only when data is written in both places
Write-around: data is written in DB only; I/O completion is confirmed when data is written in DB
Write-back: data is written in cache first; I/O completion is confirmed when data is written in cache; data is written to DB asynchronously (background job) and does not block the request from being processed
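The three policies defined above can be sketched in a few lines, using plain dicts as hypothetical stand-ins for the cache and the database. In a real system the write-back persistence would run asynchronously (a background job); here it is exposed as an explicit `flush()` so the behavior is easy to follow.

```python
class Store:
    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()  # keys written to cache but not yet persisted

    def write_through(self, key, value):
        # Write to cache AND DB; confirm only once both have the data.
        self.cache[key] = value
        self.db[key] = value

    def write_around(self, key, value):
        # Write to DB only; drop any stale cache entry so the next
        # read repopulates it from the DB.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_back(self, key, value):
        # Write to cache only and confirm immediately; the DB is
        # updated later by the background flush.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Background job: persist dirty entries in bulk.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Note the write-back risk the thread keeps raising: between `write_back()` and `flush()`, the cache holds the only copy of the data.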
Other variants
1. There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
2. There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery
Hahahaha!
@@gkcs A humble suggestion, I think you should have a sub-reddit for the channel, because these are such critical topics [not just for cracking interviews], I'm sure they'd definitely encourage healthy discussions. I think YT's comment system is not really ideal to have/track conversations with fellow channel members.
This is an underrated comment .... 😂😂😂
@@gkcs Can you please give some hints on WHY "out-of-order delivery" is a problem in distributed systems, if the application is running on TCP? Please kindly reply.
@goutham Kolluru, can you please give a hint on WHY "out-of-order delivery" is a problem in distributed systems, if the application is running on TCP? Please kindly reply.
I can already hear the interviewer asking, "with the hybrid solution, what happens when the cache node dies before it flushes to the concrete storage?" You said you'd avoid using that strategy for sensitive writes, but you'd still stand to lose up to the size of the buffer you defined on the cache in the event of failure. You'd have to factor that risk into your trade-off. Great video, as always. Thank you!
Hi Gaurav, I really like your videos, thank you for sharing! I need to point out something about this video. Writing directly to the DB and updating the cache after is called write-around, not write-back. The last option you have provided, writing to cache and updating the DB after a while if necessary, is called write-back.
Thanks Zehra 😁
Cache doesn’t stop network calls but does stop slow costly database queries. This is still explained well and I’m being a little pedantic. Good video, great excitement and energy.
Notes:
In Memory Caching
- Save network calls - For commonly accessed data
- Avoid Re-computation - For frequent computation like finding average age
- Reduce DB Load - Hit cache before querying DB
Drawbacks of Cache
- Cache hardware (RAM/SSD) is much more expensive than DB storage
- As we store more data on cache, search time increases (counter productive)
Design
- Database (Infinite information) vs Cache (Relevant information)
Cache Policy
- Least Recently Used (LRU) - Top entries are recent entries; remove the least recently used entries from the cache
Issue with caches
- Extra calls - When we couldn’t find entry in cache, we query from database.
- Thrashing - Loading and evicting cache entries without ever using the results
- Consistency - When update DB, we must maintain consistency between cache and DB
Where to place the cache
- Close to server (in memory)
- Benefit - Fast
- Issue - Maintaining consistency between memory of different servers, especially for sensitive data such as password
- Close to DB (global cache, e.g. Redis)
- Benefit - Accurate, Able to scale independently
Write-through vs Write-back
- Write-through - Update cache, before updating DB
- Not possible for multiple servers
- Write-back - Update DB, before updating cache
- Issue: Performance - When we update the DB, and we keep updating the cache based on that, much of the data in the cache will be fine and invalidating them will be expensive
- Hybrid
- Any update first write to cache
- After a while, persist entries in bulk to database
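The LRU policy from the notes above can be sketched with `collections.OrderedDict`, whose `move_to_end` and `popitem` methods do the recency bookkeeping for free. This is a minimal single-threaded sketch, not a production cache.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None  # miss: caller falls back to the DB
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

For example, with capacity 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, since `a` was touched more recently.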
nice, but write through and write back notes part is wrong, pls correct it. you can check other comments. thanks
Nice notes
I just can't find a better content on YT than this, thanks man!
The world needs more people like you. Thank you!
Teaching and learning are processes. Gaurav makes it fun to learn about stuff, then let it be systems or the egg dropping problem.
I might just take the InterviewReady course to participate in the interactive sessions.
Take a bow!
This man is literally insane in explanation 🔥
Dude, you are the reason for my system design interest. Thanks, and never stop making system design videos.
I watched this video 3 times because of confusion but ur pinned comment saved my mind
thank you sir
I don't know how people can dislike your video Gaurav, you are a master at explaining the concepts.
Thank you so much for these videos! Using this I was able to pass my system design interview.
Fun part. I was going through 'Grokking The System Design Interview' course, found the term 'Redis', started searching for more on it on youtube, landed here, finished the video and Gaurav is now asking me to go back to the course. Was going to anyway! :)
Hahaha!
I am actually using write-back Redis in our system, but this video really helped me understand what's happening overall. Great video.
Great video. But I wanted to point out that, I think what you are referring to as 'write-back' is termed as 'write-around', as it comes "around" to the cache after writing to the database. Both 'write-around' and 'write-through' are "eager writes" and done synchronously. In contrast, "write-back" is a "lazy write" policy done asynchronously - data is written to the cache and updated to the database in a non-blocking manner. We may choose to be even lazier and play around with the timing however and batch the writes to save network round-trips. This reduces latency, at the cost of temporary inconsistency (or permanent if the cache server crashes - to avoid which we replicate the caches)
If someone explains any concept with confidence & clarity like you in the interview, he/she can rock it seriously. Heavily inspired by you & love your content of system design. Thanks for the effort @Gaurav Sen
Nice video Gaurav, really like your way of explaining. Also, the fast forward when you write on board is great editing, keeps the viewer hooked.
nice quick video to get an overview. thanks Gaurav. you are helping a lot of people.
each of ur videos, i watched at least twice lol, thank you!! WE ALL LOVE U! U R THE BEST!
I also watch his videos many times.
At least 4 times to be precise.
Thanks Gaurav, your lecture helped me crack MS. Keep posting videos.
Congrats!
Are you in the Hyd campus?
Bhai. u r a life saver! Brilliant tutoring. Thank you!
amazing clarity, intuitive explanations
What you explained as write-back cache is actually a write-around cache. In write-back cache...you update only the cache during the write call and update the db later (either while eviction or periodically in the background).
Explained like the candidate I interviewed today.
This is everything I needed. I am really looking forward to learning how to create an online game hosting server. I researched a lot on how to do it and didn't understand what exactly was happening. Your CDN video was really good 👍. Now I understand how exactly a CDN works and why it uses distributed caching 👍💯
Thank you 😁
Good video around basic caching concepts. I was hoping to learn more about Redis (given your video title)!
Gaurav, what you initially described as write-back at around 10:30 I have seen described as write-around. Write-back is where you write to the cache and get confirmation that the update was made, then the system copies from the cache to the database (or whatever authoritative data store you have) later... be it milliseconds or minutes later. Write through is reliable for things that have to be ACID but it is slower than write back. You later describe what I have always heard as write-back at around 12 and a half minutes
Yes, I messed up with the names. Thanks for pointing it out 😁
@@gkcs so does this mean that write-through is good for critical data (financial/passwords) and write-back/write-around is not?
I think simply saying THANK YOU is too little for this help!!! Superb video.
Glad to help :)
I mean you can always do more by becoming a channel member 😄
Description for write back cache is incorrect.
Write-back cache: Under this scheme, data is written to cache alone and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low latency and high throughput for write-intensive applications, however, this speed comes with the risk of data loss in case of a crash or other adverse event because the only copy of the written data is in the cache.
Thanks for pointing this out Satvik 😁👍
I believe the description in the video given for write-back cache is actually a write-around cache (according to grokking system design)
What if the cache itself is replicated? Will write-back still have a risk of data loss?
Yes, as per my understanding: write-through cache - when data is written to the cache it is also modified in main memory; write-back cache - when dirty data (data that changed) is evicted from the cache, it is written to main memory, so a write-back cache will be faster. The whole explanation around these two concepts given in this video seems fuzzy.
The way you explained the concepts is AWESOME.
Can you please create a video that describes Docker and containers in your style?
Great content. Would love to hear more about how to solve cached data inconsistencies in distributed systems.
thanks for this quick tutorial :) your English is really good
Gaurav, what you are describing as a write-back cache is actually called a write-around cache. What you describe as the hybrid mechanism is actually called the write-back cache. In both, the assumption is an asynchronous update, unlike write-through where the update is synchronous. Might be worth taking this video offline and uploading a corrected version to avoid misleading folks prepping for interviews.
Very easy understanding Gaurav. Thanks a lot !!!
A lot of information nicely packed into a quick glimpse.. Great work
Very nice presentation. Simple, powerful and fast. Keep up the style.
Thank you!
Thank you for the video. You could have gone a little deeper into how the cache is implemented. What's the underlying data structure of the cache?
Excellent! Great video with tremendous info and design considerations
wonderfully explained. thanks
Thank you so much..! your videos are really valuable. Really appreciate your effort, sir.!!
You articulate these concepts very well. Thanks for the upload.
Very informative and concepts explained clearly. Thanks
Nice explanation Gaurav. This video covers the basics of caching. In one of my interviews, I was asked to design a caching system for a stream of objects having validity. Is it possible for you to make a video on this system design topic?
Excellent info and presentation - thanks!
You have explained it very nicely. Thanks.
Your System Design videos are very good and helpful, thanks!
learned a ton in this video thanks so much
This is my first video on your channel and I must say that you explain very well! You seem professional, knowledgeable, and you researched your topic well!
A few other reasons not to store completely everything in cache (and thereby ditching DBs altogether) are (1) durability since some caches are in-memory only; (2) range lookups, which would require searching the whole cache vs a DB which could at least leverage an index to help with a range query. Once a DB responds to a range query, of course that response could be cached.
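The range-lookup point above can be illustrated with a small sketch: the cache cannot answer an arbitrary range query itself, but once the DB has answered one, the whole response can be cached keyed by the query's bounds. All names here (`db_rows`, `range_cache`, `range_query`) are hypothetical stand-ins.

```python
# Stand-in for an indexed DB table; a real DB would use the index
# to answer the range scan efficiently.
db_rows = {i: f"row{i}" for i in range(100)}

# Cache of whole range-query responses, keyed by (lo, hi).
range_cache = {}

def range_query(lo, hi):
    key = (lo, hi)
    if key in range_cache:
        return range_cache[key]  # cache hit: skip the DB entirely
    result = [db_rows[i] for i in range(lo, hi)]  # DB does the range scan
    range_cache[key] = result    # cache the whole response for reuse
    return result
```

The trade-off is that only the exact same range is a hit; overlapping ranges still go to the DB unless you add smarter interval merging.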
Your explanation is awesome. Keep it up!
Thanks!
My boy looks very energized... keep it up!
😁
Very knowledgeable. Nicely explained
Thanks!
Great video Gaurav!
Thanks code_report 😁
hey Gaurav, for holidays I'll watch your videos day in and day out... So please teach new topics asap.
I love listening to you
Summary
Caching can be used for the following purposes:
Reduce duplication of the same request
Reduce load on DB.
Fast retrieval of already computed things.
Cache runs on faster, more expensive hardware (RAM/SSD)
rather than on commodity hardware.
Don't overload the cache, for obvious reasons:
It is expensive (hardware)
Search time will increase
Think of two things: (You obviously want to keep the data that is going to be used most)
!So predict!
When will you load data in the cache
When will you evict data from the cache
Cache Policy = Cache Performance
Least Recently Used
Least Frequently used
Sliding Window
Avoid thrashing in Cache
Putting data into the cache and removing it without using it again most of the time.
Issues can be of Data Consistency
What if data has changed
Problems with Keeping cache in Server memory(In memory)
-What if the server goes down (the cache goes down with it)
-How to maintain consistency of data across caches
Mechanism
Write through
Always write first in the cache if there is an entry and then write in DB.
The second part can be synchronous.
But if you have an in-memory cache on every server, obviously you will run into data inconsistency again
Write back
Go to the DB, make an update, and check in the cache if you have the entry... evict it.
But if there is no important update and you keep evicting entries from the cache like this, you can again fall into thrashing.
One can use a hybrid approach as per the use case.
Thanks to @GauravSen
Amazing Explanation!! Thanks!!
A label/comment in the video about the corrected usage of write-back and write-through would help future viewers. I never saw the pinned comment until recently. This could have backfired in an interview.
Very well explained !!
It is a really great video. Finally found a detailed video. Thank you for sharing your knowledge!!
Excellent explanation
always watching your videos. topic straight to the point. keep uploading man. thanks always.
I really wish I had watched this video before my interview this week... :(
Great explanation. You are making my revision so much easier. Thanks!!
Great explanation for caching. I believe you'll go far.
Awesome explanation gaurav. You're cool man. We want a lottt more from you. We admire your ability to explain topics with great simplicity.
great video,very helpful to learn english
awesome Gaurav thanks
Great explanation
i think you mixed write-back with write-around cache. write-back is when you just update the cache and the database gets updated at a later point in time. write-around is when the db gets updated first and then the cache gets notified asynchronously about that update.
Thank you Gaurav, it was a really good explanation
Do you implement caching on most systems? It adds complexity; how can you determine whether it is worth the additional effort to develop?
Love the videos by the way. These are a great learning tool, you do a great job.
Please make a full series in Redis or Paid Course.
well explained bhai sahib
The drawback of write-through you explained is equally applicable to write-back, i.e. I null the value in S1 but the value is still not null in S2. The major thing is: Redis is not a distributed cache. Even their own definition does not include the word "distributed" - Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
this video was gold. studying for my facebook on-site and i need to understand a bit more how backend works. cheers @gaurav sen
One observation: a cache need not run on expensive hardware; for a cache, one would use "memory"-centric instances on the cloud, not SSDs. And caches can be used in place of a database if the data size is relatively small and you require high throughput and efficiency.
Awesome explanation! Thanks
Thank you!
You continue to offer great content. thank you !
Great video, thank you!
Good video. Thank you. From Canada.
Great going, Gaurav. You have a great future!
This one is very helpful for me. Many thanks Gaurav.
Cheers!
I like the explanation, Dada
Hi Gaurav - good video on distributed caching! This expands a bit more on what I learned in my computer architecture class - I didn't recall thrashing the cache too well, or what distinguished write-through vs. write-back. I think learning caching in the context of networks is more interesting, since it was initially introduced as a way to avoid hitting disk ( on a single machine ), but is also a way to reduce network calls invoked from server to databases.
What is the efficiency of such an architecture for rapidly changing data? Not only is write-through required (as Vijay Somasundaram indicated below), but reading from the database is always required in order to get the most up-to-date information, in which case this architecture is almost useless. Am I missing anything?
In other words, it would be better to start by going through the use cases where this architecture has an advantage.
thanks a lot for preparing this video
I have one doubt regarding the cache policy. Gaurav explained that for critical data we use the write-back policy to ensure consistency, whereas in write-through one instance's in-memory cache gets updated and the others can remain stale.
1) My first question: the same can happen in write-back - one instance's in-memory cache entry gets deleted and we update the DB, but other instances still have that entry. So there is inconsistency in write-back as well. Why do we prefer write-back for critical data when the same issue exists there?
If the answer is to invalidate the entry in every instance's in-memory cache, then the same can be done for write-through, which leads to my second question.
2) My second question: we could update every instance's in-memory cache entry and then update the DB. That way consistency is maintained, so why don't we use this for critical data like passwords and financial information?
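For what it's worth, the "touch every instance's cache before the DB" idea in question 2 can be sketched in a few lines of Python (the `CacheNode` class and node names here are hypothetical; a real deployment would do this over the network, which is exactly where the latency and partial-failure cost comes from):

```python
# Stand-in for one server's in-memory cache.
class CacheNode:
    def __init__(self, name):
        self.name = name
        self.local = {}

    def invalidate(self, key):
        # Drop the key if present; a later read will miss and refill.
        self.local.pop(key, None)

nodes = [CacheNode("s1"), CacheNode("s2"), CacheNode("s3")]
database = {}

def consistent_write(key, value):
    # Step 1: invalidate the key on every cache node first, so no node
    # can keep serving a stale value once the DB write lands.
    for node in nodes:
        node.invalidate(key)
    # Step 2: update the source of truth.
    database[key] = value
```

The catch is that step 1 is N network calls per write, and if one node is unreachable you must decide whether to block the write or accept temporary staleness - which is the trade-off behind the question.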
@Gaurav Sen - How are network calls reduced with a distributed cache, given that the cache itself is spread across the network? And why is a distributed cache faster than a database?
Solid explanation
Thanks for the video, Gaurav.
What if the global cache itself fails? What are the different backup strategies for it?
Thanks for the informative video... I have one scenario - could you please go through it and provide your suggestions, if any?
1. The application fetches configuration from multiple configuration databases, and the actual data is fetched from big data storage on the basis of that configuration...
But the configuration is different for every user and their respective roles... It is something like a dynamic "access level". Here we want to reduce network calls...
We can think of tag-based distributed caching, but at some level we need a cache on which we can also perform queries.
Correction: INPUTING and OUTPUTTING -> Adding and Removing 5:46
Awesome video, and informative
Nice, you have good presentation skills. Keep it up!
When recording videos, please use a pen that shows up legibly - we cannot see some of the writing - but overall the explanation is very good.
One approach I use for consistency is lazy updates. On a DB write, instead of pushing the data back to the caches (which may never be read if a second update comes in), the DB writes the ID to invalidate to a message queue that all caches subscribe to. Then you can implement query-then-cache-on-miss semantics. This way load throughout the system is reduced, with some double queries occurring if the cache was cleared after a good query due to latency. That can be eliminated by using versioning: take the current timestamp in milliseconds at the time of the write and broadcast it, so that a cache only clears itself if its cached version number differs from the broadcast version number.
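That scheme can be sketched in Python roughly like this (the `VersionedCache` class is hypothetical, the broadcast loop stands in for the message queue, and a nanosecond timestamp plays the role of the version number):

```python
import time

class VersionedCache:
    def __init__(self):
        self.entries = {}  # key -> (value, version)

    def on_invalidate(self, key, version):
        cached = self.entries.get(key)
        # Clear only if the cached version differs from the broadcast one;
        # if they match, this cache already holds the fresh write.
        if cached is not None and cached[1] != version:
            del self.entries[key]

    def get(self, key, db):
        # Query-then-cache-on-miss: refill from the DB on a miss.
        if key not in self.entries:
            self.entries[key] = db[key]
        return self.entries[key][0]

db = {}
caches = [VersionedCache(), VersionedCache()]

def db_write(key, value):
    version = time.time_ns()  # timestamp-based version, per the comment
    db[key] = (value, version)
    for c in caches:  # stand-in for publishing to the message queue
        c.on_invalidate(key, version)
```

With a real queue the broadcast is asynchronous, so the version check is what protects a cache that was refilled between the write and the invalidation message arriving.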
Useful :)
Awesome overview, thanks. One other possible issue with write-through: it's possible to make the update to the cache and then have the DB update itself fail. Now your cache and DB will be inconsistent.
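One common way to avoid that particular failure mode is to order the write-through as DB-first (a hypothetical sketch; `db_put` stands in for whatever DB client call you use):

```python
cache, db = {}, {}

def write_through_db_first(key, value, db_put):
    # The DB write goes first; if it raises, the cache is never touched,
    # so the cache cannot end up holding a value the DB rejected.
    db_put(key, value)
    cache[key] = value
```

This only closes the "cache updated, DB failed" gap - the reverse (DB updated, then the cache update fails) still needs invalidation or a TTL to heal.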
True 😁
Excellent 👍