The following was not mentioned in the video; I wanted to know if this is an acceptable idea. Since Twitter, like most social media, is read-heavy, we can maintain separate servers for read and write operations. This makes sense because we can scale each set of servers independently, e.g. if the read servers hold 50 TB, the write servers could hold 10 TB or similar. This way we can also make efficient use of the in-memory cache, since the tables touched by reads and writes will differ, so mixing them in the same cache doesn't make much sense.
I think you're leaving out the application servers that handle requests from the load balancer. A request won't get to Redis without the help of application servers that build the chronological list of tweets.
You've mentioned not to use MySQL, but what do we use for a persistent store, in that case? Redis' in-memory approach is good for speed, but does not allow persistence.
Apache Kafka might be the solution to have both speed and persistence. You have producers and consumers, and the consumers can make sure that all published messages are read.
Miles to go before you sleep. Could you please prepare system design and LLD for the following: 1. Simulation of a cricket match, football match etc. 2. Implementation of Queue like Kafka 3. Ecommerce price drop notification system for 50M products 4. Amazon like website and order management system i.e. everything that happens after clicking checkout 5. Elevator system 6. Scrabble 7. Chess game 8. A library for evaluation of expression
Thanks for the nice video, it is informative. I have two questions. 1) You mentioned that data will be duplicated on three Redis servers. How are these three servers selected? Are they intentionally chosen in three different locations? For example, one local (US), another in Asia, and another in the EU? Then the question is: what if the user travels to Australia? 2) I may have missed it, but in this design it sounds like one tweet gets duplicated in the home timeline of every follower. That means a lot of duplication, which will end up using much more memory/storage. Is there any way to relieve this?
System Design is a discovery process, which means you start with a prototype stage to a production ready stage. It seems you demonstrated the final design, instead of starting with a standard design and improving on it incrementally based on the real-world challenges.
Exactly. It seems unreasonable to expect an interviewee to come up in 45 minutes with something that took Twitter years. It would have been better to start from the ground up and build a reasonable system.
Thanks a lot for the video. It helps us think about system design from a broader perspective. I have two questions here. You said a conventional relational database would be a bottleneck in this kind of system. Would NoSQL be the ideal choice here for storage? Also, during the entire video you talked about an in-memory database. At what point does this data get persisted into the database?
He mentioned there should be a machine between the Load balancer and the redis clusters. I would guess that machine would take care of persisting the tweet into the database (preferably in an async manner)
Yes, it will get persisted in NoSQL for sure, As @cats3xxx mentioned. Initial POST and GET will always happen on Redis and I see his design shows Redis is kind of persistence cache for faster tweets flow.
It's a wonderful explanation of the Twitter timeline and user followers in detail with respect to system design. It's really rare and deep in terms of covering the advanced topics that most top-level organizations ask about to test candidates' confidence. Thank you so much for sharing this perfect video, which I was eagerly searching for. I would like to request one more topic: Google Maps and Gmail system design in detail. Thanks in advance. Best of luck.
Awesome video. Thank you. It gives me a basic idea about how to approach system design questions. This design covers a lot of things which are used in real-world huge systems. It includes relational databases, in-memory databases, hashing, load balancers and, most important, how to design a system based on actual requirements, like eventual read consistency in the case of Twitter.
Great video! A few things: this is more architecture than system design. Also in an interview the interviewers probably want you to focus on _your_ design rather than what twitter is already doing. And why does twitter create identical copies of the same tweet in each user list, seems redundant? Why not have each tweet only store the tweet id or something instead? Just curious ... :)
Fantastic video. You can also optimise ram needed and computational load by having a redis cluster per region and by tracking where reads come from per user to only rebuild their timeline in regional clusters they are likely to read it from. (Dont worry about rebuilding my timeline in the UK if I only ever read from Australia). Of course you can divide the computation that way too with at least a worker per region. Also you can optimise the read requests themselves by only loading the most recent slice of the timeline and loading in the next slice when you scroll to the very bottom.
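The "most recent slice" idea in the comment above can be sketched as simple cursor-based paging. This is only an illustration: `timeline_slice`, the `cursor` convention, and the page size of 20 are assumptions, and a plain Python list stands in for a newest-first timeline held in an in-memory store.

```python
def timeline_slice(timeline, cursor=0, page_size=20):
    """Return one page of a newest-first timeline plus the cursor for the next page.

    `timeline` is any sequence ordered newest first (a stand-in for a cached list).
    The next cursor is None once the end of the timeline has been reached.
    """
    page = list(timeline[cursor:cursor + page_size])
    end = cursor + len(page)
    next_cursor = end if end < len(timeline) else None
    return page, next_cursor


# Usage: the client asks for the first slice, then passes the cursor back on scroll.
first_page, cursor = timeline_slice(list(range(45)))
```

Only the first slice is read on page load; later slices are fetched when the reader scrolls to the bottom, which keeps each read small.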
Very educational! How does this work across data centres? Say Alice posts a tweet to datacentre 1 and Bob reads his timeline from datacentre 2. Is Alice's tweet posted to all datacentres and each then independently maintains replicated copies of Bob's timeline Or is Bob's last connected data centre recorded against his account and only that data centre loads up his precomputed timeline?
Hi, Its an awesome video on twitter Architecture. Just a qn on, when you had said that when user tries to access his Home Timeline, if you are a follower of big celebrities, their latest tweets would be fetched from DB and inserted along with Redis data. You had missed this feature while explaining the Home timeline feature towards the end of video. Please clarify.
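One way to picture the read-time merge asked about above is to combine the precomputed list with recent celebrity tweets, both sorted newest first. This is a sketch under assumptions, not what Twitter does: `merged_timeline` and the `(timestamp, tweet_id)` tuple shape are made up for illustration.

```python
import heapq
from itertools import islice


def merged_timeline(precomputed, celebrity_tweets, limit=50):
    """Merge two newest-first lists of (timestamp, tweet_id) into one newest-first list.

    `precomputed` stands in for the cached home timeline, `celebrity_tweets` for
    rows fetched at read time for accounts that are not fanned out on write.
    """
    merged = heapq.merge(
        precomputed, celebrity_tweets, key=lambda item: item[0], reverse=True
    )
    return list(islice(merged, limit))
```

Because both inputs are already sorted, the merge is linear in the number of entries returned and never re-sorts the whole timeline.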
Thank you for the video, Sir! May I ask why you choose to mix the implementation details with the design, is it the standard practice? For instance, you mentioned Redis as a in memory DB in the diagram. Why not just leave it at "in memory DB"(the design) and leave out the Redis (the details). Much thanks!
Hi. Actually you didn't cover (or I failed to notice?) the USER's own tweets page. If all the tweets a user creates are only stored in their followers's own lists, what happens when the user accesses their own tweets history page? Going by the solution you've presented here, the system would have to retrieve for instance BOB's list, filtered by Alice's ID ? Is that what happens for Twitter? Thanks.
Hey Andrei. First off: I don't know what Twitter actually does and it shouldn't matter. It's just presenting a naive design that could work. For your own tweets page you could go multiple ways. Also precompute it (triggered by tweet creation) or just do a lookup in the tweets table (slower but maybe OK because of lower priority and lower traffic). What do you think?
Maybe the Hash Lookup can be replaced with precomputed Hash value in the database/table of users? in case it can be replaced in the future, or more Redis instances are added. You don't want to move the data of the existing users
question: if we are not persisting the tweets on the DB, how can a user access old tweets/timelines? in case of any cache cleaning activity, where will the data be retrieved from? @success in tech
Tweets _are_ being persisted on the DB; the goal of the architecture presented in this video is simply to avoid having to run expensive queries on the DB to populate the users' timelines every time they visit Twitter. This is a form of precomputation called _materialization_ in the database world: especially with NoSQL (but not only), you prepare recurring results to limit latency and resource expenditure at load time. Users can still access their old tweets and timeline, but this will be relatively slower than accessing the current timeline, which is fine since it's not as common. If the Redis instances get flushed for some reason, they can still rebuild timelines from the DB; it'll just cost a lot of resources upfront.
Thank you for sharing this descriptive video. This is definitely the cleaner strike as you were aware of some of the solutions and tech stack that Twitter has already incorporated. I would however be more interested to know how tweets with visual content would be handled. Maybe some exploration of CDN and CMS related solutions? I understand that covering all aspects in one video is not possible for anyone, and I look forward to more content posted by you. Great going!!
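The rebuild-after-flush path described above can be sketched as a cache that falls back to the persistent store on a miss. Everything here is a stand-in: `TWEETS_DB`, `FOLLOWING`, and `TIMELINE_CACHE` are hypothetical in-memory structures in place of a real database and Redis, and tweet IDs are assumed to increase with time.

```python
# Hypothetical persisted rows: (tweet_id, sender_id). IDs are assumed to grow over time.
TWEETS_DB = [(1, "alice"), (2, "kate"), (3, "alice"), (4, "dave")]
FOLLOWING = {"bob": {"alice", "kate"}}   # who each user follows
TIMELINE_CACHE = {}                      # stand-in for the in-memory timelines


def rebuild_timeline(user_id, limit=100):
    """The expensive path: scan persisted tweets and keep those from followed users."""
    followed = FOLLOWING.get(user_id, set())
    rows = [row for row in TWEETS_DB if row[1] in followed]
    rows.sort(key=lambda row: row[0], reverse=True)  # newest first
    return rows[:limit]


def home_timeline(user_id):
    """Serve from the cache; rebuild from the store only when the entry is missing."""
    if user_id not in TIMELINE_CACHE:
        TIMELINE_CACHE[user_id] = rebuild_timeline(user_id)
    return TIMELINE_CACHE[user_id]
```

Clearing `TIMELINE_CACHE` loses nothing permanently; the next call to `home_timeline` pays the rebuild cost once and then serves from memory again.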
This might be a rather basic question but I was wondering what is the best way to maintain a followers table? Is it just userId and followerId - for a celebrity that would mean millions of rows! And I don’t suppose a list of follower ids for every user Id is ideal in relational databases. Any suggestions
@@xiaoshengliu5860 that's a good question, it starts from very basic stuff and then it gets more intricate. I wouldn't say it's for beginners but if you're willing to put in the time it'll give you a deep understanding of modern data management.
Where is unique tweet id created?? immediately after load balancer? How will it be consistent and unique with other distributed servers once created on fly?
Can we do tweet publishing design with Websockets + queue? Not sure if twitter already used it. There is still option of Http2+Long polling or Http2+Server sent event.
That's the part of EVENTUAL consistency. In comparison to a system that trades away availability to gain strong consistency, a social network favors availability and can live with the fact that you may see a tweet earlier than I do etc. Now how do you keep those instances eventually consistent? Master/slave, quorums, gossip protocols etc.
The idea is that you’re UPDATING your followers list of tweets, adding to the list which is stored in the in memory DB, not creating a brand new tweet in a relational database. PUT request is correct.
@@marcushines4172 Not necessarily. From my point of view, a PUT would require sending the whole LIST object with updated values, whereas in this case only a new tweet is being sent. We don't know what data structure (software / hardware) actually holds the list of tweets for BOB's homepage, but we could suspect an INSERT-like operation is taking place there. So POST would maybe be the better answer in this case ?
Thank you for this amazing video. I have one question: Are we storing the tweet ID or the actual tweet in the Redis list? Coz if we are storing the tweet ID then don't we have to run a SQL query on the tweet table while building the home timeline?
Can you please do a system design video on 2 topics A) how do u make sure the number of simultaneous video streams somebody watching let’s say Netflix is only 3 devices at a time. B) windows system update, how do u stream a windows system update to client computers ?
I don't think it is fair to ask this question in an interview and expect the candidate to come up with this Redis solution. If the candidate can explain why fan out is a challenge and come up with some reasonable solutions/suggestions, that must be sufficient.
I agree.. during an interview, as long as you point out the challenges of using a disk-based relational database in the back end, point out that writes might be too slow, and say you can use either caching or an in-memory DB, then that should be enough... I really don't know why he started with the data modeling, which is a whole different aspect of the system's design... but oh well..
Redis boxes are servers by themselves. You can decide to put your in-memory caching either in the same machines that are serving the initial HTTPs requests or have a dedicated fleet (most used).
I learned interesting things from this video but it was also pretty historical. I mean, I'm not sure how much signal I got about his design skills and tech leadership capabilities; I can tell he knows how Twitter works.
So Alice tweets a message and Bob accesses home timeline. How we chose those 3 Redis servers? If based on Alice's IP then how Bob will find these 3 servers?
Hi, Can you please do a video on designing a service like Uber/Lyft? Including services like location based look-ups for cabs, computing route, fare etc. It seems to be a common interview question. Great job by the way.
Also designing a recommendation system please. Thank you so much for taking the time to make these videos. They are very helpful and resourceful. Glad and lucky to have come across your channel.
Doesn't twitter store its tweets in some database permanently or it stores everything *in memory* redis databases ? Like , what if the redis machines malfunction one day .
I like the Video, it certainly got me thinking, but... The Load Balancer is just a "Router" so it needs to route to some sort of "Follower handler" for Alices PUT's. Then that "Follower handler" needs to Query Alice's Following list (probably also on replicated REDIS and probably local on the "Follower Handler" box and why it was routed there) and then (and only then) can the "Follower Handler" send all the PUTs out to probably another Load Balancer (probably many) to update all the HOME Lists of all the followers (i.e. all 100 followers). In the Video you just draw a T under the Load Balancer. To me, that's the bulk of the problem. Just like the Query to Join the Follow table was on the Relation DB solution. But the Video just shows two intersecting lines under a Load Balancer, a box that doesn't "think", it just routes work. The video also never discussed how Lady Gaga's tweets get merged into Bobs Timeline, again one of the other major problems with tweeting (that is understandably a PART II). But how the "fan out" happens is not clear or shown.
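The "follower handler" step this comment asks about can be sketched as a small fan-out-on-write function. It is an illustration only: `FOLLOWERS`, `TIMELINES`, `fan_out`, and the cap of 800 entries per timeline are assumptions, with in-memory structures standing in for the follower lookup and the Redis lists.

```python
from collections import defaultdict, deque

FOLLOWERS = {"alice": ["bob", "carol"]}   # hypothetical follower lookup
MAX_TIMELINE_LENGTH = 800                 # assumed cap per home timeline

# Stand-in for per-user in-memory lists; the oldest entries fall off the end.
TIMELINES = defaultdict(lambda: deque(maxlen=MAX_TIMELINE_LENGTH))


def fan_out(sender_id, tweet_id):
    """Push (tweet_id, sender_id) onto the home timeline of every follower.

    Returns the number of timelines that were updated.
    """
    followers = FOLLOWERS.get(sender_id, [])
    for follower in followers:
        TIMELINES[follower].appendleft((tweet_id, sender_id))  # newest first
    return len(followers)
```

The cost of one tweet grows with the sender's follower count, which is why accounts with millions of followers are usually treated as a separate case.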
One question, probably a silly one. :) At the end of the video, you mentioned that the LB will do a hash lookup to find which Redis cluster to query for Bob's timeline. Does this mean that Bob's timeline will always be stored on the same 3 Redis machines? Or does it depend on something?
+Nirav Purohit No silly questions, don’t worry. It means Bob’s timeline is always stored on 3 different Redis hosts (for availability and other reasons). Now when he visits the timeline in his browser, the LB (or a component behind the LB) queries the hashmap to find one of those 3 Redis machines. The content returned by the fastest of those 3 machines is returned to Bob’s browser.
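A minimal sketch of such a lookup, assuming the three hosts are derived from a hash of the reader's user ID so that the write path and the read path always agree. The host names, `REPLICA_COUNT`, and `replicas_for` are hypothetical; the video does not say how the mapping is actually stored.

```python
import hashlib

REDIS_HOSTS = ["redis-%d" % i for i in range(1, 10)]  # hypothetical host names
REPLICA_COUNT = 3


def replicas_for(user_id, hosts=REDIS_HOSTS, replicas=REPLICA_COUNT):
    """Deterministically pick `replicas` distinct hosts for a user's timeline.

    The choice depends only on the timeline owner's ID, not on who tweeted
    or where the request came from.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(hosts)
    return [hosts[(start + i) % len(hosts)] for i in range(replicas)]
```

Since the key is Bob's ID, tweets from Alice and from Kate both land on the same three hosts. Plain modulo placement like this moves most users when hosts are added, so a stored mapping or consistent hashing is the usual refinement.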
It would be very useful to have a version of this that isn’t sped up while you’re drawing things. It’s not as important, but I do get some anxiety trying to figure out how (or if) to fill the awkward silences during stuff like that
Thank you so much for sharing this. But I think it is just some parts of the system design. We still have a lot of things to introduce. Redis is a memory database, but what if all the replicas are down? We should store the data in disk, with Redis itself or other no-sql databases. Shall we consider how the servers work, what servers we should have, with read and post servers respectively? How we consider the security issues, shall we user a gateway, and introduce SOA theory tools like service registry and discovery?
+Yasen Zhang Yeah, nobody expects you to cover the ins and outs of such a system in a 45-minute interview. Architectures like this grow over years. If the interviewer wants you to cover a specific topic then you should dive deeper into it.
I love the video, thank you for doing this. To design a system like Twitter within the time constraint of an interview, I would probably start with no caching layer whatsoever, just a bunch of distributed databases. In any case, Redis is typically in-memory and has to have a traditional database as its source. Another point I would cover is, once the user is logged in and receives the initial snapshot of the timeline, how does Twitter merge live updates into user timelines? Then an interesting question is what happens if half of your friends are local and half are on another continent: how does Twitter merge tweets with different latency profiles? In any case, thanks for doing it!!!
Brilliant video on Twitter news feed generation. But storage of tweets was not discussed. Since 100 million tweets are created per day. Would have been interesting to know how this massive volume of tweet are stored and scaled. Thanks for video.
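Taking the 100 million tweets per day quoted above, a back-of-the-envelope estimate looks like this. The 300 bytes per stored tweet is purely an assumption for illustration, and the figures are daily averages, not peaks.

```python
TWEETS_PER_DAY = 100_000_000     # figure quoted in the comment above
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
ASSUMED_BYTES_PER_TWEET = 300    # assumption: text plus some metadata


def average_writes_per_second(tweets_per_day=TWEETS_PER_DAY):
    """Average write rate if tweets were spread evenly over the day."""
    return tweets_per_day / SECONDS_PER_DAY


def daily_storage_gb(tweets_per_day=TWEETS_PER_DAY,
                     bytes_per_tweet=ASSUMED_BYTES_PER_TWEET):
    """Raw storage added per day in decimal gigabytes, before replication or indexes."""
    return tweets_per_day * bytes_per_tweet / 1e9
```

With these inputs the average comes to roughly 1,157 writes per second and about 30 GB of raw tweet data per day; real traffic would peak well above the average.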
+Jon Snow Hey man, glad you like them! If you want to dive deeper into real-world solutions, highscalability.com is really good. On the more theoretical side of things, the books of Andrew S. Tanenbaum are classics but they get updated quite often. Definitely worth reading.
Is it true that they only use an in-memory database? I would think that they have another database that persists the tweet and then they just use the cache for timelines and update the timelines in the cache (which redis can be used as).
This was great! I thought fan-out was asynchronously sending a message to a number of recipients (e.g. when you submit a tweet it gets sent to the search pipeline, to the user timeline pipeline, plus other pipelines). But you seem to suggest fan-out has something to do with the Redis precomputation step?
Hey chris, thank you. The word fanout is very generic. It just means that there is one or more entities which are then duplicated and sent to a number of recipients. The context is important. In this case a tweet is duplicated across 3 replicas of Redis during precomputation.
Thanks Ramon, such good explanations. 1. What is the purpose of the 3 clusters? Is it only so that the fastest response is taken as the result? 2. Are the user Bob table and follower table created in the Redis cache only, not as physical DB tables?
I'm doing a little experiment over on IGTV, the SiT VLOG. Check it out! instagram.com/tv/BkYf4GphfQz/
Nice effort and really explain the things. Please mention the sources you have referred. This would allow us to go in-depth of a topic.
Can you please also add low level design?
In some orgs they ask for HLD and LLD. How would you classify this? I think these boxes you drew were HLDs but the tables and ER diagrams would be part of LLD, right?
I wanna get into Amazon, how can I get your referral?
Great job!
I know html, css, js, reactjs for frontend, python for backend, and mysql. Now I want to understand how to glue these together, build an application and deploy on server.
What do I need to learn next?
Is this the right video?
man i cant express my happiness. you are the only one on youtube (in fact the internet) concentrating on high level system design. many companies are shifting their focus from algorithms to system design nowadays. it was so hard to figure out how to come up with answers to these. Your videos are a life saver sir. You people are literally changing lives. the minimum i can do is say a big thank you to you for making these vids.
+Biswajit Singh I'm really stoked to hear that, man! Happy I can help you out. You would do me a huge favor if you could share my videos on your social networks. There will be more interesting videos to come! 👍
The minimum you can do is pay your first month salary to his patreon account, if he has one.
For that I’ll make a Patreon 😄
companies arent shifting focus, as u r becoming senior, u r facing more architect lvl questions.
Check "Tech Dummies"... he is much better than him
I used the caching strategy described in this video in my system design interview. You are part of the reason why I received an offer from one of my dream company. Thank you!
hi, what caching strategy is described in this video? The Redis fan out part? Could you elaborate more? Thanks!
@@ethanlyu4839 Yes, the Redis fan out for active users, but I wasn't asked exactly to design Twitter. I borrowed this part of the design in my answer.
What design interview question u got ?
Can you share ? It would be of great help for me.
@@jeevithatd9221 just look up github repo for: system-design-primer . It covers a lot of the most common system design questions as well as giving you the fundamentals before giving you the problems.
Hi Ramon. Thanks for making these SDI videos. There are quite a few important things missing from this video for it to be considered a complete and correct answer in a real interview: the list of different micro/services that make the platform run. Full database design/schemas. API commands from client to server and between important microservices. And most importantly, "back of the envelope" estimations, i.e. number of users (DAU), QPS, storage requirements, throughput requirements etc.
I hope you’ll continue making SDI videos that contain this info too in the future. Many thanks and best of luck
I'm halfway there and just overwhelmed with the kind of explanation this guy has put into the videos. Probably will complete this and come back again for more such videos. Thank you!
I have looked into a couple of other Twitter system design videos, but I felt your videos are way more explanatory. Your video answered my questions like "how is the Redis node chosen out of many?" and "for users with thousands of followers and an uncertain next login, is constructing home timelines for such users worth it?". I believe your design is not complete w.r.t. analytics and search functionality, but it is still very informative and nicely explained. Thank you.
Thank you!
I'm committed to watching your design video once a day till i finish them... then repeat. Thank you.
Love it :D All the best!
Very few people explain as well as you do and cover these topics. As a software engineer, I am very interested in these topics, and the community needs more videos like this! Keep up the good work!
What I like about your explanation is that u r not rushing it by preparing the content beforehand. Many of the videos does that cramming so much information in very little time. You are carefully walking thru the solution giving us ample time to make sense of a point u made. Can you suggest some resources(books,articles,lectures,seminars, utube channels like urs) to read/watch to get good at system design
The best design interview videos in your channel.
Thanks for doing this! One suggestion: You should have separate playlist for system design and algo related questions.
+Venkat Raman That’s a good point, will do! Thanks
8:06 great breakdown by system traits to design improvement. Network access availability > consistency
9:27 I like how you go into higher lvl overview with actual scenario/api for when tweets made
I have an upcoming test about distributed databases and WDM, this has been such a help in considering how to answer these problems. Thank you!
Hello Mr. Lopez! I loved this video. But it is always very likely to face a system design question totally out of what you had prepared for an interview. So a video on all possible system design components and how they are used for specific use cases in real life products can be very useful. So once building blocks are available, its easier from there. For example, REDIS database with its in-memory function is a good takeaway from this video which I can use in different scenarios.
Thanks for your feedback! I'm planning something along those lines. Don't forget to subscribe ;)
today I was asked this question during an interview which makes me wonder what is the gain in asking something that pretty much can be memorized from videos just like typical common algorithms questions, I really don't see too much gain in companies expecting you to play to 'design the internet' I came up with something similar to this but replacing redis with temp tables 🌬️🔥 thanks for the info
I actually shared this with my newsletter for acing coding interviews. Great way of identifying problem areas and solving them
Really fascinating, many thanks. Hardly in the Internet can you find such content 🎉
Why is it being replicated 3 times on the redis machine though? Why isn't one redis machine enough
To avoid a single point of failure. If there would have been only one Redis machine and it failed before persisting data into the database then data will be lost.
I did not understand why you need 3 redis? Thanks
Here are few others design / architecture which i am curious to know ... would be great if you could create them in the near future:
1. YouTube architecture and design or similar video streaming websites
2. Amazon or any E-commerce website
3. Instagram
4. Google Search Engine
at 9:46 he says the tweet will hit a gateway, then proceeds to draw a load balancer. anyone care to explain what comes first and how they're related to each other? (is load balancer and gateway the same?)
This is a great video. I have a quick question with using list in Redis. The video only mentioned store the tweet_id and sender_id for Bob's list. What about the actual tweet? Is the actual tweet store in Redis and we will need to do a look up by each tweet_id to get the actual text?
I believe tweet gets also stored in redis, considering its only text+links. It wouldn't be much useful if we still have to fetch tweets from DB.
Saving all the tweets for entire duration could be memory and computation intensive.
Hence, I believe twitter uses time expiration mechanism in Redis. redis.io/commands/ttl
I can say this because it takes only a few seconds when you're looking at your feed. On the other hand, a query takes more seconds when you search for it on Twitter.
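To make the expiry idea above concrete, here is a small Python sketch of a timeline cache whose entries disappear after a time to live. `ExpiringTimelineCache` is a hypothetical toy class, not the Redis API (in real Redis, `EXPIRE` sets a key's time to live and `TTL` reads it back), and whether Twitter expires timelines this way is only the guess made above:

```python
import time


class ExpiringTimelineCache:
    """Toy in-memory cache with per-timeline expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # user_id -> (expires_at, [tweet_id, ...] newest first)

    def get(self, user_id):
        """Return the cached timeline, or an empty list if missing or expired."""
        entry = self._entries.get(user_id)
        if entry is None:
            return []
        expires_at, tweet_ids = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]  # expired: drop it, caller falls back to the DB
            return []
        return list(tweet_ids)

    def push(self, user_id, tweet_id, ttl_seconds):
        """Prepend a tweet id and restart the timeline's time to live."""
        tweet_ids = self.get(user_id)
        self._entries[user_id] = (self._clock() + ttl_seconds, [tweet_id] + tweet_ids)
```

An empty result here means "not in the cache", so the caller would have to fetch or rebuild the timeline from the persistent store, which is the slower path described above.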
Why do we store 3 times in redis?
Sriram Subramanian To handle failures of cache nodes utilizing a number of replicas
Thank you!! Really loved the way you explained everything: crisp and clear!
Can you please explain:
Your use case: Alice posted tweet. Bob follows Alice. Bob's timeline updated with Alice's tweet in say Redis 1,2,3
Assume one more use case: Kate posted tweet. Bob follows Kate. Bob's timeline updated with Kate tweet in say Redis 4,5,6
When Bob is viewing the timeline and we do the HashMap lookup to find the 3 Redis machines which of the above 3 machines will be returned to display Bob's timeline?
Suppose Alice and Kate stay far away, will Bob's timeline be always updated in Redis 1,2,3 only or can it change?
Time flew, amazing stuff man. Crazy ideas are being implemented when it comes to huge systems.
after clarifying the requirements, wouldn't the next step to be understand the load patterns to the service? Wasn't jumping to the schema design a bad idea?
What is the DB you are using in the solution? Redis is good for caching. but there should be something to store all the tweets..thats missing here
Amazing video on System design. Your way of explaining things is simple and on high level. Many thanks.
Thank you!
I'm confused: Why does the request reach the Redis cache right after reaching the LB instead of the request first reaching the LB, then an application server (a service), then a Redis cache? How does the cache know which records to search for? Does that mean that the API requires that each request contain hash information for the Redis cache and that the Redis cache is set up to manage HTTP requests?
The following was not mentioned in the video, I wanted to know if this is an acceptable idea.
Since twitter like most social media is read heavy, we can maintain different servers for read/write operations.
This makes sense because we can scale our servers accordingly. Ex: if the read server is 50 TB, then the write server could be 10 TB or similar.
This way, we can also make efficient use of the in-memory cache, since the tables accessed for reads and writes will differ, and mixing them up in the same cache doesn't make much sense.
I think you're missing the application servers that handle requests from the load balancer. A request won't go to Redis without the help of the application servers that build the chronological list of tweets.
You've mentioned not to use MySQL, but what do we use for a persistent store, in that case?
Redis' in-memory approach is good for speed, but does not allow persistence.
Apache Kafka might be the solution to have both speed and persistence. You have producers and consumers, and the latter can always make sure all published messages are read.
Miles to go before you sleep.
Could you please prepare system design and LLD for the following:
1. Simulation of a cricket match, football match etc.
2. Implementation of Queue like Kafka
3. Ecommerce price drop notification system for 50M products
4. Amazon like website and order management system i.e. everything that happens after clicking checkout
5. Elevator system
6. Scrabble
7. Chess game
8. A library for evaluation of expression
Thanks for the nice video, it is informative. I have two questions. 1) You mentioned that data will be duplicated on three Redis servers. How are these three servers selected? Do they intentionally choose three Redis servers in three different locations? For example, one local (US), another in Asia, and another one in the EU? Then the question is: what if the user travels to Australia? 2) I may have missed it, but in this design it sounds like one tweet will get duplicated on the home page of all followers. That means a lot of duplication, which will end up with much more memory/storage usage. Is there any way to relieve this?
I think you wanted to say 'everyone that follows you' in the video at 11:13.
System Design is a discovery process, which means you start with a prototype stage to a production ready stage. It seems you demonstrated the final design, instead of starting with a standard design and improving on it incrementally based on the real-world challenges.
Exactly. It seems unreasonable that an interviewee is going to come up with something like this, which took Twitter years, in 45 minutes. It would have been better to start from the ground up and build a reasonable system.
Thanks a lot for the video. it helps us to think the system design in a broader perspective.
I have two questions here. You said a conventional relational database would be a bottleneck in this kind of system. Would NoSQL be the ideal choice for storage here? Also, during the entire video you talked about an in-memory database. At what point in time does this data get persisted into the database?
He mentioned there should be a machine between the Load balancer and the redis clusters. I would guess that machine would take care of persisting the tweet into the database (preferably in an async manner)
Yes, it will get persisted in NoSQL for sure, as @cats3xxx mentioned. The initial POST and GET will always happen on Redis, and I see that his design shows Redis as a kind of persistent cache for faster tweet flow.
Great video :) really happy to see someone explaining overall system design in depth. Waiting for more exciting videos on system design.
why does it redirect me to "Alice Johnson" profile when I click on link to follow on twitter
I think what exactly is “tweet” needs to be defined first ; some aspect of it will come in sorting
It's a wonderful explanation of the Twitter timeline and user followers in detail with respect to system design. It is really rare to find such depth on advanced topics that most top-level organizations ask about to gauge a candidate's confidence. Thank you so much for sharing this perfect video, which I was eagerly searching for. I would like to request one more topic: Google Maps and Gmail system design in detail. Thanks in advance. Best of luck.
your every word is useful and informative!!!
Awesome video. Thank you. It gives me a basic idea about how to approach system design questions. This design covers a lot of things which are used in real-world huge systems. It includes relational databases, in-memory databases, hashing, load balancers and, most importantly, how to design a system based on actual requirements, like eventual read consistency in the case of Twitter.
Great video! A few things: this is more architecture than system design. Also in an interview the interviewers probably want you to focus on _your_ design rather than what twitter is already doing. And why does twitter create identical copies of the same tweet in each user list, seems redundant? Why not have each tweet only store the tweet id or something instead? Just curious ... :)
Great video!, just wondering why would redis update 3 times if a single request came in?
Fantastic video. You can also optimise ram needed and computational load by having a redis cluster per region and by tracking where reads come from per user to only rebuild their timeline in regional clusters they are likely to read it from. (Dont worry about rebuilding my timeline in the UK if I only ever read from Australia). Of course you can divide the computation that way too with at least a worker per region. Also you can optimise the read requests themselves by only loading the most recent slice of the timeline and loading in the next slice when you scroll to the very bottom.
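The "load the most recent slice, then the next slice on scroll" idea above can be sketched in a few lines of Python. The function name, the integer cursor and the default page size are assumptions made for illustration only:

```python
def timeline_page(timeline, cursor=0, page_size=20):
    """Return one slice of a newest-first timeline and the cursor for the next slice.

    The returned cursor is None once the end of the timeline has been reached.
    """
    page = timeline[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    if next_cursor >= len(timeline):
        next_cursor = None  # nothing left to load when the user scrolls further
    return page, next_cursor
```

The client would send back the cursor it received whenever the user scrolls to the bottom, so only the slice that is actually viewed has to leave the cache.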
It would be great if the architecture for maintaining hashtags in Twitter could also be explained: search, top trending hashtags etc.
Very educational! How does this work across data centres? Say Alice posts a tweet to datacentre 1 and Bob reads his timeline from datacentre 2. Is Alice's tweet posted to all datacentres and each then independently maintains replicated copies of Bob's timeline Or is Bob's last connected data centre recorded against his account and only that data centre loads up his precomputed timeline?
Hi, it's an awesome video on Twitter architecture. Just a question: you said that when a user accesses the home timeline and follows big celebrities, the celebrities' latest tweets would be fetched from the DB and merged with the Redis data. You missed this feature while explaining the home timeline towards the end of the video. Please clarify.
That is an optimization you could implement if you are somehow constrained, it's not strictly necessary
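For anyone wondering what that optional read-time merge could look like, here is a hedged Python sketch. It assumes each input is a newest-first list of `(timestamp, tweet_id)` tuples, which is an invented representation, not something shown in the video:

```python
import heapq
import itertools


def home_timeline(precomputed, celebrity_feeds, limit=50):
    """Merge a precomputed timeline with celebrity tweets fetched at read time.

    `precomputed` and every list in `celebrity_feeds` must already be sorted
    newest first by timestamp.
    """
    merged = heapq.merge(
        precomputed,
        *celebrity_feeds,
        key=lambda entry: entry[0],  # compare by timestamp
        reverse=True,                # inputs are sorted from largest to smallest
    )
    return [tweet_id for _, tweet_id in itertools.islice(merged, limit)]
```

The trade-off is that a celebrity's tweet is no longer copied into millions of follower lists at write time, at the cost of a little extra work on every read.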
Thank you for the video, Sir! May I ask why you choose to mix the implementation details with the design, is it the standard practice? For instance, you mentioned Redis as a in memory DB in the diagram. Why not just leave it at "in memory DB"(the design) and leave out the Redis (the details). Much thanks!
Hi. Actually you didn't cover (or I failed to notice?) the USER's own tweets page. If all the tweets a user creates are only stored in their followers's own lists, what happens when the user accesses their own tweets history page? Going by the solution you've presented here, the system would have to retrieve for instance BOB's list, filtered by Alice's ID ? Is that what happens for Twitter? Thanks.
Hey Andrei. First off: I don't know what Twitter actually does and it shouldn't matter. It's just presenting a naive design that could work.
For your own tweets page you could go multiple ways. Also precompute it (trigger by tweet creation) or just do a lookup in the tweets table (slower but maybe Ok because lower priority and lower traffic). What do you think?
Maybe the Hash Lookup can be replaced with precomputed Hash value in the database/table of users? in case it can be replaced in the future, or more Redis instances are added. You don't want to move the data of the existing users
question: if we are not persisting the tweets on the DB, how can a user access old tweets/timelines?
In case of any cache cleaning activity, where will the data be retrieved from? @success in tech
Tweets _are_ being persisted in the DB. The goal of the architecture presented in this video is simply to avoid having to run expensive queries on the DB to populate the users' timelines every time they visit Twitter. This is a form of precomputation called _materialization_ in the database world: especially with NoSQL (but not only), you prepare recurring results to limit latency and resource expenditure at load time. Users can still access their old tweets and timeline, but this will be relatively slower than accessing the current timeline, which is fine since it's not as common.
If Redis instances get flushed for some reason, they can still rebuild timelines from the DB; it'll just cost a lot of resources upfront.
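Here is a rough Python sketch of that rebuild path, using an in-memory SQLite database as a stand-in for the persistent store. The table and column names (`tweets`, `follows`, `sender_id`, `created_at` and so on) are assumptions for the example, not the schema from the video:

```python
import sqlite3


def create_schema(conn):
    """Create the two hypothetical tables this sketch relies on."""
    conn.executescript(
        """
        CREATE TABLE tweets (id INTEGER PRIMARY KEY, sender_id TEXT, created_at INTEGER);
        CREATE TABLE follows (follower_id TEXT, followee_id TEXT);
        """
    )


def rebuild_timeline(conn, user_id, limit=100):
    """Recompute a user's home timeline (newest first) straight from the DB.

    This is the expensive join that the precomputed cache normally avoids.
    """
    rows = conn.execute(
        """
        SELECT t.id
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.sender_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The list of ids that comes back is what would be pushed into the cache again, so the join runs once per rebuild instead of once per page view.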
Thank you for sharing this descriptive video. This is definitely the cleaner approach, as you were aware of some of the solutions and the tech stack that Twitter has already incorporated. I would, however, be more interested to know how tweets with visual content would be handled. Maybe some exploration of CDN and CMS related solutions? I understand that covering all aspects in one video is not possible for anyone, and I look forward to more content from you. Great going!!
From what I know Redis is more than an in-memory database, it does provide persistence. Am I wrong?
That's amazing. Just one question: Why do we need 3 redis instances? Can 1 of them not suffice?
It’s common to use Redis in clusters to increase availability, performance and available storage space.
This might be a rather basic question but I was wondering what is the best way to maintain a followers table? Is it just userId and followerId - for a celebrity that would mean millions of rows! And I don’t suppose a list of follower ids for every user Id is ideal in relational databases. Any suggestions
Do you have any book recommendations for these sort of high level design that we can read and get better?
System design can be understood by reading articles and watching video blogs. There are no complete books, to the best of my knowledge.
Designing Data Intensive Applications is an absolutely amazing book!
@@GabrieleCimato Hi! I just wonder if this book is friendly for beginners? Thanks!
@@xiaoshengliu5860 that's a good question, it starts from very basic stuff and then it gets more intricate. I wouldn't say it's for beginners but if you're willing to put in the time it'll give you a deep understanding of modern data management.
@@GabrieleCimato Thanks!!!
I don’t get it how redis can maintain relationships data , how does redis know who follow who?
What's the rationale for having 3 Redis databases instead of 1? Optimize recovery time based on user location/server load? Thank you!
Where is unique tweet id created?? immediately after load balancer? How will it be consistent and unique with other distributed servers once created on fly?
This is great! Your approach, time management and advise to solve the problems are spot on. Thank you and keep up the good work!
+Ravi M Hey Ravi, thank you for the kind words! Stay tuned for more videos :)
The only thing I didn't like about this video is that I can only like it once. What a great Video!!!
Can we do tweet publishing design with Websockets + queue? Not sure if twitter already used it. There is still option of Http2+Long polling or Http2+Server sent event.
Can you do a video on system design for a twitter or IG notification system?
How do you ensure consistency between the 3 redis instances? If a write to one of the three fails, are all rolled back?
That's the EVENTUAL consistency part. In comparison to a system that trades availability for strong consistency, a social network favors availability and can live with the fact that you may see a tweet earlier than I do etc.
Now how do you keep those instances eventually consistent? Master/slave, quorums, gossip protocols etc
This guy is awesome! Subscribed
Your videos have been amazing. They are a great complement to other videos that are more algorithm focused.
With this design does that mean Twitter stores O(N*3), N being the total amount of active twitter users, timeline in memory?
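A quick back-of-the-envelope version of that question in Python. Every number in the usage note below is a made-up assumption chosen only to show the arithmetic, not a figure from the video or from Twitter:

```python
def timeline_memory_bytes(active_users, entries_per_timeline, bytes_per_entry, replicas=3):
    """Estimate the raw memory needed to keep every active user's timeline cached.

    Ignores per-key overhead of the cache itself, so the real figure would be higher.
    """
    return active_users * entries_per_timeline * bytes_per_entry * replicas
```

For example, assuming 100 million active users, 800 cached entries per timeline and 16 bytes per entry (an 8-byte tweet id plus an 8-byte sender id), `timeline_memory_bytes(100_000_000, 800, 16)` gives 3,840,000,000,000 bytes, roughly 3.84 TB across all three replicas. So the cost grows linearly with the number of active users, multiplied by the replication factor.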
Why tweeting is PUT request and not POST?
agree, PUT used for update
The idea is that you’re UPDATING your followers list of tweets, adding to the list which is stored in the in memory DB, not creating a brand new tweet in a relational database. PUT request is correct.
@@marcushines4172 Not necessarily. From my point of view, a PUT would require sending the whole LIST object with updated values, whereas in this case only a new tweet is being sent. We don't know what data structure (software / hardware) actually holds the list of tweets for BOB's homepage, but we could suspect an INSERT-like operation is taking place there. So POST would maybe be the better answer in this case ?
Thank you for this amazing video.
I have one question:
Are we storing the tweet ID or the actual tweet in the Redis list?
Because if we are storing the tweet ID, then don't we have to run a SQL query on the tweet table while building the home timeline?
you'd store a map from ID => tweet (redis is a key/val store in memory)
@@howardwang2821 Redis has many complex data structures, we can have lists
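Both replies can be combined: a list of tweet ids per user plus a key/value map from id to tweet body. A minimal Python sketch follows, with plain dicts and lists standing in for the cache; `TimelineStore` and its methods are hypothetical names, not Redis commands:

```python
class TimelineStore:
    """Toy model: per-user lists of tweet ids plus an id -> tweet map."""

    def __init__(self):
        self.tweets = {}     # tweet_id -> {"sender_id": ..., "text": ...}
        self.timelines = {}  # user_id -> [tweet_id, ...] newest first

    def save_tweet(self, tweet_id, sender_id, text):
        """Store the tweet body once, no matter how many followers see it."""
        self.tweets[tweet_id] = {"sender_id": sender_id, "text": text}

    def push(self, user_id, tweet_id):
        """Put only the id at the head of a follower's timeline."""
        self.timelines.setdefault(user_id, []).insert(0, tweet_id)

    def read(self, user_id):
        """Resolve the ids in a timeline to tweet texts with key lookups, no SQL join."""
        ids = self.timelines.get(user_id, [])
        return [self.tweets[i]["text"] for i in ids if i in self.tweets]
```

Storing only ids in the per-user lists keeps the duplication small, and the lookup by id is a cheap key access rather than a query on the tweet table.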
Can you please do a system design video on 2 topics
A) how do u make sure the number of simultaneous video streams somebody watching let’s say Netflix is only 3 devices at a time.
B) windows system update, how do u stream a windows system update to client computers ?
Thank you so much for posting this video. This is great! Many regards.
I don't think it is fair to ask this question in an interview and expect the candidate to come up with this Redis solution. If the candidate can explain why fan out is a challenge and come up with some reasonable solutions/suggestions, that must be sufficient.
I agree.. during an interview, as long as you point out the challenges of using a disk-based relational database in the back end, point out that writes might be too slow, and say that you can use either caching or an in-memory DB, then that should be enough... I really don't know why he started with the data modeling, which is a whole different aspect of the system's design... but oh well..
Great video, thanks a lot. Shouldn't the load balancer connect to servers, and the servers access external persistent memory like Redis?
Redis is in-memory, which is very fast compared with an external database. Moreover, fetching data from an external DB is much more costly.
Redis boxes are servers by themselves. You can decide to put your in-memory caching either in the same machines that are serving the initial HTTPs requests or have a dedicated fleet (most used).
I learned interesting things from this video but it was also pretty historical. I mean, I'm not sure how much signal I got about his design skills and tech leadership capabilities; I can tell he knows how Twitter works.
no scale estimation ? throughput estimation ? storage estimation ?
Why there are 3 REDIS machines?
So Alice tweets a message and Bob accesses home timeline. How we chose those 3 Redis servers? If based on Alice's IP then how Bob will find these 3 servers?
Hi, Can you please do a video on designing a service like Uber/Lyft? Including services like location based look-ups for cabs, computing route, fare etc. It seems to be a common interview question. Great job by the way.
+Eldo Joseph yes! Thats exactly what I have planned for the next system design video. Thank you!
Also designing a recommendation system please. Thank you so much for taking the time to make these videos. They are very helpful and resourceful. Glad and lucky to have come across your channel.
+Akshatha Thank you, thats always great to hear! I’ll do my best to make some more of these asap :D
Why use 3 Redis clusters? Is that arbitrary?
NEW: Check out my brand new website www.successintech.com
Doesn't twitter store its tweets in some database permanently or it stores everything *in memory* redis databases ? Like , what if the redis machines malfunction one day .
I like the Video, it certainly got me thinking, but... The Load Balancer is just a "Router" so it needs to route to some sort of "Follower handler" for Alices PUT's. Then that "Follower handler" needs to Query Alice's Following list (probably also on replicated REDIS and probably local on the "Follower Handler" box and why it was routed there) and then (and only then) can the "Follower Handler" send all the PUTs out to probably another Load Balancer (probably many) to update all the HOME Lists of all the followers (i.e. all 100 followers). In the Video you just draw a T under the Load Balancer. To me, that's the bulk of the problem. Just like the Query to Join the Follow table was on the Relation DB solution. But the Video just shows two intersecting lines under a Load Balancer, a box that doesn't "think", it just routes work. The video also never discussed how Lady Gaga's tweets get merged into Bobs Timeline, again one of the other major problems with tweeting (that is understandably a PART II). But how the "fan out" happens is not clear or shown.
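A small Python sketch of the "follower handler" step described above: look up who follows the sender, then push the tweet into each follower's home list. `FanoutService`, its in-process dictionaries and the timeline length cap are all assumptions for illustration; a real system would spread this work over queues and many machines:

```python
from collections import defaultdict, deque


class FanoutService:
    """Toy fan-out-on-write handler sitting behind the load balancer."""

    def __init__(self, max_timeline_len=800):
        self.followers = defaultdict(set)  # followee -> set of follower ids
        # Each home timeline keeps only the newest `max_timeline_len` entries.
        self.timelines = defaultdict(lambda: deque(maxlen=max_timeline_len))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post_tweet(self, sender, tweet_id):
        """Copy (tweet_id, sender) to the head of every follower's home timeline.

        Returns the number of timelines that were updated.
        """
        targets = self.followers[sender]
        for follower in targets:
            self.timelines[follower].appendleft((tweet_id, sender))
        return len(targets)
```

The loop makes the cost visible: one tweet from an account with 100 followers turns into 100 list updates, which is why accounts with millions of followers need different treatment.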
Your content is amazing. You should create Udemy course on System Design Interview Questions.
One question, probably a silly one :) At the end of the video, you mentioned that the LB will do a hash lookup to find which Redis cluster to query for Bob's timeline. Does this mean that Bob's timeline will always be stored on the same 3 Redis machines? Or does it depend on something?
+Nirav Purohit No silly questions, don’t worry. It means Bob’s timeline is always stored on 3 different Redis hosts (for availability and other reasons). Now when he visits the timeline in his browser, the LB (or a component behind the LB) queries the hashmap to find one of those 3 Redis machines. The content returned by the fastest of those 3 machines is returned to Bob’s browser.
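One possible way to picture that lookup, sketched in Python. It swaps the stored hashmap for a hash function over the user id, which is an assumption on my part, and `replicas_for` plus the host names are invented:

```python
import hashlib


def replicas_for(user_id, hosts, replica_count=3):
    """Pick the cache hosts that hold a user's timeline.

    The choice depends only on the timeline owner's id, so writes triggered by
    different senders and later reads all land on the same hosts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(hosts)
    count = min(replica_count, len(hosts))
    # Take `count` consecutive hosts, wrapping around the end of the list.
    return [hosts[(start + i) % len(hosts)] for i in range(count)]
```

Since the input is Bob's id and not Alice's or Kate's, tweets from both of them end up in the same three lists. Note that with this simple modulo scheme, adding hosts changes the mapping for most users, which is one reason a stored lookup table or consistent hashing might be preferred. Racing the three replicas and returning the fastest answer is left out of this sketch.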
It would be very useful to have a version of this that isn’t sped up while you’re drawing things. It’s not as important, but I do get some anxiety trying to figure out how (or if) to fill the awkward silences during stuff like that
That's a super interesting point, thank you. I'll try to address this in my next video!
What about a timeline that’s dominated by large, famous accounts?
Thank you so much for sharing this. But I think it covers just some parts of the system design. We still have a lot of things to introduce. Redis is an in-memory database, but what if all the replicas are down? We should store the data on disk, with Redis itself or with other NoSQL databases. Shall we consider how the servers work and what servers we should have, with separate read and post servers? How do we handle security issues? Shall we use a gateway and introduce SOA tools like service registry and discovery?
+Yasen Zhang Yeah, nobody expects you to cover the ins and outs of such a system in a 45min interview. Architectures like this grow over years. If the interviewer wants you to cover a specific topic then you should dive deeper into it.
Got it. I've never had a system design interview before, but I'm going to take one next week. It really helps me. Thank you.
Hi, Can you please do a video on designing a service like google docs and how to keep everything in sync, concurrent writes by multiple users etc
11:12 , I think you meant everybody that follows you?
Kyle Li agree.
Agreed
I love the video, thank you for doing this. To design a system like Twitter within the time constraint of an interview, I would probably start with no caching layer whatsoever, just a bunch of distributed databases. In any case, Redis is typically in-memory and has to have a traditional database as its source. Another point I would cover is how Twitter merges live updates into user timelines once the user is logged in and has received the initial snapshot of the timeline. Then an interesting question is what happens if half of your friends are local and half are on another continent: how does Twitter merge tweets with different latency profiles? In any case, thanks for doing it!!!
Hello, is RabbitMQ is a good choice for user notifications feature in webapps like twitter/fb ?
Brilliant video on Twitter news feed generation. But storage of tweets was not discussed. Since 100 million tweets are created per day, it would have been interesting to know how this massive volume of tweets is stored and scaled. Thanks for the video.
Seriously, A big Thank You for these videos!
What should I read to dig a bit deeper into these topics? Like technologies used etc...
+Jon Snow Hey man, glad you like them! If you want to dive deeper into real world solutions, highscalability.com is really good. On the more theoretical side of things, the books of Andrew S. Tanenbaum are classics, but they get updated quite often. Definitely worth reading.
can we simulate a basic scalable setup like this in cloud, like AWS, and test the performance ?
Is it true that they only use an in-memory database? I would think that they have another database that persists the tweet and then they just use the cache for timelines and update the timelines in the cache (which redis can be used as).
I'm sure they have long-term storage
This was great! I thought fan-out was asynchronously sending a message to a number of recipients (e.g. when you submit a tweet it gets sent to the search pipeline, to the user timeline pipeline, plus other pipelines). But you seem to suggest fan-out has something to do with the Redis precomputation step?
Hey chris, thank you. The word fanout is very generic. It just means that there is one or more entities which are then duplicated and sent to a number of recipients. The context is important.
In this case a tweet is duplicated across 3 replicas of Redis during precomputation.
Thanks Ramon, such good explanations.
1. What is the purpose of the 3 clusters? Is it only so that the fastest response can be taken as the result?
2. Are the user Bob table and the follower table created only in the Redis cache, and not as physical DB tables?
Thanks! 1. speed and replication 2. yes they are stored in a conventional DB too, as a backup so to say.
Where is the tweet ID generated? Does the frontend generate it while creating a tweet?
This was amazing! Thank you so much, as a beginner on System design, you explained it beautifully.