I think you can tell that Mark has a lot of experience: he easily navigates even topics he admits he doesn't master, and he's also an excellent communicator. Those are very valuable soft skills that not many people think about. Also, I loved his enthusiasm at the end.
However, I imagine that Mark hasn't had many chances to build systems in recent years, as there are some slight technical issues with what he proposed. Since this channel probably attracts many beginners, I'll go over them and try to come up with better alternatives than what Mark proposed. Hopefully it will be a useful exercise!
First, the API is a bit off. Normally you put the version before the resource name, so instead of `/messages/v1` it would be `/v1/messages`. What's more interesting, though, is how we model the domain: it's not just about users and messages, it's mainly about conversations. When you open the application you don't see messages, you see a list of conversations and the messages of the most recent one. When you send somebody a message, it is part of a conversation, and only one conversation exists between two users. Features like blocking, notifications, etc. are associated with conversations, not individual messages. So you'd work with something like `POST /conversations/{conversationId}/messages`, and then you don't need to include any recipient information because the conversation ID in the URL already identifies it (which also scales nicely to groups). Also, you most likely don't want to mark each message as read individually; instead you'd mark the conversation as read up to a specific message ID (imagine a chat with 500 unread messages; it would be horrible to generate 500 requests just to mark it as read).
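To make that concrete, here's a rough sketch of what those routes could look like (assuming a FastAPI-style service; the handler names and payload fields are my own illustration, not anything from the video):

```python
# Rough sketch of conversation-centric routes (FastAPI assumed; names are illustrative).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class NewMessage(BaseModel):
    text: str  # no recipient field: the conversation in the URL already implies it

@app.get("/v1/conversations")
def list_conversations(user_id: int):
    """Return the user's conversations, most recent activity first."""
    ...

@app.post("/v1/conversations/{conversation_id}/messages")
def send_message(conversation_id: int, message: NewMessage):
    """Append a message to a conversation; works the same for 1:1 and group chats."""
    ...

@app.post("/v1/conversations/{conversation_id}/read")
def mark_read(conversation_id: int, up_to_message_id: int):
    """Mark everything up to a given message as read in a single request."""
    ...
```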
As for the database, I think Mark rushed to NoSQL solutions. You can scale relational databases like MySQL (Facebook, GitHub, and Shopify are all heavy MySQL users) or PostgreSQL, and there are managed offerings for both. The real question is whether you have relational data and can take advantage of those relationships. And I think we do: a user has many conversations, a conversation belongs to two participants, and a conversation has many messages.
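As a rough illustration (my own sketch, using sqlite3 just to show the relationships; the column names are guesses, not from the video), the schema could look like this:

```python
# Rough relational sketch of users / conversations / participants / messages.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE conversations (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL
);

-- a conversation belongs to its participants (two for 1:1 chats, more for groups)
CREATE TABLE conversation_participants (
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    user_id         INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (conversation_id, user_id)
);

-- a conversation has many messages
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    sender_id       INTEGER NOT NULL REFERENCES users(id),
    body            TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'sent',  -- 'sent' | 'delivered' | 'read'
    created_at      TEXT NOT NULL
);
""")
```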
Speaking of messages, how do we handle the sent/delivered/read status of a message? By default, a message has the status "sent", otherwise it wouldn't be in our database. Whenever the other participant's application pulls conversation data (either in the background or when the user opens the app), it sends back the ID of the last message received, so all the messages up until that point are marked as "delivered". Whenever the user opens the conversation, the app sends back the ID of the last message that appeared in the viewport, so all messages up until that point are marked as "read". This is more efficient because you can do bulk operations on messages and the logic that determines the limit of those bulk operations is offloaded to each individual device instead of a process that loops over billions of messages.
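Continuing the sqlite3 sketch above (again my own illustration, not the design from the video), "mark everything up to this ID" becomes a single bulk UPDATE instead of one write per message:

```python
# Bulk status update: one statement per conversation instead of one write per message.
def mark_up_to(conn, conversation_id, reader_id, last_message_id, new_status):
    conn.execute(
        """
        UPDATE messages
           SET status = ?
         WHERE conversation_id = ?
           AND id <= ?
           AND sender_id != ?   -- only the other participants' messages change status
        """,
        (new_status, conversation_id, last_message_id, reader_id),
    )
    conn.commit()

# e.g. the app reports the last message visible in the viewport:
# mark_up_to(conn, conversation_id=42, reader_id=7, last_message_id=1337, new_status="read")
```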
There are some finer points, like how the application knows that a new message has arrived. It could use the push notification systems from Apple or Google, but deliveries aren't guaranteed. It could poll an endpoint, but that would generate massive load. It could also open a persistent connection (like WebSockets) and receive messages that way, but you will need thousands of machines if you're targeting billions of people. And many others...🙂
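For the persistent-connection option, the client side could look roughly like this (my own sketch using the `websockets` library; the URL and event format are made up):

```python
# Client-side sketch of the persistent-connection option (websockets library assumed;
# the endpoint URL and the event format are hypothetical).
import asyncio
import json
import websockets

async def listen_for_messages(user_id: int):
    uri = f"wss://chat.example.com/v1/users/{user_id}/stream"  # hypothetical endpoint
    async with websockets.connect(uri) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print(f"new message in conversation {event['conversation_id']}: {event['text']}")

# asyncio.run(listen_for_messages(user_id=7))
```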
Great insights 👍
This was very insightful. Thanks! What is the best solution for handling 'delivered' statuses, since none of the mentioned solutions seem to be optimal?
I agree on all points, this definitely doesn't seem like a great video to learn from (except the way he communicates), but I disagree on versioning. Having the version at the beginning locks you out of granular versioning. I think it's a choice, and I would avoid saying things like "normally". As with a lot of system design/architecture, you need to understand the tradeoffs. Having `/messages/v1` makes it very simple to put a new version `/messages/v2` into production just for that functionality, which reduces risk and makes releases smaller.
@@NenadLukicArh the path is generally decoupled from the actual code through a router of some sort, either in-app or as part of some API gateway, so it really makes no difference if you deploy code that handles `/v2/messages` or `/messages/v2`.
The downside of using the version number as a suffix is that you will end up writing code to tell whether the `{id}` in `/messages/{id}` refers to the second version of the message listing or a specific message ID.
There are some services out there that might still use query params (so `/messages?v=1`), but I haven't seen any widely-used APIs that have the version number as a suffix. Do you happen to know any APIs that put the version number at the end of the URL?
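To make the ambiguity concrete, here's a toy example (mine, not from the thread): with FastAPI/Starlette, routes are matched in declaration order, so with suffix versioning the path-parameter route can swallow the version segment.

```python
# Toy illustration of the suffix-versioning ambiguity (my own example, FastAPI assumed).
from fastapi import FastAPI

app = FastAPI()

@app.get("/messages/{message_id}")
def get_message(message_id: str):
    # GET /messages/v2 lands here with message_id == "v2"
    return {"message_id": message_id}

@app.get("/messages/v2")
def list_messages_v2():
    return {"version": 2, "messages": []}

# With prefix versioning there is nothing to disambiguate:
# /v2/messages and /v2/messages/{message_id} never collide.
```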
Great explanation, I agree with you 100%.
I find it sad that there was no talk about WebSockets or any other technology that would enable a real-time chat experience. How are two users going to chat? We designed an API to post messages and an API to get unread messages. Are we supposed to poll the server every second to get the unread messages? How is that going to work for Telegram, which has 1 billion users and 15 billion messages daily? And what was that logic of looping through unread messages about? How is that possible with these numbers? I think the good part of this interview was the beginning, since the estimations were done well, though I find it disappointing that those estimations did not really play a role for the next 40 minutes of the interview. We did not use any of our estimations during the actual design process.
Welcome to system design interviews, where back-of-the-envelope calculations are just a way to show off that you know some math.
@@joaorosa941 I firmly believe that the idea of looping through unread messages is completely wrong. That status will always be "delivered". Then when a user reads a message, the client-side application will send a PUT request and just update the status.
He did mention that ideally the unread messages could be kept in a separate table, or that partitioning the messages table would be another way to solve this.
@@julianosanm agree
I am of the same view, and I could not find a single system design of a messaging app that talks about WebSockets, message brokers, and the like to achieve a high-throughput system.
This channel is great, but from what I have seen so far it could improve a little bit by making the interviewer challenge the candidate a bit more; this would be a more realistic scenario than just agreeing with everything the candidate is doing. Not all of Mark's decisions are ultimately flawless, and there are tradeoffs to consider that should be mentioned. Great work, though!
I totally agree that there wasn't much feedback or discussion from the interviewer. It just felt like Mark was giving a tutorial (a good one though)
The interviewee has so much clarity in his thoughts, he's amazingly crisp and clear. I love the way he sticks to a limited set of details rather than going through unlimited details in this video, which would lead to confusion. That really shows his experience too.
Mark's accent is very clear and comfortable to listen to; this is good material for practicing my spoken English.
Glad to hear that!
Mark is the best out of all the videos I've watched. He truly knows what interviewers want. The other videos I've watched are mostly people doing opinionated system design, skipping over important segments and drilling down way too early on topics that are a waste of time to talk about initially. I think the host has to focus on directing these mock interviews instead of sitting there quietly. We are just lucky that Mark is amazing, so the host doesn't even need to intervene because he is already going over everything in a structured way, like a book.
He did not talk about how users can actually chat; nothing about things like WebSockets, polling, SSE, eventual consistency, etc. Should everything be transferred over HTTP(S), or is there something more efficient? Should all the chatting go through the API servers, or can we implement direct communication between the (two) users?
I would also not suggest specific technologies (like DynamoDB); I would rather say something like "I would pick a database that has these properties, so that we can address this and that problem".
In general, I expected more, especially from a person who spent so much time at G.
Looping over messages in a database with 10B messages a day? Seriously?
The number of undelivered messages would only increase, so his strategy looks like a big no. I'm not a software engineer and I'm not experienced, but I thought about clients pulling lost messages when they start, before establishing a WebSocket for new messages. This way, undelivered messages would be delivered on demand, and we would never bother with clients that don't exist anymore.
I would really have to say that this was a great interview! I just have one nit on the design. I really think a stream/MQ/queue approach to message distribution would be more appropriate in this case. "Looping over the database" is a huge waste of resources/bandwidth/etc. and a red flag IMO, in addition to the data consistency and state issues with multiple processors. I can only assume that Mark expects to query an index repeatedly, which could also be expensive given the required latency. Processing message queues scales quite easily and decreases the latency between receiving and routing a message. (ex-G 10y)
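Roughly what I mean, as a bare-bones sketch (assuming kafka-python; the topic name, event shape, and `push_to_recipient` are all invented for illustration): the API server produces an event when it writes the message, and a consumer routes it, so nothing ever scans the messages table.

```python
# Bare-bones queue-based distributor sketch (kafka-python assumed; names are invented).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "outgoing-messages",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def push_to_recipient(recipient_id, event):
    """Hypothetical: push over the recipient's open connection, or fall back to APNs/FCM."""
    ...

for record in consumer:
    event = record.value
    # produced by the API server at write time, so no scanning of the messages table
    push_to_recipient(event["recipient_id"], event)
```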
Your approach sounds interesting, could you share a bit more details about the stream/queue approach?
I'm confused as to why we can't just add the message_id to the recipient user's undelivered messages when we post the new message in the first place.
Yeah, pushing the message to each of the recipient's devices screams 'pub sub'. I think the main problem of this video is that the interviewer doesn't challenge the interviewee's design at all.
I am not very clear about read messages; it seems to be a polling solution from the perspective of the recipient. I was thinking of message subscription and pushing messages to the mobile device once a message is sent to the server for a recipient. I am not sure I am making sense, but if the recipient keeps polling for messages, there may be too many API calls made in vain from the recipient's end.
My first thought was producer/consumer design pattern, using messaging services like Apache Kafka or JMS.
Nice content. It would probably be worth adding the topics below:
* Cold storage for old messages: cost optimisation and better for the OLTP side
* WebSockets
I know this is for people training their interview, but boy is this content gold for system architects and people that want to build stuff like apps or startups. Just found out about the channel, 10/10
Can you suggest any other YT channels too ?
Great job! Enjoyed the way you are doing it❤
Why do we need a message distributor? When the API server creates a new message entry, it can make an extra query to update the unread_message_ids of the receiver. Doing so can help get rid of the message distributor.
A good work by IGotAnOffer , detailed explanation on system design questions. All the best!
Thanks Gaurav, glad you found it useful :)
Why did we go with HTTP REST over a web socket?
WebSockets are difficult to scale because they eventually require a distributed hash table. They make your server stateful. HTTP with REST is stateless.
I was wondering this too, since many system design interviews for WhatsApp use WebSockets.
It's because the interviewer said he is fine with 1-2 seconds latency. If we need lower latencies then we should have a web socket connection with all the online users.
All fair points - I confess I am not as familiar with web sockets, so that's also a reason I chose REST. I imagine you could make this work with a more persistent connection protocol, though.
@@MarkKlenk you are awesome Mark.... 🙂🙂🙂 Thanks for these kinds of design videos. They help us with how to approach the solution and give some idea about how to explain things in an interview.
It's a funny thing that an ex-Googler talks more about AWS than Google solutions 😂
Fair point - I was exposed to AWS post-Google while working at Uber ATG, and I was really impressed by how clear, consistent, and easy to use AWS solutions are. Amazon has set a really strong bar when it comes to hosted services.
Possibly he left Google prior to the heavy shift to GCP, and given that he knows the names of the systems he is designing ahead of time, he likely knows they are built on AWS.
Or he was laid off and now doesn't give a F 😂
That's really awesome, I learnt a lot. One thing puzzled me, the distributor, and many questions come from there.
1. Should we really have one massive table of messages? If yes, how should sharding/hashing be implemented? How will users and messages be handled if they end up in different boxes? Otherwise, a little elaboration on the table structure and sharding would be of great help.
2. What about considering a Kafka-like storage system and removing all delivered messages, holding copies of undelivered ones only up to some count or number of days? This would reduce the storage requirement.
Thank you for the content.
It would be great if the interviewer could ask some questions instead of only praising the candidate,
like: where does the message text sit? Any caching of messages? Pull vs. push strategy for delivering messages and their statuses, etc.
Couldn't the message distributor be replaced with an event-driven solution like Kafka/Kinesis?
Great interview!
This channel is quite underrated.
Keep up the great work!
👏👏👏
Thanks Al, spread the word!
Shouldn't we use WebSockets instead of REST? Why establish a connection for each message? Also, the server can't send requests if REST is used.
With HTTP/2 you can have server-sent events, and no new TCP connection is necessary for each request; even HTTP/1.1 has TCP keep-alive, although that is for a different purpose.
@@samhadi7972 still, this use case is made for technologies like WebSockets, SignalR, etc.
I have a senior systems interview tonight.
I've probably watched 100 hours of breakdowns and mock interviews in the last 2 weeks.
Wish me luck.
Go on David, you can do it!
I hope you nailed it. Please share some resources from those 100 hours!
For the messages not yet received due to a receiver not being available, wouldn't a pub/sub system be a lot more efficient than constantly looping over the unreceived messages? I.e., message discovers it can't be received, so it instead subscribes to the user-just-became-available event and gets stored in a dedicated DB while it waits. When the user logs in, the event fires and the message gets sent.
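One way to sketch that idea (my own illustration using Redis pub/sub plus a per-user pending list; the channel and key names are invented):

```python
# Sketch of "park the message, flush when the user comes online" with Redis
# (channel/key names and deliver() are invented for illustration).
import json
import redis

r = redis.Redis()

def queue_for_offline_user(user_id, message):
    r.rpush(f"pending:{user_id}", json.dumps(message))

def deliver(user_id, message):
    """Hypothetical: push to the user's now-open connection."""
    ...

def on_user_online(user_id):
    # flush everything that piled up while the user was offline
    while (raw := r.lpop(f"pending:{user_id}")) is not None:
        deliver(user_id, json.loads(raw))

# presence listener: a login event on this channel triggers the flush
p = r.pubsub()
p.subscribe("user-online")
for event in p.listen():
    if event["type"] == "message":
        on_user_online(int(event["data"]))
```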
By the time we reach the end of this video, it turns out to be hilarious.
His design really reflected his lack of understanding of how the front end works.
So is each client polling the server every 2s, resulting in a DB query each time, just to check for unread messages?
Would have been nice to compare this approach to an append-only log structure like Kafka as a store; I suspect that's where we would have ended up if group messages were part of the design. I think not handling groups was a miss here. I would expect a senior/staff/principal candidate to be able to talk about the added scaling needs of group chats.
It would be great if Mark made a course on system design; I would definitely purchase it.
Are we shifting the values from unread_message_ids to read_message_ids once the user reads the message? If yes, please clarify how.
Where is the text field in the message table?) I guess it was lost.
Any idea about the cool tool Mark is using for drawing?
The tool is Google Draw.
So we're not going to discuss the main selling point of Telegram: Security? End to end encryption ring a bell?
Yep, that was my bad as the interviewer. I meant to ask him about that but we ran out of time.
great information
Nice video, thanks a lot. However, I want to highlight one point that could be useful for those preparing for this type of interview. Do not add more functionality to your system than was discussed in the "Functional Requirements" step. It may play against you in two ways:
1) It could be one of the red flags for the interviewer.
2) You might implement this functionality sub-optimally, and that will also be a red flag for the interviewer.
In this interview, Mark added to his system the ability to keep track of the message's status. And this was done, in my opinion, not optimally at all.
Nevertheless, thanks for the video.
12:52 [Candidate] Let me put some.. [My brain: let me put some dirt in your eye]
19:54 yes we did put a number there :)
I wonder what happened to the other videos of Hong Lu, did they get deleted?
Awesome, thank you
Need more of these mock interviews
lots more here: ua-cam.com/channels/Wlbj3trSoIU3SHrJCSiAuA.html
No need for websockets, redis, kafka? :o
How would Redis be used for this? Just curious, because I feel like there is no need to cache messages since there is no spike in demand for a specific set of messages. Also, I'm not too familiar with WebSockets; would the WebSocket be communicating through the load balancer, and how would you deal with closed connections? Thanks!
Thank you so much! This is an excellent resource for everyone to enhance their knowledge and delve into deeper thinking. I have a question: you mentioned that you were storing sent, read, and unread messages under Users. Considering that the number of messages will likely grow over time, there might be a concern about exceeding the storage limit for a single document (or row) in databases like MongoDB, where the maximum size is 16 MB. Couldn't this potentially lead to issues in the future? How would you resolve this?
Each chat message is a document. There is no way that the text will go beyond 16 MB; you literally can't type so much in a message that it takes 16 MB. Well, perhaps you could, but I once tried to write a really long message in WhatsApp and at some point I couldn't type any more. So they have limits on each message.
Did we actually need a message distributor? Why not add the message to the recipient's unread messages the moment it gets posted by the sender?
Golden content
Hey! You are doing great work; it helped me a lot and I learned a lot. Just wanted to give you a suggestion: can you ask more cross-questions in these types of interviews? :)
Hi Vikas, noted, will try to do so in future videos.
Which diagramming software is being used here? Can anybody please tell me the name? Thanks.
He's using Google Draw :)
Shouldn't we sketch out the data model before selecting the db?
Why is there a need for a message distributor? At the time a message is received, one can just update the unread_messages_list of the recipient. Why does the recipient have to be online for the DB update?
24:00 Cassandra is not a good fit for this use case. The trade-offs are completely missing.
Why not? I'm a newbie in system design, so I'm curious why it's a bad fit. As far as I understand, the messages are structured data and need fast reads and writes, while the database also needs to scale in size. It seems to me that Cassandra would be a great option. I was confused why he used DynamoDB, since it is a key-value database, which is not as fast as something like Cassandra or Elasticsearch for lookups. Thanks!
Just out of curiosity, is there a requirement to store the messages on the server specifically? I was thinking of using Apache Kafka as the backend; are there any challenges in using a message queue for such scenarios?
Good question. However, I personally would not use a technology like Kafka in this use case. Kafka is more suitable when you want to stream large amounts of data between two applications whose processing speeds or purposes differ.
Using a technology like Kafka might also result in increased latencies when sending or receiving messages. And you would not want to expose your Kafka topic to the public, which means traffic would go through a web/WebSocket server anyway; putting a Kafka topic between that server and the actual message-processing application seems like an unnecessary step.
I would much rather use WebSockets or HTTP/2. WhatsApp uses the XMPP protocol, for example.
Also unread messages would need to be stored somewhere to be sent later so you'd want some database for it.
4:10 He asked if it's okay to use APIs as a point of entry. What other points of entry could there be? I'm not very experienced, but I thought an app needs an API to connect things, kind of like a middleman. What other options are there?
It's not even smart in many cases to connect the app via an API. If you can somehow avoid it, you call the database directly from the app; everything that produces an HTTP request slows down your system, so there's no need to additionally wrap every app into HTTP calls via an API.
@@MirjanScholz Doesn't that cause access issues? DB calls need specific credentials, and it would be an extreme security concern if those details were exposed.
@@MirjanScholz I would disagree if security is even a minor concern, which in the case of Telegram it absolutely is. You don't want your users to even know the URL of your DB, let alone make queries against it directly, and you would want some sort of authentication (e.g. JWT) to verify that the user has actually logged in.
What program are you using in these interviews?
Hi Per, Mark was using Google Draw
A Google EM uses job runners to distribute messages for a 500k messages/second application?
The first part started well, but the API part is all wrong. Those are four endpoints of ONE API of one service. Also, why would you go so deeply into the implementation of endpoints during the high-level design portion of the interview?
Could you store messages in S3? That way you can store images there as well.
This is just for learning purposes; if you bring this design to a real interview process, you will get a good number of questions from the interviewer 😅
Thanks, very useful.
A better architecture would be like the one below. This would fix all the issues mentioned.
For sending:
Mobile --> Message Server --> Kafka --> Consumer --> Database
For receiving:
Mobile
Wouldn't this be bad for receiving? If no message can be sent to a phone because the phone is offline, wouldn't this cause the Kafka pipeline to be held up? Just curious about how this could be approached.
@@suchitbandaram5390
This flow assumes that you are chatting and the mobile device is online.
Mobile
Not one comment about security? Encryption? Omg.
Considering that this is one of the key selling points of Telegram, yes I'll admit it was strange that this wasn't mentioned.
Very fair points - that shows you my limited knowledge of Telegram.
I probably should have asked about security / privacy requirements, though.
Good catch!
But if someone is not a Telegram user, how will they know that security is the selling point? Should the interviewer ask for that?
At about minute 40:00, Tom is struggling to stay awake 😂
As a senior engineer you shouldn't ask for the scale of the product you are designing; you should make educated guesses and have the interviewer confirm them. Better yet, present a solution for small scale and then show how to scale it up for bigger traffic.
this was epic
glad it was helpful :)
Engineering Management and Engineering are two different disciplines.
I like how Mark handles this, but I can see clearly from his videos that he hasn't really been hands-on in implementation. His approach to the API and design for Telegram is really flawed. Sad to see that the interviewer also didn't correct him.
I was not satisfied with the idea of how he was storing the read/unread messages.
What seems appealing to me is that the interviewer does not ask questions during the explanation.
How/Why did we assume that one server can handle only 10k requests/second?
I think 10k is just a rough estimate for what a commodity server can handle. In "Grokking Modern System Design Interview..." they arrive at an estimate of 8k RPS, assuming ~300 RPS for CPU-bound requests, ~16k for RAM-bound requests, and a 50% split between the two. It's much easier to get rough estimates using 10k instead of 8k, though. Similarly, when I convert anything from daily to per-second, I just assume ~100k seconds per day instead of 86,400. It makes the math much easier, and we only need rough numbers. During an interview I would just say "I'm assuming we can get roughly 10k RPS per server" and leave it at that. I don't think you'll be asked to dive deeper into that assumption; if the interviewer has a different number in their head, they will just give you their number and you can use that instead.
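For what it's worth, here is the arithmetic in plain numbers (the inputs, 15B messages/day from this thread, ~10k RPS per server, and a peak factor of 2, are assumptions, not figures confirmed in the video):

```python
# Back-of-the-envelope math in plain numbers; all inputs are assumptions.
messages_per_day = 15e9          # figure mentioned elsewhere in this thread
seconds_per_day = 100_000        # ~86,400 rounded for easy mental math
rps_per_server = 10_000          # rough per-server capacity assumption
peak_factor = 2                  # traffic isn't flat across the day

average_rps = messages_per_day / seconds_per_day   # ~150,000 writes/sec
peak_rps = average_rps * peak_factor               # ~300,000 writes/sec
servers_needed = peak_rps / rps_per_server         # ~30 servers just for writes

print(f"{average_rps:,.0f} avg RPS, {peak_rps:,.0f} peak RPS, ~{servers_needed:.0f} servers")
```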
Indeed - this was an assumption I made, and it really depends on the type of server / virtual machine / instance.
When I'm doing mock interviews, I'm not as concerned with underlying assumptions as I am with the math that results from it.
Also - yes, 100k seconds per day is a good approximation - what I look for is the candidate to call out that you get averages and may need to add a peak factor.
I appreciate his experience and expertise, but he designed Telegram without even talking about real-time messaging! How can an interviewer accept such a basic and simplified system design for a platform like Telegram? A chat app without a real-time client-server connection is not a real-world system!
Is it only me, or do others also find that you two look alike?
great video.
This solution would get easily rejected in a real-world interview. No discussions about how clients interact with the backend - how do we ensure a real-time chat experience? WebSockets, long-polling, etc. The scheduler kind of job that keeps checking databases for undelivered messages etc. doesn't make sense. The interviewer doesn't challenge the interviewee enough.
Look at the drawing here... Is this a real Googler?
Ehh, that design is not going to get anywhere close to the required performance. I wish he had focused more on the actual technical solution than on API methods or the DB schema; the message distributor thingy is just weird, same as polling for messages. You'd definitely want some permanent connection a la WebSockets, and probably a queue behind it, with a backup to a DB for historical messages. This, followed by almost no feedback or deep-dive questions from the interviewer in a 52-minute video... huge disappointment.
Get affordable, 1-to-1 expert coaching to ace your system design interview: igotanoffer.com/en/interview-coaching/type/tech-interview?UA-cam&
The interviewer was set to respond yes to every decision the interviewee made -_-
Learn about the system design of Google drive here : ua-cam.com/video/QHrqsB_3pJM/v-deo.html
45:30
Insightful interview, but why is the interviewer silent all the time? There are millions of questions and drill-downs that he should ask, but he just sits and nods. I'm sorry, but it would be 100 times more useful to see an interview with a real interviewer, not with a nodding head. You won't see such an interviewer in real life, and if you did, you would run from that company.
Pretty sure the interviewer has zero practical experience
I can't even start on how bad this system design is.
I certainly liked the interview, but that design... Let's be honest, the drill-down was an absolute mess. You just can't store messages inside the user record at that scale. Period.
Build up me
So it's not a real interview
I expected better! :/
The guy is blowing hot air. He has no idea how to design a messenger (like Telegram). Extremely poor explanation, extremely poor design, which would not work in real life. His knowledge base is low and his way of thinking is very primitive. As I said, this approach is simply wrong for the presented task. He would not pass my interview by a long shot!
The messaging system is fine, but the last time I was in a system design interview they asked me to design air traffic control software.