System Design Interview - Notification Service

  • Published Oct 2, 2024
  • Please check out my other video courses here: www.systemdesi...
    Topics mentioned in the video:
    Functional (create topic, publish message, subscribe to a topic) and non-functional (high scalability, high availability, high performance, durability) requirements.
    High-level architecture of a notification service.
    FrontEnd service host components (reverse proxy, local cache, logs and metrics agents).
    Metadata service, distributed cache, consistent hashing ring, gossip protocol.
    Storage for messages: SQL/NoSQL database, in-memory store, distributed message queue, stream-processing platform.
    Message sender service, thread pool, semaphore.
    Duplicate messages, retry policy, message order, security, monitoring.
    Inspired by the following interview questions:
    Amazon (www.careercup....)
    Flipkart (www.careercup....)
    Microsoft (www.careercup....)
    Uber (www.careercup....)

COMMENTS • 353

  • @ted2101977854 5 years ago +205

    This is somehow the best system design tutorial I've ever seen. Keep doing bro!

    • @SystemDesignInterview 5 years ago +75

      This is somehow one of the best feedbacks I've ever seen. Keep sharing your thoughts with us all bro! )))

    • @gxbambu 4 years ago +5

      2nd this. Not only for interviews, but also for day-to-day work! thanks.

    • @xiaopeiyi 2 years ago

      It is.

    • @mostinho7 1 year ago +1

      @SystemDesignInterview thank you for making these videos! What would you recommend to learn these topics in depth? What do you personally use? Any good books, courses etc?

  • @SamWhitlock 3 years ago +95

    The pedagogical steps in all of these videos are perfect. So many other "courses" that say "here's how to design WhatsApp" don't really start from first principles like this (leaving me scratching my head wondering why they did something)!
    I hope you find time to continue making these kinds of videos. I'd absolutely support via patreon or an online course if you made it. There's really nothing as well-presented as this out there!

  • @riteshthakur9064 3 years ago +65

    finally, someone talking about system design who actually knows how things work internally :-).
    Kudos to you brother!

  • @saurabhchoudhary9260 5 years ago +41

    Best System design Video ever. No buzzwords and it also goes into depth of various components reasonably well. I'm waiting for more videos from you

    • @SystemDesignInterview 5 years ago +4

      Thank you for the feedback, Saurabh!

    • @vova_dev 2 years ago

      Mikhail made a course. More material there now)

  • @vinibaggio 5 years ago +32

    I had an interview at A Very Big Company that asked me a similar question and this video was extremely helpful.

  • @sigorbor 3 years ago +18

    Absolutely the best System Design videos on the internet. I especially love that the videos are kept relatively short despite the huge amount of knowledge and explanation. Waiting for the new videos! Thank you so much, Mikhail!

  • @hadimajeed1078 4 years ago +11

    The best part of this channel is that Mikhail does not use ready-made solutions available in the market. He shows us how to think simply and pushes us to think about what is under the hood. Thinking simply forces us to challenge our basics. Exactly what we need to prepare for a System Design interview. Sadly, he has not been publishing any new videos for a long time now :-)

    • @SystemDesignInterview 4 years ago +9

      Glad you liked the channel, Hadi! Thank you a lot for the feedback.
      I do plan to come back to YouTube with more regular video postings. Just need a bit more time to finish what I am working on currently. Stay tuned.

    • @gemtyler8258 3 years ago

      @SystemDesignInterview appreciated with all the videos!

  • @jituborse2193 5 years ago +11

    These videos are really high quality and in depth. Haven't found any others diving deep into each individual component, especially with options for each component and pros and cons for the choices made.

  • @zhuoqianzhang4399 3 years ago +3

    I think there are a few major areas where I would like to see more detail if I were the interviewer:
    1. What's stored in the tmp storage? What does the schema look like? Is it a denormalized msg? Say, for subscribers a, b, c, msgs 1, 2, 3 are not delivered?
    2. How to handle large fan-out msgs? Say Twitter used this notification system and Trump tweets... XD Does a single sender service node pick up the tweet and try to notify all followers/subscribers? I guess that would take forever.
    I would really like to see #2 above expanded, because large fan-out is a really hard problem to solve in real life.

  • @SatyanarayanaBolenedi 4 years ago +10

    I feel this channel is a hidden gem!!!
    Thank you so much for explaining things in great details!!

  • @bankybaba 4 years ago +5

    Highly underrated content that is not just full of buzzwords.
    Please keep it up. Can't wait for more uploads.

  • @nikhil_mehta 4 years ago +6

    Mikhail - Amazing videos as always. Really good content and explanation. Thanks for spending time creating such videos! One minor suggestion - it will be great if you can also explain the DB choices (SQL vs NoSQL) and which of the popular NoSQL DBs (Dynamo, Cassandra, MongoDB etc) is a good choice for the use case being discussed in all your videos.

    • @SystemDesignInterview 4 years ago +3

      Hi N M! Thank you for the feedback!
      You are right, DB related topics deserve more attention. And I plan to close this gap. The recent video (ua-cam.com/video/bUHFg8CZFws/v-deo.html) contains some information on this topic. More to come!

  • @anmoldua6574 2 years ago +2

    Absolutely amazing... I haven't found any other video from any other YouTuber with such details. Thanks for the content. Is there any of your content that is not available on YouTube, maybe on Udemy? I want to check that out as well. Kindly tell.
    Thanks

  • @sunnyshang4350 3 years ago +7

    This is the most densely packed sys design video I’ve seen and it’s so full of good information. Really appreciate your work!

  • @partrivedi1122 2 years ago +10

    I have been watching Mikhail's videos for 2 years now, and they continue to be extremely valuable, and the best sys arch videos on youtube. If you can grok all of his concepts, you will be prepared. Still, it would be nice to see Mikhail cover some of the more advanced topics that the book "Designing Data Intensive Applications" discusses.

  • @ashwinravichandran1701 5 years ago +7

    Great video!
    I have a question regarding message retriever component. How do we ensure that same messages are not being read from temporary storage by multiple threads/hosts?

    • @SystemDesignInterview 5 years ago +15

      Hi Ashwin. Great question.
      It depends on what Storage we use. Let's take a look at different options.
      Message queue. There may be several different flavors. For example, in AWS SQS, when message is retrieved, it is marked as invisible for some period of time. This prevents other consumers from processing the message. More on this here: docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
      In case of Kafka or AWS Kinesis there is this concept of a monotonically increasing sequence number. Consumers keep track of what message (number) they have processed.
      In case of a database, we need to implement our own logic. For example delete this message from the database when message is retrieved and store it back in case message delivery failed.
      Please, also remember that SQS/Kafka/Kinesis supports at-least-once semantics. Which means that the same message may be delivered more than one time to the consumer.
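
A minimal sketch of the visibility-timeout idea from the answer above. It is a toy in-memory queue, not the SQS API: the class name `VisibilityQueue`, its methods, and the injectable `clock` are all assumptions made for illustration. A received message is hidden from other consumers for a while and reappears unless it is deleted, which is also why delivery ends up at-least-once.

```python
import time


class VisibilityQueue:
    """Toy in-memory queue that mimics a visibility timeout (hypothetical, not SQS)."""

    def __init__(self, visibility_timeout, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock        # injectable so the behaviour can be tested
        self.messages = {}        # message id -> body
        self.hidden_until = {}    # message id -> time it becomes visible again
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        self.messages[self.next_id] = body
        self.hidden_until[self.next_id] = float("-inf")  # visible immediately
        return self.next_id

    def receive(self):
        """Return (id, body) of the first visible message and hide it, or None."""
        now = self.clock()
        for msg_id, body in self.messages.items():
            if self.hidden_until[msg_id] <= now:
                self.hidden_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful delivery; the message will not reappear."""
        self.messages.pop(msg_id, None)
        self.hidden_until.pop(msg_id, None)
```

If a consumer crashes before calling `delete`, the message becomes visible again after the timeout and another consumer picks it up, so subscribers may see it twice.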

    • @kkthespidy 4 years ago

      @SystemDesignInterview Instead of deleting the message from the database, we can consider marking it inactive with a flag, because if the host processing the messages goes down, all those messages might be permanently lost.

    • @wfan2844 2 years ago

      @SystemDesignInterview With Kafka/Kinesis/Azure Event Hubs, doesn't Kafka allow only one consumer per partition (within a consumer group)? If you have multiple threads, Kafka will not let the other threads concurrently access the same partition. So, in reality, the threads read the Kafka messages sequentially. We could of course have different threads read from different partitions of the same topic, but there is still no chance of those threads re-reading each other's messages, unless some thread/consumer crashes before committing the processed sequence ID. Then another consumer thread might pick it up and re-read the message. A re-read should be fine, since we only want an at-least-once guarantee.

  • @xinzhang7817 10 months ago +1

    What is the difference between the frontEnd component and API Gateway?

  • @minostro 5 years ago +3

    I have a question regarding the Metadata Service and its data storage. In the video, you mentioned that Metadata Service will be a distributed cache system, but you didn't mention what technology you will use or in which way you will use the cache system. I think that for the distributed cache system we could use ZooKeeper (you mentioned this) and Redis (multiple nodes). The caching strategy will be cache-aside. For storing the data permanently I think we could use a key/value storage such as DynamoDB. The key will be the name of the topic and the value will be a list of subscribers. If we need to store more information about the topic or the subscribers I would use a document-based DB such as MongoDB/CouchDB. Does this make sense?

    • @SystemDesignInterview 5 years ago +1

      Hi Milton. Thank you for the question!
      All technologies you mentioned make total sense to me and can be applied here.
      I like your thought process and attention to details. Keep sharing your thoughts!
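
The cache-aside strategy proposed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: plain dictionaries stand in for the distributed cache and the key/value database, and the class `MetadataService`, its methods, and the `db_reads` counter are hypothetical names, not anything shown in the video.

```python
class MetadataService:
    """Cache-aside lookup of topic -> list of subscribers (illustrative only)."""

    def __init__(self, database):
        self.database = database  # stand-in for a key/value store (topic -> subscribers)
        self.cache = {}           # stand-in for the distributed cache
        self.db_reads = 0         # counts cache misses, for demonstration

    def get_subscribers(self, topic):
        # 1. Try the cache first.
        if topic in self.cache:
            return self.cache[topic]
        # 2. On a miss, read from the database and populate the cache.
        self.db_reads += 1
        subscribers = list(self.database.get(topic, []))
        self.cache[topic] = subscribers
        return subscribers

    def subscribe(self, topic, endpoint):
        # Write to the database, then invalidate the cached entry so the
        # next read reloads the fresh subscriber list.
        self.database.setdefault(topic, []).append(endpoint)
        self.cache.pop(topic, None)
```

Invalidating on write rather than updating the cache in place keeps the sketch simple; a real deployment would also need expiry and would have to decide how stale a subscriber list is allowed to be.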

  • @syedhusain5465 4 years ago +13

    I must say, this is amazing content (probably the best on the internet) and you really are doing a great service to job aspirants and distributed systems enthusiasts. I have one question: what would be the database schema for the metadata database? I guess we are only storing topic and subscriber information in the metadata database, and in the worst case (cache miss), if we are calling the database, we need to get the full subscriber list for a topic really fast for the sender service. Also, in the worst case, we need to do request validation (e.g. topic is present) fast. What do you think the database schema for the metadata service should be, and in what format are we going to store data in the caching layer?

  • @funnyhjk 3 years ago +6

    Dude your videos are so good! Best systems design videos I've seen, love how you've especially tailored it for interviews. Thank you!

  • @dmxrahul 4 years ago +5

    Damn bro. This was dope. Thank you for making such a detailed video. Deepest gratitude ever expressed.

  • @kameto8116 1 year ago +1

    is frontend service the same as API gateway?

  • @JustEnergyFlow 2 years ago +2

    Thank you for material. Is Matt Damon programmer?

  • @mangeshshikrodkar6192 5 years ago +4

    Your videos are highly informative. Please create more such videos. They are way better than other YouTube videos on system design I have seen. The only thing that I found missing is scalability numbers and estimates like bandwidth calculations, number of servers needed, amount of storage needed, and how they grow as system stress grows with the user base. In some videos I noticed a top-down approach, where they start with numbers first and break the question down into some feasible small system. Here I see a bottom-up approach, where we start with a small system and grow it with scale. I believe starting with a small system and growing it is better than starting with a big system, breaking it down and playing with numbers.

    • @SystemDesignInterview 5 years ago +7

      Hi Mangesh. Thanks a lot for the feedback!
      Let me add a topic to my TODO list on how to estimate capacity (number of servers, required network bandwidth, storage, etc.) for a distributed system.
      As an interviewee, I find it's hard to start with numbers, unless the problem domain is well-known. Without understanding the API (what data we send into the system and how data is retrieved) and at least the high-level design (where and how we store this data), chances are high that the estimated numbers will be far off.
      And as an interviewer, I would also recommend postponing numbers till the end of the interview. Or provide them during the interview, if requested. In many cases the problem domain is not known to us upfront and we need to start with something simple (similar to a brute force solution in a coding interview). What is really important for me as an interviewer is that a candidate is able to identify units of scalability. E.g. for the notification service such units are: number of publishers, average number of topics per publisher, average number of messages per topic, average number of subscribers per topic, amount of time we store messages in the system (retention period), etc. And as long as we can discuss scalability issues on a t-shirt size (small, medium, large) level, it is usually enough to evaluate the candidate's ability to think at scale. Numbers are crucial for real designs, though.
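
To make the "units of scalability" listed above concrete, here is a rough back-of-the-envelope sketch. Every input value in the usage note is a made-up placeholder, not a number from the video. The function name `estimate`, the per-day interpretation of "messages per topic", and the `avg_message_bytes` parameter are assumptions added for illustration; the sketch also assumes each message is stored once, not once per subscriber.

```python
SECONDS_PER_DAY = 86_400


def estimate(publishers, topics_per_publisher, messages_per_topic_per_day,
             subscribers_per_topic, retention_days, avg_message_bytes):
    """Multiply the units of scalability into daily volume and storage."""
    topics = publishers * topics_per_publisher
    messages_per_day = topics * messages_per_topic_per_day
    # Fan-out: every message is delivered to every subscriber of its topic.
    deliveries_per_day = messages_per_day * subscribers_per_topic
    # Assumption: one stored copy per message for the whole retention period.
    stored_bytes = messages_per_day * retention_days * avg_message_bytes
    return {
        "topics": topics,
        "messages_per_day": messages_per_day,
        "deliveries_per_day": deliveries_per_day,
        "deliveries_per_second": deliveries_per_day / SECONDS_PER_DAY,
        "stored_bytes": stored_bytes,
    }
```

For example, `estimate(1_000, 10, 100, 50, 7, 1_024)` gives 10,000 topics, 1,000,000 messages and 50,000,000 deliveries per day, and roughly 7.2 GB of retained messages. The point is less the numbers than seeing which unit dominates: here it is the subscribers-per-topic fan-out.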

    • @SP-yf1ib 6 months ago

      @SystemDesignInterview I totally agree with you that it is hard to start with numbers. We should have some idea about the numbers to come up with all the design considerations, but it doesn't sound like a great idea to crunch all the numbers and do capacity estimation at the beginning (like how many hosts are required) before even getting to the details of the design. So in my mind, we should have some idea about the scale we need to support, but we need not get into capacity estimation initially.
      But unfortunately all the other random YouTube channels started following this idea of capacity estimation, and now it is becoming such a norm that some inexperienced interviewers also think it is normal to expect it and insist on it.

  • @manojgavireddy8001 3 years ago +2

    Hi, Thanks for the amazing content you've been producing. I have a question about one scenario though.
    Consider that the Task executor in the sender service has to execute tasks (send notifications) for 100 users, and one of the threads failed to send the notification to a particular subscriber (say subscriber no. 33) while every other thread was able to send the notification to its corresponding subscriber. Now if we want to retry sending the notification to the failed subscriber (no. 33), how do we do it? We can't just simply put the msg back in the temporary storage, right? Because the next time the msg gets picked up by some other sender host, it again sends the notification to all 100 subscribers. So every subscriber apart from no. 33 would receive the notification twice. How can we handle the case where only subscriber no. 33 should get the retried notification but not the others?
    Am I missing something here?

    • @vinitsacheti 3 years ago +2

      The failed message can go to a separate temporary storage, typically called a dead-letter queue (if a queue is used), along with more metadata about the failure, and the task creator can then create just one retry task for the failed subscriber.
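
One way to picture the per-subscriber retry suggested in this reply, as a sketch only: failures are recorded as (message, subscriber) pairs, so a retry touches just the subscriber that failed. The function names `deliver` and `retry_failed`, the task dictionary layout, and the `max_attempts` limit are all hypothetical; the `send` callable stands in for whatever actually delivers a notification.

```python
from collections import deque


def deliver(message, subscribers, send, retry_queue):
    """Try each subscriber once; queue only the failed pairs for retry."""
    for subscriber in subscribers:
        try:
            send(subscriber, message)
        except Exception as error:
            retry_queue.append({
                "message": message,
                "subscriber": subscriber,
                "attempts": 1,
                "error": str(error),  # failure metadata kept with the task
            })


def retry_failed(retry_queue, send, max_attempts=3):
    """Re-send queued pairs once each; return tasks that exhausted their attempts."""
    dead_letters = []
    for _ in range(len(retry_queue)):
        task = retry_queue.popleft()
        try:
            send(task["subscriber"], task["message"])
        except Exception as error:
            task["attempts"] += 1
            task["error"] = str(error)
            if task["attempts"] >= max_attempts:
                dead_letters.append(task)   # give up: park for inspection
            else:
                retry_queue.append(task)    # try again on a later pass
    return dead_letters
```

With 100 subscribers and one transient failure for subscriber 33, the first pass delivers 99 notifications and the retry pass delivers exactly one more, so nobody else receives a duplicate.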

  • @durgaprasana5531 4 years ago +3

    Nice explanation. 👍👏
    Often interviewers are looking for the entity models for the stored data. i.e. metadata for a subscription in this case.
    It would greatly help to include those aspects as well.

  • @zxyviopond 4 years ago +3

    Very clear and concise presentation for a complex system. As the winner for Temporary storage I choose a streaming solution like Kafka; wanna know your lucky winners as well. Also I wonder whether the MessageRetriever's implementation actually varies based on the choice of temporary storage.

    • @SystemDesignInterview 4 years ago

      Agree with you, Xiaoyun. Here is my take on this:
      ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=UgwmaW3Ek0-XnkXb8KB4AaABAg.8tJRBPf4mun8tKuzz6sHVs
      As for MessageRetriever's implementation, you are right, it depends on the temporary storage we use and some other factors. E.g. whether the order of messages is important. If it is, we had better stick to a single-threaded retriever. If not, we can use a multi-threaded message retriever.

    • @arunbit 1 year ago

      @SystemDesignInterview I was curious too, but that link takes me back to this same video

  • @arpijoy 5 years ago +6

    In the temporary storage, I am thinking Apache Kafka might be a good fit, since it can also handle streaming data

    • @SystemDesignInterview 5 years ago +8

      Good call! I also favor message queue and stream processing platform options. With already built-in mechanisms that many such systems provide, we may filter messages and apply transformations on top of it. And Kafka is a popular choice. One of the examples: www.confluent.io/blog/real-time-financial-alerts-rabobank-apache-kafkas-streams-api/

    • @shalinmehta19 4 years ago

      @SystemDesignInterview Isn't this whole design essentially building a Kafka-like system?

    • @SystemDesignInterview 4 years ago

      Hi Shalin. Please take a look at this comment and the whole thread: ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=UgxoaZ_vr1TFpHn6ynx4AaABAg.90FYh-PbeGT91FmzoWHuf-

    • @valintepes 4 years ago

      @SystemDesignInterview First, as many have mentioned, these are incredibly good quality videos. Thanks for the significant effort that went into these. I have a follow-up question. Is a message queue the favored solution in this case because of the built-in mechanisms and popularity, and therefore the ease of obtaining information and support, or are there other wins such as performance, complexity or cost?
      Unrelated second follow-up question, regarding ordering. In a real-world scenario, if I'm already in the AWS environment, is there any justification for implementing my own solution for a FIFO queue when I can just pay more (apparently 20% more) for SQS FIFO? I have to consider additional resource cost and operational cost, and I have not crunched the numbers. But I was wondering if there are some other limitations or pitfalls of SQS. I think you may have mentioned something in passing in one of your videos but I can't remember which.

    • @SystemDesignInterview 4 years ago +2

      Hi Miao,
      One of the main benefits of a queue (e.g. message brokers, Kafka) is ordering support (at least on the partition level). Ordering is not always required for notifications, but is preferable typically.
      Depending on the volume of messages, queue solution may be substantially cheaper than e.g. database. With the growing number of messages, queue solution helps to save on cost. Queue APIs usually allow to batch messages on both producer and consumer side, helping to save on number of calls, and as a result, the total cost.
      Also, building notification service on top of a queue of some sort is a widely used pattern, I would say. For example message brokers (e.g. RabbitMQ), implement Pub/Sub mechanism by reading messages from a queue and pushing them to many consumers over TCP connections.

  • @xlin7868 5 years ago +4

    Thank you for sharing! This is the most helpful material I have seen online... A question: how to scale senders? I understand you mention a pool of threads, but that is for a single machine. Sorry if I missed it and you did cover scaling out. A simple solution: use consistent hashing to assign the task to a ring of instances/machines, just like a distributed cache. Will this be a good solution? Thanks!

    • @SystemDesignInterview 5 years ago +6

      Hi Hullo,
      Thank you for the question. Sender service is scaled both vertically (by having more threads in the pool of a single machine) and horizontally (by adding more Sender instances/machines).
      Consistent hashing idea you mentioned will work as well. But we do not actually need consistent hashing. A simple random hashing will work. Consistent hashing is usually used to make sure the same machine is chosen for the same key (message). In our case we can just pick a random Sender machine for processing the message. Random hashing is simpler and less prone to "hot" Sender issue.
      Feel free to ask any follow up questions. Will be glad to clarify it further.
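
The contrast drawn in this answer can be sketched in a few lines. The host names in `SENDERS` are hypothetical, and `pick_by_key` uses plain modulo hashing as a simplified stand-in for key-based routing; it is not a full consistent-hashing ring. What matters is the property being compared: key-based routing pins a key to one machine, while random selection spreads even a very hot topic across all senders.

```python
import hashlib
import random

SENDERS = ["sender-1", "sender-2", "sender-3", "sender-4"]  # hypothetical hosts


def pick_by_key(key, senders=SENDERS):
    """Key-based choice: the same key always maps to the same sender."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return senders[int.from_bytes(digest[:8], "big") % len(senders)]


def pick_random(senders=SENDERS, rng=random):
    """Random choice: no affinity, so one hot topic cannot overload one sender."""
    return rng.choice(senders)
```

Since a Sender keeps no per-topic state in this design, affinity buys nothing, which matches the argument above for the simpler random choice.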

    • @tejvepa8521 4 years ago

      @SystemDesignInterview Can these multiple sender instances pick tasks out of a distributed MQ like SQS?

    • @SystemDesignInterview 4 years ago

      Hi Tej. Why not? AWS SQS is one of the options for the Temporary Storage. We can use other message brokers as well.

    • @theboo5857 2 years ago

      @SystemDesignInterview Firstly, thanks a lot for the videos. It's less than an hour, but when I go over it, it has so much material packed into it. Amazing. I have one question about horizontally scaling senders. All the threads from all the senders will now fetch msgs for sending from a certain tmp storage. How could we sync between the senders? Within one sender, we can use locks and indicate which msg has been taken for delivery. Then other threads would not process this msg. But I am not sure how we can prevent 2 threads on different senders from processing the same msg simultaneously.

  • @povdata 11 months ago +1

    I did not get it: are all components within one server? For example, is the load balancer Nginx, as one option?

  • @rahulsharma5030 3 years ago +1

    This is awesome. I have some doubts at 19:29: there seem to be redundant components. Once the message retriever thread has got the message, why create tasks at all, when it could send directly to the HTTP, email etc. microservices? What are we achieving by putting them in the task creator and then running threads again in the task executor, which eventually calls the other microservice? Seems overcomplicated. One message retrieval thread could take the message and then just send it to the HTTP or other required endpoint.
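
For reference, the task creator / task executor split being asked about can be sketched as below. This is an illustration under assumptions, not the video's implementation: `create_tasks`, `execute_tasks`, and the `max_workers` / `max_in_flight` parameters are hypothetical names. It shows one plausible reason for the split: a single retrieved message fans out into one task per subscriber, and the executor bounds how many deliveries run at once using a thread pool plus a semaphore.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def create_tasks(message, subscribers):
    """Task creator: one independent delivery task per subscriber."""
    return [(subscriber, message) for subscriber in subscribers]


def execute_tasks(tasks, send, max_workers=8, max_in_flight=4):
    """Task executor: run tasks on a thread pool, capping concurrent sends."""
    limiter = threading.Semaphore(max_in_flight)

    def run(task):
        subscriber, message = task
        with limiter:  # at most max_in_flight deliveries at the same time
            return send(subscriber, message)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order, regardless of completion order
        return list(pool.map(run, tasks))
```

With this shape, a slow or failing subscriber only delays its own task instead of blocking the thread that retrieves messages.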

  • @HarshaVardhan-jf9sd 5 years ago +4

    very apt...keep up the good work..content always speaks

  • @AliyaMussina 5 years ago +4

    If the interviewer wants a strict order-of-delivery guarantee, how can we ensure that?

    • @SystemDesignInterview 4 years ago +11

      Great question!
      First of all we should clarify what order really means. There may be two options: publisher may have a custom order defined (e.g. each notification comes with a timestamp or some sequence number) or order is defined by notification service (e.g. the order of requests, first come - first served, FIFO). The former option can be treated as a special case of the latter option. Or in other words, if publisher has some special order defined, it should be publishing requests in this order. And if our notification service supports FIFO, requests will be processed exactly in this order.
      How to implement FIFO support for the notification service? When requests from a publisher arrive, the notification service needs to store messages in order in the Temporary Storage. What storage types support ordering? For example Apache Kafka. Kafka only guarantees ordering within a single partition. So, if we store all requests coming from this publisher in a single partition, ordering is guaranteed.
      Now we need to read messages from Kafka one by one and send to subscribers. To preserve order of messages, Sender service should be a single-threaded component. One instance of Sender per Kafka partition. Sender simply serves as a Kafka consumer. It reads messages in a single thread one by one and tries to send them.
      The tricky part is to understand what to do with failed messages. If delivery of the message fails, we need to decide how to handle such use case. We can skip this message (send this message to a dead-letter queue to process it later) and move to the next message in the partition. In this case we may break the order here, because failed message may be re-delivered later in the wrong order.
      In general, it is hard to achieve ordering when we have multi-threaded publisher and multi-threaded sender. I should have a separate video on this, to explain it visually. It is easier to explain using specific examples. So, to achieve ordering we usually deal with single-threaded publishers and consumers. But this decreases throughput of the system. Tradeoffs, as usual.
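
The per-partition FIFO approach described in this answer can be sketched with an in-memory stand-in for a partitioned log. This is not the Kafka client API: `PartitionedLog`, `partition_for`, and `drain_partition` are hypothetical names. The sketch also makes one specific choice among the options discussed above: on a failed delivery it stops and keeps the message at the head of the partition, instead of skipping it to a dead-letter queue, which preserves order at the cost of blocking that partition.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4


def partition_for(publisher_id, num_partitions=NUM_PARTITIONS):
    """All messages of one publisher map to the same partition."""
    digest = hashlib.sha256(publisher_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedLog:
    """In-memory stand-in for a partitioned, ordered log."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def append(self, publisher_id, message):
        partition = partition_for(publisher_id, self.num_partitions)
        self.partitions[partition].append((publisher_id, message))
        return partition


def drain_partition(log, partition, send):
    """Single-threaded sender for one partition; stops at the first failure."""
    delivered = 0
    queue = log.partitions[partition]
    while queue:
        publisher_id, message = queue[0]
        if not send(publisher_id, message):
            break  # keep the message at the head so order is preserved
        queue.pop(0)
        delivered += 1
    return delivered
```

Running one `drain_partition` loop per partition gives per-publisher ordering while still allowing parallelism across partitions, which is the throughput tradeoff mentioned above.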

    • @Labandusette 4 years ago +1

      @SystemDesignInterview good analysis man, as usual

    • @SystemDesignInterview 4 years ago +1

      Thank you, Labandusette!

  • @umber3117 4 years ago +2

    Excellent explanation!!!! You are teaching us how to think which is a very important part. Most of the youtube channels are just showing big and scary architectures of the companies which are not useful from the interview point of view or from the learning point of view.

  • @amanvidura8267 4 years ago +4

    Every second of this video is GOLD!

  • @jeremyshi4082 3 years ago +1

    The approach used in this video seems to be a synchronous push for the notification. Does it scale well if the publish and subscribe QPS are in the hundreds of thousands or even millions?

  • @DheerajKumarBarnwal 4 years ago +5

    This is mind blowing. I didn't expect this kind of system design tutorial on YouTube. Your style of explaining the HLD and then deep diving into each component is awesome, a hidden gem. Hats off to you. I know it takes lots of time to create such content but please try to upload more videos.

  • @nimash1612 2 years ago +1

    Great Job! very clear and detailed explanation. Please make more videos on other system design problems and topics.

  • @lihaopeng768 4 years ago +2

    I watched these six videos several times. They are really good. Will you consider a video about monitoring system?

    • @SystemDesignInterview 4 years ago +1

      Thank you for the feedback, Li Haopeng! I have this topic in my short list. But cannot promise specific dates.

  • @dkseo1992 5 years ago +5

    Thanks for the detailed explanation.

  • @sergeyprytkov6850 2 years ago +1

    I'm glad that FPS Russia has found a new passion in systems design.

  • @ДимаМоргунов-ю2ч 2 years ago +1

    Bro, you're the best!!!

  • @gajapathy5209 3 years ago +2

    Great content! One question I had: currently the frontend server pushes the message to temporary storage and the sender component retrieves the message from temporary storage. Here, we may need to keep watching the temporary storage to start the sender process. Instead we can take a hybrid approach. 1. All the publish() calls will be transmitted to the sender component after persisting the data in temporary storage. Now the sender system acts on the task.
    2. In the case of message failures, we will write them into a retry queue, and a retry component handles the request.

    • @ignashi7plays401 2 years ago

      "we may need to keep watching the temporary storage to start the sender process".
      But isn't it the Sender service (using the thread pool) that decides when it wants to read the data? At 15:25.

  • @deshengli 4 years ago +3

    Please Please keep doing this bro! This is the best system design tutorial I've ever watched.

  • @IbnIbrahem 4 years ago +7

    I love the accent xD

  • @nilanjansarkar100 3 years ago +2

    I am so lucky I stumbled upon this channel. Amazing work, please keep em coming :)

  • @dbenbasa 4 years ago +1

    Which DB should we use as our Metadata DB? We talked about the MD service being a dist. cache, but what about the DB?

    • @SystemDesignInterview 4 years ago +2

      The number of writes to the database is relatively small, as writes happen only when new topics are created or subscribers subscribe. There are many reads in the system, but only a fraction of those reads actually hit the database, as most of the reads are served by the Metadata Service (cache). So, both SQL and NoSQL can be used. To name specific options, here is a possible list: DynamoDB, Cassandra, MySQL, PostgreSQL, Aurora. Personally I would favor NoSQL options for this use case (DynamoDB, Cassandra).

  • @jacklemon1460 2 years ago +1

    Thanks for the great content! I still have 2 questions that I couldn't figure out:
    1. What happens when a message to a certain topic is delivered to some subscribers of that topic and for some it fails? How do we keep track to which subscribers a specific message was not yet delivered in order to deliver this message just to them later? do we somehow write this information to the temporary storage/metadata storage or maybe the task responsible for the delivery of each message stays alive and retries sending the message for a few days (sounds unreasonable) until successful?
    2. When receiving a publish request why does the FrontEnd need to fetch metadata about a topic before storing the message in the temporary storage given that the Sender service anyways fetches this metadata again? Is it done just for validating that the publish request is valid (e.g., topic actually exists)?
    Thanks :)

    • @DakshVerma 2 years ago +1

      1) We can have an Audit Service to store the sent status. We could possibly use the Elasticsearch stack (ELK) with date-based index patterns, because with it we can easily create visualizations and dashboards and run rich queries on the audit logs. Depending on the importance of the notification, this can be implemented asynchronously or in the same send step.
      2) There can be some business logic that needs to be validated. Depending on the use case this might not be required. An example of such business logic: you can't send more than a configured number of notifications per minute per topic. This might need some interaction with the metadata service.

  • @harshharwani 3 years ago +1

    How about having a queue in between the temporary storage and the sender service. In this way we do a fan-out approach where multiple sender instances poll the queue for messages and send it to the appropriate micro-service. The queue layer can be a distributed queue based on topics. This will increase the throughput of our service and also will be more fault tolerant as we would decouple temporary storage and sender components. Inside sender service as you mentioned we can still have the concept of task creation and execution to further increase throughput. Thoughts?

    • @gxbambu
      @gxbambu 3 years ago +1

      Yes, I have the same question as you; look at my comments above. However, I don't think a msg queue is necessary here, cuz in principle the temporary storage is THE msg queue. You can implement the storage with Kafka. You can also implement it with NoSQL but just use it as a msg queue. IMO, a msg queue is for heavy writes. Here, between the temp store and the senders, we don't really need heavy writes; we can use heavy reads.
      However, I do agree with you that there should be support for multiple sender instances.

  • @xinlongzhang1187
    @xinlongzhang1187 4 years ago +1

    Thanks for the nice video. I have a question: how does the message retriever get messages from the temporary storage? If the temporary storage is a database, how do we get data from this DB? Do we just randomly select some unprocessed messages from it?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Hi Xinlong,
      Let me answer your question by providing references to some of my other answers. Please take a look at those and let me know if you have more questions:
      1. ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=UgwBRCTc8YW3n7iWS3t4AaABAg.959L9KgaBem95Hj6adrwEL
      2. ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=UgzVKkaCFS4pr4H4ywN4AaABAg.95K4TxO_5zi96Hhv5bCGbL

  • @athanasiosterzakis128
    @athanasiosterzakis128 4 years ago +2

    Very helpful content, not just for interview preparation but for a deeper understanding of the concepts. I truly enjoyed all your series. I actually interviewed with one of the FAANG companies and got a similar system to design.

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Thank you for the feedback, Athanasios! I hope your interview went well!

  • @venkatthota8634
    @venkatthota8634 3 years ago +1

    Great content and an awesome explanation. Keep doing more videos, bro.

  • @deathbombs
    @deathbombs 3 years ago

    I don't think people realize how much the functional requirements tie in with knowledge of TOPIC-BASED pub-sub. That is quite in-depth for something that's not explained.
    I find the part at 2:34 underexplained.

  • @gyhuj1235
    @gyhuj1235 2 years ago +1

    The videos on this channel are among the best system design videos. I wonder why you stopped making them?
    Will there be more videos coming any time soon?

  • @saikatkar2524
    @saikatkar2524 4 years ago +1

    Fantastic explanation.
    I just have one question. What is the point of calling the metadata service from the frontend service if we are not passing the data on to the downstream services, and we call the metadata service from the sender service anyway?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Hi Saikat. Thank you very much for the feedback!
      Please take a look at my response in this thread: ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=UgyCIBZD38Zvib7Gf-Z4AaABAg.8zF3OPIIMqv8zR54tNr8mP

  • @derrick1152
    @derrick1152 2 years ago +1

    I need the dislike stats to make sure no one disliked this

  • @shiweist
    @shiweist 2 months ago

    If you're using a message queue like Kafka or Amazon SQS, they already handle metadata storage. Then we only need to implement the consumers (i.e. the senders).

  • @alexl2512
    @alexl2512 1 year ago

    2:26 Functional
    - publish(topicName, message)
    - subscribe(topicName, endpoint)
    Why does only subscribe need the endpoint parameter? Is there some difference between how publishers and subscribers communicate with the service?
    Can someone help me figure it out?
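
    The two calls listed above can be sketched as a toy in-memory service. This is only an illustration under loud assumptions: endpoints are modeled as plain Python callables (in a real system an endpoint would more likely be a URL, an email address, or similar), and the class name `NotificationService` is made up. The sketch shows one plausible reason for the asymmetry: the publisher calls the service, while the service has to call the subscriber, so it needs to be told where.

```python
class NotificationService:
    """Toy in-memory pub/sub: topics map to lists of subscriber endpoints."""

    def __init__(self):
        self._topics = {}  # topic name -> list of endpoints (callables here)

    def create_topic(self, topic_name):
        self._topics.setdefault(topic_name, [])

    def subscribe(self, topic_name, endpoint):
        # The service calls the subscriber later, so it must know where to call.
        self._topics[topic_name].append(endpoint)

    def publish(self, topic_name, message):
        # The publisher initiates this call itself; no publisher endpoint is needed.
        endpoints = self._topics[topic_name]  # KeyError if the topic does not exist
        for endpoint in endpoints:
            endpoint(message)
        return len(endpoints)
```

    For example, subscribing `inbox.append` for some list `inbox` and then publishing a message places that message in `inbox`.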

  • @junminstorage
    @junminstorage 4 years ago +1

    Thumbs up here!
    I implemented a system similar to the one you describe in this video, but more complicated (with an additional retry queue, message filtering and transformation, and a complex UI with search capabilities). It is a production platform service used by engineering teams in the company. You are certainly a seasoned engineer with hands-on experience in some of the technologies you talk about in this video.

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Thank you for the feedback, junminstorage!
      You totally deserve my praise for implementing a real production system like this. There are many small details that are tough to do right. Well done!

  • @amitdubey9201
    @amitdubey9201 2 years ago +1

    After watching several courses and reading several blogs, I can say yours is truly top class, and one should watch each of your videos more than once to absorb it thoroughly.

  • @sherazdotnet
    @sherazdotnet 9 months ago

    I have taken your system design course on Leetcode, and it is hands down the best course (at least for me) in System Design. One thing I noticed about all of your content is that you don't talk about the "Back of the Envelope" calculation. I personally think that it's not necessary, but I would like to hear your opinion.

  • @shreyamduttagupta7527
    @shreyamduttagupta7527 4 months ago

    I am getting started with system design. Can someone explain what the frontend service after the load balancer is? In a usual web development context we use the term frontend for clients, but I am assuming it means something different here? I love how detailed these videos are, but I am struggling to understand this one and Distributed Message Queue because of this frontend service. Any help is appreciated.

  • @NoName-oh9fh
    @NoName-oh9fh 1 year ago +1

    Oh, that Russian accent

  • @rontman
    @rontman 3 years ago +1

    Still waiting on your monitoring video follow-up! Really good content.

  • @abcdef-fo1tf
    @abcdef-fo1tf 1 year ago

    I'm also confused as to what's being stored in the metadata service. Why can't each item in the temporary storage just be a single message and subscriber? Then we don't have to parse subscribers.
    How does delivering all messages in parallel help with bad subscribers? Can't the list of subscribers be huge, so that no sender has enough threads? I'm not sure why we need enough threads in the first place. Since our sender is sending to multiple subscribers, if it suddenly breaks, we won't know which ones need to be re-sent, right?

  • @yushutong722
    @yushutong722 2 years ago

    I'm not sure how this differs from designing a message queue (e.g. Kafka). Aren't the functional requirements (create topic, publish, subscribe) the same as what Kafka does?

  • @abcdef-fo1tf
    @abcdef-fo1tf 1 year ago

    I'm a little confused why it's talking to frontend service, shouldn't it be backend since it's making a call to send a notification?

  • @shobhitarya1637
    @shobhitarya1637 4 years ago +1

    Best System Design channel. Very helpful. Please keep posting learning videos.

  • @sankalpbose3451
    @sankalpbose3451 5 years ago +1

    Great video, Mikhail! One comment though: there is no component tracking successful processing of a notification. Any of the sub-components of the sender may crash and take down the list of to-be-executed and waiting-for-retry tasks with it. This would break the at-least-once delivery guarantee.

    • @man2k1985
      @man2k1985 4 years ago

      I also thought the same

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +1

      Hi Sankalp Bose. Thank you for the feedback!
      You are correct. And I have mentioned in the video that the best guarantee the system gives us is "at-least-once". When Senders retrieve messages and send them out, they need to acknowledge back to the Temporary Storage upon successful delivery. If this acknowledgement does not happen for any reason (e.g. Sender machine sent a message and crashed right after that), message will be retried. Which may cause duplicates on the Subscriber's end. That is why it is important to de-duplicate messages on the Subscriber's end, if duplicates must be avoided.
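
      The acknowledge-after-send flow and subscriber-side de-duplication described above can be sketched as follows. This is a toy single-process illustration, not the implementation from the video; all class and function names are hypothetical, and it assumes every message carries a unique ID that the subscriber can remember.

```python
class TemporaryStorage:
    """Toy in-memory store: a message stays pending until it is acknowledged."""

    def __init__(self):
        self._pending = {}  # message_id -> payload

    def put(self, message_id, payload):
        self._pending[message_id] = payload

    def pending(self):
        return list(self._pending.items())

    def ack(self, message_id):
        self._pending.pop(message_id, None)


class DedupingSubscriber:
    """Subscriber that ignores message IDs it has already processed."""

    def __init__(self):
        self._seen = set()
        self.processed = []

    def receive(self, message_id, payload):
        if message_id in self._seen:
            return False  # duplicate caused by a retry, drop it
        self._seen.add(message_id)
        self.processed.append(payload)
        return True


def deliver_pending(storage, subscriber, crash_before_ack=False):
    """Send every pending message; acknowledge only after a successful send."""
    for message_id, payload in storage.pending():
        subscriber.receive(message_id, payload)
        if crash_before_ack:
            return  # simulates a Sender crash between send and ack
        storage.ack(message_id)
```

      If the Sender "crashes" before the acknowledgement, the message remains pending and is sent again on the next pass; the subscriber sees it twice but processes it once.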

  • @AndhraKitchenFoods
    @AndhraKitchenFoods 3 years ago +1

    Awesome system design tutorial.

  • @quantumlexa
    @quantumlexa 4 years ago +1

    Thanks a lot again for such a great channel. That's awesome. I'm in the middle of system design interview preparation and your channel helps me a lot.
    A few comments/questions:
    1. As far as I understood, a notification event handled by the sender service gets its endpoint information from the metadata service. Let's consider a broadcast event (for example, an SMS storm warning, or an email from a system to all users about a new policy) that has to be delivered to more than 100,000 endpoints. If it is one single event processed by one single node, even in a multi-threaded environment, it would take way too long.
    I'd probably generate, beforehand, events that contain both the event and the endpoint, and so speed up the actual delivery. The stored item need not even contain the event itself, just a pair of two IDs. This idea is based on the assumption that delivery itself is a more time-consuming operation than enriching an event with endpoint information.
    2. Regarding the temporary storage, you asked a very good question. The best that comes to my mind is probably a hybrid approach, i.e. having some key-value NoSQL DB for failed events (that I'd redeliver later on) and a Redis cache for everything else.
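
    The pre-generation idea in point 1 can be sketched as a fan-out step that turns one broadcast event into batches of (message ID, endpoint) pairs, so that many workers can each take a batch. This is a rough sketch only; the function name `fan_out` and the batching scheme are assumptions made for illustration.

```python
def fan_out(message_id, endpoints, batch_size):
    """Yield batches of (message_id, endpoint) delivery tasks.

    Each batch is small enough to hand to a separate worker or queue entry.
    """
    batch = []
    for endpoint in endpoints:
        batch.append((message_id, endpoint))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch
```

    Because each task carries only two IDs, the payload is looked up once per worker rather than copied 100,000 times.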

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Hi quantumlexa. Apologies for the delayed response. And thank you for the feedback!
      Your idea of decoupling event generation and the actual delivery is a good one. Please take a look at this thread, where we discussed similar ideas: ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=Ugzg_JJd9yUMUX9ySwt4AaABAg
      P.S. Wish you all the luck on your interviews!

  • @RohitSharma-ql3dt
    @RohitSharma-ql3dt 5 years ago +1

    Thank you for this immensely useful System Design video.
    I am thinking about how to make message delivery ordered. The publisher sends an increasing sequence ID, and that order needs to be preserved when the subscriber receives the messages.
    In this case:
    - the Sender service has to keep track of which messages have been acknowledged by the subscriber via an Ack mechanism.
    - the Sender has to query the Metadata Service (which contains all the sequence IDs that are to be received by a Subscriber), and ensure the message with ID1 has been acked before sending the next message in the sequence, ID2, where ID1 < ID2.
    - Any single message delivery issue will stall the delivery of all subsequent messages, so the Sender monitoring system should inform the Subscriber of this stall via an email or something else.
    Any thoughts on this?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +4

      Thank you for the feedback, Rohit!
      I like your thought process. The algorithm you described will work. But it may be hard to scale. Let me share with you some thoughts.
      To implement ordering, we need every component of the system to carry some burden. Yes, we can say that publishers are allowed to submit messages in any arbitrary order, specify some ID (sequence number) for each message and leave all the complexity to the notification service. The notification service will then track incoming IDs and hold messages. For example, if the message with ID = 1 came and then the message with ID = 10 came, the notification service needs to hold the latter until messages with ID = 2, 3, ..., 9 arrive and are processed. This may lead to many messages sitting in the notification service waiting for their turn.
      An alternative to this approach is to tell the publisher to share the responsibility and submit messages in the order they need to be sent out. The notification service then needs to preserve this order, and the Sender needs to send messages in that order. We can use Apache Kafka as the Temporary Storage to preserve order (per partition). The Sender then becomes a Kafka consumer that reads messages one by one in a single thread and sends the data out.
      Please check other comments for this video. I have recently replied to another question about ordering. There are some more details there.
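
      The first approach above, where the service holds out-of-order messages until the gap is filled, can be sketched as a small reorder buffer. This is an illustrative sketch assuming sequence IDs are consecutive integers starting at 1; the class name `ReorderBuffer` is hypothetical.

```python
class ReorderBuffer:
    """Holds out-of-order messages and releases them in sequence order."""

    def __init__(self, first_id=1):
        self._next_id = first_id  # the ID we are waiting for
        self._held = {}           # seq_id -> message, for IDs that arrived early

    def accept(self, seq_id, message):
        """Store a message and return every message that is now deliverable."""
        if seq_id < self._next_id or seq_id in self._held:
            return []  # already released, or already waiting in the buffer
        self._held[seq_id] = message
        ready = []
        # Release the longest run of consecutive IDs starting at _next_id.
        while self._next_id in self._held:
            ready.append(self._held.pop(self._next_id))
            self._next_id += 1
        return ready

    def held_count(self):
        return len(self._held)
```

      The scaling concern is visible in `held_count`: if ID 2 never arrives, everything after it accumulates in `_held` indefinitely.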

  • @sharathchandrareddy8959
    @sharathchandrareddy8959 4 years ago +1

    Great step-by-step approach. It would be great if you could provide a link to the final design, so that viewers can keep it handy as a big picture while learning the specifics of each sub-component. This request applies to all of your system design videos.

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Hi Sharath. Thank you for the feedback!
      Can you please clarify how you see it? Do you mean combining all components with their details on a single slide? Or do you mean a text blog post version of the video? Or something else?

  • @kanaiyapatel5691
    @kanaiyapatel5691 5 years ago +1

    Your designs are probably the best, and the explanations are easy to understand. I encourage you to do more system design videos.

  • @NitinPatel-ld5qd
    @NitinPatel-ld5qd 4 years ago +2

    Another fantastic addition to my system design favorites! Such a simple and easy-to-understand design of a notification service. Thank you, Mikhail, and please keep posting such great videos :)

  • @152jatin
    @152jatin 4 years ago +1

    Amazing content !! Probably the best.
    Thanks for sharing your knowledge in such a presentable manner.

  • @xinwang6876
    @xinwang6876 5 years ago +1

    This is really great. The best system design video I have found online. I am trying to find some information about how to design a Google Calendar-like system. Would you mind publishing a video on that topic?

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago

      Thank you for the feedback, Xin! Added your topic to the TODO list. Please do not expect a quick answer though, the list is already quite long ((

  • @deathbombs
    @deathbombs 2 years ago

    Upon revisiting, I think the requirements at 2:34 are very vague. For example, why pub-sub with topics? That assumes notifications need to be fanned out. For 1:1 messaging that needs notifications, maybe individual queues and producer/consumer-style messaging would be better?

  • @vivekgupta8580
    @vivekgupta8580 4 years ago +1

    I guess this is the best System Design Interview resource available. Thanks for creating this channel.

  • @silviojr2424
    @silviojr2424 2 years ago +1

    Your videos are so good! Thank you!

  • @renuyadav6613
    @renuyadav6613 1 year ago

    Why is the metadata service called from the frontend service, and what information is it fetching?

  • @atabhatti6010
    @atabhatti6010 2 years ago

    Great video, so thank you! However, I'm a little confused. You list the Frontend service responsibilities (~5:24), including SSL termination, authorization, authentication, request dispatching, ... and then (~7:40) describe it as processing the message. Is that right? I initially understood your definition of the Frontend Service as an API Gateway. Later I understood it as a Message Processing Service. Are these one and the same thing?

  • @amirhasan5587
    @amirhasan5587 2 years ago

    Can't appreciate more 🙌 Kudos!
    Pls make videos for 1. Design YouTube/Netflix, 2. Design Twitter/Facebook, 3. Design Yelp, etc.

  • @albertlong8885
    @albertlong8885 4 years ago +1

    Best System Design videos! I feel a notification system is very similar to a message queue system.
    One question I have is: how are messages retrieved from the temporary storage in the sender? Do we have a column in the temp storage which indicates whether a message is ready to send, and the sender just randomly gets one from the storage every time?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +2

      Hi Albert,
      Apologies for the delayed response. Quite busy these days.
      You are correct about similarities between notification systems and message queues. Some message brokers (e.g. RabbitMQ) support the pub/sub pattern, where message is published to multiple subscribers by the broker server.
      If message queue is used as a Temporary Storage, messages are usually stored in a FIFO order (at least within a single partition). When Sender retrieves a message from the queue, the next message in line is returned. And because Sender uses the Pull mechanism, Temporary Storage does need to track what messages have been retrieved. Temporary Storage just returns the next available message.
      You may be wondering what "the next available" means. Well, there are options. If Sender retrieves messages from Kafka, it needs to pass the offset parameter, index of the next message in the queue. If Sender retrieves from SQS, SQS itself decides what message to return (FIFO, but not guaranteed for standard queues). Sender only needs to ensure that when message is sent out to subscribers, it is deleted in SQS. Otherwise, it can be re-delivered by SQS some time later.
      P.S. Will answer your other questions soon. Thanks for posting them.
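
      The "delete after sending, otherwise it is re-delivered" behaviour described for SQS can be illustrated with a toy in-memory queue. This is not the SQS API; the class `VisibilityQueue` and the 30-second timeout are assumptions made purely to show the mechanism of hiding a received message for a while and showing it again if it was never deleted.

```python
import time
from collections import OrderedDict


class VisibilityQueue:
    """Toy queue: a received message is hidden until deleted or until it times out."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self._messages = OrderedDict()  # message_id -> [payload, visible_at]
        self._next_id = 0

    def send(self, payload):
        self._next_id += 1
        # A new message is visible immediately.
        self._messages[self._next_id] = [payload, float("-inf")]
        return self._next_id

    def receive(self):
        """Return (message_id, payload) for the oldest visible message, or None."""
        now = self.clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout  # hide while in flight
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        """Called by the Sender after the message was delivered to subscribers."""
        self._messages.pop(message_id, None)
```

      With Kafka the bookkeeping is different: the consumer tracks an offset per partition instead of deleting individual messages.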

    • @albertlong8885
      @albertlong8885 4 years ago +1

      @@SystemDesignInterview Thanks for the reply. So the temporary storage is not a DB. For each partition, it could be a list in redis? Do you think we should persist it on the disk?

  • @pranavsakulkar
    @pranavsakulkar 2 years ago +1

    This is simply amazing. Thanks for making these videos. I think you should write a book or create some paid tutorials. I would definitely pay for those. I am learning a lot through your videos already.

    • @pushpendrasingh1819
      @pushpendrasingh1819 2 years ago

      He already has a paid course, bro. Buy it.

    • @pranavsakulkar
      @pranavsakulkar 2 years ago

      @@pushpendrasingh1819 Thanks for letting me know. Seems to be a new creation. Definitely checking it out.

  • @kunal4350
    @kunal4350 4 years ago +1

    This is one of the best videos I have seen so far. Thanks for it. I have one question, and I hope you can help me with it. Let's say this notification has to be sent as an email to more than a million or a billion people. It will probably take a lot of time for all of them to receive the email notification. Please help me with this case.

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +2

      Hi Kunal. Thank you for the feedback!
      There are several ways to both scale out and scale up message delivery, to speed up the process when there are millions or billions of end users. Let me name a few such ideas:
      1. We can further break down the Sender service into multiple web services. Although it becomes a more complex system, the benefit of this approach is that we can create several times more message delivery tasks running in parallel. Please check the following thread for more details: ua-cam.com/video/bBTPZ9NdSk8/v-deo.html&lc=Ugzg_JJd9yUMUX9ySwt4AaABAg
      Things like autoscaling will help a lot to quickly start more instances for emails delivery.
      2. Remember that many distributed systems are regional. Think of it as we have a copy of the same system deployed in various parts of the globe. With such setup, we have a natural partition of data, by geographical region. So, when we need to deliver messages to billions of email addresses, we split this list of emails into regional lists and notification service specific to each region handles the delivery.
      3. We may further partition the list of emails and have a system processing each partition in parallel. E.g. we take emails starting with letters [A-C] and assign to a group of servers to process this list. And so on. There is an architectural pattern called big compute (or high-performance computing) to schedule and monitor many tasks running in parallel.
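
      Idea 3 can be sketched as follows. Note the swap: instead of the letter ranges such as [A-C] mentioned above, this sketch hashes each address, on the assumption that first letters are unevenly distributed and a hash spreads the load more evenly. The function names and the partition count are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(email, num_partitions):
    """Map an address to a stable partition using a hash of the address."""
    digest = hashlib.sha256(email.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def split_recipients(emails, num_partitions):
    """Group recipients so each partition can be handed to its own group of servers."""
    partitions = defaultdict(list)
    for email in emails:
        partitions[partition_for(email, num_partitions)].append(email)
    return partitions
```

      The same address always lands in the same partition, which also makes retries and de-duplication per partition easier to reason about.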

    • @minalbshah
      @minalbshah 4 years ago

      Hi Kunal, I saw the video and still have some doubts. Can you help me understand?

  • @wondershow123
    @wondershow123 3 years ago

    If you pick NoSQL as the "Temporary storage", how will your Sender service retrieve all the message candidates, given that it needs to filter out all succeeded messages and retrieve the most recent ones? How will you design your NoSQL schema (e.g. what is the sharding key, what is the sort key) to support both scalability and the sender service's query model?

  • @mdfarooq7145
    @mdfarooq7145 5 years ago +4

    Awesome tutorial

  • @saranyaks6436
    @saranyaks6436 4 years ago +1

    Great video! Keep up the good work! :)
    Pros of using SQS: data size is small, there is no strict ordering requirement for this problem, and it is reliable. On the other hand, that raises several questions: when exactly do we erase the message from the queue? Only when all the tasks have picked up the message? What happens if one of the tasks fails? Etc.
    It might also be useful to discuss WebSockets for notifying the clients. I'm assuming that's what the tasks would ultimately do?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +3

      Thank you for the feedback, Saranya!
      Agree with you. I should have extended the video to cover details of pushing messages to end clients (subscribers). Specifically, talk about HTTP polling (long and short), websockets, server-sent events. Let me leave this topic for a separate discussion.

  • @wondershow123
    @wondershow123 3 years ago

    In the "Temporary storage", do you need to differentiate between kinds of messages? 1. Pending (notifications waiting to be sent) 2. Attempted (tried but not yet successful) 3. Success 4. Failed?
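
    The four states listed in this question could be modeled as a small state machine. This is a hypothetical sketch, not something the video specifies; the allowed transitions are an assumption (retries keep a message in the attempted state until it either succeeds or the retry policy gives up).

```python
from enum import Enum


class DeliveryState(Enum):
    PENDING = "pending"      # waiting to be sent
    ATTEMPTED = "attempted"  # tried at least once, not yet successful
    SUCCEEDED = "succeeded"
    FAILED = "failed"        # retry policy exhausted


# Assumed transitions: terminal states have no outgoing edges.
ALLOWED = {
    DeliveryState.PENDING: {DeliveryState.ATTEMPTED},
    DeliveryState.ATTEMPTED: {
        DeliveryState.ATTEMPTED,
        DeliveryState.SUCCEEDED,
        DeliveryState.FAILED,
    },
    DeliveryState.SUCCEEDED: set(),
    DeliveryState.FAILED: set(),
}


def transition(current, new):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {new.name}")
    return new
```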

  • @gitarowydominik
    @gitarowydominik 5 months ago

    We're designing a notification service, so a pub/sub service. And Kafka is mentioned as a potential choice for a message queue and stream processing platform.
    But Kafka itself can be used as a pub/sub service, so maybe just use that functionality of Kafka instead? :)

    • @gitarowydominik
      @gitarowydominik 5 months ago

      Although Kafka only supports subscribers to pull data, no push support, so something to talk about.

  • @tihon4979
    @tihon4979 1 year ago

    Such a perfect pronunciation! I'm so excited! It's amazing!!!

  • @GaneshManika
    @GaneshManika 3 years ago +1

    Awesome! Every component, every process, and every possibility is explained very well. Thank you!

  • @racecondition3176
    @racecondition3176 3 years ago

    Why would you need so many metadata services? Postgres with indexes should be able to handle millions of such metadata values... Or is the expected number of topics above billions?

  • @meow-mi333
    @meow-mi333 2 years ago

    Thanks for the video. How is the sender scalable when it only creates tasks if the thread pool has enough threads? What happens if the number of subscribers is way higher than the number of threads in the pool? It feels like some sort of sharding is required.
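
    One possible reading of the thread pool plus semaphore idea is sketched below: the semaphore caps how many delivery tasks are queued or running, and when there are more subscribers than permits the submitting loop simply blocks until a task finishes. This is an illustrative sketch, not the video's implementation; `deliver_all`, `send`, and the default limits are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def deliver_all(message, endpoints, send, max_workers=4, max_in_flight=8):
    """Deliver one message to many endpoints with a bounded number of in-flight tasks.

    `send(endpoint, message)` is a caller-supplied function that returns True on
    success and may raise on failure.
    """
    permits = threading.Semaphore(max_in_flight)
    results = {}

    def task(endpoint):
        try:
            results[endpoint] = send(endpoint, message)
        except Exception:
            results[endpoint] = False  # left for the retry policy to handle
        finally:
            permits.release()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for endpoint in endpoints:
            permits.acquire()  # blocks while too many tasks are queued or running
            pool.submit(task, endpoint)
    # Leaving the `with` block waits for all submitted tasks to finish.
    return results
```

    A slow or failing subscriber then occupies at most one worker at a time, while a single host still has a hard ceiling, which is where splitting the subscriber list across multiple Sender hosts comes in.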

  • @jamess5330
    @jamess5330 2 years ago

    Excellent video! Another super effective way to prepare system design interviews: Do mock interviews with FAANG engineers at Meetapro.

  • @dj.coda.newyork
    @dj.coda.newyork 2 years ago

    Does anyone know why this channel stopped uploading? I will sign up for his course somewhere; it is fucking useful and beats every other material on YouTube.

  • @ChintanShah22
    @ChintanShah22 4 years ago +1

    Thank you so much for the great content, big fan of your channel. I have a question about the order of the load balancer and reverse proxy in the design. You have the load balancer in front, followed by the reverse proxy, which is part of the front end. Since the reverse proxy is doing rate limiting, SSL termination, etc., shouldn't it come before the load balancer? Also, since there is usually just one reverse proxy for multiple web servers, wouldn't a load balancer feeding just one reverse proxy be redundant?

    • @bluelamar809
      @bluelamar809 4 years ago

      An example of this is HAProxy as the load balancer (which it is really good at), passing messages to nginx as the reverse proxy. Nginx is good at layer 7 routing and can do auth checks, among other things, before passing the messages upstream to one or more origin servers (for example, based on the URL).

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +1

      Hi Chintan. Thank you very much for the feedback! And my apologies for the delayed response.
      You are right, there are many different ways a reverse proxy and a load balancer can work together. Plus, they have a couple of overlapping functions, like TLS termination and load distribution, and this can make things even more confusing. In the video, I mostly wanted to demonstrate one particular aspect: TLS termination on the host.
      There are several ways TLS termination can be done. It can be done at the load balancer level or at the reverse proxy level, when both of these are standalone components sitting in front of the service. One of the problems with this approach is that traffic travels unencrypted from the load balancer or reverse proxy to the actual service machine. One way to solve this is to have a proxy running on the same machine as the service. Traffic stays encrypted all the way to the service machine and is decrypted on the machine.
      This requires some good diagrams to demonstrate the data flow in a simple manner. I should probably create a video to cover these details. Thank you for the question!

  • @vichetlay1646
    @vichetlay1646 1 year ago

    I just found your channel in 2022. You have made a lot of good system design videos. It's sad that there are no new videos. I hope you can make more quality videos.

  • @ankitrana3176
    @ankitrana3176 3 years ago

    How can we deal with duplicate reads in a horizontally scaled Sender service when a key-value database is used as the Temporary Storage?