System Design Interview - Rate Limiting (local and distributed)

  • Published Sep 30, 2024
  • Please check out my other video courses here: www.systemdesi...
    Topics mentioned in the video:
    - Token bucket algorithm.
    - Object-oriented design of the rate limiting solution.
    - Load balancer max connections, auto-scaling.
    - Message broadcasting: full mesh network topology, gossip communication, distributed cache, coordination service.
    - Communication protocols: TCP, UDP.
    - Embedded rate limiter vs daemon process.
    - Bucket management, synchronization.
    Inspired by the following interview questions:
    Amazon (www.careercup....)
    Google (www.careercup...., www.careercup....)
    Uber (leetcode.com/d...)

COMMENTS • 426

  • @DavidSuarez09
    @DavidSuarez09 5 years ago +254

    This is an extremely thorough answer, with interviewer gotchas and system tradeoffs. Exactly the type of video we engineers need. Thank you so much! Keep up the excellent work! Subscribed!

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +21

      Appreciate your feedback, David! Words like these inspire further endeavor.

  • @roshrajthoppilan
    @roshrajthoppilan 2 years ago +45

    What happened to this channel? This is the best content out there for system design interviews. Why did they stop making videos?

    • @VS-SEA
      @VS-SEA 5 months ago

      paid version on his website.

    • @BobbyDigMOB
      @BobbyDigMOB 3 months ago

      His content is the system design crash course on Leetcode

  • @secondsandthings
    @secondsandthings 5 years ago +72

    This is the best video style I've seen on System Design interviews because it's also educational and covers lots of useful concepts for engineers.

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +3

      Thank you for the feedback, @secondsandthings! Glad you liked the content.

  • @hangnm81
    @hangnm81 5 years ago +103

    Just landed my SD2 job at Amazon. Thank you so much. Your channel is really helpful

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +13

      Congratulations, Hannah! Really glad for you!

    • @nammi895
      @nammi895 6 months ago +1

      Still working at Amazon? Promoted to SDE 3 or not?

  • @sandeshkobal
    @sandeshkobal 3 years ago +9

    27:35 "What is your mom's favorite flower?" LMAO

  • @fokkor
    @fokkor 5 years ago +38

    Probably one of the best videos on system design! You're an awesome instructor! I like the presentation through animation and the thoroughness of your videos. This is how interviews are conducted in the real world.

  • @mostinho7
    @mostinho7 1 year ago +11

    4:00 why a rate limiter needs to be designed as a distributed system and not just for a single host
    11:00 algorithm for deciding whether to accept or reject a request: the token bucket algorithm keeps a bucket of tokens for each unique client and has a refill rate. When a client makes a request, we check whether it has tokens remaining; if not, reject
    14:50 classes and interfaces (OOP)
    17:00 rate limiter across different hosts in a cluster
    18:30 how token buckets communicate: each just says how much it has used so far, and the others can sum and make a decision
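
For readers following along, here is a minimal sketch of the token bucket described in the timestamps above, assuming a refill rate in tokens per second, a maximum bucket size, and nanosecond timestamps; class and field names are illustrative rather than the video's verbatim code:

```java
// Minimal token bucket sketch. Not thread-safe; synchronization around the
// token count is omitted for brevity.
public class TokenBucket {
    private final long maxBucketSize;   // maximum tokens the bucket can hold
    private final long refillRate;      // tokens added per second
    private long currentBucketSize;     // tokens currently available
    private long lastRefillTimestamp;   // nanosecond timestamp of the last refill

    public TokenBucket(long maxBucketSize, long refillRate) {
        this.maxBucketSize = maxBucketSize;
        this.refillRate = refillRate;
        this.currentBucketSize = maxBucketSize; // start with a full bucket
        this.lastRefillTimestamp = System.nanoTime();
    }

    public boolean allowRequest(int tokens) {
        refill();
        if (currentBucketSize >= tokens) {
            currentBucketSize -= tokens;
            return true;  // enough tokens: let the request through
        }
        return false;     // not enough tokens: throttle
    }

    private void refill() {
        long now = System.nanoTime();
        // 1e9 nanoseconds in one second; the cast truncates to whole tokens.
        long tokensToAdd = (long) ((now - lastRefillTimestamp) * refillRate / 1e9);
        currentBucketSize = Math.min(currentBucketSize + tokensToAdd, maxBucketSize);
        // Note: the timestamp advances even when tokensToAdd is 0, which is
        // exactly the behavior dissected in the refill threads below.
        lastRefillTimestamp = now;
    }
}
```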

  • @pranav-dave
    @pranav-dave 3 years ago +3

    How are 2 more tokens added at 500 milliseconds? (at 14:30)

  • @123vat69
    @123vat69 4 years ago +6

    Regarding the bucket-filling example (14:32-14:37): at T2, do we refill the bucket by 5 or by 2? The refill rate is 10/s, and at T2 half a second has passed, so the refill should be 5. Correct me if that's wrong. Also, I almost forgot the main point - thanks for sharing this wealth of knowledge!

    • @irinakamalova4412
      @irinakamalova4412 2 years ago

      No, only 200 ms have passed; that's why only 2 tokens were added.

    • @EricStratton79
      @EricStratton79 2 years ago +1

      The way he coded the refill() method, even if no tokens were added, it sets lastRefillTime to "now". We did not add any tokens at t1 because the bucket was already full but it still set lastRefillTime to "now". Meaning at t2 the amount of time elapsed is calculated from t1, not t0 (so 200ms).

  • @JR-zz8bw
    @JR-zz8bw 2 years ago +2

    I agree with others that using a gossip-based system here seems pretty odd. Not only does it not scale well as the number of counters increases, it also does a poor job of actually enforcing the rate limits as the number of nodes grows (say you have 100 nodes in your rate-limit cluster: you can potentially allow 100 times the max request amount, which clearly violates the requirements). If you want to take this approach, I think it would be better to shard on clientId and have each node manage a partition of clients (see the sketch after this thread). I guess the approach presented here is designed to minimize read latency, at these quite big drawbacks.

    • @shuaiwhu
      @shuaiwhu 2 years ago +1

      Comments like this should be put at the top, instead of the complimentary ones.
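
A hedged sketch of the shard-by-clientId alternative proposed above: each client's bucket lives on exactly one owner node, chosen here by a simple hash (a production system would use consistent hashing so membership changes move as few buckets as possible). All names are illustrative:

```java
import java.util.List;

// Illustrative only: picks the rate-limiter node that owns a client's bucket,
// so no cross-node token synchronization is needed for that client.
public final class BucketRouter {
    private final List<String> nodes; // e.g. "host:port" strings

    public BucketRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    public String ownerFor(String clientId) {
        // floorMod keeps the index non-negative even for negative hash codes
        return nodes.get(Math.floorMod(clientId.hashCode(), nodes.size()));
    }
}
```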

  • @Hangar1318
    @Hangar1318 2 years ago +4

    Why did you stop making these videos?

  • @waiyonglow7354
    @waiyonglow7354 4 years ago +10

    After looking through multiple resources to refresh my memory on distributed system design, your channel contains hands down the most thorough and educational content - easily rivaling university-level courses I've taken on similar topics. I've never written a YouTube comment in my life, but I felt compelled to do so after watching your videos. Thank you for your excellent work.

  • @GunjanSharma-iw6ou
    @GunjanSharma-iw6ou 2 years ago +11

    This is the most detailed, thorough, and informative system design video I have seen so far across all the channels. Glad I came across it. Please keep making videos. Subscribed!

  • @omkarkulkarni9202
    @omkarkulkarni9202 3 years ago +2

    Where are you!!!!! I am desperately waiting for your next paid or unpaid videos. Do let me know your Patreon account! This is a masterpiece.

  • @curiossoul
    @curiossoul 2 years ago +9

    My Lord, I am spellbound by the breadth and depth of the various aspects of this design. I just finished reading the same design in a very popular design book, and it was nothing compared to this. Truly loved it.
    Can't believe such precious stuff is free when it could be 5-star content on Udemy. Big thanks, and looking forward to more videos 🙏

  • @mostafasaeedhashhash8718
    @mostafasaeedhashhash8718 4 years ago +6

    Will you continue making videos and greatness :)?

  • @prakritidevverma4315
    @prakritidevverma4315 3 years ago +19

    Oh my God, you are so good at this. I don't think anyone on YouTube is providing this much quality content.

  • @RR8401674
    @RR8401674 5 years ago +6

    Very nice. Please upload more videos. You put this together very nicely, and I appreciate your time.

  • @iitgupta2010
    @iitgupta2010 5 years ago +11

    What a masterpiece, man .. awesome .. really loved it.

  • @integralCalculus
    @integralCalculus 5 years ago +10

    Truly amazing amount of details covered. Very coherent and organized content. Keep up the excellent work!

  • @abcd12wer
    @abcd12wer 4 years ago +8

    Thanks for the amazing video. The content is top notch. Great explanation from the host, and kudos for the animations as well. Got a follow-up question:
    What is the advantage of letting the nodes decide on the token count through message passing (broadcast, gossip, or leader selection -- using a coordination service / random leader election) over using a distributed cache? As the nodes in the cluster might be distributed across data centers in different regions, wouldn't it take more messages (network calls over TCP/UDP) and more latency to find the result? Wouldn't a distributed cache (Redis/Memcached with built-in fault tolerance) be easier to set up/maintain and perform better?
    Could you explain more on this?
    Also, what would be the latency expectation of these approaches for making a decision on whether to allow a request or not?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +8

      Hi lv. Thank you for the feedback. And insightful questions.
      The main advantage is simplified maintenance, as we do not need a separate cluster for the distributed cache. Let's say we have a library that does all this communication between nodes, and we have a service that needs to support throttling. We simply use the library in the service code, deploy it together with the service, and have throttling functionality with no (or minimal) maintenance effort.
      If our service is deployed across several data centers, you are right, latencies increase. Though one of my arguments in favor of the broadcasting approach is that communication over UDP is fast, and even higher latencies would apply to the distributed cache (assuming the cache is also deployed across several data centers).
      I cannot provide good numbers for latency expectations right away; more research/testing/calculation is needed. But I can tell you that it is the size of the service cluster that matters most, and latencies depend on it. If the service cluster is large (thousands of nodes), broadcasting becomes a poor option, and the distributed cache clearly wins. For smaller clusters, I would prefer broadcasting, to save on the distributed cache maintenance cost.
      Please let me know if more details are needed. I will try to provide more information.

    • @abcd12wer
      @abcd12wer 4 years ago

      @@SystemDesignInterview Thanks for the reply and explanation! The maintenance cost makes sense. One less moving part to worry about in the system.
      Do you know of any generic consensus tool/library that we can use right off the shelf (or with minimal changes) for a use case like this?

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago

      Hi lv. I am referencing some solutions here: ua-cam.com/video/FU4WlwfS3G0/v-deo.html&lc=UgyNKUdfSgnOdIy80XJ4AaABAg.8vowCPev_808vx3HDj5hXE

  • @vancouverbill
    @vancouverbill 2 years ago +5

    Absolutely incredible content. In the section on message broadcasting, you mentioned that with the gossip protocol the nodes randomly choose a neighbor, like a disease, and this propagates. In this case, if a new node is added and its IP is not known to the neighbors, how can they randomly choose and reach it?
    Looking forward to doing your course.

  • @zhongyanyuan8421
    @zhongyanyuan8421 4 years ago +6

    I'm working on some rate limiting stuff now. Although my task is not as complicated as your video topics, it really helped me build up the basic concepts. Thanks!

  • @yuenyiupang
    @yuenyiupang 2 years ago +1

    I was wondering why not just use a Redis cluster to solve the issue, instead of implementing a gossip protocol to sync the rate across all machines. I mean, this is good for understanding things deep inside, but in the practical world, does someone really not use a distributed cache to handle rate limiting? I doubt it. But again, I love the way we drill down into the details to understand the foundation.

  • @ShubhamJain-ch7et
    @ShubhamJain-ch7et 4 years ago +5

    This is really thorough and truly mirrors the way systems at scale are architected in companies. The best set of videos on system design out there. Please continue making more of these :). The mom's-favorite-flower part was really funny; I can totally relate. Selling your platform to different teams in your company is already super hard; never complicate it further by telling them to add a service client :D

  • @puneetjain5021
    @puneetjain5021 2 years ago +2

    Please add more videos. These are very helpful.

  • @MrSatishkumarchakka
    @MrSatishkumarchakka 5 years ago +5

    This is the best series on system design I found on youtube. Excellent presentation. Really appreciate the genuine intentions to share your knowledge. Keep up the excellent work.

  • @vishnusingh4121
    @vishnusingh4121 3 years ago +3

    By faaaaar the BEST VIDEO and EXPLANATION on this topic! He even has code implementations!! That was super helpful. Thanks a million!

  • @SuperHappykumar
    @SuperHappykumar 5 years ago +2

    I have one doubt regarding the token bucket algorithm: in the refill() method, why did you use "1e9" to calculate tokensToAdd?

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +3

      Hi Apekshit,
      1e9 = 1 billion - the number of nanoseconds in 1 second. We calculate and keep time in nanoseconds. We first calculate how many nanoseconds have elapsed since the last call of the refill() method. By dividing by 1e9, we convert this number to seconds (or a fraction of a second).

    • @SuperHappykumar
      @SuperHappykumar 5 years ago

      @@SystemDesignInterview Thank you
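
To make the unit conversion above concrete, a small sketch (the helper name and parameters are illustrative, not the video's code):

```java
// Converts elapsed nanoseconds to tokens, at refillRate tokens per second.
static double tokensToAdd(long lastRefillNanos, long nowNanos, long refillRate) {
    double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9; // 1e9 ns per second
    return elapsedSeconds * refillRate;
}
```

For example, with refillRate = 10, an elapsed 200 ms is 2e8 ns; 2e8 / 1e9 = 0.2 s, and 0.2 * 10 = 2 tokens, which matches the two-token refill discussed earlier in this thread.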

  • @user-uskxnfiw729
    @user-uskxnfiw729 5 years ago +5

    Very informative and educational! Exceptional quality as well. Thank you. I subscribed to this channel in the hope that you will produce more technical videos like this.

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Thanks a lot for the feedback, tillnothingleft! Working on a new video right now.

  • @zhiguoqin5930
    @zhiguoqin5930 5 years ago +8

    Pretty high-quality video, thanks!!!

  • @drakezen
    @drakezen 5 years ago +5

    Great videos. I will be looking forward to more updates

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Thanks, appreciate the feedback! The next video (Distributed Cache Design) will come out by the end of this week.

  • @wayleechu3866
    @wayleechu3866 3 years ago +3

    Thank you for the great video. One question: why can't we shard hosts by userId and have the load balancer route requests to the hosts by userId? Then we wouldn't have to sync the status of each host. Thanks

    • @johelcarvalho9747
      @johelcarvalho9747 3 years ago +2

      Hi, Waylee. When we talk about sharding our hosts by userId, there needs to be a strong business and scalability reason for it. Doing otherwise would incur the Procrustes' bed problem Taleb tells us about: cutting off our legs to fit the bed instead of buying a larger one. Apart from that, we cannot make a general solution by relying on sharding, because not every service would need to be sharded, or sharded by user. (I hope I got it right; I'm just learning like you.)
      The question I ask is: why don't we bring this rate limiter closer to, let's say, our API gateway, assuming we have one, and work in a more centralized fashion?

    • @wayleechu3866
      @wayleechu3866 3 years ago

      @@johelcarvalho9747 Thanks for your answer. It makes sense to me. For your question, are you suggesting we put the rate limiter before the LB? I think the reason we don't centralize the rate limiter is better fault tolerance. A rate limiter failing on a single host is much better than a failure in a centralized one.

    • @johelcarvalho9747
      @johelcarvalho9747 3 years ago

      @@wayleechu3866, I'm not sure this would be an issue; API gateways can have many instances behind a LB. I was reading that rate limiting capability on API gateways is not unorthodox. If I had the chance to implement a rate limiter, I would start by placing its logic in the API gateway (I'm assuming we would need one) and initially use a Redis server (which would be the bottleneck) to help me control each user's request count. I'm not suggesting it is the right approach, but the one in the video is very complex at my current state of knowledge. I liked it, but I would like to see other solutions.

    • @shamilmehdi
      @shamilmehdi 2 years ago

      @@johelcarvalho9747 Regarding your question, I think putting that logic on the gateway side would separate it from the services' side, which developers use. That would prevent using the services' context when deciding whether a request is eligible for throttling. Not all requests are eligible for throttling, so the gateway side would be premature for deciding whether to call allowRequest or not. That is my understanding.

  • @howellPan
    @howellPan 5 years ago +5

    I had to watch it a second time to appreciate the detail and thoroughness. Great work, thanks.

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Thank you Howell for the feedback and all your other comments! Working on answers to those.

  • @choucolin6742
    @choucolin6742 1 year ago +1

    Thank you very much. This is so helpful.

  • @timothy6966
    @timothy6966 1 year ago +2

    Why did you stop making videos?

    • @tmosest
      @tmosest 5 months ago

      Sells the class on LeetCode now.

  • @im5DII
    @im5DII 4 years ago +2

    I had been feeling tired of and bored by learning this CS knowledge lately. But after watching the diagrams with the clear explanations here, my curiosity came back, and now I feel "wow, easy and interesting".

  • @ch33zer
    @ch33zer 6 months ago +1

    Around 13:15 there's a bug in the code: if tokensToAdd is 0, we still update the timestamp. This means that if we have lots of requests close to each other, with tokensToAdd being 0 every time, we may never refill the bucket. The solution is to only update the timestamp when tokensToAdd is nonzero.
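
A sketch of the fix this comment proposes, assuming refill logic along the lines shown in the video (field names are illustrative, matching the bucket sketch near the top of this thread): advance the timestamp only when whole tokens were actually credited, so sub-token intervals keep accumulating.

```java
private void refill() {
    long now = System.nanoTime();
    // Integer division truncates to whole tokens.
    long tokensToAdd = (now - lastRefillTimestamp) * refillRate / 1_000_000_000L;
    if (tokensToAdd > 0) {
        currentBucketSize = Math.min(currentBucketSize + tokensToAdd, maxBucketSize);
        // Only move the watermark forward when tokens were credited; otherwise
        // frequent calls keep resetting it and the bucket starves.
        lastRefillTimestamp = now;
    }
}
```

A fully drift-free variant would instead advance lastRefillTimestamp by exactly the time "paid for", i.e. tokensToAdd * 1_000_000_000L / refillRate, so the fractional remainder is never discarded.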

  • @yuegu2526
    @yuegu2526 5 years ago +4

    I have to say, this is a really awesome system design video: good content, good pictures. I like it so much; please upload more!

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago

      Thank you, Yue, for the feedback. Much appreciated! More videos to come.

  • @kapilgupta8543
    @kapilgupta8543 3 years ago +2

    The most organised, structured, thorough video on system design I have ever seen. At every point where I thought it would be good if the next section explained a particular thing, I saw that it was there. Felt like it reads minds. :)

  • @loojnow
    @loojnow 3 years ago +2

    Amazing content! Really appreciate the English subtitles. Keep up the good work!

  • @feirao9027
    @feirao9027 4 years ago +3

    As a new architect, I think this channel is a VERY HELPFUL starting point, with all the key points. It is useful not only for interviews. I'm looking forward to more patterns explained. Thanks a lot, man!

  • @coneymoney5190
    @coneymoney5190 5 years ago +2

    In the example, why can't we have the tokens for each client configured on a reverse proxy sitting in front of the service hosts instead of on each service host? Wouldn't this avoid the overhead of servers having to reconcile the tokens with each other?

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Hi Coney Money,
      Good point. It is totally possible. API Gateways in public clouds follow this pattern. API Gateways nowadays act as reverse proxies, and most of them support rate limiting (throttling). E.g. docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html

  • @ShaunYCheng
    @ShaunYCheng 2 years ago +2

    I wish you could upload more videos!

  • @royh7596
    @royh7596 4 years ago +3

    Thanks for the great video. One question: why do we need explicit "tokens"? Why not just synchronize "counts" on each instance: decrement the count, and provide a callback to increment it when a request is done? Thanks!

    • @bradw2k
      @bradw2k 2 years ago +1

      I think the idea of tokens is just the visualization, and you are right that really it's just counts (integers).

  • @himani1349
    @himani1349 4 years ago +2

    Your videos are really helpful; probably some of the best resources on the internet. Please make more system design videos around Instagram, news feed, WhatsApp, Netflix, Google Docs, etc.

  • @MrMukulj
    @MrMukulj 3 years ago +1

    @Mikhail, pretty awesome work you have done. Thank you! I wanted to propose a slight change to the refill method; see if you accept it: (1) change the signature to private void refill(int tokens); (2) replace line #3 in the refill method with currentBucketSize = Math.min(currentBucketSize + tokensToAdd, maxBucketSize + tokens); also, the first line in the allowRequest method changes from refill() to refill(tokens). See the sketch below.
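
As I read it, the proposal amounts to the following (a method-level sketch assuming the fields of the video-style TokenBucket from the sketch near the top of this thread; whether letting the cap stretch to maxBucketSize + tokens is desirable is left to the discussion):

```java
public boolean allowRequest(int tokens) {
    refill(tokens);  // proposed: was refill()
    if (currentBucketSize >= tokens) {
        currentBucketSize -= tokens;
        return true;
    }
    return false;
}

private void refill(int tokens) {  // proposed: was refill()
    long now = System.nanoTime();
    long tokensToAdd = (long) ((now - lastRefillTimestamp) * refillRate / 1e9);
    // Proposed change: the cap is stretched by the requested token count.
    currentBucketSize = Math.min(currentBucketSize + tokensToAdd, maxBucketSize + tokens);
    lastRefillTimestamp = now;
}
```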

  • @singhalvikash
    @singhalvikash 3 years ago +2

    The best system design explanation of a rate limiter I have found so far. Kudos to you.

  • @gurdeepratti2189
    @gurdeepratti2189 2 years ago +1

    Could there be a third approach, where the rate limiter is a separate service on its own (or maybe part of the gateway)? So it throttles requests right at the beginning?

  • @sureshbabu8794
    @sureshbabu8794 1 year ago +1

    Waiting for more such videos. Very good explanation. Thanks!

  • @dbenbasa
    @dbenbasa 4 years ago +1

    In the case of relying on message broadcasting between the services (instead of having a separate distributed cache service), how would we synchronize the requests?

  • @gilcd85
    @gilcd85 3 years ago +2

    Thank you for the awesome video!
    One thing I don't understand: why must the servers communicate with each other? Can't the load balancer call a RateLimiter microservice with the API being called and the designated execution server? Wouldn't that make the service much easier to implement and scale? I'd probably have it log rates, servers, and usage in an in-memory Redis cache.

    • @ryeguy01
      @ryeguy01 2 years ago +1

      You're describing the "distributed cache" approach that is touched on in the video. The downside is that you are adding latency and will have reduced availability, since your code depends on this external service. To be honest, this is the approach I would use though, because it's way simpler than coming up with some custom node-to-node system.

  • @nitin_puranik
    @nitin_puranik 2 years ago +1

    "What are your mom's favorite flowers?" Didn't see that coming 🤣🤣

  • @andreyvalverde4780
    @andreyvalverde4780 2 years ago +2

    Please make more videos like this! This is great content!

  • @yueliang171
    @yueliang171 4 years ago +3

    This is awesome! The best and most comprehensive one I could find on YouTube. Keep up the great work! Looking forward to more videos!

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +1

      Glad you liked! And thank you for all your comments, Yue Liang. Appreciate your feedback!

  • @sangeetdahal7988
    @sangeetdahal7988 5 years ago +5

    That was an amazing lecture.

  • @Whaleshamu
    @Whaleshamu 1 year ago +1

    Awesome content. Your way of explanation is really amazing. Please make more videos on system design.

  • @saranyavivekanandan9044
    @saranyavivekanandan9044 1 year ago +1

    I'm unable to get how the refill method works. At 14:26, how has allowRequest(5) left the bucket with 2 tokens?

    • @bora8254
      @bora8254 1 year ago

      In this example, the refill method adds at most 10 tokens per second (meaning one token is refilled every 100 milliseconds). As you can see from the method implementation, the "tokensToAdd" variable is calculated from the difference between now and lastRefillTimestamp. Each allowRequest() call first calls the refill() method, so the lastRefillTimestamp property gets updated with the current time. The first allowRequest(6) call actually didn't add any tokens, because the bucket was already full (10 tokens) at that time, but lastRefillTimestamp was still set to the current timestamp. After allowing 6 requests, we are left with 4 tokens. After 200 milliseconds, allowRequest(5) is called, and when refill() runs again as the first operation inside allowRequest(), it now sees lastRefillTimestamp as 200 milliseconds before the current time. So when calculating "tokensToAdd", it gets those 200 milliseconds; since 1 token needs 100 milliseconds to be refilled, 200 milliseconds is enough for 2 tokens. This is where the 2 tokens come from.

    • @scottwang7571
      @scottwang7571 1 year ago

      @@bora8254 What if we send 9 requests at 300 ms and 7 requests at 900 ms? We should have 6 more tokens at 900 ms, right? Plus the 1 remaining token, we should be able to handle 7 requests at the 900 ms point. But in total that would be 16 requests in less than 1 s. Is that okay?

  • @nameunknown007
    @nameunknown007 2 years ago +1

    Haha, I liked your content and you are funny too. Subscribed to your channel.

  • @santoshbhatnagar2155
    @santoshbhatnagar2155 1 year ago +1

    Hey, I really love the depth you go into in these videos. Please make some new videos as well. Looking forward to them!

  • @djBC10000
    @djBC10000 4 years ago +2

    I automatically like the video before watching it because I know it will be of high quality.
    Thanks again for doing this.

  • @kiransriramula
    @kiransriramula 2 years ago +1

    You are really amazing. Thanks for making such a high-quality video. Can't believe you explained all of this in 30 minutes :)

  • @jasonwu3490
    @jasonwu3490 5 years ago +2

    At 14:32, shouldn't allowRequest(5) return false instead of true? There were 11 requests in the first 0.5 s, and the limit should be at most 10 requests per second.

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Hi Jason.
      When the bucket is not empty, we can use all the existing tokens in the bucket + whatever has accumulated since the previous call to the bucket (but not more than the maximum allowed). And this may lead to the situation you observed.
      Let's take a look at two scenarios: the bucket is empty at the beginning of a second, or the bucket contains some tokens (not empty) at the beginning of a second.
      If the bucket was initially empty, by the end of that second we can only use the tokens accumulated during that one-second interval, which is 10 tokens. All looks "fair", right? But if there were some tokens in the bucket, we can use all of them at the beginning of that second and use more at the end of that second, when more tokens have accumulated. So we have something like a credit here.
      To return to the example in the video: we could use 10 tokens right at the beginning of the second and then 10 (actually 9) more at the end of the second, resulting in 20 total. This empties out the bucket, and during the next second we will have only 10 tokens to spare.
      So there is some "unfairness" here. But we should not be discouraged by this. There will be scenarios (network/hardware failures) when token calculations are not precise. And this is ok.

    • @Rahul-pr1zr
      @Rahul-pr1zr 5 years ago

      @@SystemDesignInterview Hi, is there a way to overcome this situation and ensure that the limits are strictly enforced?

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +2

      Theoretically - yes. But this is not something you want to do in practice.
      If we want to calculate tokens precisely, we should be talking about a strongly consistent rate limiting system. When some host has processed a request and used a token, this information is recorded in the rate limiting system, and strong consistency ensures that all other hosts get the latest token count value.
      But strong consistency comes at a price. Remember the CAP theorem: in case of a network partition we can choose either consistency or availability. If we want to calculate everything precisely, we need to choose consistency and, thus, we lose availability.
      A rate limiting system is one of those examples where you want it to be available (to protect your service). And accuracy should come after availability in the priority list. Everything is a trade-off.

    • @zoozz117
      @zoozz117 5 years ago

      @@SystemDesignInterview Even on a single host, the token bucket could still process 20 requests in 1 s, based on the example you mentioned: we could use 10 tokens at the beginning of the second and 10 at the end. How do we avoid this situation when there is only a single host?

    • @SystemDesignInterview
      @SystemDesignInterview  5 years ago +1

      Hi S Michael,
      Many options can be applied. Let me know which one is your favorite.
      - We can modify the refill logic of the token bucket algorithm, so that no more than 10 tokens are refilled per second.
      - We can use a different algorithm, e.g. straightforward counting. Every time a request arrives, we increase the count. When the counter reaches a particular limit (e.g. 10), we refuse to process requests. Importantly, we need to reset the counter every second.
      - Another possible algorithm is to keep request timestamps in a queue (see the sketch below). The oldest and the newest timestamps are at the ends of the queue. To make a decision, we check the size of the queue. And we should remember to delete "too old" timestamps from the queue.
      I am pretty sure you can come up with more options. But remember the classic time vs. space tradeoff for the algorithm you choose, as this algorithm will be called on each request and may become a bottleneck at large scale.
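
A minimal sketch of the third option above (a sliding window log of request timestamps); the class name and parameters are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding window log: keep request timestamps in a queue and allow a request
// only if fewer than maxRequests fall inside the last windowMillis.
public class SlidingWindowLog {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLog(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Drop timestamps that fell out of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

Compared with the counter options, the log gives exact sliding-window behavior at the cost of O(maxRequests) memory per client, which illustrates the time vs. space tradeoff mentioned above.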

  • @shr_q
    @shr_q 2 years ago +1

    Excellent explanation! Thank you so much for doing this. Wish I had discovered this sooner.

  • @aditi9059
    @aditi9059 5 years ago +2

    Awesome explanation!!! Please keep up the excellent work and upload more such videos!!!

  • @nishantagrawal6244
    @nishantagrawal6244 5 months ago

    I don't understand, at 14:22, how only 2 more tokens were added in the refill() call at t2. As per the algorithm, 5 more tokens should be added.
    Refilling needs to be done at t2, since there are not enough tokens left. So:
    tokens to add = (t2 - t0) * refillRate
                  = (t0 + 300 + 200 - t0) ms * (1/100 tokens/ms)
                  = 500 ms * (1/100 tokens/ms)
                  = 5 tokens

  • @kommisettyveerendranath53
    @kommisettyveerendranath53 1 year ago +1

    Some of the best and finest content, with simple examples. Your way of explaining and the coverage of the content are really amazing. Thank you very much for explaining such valuable concepts in this video.

  • @perpetualcurious4425
    @perpetualcurious4425 3 years ago +1

    This is the first video I watched on your channel, and you earned a subscriber. Thanks for making it so simple.

  • @muthukumarankothandaraman2371
    @muthukumarankothandaraman2371 3 years ago +1

    I recollect an instance (not in an interview) when I was asked whether a CircuitBreaker can be used as a RateLimiter 😊

  • @zhongbot
    @zhongbot 5 years ago +3

    Took 4 pages of notes from this; very good indeed!

  • @renon3359
    @renon3359 3 years ago +1

    Great video, but just wondering: if the system has millions of users, how are we going to cache the throttling rules on the nodes?

  • @mohitsamarth
    @mohitsamarth 2 years ago +1

    Wish we had more of your videos on these topics! I don't mind paying for such quality content :)

  • @gxbambu
    @gxbambu 3 years ago +2

    The best system design channel I've ever found on the internet! Bravo!

  • @muthukumarankothandaraman2371
    @muthukumarankothandaraman2371 3 years ago +1

    23:04 "And there's one more elephant in the room!!" This man's presentation is highly enjoyable - whatever one runs into in practical system design problems, this man speaks of it 👌👌👌👌

  • @TheRohkan
    @TheRohkan 3 years ago +1

    We need more videos :) This is the best channel for in-depth system design interview prep after scouring the tube.

  • @jaewonlee76
    @jaewonlee76 3 years ago +1

    This is the best channel for system design interviews. I watched all the videos a week before my onsite interviews and got 4 offers from 4 interviews.

  • @bibop224
    @bibop224 4 years ago +1

    22:45 Could you clarify what is stored in Redis and how the hosts interact with Redis? I understood that, in the gossip solution, each node implements a token bucket with the "global" refill rate, and nodes periodically tell each other how many tokens they used. So in the gossip solution, each node has its own token counter in memory, which it decrements every time it receives a gossip message. But how can it work with the Redis approach? If Redis stores the token counter, we need a host to "refill" it, which is similar to what you describe in the next approach, "Coordination service". Hosts could periodically decrement a counter in Redis by how many tokens they used, and also compare with the previous value they read to figure out how many tokens were used by the rest of the fleet, but then we have to deal with integer overflow and it looks weird, so I doubt that is what you suggested? Thanks!

    • @SystemDesignInterview
      @SystemDesignInterview  4 years ago +2

      Hi bibop224. These are all good questions/considerations. Thanks.
      The token bucket algorithm is indeed not the best option if we want to use Redis to store counts, mainly because of the refill logic (as you pointed out). Refill requires a write operation, and it should be atomic (addition and subtraction of tokens). That is why other algorithms may be a better choice, and usually these are different modifications of a sliding window algorithm.
      One modification is to use Redis's sorted set data type and store request timestamps. At each point in time we can check how many requests have been processed within the last minute (engineering.classdojo.com/blog/2015/02/06/rolling-rate-limiter/).
      Another modification is to use Redis's counters, where we count the number of requests per time interval (e.g. 1 minute) and use simple calculations based on ratios (blog.cloudflare.com/counting-things-a-lot-of-different-things/).
      Both of these algorithms have pros and cons.
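
A hedged sketch of the sorted-set modification mentioned above, written against the Jedis client (the key prefix, member format, and client-side check are assumptions, not the video's design):

```java
import redis.clients.jedis.Jedis;

// Sketch of the sorted-set approach: one sorted set per client, scored by
// request timestamp in milliseconds.
public class RedisSlidingWindow {
    private final Jedis jedis;
    private final int maxRequests;
    private final long windowMillis;

    public RedisSlidingWindow(Jedis jedis, int maxRequests, long windowMillis) {
        this.jedis = jedis;
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public boolean allowRequest(String clientId) {
        String key = "rl:" + clientId;
        long now = System.currentTimeMillis();
        // Evict timestamps that fell outside the window.
        jedis.zremrangeByScore(key, 0, now - windowMillis);
        // Reject if the window is already full.
        if (jedis.zcard(key) >= maxRequests) {
            return false;
        }
        // Record this request; the member string must be unique per request.
        jedis.zadd(key, now, now + "-" + System.nanoTime());
        jedis.pexpire(key, windowMillis); // let idle clients' keys expire
        return true;
    }
}
```

In production, the three Redis calls would typically be wrapped in a MULTI/EXEC transaction or a Lua script, so concurrent callers cannot race between the count and the insert.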

  • @jisesi
    @jisesi 3 years ago +1

    I miss your new videos, bro. This is the best system design channel I have seen anywhere.

  • @TheProximator
    @TheProximator 3 years ago +1

    Great video, thank you very much; please add more :)

  • @yushutong722
    @yushutong722 2 years ago +1

    I didn't understand this part: why is the bucket for the same user/key placed on 3 different servers? Why not shard based on the key, so there is always a single server handling a specific user/key? I think the complexity of managing that sharding mechanism is lower than having multiple servers sync with each other. Especially in this case, when the rate limit interval is X/second, the latency of communication seems too significant compared to the bucket refill frequency.

    • @AniketSomwanshi-ll7mz
      @AniketSomwanshi-ll7mz 9 months ago

      I agree with this. A distributed cache like Redis is what should store the map; every host should talk to it, rather than every host storing it.

  • @graghukalyan
    @graghukalyan 3 years ago +1

    Some of the best SD content I've come across. We are waiting for more content. When is it coming? :)

  • @ShubhamSingh-ku2ow
    @ShubhamSingh-ku2ow 3 years ago

    I did not get how the RateLimiter uses the RulesCache. Are they part of one process? I don't think so, since the RulesCache is populated via RetrievesRuleTask, which is a cron job (a different process) running in parallel with the application. Then how do two different processes (here RetrievesRuleTask and RateLimiter) use a common cache (which I am guessing is basically a key-value hashmap local to a particular process)?

  • @egor.okhterov
    @egor.okhterov 2 years ago

    I don't like the idea of token synchronisation. That becomes a problem in itself. The network load from this synchronisation mechanism is a function load(users, rps) = users * rps.
    The more users with more rps we have, the larger the network load, memory load, and CPU load.

  • @cantwaittowatch
    @cantwaittowatch 2 years ago +1

    Great video. Question about the token bucket algo example: I didn't quite understand the refill where 2 more tokens were added, that is, 4 + 2 - 5 = 1. If allowRequest(5) is called, then 4 tokens would be consumed, leaving no more tokens, and only after the time hits 1000 ms would the refill kick in. A bit confused on the refill. Any thoughts here? Thanks

  • @mickys4999
    @mickys4999 1 month ago

    I need help understanding the part at 17:05 about three hosts and how four tokens are put in each bucket. What is he trying to convey in that part by means of three hosts? Are those three hosts three servers? When the load balancer evenly distributes the requests, why won't the requests for the key be evenly distributed?

  • @victoriapuscas7028
    @victoriapuscas7028 3 years ago +2

    More content, please! This is awesome!

  • @maniraj1231
    @maniraj1231 3 years ago +2

    Very detailed explanation.

  • @zustaz
    @zustaz 2 years ago +1

    Thanks for covering the most interesting and difficult topics on your channel!

  • @bullyellis1
    @bullyellis1 2 years ago +1

    Hands down the best system design interview study material on YouTube.

  • @anu1097
    @anu1097 2 years ago

    I have a query: why can't we apply the token bucket algorithm on the load balancer itself? Or add another rate limiter service before the load balancer?

  • @siweizhu7559
    @siweizhu7559 4 years ago +3

    The best system design lessons I have seen so far!

  • @boqunzhang8045
    @boqunzhang8045 2 years ago +1

    Best system design interview I have ever watched. Short, clean, and thorough. Subscribed!

  • @ravindrabhatt
    @ravindrabhatt 2 years ago

    It should be highly available; otherwise, for the duration when the service is down, the application is vulnerable to DDoS.

  • @avijain6277
    @avijain6277 3 years ago +1

    If this video was an NFT, I would buy it

  • @chessmaster856
    @chessmaster856 9 months ago

    This should be similar to DB connection pooling: the connections are limited, so the same strategy can be used. But different people have invented different algorithms.

  • @youngbay6315
    @youngbay6315 4 years ago +2

    Best system design channel. Will recommend to all my friends

  • @sucmanh
    @sucmanh 1 year ago

    When would we want to call allowRequest(6)? Shouldn't the parameter in allowRequest() always be one?

  • @raghuav4079
    @raghuav4079 2 years ago

    If each node's token bucket in the cluster gets the full rate limit (4 per second in this case) and there are 1000 nodes, it could potentially serve 4000 requests in a second? Doesn't that defeat the whole purpose of rate limiting?

  • @neilteng1735
    @neilteng1735 3 years ago +1

    The most resourceful and deep system design video on the internet.

  • @mimislifeinmotion
    @mimislifeinmotion 1 year ago

    Rules to follow for System Design Interview Questions
    ua-cam.com/video/uhy9tngnHxc/v-deo.html