System Design of Doordash: Geo-Hashing and WebSockets for Location Based Services

Поділитися
Вставка
  • Опубліковано 8 лип 2024
  • We go through a popular interview question: Design Doordash.
    The system design of Doordash (similar to Swiggy and Zomato in India) involves matching food orders to riders. A rider has to be selected based on their location proximity to the restaurant (since the time taken to deliver a order from restaurant to customer wont change on rider).
    For this, we shard geographical locations using hashes, known as Geo-Hashing. An area can be recursively broken down using a geo hash. Riders within a goehash region can be asked to pick an order on demand.
    Tracking a delivery is done using server-side events or WebSockets. I suggested the idea of WebRTC for this, but it seems like overkill. Let me know your thoughts :D
    Jordan's UA-cam Channel: ‪@jordanhasnolife5163‬
    Location-based databases: • Designing a location d...
    System Design Website: interviewready.io
    00:00 Intro
    01:35 Functional Requirements
    02:50 Capacity Estimations
    06:45 API Endpoints
    08:10 Data Sources
    11:30 Onboarding a restaurant
    12:20 GeoHashes
    23:20 Driver Updates
    27:00 Data Consistency
    28:30 Consistent Hashing
    36:40 Optimizing Deliveries
    40:20 Delivery Tracking
    44:44 WebRTC
    46:18 Concluding thoughts
    You can follow me at:
    Github: github.com/coding-parrot/
    Instagram: / applepie404
    LinkedIn: / gaurav-sen-56b6a941
    Quora: www.quora.com/profile/Gaurav-...
    Twitter: / gkcs_
    #SystemDesign #InterviewReady #Coding

КОМЕНТАРІ • 67

  • @binwelbeck1482
    @binwelbeck1482 Рік тому +22

    Jordan Knows what he is taking about ; I like his confidence and the way he explain the scenario. thanks gaurav for inviting Jordan

  • @payalgupta3487
    @payalgupta3487 Рік тому +4

    Digging deeper into single component is such a good idea. Rightly said that while discussing complex systems on the whole we tend to oversimplify. Enjoyed the video thoroughly. Thanks.

  • @hardikmenger7360
    @hardikmenger7360 Рік тому +5

    This guy was randomly seen on my feed and since then i am following him. His channel is the most no bs system design walkthrough channel.

  • @jackkohnjj
    @jackkohnjj Рік тому +4

    This was a realistic interview in the sense that the interviewer realizes that the scale they are asking to design for is completely insane (500K drivers, 10M orders a day) but still proceeds as a thought experiment. I had an interview recently for a game platform and the interviewer said I should design for 50M concurrent users. LMAO.

  • @modernsimplelife8706
    @modernsimplelife8706 Рік тому +6

    Great video! It's very helpful to see a video where both of the interviewer and the interviewee know what they're talking about in a mock.
    These are some thoughts about WebRTC from my experience. I've been using and implemented WebRTC for 3 years. WebRTC uses UDP under the hood, however it uses SCTP protocol on top of UDP (probably will be replaced by QUIC eventually) to create a TCP-like experience for data channels. The developer is allowed to configure either guaranteed delivery (TCP) or lossy (UDP). WebRTC requires a signaling server to build a root of trust, and connection information for both the user and the driver. WebRTC and Websocket are stateful, meaning the server needs to keep the socket connection open for the entire session, otherwise the clients must reconnect. The stateful nature makes the solution harder to scale, and more complicated.
    As you guys mentioned, most users likely don't track on their food delivery as often as they would with Uber. I think WebRTC is overkill in this use case. The WebRTC connection likely gets broken (from closing the app) many times in the food delivery tracking session, and cause load churns to the server. I personally prefer doing HTTP polling with HTTP/3 transport (which is also based on UDP and support 0-RTT connection establishment). I think it is the simplest solution with reasonable low load to the server and easier to load balance horizontally.
    But, at the end of the day, we should look at metrics to decide the best solution for the problem. Strategically, starting with the simplest solution to quickly gather important metrics would be a good idea. So, I would start with HTTP polling, collect the data, then re-evaluate if WebRTC is needed. Finally, collect the data between HTTP polling and WebRTC. HTTP polling might be the most optimal solution already, simpler solutions can do wonder sometimes 😀. There are also ways to reduce states in WebRTC.

  • @SoumikPradhan
    @SoumikPradhan Рік тому +3

    This collab was worth the wait. Nice discussion.

  • @geekydanish5990
    @geekydanish5990 Рік тому +22

    I have been watching Jordan’s content for quite a while now and dude you are just killing it thanks Gaurav for inviting him

  • @sachin_yt
    @sachin_yt Рік тому +11

    It's amazing watching both of you! Learned a lot! Thanks Gaurav! Def need more of these :)

  • @givemeabreakbro
    @givemeabreakbro Рік тому +8

    Jordan knows what he's talking about. Every answer sounds calculated and well reasoned. Gaurav + Jordan is a killer combo. Looking forward to more such collabs

  • @ayonkar1534
    @ayonkar1534 Рік тому +2

    This is one of the best videos I have come across for SD Excellent interview candidate.

  • @hitmusicworldwide
    @hitmusicworldwide Рік тому +7

    In the USA geohash fixed addresses by zip code+4. The post office has done all the work. A more sophisticated analysis of verified address locations means that you aren't sharding for a lake, or other uninhabited area, saving tables, and flagging possible false orders and incorrect data.

  • @TheHp6515b
    @TheHp6515b Рік тому +12

    Great content. Jordan is very articulate and able to seamlessly translate theoretical knowledge to concrete solution.
    Quick note, consistently hashed cluster is great when keys evenly distribute across the hash space. In this case if we use `geo-hash` directly as the key, we get uneven distribution since small number of prefixes will have most of the data (e.g. geo-hashes representing Manhattan, Seattle etc) i.e. small number of keys having larger data.
    We could use a more dynamic range based sharding (involves a seperate manager service etc) to split key ranges based on load. Alternatively, we can hash the geo-hash itself to get even spread but loose colocation of relatively closer grids which could be important as we zoom-out for finding drivers.
    Either works but worth discussing the tradeoff especially considering latency is critical. (fan-out vs fan-in). Also wasn't clear what approach was proposed.
    Also, Jordan touched about point related to use of pure peer to peer communication for tracking (user and driver directly exchange communicate).
    Few reasons why this isn't practical generally,
    1> Security & Firewalls. Exposed open ports in user's phones are prime target for DOS attacks. Most routers or n/w infra will actually block any new external incoming connection by default and will require a user override to whitelist/forward port. If it's a public router etc there won't be anyway at all to receive connection requests.
    2> Relaying through a central service gives more reliable (in terms of throughput) network path (aws/gcp have interconnects with major isps and dedicated n/w infra), whereas for direct connections require to go through congested public interconnects. This is less of a problem when people are in close geo proximity so insignificant in this case, however relevant for whatsapp etc.
    Understandably not every aspect can be discussed in detail and interviewee isn't even expected to be aware of every aspect.
    Hence FAANG(at least F,G,U) gives interviewee leeway in picking one aspect of their choice for deep dive besides getting overall high level design.
    Side note. It is ok to say i don't know internals of x or y (say websockets or loadbalancers or storage engine etc) and move the interview to what you know. i.e. play by your strengths, not knowing deeply about few things is expected and doesn't result in -ve points but vaguely/incorrectly answering something will lead to -ve point and more followups around the same thing eventually leading to botched interview.

  • @user-pp6fi2bt4w
    @user-pp6fi2bt4w 3 місяці тому

    excellent explanation about geoshashing, thanks guys!

  • @ayushraj6525
    @ayushraj6525 Рік тому

    Learned a lot from this collab.. Thanks a lot Gaurav! 🔥

  • @colton8971
    @colton8971 Рік тому +1

    Jordan is a funny dude and not ugly!

  • @okidokiyowyow356
    @okidokiyowyow356 Рік тому +1

    Hahaha. The introduction is really funny

  • @imranshaikh115
    @imranshaikh115 Рік тому

    Very nice 👌
    I appreciate 🙏 💛 your efforts

  • @RedBeardRetroTech
    @RedBeardRetroTech Рік тому

    Thanks for the vid

  • @nishantgarg2815
    @nishantgarg2815 Рік тому

    Nice one. Learnt a lot. Keep up the good work

  • @suhani7137
    @suhani7137 Рік тому +12

    hi gaurav! thank you for the amazingg content you are providing. i have binge watched your entire content in the past few months.on point explanation...your teaching skills are really goood. just a suggestion if you may...you should collaborate with people who amplify your reach ...like aman dhattarwal..or maybe launch some live courses on their platform...his reach is MASSIVE..it will also help you increase sales for your startup
    Idk if the suggestion i gave is even possible😅but just felt like this precious content and your courses should be reaching more and more people!

  • @kaustabpaul4514
    @kaustabpaul4514 3 місяці тому

    thanks for sharing it was awesome

  • @abhineetsingh6720
    @abhineetsingh6720 Рік тому

    great video

  • @ankitnmnaik229
    @ankitnmnaik229 Рік тому +1

    Bhaiya, i just passed clas 12 th..and i want to get into programming (ml/Ai) but i have never been good at problem solving..so how can I improve my problem solving from now on...can u suggest ways..pls..

  • @user-pj9ck8ef1j
    @user-pj9ck8ef1j Рік тому

    why this video can not be found on gaurav main page home->system design tag?

  • @secondIncomePartTime
    @secondIncomePartTime Рік тому +1

    jordan is genius

  • @apoorvchaturvedi2493
    @apoorvchaturvedi2493 Рік тому

    Thanks Gaurav for this great video. Just one question as redis is being used here as geo-sharded database how concurrency will be handled with redis.

  • @devam9999
    @devam9999 Рік тому +1

    Would have been great to see details about the DB/tables design, and then going for the data storage estimations based on that. Is that something generally expected, or is the high level data storage estimation without getting into details about schema design, as done by Jordan here, also acceptable in an interview?

  • @michahubert1179
    @michahubert1179 2 місяці тому

    Thats a great content ! One thing that happens notoriously without noticing are units. 5Mb is 5 mega bits "b" not bytes "B".

  • @nitishkumarsingh6020
    @nitishkumarsingh6020 Рік тому

    Which tool has been used for drawing?

  • @RahulDasAdhikary-usa
    @RahulDasAdhikary-usa Рік тому

    Nice work! First example is nice - Sri Lanka to India Food delivery 😀 ha ha.. Anyways, I like your channel Gaurav.😊

  • @constantinekhvalin6038
    @constantinekhvalin6038 Рік тому

    is it secure to have peer-to-peer connections between dashers and clients?

  • @tanveer.shaikh
    @tanveer.shaikh Рік тому +1

    hi Gaurav, for every video can you also please mention the pre-requisite concepts , i felt few of the things going over head

  • @deepak655655
    @deepak655655 Місяць тому

    Gaurav talking about UDP and his voice went berserk, talking about coincidence !!!

  • @MrMAhmed22
    @MrMAhmed22 10 місяців тому

    Very nice effort, felt like gaurav missing some stuff on geo sharing and asking off questions was a bit confusing for the interviewee

  • @kamalsmusic
    @kamalsmusic Рік тому

    Why do we need to do range queries within the redis instance in order to find dashers? If each shard of redis handles a range of geohashes, then can each redis shard just have a hashmap which maps a geohash -> [list of dasher ids]? You can use consistent hashing to locate the shard, then just do a hashmap lookup within that shard.
    As long as given a geohash, I can calculate the 8 surrounding box geohashes, this should be sufficient right? I think most libraries for geohash can calculate this quickly

    • @jordanhasnolife5163
      @jordanhasnolife5163 Рік тому +1

      I think it really just depends on the size of the geoshard that you use. If they are big, then you'll have to do a range query to quickly find a nearby driver to a given geohash. If they are the perfect size, then you could just select any of the dashers located in that node.

  • @mansivishwakarma361
    @mansivishwakarma361 Рік тому +1

    💕💕💕💕

  • @chinmaysharma1867
    @chinmaysharma1867 Рік тому

    Do I need to know Web development before study system design ???

  • @sanketsardesai1961
    @sanketsardesai1961 11 місяців тому

    I would have used PostGIS definitely, store the data in geometry and use all the spatial functions, indexing and what not

  • @joaoluismoraes7215
    @joaoluismoraes7215 9 місяців тому

    Hi Gaurav, this interview feature by Interview Ready looks nice. How did you guys build it? WebRTC? Have you written any tech deep dive on implementing it?

    • @gkcs
      @gkcs  9 місяців тому +2

      We wrote the code in Golang, and accept http requests only.
      My next video is a deep dive on the architecture, stay tuned!

  • @hazemabdelalim5432
    @hazemabdelalim5432 Рік тому

    Isn't MYSQL harder to shard ? i believe no-sql is easier to shard in that case so i would choose it .

    • @gkcs
      @gkcs  Рік тому +1

      It has to be sharded manually, yes.

  • @poshettiakhil2664
    @poshettiakhil2664 Рік тому

    Hi Gaurav,
    This question is not related to doordash system design,
    Please can you explain the difference between message bus ,message queue and message broker

    • @protyaybanerjee5051
      @protyaybanerjee5051 Рік тому +1

      Message broker can be a distributed system which provides a reliable and fault tolerant way of delivering messages b/w multiple producer and consumer.
      A message broker can handle many to many mapping between a producer and consumer

  • @joaoluismoraes7215
    @joaoluismoraes7215 9 місяців тому

    I've worked at a company that solved a similar problem but at a smaller scale (it had 200k users and more than half weren't active). Instead of Geohashing, they did it with MongoDB Geospatial queries. Why is Redis geohashing better in this case? When does geospatial queries start to become a problem?

    • @mhmdshaaban
      @mhmdshaaban 7 місяців тому

      MongoDb supports both 2d Index where it uses b-tree and 2dsphere Index where it uses similar approach "geo hasing" I'm curious why do I have to implement that from the get-go and It's already implemented

  • @tusharabbott9946
    @tusharabbott9946 Рік тому

    now i know why swiggy was not able to find a delivery partner for my order.

  • @fabioschubert
    @fabioschubert 11 місяців тому

    Lots of good stuff here but should've at the very least *mentioned* storing the Restaurant's data, menus, building the order, etc.

  • @jackkohnjj
    @jackkohnjj Рік тому

    FYI, lat/lng to geohash is simple arithmetic. It would be ridiculous to call a service to do that.

  • @protyaybanerjee5051
    @protyaybanerjee5051 Рік тому +1

    Interview Ready UX could do with some improvement

    • @gkcs
      @gkcs  Рік тому

      Yes. Thanks for the feedback.
      Any suggestions for improving the UX?

  • @VireshGehlawat
    @VireshGehlawat Рік тому

    The content was great, the tool needs to be improved.
    I can see both of them struggling with the interface.
    @Gaurav - As the interviewer in a real world scenario, the interviewer won't really be drawing as much as you were in this case.

  • @rkara2
    @rkara2 Рік тому

    Why can't you just have all the actors (drivers, customers, businesses) subscribe (websockets) to the system and ping their location (lat/long) in real-time?
    Then when a customer / business needs a driver, they are getting the real-time picture of whos active in the system and where they are.
    So at the end of the day you don't care where drivers / customers are in the world, you just care about their availability.
    So you can have one table called Availability that you match everybody through.

  • @sipwhitemocha
    @sipwhitemocha Рік тому

    Aye shoutout to J. Epstein

  • @KiritiSai93
    @KiritiSai93 2 місяці тому

    The comment about geohash being ordered and doing binary search is a bit flawed. Remember that geohash has this property:
    'abc' is ALWAYS closer to 'abd', but 'def' for example might actually be also closer to 'abc' as well. So closer in edit distance is a sufficient condition but not a necessary one.
    Read System Design book by Alex Xu for more info on how this could happen.

    • @gkcs
      @gkcs  2 місяці тому +1

      ua-cam.com/video/OcUKFIjhKu0/v-deo.html

    • @KiritiSai93
      @KiritiSai93 2 місяці тому

      Thank you! Looks like you've already covered in the topic :)

  • @anirudhsilverking5761
    @anirudhsilverking5761 Рік тому

    I think I gained 30 IQ points just listening to this

  • @timurakhalaya6289
    @timurakhalaya6289 3 місяці тому

    poor performance i would say, in term of technology picked and in terms of planning as well ...

  • @aparnaprabhu
    @aparnaprabhu Рік тому

    I read this as system design of Doordarshan 🤦‍♀️🤦‍♀️