Design Google Drive or Dropbox (Cloud File Sharing Service) | System Design Interview Prep

  • Published Dec 22, 2024

COMMENTS • 89

  • @interviewpen
    @interviewpen  1 year ago +5

    Thanks for watching! Visit interviewpen.com/? for more great Data Structures & Algorithms + System Design content 🧎

  • @shemleong7571
    @shemleong7571 1 year ago +28

    Great overview. There are a few tweaks I would make: 1) Have the ingest service return a presigned URL so the client can upload directly to S3. This offloads the bandwidth problem to the client. 2) Tap into event triggers to handle the post-upload activities. 3) Instead of that first queue, rate limiting or API throttling might be a more appropriate way to manage the load.

    • @interviewpen
      @interviewpen  1 year ago +4

      Agreed, using presigned S3 URLs is a great solution to manage load on the ingest API. That would also potentially eliminate the need for the queue in front of that API. Thanks for watching, stay tuned for more!

    • @meprateek24
      @meprateek24 1 year ago +2

      If the file gets uploaded directly to S3, then we might face the same issue that was explained at the beginning of the video: if the connection breaks, the whole upload has to start again. Will the S3 upload also happen in chunks?

    • @drhdev
      @drhdev 11 months ago +3

      @@meprateek24 Not our problem

    • @justinchan4810
      @justinchan4810 6 months ago

      @@meprateek24 It's safe to assume the file upload can also be done in chunks, especially since the design stores the file in chunks in S3 (shown at 7:58)

  • @yxawp
    @yxawp 1 year ago +13

    IOPS: (1M)(100) / 86,400 is ~ 1150/sec. Not clear why it was calculated to 115,000/sec in Handling Subscriptions section.

    • @interviewpen
      @interviewpen  1 year ago +1

      Good catch, thanks!

    • @mecury007
      @mecury007 8 months ago +1

      Yeah I lost 20 minutes trying to understand why 1M * 100 = 10B and not 100M
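
The arithmetic in this thread can be checked directly. The following is a quick sketch, assuming the figures quoted in the comments (1M daily active users, 100 edits per user per day) and load spread evenly across the day; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumed inputs, taken from the figures quoted in the comments.
daily_active_users = 1_000_000
edits_per_user_per_day = 100

edits_per_day = daily_active_users * edits_per_user_per_day
# Average rate only; real traffic peaks above this.
edits_per_second = edits_per_day / SECONDS_PER_DAY

print(f"{edits_per_day:,} edits/day")
print(f"{edits_per_second:,.0f} edits/sec on average")
```

With these inputs the daily total is 100 million and the average rate is a little over 1,150 per second, which matches the correction in the thread.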

  • @teetanrobotics5363
    @teetanrobotics5363 1 year ago +5

    I hope this message finds you well. I wanted to take a moment to express my sincere gratitude for the exceptional content you've been sharing on your YouTube channel. Your recent series of five top-notch and in-depth system design videos have been an absolute treasure trove of knowledge.
    The clarity and depth with which you explain complex concepts are truly commendable. Your videos have been instrumental in helping me grasp the intricacies of system design and architecture. The practical examples you provide, along with your lucid explanations, have made learning a pleasure.
    I want to encourage you to continue creating such invaluable content. Your unique ability to break down complex topics into understandable components is a true gift. If possible, I would love to see more of these insightful system design videos from you in the future.
    Additionally, it would be fantastic if you could consider curating these videos into a playlist. Having them organized in one place would be tremendously helpful for both newcomers and those looking to revisit certain concepts.
    Once again, thank you for your dedication and hard work in sharing your expertise. Your contribution to the learning community is truly appreciated. I eagerly await more of your enlightening videos.

    • @interviewpen
      @interviewpen  1 year ago

      Yes, we do have a “System Design” playlist on this YouTube channel, as well as more videos on interviewpen.com
      Thanks for the kind words & thanks for watching 👍

    • @dd-qz2rh
      @dd-qz2rh 11 months ago +1

      bro went straight ahead and utilized that sweet chatgpt power

  • @gordonli4946
    @gordonli4946 1 year ago +1

    18:55, the client can write directly into a queue? Shouldn't it upload chunks to a service first, and then the service processes them with S3/DB? Wondering what queue that is in front of the backend service. And at 23:00, why won't userid + fileid have the same scatter-gather issue as fileid alone? Each user has lots of fileid/chunkid values, and we need at least a table for user/fid/cid anyway.

    • @interviewpen
      @interviewpen  1 year ago +1

      For the first point, you're absolutely right. We'd need some sort of interface between the client and our queue to enable this. For the second, we need to shard on both user ID and file ID separately (user ID could be a global index), enabling us to query on the field we're looking for. Thanks!

  • @firezdog
    @firezdog 1 year ago +2

    security was not mentioned at all, nor anything about concurrent writes -- but much better than anything i could have done, that being said.

    • @interviewpen
      @interviewpen  1 year ago +1

      There’s a limited amount of information we can convey in one video, but yes, security and concurrency are both super important things to consider here! Thanks for watching.

  • @sagarmantri4743
    @sagarmantri4743 11 months ago +2

    At 28:39, the calculation of IOPS seems wrong. (1M)(100)/86,400 => 115,000/sec? It should be roughly 1e6 * 100 / 1e5 = 1000/sec. Am I missing something?

    • @interviewpen
      @interviewpen  11 months ago

      Yes, you're right. Should've been 1150, not 115000. Good catch :)

  • @buntysingh7315
    @buntysingh7315 1 year ago +4

    thanks for taking the effort!

  • @Oz1111
    @Oz1111 1 year ago

    These system design vids are great. Given your expertise and how well you cover these topics, can you do a basics of system design explaining different services and common parts of system design? I know there are other channels that do this but I'd love to have you do one as your content is super clear and easy to follow. Thanks.

    • @interviewpen
      @interviewpen  1 year ago

      Thanks! If you're looking for a full course, check out interviewpen.com!

  • @hackaholic01
    @hackaholic01 1 year ago

    For the storage usage validation, you can remove all the overhead as follows:
    the client has the file stats, so it can request the user metadata and check whether any storage is available before uploading the file.

    • @interviewpen
      @interviewpen  1 year ago +1

      Thanks for watching. I might be misunderstanding, but it sounds like you're suggesting having the client itself validate whether it has bought enough storage. You're right that doing this could reduce some overhead, but it would defeat the entire purpose of that step since the client could simply lie to the service about how much storage it has when uploading a file. It's important to make sure logic like this happens on the server side since clients are inherently untrusted.
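
The point about untrusted clients can be sketched in a few lines. This is a minimal illustration, not the design from the video: `UserQuota`, `QuotaExceeded`, and `reserve_upload` are hypothetical names, and the in-memory dictionary stands in for whatever metadata store the server would really use. The key detail is that the check uses the server's own records and the number of bytes the server actually received, never a size or quota reported by the client.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UserQuota:
    """Server-side quota record (hypothetical shape, for illustration)."""
    limit_bytes: int
    used_bytes: int


class QuotaExceeded(Exception):
    """Raised when an upload would push a user past the purchased limit."""


def reserve_upload(quotas: Dict[str, UserQuota], user_id: str, received_bytes: int) -> None:
    """Account for an upload using only data the server controls.

    `received_bytes` is measured by the server from the request body,
    so a client cannot understate it.
    """
    quota = quotas[user_id]
    if quota.used_bytes + received_bytes > quota.limit_bytes:
        raise QuotaExceeded(f"{user_id} has no room for {received_bytes} bytes")
    quota.used_bytes += received_bytes


quotas = {"user-1": UserQuota(limit_bytes=100, used_bytes=90)}
reserve_upload(quotas, "user-1", 10)       # fits exactly
try:
    reserve_upload(quotas, "user-1", 1)    # one byte too many
except QuotaExceeded as err:
    print("rejected:", err)
```

A client-side check can still be useful as a courtesy, to fail fast before sending data, as long as the server repeats the check.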

  • @Pebblejo
    @Pebblejo 1 year ago +2

    If you use "user+fileID" as the shard key, doesn't that mean you still need to query multiple nodes to retrieve the info for all the files belonging to the same user? How's that better than using only the fileID?

    • @interviewpen
      @interviewpen  1 year ago +3

      Yep, since file IDs are already unique, adding the user ID to the shard key has very little effect. Thanks for watching!

  • @hfspace
    @hfspace 1 year ago +2

    One thing that has not been touched on, and that comes to mind immediately, is that the way chunking is handled here has room for improvement. What if someone changes a file in the middle and adds loads of data to it (which would result in multiple new chunks in multiple different locations in the file)? Then you could either reload the complete file or implement some more complex indexing for the chunks, I guess, and do a reindex operation.

    • @interviewpen
      @interviewpen  1 year ago +3

      Yes this is correct - we just skimmed over it and said "do chunking", but the chunking is a mini-research paper in itself. We find this is the case with a lot of concepts we cover! So we just try our best to hit the major details!
      Thanks for watching - more coming!!!
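
One common answer to the mid-file insertion problem raised here is content-defined chunking, where chunk boundaries depend on the bytes themselves rather than on fixed offsets. The sketch below is an assumption-laden illustration and not the scheme from the video: it recomputes `zlib.crc32` over a small trailing window at every position (a production chunker would use a true rolling hash and enforce minimum and maximum chunk sizes), and `WINDOW`, `DIVISOR`, and the function names are made up for the example.

```python
import hashlib
import random
import zlib

WINDOW = 16    # trailing bytes that decide whether a boundary falls here
DIVISOR = 64   # about one boundary per 64 bytes on random data


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split at fixed offsets: an insertion shifts every later chunk."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes) -> list:
    """Cut wherever the hash of the last WINDOW bytes hits a chosen value."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def reused(old: list, new: list) -> int:
    """Count chunks in `new` whose content hash is already stored in `old`."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in known for c in new)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:10_000] + b"inserted in the middle" + original[10_000:]

results = {}
for name, chunker in [("fixed", fixed_chunks), ("content-defined", content_defined_chunks)]:
    old, new = chunker(original), chunker(edited)
    results[name] = (reused(old, new), len(new))
    print(f"{name}: {results[name][0]} of {results[name][1]} chunks already stored")
```

With fixed offsets, every chunk after the insertion point changes and has to be uploaded again. With content-defined boundaries, the chunks re-align shortly after the edit, so only the few chunks around it are new.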

  • @jack.klimov
    @jack.klimov 2 months ago +1

    I thought the whole idea of Google Drive and Dropbox was to focus on distributed storage rather than just using ready-made cloud solutions like S3. In my opinion, that's the most interesting aspect of such a task.

    • @interviewpen
      @interviewpen  2 months ago +1

      For sure :) We have other videos about how BLOB storage systems are designed on interviewpen.com

    • @jack.klimov
      @jack.klimov 2 months ago

      @@interviewpen oh great! Thank you, i will have a look

  • @Wei-up2jn
    @Wei-up2jn 8 months ago

    Great Video! Thanks for sharing! One question about the sharding: if we are sharding by UserId + FileId, doesn't it mean we still have to do scatter-gather if we want to get the full file list of a user?

    • @interviewpen
      @interviewpen  8 months ago

      Yes, that is correct. There's never a perfect way to shard a DB, so that's the tradeoff with this approach--we'd have to fetch files from every node to get all of a user's files.
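
The scatter-gather tradeoff described here is easy to see with a toy router. This is a sketch under stated assumptions: hash-based sharding with a fixed `NUM_SHARDS`, a made-up `shard_for` helper, and string IDs invented for the example. The video does not specify a routing function.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


user_id = "user-42"
file_ids = [f"file-{n}" for n in range(100)]

# Shard key = user ID + file ID: one user's files are spread over many shards,
# so listing them means querying every shard (scatter-gather).
shards_composite = {shard_for(f"{user_id}:{fid}") for fid in file_ids}

# Shard key = user ID only: all of the user's files live on a single shard.
shards_user_only = {shard_for(user_id) for _ in file_ids}

print("user+file key touches", len(shards_composite), "shards")
print("user-only key touches", len(shards_user_only), "shard")
```

Sharding on user ID alone makes the file listing a single-shard query, at the cost of possible hot shards when a few users own far more files than the rest.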

  • @kumar_gautam24
    @kumar_gautam24 1 year ago +2

    Thanks, great content

    • @interviewpen
      @interviewpen  1 year ago

      Glad you liked it, more content is on the way!

  • @sivam5204
    @sivam5204 7 months ago +1

    The chunk concept could be explained more. :)

  • @harshraj22_
    @harshraj22_ 1 year ago +4

    Assuming by Queue you meant the message queue, I would like to know your thoughts about using kafka instead of queue for notification service, with their pros and cons. Btw, great video :)

    • @interviewpen
      @interviewpen  1 year ago +2

      We never specified what platform we would use for queues, but Kafka is a great choice for a system like this. The distributed nature of Kafka queues means they can be horizontally scaled to handle an extremely high load, and that would enable the system to handle the high traffic requirements. Glad you enjoyed the video!

  • @vinayak6564
    @vinayak6564 5 months ago

    I feel it doesn't make sense to put chunks in a queue; giving the client direct access to a message queue system is not a good idea in practice from a security perspective. Also, it doesn't reduce load at all, since the message queues also need to be scaled, if not the ingestion servers, so it is just adding an extra layer for the sake of adding one. Correct me if I am wrong.

    • @vinayak6564
      @vinayak6564 5 months ago

      Only messaging queue for notification service makes sense.

    • @interviewpen
      @interviewpen  5 months ago

      The idea behind this was that if there are bursts of load, it wouldn't slow down users uploading their data. But I fully agree with you that it doesn't make sense for a client to have direct access, so it's not a very useful solution in this case. A better solution might be to use a tiered storage system behind our BLOB store which can provide very fast reads and writes for frequently accessed data while moving older data to cheaper storage mediums. Thanks for watching!

    • @vinayak6564
      @vinayak6564 5 months ago

      @@interviewpen Thanks for the prompt response and answer! Great content btw finished watching blob storage system design after this.

  • @PritamDas-g7d1y
    @PritamDas-g7d1y 1 year ago +3

    Thanks for the video love it

  • @amirafshari1613
    @amirafshari1613 1 year ago

    @interviewpen what do you think of mentioning managed solutions instead? For example, instead of a manually sharded DB, a Cosmos DB managed Postgres that auto-shards, or a Citus distributed SQL cluster that auto-shards?

    • @interviewpen
      @interviewpen  1 year ago +1

      Totally! There's usually managed solutions for most of the services that we discuss in these designs, but we try to keep the videos general so you can understand the concepts regardless of how they're deployed. Thanks for watching!

  • @ravikant-hi8mz
    @ravikant-hi8mz 1 year ago +2

    What software do you use? Including the grey board thing to draw. Please suggest what you are using🙂

    • @interviewpen
      @interviewpen  1 year ago +1

      GoodNotes. Thanks for watching - more coming

  • @khanhtoanle8396
    @khanhtoanle8396 1 year ago +1

    Nice video!

  • @AdarshMadrecha
    @AdarshMadrecha 1 year ago +1

    Good insights

  • @Tony-dp1rl
    @Tony-dp1rl 1 year ago

    With the latency and buffering inherent in the queue usage and file IO and user notifications, I doubt there is a need to shard the database at all, and if there was due to load, then SQL isn't a good choice, but storage-backed Redis would be much better. SQL is a terrible choice for generic metadata.

    • @interviewpen
      @interviewpen  1 year ago +1

      Well, we'd likely have pretty high error rates if we tried to send that many writes to a single shard. On the second point, you're right that SQL isn't ideal in many use cases; it's hard to shard due to its relational model. There's tons of options for NoSQL sharded databases that could be used in this system. Thanks!

  • @nvskiran
    @nvskiran 1 year ago

    S3 already provides option to upload in chunks. Why are you not using that?

    • @interviewpen
      @interviewpen  1 year ago

      Yes, manually chunking our files gives us some more control (especially around updating pieces of the file), but multipart uploads could certainly work in this same design. Thanks for watching!
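
Whether chunks are managed manually or through multipart uploads, the benefit for flaky connections is the same: after a failure, only the missing pieces need to be sent again. The sketch below simulates that with an in-memory stand-in for the blob store. `FlakyStore`, `upload`, and the tiny `CHUNK_SIZE` are hypothetical and exist only for the demonstration; no real storage API is called.

```python
CHUNK_SIZE = 4  # tiny for demonstration; real chunks would be far larger


class FlakyStore:
    """In-memory stand-in for a blob store that drops the connection once."""

    def __init__(self, fail_after: int):
        self.chunks = {}
        self.writes_until_failure = fail_after

    def put(self, index: int, chunk: bytes) -> None:
        if self.writes_until_failure == 0:
            self.writes_until_failure = -1  # fail only once
            raise ConnectionError("connection dropped")
        self.writes_until_failure -= 1
        self.chunks[index] = chunk


def upload(data: bytes, store: FlakyStore) -> int:
    """Send only the chunks the store lacks; return how many were sent."""
    sent = 0
    for index, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        if index in store.chunks:
            continue  # already uploaded before the interruption
        store.put(index, data[offset:offset + CHUNK_SIZE])
        sent += 1
    return sent


data = b"0123456789abcdefghij"  # 20 bytes, so 5 chunks
store = FlakyStore(fail_after=3)

try:
    upload(data, store)
except ConnectionError:
    pass  # the first three chunks were stored before the drop

resent = upload(data, store)  # resume: only the missing chunks go out
assembled = b"".join(store.chunks[i] for i in sorted(store.chunks))
print("chunks sent on resume:", resent)
print("file intact:", assembled == data)
```

S3 multipart uploads rest on the same idea: parts are uploaded independently, and a failed part can be retried without resending the others.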

  • @ebu7
    @ebu7 1 year ago +1

    Please make a video about NAS(Network Attached Storage) system design.

    • @interviewpen
      @interviewpen  1 year ago

      We'll add it to the list. Thanks for watching, more content is on the way!

    • @semenivanoff8615
      @semenivanoff8615 1 year ago

      NAS is storage accessible over IP (CIFS or NFS). What is so special about it?
      Or do you mean a specific model of storage system, like NetApp?

  • @fatcat22able
    @fatcat22able 1 year ago +1

    I feel kind of dumb - what is meant by "edit" in this context? Great video!

    • @interviewpen
      @interviewpen  1 year ago +1

      I'm not sure which part of the video you're referring to specifically, but an edit is just a single change to a file that triggers a chunk of data to be updated in the system. Thanks!

    • @fatcat22able
      @fatcat22able 1 year ago

      @@interviewpen Thank you for the response! I guess I'm having trouble understanding how a file would be changed in the context of this application?
      My immediate thought was that a change to a file would entail a full reupload. But I could understand it if the service were such that, if I've uploaded an image to the service, and then I make a change to that image locally, then those changes would be uploaded as chunks in order to update the image in the system as opposed to reuploading & replacing the full image, correct? And this change is what we call an edit? Please let me know if I'm understanding this correctly. Thank you!

  • @nealpan
    @nealpan 1 year ago

    Great

  • @zuowang5185
    @zuowang5185 7 months ago

    Is this prep for a new grad level?

    • @interviewpen
      @interviewpen  7 months ago

      System design questions are more likely to be asked for more senior level interviews, but companies are more and more starting to ask these types of questions in more junior roles as well! Either way, it's a good idea to have some understanding of these concepts for any role.

  • @ashiquehoque762
    @ashiquehoque762 1 year ago +1

    Could you please share "how QR CODE WORKS?"

    • @interviewpen
      @interviewpen  1 year ago +2

      Thanks for watching! We'll add that to the list of things to cover. But from a basic perspective, a QR code reader looks for predefined patterns in the image; then it reads the black/white squares in a specific order. Each square is read as a bit, 1 or 0, and all together they form a binary representation of a URL or other message.

  • @gxo-mt5vo
    @gxo-mt5vo 1 year ago +2

    Useful video but focused too much on back of envelope calculations, and we have 100 mil writes per day, not 10 bil

    • @interviewpen
      @interviewpen  1 year ago

      There's 100 million users, each performing 100 edits per day => 10B edits per day. The back of the envelope math might seem grueling, but it's really important to make sure we choose the right solutions to scale the system. Thanks for watching, and for the feedback!

    • @vinaychavadi7411
      @vinaychavadi7411 11 months ago

      @@interviewpen DAU is 1 million users, 100 edits per day per user => 100 million edits per day.

    • @natalitshuva9530
      @natalitshuva9530 1 month ago

      @@interviewpen there's 1M users a day (DAU), not 100M, so the calculation in the video is wrong

  • @pradgarimella
    @pradgarimella 6 months ago

    Too much emphasis on calculations. In a real system design interview, candidates will spend 2 mins max on calculations. Anything more and you are screwed

    • @interviewpen
      @interviewpen  6 months ago

      I don’t agree: one of the most important parts of the system design interview is showing that you can translate product requirements into a solution that fits the use case. This means understanding the load that will be placed on each part of the system. Thanks for watching!

  • @marcusaurelius6607
    @marcusaurelius6607 1 year ago +9

    good attempt. but you would not pass our interview with _that_ level of understanding systems. cheers from DB =)

    • @interviewpen
      @interviewpen  1 year ago +1

      Cool - any specific suggestions on where we could go deeper with our content? Let us know!

    • @avi7278
      @avi7278 1 year ago +17

      You just take this troll at his word that he is from Dropbox?

    • @biswajitsingh8790
      @biswajitsingh8790 1 year ago

      @@avi7278😂😂😂😂

    • @robl39
      @robl39 1 year ago +1

      Please explain what you’d expect

    • @entx8491
      @entx8491 1 year ago

      @@avi7278 nothing suggests he did. It's still a valid question, which would in turn make his statement valid.

  • @aadill77
    @aadill77 6 months ago

    very bad explanation. and the architecture is also not crisp. too naive

  • @islamicmedia.c-s3g
    @islamicmedia.c-s3g 11 months ago

    Hi