How does a distributed file system like Google GFS work?

  • Published 3 Oct 2024
  • I've been reading up on distributed file systems, specifically "The Google File System", the seminal paper written by Google. It went on to inspire other distributed file systems like the "Hadoop Distributed File System" (HDFS), as well as object stores like Amazon S3. While the S3 API is not an open standard, it's supported by practically all the cloud vendors.
    Here is the link to the original Google paper:
    pdos.csail.mit...
    In this session we take a broad, high-level overview of how distributed file systems work in general.
    I learn by building, so my hope is that we can eventually explore building a toy distributed file system!

COMMENTS • 5

  • @TavishMcEwen 2 days ago +1

    This is right up my alley

  • @TavishMcEwen 2 days ago

    Great video dude
    Have you thought about whether this could work without a master server?

    • @watthedoodle 1 day ago

      Thanks buddy! That's an interesting question. I haven't thought about a masterless distributed system! I guess each worker node could be made capable of acting as the master, with a consensus algorithm like Raft or Paxos used to elect a new leader, so that in essence any worker node can take on the master role. I think that would make the design a lot more complex, but it's still an interesting idea!
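
To make the election idea a bit more concrete, here is a minimal sketch of just the voting rule that Raft-style elections rely on: each node grants at most one vote per term, and a candidate wins with a strict majority. It is not a full Raft implementation (no logs, timeouts, heartbeats, or networking), and the names `Node`, `request_vote`, and `run_election` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical worker node that can also stand for election as master."""
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False  # candidate is behind; reject
        if term > self.current_term:
            # A newer term starts a fresh round of voting.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False  # already voted for someone else in this term


def run_election(candidate: Node, cluster: list[Node]) -> bool:
    """The candidate starts a new term and asks every node (itself included)."""
    term = candidate.current_term + 1
    votes = sum(node.request_vote(candidate.name, term) for node in cluster)
    return votes > len(cluster) // 2  # strict majority wins


cluster = [Node("a"), Node("b"), Node("c")]
became_leader = run_election(cluster[0], cluster)
```

Even this stripped-down version hints at the extra complexity: every node now has to track terms and votes on top of its storage duties.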

    • @TavishMcEwen 1 day ago

      @watthedoodle I'm wondering whether a system similar to Distributed Hash Tables (without the hashing) could be implemented, where there is no master whatsoever.

    • @watthedoodle 1 day ago +1

      @TavishMcEwen Maybe, but the metastore is highly dynamic because nodes get rebalanced, so a DHT would not be appropriate. The metastore itself, however, could be made highly available by running it as a cluster. A distributed object file system is dynamic in two ways: worker nodes join and die, and the files themselves mutate. To me, all of this indicates that a centralised metastore is vastly simpler and more efficient.
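
As a rough illustration of why a central metastore keeps this simple, here is a toy sketch of one that tracks which worker nodes hold each chunk and re-replicates when a node dies. Everything here is an assumption made for the example (the `Metastore` class, its method names, random placement, and a default replica count of 3); it is not the design from the paper or the video, and it ignores persistence, networking, and file mutation.

```python
import random
from collections import defaultdict


class Metastore:
    """Hypothetical central metadata store: maps chunk ids to worker nodes."""

    def __init__(self, replicas: int = 3):
        self.replicas = replicas
        self.nodes: set[str] = set()
        self.chunk_locations: dict[str, set[str]] = defaultdict(set)

    def node_joined(self, node: str) -> None:
        """Register a live worker node."""
        self.nodes.add(node)

    def place_chunk(self, chunk_id: str) -> set[str]:
        """Top the chunk up to `replicas` copies using nodes that lack it."""
        holders = self.chunk_locations[chunk_id]
        candidates = sorted(self.nodes - holders)
        missing = max(0, min(self.replicas - len(holders), len(candidates)))
        holders.update(random.sample(candidates, missing))
        return set(holders)

    def node_died(self, node: str) -> None:
        """Forget the dead node and re-replicate every chunk it held."""
        self.nodes.discard(node)
        for chunk_id, holders in self.chunk_locations.items():
            if node in holders:
                holders.discard(node)
                self.place_chunk(chunk_id)


store = Metastore(replicas=2)
for worker in ("w1", "w2", "w3"):
    store.node_joined(worker)
store.place_chunk("file.txt#0")
```

Because a single component sees every join, death, and placement, rebalancing is a local dictionary update rather than a coordination problem between peers.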