How does a distributed file system like Google GFS work?
- Published 3 Oct 2024
- I've been reading up on distributed file systems, specifically "The Google File System", the seminal paper written by Google. It inspired other distributed file systems such as the "Hadoop Distributed File System" (HDFS), as well as Amazon S3. While S3 is not an open standard, its API is supported by practically all the cloud vendors.
Here is the link to the original Google paper:
pdos.csail.mit...
In this session we take a broad, high-level overview of how distributed file systems work in general.
I learn by building, so my hope is that we can eventually explore building a toy distributed file system!
This is right up my alley
Great video dude
Have you thought about whether this could work without a master server?
Thanks buddy! That's an interesting question, I haven't thought about a masterless distributed system! I guess each worker node could be made capable of acting as the master, with a consensus algorithm like Raft or Paxos used to elect a new leader, so in essence any worker node could take on the master role. I think that would make the design a lot more complex, but it's still an interesting idea!
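To make the "elect a different leader" idea a bit more concrete, here is a rough sketch of a single voting round, loosely modelled on Raft's rule that a node grants at most one vote per term. It is only an illustration: the `Node` class and `run_election` function are made-up names, and it leaves out everything that makes real Raft or Paxos safe in practice (log comparison, randomised timeouts, networking, retries).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical worker node that can also vote for a new master."""
    node_id: str
    alive: bool = True
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate_id: str) -> bool:
        """Grant at most one vote per term (a simplified Raft-style rule)."""
        if not self.alive or term < self.current_term:
            return False
        if term > self.current_term:
            # A newer term resets any vote cast in an older term.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(nodes: List[Node], candidate: Node) -> bool:
    """The candidate starts a new term and wins with a strict majority."""
    if not candidate.alive:
        return False
    term = candidate.current_term + 1
    # The candidate is part of `nodes`, so it votes for itself here too.
    votes = sum(1 for n in nodes if n.request_vote(term, candidate.node_id))
    return votes > len(nodes) // 2


# Usage: the old master (worker-0) dies and worker-1 stands for election.
cluster = [Node(f"worker-{i}") for i in range(5)]
cluster[0].alive = False
worker1_elected = run_election(cluster, cluster[1])
```

The majority requirement is what stops two nodes from both believing they are the master in the same term, and that is a big part of the extra complexity mentioned above.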
@watthedoodle I'm wondering whether a system similar to Distributed Hash Tables (without the hashing) could be implemented, where there is no master whatsoever
@TavishMcEwen Maybe, but this metastore is highly dynamic, since nodes get re-balanced, so a DHT would not be appropriate. The metastore itself could still be made highly available by running it as a cluster. A distributed object file system is dynamic on top of the worker nodes being dynamic: we have to contend not only with nodes joining and dying, but also with files mutating. All of this suggests to me that a centralised metastore is vastly simpler and more efficient.
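As a rough illustration of why a central metastore keeps this simple, here is a toy sketch that tracks which nodes hold each chunk and re-replicates when a node dies. The `Metastore` class, its method names, and the least-loaded placement rule are all assumptions made up for this example, not how GFS or any real system implements it. The replication factor of 3 matches the default described in the GFS paper.

```python
from collections import defaultdict

REPLICAS = 3  # GFS keeps three replicas of each chunk by default


class Metastore:
    """Hypothetical in-memory map of chunk id -> nodes holding a replica."""

    def __init__(self):
        self.nodes = set()
        self.chunk_locations = defaultdict(set)

    def node_joined(self, node_id):
        self.nodes.add(node_id)

    def _load(self, node_id):
        # Number of chunk replicas currently assigned to this node.
        return sum(1 for locs in self.chunk_locations.values() if node_id in locs)

    def place_chunk(self, chunk_id):
        """Top up a chunk to REPLICAS copies on the least-loaded live nodes."""
        holders = self.chunk_locations[chunk_id]
        candidates = sorted(self.nodes - holders, key=lambda n: (self._load(n), n))
        needed = max(0, REPLICAS - len(holders))
        holders.update(candidates[:needed])
        return set(holders)

    def node_died(self, node_id):
        """Forget the node and re-replicate every chunk it was holding."""
        self.nodes.discard(node_id)
        affected = [c for c, locs in self.chunk_locations.items() if node_id in locs]
        for chunk_id in affected:
            self.chunk_locations[chunk_id].discard(node_id)
            self.place_chunk(chunk_id)
        return affected


# Usage: four workers join, two chunks are placed, then one worker dies.
ms = Metastore()
for name in ["a", "b", "c", "d"]:
    ms.node_joined(name)
ms.place_chunk("chunk-1")
ms.place_chunk("chunk-2")
lost = ms.node_died("a")
```

Because one place sees every join, death, and file change, re-balancing is just a dictionary update. With a DHT, each of those events would have to be coordinated across peers.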