Thursday, September 29, 2011
Review: GFS
The Google File System (GFS) is a scalable distributed file system for large, data-intensive applications, implemented and deployed at Google. Its goals are the same as those of any previous distributed file system: a good balance of performance, scalability, reliability, and availability. To achieve that, however, its designers made different fundamental assumptions. First, component failures are the norm rather than the exception. Second, files are very big (on the order of multiple GB). Finally, there are barely any random writes in the system; most writes append new data to existing files.

Architecture-wise, the system is very interesting too. There is a single master and a number of chunkservers. The master node is the applications' "contact point" for metadata operations, such as looking up the locations of the data they would like to retrieve. The paper then goes on to describe how snapshots and other mechanisms are implemented.

One thing I'm wondering is how the assumptions would differ if Google were to build the file system now (e.g., would it still be a mostly append-only system? Would it still have a single master?). In any case, GFS has clearly been very successful: it has been used as the storage system within Google, and its model has been very influential in the distributed systems literature.
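The split between the master (metadata only) and the chunkservers (data only) can be sketched as follows. This is a toy in-memory model, not the paper's actual API: the class and method names (`Master.lookup`, `Chunkserver.read`, etc.) and the use of plain dicts are my own illustrative assumptions.

```python
# Toy sketch of GFS's read path: the client asks the single master
# for metadata (chunk handle + locations), then fetches the data
# directly from a chunkserver. Names and structures are hypothetical.

CHUNK_SIZE = 64 * 2**20  # GFS uses fixed-size 64 MB chunks


class Chunkserver:
    """Stores chunk data; the master never sees file contents."""

    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


class Master:
    """Single master: holds only metadata, never file data."""

    def __init__(self):
        self.namespace = {}  # filename -> list of chunk handles
        self.locations = {}  # chunk handle -> list of chunkservers

    def lookup(self, filename, chunk_index):
        handle = self.namespace[filename][chunk_index]
        return handle, self.locations[handle]


def client_read(master, filename, offset, length):
    """Metadata from the master, data straight from a chunkserver."""
    handle, servers = master.lookup(filename, offset // CHUNK_SIZE)
    return servers[0].read(handle, offset % CHUNK_SIZE, length)
```

The key design point this illustrates is that file data never flows through the master, which keeps the single master from becoming a bandwidth bottleneck.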