Thursday, September 29, 2011

Review: GFS

The Google File System (GFS) is a scalable distributed file system for large, data-intensive applications, implemented and deployed at Google. Its goals are the same as those of previous distributed file systems: performance, scalability, reliability, and availability. To achieve them, however, the designers made different fundamental assumptions. First, component failures are the norm rather than the exception. Second, files are huge, often multiple gigabytes in size. Finally, random writes are rare; most writes append new data to existing files.

The architecture is interesting as well. A single master coordinates a number of chunkservers. The master is the applications' "contact point" for metadata operations, such as looking up the locations of the chunks they want to read. The paper then goes on to describe how snapshots and other operations are implemented.

One thing I wonder is how the assumptions would differ if Google were building the file system today (e.g., would it still be a mostly append-only system? Would it still have a single master?). Clearly, though, GFS has been very successful: it has served as the storage layer within Google, and its design has been very influential in the distributed systems literature.
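To make the master/chunkserver split concrete, here is a minimal sketch (all names hypothetical, not the real GFS API) of the read path: the client asks the single master only for metadata, then fetches the actual bytes directly from a chunkserver replica.

```python
# Toy sketch of the GFS read path. Class and method names are
# illustrative; only the division of labor reflects the paper:
# the master holds metadata, chunkservers hold data.

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size 64 MB chunks

class Master:
    """Metadata only: file name -> chunk handles, handle -> replica locations."""
    def __init__(self):
        self.file_chunks = {}      # filename -> [chunk_handle, ...]
        self.chunk_locations = {}  # chunk_handle -> [chunkserver, ...]

    def lookup(self, filename, offset):
        index = offset // CHUNK_SIZE
        handle = self.file_chunks[filename][index]
        return handle, self.chunk_locations[handle]

class Chunkserver:
    """Stores the actual chunk data."""
    def __init__(self):
        self.chunks = {}  # chunk_handle -> bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]

def client_read(master, filename, offset, length):
    # 1. Ask the master where the chunk lives (metadata only).
    handle, replicas = master.lookup(filename, offset)
    # 2. Read from a replica directly; the master is not on the data path.
    return replicas[0].read(handle, offset % CHUNK_SIZE, length)

# Wiring up a toy one-chunk cluster:
master = Master()
cs = Chunkserver()
cs.chunks["h1"] = b"hello, gfs"
master.file_chunks["/logs/day1"] = ["h1"]
master.chunk_locations["h1"] = [cs]
print(client_read(master, "/logs/day1", 0, 5))  # b'hello'
```

The key design point this illustrates is why a single master can scale: it answers only small metadata lookups, while the bulk data traffic flows between clients and chunkservers.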
