Thursday, October 6, 2011

Review: MapReduce

MapReduce is a programming model for distributed systems that is currently very popular. Users specify a map function that processes a key/value pair to generate intermediate key/value pairs, which are then passed to a reduce function that merges all intermediate values sharing the same key, in a way that is also specified by the user. This abstraction is very nice because users do not need to deal with the complexities of distributed systems (e.g. fault tolerance, consistency issues, etc.) and can instead focus on coding the main algorithm of the task (e.g. counting word occurrences across the web). The paper also goes on to describe how common tasks, such as Unix-style utilities (e.g. grep and sort), could be adapted to the MapReduce programming framework. However, simple as this abstraction may be, it restricts how creative programmers can be, since every computation has to be expressed as a map step followed by a reduce step. Also, I feel that some low-level details are still exposed to programmers, such as how users are required to specify the number of mappers that they will need. I believe that there are a lot of improvements that still need to be made to the programming framework.
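To make the map/reduce split concrete, here is a minimal single-process sketch of the word-count task in Python. It is not the paper's implementation or any real framework's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping step only stands in for the distributed shuffle that a real system performs across machines.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values for one key into a single total."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical local driver; a real system distributes each phase."""
    # "Shuffle": group intermediate values by their intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Apply the reduce function once per distinct intermediate key.
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


# Made-up input documents, for illustration only.
docs = [
    ("a.txt", "the quick brown fox"),
    ("b.txt", "the lazy dog and the fox"),
]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

Note that the user-written part is only `map_fn` and `reduce_fn`; everything in `run_mapreduce` is what the framework would hide, which is the appeal of the abstraction and also the source of the restriction mentioned above.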
