Friday, October 7, 2011

Review: Pig Latin & Hive

Pig Latin & Hive were designed with similar goals in mind, and thus they share common functionality. Both of them are written to execute queries/plans on Hadoop, an open-source MapReduce implementation, over data stored in HDFS. Both of them also have schemas for metadata. Both of them provide relatively simple query optimizations when compared to a standard RDBMS.

Hive is a data warehouse infrastructure built on top of Hadoop that facilitates querying and managing large datasets residing in distributed storage. Hive also defines a simple SQL-like query language, called QL (HiveQL). These queries are compiled into MapReduce jobs to be executed as efficiently as possible. Providing this SQL-like interface is better for system administrators in my opinion, since it means that sysadmins will already be familiar with the commands.

Pig Latin, which was created at Yahoo!, states that it is designed to be the sweet spot between SQL & MapReduce. The nice thing is that Pig has a debugger for its language. On the other hand, Hive has a web interface to visualize the various schemas and issue queries, which would be a great help to developers.
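To make the "queries are compiled into MapReduce jobs" idea a bit more concrete, here is a rough sketch in plain Python of how a grouped count might break down into map, shuffle, and reduce steps. This is not Hive's or Pig's actual compiler output. The `visits` table, the `url` column, the sample rows, and all function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical query being illustrated (table and column names are assumptions):
#   SELECT url, COUNT(*) FROM visits GROUP BY url;

# Hypothetical sample rows; in a real deployment these would be files on HDFS.
ROWS = [
    {"url": "/home", "user": "a"},
    {"url": "/about", "user": "b"},
    {"url": "/home", "user": "c"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["url"], 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_url(rows):
    """Run the three steps in order on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_url(ROWS))  # {'/home': 2, '/about': 1}
```

The point is only that a declarative GROUP BY maps naturally onto a key-emitting map step and an aggregating reduce step, so the user writes one line of SQL-like text instead of the three functions above.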
