Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
The 0.9.0 release continues the trend of extending Hive’s SQL support. Hive now understands the BETWEEN operator and the NULL-safe equality operator, plus several new user defined functions (UDF) have now been added. New UDFs include printf(), sort_array(), and java_method(). Also, the concat_ws() function has been modified to support input parameters consisting of arrays of strings.
This Hive release also includes several significant improvements to the query compiler and execution engine. HIVE-2642 improved Hive’s ability to optimize UNION queries, HIVE-2881 made the the map-side JOIN algorithm more efficient, and Hive’s ability to generate optimized execution plans for queries that contain multiple GROUP BY clauses was significantly improved in HIVE-2621.
HBase users will also be interested in several improvements to Hive’s HBase StorageHandler, mainly:
- The ability to access primitive types stored in binary format within HBase (HIVE-1634),
- And support for filter-pushdown for keys (HIVE-2861, HIVE-2815, HIVE-2771).