The Cloudera’s Distribution for Hadoop (CDH) cloud scripts enable you to run Hadoop on cloud providers’ clusters. CDH consists of 100% open source Apache Hadoop plus nine other open source projects from the Hadoop ecosystem. CDH is tested and certified to integrate with a wide range of operating systems and hardware, databases and data warehouses, and business intelligence and ETL systems.
As a reminder, Cloudera releases major versions of CDH, our 100% open source distribution of Hadoop and related projects, annually and then ships updates to CDH every three months. Updates consist primarily of bug fixes, but we also add enhancements. We only include fixes or enhancements that maintain compatibility, improve system stability, and still allow customers and users to skip updates as they see fit.
We’re pleased to announce the availability of CDH4.1. We’ve seen excellent adoption of CDH4.0 since it went GA at the end of June, and a number of exciting use cases have moved to production. CDH4.1 is an update that includes a number of fixes as well as several useful enhancements. Among them:
- Quorum-based storage – Quorum-based Storage for HDFS lets HDFS store its own NameNode edit logs, so you can run a highly available NameNode without external storage or custom fencing.
- Hive security and concurrency – we’ve fixed some long-standing issues with running Hive. With CDH4.1, it is now possible to run a shared Hive instance where users submit queries using Kerberos authentication. In addition, this new Hive server supports multiple users submitting queries at the same time.
- Support for DataFu – the LinkedIn data science team was kind enough to open source their library of Pig UDFs that make it easier to perform common jobs like sessionization or set operations. Big thanks to the LinkedIn team!!!
- Oozie workflow builder – since we added Oozie to CDH more than two years ago, we have often had requests to make it easier to develop Oozie workflows. The newly enhanced job designer in Hue gives users a visual tool for building and running Oozie workflows.
- FlumeNG improvements – since its release, FlumeNG has become the backbone for some exciting data collection projects, in some cases collecting as much as 20TB of new event data per day. In CDH4.1 we added an HBase sink, metrics for monitoring, and a number of performance improvements.
- Various performance improvements – CDH4.1 users should see better MapReduce performance than they did with CDH4.0.
- Various security improvements – CDH4.1 enables users to configure the system to encrypt data in flight during the shuffle phase. CDH now also applies Hadoop security to users who access the filesystem via a FUSE mount.
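For readers new to the idea behind Quorum-based Storage, here is a minimal sketch of the majority arithmetic it relies on: an edit counts as durable once a majority of the JournalNodes have acknowledged it. The function names below are hypothetical and purely illustrative; this is not CDH or HDFS code.

```python
# Illustrative only: hypothetical helpers, not part of HDFS or CDH.

def majority(journal_nodes: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many JournalNodes can be down while a majority is still reachable."""
    return journal_nodes - majority(journal_nodes)


def edit_is_durable(acks: int, journal_nodes: int) -> bool:
    """True once a majority of JournalNodes have acknowledged the edit."""
    return acks >= majority(journal_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} JournalNodes: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this arithmetic, four nodes tolerate no more failures than three, which is why odd JournalNode counts such as three or five are the usual choice.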
CDH4.1 is available on all of the usual platforms and form factors. You can install it via Cloudera Manager or learn how to install the packages manually here.