CDH 4.1.3 has been released

CDH 4.1.3 is now available. CDH (Cloudera’s Distribution, including Apache Hadoop) is Cloudera’s 100% open-source Hadoop distribution. This version is a maintenance release that fixes some key issues, including the following:

  • HBASE-7498 – Make REST server thread pool size configurable
  • HADOOP-6762 – Exception while doing RPC I/O closes channel
  • OOZIE-1130 – Upgrade from 3.2 to 3.3 failing due to change in WorkflowInstance structure
  • MAPREDUCE-2217 – The expire launching task should cover the UNASSIGNED task
  • MAPREDUCE-4907 – TrackerDistributedCacheManager issues too many getFileStatus calls
  • OOZIE-994 – ActionCheckXCommand does not handle failures properly


Release note

CDH 4.1 has been released

Cloudera’s Distribution for Hadoop (CDH) cloud scripts enable you to run Hadoop on cloud providers’ clusters. CDH consists of 100% open source Apache Hadoop plus nine other open source projects from the Hadoop ecosystem. CDH is thoroughly tested and certified to integrate with the widest range of operating systems and hardware, databases and data warehouses, and business intelligence and ETL systems.


As a reminder, Cloudera releases major versions of CDH, our 100% open source distribution of Hadoop and related projects, annually and then releases updates to CDH every three months. Updates primarily comprise bug fixes, but we also add enhancements. We only include fixes or enhancements in updates if they maintain compatibility, improve system stability and still allow customers and users to skip updates as they see fit.

We’re pleased to announce the availability of CDH4.1.  We’ve seen excellent adoption of CDH4.0 since it went GA at the end of June and a number of exciting use cases have moved to production.  CDH4.1 is an update that has a number of fixes but also a number of useful enhancements.  Among them:

  • Quorum-based storage – Quorum-based Storage for HDFS lets HDFS store its own NameNode edit logs, allowing you to run a highly available NameNode without external storage or custom fencing.
  • Hive security and concurrency – we’ve fixed some long-standing issues with running Hive. With CDH4.1, it is now possible to run a shared Hive instance where users submit queries using Kerberos authentication. In addition, this new Hive server supports multiple users submitting queries at the same time.
  • Support for DataFu – the LinkedIn data science team was kind enough to open source their library of Pig UDFs that make it easier to perform common jobs like sessionization or set operations.  Big thanks to the LinkedIn team!!!
  • Oozie workflow builder – since we added Oozie to CDH more than two years ago, we have often had requests to make it easier to develop Oozie workflows.  The newly enhanced job designer in Hue enables users to use a visual tool to build and run Oozie workflows.
  • FlumeNG improvements – since its release, FlumeNG has become the backbone for some exciting data collection projects, in some cases collecting as much as 20TB of new event data per day. In CDH4.1 we added an HBase sink, metrics for monitoring, and a number of performance improvements.
  • Various performance improvements – CDH4.1 users should experience a boost in their MapReduce performance from CDH4.0.
  • Various security improvements – CDH4.1 enables users to configure the system to encrypt data in flight during the shuffle phase.  CDH now also applies Hadoop security to users who access the filesystem via a FUSE mount.
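The quorum-based storage item above can be pictured with a toy model: an edit counts as committed once a strict majority of journal nodes acknowledge it, so a minority of nodes can be down without blocking the NameNode. This is only a rough sketch of the majority rule, not HDFS code; `JournalNode` and `quorum_write` are hypothetical names, and recovery of nodes that missed edits is left out.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one daemon holding a copy of the edit log."""
    name: str
    available: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid: int, op: str) -> bool:
        # An unavailable node simply fails to acknowledge the write.
        if not self.available:
            return False
        self.edits.append((txid, op))
        return True


def quorum_write(nodes, txid, op):
    """Return True if a strict majority of nodes acknowledged the edit."""
    acks = sum(1 for node in nodes if node.append(txid, op))
    return acks > len(nodes) // 2


nodes = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]

nodes[2].available = False  # one of three nodes is down
one_down = quorum_write(nodes, 1, "mkdir /data")

nodes[1].available = False  # now two of three nodes are down
two_down = quorum_write(nodes, 2, "mkdir /logs")

print(one_down, two_down)
```

With three nodes the write survives one failure but not two, which is why such setups typically use an odd number of journal nodes.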

CDH4.1 is available on all of the usual platforms and form factors.  You can install it via Cloudera Manager or learn how to install the packages manually here.


Understanding Cloudera by using its VirtualBox Demo

Cloudera packages Apache projects into an integrated distribution that includes Hadoop, Sqoop, Pig, Hive, HBase, ZooKeeper, Oozie, Hue, Flume, and Whirr. The Cloudera VirtualBox Demo gives you this whole platform configured and ready to experiment with in less than 5 minutes.

Making it easy for users to experiment with these tools increases the chances for adoption.

CDH Mac OS X VirtualBox VM



Video: What is Hadoop? Other big data terms like MapReduce?

What is Hadoop, and what do other big data terms like MapReduce mean? Cloudera’s CEO talks us through big data: