Apache MRUnit 0.9.0-incubating has been released

MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify that the software you write performs as intended. This is considered a best practice in software development, since it helps identify defects early, before they’re deployed to a production system.

The MRUnit project is quite active: 0.9.0 is our fourth release since entering the Incubator, and we have added four new committers beyond the project’s initial charter! We are very interested in having new contributors and committers join the project, so please join our mailing list to find out how you can help!

The MRUnit build process has changed to produce mrunit-0.9.0-hadoop1.jar and mrunit-0.9.0-hadoop2.jar instead of mrunit-0.9.0-hadoop020.jar, mrunit-0.9.0-hadoop100.jar and mrunit-0.9.0-hadoop023.jar. The hadoop1 classifier is for all Apache Hadoop versions based off the 0.20.X line, including 1.0.X. The hadoop2 classifier is for all Apache Hadoop versions based off the 0.23.X line, including the unreleased 2.0.X.
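With the new classifiers, a Maven dependency declaration might look like the sketch below. (The coordinates are an assumption based on the names in this announcement; verify them against Maven Central before use.)

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <!-- use "hadoop2" instead if you build against the 0.23.X/2.0.X line -->
  <classifier>hadoop1</classifier>
  <scope>test</scope>
</dependency>
```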

This release contains 6 bug fixes, 15 improvements, and 2 new features. I will highlight a few below:

  • Support custom counter checking (MRUNIT-68)
  • runTest() can optionally ignore output order (MRUNIT-91)
  • Driver.runTest throws AssertionError instead of RuntimeException (MRUNIT-54)
  • o.a.h.mrunit.mapreduce.MapReduceDriver supports a combiner (MRUNIT-67)
  • Better support for serializations other than Writable (MRUNIT-70, MRUNIT-86, MRUNIT-99, MRUNIT-77)
  • Better error messages from validate, null checking, and forgetting to set mappers and reducers (MRUNIT-74, MRUNIT-66, MRUNIT-65)
  • Added static convenience methods to the PipelineMapReduceDriver class (MRUNIT-89)
  • Tested and deprecated the Driver.{*OutputFromString,*InputFromString} methods (MRUNIT-48)

An Apache 2.2 module serving files from MongoDB

“mod_gridfs” is an Apache 2.x module that serves files from MongoDB GridFS.

More details available here: https://bitbucket.org/onyxmaster/mod_gridfs/



CouchDB 1.2.0 has been released

It’s a big day for Apache CouchDB: version 1.2.0 has been released and is now available for download.

You can grab your copy here:


Windows packages are now available. Grab them at the same download link.

This release also coincides with a revamped project homepage!

This is a big release with lots of updates. Please also note that this release contains breaking changes.

These release notes are based on the NEWS file.


Performance Improvements

  • Added a native JSON parser

    Performance-critical portions of the JSON parser are now implemented in C. This improves latency and throughput for all database and view operations. We are using the fabulous yajl library.

  • Optional file compression (database and view index files)

    This feature is enabled by default.

    All storage operations for databases and views now pass through Google’s Snappy compressor. The result is simple: since less data has to be transferred to and from disk and through the CPU and RAM, all database and view accesses are faster and on-disk files are smaller. Compression can be switched to gzip (with a configurable compression level), or it can be disabled entirely.
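The setting lives in the server configuration. A sketch of what this looks like in the .ini files (check your installation’s default.ini for the exact option names and accepted values):

```ini
[couchdb]
; snappy (default), none, or a gzip/deflate level such as deflate_1 … deflate_9
file_compression = snappy
```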

  • Several performance improvements, especially regarding database writes and view indexing

    Combined with the two preceding improvements, we made some less obvious algorithmic improvements that take the Erlang runtime system into account when writing data to databases and view index files. The net result is much improved performance for most common operations including building views.

    The JIRA ticket (COUCHDB-976) has more information.

  • Performance improvements for the built-in changes feed filters _doc_ids and _design


Security

The security system got a major overhaul, making it much more secure to run CouchDB as a public database server for CouchApps. Unfortunately we had to break a bit of backwards compatibility with this, but we think it is well worth the trouble.

  • Documents in the _users database can no longer be read by everyone

    Documents in the _users database can now only be read by the respective authenticated user and by administrators. Before, all docs were world-readable, including their password hashes and salts.

  • Confidential information in the _replication database can no longer be read by everyone

    Similar to documents in the _users database, documents in the _replicator database now get passwords and OAuth tokens stripped when read by a user that is not the creator of the replication or an administrator.

  • Password hashes are now calculated by CouchDB instead of the client

    Previously, CouchDB relied on the client to hash and salt the user’s password. Now, it accepts plain text passwords and hashes them before they are committed to disk, following traditional best practices.
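To make the idea concrete, here is a minimal sketch of server-side salted hashing. It illustrates the general technique (hash on the server, store only the salt and digest, never the plain text); the field names and the exact scheme are illustrative and not guaranteed to match CouchDB’s on-disk format.

```python
import hashlib
import os

def hash_password(plaintext: str) -> dict:
    # Generate a random per-user salt, then store only the salted digest.
    # Illustrative only -- not CouchDB's exact on-disk representation.
    salt = os.urandom(16).hex()
    derived_key = hashlib.sha1((plaintext + salt).encode("utf-8")).hexdigest()
    return {"password_scheme": "simple", "salt": salt, "derived_key": derived_key}
```

Because the salt is random per user, two users with the same password get different stored digests, which defeats precomputed lookup tables.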

  • Allow persistent authentication cookies

    Cookie based authentication can now keep a user logged in over a browser restart.

  • OAuth secrets can now be stored in the users system database

    This is better for managing large numbers of users and tokens than the old, clumsy way of storing OAuth tokens in the configuration system.

  • Updated bundled erlang_oauth library to the latest version

    The Erlang library that handles OAuth authentication has been updated to the latest version.

Build System

  • cURL is no longer required to build CouchDB, as it is only needed by the command-line JavaScript test runner

    This makes building CouchDB on certain platforms easier.


HTTP Interface

  • Added a data_size property to database and view group information URIs

    With this you can now calculate how much actual data is stored in a database file or view index file and compare it with the file size that is already being reported. The difference is CouchDB-specific overhead most of which can be reclaimed during compaction. This is used to power the automatic compaction feature (see below).
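The arithmetic is simple enough to sketch. Given the reported data_size and the file size on disk (called disk_size here for illustration), the reclaimable overhead is:

```python
def reclaimable_fraction(data_size: int, disk_size: int) -> float:
    # Fraction of the file that is CouchDB-specific overhead,
    # most of which compaction can reclaim.
    if disk_size <= 0:
        return 0.0
    return (disk_size - data_size) / disk_size
```

For example, a file reporting a size on disk of 100 MB but a data_size of 30 MB is roughly 70% overhead and a strong candidate for compaction.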

  • Added optional field since_seq to replication objects/documents

    This allows you to start a replication from a specific database update sequence instead of from the beginning.
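A replication document using the new field might look like the sketch below (the database names are placeholders; source and target are the standard replication fields):

```json
{
  "source": "http://example.org:5984/albums",
  "target": "albums-replica",
  "since_seq": 12345
}
```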

  • The _active_tasks API now exposes more granular fields for each task type

    The replication and compaction tasks, for example, report their progress in the task info.

  • Added built-in changes feed filter _view

    With this you can use a view’s map function as a changes filter instead of duplicating its logic in a separate filter function.
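A request using the new filter looks roughly like this (mydb, mydesign and myview are placeholder names):

```
GET /mydb/_changes?filter=_view&view=mydesign/myview
```

Documents are emitted on the feed only if the named view’s map function emits at least one row for them.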

Core Storage

  • Added support for automatic compaction

    This feature is disabled by default, but it can be enabled in the configuration page in Futon or the .ini files.

    Compaction is a regular maintenance task for CouchDB. This can now be automated based on multiple variables:

    • A threshold for the file_size to disk_size ratio (say 70%)
    • A time window specified in hours and minutes (e.g. 01:00–05:00)

    Compaction is cancelled if it runs past the end of the configured time window. Compaction for views and databases can be set to run in parallel, but that is only useful for setups where the database directory and view directory are on different disks.

    In addition, if there’s not enough space (2 × data_size) on the disk to complete a compaction, an error is logged and the compaction is not started.
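A configuration sketch tying these variables together (the option names follow the [compactions] section of the 1.2 default.ini; treat the exact values as an illustrative assumption and consult default.ini for the full syntax):

```ini
[compactions]
; compact when ~70% of a db file (60% of a view file) is reclaimable overhead,
; only between 01:00 and 05:00, cancelling at the end of the window
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "01:00"}, {to, "05:00"}, {strict_window, true}, {parallel_view_compaction, false}]
```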


Replicator

  • A new replicator implementation that offers more performance and configuration options

    The replicator has been rewritten from scratch. The new implementation is more reliable, faster, and more configurable than the previous one. If you have had any issues with replication in previous releases, we strongly recommend giving 1.2.0 a spin.

    Configuration options include:

    • Number of worker processes
    • Batch size per worker
    • Maximum number of HTTP connections
    • Number of connection retries

    See default.ini for the full list of options and their default values.

    This allows you to fine-tune replication behaviour tailored to your environment. A spotty mobile network connection can benefit from a single worker process and small batch sizes to reliably, albeit slowly, synchronise data. A full-duplex 10GigE server-to-server connection on a LAN can benefit from more workers and higher batch sizes. The exact values depend on your particular setup and we recommend some experimentation before settling on a set of values.
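As a sketch, the options listed above map onto the [replicator] section of the .ini files roughly as follows (the key names follow the 1.2 default.ini; the values shown are illustrative, so check default.ini for the authoritative defaults):

```ini
[replicator]
worker_processes = 4
worker_batch_size = 500
http_connections = 20
retries_per_request = 10
```

For the spotty-mobile-network case described above, you might lower worker_processes to 1 and shrink worker_batch_size; for a fast LAN link, raise both.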


Futon

  • Futon’s Status screen (active tasks) now displays two new task status fields: Started on and Updated on
  • Simpler replication cancellation

    Running replications can now be cancelled with a single click.

Log System

  • Log correct stack trace in all cases

    In certain error cases, CouchDB would return a stack trace from the log system itself and hide the real error. Now CouchDB always returns the correct error.

  • Improvements to log messages for file-related errors

    CouchDB requires correct permissions for a number of files. Error messages related to file permission errors were not always obvious and are now improved.

Various Bugfixes

  • Fixed old index file descriptor leaks after a view cleanup
  • Fixed the _changes feed heartbeat option when combined with a filter; the bug affected continuous pull replications that use a filter
  • Fix use of OAuth with VHosts and URL rewriting
  • The requested_path property of query server request objects now has the path requested by clients before VHosts and rewriting
  • Fixed incorrect reduce query results when using pagination parameters
  • Made icu_driver work with Erlang R15B and later
  • Improvements to the build system and etap test suite
  • Avoid invalidating view indexes when running out of file descriptors

Breaking Changes

This release contains breaking changes:


It is very important that you understand these changes before you upgrade.

HBase 0.92.1 has been released

Apache HBase 0.92.1 is now available. This release is a marked improvement in system correctness, availability, and ease of use. It’s also backwards compatible with 0.92.0 — except for the removal of the rarely-used transform functionality from the REST interface in HBASE-5228.

Apache HBase 0.92.1 is a bug fix release covering 61 issues, including 6 blockers and 6 critical issues.

Apache ZooKeeper 3.3.5 has been released

Apache ZooKeeper release 3.3.5 is now available. This is a bug fix release covering 11 issues, two of which were considered blockers. Some of the more serious issues include:

  • ZOOKEEPER-1367 Data inconsistencies and unexpired ephemeral nodes after cluster restart
  • ZOOKEEPER-1412 Java client watches inconsistently triggered on reconnect
  • ZOOKEEPER-1277 Servers stop serving when lower 32bits of zxid roll over
  • ZOOKEEPER-1309 Creating a new ZooKeeper client can leak file handles
  • ZOOKEEPER-1389 It would be nice if start-foreground used exec $JAVA in order to get rid of the intermediate shell process
  • ZOOKEEPER-1089 zkServer.sh status does not work due to invalid option of nc

Stability, Compatibility and Testing

3.3.5 is a stable release that’s fully backward compatible with 3.3.4. Only bug fixes relative to 3.3.4 have been applied. Version 3.3.5 will be incorporated into the upcoming CDH3U4 release.

HBase 0.90.6 has been released

Apache HBase 0.90.6 is now available. It is a bug fix release covering 31 bugs and 5 improvements. Among them, 3 are blockers and 3 are critical, such as:

  • HBASE-5008: HBase cannot provide service for a region when it can’t flush the region but considers it stuck in flushing
  • HBASE-4773: HBaseAdmin may leak ZooKeeper connections
  • HBASE-5060: HBase client may be blocked forever when there is a temporary network failure

This release has improved system robustness and availability by fixing bugs that cause potential data loss, system unavailability, possible deadlocks, read inconsistencies and resource leakage.

The 0.90.6 release is backward compatible with 0.90.5. The fixes in this release will be included in CDH3u4.

Apache Hadoop 0.23.1 has been released

Hadoop 0.23.1 contains several major advances over 0.23.0:

  • Lots of bug fixes and improvements in both HDFS and MapReduce
  • Major performance work to make this release match or exceed the performance of Hadoop-1 in most aspects of both HDFS and MapReduce
  • Several downstream projects such as HBase, Pig, Oozie, and Hive are better integrated with this release


See the Hadoop 0.23.1 Release Notes for more details.


Download page

Apache HTTP Server v2.4 has been released

Numerous enhancements make Apache HTTP Server v2.4 ideally suited for Cloud environments. They include:
•    Improved performance (lower resource utilization and better concurrency)
•    Reduced memory usage
•    Asynchronous I/O support
•    Dynamic reverse proxy configuration
•    Performance on par with, or better than, pure event-driven Web servers
•    More granular timeout and rate/resource limiting capability
•    More finely-tuned caching support, tailored for high-traffic servers and proxies

Additional features include easier problem analysis, improved configuration flexibility, more powerful authentication and authorization, and a documentation overhaul. For the complete feature list, please see http://httpd.apache.org/docs/2.4/new_features_2_4.html

Apache ZooKeeper 3.4.3 has been released

Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

Apache ZooKeeper 3.4.3 has been released. This is a bug fix release covering 18 issues.


ZOOKEEPER-1367 is the most serious of the issues addressed; it could cause data corruption on restart. This version also adds support for compiling the client on ARM architectures. Notable fixes include:

  • ZOOKEEPER-1367  Data inconsistencies and unexpired ephemeral nodes after cluster restart
  • ZOOKEEPER-1343  getEpochToPropose should check if lastAcceptedEpoch is greater or equal than epoch
  • ZOOKEEPER-1373  Hardcoded SASL login context name clashes with Hadoop security configuration override
  • ZOOKEEPER-1089  zkServer.sh status does not work due to invalid option of nc
  • ZOOKEEPER-973    bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
  • ZOOKEEPER-1374  C client multi-threaded test suite fails to compile on ARM architectures.
  • ZOOKEEPER-1348  Zookeeper 3.4.2 C client incorrectly reports string version of 3.4.1

If you are running 3.4.2 or earlier, be sure to upgrade immediately. See my earlier post for details on what’s new in 3.4.

Stability, Compatibility and Testing

The 3.4 series has been through a number of releases, incorporating feedback from users and addressing found issues. The Apache community is now considering the 3.4.3 release to be of beta quality.

Apache Cassandra 0.8.10 has been released

Cassandra 0.8.10 has been released. This version is a maintenance/bug fix release for the 0.8 branch.

It introduces the following changes:

  • Fix thread safety issues in commitlog replay, primarily affecting systems with many (100s of) CF definitions (CASSANDRA-3751)
  • Prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626)
  • Use the correct list of replicas for LOCAL_QUORUM reads when read repair is disabled (CASSANDRA-3696)
  • Block on flush before compacting hints (may prevent OOM) (CASSANDRA-3733)
  • (Pig) Fix CassandraStorage to use the correct comparator in the Super ColumnFamily case (CASSANDRA-3251)
  • Fix relevant tombstones being ignored with super columns (CASSANDRA-3875)
  • Support TimeUUID in CassandraStorage (CASSANDRA-3327)


Release notes