NoSQL Benchmark

There is probably no perfect NoSQL database. Every database has advantages and disadvantages that become more or less important depending on your preferences and the type of tasks you're trying to achieve.

Altoros Systems has performed an independent and interesting benchmark to help you sort out the current pros and cons of the different solutions, including HBase, Cassandra, Riak and MongoDB:

http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html

What makes this research unique?

Often referred to as NoSQL, non-relational databases feature elasticity and scalability in combination with a capability to store big data and work with cloud computing systems, all of which make them extremely popular. NoSQL data management systems are inherently schema-free (with no obsessive complexity and a flexible data model) and eventually consistent (complying with BASE rather than ACID). They have a simple API, serve huge amounts of data and provide high throughput.

In 2012, the number of NoSQL products reached 120-plus and the figure is still growing. That variety makes it difficult to select the best tool for a particular case. Database vendors usually measure productivity of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. We wanted to do independent and unbiased research to complement the work done by the folks at Yahoo.

Using Amazon virtual machines to ensure verifiable results and research transparency (which also helped minimize errors due to hardware differences), we have analyzed and evaluated the following NoSQL solutions:

● Cassandra, a column family store
● HBase (column-oriented, too)
● MongoDB, a document-oriented database
● Riak, a key-value store

We also tested MySQL Cluster and sharded MySQL, taking them as benchmarks.

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

Cassandra 1.1.6 has been released

Cassandra 1.1.6 has been released and can be downloaded here: http://cassandra.apache.org/download

Changes in version 1.1.6:
 * Wait for writes on synchronous read digest mismatch (CASSANDRA-4792)
 * fix commitlog replay for nanotime-infected sstables (CASSANDRA-4782)
 * preflight check ttl for maximum of 20 years (CASSANDRA-4771)
 * (Pig) fix widerow input with single column rows (CASSANDRA-4789)
 * Fix HH to compact with correct gcBefore, which avoids wiping out
   undelivered hints (CASSANDRA-4772)
 * LCS will merge up to 32 L0 sstables as intended (CASSANDRA-4778)
 * NTS will default unconfigured DC replicas to zero (CASSANDRA-4675)
 * use default consistency level in counter validation if none is
   explicitly provided (CASSANDRA-4700)
 * Improve IAuthority interface by introducing fine-grained
   access permissions and grant/revoke commands (CASSANDRA-4490, 4644)
 * fix assumption error in CLI when updating/describing keyspace
   (CASSANDRA-4322)
 * Adds offline sstablescrub to debian packaging (CASSANDRA-4642)
 * Automatic fixing of overlapping leveled sstables (CASSANDRA-4644)
 * fix error when using ORDER BY with extended selections (CASSANDRA-4689)
 * (CQL3) Fix validation for IN queries for non-PK cols (CASSANDRA-4709)
 * fix re-created keyspace disappearing after 1.1.5 upgrade
   (CASSANDRA-4698, 4752)
 * (CLI) display elapsed time in 2 fraction digits (CASSANDRA-3460)
 * add authentication support to sstableloader (CASSANDRA-4712)
 * Fix CQL3 'is reversed' logic (CASSANDRA-4716, 4759)
 * (CQL3) Don't return ReversedType in result set metadata (CASSANDRA-4717)
 * Backport adding AlterKeyspace statement (CASSANDRA-4611)
 * (CQL3) Correctly accept upper-case data types (CASSANDRA-4770)
 * (cqlsh) Fix table completion for CREATE KEYSPACE (CASSANDRA-4334)
 * Support allow_deletes for Hadoop clusters (CASSANDRA-4499)
 * (cqlsh) Provide consistent ordering for COPY TO and COPY FROM (CASSANDRA-4594)
 * Fix race when setting cql version with thrift sync server (CASSANDRA-4657)
 * (CQL3) Fix start IN queries with ORDER BY (CASSANDRA-4689)
 * (cqlsh) Fix auto completion with fully qualified names (CASSANDRA-4423)
 * (CLI) allow to insert double values (CASSANDRA-4661)
 * (cqlsh) Multi-line support for history buffer (CASSANDRA-4666)
Merged from 1.0:
 * Switch from NBHM to CHM in MessagingService's callback map, which
   prevents OOM in long-running instances (CASSANDRA-4708)
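
The TTL preflight check (CASSANDRA-4771) in the list above can be pictured as a simple upper bound on the requested TTL. The sketch below is illustrative Python, not Cassandra's own Java code; the function name `validate_ttl` and the choice to count a year as 365 days are assumptions made for the example.

```python
# Illustrative sketch only; Cassandra itself is written in Java.
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: "20 years" is counted as 20 * 365 days, ignoring leap days.
MAX_TTL_SECONDS = 20 * 365 * SECONDS_PER_DAY


def validate_ttl(ttl_seconds: int) -> int:
    """Return the TTL unchanged if it is acceptable, otherwise raise ValueError."""
    if ttl_seconds < 0:
        raise ValueError("TTL must not be negative")
    if ttl_seconds > MAX_TTL_SECONDS:
        raise ValueError(
            f"TTL of {ttl_seconds}s exceeds the maximum of {MAX_TTL_SECONDS}s (20 years)"
        )
    return ttl_seconds
```

Rejecting the value before the write is attempted means the client gets an immediate error instead of a column with an unusable expiration time.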

Cassandra 1.0.12 has been released

Cassandra 1.0.12 has been released introducing the following changes:

  • Switch from NBHM to CHM in MessagingService’s callback map, which prevents OOM in long-running instances (CASSANDRA-4708)
  • increase Xss to 160k to accommodate latest 1.6 JVMs (CASSANDRA-4602)
  • fix toString of hint destination tokens (CASSANDRA-4568)
  • (Hadoop) fix setting key length for old-style mapred api (CASSANDRA-4534)
  • (Hadoop) fix iterating through a resultset consisting entirely of tombstoned rows (CASSANDRA-4466)
  • Fix multiple values for CurrentLocal NodeID (CASSANDRA-4626)

 

Download http://cassandra.apache.org/download/

Cassandra 1.1.4 has been released

Cassandra 1.1.4 (maintenance release) has been released and can be downloaded here: http://cassandra.apache.org/download

Bugs fixed:

  • fix offline scrub to catch >= out of order rows (CASSANDRA-4411)
  • fix cassandra-env.sh on RHEL and other non-dash-based systems (CASSANDRA-4494)
  • Merged from 1.0: (Hadoop) fix setting key length for old-style mapred api (CASSANDRA-4534)

Cassandra 1.1.3 has been released

Cassandra 1.1.3 has been released; the download is available here. It includes the following changes:

  • munmap commitlog segments before rename (CASSANDRA-4337)
  • (JMX) rename getRangeKeySample to sampleKeyRange to avoid returning multi-MB results as an attribute (CASSANDRA-4452)
  • flush based on data size, not throughput; overwritten columns no longer artificially inflate liveRatio (CASSANDRA-4399)
  • update default commitlog segment size to 32MB and total commitlog size to 32/1024 MB for 32/64 bit JVMs, respectively (CASSANDRA-4422)
  • avoid using global partitioner to estimate ranges in index sstables (CASSANDRA-4403)
  • restore pre-CASSANDRA-3862 approach to removing expired tombstones from row cache during compaction (CASSANDRA-4364)
  • (stress) support for CQL prepared statements (CASSANDRA-3633)
  • Correctly catch exception when Snappy cannot be loaded (CASSANDRA-4400)
  • (cql3) Support ORDER BY when IN condition is given in WHERE clause (CASSANDRA-4327)
  • (cql3) delete “component_index” column on DROP TABLE call (CASSANDRA-4420)
  • change nanoTime() to currentTimeInMillis() in schema related code (CASSANDRA-4432)
  • add a token generation tool (CASSANDRA-3709)
  • Fix LCS bug with sstable containing only 1 row (CASSANDRA-4411)
  • fix “Can’t Modify Index Name” problem on CF update (CASSANDRA-4439)
  • Fix assertion error in getOverlappingSSTables during repair (CASSANDRA-4456)
  • fix nodetool’s setcompactionthreshold command (CASSANDRA-4455)
  • Ensure compacted files are never used, to avoid counter overcount (CASSANDRA-4436)
  • Merged from 1.0:
    • Push the validation of secondary index values to the SecondaryIndexManager (CASSANDRA-4240)
    • (Hadoop) fix iterating through a resultset consisting entirely of tombstoned rows (CASSANDRA-4466)
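
The new commit log defaults (CASSANDRA-4422) in the list above determine how many segments fit under the total size cap. Here is a small sketch of that arithmetic in Python; the function name `default_segment_count` is hypothetical, and the figures are the ones quoted in the changelog entry.

```python
SEGMENT_SIZE_MB = 32  # default commitlog segment size quoted for CASSANDRA-4422

# Default total commitlog size in MB, keyed by JVM word size (32- or 64-bit).
TOTAL_SIZE_MB = {32: 32, 64: 1024}


def default_segment_count(jvm_bits: int) -> int:
    """Number of full segments that fit under the default total commitlog size."""
    return TOTAL_SIZE_MB[jvm_bits] // SEGMENT_SIZE_MB
```

With these defaults, a 32-bit JVM has room for a single segment, while a 64-bit JVM has room for 32.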

Cassandra 1.0.11 has been released

Cassandra 1.0.11 has been released and can be downloaded here

It introduces the following changes and bug fixes:

  • Allow dropping columns shadowed by not-yet-expired supercolumn or row tombstones in PrecompactedRow (CASSANDRA-4396)
  • synchronize LCS getEstimatedTasks to avoid CME (CASSANDRA-4255)
  • ensure unique streaming session id’s (CASSANDRA-4223)
  • kick off background compaction when min/max thresholds change (CASSANDRA-4279)
  • improve ability of STCS.getBuckets to deal with 100s of 1000s of sstables, such as when converting back from LCS (CASSANDRA-4287)
  • Oversize integer in CQL throws NumberFormatException (CASSANDRA-4291)
  • Set gc_grace on index CF to 0 (CASSANDRA-4314)
  • fix 1.0.x node join to mixed version cluster, other nodes >= 1.1 (CASSANDRA-4195)
  • Fix LCS splitting sstable based on uncompressed size (CASSANDRA-4419)
  • Push the validation of secondary index values to the SecondaryIndexManager (CASSANDRA-4240)
  • Don’t purge columns during upgradesstables (CASSANDRA-4462)
  • Make cqlsh work with piping (CASSANDRA-4113)
  • Validate arguments for nodetool decommission (CASSANDRA-4061)
  • Report thrift status in nodetool info (CASSANDRA-4010)

Cassandra 1.1.2 has been released

Cassandra 1.1.2 has been released and can be downloaded here
Bug fixes and other changes:
  • Fix cleanup not deleting index entries (CASSANDRA-4379)
  • Use correct partitioner when saving + loading caches (CASSANDRA-4331)
  • Check schema before trying to export sstable (CASSANDRA-2760)
  • Raise a meaningful exception instead of NPE when PFS encounters an unconfigured node + no default (CASSANDRA-4349)
  • fix bug in sstable blacklisting with LCS (CASSANDRA-4343)
  • LCS no longer promotes tiny sstables out of L0 (CASSANDRA-4341)
  • skip tombstones during hint replay (CASSANDRA-4320)
  • fix NPE in compactionstats (CASSANDRA-4318)
  • enforce 1m min keycache for auto (CASSANDRA-4306)
  • Have DeletedColumn.isMFD always return true (CASSANDRA-4307)
  • (cql3) exception message for ORDER BY constraints said primary filter can be an IN clause, which is misleading (CASSANDRA-4319)
  • (cql3) Reject (not yet supported) creation of secondary indexes on tables with composite primary keys (CASSANDRA-4328)
  • Set JVM stack size to 160k for java 7 (CASSANDRA-4275)
  • cqlsh: add COPY command to load data from CSV flat files (CASSANDRA-4012)
  • CFMetaData.fromThrift to throw ConfigurationException upon error (CASSANDRA-4353)
  • Use CF comparator to sort indexed columns in SecondaryIndexManager (CASSANDRA-4365)
  • add strategy_options to the KSMetaData.toString() output (CASSANDRA-4248)
  • (cql3) fix range queries containing unqueried results (CASSANDRA-4372)
  • (cql3) allow updating column_alias types (CASSANDRA-4041)
  • (cql3) Fix deletion bug (CASSANDRA-4193)
  • Fix computation of overlapping sstable for leveled compaction (CASSANDRA-4321)
  • Improve scrub and allow to run it offline (CASSANDRA-4321)
  • Fix assertionError in StorageService.bulkLoad (CASSANDRA-4368)
  • (cqlsh) add option to authenticate to a keyspace at startup (CASSANDRA-4108)
  • (cqlsh) fix ASSUME functionality (CASSANDRA-4352)
  • Fix ColumnFamilyRecordReader to not return progress > 100% (CASSANDRA-3942)
  • Merged from 1.0:
    • Set gc_grace on index CF to 0 (CASSANDRA-4314)


				

Cassandra 1.1.1 has been released

Cassandra 1.1.1 has been released and can be downloaded here. It introduces the following changes:

  • add getsstables command to nodetool (CASSANDRA-4199)
  • apply parent CF compaction settings to secondary index CFs (CASSANDRA-4280)
  • preserve commitlog size cap when recycling segments at startup (CASSANDRA-4201)
  • (Hadoop) fix split generation regression (CASSANDRA-4259)
  • ignore min/max compactions settings in LCS, while preserving behavior that min=max=0 disables autocompaction (CASSANDRA-4233)
  • log number of rows read from saved cache (CASSANDRA-4249)
  • calculate exact size required for cleanup operations (CASSANDRA-1404)
  • avoid blocking additional writes during flush when the commitlog gets behind temporarily (CASSANDRA-1991)
  • enable caching on index CFs based on data CF cache setting (CASSANDRA-4197)
  • warn on invalid replication strategy creation options (CASSANDRA-4046)
  • remove [Freeable]Memory finalizers (CASSANDRA-4222)
  • include tombstone size in ColumnFamily.size, which can prevent OOM during sudden mass delete operations by yielding a nonzero liveRatio (CASSANDRA-3741)
  • Open 1 sstableScanner per level for leveled compaction (CASSANDRA-4142)
  • Optimize reads when row deletion timestamps allow us to restrict the set of sstables we check (CASSANDRA-4116)
  • add support for commitlog archiving and point-in-time recovery (CASSANDRA-3690)
  • avoid generating redundant compaction tasks during streaming (CASSANDRA-4174)
  • add -cf option to nodetool snapshot, and takeColumnFamilySnapshot to StorageService mbean (CASSANDRA-556)
  • optimize cleanup to drop entire sstables where possible (CASSANDRA-4079)
  • optimize truncate when autosnapshot is disabled (CASSANDRA-4153)
  • update caches to use byte[] keys to reduce memory overhead (CASSANDRA-3966)
  • add column limit to cli (CASSANDRA-3012, 4098)
  • clean up and optimize DataOutputBuffer, used by CQL compression and CompositeType (CASSANDRA-4072)
  • optimize commitlog checksumming (CASSANDRA-3610)
  • identify and blacklist corrupted SSTables from future compactions (CASSANDRA-2261)
  • Move CfDef and KsDef validation out of thrift (CASSANDRA-4037)
  • Expose API to repair a user provided range (CASSANDRA-3912)
  • Add way to force the cassandra-cli to refresh its schema (CASSANDRA-4052)
  • Avoid having replicate on write tasks stacking up at CL.ONE (CASSANDRA-2889)
  • (cql3) Backwards compatibility for composite comparators in non-cql3-aware clients (CASSANDRA-4093)
  • (cql3) Fix order by for reversed queries (CASSANDRA-4160)
  • (cql3) Add ReversedType support (CASSANDRA-4004)
  • (cql3) Add timeuuid type (CASSANDRA-4194)
  • (cql3) Minor fixes (CASSANDRA-4185)
  • (cql3) Fix prepared statement in BATCH (CASSANDRA-4202)
  • (cql3) Reduce the list of reserved keywords (CASSANDRA-4186)
  • (cql3) Move max/min compaction thresholds to compaction strategy options (CASSANDRA-4187)
  • Fix exception during move when localhost is the only source (CASSANDRA-4200)
  • (cql3) Allow paging through non-ordered partitioner results (CASSANDRA-3771)
  • (cql3) Fix drop index (CASSANDRA-4192)
  • (cql3) Don’t return range ghosts anymore (CASSANDRA-3982)
  • fix re-creating Keyspaces/ColumnFamilies with the same name as dropped ones (CASSANDRA-4219)
  • fix SecondaryIndex LeveledManifest save upon snapshot (CASSANDRA-4230)
  • fix missing arrayOffset in FBUtilities.hash (CASSANDRA-4250)
  • (cql3) Add name of parameters in CqlResultSet (CASSANDRA-4242)
  • (cql3) Correctly validate order by queries (CASSANDRA-4246)
  • rename stress to cassandra-stress for saner packaging (CASSANDRA-4256)
  • Fix exception on column metadata with non-string comparator (CASSANDRA-4269)
  • Check for unknown/invalid compression options (CASSANDRA-4266)
  • (cql3) Adds simple access to column timestamp and ttl (CASSANDRA-4217)
  • (cql3) Fix range queries with secondary indexes (CASSANDRA-4257)
  • Better error messages from improper input in cli (CASSANDRA-3865)
  • Try to stop all compaction upon Keyspace or ColumnFamily drop (CASSANDRA-4221)
  • (cql3) Allow keyspace properties to contain hyphens (CASSANDRA-4278)
  • (cql3) Correctly validate keyspace access in create table (CASSANDRA-4296)
  • Avoid deadlock in migration stage (CASSANDRA-3882)
  • Take supercolumn names and deletion info into account in memtable throughput (CASSANDRA-4264)
  • Add back backward compatibility for old style replication factor (CASSANDRA-4294)
  • Preserve compatibility with pre-1.1 index queries (CASSANDRA-4264)
Merged from 1.0:
  • Fix super columns bug where cache is not updated (CASSANDRA-4190)
  • fix maxTimestamp to include row tombstones (CASSANDRA-4116)
  • (CLI) properly handle quotes in create/update keyspace commands (CASSANDRA-4129)
  • Avoids possible deadlock during bootstrap (CASSANDRA-4159)
  • fix stress tool that hangs forever on timeout or error (CASSANDRA-4128)
  • stress tool to return appropriate exit code on failure (CASSANDRA-4188)
  • fix compaction NPE when out of disk space and assertions disabled (CASSANDRA-3985)
  • synchronize LCS getEstimatedTasks to avoid CME (CASSANDRA-4255)
  • ensure unique streaming session id’s (CASSANDRA-4223)
  • kick off background compaction when min/max thresholds change (CASSANDRA-4279)
  • improve ability of STCS.getBuckets to deal with 100s of 1000s of sstables, such as when converting back from LCS (CASSANDRA-4287)
  • Oversize integer in CQL throws NumberFormatException (CASSANDRA-4291)

Cassandra 1.1 has been released

Cassandra 1.1 has been released, introducing some improvements:

Miscellaneous:

  • Hadoop can now handle Cassandra wide rows by passing true to the new widerows parameter of setInputColumnFamily. The WordCount example has been updated to cover this.
  • Compression is enabled by default (for newly created tables).
  • We now ship the stress load-testing tool in binary builds.
  • SerializingCacheProvider is now supported on Windows.
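
To picture what wide-row handling means for a consumer, the sketch below hands out each row in fixed-size slices of columns rather than as one large record. It is plain Python over an in-memory dict and does not use the Hadoop or Cassandra APIs; `page_columns` and `page_size` are hypothetical names chosen for the illustration.

```python
from itertools import islice
from typing import Dict, Iterator, List, Tuple

Column = Tuple[str, str]  # (column name, value)


def page_columns(
    rows: Dict[str, Dict[str, str]], page_size: int
) -> Iterator[Tuple[str, List[Column]]]:
    """Yield (row key, slice of columns) pairs, at most page_size columns each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for key, columns in rows.items():
        # sorted() materializes the row here because the data is already in
        # memory; a real reader would fetch each slice from the server instead.
        ordered = iter(sorted(columns.items()))
        while True:
            page = list(islice(ordered, page_size))
            if not page:
                break
            yield key, page
```

A consumer of this generator sees the same row key several times, once per slice, so any per-row aggregation has to be carried across slices.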

Apache Cassandra 1.1.0 has been released

Apache Cassandra 1.1.0 has been released. Here is a summary of the new features:

  • Concurrent schema updates are now supported, with any conflicts automatically resolved.  This makes temporary columnfamilies and other uses of dynamic schema appropriate to use in applications.
  • The CQL language has undergone a major revision, CQL3, the highlights of which are covered at [1].  CQL3 is not backwards-compatible with CQL2, so we’ve introduced a set_cql_version Thrift method to specify which version you want.
  • Row-level isolation: multi-column updates to a single row have always been *atomic* (either all will be applied, or none) thanks to the CommitLog, but until 1.1 they were not *isolated*: a reader could see mixed old and new values while the update was in progress.
  • Finer-grained control over data directories, allowing a ColumnFamily to be pinned to a specific volume, e.g. one backed by SSD.
  • The bulk loader is no longer a fat client; it can be run from an existing machine in a cluster.
  • A new write survey mode has been added, similar to bootstrap (enabled via -Dcassandra.write_survey=true), but the node will not automatically join the cluster.  This is useful for cases such as testing different compaction strategies with live traffic without affecting the cluster.
  • Key and row caches are now global, similar to the global memtable threshold. Manual tuning of cache sizes per-columnfamily is no longer required.
  • Off-heap caches no longer require JNA, and will work out of the box on Windows as well as Unix platforms.
  • Streaming is now multithreaded.
  • Compactions may now be aborted via JMX or nodetool.
  • The stress tool is not new in 1.1, but it is newly included in binary builds now, as well as the source tree.
  • Hadoop: a new BulkOutputFormat is included which will directly write SSTables locally and then stream them into the cluster. YOU SHOULD USE BulkOutputFormat BY DEFAULT.  ColumnFamilyOutputFormat is still around in case for some strange reason you want results trickling out over Thrift, but BulkOutputFormat is significantly more efficient.
  • Hadoop: KeyRange.filter is now supported with ColumnFamilyInputFormat, allowing index expressions to be evaluated server-side to reduce the amount of data sent to Hadoop.
  • Hadoop: ColumnFamilyRecordReader has a wide-row mode, enabled via a boolean parameter to setInputColumnFamily, that pages through data column-at-a-time instead of row-at-a-time.
  • Pig: can use the wide-row Hadoop support, by setting PIG_WIDEROW_INPUT to true.  This will produce each row’s columns in a bag.
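
The difference between atomic and isolated updates mentioned in the row-level isolation item above can be illustrated with a toy in-memory row. This is a sketch in Python, not Cassandra's implementation: the class names and the `observe` hook (which simulates a concurrent reader arriving in the middle of an update) are hypothetical, and the copy-then-swap approach in `IsolatedRow` is assumed here purely for illustration.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, str]
Observer = Optional[Callable[[Row], None]]


class InPlaceRow:
    """Applies column updates one by one; a reader can see a half-applied update."""

    def __init__(self, columns: Row) -> None:
        self._columns = dict(columns)

    def read(self) -> Row:
        return dict(self._columns)

    def update(self, changes: Row, observe: Observer = None) -> None:
        for name, value in changes.items():
            self._columns[name] = value
            if observe:
                observe(self.read())  # simulated concurrent read


class IsolatedRow:
    """Builds the new version on the side, then publishes it with one reference swap."""

    def __init__(self, columns: Row) -> None:
        self._columns = dict(columns)

    def read(self) -> Row:
        return dict(self._columns)

    def update(self, changes: Row, observe: Observer = None) -> None:
        new_version = dict(self._columns)
        for name, value in changes.items():
            new_version[name] = value
            if observe:
                observe(self.read())  # reader still sees the old version
        self._columns = new_version  # readers now see the whole update
```

Starting from a row whose `first` and `last` columns are both `old` and updating both to `new`, an observer on `InPlaceRow` sees `first=new, last=old` after the first column is written, which is exactly the mixed read described above. On `IsolatedRow` every observation made during the update still returns the old row, and the new values appear together once the update returns.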

Download