Redis 3.0.0 beta-1 is out

Redis 3.0.0 Beta 1 (version 2.9.50) is out.

Release date: 11 Feb 2014

This is the first beta of Redis 3.0.0 (official version is 2.9.50).

The following is a list of improvements in Redis 3.0, compared to Redis 2.8.

  • [NEW] Redis Cluster: a distributed implementation of a subset of Redis.
  • [NEW] New “embedded string” object encoding resulting in fewer cache misses. Big speed gain under certain workloads.
  • [NEW] WAIT command to block waiting for a write to be transmitted to the specified number of slaves.
  • [NEW] MIGRATE connection caching. Much faster key migrations.
  • [NEW] MIGRATE new options COPY and REPLACE.
  • [NEW] CLIENT PAUSE command: stop processing client requests for a specified amount of time.
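
As a rough illustration of how the new commands look on the wire, here is a minimal sketch that encodes them in the Redis request protocol (an array of bulk strings). The helper name `encode_command` is made up for this example, and the host, port, key name and timeouts are placeholder values, not recommendations.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

# WAIT numslaves timeout-ms: block until 2 slaves acknowledge, or 1000 ms pass.
wait_cmd = encode_command("WAIT", 2, 1000)

# MIGRATE host port key destination-db timeout-ms [COPY] [REPLACE]
# (placeholder host and key; COPY keeps the source key, REPLACE overwrites the target).
migrate_cmd = encode_command(
    "MIGRATE", "10.0.0.2", 6379, "user:1", 0, 5000, "COPY", "REPLACE"
)

# CLIENT PAUSE timeout-ms: stop processing client requests for 500 ms.
pause_cmd = encode_command("CLIENT", "PAUSE", 500)
```

The resulting bytes could be written to a socket connected to a Redis 3.0 server; a client library would normally do this encoding for you.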

Twemproxy v 0.3.0 has been released

twemproxy v0.3.0 is out, with bug fixes and support for SmartOS (Solaris) and BSD (OS X).

twemproxy (pronounced “two-em-proxy”), aka nutcracker, is a fast and lightweight proxy for the memcached and redis protocols. It was primarily built to reduce the connection count on the backend caching servers.


  • Fast.
  • Lightweight.
  • Maintains persistent server connections.
  • Keeps connection count on the backend caching servers low.
  • Enables pipelining of requests and responses.
  • Supports proxying to multiple servers.
  • Supports multiple server pools simultaneously.
  • Shards data automatically across multiple servers.
  • Implements the complete memcached ascii and redis protocol.
  • Easy configuration of server pools through a YAML file.
  • Supports multiple hashing modes including consistent hashing and distribution.
  • Can be configured to disable nodes on failures.
  • Observability through stats exposed on stats monitoring port.
  • Works with Linux, *BSD, OS X and Solaris (SmartOS)
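
To give a feel for what consistent hashing buys a proxy like this, here is a minimal sketch of a hash ring. It is not twemproxy's implementation: the class name `HashRing`, the choice of SHA-256, and the number of points per server are all assumptions made for the example.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each server owns several points on a circle."""

    def __init__(self, servers, points_per_server=100):
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(points_per_server):
                self._ring.append((self._hash(f"{server}-{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # The hash only spreads points around the circle; the choice is arbitrary.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]

# Placeholder server names.
ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
smaller = HashRing(["cache-a:11211", "cache-b:11211"])
```

With this layout, dropping `cache-c` only remaps the keys that lived on `cache-c`; keys on the other two servers stay where they were, which is the property that makes disabling failed nodes cheap.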


More details and source code available here:

Redis 2.6.13 has been released

Redis 2.6.13 has been released. It is a recommended upgrade, and especially suggested if you experienced:

1) Strange issues with Lua scripting.

2) A reappearing master not being reconfigured when using Sentinel.

3) Server continuously trying to save on save error.

(This version of Redis may also help with AOF and slow / busy disks and latency issues.)

* [FIX] Throttle BGSAVE attempt on saving error.
* [FIX] redis-cli: raise error on bad command line switch.
* [FIX] Redis/Jemalloc Gitignore were too aggressive.
* [FIX] Test: fix RDB test checking file permissions.
* [FIX] Sentinel: always redirect on master->slave transition.
* [FIX] Lua updated to version 5.1.5. Fixes rare scripting issues.
* [NEW] AOF: improved latency figures with slow/busy disks.
* [NEW] Sentinel: turn old master into a slave when it comes back.
* [NEW] More explicit panic message on out of memory.
* [NEW] redis-cli: --latency-history mode implemented.


Redis 2.6.12 has been released

Redis 2.6.12 has been released and can be downloaded from


UPGRADE URGENCY: MODERATE, nothing very critical but a few non trivial bugs.

* [BUGFIX]   redis-cli --bigkeys: don’t crash with empty DB.
* [BUGFIX]   stop-writes-on-bgsave-error now works in redis.conf
* [BUGFIX]   Don’t crash at startup if RDB is there but can’t be opened.
* [BUGFIX]   Initial value for master_link_down_since_seconds is now huge.
* [BUGFIX]   Allow SELECT while loading the DB.
* [BUGFIX]   Don’t replicate/AOF an empty MULTI/EXEC if the transaction
is empty or containing just read-only commands.
* [BUGFIX]   EXPIRE should not be able to resurrect keys (see issue #1026).
* [IMPROVED] Extended SET back ported from Redis 2.8 / unstable
See for more information.
* [IMPROVED] Test suite improved.

Cassandra performance review


Original article available here


Four years ago, well before starting DataStax, I evaluated the then-current crop of distributed databases and explained why I chose Cassandra. In a lot of ways, Cassandra was the least mature of the options, but I chose to take a long view and wanted to work on a project that got the fundamentals right; things like documentation and distributed tests could come later.


2012 saw that validated in a big way, as the most comprehensive NoSQL benchmark to date was published at the VLDB conference by researchers at the University of Toronto. They concluded,

In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from 1 to 12 nodes.

As a sample, here’s the throughput results from the mixed reads, writes, and (sequential) scans:

I encourage you to take a few minutes to skim the full results.

There are both architectural and implementation reasons for Cassandra’s dominating performance here. Let’s get down into the weeds and see what those are.


Cassandra incorporates a number of architectural best practices that affect performance. None are unique to Cassandra, but Cassandra is the only NoSQL system that incorporates all of them.

Fully distributed: Every Cassandra machine handles a proportionate share of every activity in the system. There are no special cases like the HDFS namenode or MongoDB mongos that require special treatment or special hardware to avoid becoming a bottleneck. And with every node the same, Cassandra is far simpler to install and operate, which has long-term implications for troubleshooting.

Log-structured storage engine: A log-structured engine that avoids overwrites to turn updates into sequential i/o is essential both on hard disks (HDD) and solid-state disks (SSD). On HDD, because the seek penalty is so high; on SSD, to avoid write amplification and disk failure. This is why you see MongoDB performance go through the floor as the dataset size exceeds RAM.

Tight integration with its storage engine: Voldemort and Riak support pluggable storage engines, which both limits them to a lowest-common-denominator of key/value pairs, and limits the optimizations that can be done with the distributed replication engine.

Locally-managed storage: HBase has an integrated, log-structured storage engine, but relies on HDFS for replication instead of managing storage locally. This means HBase is architecturally incapable of supporting Cassandra-style optimizations like putting the commitlog on a separate disk, or mixing SSD and HDD in a single cluster with appropriate data pinned to each.


An architecture is only as good as its implementation. For the first years after Cassandra’s open-sourcing as an Apache project, every release was a learning experience. 0.3, 0.4, 0.5, 0.6, each attracted a new wave of users that exposed some previously unimportant weakness. Today, we estimate there are over a thousand production deployments of Cassandra, the most for any scalable database. Some are listed here. To paraphrase ESR, “With enough eyes, all performance problems are obvious.”

What are some implementation details relevant to performance? Let’s have a look at some of the options.


MongoDB can be a great alternative to MySQL, but it’s not really appropriate for the scale-out applications targeted by Cassandra. Still, as early members of the NoSQL category, the two do draw comparisons.

One important limitation in MongoDB is database-level locking. That is, only one writer may modify a given database at a time. Support for collection-level (a set of documents, analogous to a relational table) locking is planned. With either database- or collection-level locking, other writers or readers are locked out. Even a small number of writes can produce stalls in read performance.

Cassandra uses advanced concurrent structures to provide row-level isolation without locking. Cassandra even eliminated the need for row-level locks for index updates in the recent 1.2 release.

A more subtle MongoDB limitation is that when adding or updating a field in a document, the entire document must be re-written. If you pre-allocate space for each document, you can avoid the associated fragmentation, but even with pre-allocation updating your document gets slower as it grows.

Cassandra’s storage engine only appends updated data; it never has to re-write or re-read existing data. Thus, updates to a Cassandra row or partition stay fast as your dataset grows.
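
The append-only idea can be sketched in a few lines. This is a toy, not Cassandra's storage engine (which works with memtables, SSTables and a commitlog); the class name `AppendOnlyStore` and its in-memory index are assumptions made purely for illustration.

```python
import os
import tempfile

class AppendOnlyStore:
    """Toy log-structured store: writes are appended, never overwritten in place."""

    def __init__(self, path):
        self._file = open(path, "a+b")
        self._index = {}  # key -> (offset, length) of the newest value

    def put(self, key, value):
        data = value.encode("utf-8")
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)  # sequential write at the end of the log
        self._file.flush()
        self._index[key] = (offset, len(data))  # older versions become garbage

    def get(self, key):
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length).decode("utf-8")

    def close(self):
        self._file.close()

path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put("user:1", "alice")
store.put("user:1", "alice smith")  # appended; the old bytes are left untouched
latest = store.get("user:1")
```

The cost of an update here does not depend on how large the existing record is, because nothing is read back or rewritten. The price is garbage left behind in the log, which is what compaction later cleans up. The index in this sketch lives only in memory and is lost on restart.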


Riak presents a document-based data model to the end user, but under the hood it maps everything to a key/value storage API. Thus, like MongoDB, updating any field in a document requires rewriting the whole thing.

However, Riak does emphasize the use of log-structured storage engines. Both the default BitCask backend and LevelDB are log-structured. Riak increasingly emphasizes LevelDB since BitCask does not support scan operations (which are required for indexes), but this brings its own set of problems.

LevelDB is a log-structured storage engine with a different approach to compaction than the one introduced by Bigtable. LevelDB trades more compaction i/o for less i/o at read time, which can be a good tradeoff for many workloads, but not all. Cassandra added support for leveldb-style compaction about a year ago.

LevelDB itself is designed to be an embedded database for the likes of Chrome, and clear growing pains are evident when pressed into service as a multi-user backend for Riak. (A LevelDB configuration for Voldemort also exists.) Basho cites “one stall every 2 hours for 10 to 30 seconds”, “cases that can still cause [compaction] infinite loops,” and no way to create snapshots or backups as of the recently released Riak 1.2.


HBase’s storage engine is the most similar to Cassandra’s; both drew on Bigtable’s design early on.

But despite a later start, Cassandra’s storage engine is far ahead of HBase’s today, in large part because building on HDFS instead of locally-managed storage makes everything harder for HBase. Cassandra added online snapshots almost four years ago; HBase still has a long ways to go.

HDFS also makes SSD support problematic for HBase, which is becoming increasingly relevant as SSD price/performance improves. Cassandra has excellent SSD support and even support for mixed SSD and HDD within the same cluster, with data pinned to the medium that makes the most sense for it.

Other differences that may not show up at benchmark time, but you would definitely notice in production:

HBase can’t delete data during minor compactions — you have to rewrite all the data in a region to reclaim disk space. Cassandra has deleted tombstones during minor compactions for over two years.

While you are running that major compaction, HBase gives you no way to throttle it and limit its impact on your application workload. Cassandra introduced this two years ago and continues to improve it. Dealing with local storage also lets Cassandra avoid polluting the page cache with sequential scans from compaction.

Compaction might seem like bookkeeping details, but it does impact the rest of the system. HBase limits you to two or three column families because of compaction and flushing limitations, forcing you to do sub-optimal things to your data model as a workaround.
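
For readers who have not looked inside a log-structured engine, compaction can be sketched roughly as follows. This is a toy under stated assumptions, not Cassandra's or HBase's code: the function name `compact`, the `(key, timestamp, value)` layout, and the use of `None` as a tombstone marker are all invented for the example.

```python
TOMBSTONE = None  # marker meaning "this key was deleted"

def compact(runs):
    """Merge sorted runs of (key, timestamp, value) into a single sorted run.

    For each key only the entry with the highest timestamp survives. A surviving
    tombstone is dropped as well, which is safe here only because this toy
    assumes that every run holding the key takes part in the merge.
    """
    newest = {}
    for run in runs:
        for key, timestamp, value in run:
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    return [
        (key, timestamp, value)
        for key, (timestamp, value) in sorted(newest.items())
        if value is not TOMBSTONE
    ]

old_run = [("a", 1, "apple"), ("b", 1, "banana")]
new_run = [("a", 2, "avocado"), ("b", 3, TOMBSTONE), ("c", 2, "cherry")]
merged = compact([old_run, new_run])
```

After the merge, the overwritten value of `a` and the deleted key `b` no longer take up space. A real engine has to be more careful about when a tombstone may be discarded, typically keeping it for a grace period so that replicas that missed the delete can still learn about it.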


I honestly think Cassandra is one to two years ahead of the competition, but I’m under no illusions that Cassandra itself is perfect. We have plenty of improvements to make still; from the recently released Cassandra 1.2 to our ticket backlog, there is no shortage of work to do.

Here are some of the areas I’d like to see Cassandra improve this year:

If working on an industry-leading, open-source database doing cutting edge performance work on the JVM sounds interesting to you, please get in touch.

Redis 2.6.9 has been released

Redis 2.6.9 has been released and is available for download:

New features, bug fixes and improvements:

  • [BUGFIX] Changing master at runtime (SLAVEOF command) in presence of network problems, or in very rapid succession, could result in non-critical problems (GitHub Issue #828).
  • [IMPROVED] CLIENT GETNAME and SETNAME to set and query connection names reported by CLIENT LIST. Very useful for debugging problems.
  • [IMPROVED] redis-cli is now able to transfer an RDB file from a remote server to a local file using the --rdb <filename> command line option.

Nine Databases in 45 Minutes



The following video provides a crash course in nine key databases:  Postgres, CouchDB, MarkLogic, Riak, VoltDB, MongoDB, Neo4j, HBase and Redis. All in just 45 minutes.

Miles Pomeroy, Chad Maughan, and Jonathan Geddes run through each database in five minutes each, thus the title, “9 Databases in 45 minutes.” The video proceeds in efficient and occasionally amusing fashion, where a countdown clock and a gong keep the presentations terse.

ACID transaction in the NoSQL world

Generally speaking, NoSQL solutions have lighter-weight transactional semantics than relational databases.

In fact, some of them do support ACID, or ACID-like, transactions:

Redis 2.6.6 has been released

Redis 2.6.6 and Redis 2.4.18 are out.

[BUGFIX]   Jemalloc updated to 3.2.0.

Redis 2.6.5 has been released

Redis 2.6.5 has been released and can be downloaded here

It includes the following changes:

* [IMPROVED] RDB/AOF children now log amount of additional memory used
             because of copy on write.
* [BUGFIX]   MIGRATE non critical fixes (see commits for details).
* [BUGFIX]   MULTI/EXEC: now EXEC aborts on errors before EXEC.
* [BUGFIX]   Fix integer overflow in zunionInterGenericCommand that could
             cause Z[INTER|UNION][STORE] commands to crash under extremely
             unlikely conditions (almost impossible in real world).
* [BUGFIX]   EVALSHA is now case insensitive (and will not crash).