In five years, Apache Cassandra has grown into one of the most widely used NoSQL databases in the world and serves as the backbone for some of today’s most popular applications, including Facebook, Netflix, and Twitter.
The newest version, Cassandra 2.0, has just been announced and includes multiple new features. Perhaps the biggest change is that “Cassandra 2.0 makes it easier than ever for developers to migrate from relational databases and become productive quickly.”
New features and improvements include:
Lightweight transactions, which ensure operation linearizability similar to the serializable isolation level offered by relational databases and prevent conflicts during concurrent requests
Triggers, which enable pushing performance-critical code close to the data it deals with, and simplify integration with event-driven frameworks like Storm
CQL enhancements such as cursors and improved index support
Improved compaction, keeping read performance from deteriorating under heavy write load
Eager retries to avoid query timeouts by sending redundant requests to other replicas if too much time elapses on the original request
Custom Thrift server implementation based on LMAX Disruptor that achieves lower message processing latencies and better throughput with flexible buffer allocation strategies
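The eager-retry idea from the list above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Cassandra's implementation: the function name `query_with_eager_retry`, the `retry_after` parameter, and the two toy replicas are assumptions made up for the example. A request goes to the first replica, and if no answer arrives within the threshold, a redundant request goes to the next replica; whichever answers first wins.

```python
import concurrent.futures as cf
import time


def query_with_eager_retry(replicas, request, retry_after=0.05):
    """Return the first answer, sending redundant requests to slow replicas' peers.

    `replicas` is a list of callables standing in for replica nodes
    (hypothetical; a real client would issue network requests instead).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        # Start with the original request only.
        pending = {pool.submit(replicas[0], request)}
        for backup in replicas[1:]:
            done, pending = cf.wait(
                pending, timeout=retry_after, return_when=cf.FIRST_COMPLETED
            )
            if done:
                return next(iter(done)).result()
            # Too much time elapsed: send a redundant request to another replica.
            pending.add(pool.submit(backup, request))
        # No replicas left to try; wait for whichever request finishes first.
        done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower requests that lost the race.
        pool.shutdown(wait=False)


def slow_replica(request):
    time.sleep(0.5)  # simulates an overloaded node
    return ("slow", request)


def fast_replica(request):
    return ("fast", request)


if __name__ == "__main__":
    print(query_with_eager_retry([slow_replica, fast_replica], "SELECT ..."))
```

The trade-off is extra load on the cluster in exchange for fewer timeouts, so the threshold would normally be tuned to fire only for unusually slow requests.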
The Mongo-Hadoop Adapter 1.1 has been released. It makes it easy to use MongoDB databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, Hadoop can process the data in parallel so that very large datasets can be processed quickly.
The Mongo-Hadoop adapter also includes support for Pig and Hive, which allow very sophisticated MapReduce workflows to be executed just by writing very simple scripts.
Pig is a high-level scripting language for data analysis and building map/reduce workflows.
Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.
Hadoop streaming is also supported, so map/reduce functions can be written in any language besides Java. Right now the Mongo-Hadoop adapter supports streaming in Ruby, Node.js and Python.
How the Hadoop Adapter works
The adapter examines the MongoDB Collection and calculates a set of splits from the data
Each of the splits gets assigned to a node in the Hadoop cluster
In parallel, Hadoop nodes pull data for their splits from MongoDB (or BSON) and process them locally
Hadoop merges results and streams output back to MongoDB or BSON
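The four steps above can be mimicked in plain Python to show the shape of the data flow. This is a toy sketch under stated assumptions, not the adapter's code: an in-memory list stands in for the MongoDB collection, worker threads stand in for Hadoop nodes, and the names `compute_splits`, `process_split`, and `run_job` as well as the `lang` field are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_splits(documents, split_size):
    """Step 1: cut the collection into fixed-size splits."""
    return [
        documents[i:i + split_size]
        for i in range(0, len(documents), split_size)
    ]


def process_split(split):
    """Step 3: each worker processes only its own split (here: count by 'lang')."""
    return Counter(doc["lang"] for doc in split)


def run_job(documents, split_size=2, workers=4):
    splits = compute_splits(documents, split_size)
    # Step 2: hand each split to a worker; the pool plays the role of the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(process_split, splits))
    # Step 4: merge the partial results into one output.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    docs = [{"lang": "en"}, {"lang": "es"}, {"lang": "en"},
            {"lang": "en"}, {"lang": "es"}]
    print(run_job(docs))
```

Because each split is processed independently, adding workers shortens the job without changing the merged result.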
The FieldDB beta was officially launched in English and Spanish on August 1st, 2012, in Patzun, Guatemala, as an app for field linguists.
New PLOCAL (Paginated Local) storage engine. Compared with LOCAL, it is more durable (it does not use MMAP) and offers better concurrency for parallel transactions. To migrate your database to PLOCAL, follow this guide: migrate-from-local-storage-engine-to-plocal
New Hash Index type with better performance on lookups. It does not support range queries
New “transactional” SQL command to execute commands inside a transaction. This is useful with the “create edge” SQL command to keep the graph from becoming corrupted
Import now migrates RIDs, allowing a database to be imported into a different database from the original
“Breadth first” strategy added on traversing (Java and SQL APIs)
Server can limit maximum live connections (to prevent DoS)
Fetch plan support in SQL statements and in binary protocol for synchronous commands too
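For readers unfamiliar with the “breadth first” traversal strategy mentioned in the list above, here is a minimal, generic sketch in Python. It does not use the database's Java or SQL APIs; the adjacency-dict graph and the function name `breadth_first` are assumptions for illustration only. Breadth-first traversal visits all neighbours of a vertex before moving one level deeper, whereas a depth-first strategy follows one path to its end first.

```python
from collections import deque


def breadth_first(graph, start):
    """Return vertices in breadth-first order, starting from `start`.

    `graph` maps each vertex to a list of its outgoing neighbours.
    """
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()  # oldest discovered vertex first
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(breadth_first(graph, "a"))  # level by level: a, then b and c, then d
```

On this sample graph a depth-first walk would reach "d" before "c"; the breadth-first order keeps vertices grouped by their distance from the start.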
Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.
To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much more quickly than with disk-based systems like Hadoop MapReduce.
To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.
Perhaps you’ve heard about the next generation of databases roughly classified as NoSQL databases? These databases are generally much better than RDBMS at scaling, performance, and ease-of-development (e.g. in NoSQL the object-relational impedance mismatch usually disappears). Unfortunately, many talks on NoSQL are very academic and general. Not this one. This session will introduce the ideas around the so-called NoSQL movement, and we’ll learn how to leverage MongoDB (a popular open source NoSQL db) to build .NET applications using LINQ as the data access language. We’ll build out a .NET application using LINQ and MongoDB in a series of interactive demos using Visual Studio 2012 and C#.
Redis 2.6.13 has been released. It is a recommended upgrade, especially if you have experienced:
1) Strange issues with Lua scripting.
2) A reappearing master not being reconfigured when using Sentinel.
3) The server continuously trying to save after a save error.
(This version of Redis may also help with AOF and slow / busy disks and latency issues.)
* [FIX] Throttle BGSAVE attempt on saving error.
* [FIX] redis-cli: raise error on bad command line switch.
* [FIX] Redis/Jemalloc Gitignore were too aggressive.
* [FIX] Test: fix RDB test checking file permissions.
* [FIX] Sentinel: always redirect on master->slave transition.
* [FIX] Lua updated to version 5.1.5. Fixes rare scripting issues.
* [NEW] AOF: improved latency figures with slow/busy disks.
* [NEW] Sentinel: turn old master into a slave when it comes back.
* [NEW] More explicit panic message on out of memory.
* [NEW] redis-cli: --latency-history mode implemented.