Cassandra 2.0, just big

In five years, Apache Cassandra has grown into one of the most widely used NoSQL databases in the world and serves as the backbone for some of today’s most popular applications including as Facebook,Netflix,Twitter.

cassandra

 

This newest version, Cassandra 2.0 just announced, includes multiple new features. But perhaps the biggest of them is that “Cassandra 2.0 makes it easier than ever for developers to migrate from relational databases and become productive quickly.”

New features and improvements include:

  • Lightweight transactions allow ensuring operation linearizability similar to the serializable isolation level offered by relational databases, which prevents conflicts during concurrent requests
  • Triggers, which enable pushing performance-critical code close to the data it deals with, and simplify integration with event-driven frameworks like Storm
  • CQL enhancements such as cursors and improved index support
  • Improved compaction, keeping read performance from deteriorating under heavy write load
  • Eager retries to avoid query timeouts by sending redundant requests to other replicas if too much time elapses on the original request
  • Custom Thrift server implementation based on LMAX Disruptor that achieves lower message processing latencies and better throughput with flexible buffer allocation strategies

Official website

Official announce

Download

Mongo-Hadoop Adapter 1.1

The Mongo-Hadoop Adapter 1.1 have been released, it makes  easy to use Mongo databases, or mongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, Hadoop can process the data in parallel so that very large datasets can be processed quickly.

The Mongo-Hadoop adapter also includes support for Pig and Hive, which allow very sophisticated MapReduce workflows to be executed just by writing very simple scripts.

  • Pig is a high-level scripting language for data analysis and building map/reduce workflows
  • Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.

Hadoop streaming is also supported, so map/reduce functions can be written in any language besides Java. Right now the Mongo-Hadoop adapter supports streaming in Ruby, Node.js and Python.

How it Works

How the Hadoop Adapter works

  • The adapter examines the MongoDB Collection and calculates a set of splits from the data
  • Each of the splits gets assigned to a node in Hadoop cluster
  • In parallel, Hadoop nodes pull data for their splits from MongoDB (or BSON) and process them locally
  • Hadoop merges results and streams output back to MongoDB or BSON

 

FieldDB is available

FieldDB is a free, modular, open source project developed collectively by field linguists and software developers to make an expandable user-friendly app which can be used to collect, search and share your data, both online and offline. It is fundamentally an app written in 100% Javascript which runs entirely client side, backed by a NoSQL database (we are currently using CouchDB and its offline browser wrapper PouchDB alpha). It has a number of webservices which it connects to in order to allow users to perform tasks which require the internet/cloud (ie, syncing data between devices and users, sharing data publicly, running CPU intensive processes to analyze/extract/search audio/video/text). While the app was designed for “field linguists” it can be used by anyone collecting text data or collecting highly structured data where the fields on each data point require encryption or customization from user to user, and where the schema of the data is expected to evolve over the course of data collection while in the “field.”

FieldDB beta was officially launched in English and Spanish on August 1st 2012 in Patzun, Guatemala as an app for fieldlinguists.

More information about FieldDB are available here: https://github.com/OpenSourceFieldlinguistics/FieldDB

OrientDB 1.5 released

OrientDB 1.5 has been released, fix a bunch of issue and bring the following new feature and enhancement:

All the issues: https://github.com/orientechnologies/orientdb/issues?milestone=5&page=1&state=closed

  • New PLOCAL (Paginated Local) storage engine. In comparison with LOCAL it’s more durable (no usage of MMAP) and supports better concurrency on parallel transactions. To migrate your database to PLOCAL follow this guide: migrate-from-local-storage-engine-to-plocal
  • New Hash Index type with better performance on lookups. It does not support ranges
  • New “transactional” SQL command to execute commands inside a transaction. This is useful for “create edge” SQL command to avoid the graph get corrupted
  • Import now migrates RIDs allowing to import databases in a different one from the original
  • “Breadth first” strategy added on traversing (Java and SQL APIs)
  • Server can limit maximum live connections (to prevent DOS)
  • Fetch plan support in SQL statements and in binary protocol for synchronous commands too

Upgrade note: https://github.com/orientechnologies/orientdb/wiki/Upgrade

Download link: https://github.com/orientechnologies/orientdb/releases/download/1.5/orientdb-graphed-1.5.0.zip

Neo4j 1.9.2 has been released

Neo4j, version 1.9.2 is now available.

 

  • Optimize IO performance on Windows
  • Improve procedure to the set up networking for HA clusters
  • Some fixes to the REST API

Neo4j 1.9.2 is available immediately and is an easy upgrade from any other 1.9.x versions

You can download from the neo4j.org web site

Cassandra 2.0.0-beta1 have been released

The latest development release , the 2.0.0-beta1, is now available for download:

Full changes list:

  • Removed on-heap row cache (CASSANDRA-5348)
  • use nanotime consistently for node-local timeouts (CASSANDRA-5581)
  • Avoid unnecessary second pass on name-based queries (CASSANDRA-5577)
  • Experimental triggers (CASSANDRA-1311)
  • JEMalloc support for off-heap allocation (CASSANDRA-3997)
  • Single-pass compaction (CASSANDRA-4180)
  • Removed token range bisection (CASSANDRA-5518)
  • Removed compatibility with pre-1.2.5 sstables and network messages(CASSANDRA-5511)
  • removed PBSPredictor (CASSANDRA-5455)
  • CAS support (CASSANDRA-5062, 5441, 5442, 5443, 5619, 5667)
  • Leveled compaction performs size-tiered compactions in L0 (CASSANDRA-5371, 5439)
  • Add yaml network topology snitch for mixed ec2/other envs (CASSANDRA-5339)
  • Log when a node is down longer than the hint window (CASSANDRA-4554)
  • Optimize tombstone creation for ExpiringColumns (CASSANDRA-4917)
  • Improve LeveledScanner work estimation (CASSANDRA-5250, 5407)
  • Replace compaction lock with runWithCompactionsDisabled (CASSANDRA-3430)
  • Change Message IDs to ints (CASSANDRA-5307)
  • Move sstable level information into the Stats component, removing the
  • need for a separate Manifest file (CASSANDRA-4872)
  • avoid serializing to byte[] on commitlog append (CASSANDRA-5199)
  • make index_interval configurable per columnfamily (CASSANDRA-3961, CASSANDRA-5650)
  • add default_time_to_live (CASSANDRA-3974)
  • add memtable_flush_period_in_ms (CASSANDRA-4237)
  • replace supercolumns internally by composites (CASSANDRA-3237, 5123)
  • upgrade thrift to 0.9.0 (CASSANDRA-3719)
  • drop unnecessary keyspace parameter from user-defined compaction API (CASSANDRA-5139)
  • more robust solution to incomplete compactions + counters (CASSANDRA-5151)
  • Change order of directory searching for c*.in.sh (CASSANDRA-3983)
  • Add tool to reset SSTable compaction level for LCS (CASSANDRA-5271)
  • Allow custom configuration loader (CASSANDRA-5045)
  • Remove memory emergency pressure valve logic (CASSANDRA-3534)
  • Reduce request latency with eager retry (CASSANDRA-4705)
  • cqlsh: Remove ASSUME command (CASSANDRA-5331)
  • Rebuild BF when loading sstables if bloom_filter_fp_chance
  • has changed since compaction (CASSANDRA-5015)
  • remove row-level bloom filters (CASSANDRA-4885)
  • Change Kernel Page Cache skipping into row preheating (disabled by default)(CASSANDRA-4937)
  • Improve repair by deciding on a gcBefore before sending
  • out TreeRequests (CASSANDRA-4932)
  • Add an official way to disable compactions (CASSANDRA-5074)
  • Reenable ALTER TABLE DROP with new semantics (CASSANDRA-3919)
  • Add binary protocol versioning (CASSANDRA-5436)
  • Swap THshaServer for TThreadedSelectorServer (CASSANDRA-5530)
  • Add alias support to SELECT statement (CASSANDRA-5075)
  • Don’t create empty RowMutations in CommitLogReplayer (CASSANDRA-5541)
  • Use range tombstones when dropping cfs/columns from schema (CASSANDRA-5579)
  • cqlsh: drop CQL2/CQL3-beta support (CASSANDRA-5585)
  • Track max/min column names in sstables to be able to optimize slice
  • queries (CASSANDRA-5514, CASSANDRA-5595, CASSANDRA-5600)
  • Binary protocol: allow batching already prepared statements (CASSANDRA-4693)
  • Allow preparing timestamp, ttl and limit in CQL3 queries (CASSANDRA-4450)
  • Support native link w/o JNA in Java7 (CASSANDRA-3734)
  • Use SASL authentication in binary protocol v2 (CASSANDRA-5545)
  • Replace Thrift HsHa with LMAX Disruptor based implementation (CASSANDRA-5582)
  • cqlsh: Add row count to SELECT output (CASSANDRA-5636)
  • Include a timestamp with all read commands to determine column expiration(CASSANDRA-5149)
  • Streaming 2.0 (CASSANDRA-5286, 5699)
  • Conditional create/drop ks/table/index statements in CQL3 (CASSANDRA-2737)
  • more pre-table creation property validation (CASSANDRA-5693)
  • Redesign repair messages (CASSANDRA-5426)
  • Fix ALTER RENAME post-5125 (CASSANDRA-5702)
  • Disallow renaming a 2ndary indexed column (CASSANDRA-5705)
  • Rename Table to Keyspace (CASSANDRA-5613)
  • Ensure changing column_index_size_in_kb on different nodes don’t corrupt the
  • sstable (CASSANDRA-5454)
  • Move resultset type information into prepare, not execute (CASSANDRA-5649)
  • Auto paging in binary protocol (CASSANDRA-4415, 5714)
  • Don’t tie client side use of AbstractType to JDBC (CASSANDRA-4495)
  • Adds new TimestampType to replace DateType (CASSANDRA-5723, CASSANDRA-5729)

 

Spark the Open Source Future of Big Data

Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much more quickly than with disk-based systems like Hadoop MapReduce.

To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets

Spark is open source under a BSD license, so download it to check it out.

dotnetConf – Applied NoSQL in .NET

Live video from the dotnetConf

Perhaps you’ve heard about the next generation of databases roughly classified as NoSQL databases? These databases are generally much better than RDBMS at scaling, performance, and ease-of-development (e.g. in NoSQL the object-relational impedance mismatch usually disappears). Unfortunately, many talks on NoSQL are very academic and general. Not this one. This session will introduce the ideas around the so-called NoSQL movement, and we’ll learn how to leverage MongoDB (a popular open source NoSQL db) to build .NET applications using LINQ as the data access language. We’ll build out a .NET application using LINQ and MongoDB in a series of interactive demos using Visual Studio 2012 and C#.

 

Redis 2.6.13 has been released

Redis 2.6.13 has been released, it is a recommended upgrade and especially suggested if you experienced:

1) Strange issues with Lua scripting.

2) Not reconfigured reappearing master using Sentinel.

3) Server continusly trying to save on save error.

(This version of Redis may also help with AOF and slow / busy disks and latency issues.)

* [FIX] Throttle BGSAVE attempt on saving error.
* [FIX] redis-cli: raise error on bad command line switch.
* [FIX] Redis/Jemalloc Gitignore were too aggressive.
* [FIX] Test: fix RDB test checking file permissions.
* [FIX] Sentinel: always redirect on master->slave transition.
* [FIX] Lua updated to version 5.1.5. Fixes rare scripting issues.
* [NEW] AOF: improved latency figures with slow/busy disks.
* [NEW] Sentinel: turn old master into a slave when it comes back.
* [NEW] More explicit panic message on out of memory.
* [NEW] redis-cli: --latency-history mode implemented.

Download: http://redis.io/download

Light Table 0.4 has been released

Light Table 0.4 has been released and can be downloaded  here
Full Changes list include:
  • FIX: change bundle id for Mac .app
  • FIX: make the fuzzy matching take separators into account
  • FIX: setting the exclude path didn’t take effect until restart
  • FIX: remove errant print statement (#405)
  • FIX: pipe separator highlights (#406)
  • FIX: dramatically improve rendering performance.
  • FIX: correctly parse version parts to numbers for comparison.
  • FIX: set syntax needed a better error message and description (#388)
  • FIX: better searching of the PATH on windows
  • FIX: don’t fail startup if a file/folder in a workspace was deleted
  • FIX: default exclude pattern was too greedy
  • FIX: handle semi-colonless JS much better
  • FIX: remove the tab symbols from the solarized theme
  • FIX: workspace buttons no longer overflow
  • FIX: handle the no available client much more gracefully
  • ADDED: the ability to split the window into multiple tabsets
  • ADDED: you can now have multiple windows open (Cmd/Ctrl-Shift-N to open a window, Cmd/Ctrl-Shift-W to close)
  • ADDED: python eval!
  • ADDED: ipython client integration
  • ADDED: nodejs client
  • ADDED: browser tab Browser: add browser tabBrowser: refresh active browser tab
  • ADDED: browser client using chrome-devtools
  • ADDED: Magical JS VM patching for live updates through the devtools integration
  • ADDED: command grouping
  • ADDED: connect tab that now shows which clients are active
  • ADDED: you can now unset a client from an editor
  • ADDED: connect tab now has add connection that lists all available client types
  • ADDED: executing a command by name with a keybinding will prompt you with the keybinding
  • ADDED: token-based auto-complete (press tab after a character)
  • ADDED: trailing whitespace is now removed on save (use the toggle remove trailing whitespace command to disable)
  • ADDED: line-ending detection on save
  • ADDED: You can now eval any arbitrary selection, just select text and press cmd/ctrl+enter
  • ADDED: Better styling for filter lists
  • ADDED: greatly improved startup time
  • ADDED: new folder, new file, rename, and delete to workspace context menu
  • ADDED: workspaces now watch the file system for changes
  • ADDED: Inline inspectable results for Javascript
  • ADDED: Console inspectable results for Javascript
  • ADDED: A greatly improved console with source information
  • ADDED: You can now put the console in a tab via the Console: Open the console in a tab command
  • ADDED: cancelable eval for Clojure and Python
  • ADDED: editor context menu for cut/copy/paste
  • ADDED: Light Table Docs! Docs: Open Light Table's documentation
  • ADDED: Recent workspaces are remembered, added Workspace: Create new workspace
  • CHANGED: clients tab is now connect
  • CHANGED: moved to acorn for Javascript parsing instead of Esprima
  • CHANGED: completely remove JQuery for significant memory performance increases
  • UPDATED: latest codemirror

More details available here:

http://www.chris-granger.com/2013/04/28/light-table-040/