Luxnosql | Data story

UnQLite

UnQLite is an embeddable NoSQL (key/value store and document store) database engine. Unlike most other NoSQL databases, UnQLite does not have a separate server process; it reads and writes directly to ordinary disk files. A complete database with multiple collections is contained in a single disk file. The database file format is cross-platform: you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures.

More information on the official website: http://www.unqlite.org/

dotnetConf – Applied NoSQL in .NET

Live video from the dotnetConf

Perhaps you’ve heard about the next generation of databases roughly classified as NoSQL databases? These databases are generally much better than RDBMS at scaling, performance, and ease-of-development (e.g. in NoSQL the object-relational impedance mismatch usually disappears). Unfortunately, many talks on NoSQL are very academic and general. Not this one. This session will introduce the ideas around the so-called NoSQL movement, and we’ll learn how to leverage MongoDB (a popular open source NoSQL db) to build .NET applications using LINQ as the data access language. We’ll build out a .NET application using LINQ and MongoDB in a series of interactive demos using Visual Studio 2012 and C#.

 

Redis 2.6.13 has been released

Redis 2.6.13 has been released. It is a recommended upgrade, especially if you experienced:

1) Strange issues with Lua scripting.

2) A reappearing master that Sentinel did not reconfigure.

3) The server continuously trying to save after a save error.

(This version of Redis may also help with AOF and slow / busy disks and latency issues.)

* [FIX] Throttle BGSAVE attempt on saving error.
* [FIX] redis-cli: raise error on bad command line switch.
* [FIX] Redis/Jemalloc Gitignore were too aggressive.
* [FIX] Test: fix RDB test checking file permissions.
* [FIX] Sentinel: always redirect on master->slave transition.
* [FIX] Lua updated to version 5.1.5. Fixes rare scripting issues.
* [NEW] AOF: improved latency figures with slow/busy disks.
* [NEW] Sentinel: turn old master into a slave when it comes back.
* [NEW] More explicit panic message on out of memory.
* [NEW] redis-cli: --latency-history mode implemented.

Download: http://redis.io/download

Light Table 0.4 has been released

Light Table 0.4 has been released and is available for download.
The full list of changes includes:
  • FIX: change bundle id for Mac .app
  • FIX: make the fuzzy matching take separators into account
  • FIX: setting the exclude path didn’t take effect until restart
  • FIX: remove errant print statement (#405)
  • FIX: pipe separator highlights (#406)
  • FIX: dramatically improve rendering performance.
  • FIX: correctly parse version parts to numbers for comparison.
  • FIX: set syntax needed a better error message and description (#388)
  • FIX: better searching of the PATH on windows
  • FIX: don’t fail startup if a file/folder in a workspace was deleted
  • FIX: default exclude pattern was too greedy
  • FIX: handle semi-colonless JS much better
  • FIX: remove the tab symbols from the solarized theme
  • FIX: workspace buttons no longer overflow
  • FIX: handle the no available client much more gracefully
  • ADDED: the ability to split the window into multiple tabsets
  • ADDED: you can now have multiple windows open (Cmd/Ctrl-Shift-N to open a window, Cmd/Ctrl-Shift-W to close)
  • ADDED: python eval!
  • ADDED: ipython client integration
  • ADDED: nodejs client
  • ADDED: browser tab (commands: Browser: add browser tab, Browser: refresh active browser tab)
  • ADDED: browser client using chrome-devtools
  • ADDED: Magical JS VM patching for live updates through the devtools integration
  • ADDED: command grouping
  • ADDED: connect tab that now shows which clients are active
  • ADDED: you can now unset a client from an editor
  • ADDED: connect tab now has add connection that lists all available client types
  • ADDED: executing a command by name with a keybinding will prompt you with the keybinding
  • ADDED: token-based auto-complete (press tab after a character)
  • ADDED: trailing whitespace is now removed on save (use the toggle remove trailing whitespace command to disable)
  • ADDED: line-ending detection on save
  • ADDED: You can now eval any arbitrary selection, just select text and press cmd/ctrl+enter
  • ADDED: Better styling for filter lists
  • ADDED: greatly improved startup time
  • ADDED: new folder, new file, rename, and delete to workspace context menu
  • ADDED: workspaces now watch the file system for changes
  • ADDED: Inline inspectable results for Javascript
  • ADDED: Console inspectable results for Javascript
  • ADDED: A greatly improved console with source information
  • ADDED: You can now put the console in a tab via the Console: Open the console in a tab command
  • ADDED: cancelable eval for Clojure and Python
  • ADDED: editor context menu for cut/copy/paste
  • ADDED: Light Table Docs! (command: Docs: Open Light Table's documentation)
  • ADDED: Recent workspaces are remembered (new command: Workspace: Create new workspace)
  • CHANGED: clients tab is now connect
  • CHANGED: moved to acorn for Javascript parsing instead of Esprima
  • CHANGED: completely remove JQuery for significant memory performance increases
  • UPDATED: latest codemirror

More details available here:

http://www.chris-granger.com/2013/04/28/light-table-040/

LevelDB, a fast and lightweight key/value database library by Google

https://code.google.com/p/leveldb/

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

Features

  • Keys and values are arbitrary byte arrays.
  • Data is stored sorted by key.
  • Callers can provide a custom comparison function to override the sort order.
  • The basic operations are Put(key,value), Get(key), and Delete(key).
  • Multiple changes can be made in one atomic batch.
  • Users can create a transient snapshot to get a consistent view of data.
  • Forward and backward iteration is supported over the data.
  • Data is automatically compressed using the Snappy compression library.
  • External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
  • Detailed documentation about how to use the library is included with the source code.
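
To make the feature list above more concrete, here is a minimal in-memory sketch of the same semantics: keys kept in sorted order, put/get/delete, an all-or-nothing batch, a snapshot, and forward/backward iteration. It is a toy model written for illustration only. The class name ToyOrderedStore and all of its method names are hypothetical and are not LevelDB's API; persistence, compression and custom comparators are left out.

```python
import bisect


class ToyOrderedStore:
    """In-memory toy model of an ordered key/value store (not LevelDB itself)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted (bytewise) order
        self._data = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def write_batch(self, ops) -> None:
        """Apply ("put", k, v) / ("delete", k) operations all-or-nothing."""
        for op in ops:  # validate everything before touching any state
            if op[0] not in ("put", "delete"):
                raise ValueError(f"unknown operation: {op[0]!r}")
        for op in ops:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def snapshot(self):
        """Return a frozen copy: a consistent view of the data as of now."""
        return [(k, self._data[k]) for k in self._keys]

    def items(self, reverse: bool = False):
        """Iterate over (key, value) pairs in key order, forward or backward."""
        keys = list(self._keys)
        if reverse:
            keys.reverse()
        for k in keys:
            yield k, self._data[k]


store = ToyOrderedStore()
store.put(b"b", b"2")
store.put(b"a", b"1")
store.write_batch([("put", b"c", b"3"), ("delete", b"b")])
snap = store.snapshot()      # view taken before the next write
store.put(b"d", b"4")

print(list(store.items()))               # keys come back sorted
print(list(store.items(reverse=True)))   # backward iteration
print(snap)                              # unaffected by the later put
```

The snapshot here is simply a copy of the data, which is enough to show the idea of a consistent view; a real storage engine would avoid copying.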

Limitations

  • This is not a SQL database. It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
  • Only a single process (possibly multi-threaded) can access a particular database at a time.
  • There is no client-server support built into the library. An application that needs such support will have to wrap its own server around the library.

Performance

Here is a performance report (with explanations) from the run of the included db_bench program. The results are somewhat noisy, but should be enough to get a ballpark performance estimate.

Setup

We use a database with a million entries. Each entry has a 16 byte key, and a 100 byte value. Values used by the benchmark compress to about half their original size.

   LevelDB:    version 1.1
   Date:       Sun May  1 12:11:26 2011
   CPU:        4 x Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
   CPUCache:   4096 KB
   Keys:       16 bytes each
   Values:     100 bytes each (50 bytes after compression)
   Entries:    1000000
   Raw Size:   110.6 MB (estimated)
   File Size:  62.9 MB (estimated)

 

Write performance

The “fill” benchmarks create a brand new database, in either sequential, or random order. The “fillsync” benchmark flushes data from the operating system to the disk after every operation; the other write operations leave the data sitting in the operating system buffer cache for a while. The “overwrite” benchmark does random writes that update existing keys in the database.

 

   fillseq      :       1.765 micros/op;   62.7 MB/s     
   fillsync     :     268.409 micros/op;    0.4 MB/s (10000 ops)
   fillrandom   :       2.460 micros/op;   45.0 MB/s     
   overwrite    :       2.380 micros/op;   46.5 MB/s

 

Each “op” above corresponds to a write of a single key/value pair. I.e., a random write benchmark goes at approximately 400,000 writes per second.
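
The conversion from micros/op to operations per second, and to MB/s, can be checked with a few lines of arithmetic. The sketch below assumes that each operation moves one uncompressed entry (16-byte key plus 100-byte value, as in the setup) and that "MB" means 2^20 bytes. Both are assumptions inferred from the fact that they reproduce the reported figures, not something the report states; the function names are made up for this example.

```python
KEY_BYTES = 16
VALUE_BYTES = 100
ENTRY_BYTES = KEY_BYTES + VALUE_BYTES   # uncompressed size of one entry
MIB = 1024 * 1024                       # assumption: "MB" in the report is 2**20 bytes


def ops_per_second(micros_per_op: float) -> float:
    """Convert a latency in microseconds per operation to operations per second."""
    return 1_000_000 / micros_per_op


def mb_per_second(micros_per_op: float) -> float:
    """Throughput in MB/s, assuming one uncompressed entry per operation."""
    return ENTRY_BYTES * ops_per_second(micros_per_op) / MIB


# micros/op figures copied from the write benchmark above
for name, micros in [("fillseq", 1.765), ("fillsync", 268.409),
                     ("fillrandom", 2.460), ("overwrite", 2.380)]:
    print(f"{name:10s}: {ops_per_second(micros):10.0f} ops/s; "
          f"{mb_per_second(micros):5.1f} MB/s")
```

For fillrandom this gives a little over 400,000 writes per second, in line with the estimate above.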

Each “fillsync” operation costs much less (0.3 millisecond) than a disk seek (typically 10 milliseconds). We suspect that this is because the hard disk itself is buffering the update in its memory and responding before the data has been written to the platter. This may or may not be safe based on whether or not the hard disk has enough power to save its memory in the event of a power failure.

Read performance

We list the performance of reading sequentially in both the forward and reverse direction, and also the performance of a random lookup. Note that the database created by the benchmark is quite small. Therefore the report characterizes the performance of leveldb when the working set fits in memory. The cost of reading a piece of data that is not present in the operating system buffer cache will be dominated by the one or two disk seeks needed to fetch the data from disk. Write performance will be mostly unaffected by whether or not the working set fits in memory.

 

   readrandom   :      16.677 micros/op;  (approximately 60,000 reads per second)
   readseq      :       0.476 micros/op;  232.3 MB/s    
   readreverse  :       0.724 micros/op;  152.9 MB/s

 

LevelDB compacts its underlying storage data in the background to improve read performance. The results listed above were done immediately after a lot of random writes. The results after compactions (which are usually triggered automatically) are better.

 

   readrandom   :      11.602 micros/op;  (approximately 85,000 reads per second)   
   readseq      :       0.423 micros/op;  261.8 MB/s    
   readreverse  :       0.663 micros/op;  166.9 MB/s

 

Some of the high cost of reads comes from repeated decompression of blocks read from disk. If we supply enough cache to the leveldb so it can hold the uncompressed blocks in memory, the read performance improves again:

   readrandom   :       9.775 micros/op;  (approximately 100,000 reads per second before compaction)
   readrandom   :       5.215 micros/op;  (approximately 190,000 reads per second after compaction)

Oracle NoSQL Database 2.0.39 released

Oracle NoSQL Database 2.0.39 has been released. It introduces several improvements, a couple of new Oracle product integration points, and a number of important bug fixes. These new features and fixes include:

- An integration with Oracle Coherence has been provided that allows Oracle NoSQL Database to be used as a cache for Oracle Coherence applications, also allowing applications to directly access cached data from Oracle NoSQL Database. Documentation can be found at http://bit.ly/14e6jEP.

- Oracle NoSQL Database Enterprise Edition now has support for semantic technologies. Specifically, the Resource Description Framework (RDF), SPARQL query language, and a subset of the Web Ontology Language (OWL) are now supported. These capabilities are referred to as the RDF Graph feature of Oracle NoSQL Database. The RDF Graph feature provides a Java-based interface to store and query semantic data in Oracle NoSQL Database Enterprise Edition. Documentation can be found at http://bit.ly/Y7aQX4.

Find the complete list of changes in the change log.

Changelog: http://bit.ly/ZweZDS
Download: http://bit.ly/yLGVg3

VoltDB v3.2 has been released

VoltDB v3.2 has been released and can be downloaded here: http://voltdb.com/community/downloads.php

Changes include:

  • Enhanced Support for Live Schema Updates
  • Improved Performance and Resilience of Catalog Updates
  • New Return Status for Snapshot Restore
  • Change to the Default Heartbeat Timeout

 

The following issues have been fixed:

  • Automated snapshots and node failure

    It was possible for automated snapshots to silently stop occurring after a node failed and rejoined the cluster. This did not happen all the time, but could not be corrected without restarting the cluster. This issue has been corrected.

  • The sqlcmd command and stored procedure names

    Previously, the sqlcmd command line tool could not invoke a stored procedure if the procedure name started with a SQL statement keyword, such as “select” or “delete”. This issue has been corrected.

  • Enterprise Manager fails to recognize cluster changes

    In recent versions of VoltDB, it was possible for the Enterprise Manager to start a database cluster but not recognize when the database completed startup. Similarly, if a node failed to rejoin or a recover operation did not complete the Enterprise Manager might not recognize these conditions. The symptom in all cases was that the database or server icon would not stop “spinning” in the Enterprise Manager control panel. These issues are now fixed.

MemSQL ships 2.0. Scales in-memory database across hundreds of nodes, thousands of cores

MemSQL runs on 64-bit Linux and is ideally suited for machines with multi-core processors and at least 8 GB of RAM.

MemSQL’s goal was to deliver the fastest OLTP database ever. Inspired by the scale and architectures we saw at Facebook, we hoped to help every enterprise leverage in-memory technologies similar to those that leading web companies use.

Customers like Zynga and Morgan Stanley not only wanted to quickly commit transactions to the database, they also wanted instant answers to questions about how their real-time data compared to historical data. This inspired the MemSQL team to build something new – a solution that supports highly concurrent transactional and analytical workloads at Big Data scale.

Today MemSQL’s real-time analytics platform is available for download. This is the first generally available version of MemSQL that scales horizontally on commodity hardware. It provides the blazing fast performance for which MemSQL is known, and now does it at Big Data scale. Customers have deployed MemSQL across hundreds of nodes and dozens of terabytes of data, and we’ve tested at even greater volumes and velocities. (Check out our calculator to get an idea of the number of reads and writes you can perform depending on the size of your cluster.)

This is also the first version to include MemSQL Watch, a visual web-based interface for monitoring and managing your cluster. We expect this to be the beginning of our foray into real-time visualizations as many of our customers look to operationalize their analytics.

Deploying a database can be difficult, so we’ve made it as simple as possible. Download MemSQL for free on our site and take it for a spin. You’ll definitely be impressed by the performance, but you’ll also be impressed by what’s missing:

  • Batched loading – Don’t wait until the middle of the night to refresh your reports.
  • Complicated programming languages (and a limited talent pool) – Use SQL for real-time analytics.
  • An expensive, proprietary box (and a plan to rip and replace it in a few years) – Scale incrementally on commodity hardware.
  • A lengthy implementation cycle – Launch your first MemSQL instance in minutes in the cloud.

TokuDB open sourced and v7 released

http://www.tokutek.com/2013/04/announcing-tokudb-v7-open-source-and-more/

TokuDB Version 7 for MySQL and MariaDB is going open source.

The free Community Edition is fully functional and fully performant. It has all the compression you’ve come to expect from TokuDB. It has hot schema changes: no-down-time column insertion, deletion, renaming, etc., as well as index creation. It has clustering secondary keys. We are also announcing an Enterprise Edition (coming soon) with additional benefits, such as a support package and advanced backup and recovery tools.

Making TokuDB open source is a natural next step for Tokutek’s involvement in the MySQL community, in which Tokutek has already been involved in many ways.

TokuDB v7 maintains all our established advantages: fast trickle load, fast bulk load, fast range queries through clustering indexes, no fragmentation, and full MySQL/MariaDB compatibility for ease of installation.

In addition, there are plenty of other performance improvements included in this version. For starters, TokuDB v7 adds support for Direct I/O. Also, you asked for it, you got it: TokuDB v7 has significantly enhanced Engine Status information.

For details on updates to pricing and supported MySQL and MariaDB versions, please see our FAQ.

To learn more about TokuDB:

  • Download executables here.
  • The source code is available on GitHub.

Local Secondary Indexes for Amazon DynamoDB

The Amazon Web Services Blog just announced a new feature: you can now create local secondary indexes for Amazon DynamoDB tables. These indexes give you the power to query your tables in new ways, and can also increase retrieval efficiency.

What’s a Local Secondary Index?
The local secondary index model builds on DynamoDB’s existing key model.

Up until today you would have to select one of the following two primary key options when you create a table:

  • Hash - A strongly typed (string, number, or binary) value that uniquely identifies each item in a particular table. DynamoDB allows you to retrieve items by their hash keys.
  • Hash + Range - A pair of strongly typed items that collectively form a unique identifier for each item in a particular table. DynamoDB supports range queries that allow you to retrieve some or all of the items that match the hash portion of the primary key.

With today’s release we are extending the Hash + Range option with support for up to five local secondary indexes per table. Like the primary key, the indexes must be defined when you create the table. Each index references a non-primary attribute, and enables efficient retrieval using a combination of the hash key and the specified secondary key.

You can also choose to project some or all of the table’s other attributes into a secondary index. DynamoDB will automatically retrieve attribute values from the table or from the index as required. Projecting a particular attribute will improve retrieval speed and lessen the amount of provisioned throughput consumed, but will require additional storage space. Items within a secondary index are stored physically close to each other and in sorted order for fast query performance.

Show Me an Example!
Let’s say that you need to access a table containing information about heads of state. You create a table like this, with Country as the hash key and PresNumber as the range key:

   Country        PresNumber  Name               VP                Party                  Age  YearsInOffice
   United States  1           George Washington  John Adams        None                   57   6.34
   United States  2           John Adams         Thomas Jefferson  Federalist             61   4
   United States  3           Thomas Jefferson   Aaron Burr        Democratic-Republican  57   8
   United States  4           James Madison      George Clinton    Democratic-Republican  57   8
   United States  5           James Monroe       Daniel Tompkins   Democratic-Republican  58   8

With this schema, you can retrieve heads of state using a country and the ordinal number for the presidency. Now, let’s say that you want to query by Age (upon taking office) as well. You would create the table with a local secondary index (Age) like this:

   Country        Age  PresNumber  Name               VP                Party                  YearsInOffice
   United States  57   1           George Washington  John Adams        None                   6.34
   United States  61   2           John Adams         Thomas Jefferson  Federalist             4
   United States  57   3           Thomas Jefferson   Aaron Burr        Democratic-Republican  8
   United States  57   4           James Madison      George Clinton    Democratic-Republican  8
   United States  58   5           James Monroe       Daniel Tompkins   Democratic-Republican  8
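
As a rough illustration of what the Age index buys you, the sketch below models the table as a dictionary keyed by (Country, PresNumber) and the local secondary index as a list sorted by (Country, Age). This is a toy in-memory model, not the DynamoDB API: the names table, age_index and query_by_age are invented for this example, and only a few of the attributes are carried along.

```python
from bisect import bisect_left, bisect_right

# (Country, PresNumber, Name, Age) copied from the example table above
ROWS = [
    ("United States", 1, "George Washington", 57),
    ("United States", 2, "John Adams", 61),
    ("United States", 3, "Thomas Jefferson", 57),
    ("United States", 4, "James Madison", 57),
    ("United States", 5, "James Monroe", 58),
]

# Base table: primary key is (hash key = Country, range key = PresNumber).
table = {
    (country, number): {"Name": name, "Age": age}
    for country, number, name, age in ROWS
}

# Toy local secondary index: same hash key, sorted on Age instead of PresNumber.
# PresNumber is kept in each entry so the full item can be found in the table.
age_index = sorted((country, age, number) for country, number, _name, age in ROWS)


def query_by_age(country: str, min_age: int, max_age: int):
    """Return names for one country whose Age lies in [min_age, max_age]."""
    lo = bisect_left(age_index, (country, min_age, float("-inf")))
    hi = bisect_right(age_index, (country, max_age, float("inf")))
    return [table[(c, number)]["Name"] for c, _age, number in age_index[lo:hi]]


print(query_by_age("United States", 57, 57))
print(query_by_age("United States", 58, 61))
```

Because the index is sorted by the hash key first and Age second, a range of ages for one country is a single contiguous slice, which mirrors the point that items in an index are stored close together and in sorted order.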

How Do I Create and Use a Local Secondary Index?
As I noted earlier, you must create your local secondary indexes when you create the DynamoDB table. Here is how you would create them in the AWS Management Console:

DynamoDB’s existing Query API now supports the use of local secondary indexes. Your call must specify the table, the name of the index, the attributes you want to be returned, and any query conditions that you want to apply. We have examples in Java, PHP, and .NET / C#.

Costs and Provisioned Throughput
Let’s talk about the implications of local secondary indexes on the DynamoDB cost structure.

Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.

When you query a table that has one or more local secondary indexes, you need to consider two distinct cases:

For queries that use index keys and projected attributes, DynamoDB will read from the index instead of from the table and will compute the number of read capacity units accordingly. This can result in lower costs if there are fewer attributes in the index than in the table.

For index queries that read non-projected attributes, DynamoDB will need to read the table and the index. This will consume additional read capacity units.
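
The two cases can be sketched with a toy model that counts how many index entries and how many table items a query has to touch. The counters are purely illustrative and are not real read capacity units; the choice of Name as the only projected attribute, and the names build_index and query, are assumptions made for this example.

```python
# Base table keyed by (Country, PresNumber); a subset of the example data above.
TABLE = {
    ("United States", 1): {"Name": "George Washington", "Age": 57, "Party": "None"},
    ("United States", 2): {"Name": "John Adams", "Age": 61, "Party": "Federalist"},
    ("United States", 3): {"Name": "Thomas Jefferson", "Age": 57,
                           "Party": "Democratic-Republican"},
}

PROJECTED = {"Name"}  # hypothetical choice: only Name is projected into the index


def build_index(table, projected):
    """Index entries hold the key attributes plus any projected attributes."""
    entries = []
    for (country, number), item in table.items():
        entry = {"Country": country, "PresNumber": number, "Age": item["Age"]}
        entry.update({attr: item[attr] for attr in projected})
        entries.append(entry)
    return sorted(entries, key=lambda e: (e["Country"], e["Age"], e["PresNumber"]))


def query(index, table, country, age, wanted):
    """Return (rows, index_reads, table_reads) for an equality query on Age."""
    rows, index_reads, table_reads = [], 0, 0
    for entry in index:
        if entry["Country"] != country or entry["Age"] != age:
            continue
        index_reads += 1
        row = {attr: entry[attr] for attr in wanted if attr in entry}
        missing = [attr for attr in wanted if attr not in entry]
        if missing:  # non-projected attribute: the table item must be read too
            table_reads += 1
            item = table[(entry["Country"], entry["PresNumber"])]
            row.update({attr: item[attr] for attr in missing})
        rows.append(row)
    return rows, index_reads, table_reads


index = build_index(TABLE, PROJECTED)
print(query(index, TABLE, "United States", 57, ["Name"]))            # index only
print(query(index, TABLE, "United States", 57, ["Name", "Party"]))   # index + table
```

Asking only for Name is served from the index alone, while also asking for Party forces one extra table read per matching item, which is the additional cost the second case describes.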
