PostgreSQL introduced jsonb support

Binary JSON

PostgreSQL has introduced jsonb, a diamond in the crown of PostgreSQL 9.4. It is based on an elegant hash opclass for GIN, which competes with MongoDB’s performance on the contains operator.

Feature’s documentation: http://www.postgresql.org/docs/devel/static/datatype-json.html

Feature’s story: http://obartunov.livejournal.com/177247.html
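
To give a feel for the feature, here is a minimal sketch (the table, column and connection parameters are invented for illustration) of a jsonb column indexed with GIN and queried with the contains operator @>, using Python and psycopg2:

```python
# Minimal jsonb sketch; table/column names and the connection string are
# hypothetical -- adjust to your environment (requires PostgreSQL 9.4+).
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

cur.execute("CREATE TABLE docs (id serial PRIMARY KEY, body jsonb)")
# The GIN index is what accelerates the @> (contains) operator.
cur.execute("CREATE INDEX docs_body_idx ON docs USING gin (body)")
cur.execute("INSERT INTO docs (body) VALUES (%s::jsonb)",
            ('{"tags": ["postgres", "jsonb"], "views": 42}',))

# Find documents whose body contains the given JSON fragment.
cur.execute("SELECT id, body FROM docs WHERE body @> %s::jsonb",
            ('{"tags": ["jsonb"]}',))
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```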

HBase 0.98.0 has been released

HBase 0.98.0 has been released. This release includes:

  • Several new security features, such as cell visibility labels, cell ACLs, and transparent server-side encryption.
  • Significant performance improvements, such as a change to the write-ahead log threading model that provides higher transaction throughput under high load, reverse scanners, MapReduce over snapshot files, and striped compaction.

The complete list of changes in this release can be found in the release notes: http://goo.gl/y25W2h

What do you know about SQL performance?

The 3-Minute Test: What do you know about SQL performance?

“SQL-Tuning is black magic like alchemy: it consists of obscure rules, understood only by a handful of insiders.”

That is a myth. SQL databases use well-known algorithms to deliver predictable performance. It is, however, easy to write SQL queries that cannot use the most efficient algorithm and thus deliver unexpected performance.
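
As a quick, self-contained illustration of that point (using SQLite from Python’s standard library; the principle carries over to other SQL databases), wrapping an indexed column in a function is one classic way to lock the optimizer out of the efficient, index-based algorithm:

```python
# Two logically similar predicates, very different access paths.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX users_name ON users(name)")

# Index-friendly: the indexed column is compared directly.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name = 'Ann'"):
    print(row)  # reports a SEARCH using index users_name

# Index-hostile: the function call hides the column from the index.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE upper(name) = 'ANN'"):
    print(row)  # reports a full-table SCAN
```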

http://use-the-index-luke.com/3-minute-test

VoltDB 4.0

VoltDB 4.0 is now available!

The highlights of VoltDB v4.0 include:

  • Enhanced in-memory analytics capabilities with a host of new SQL features.
  • Greatly improved analytic read throughput performance.
  • Clusters can grow elastically, increasing both throughput and capacity, by adding nodes to running clusters without blocking ongoing operations.
  • Support for Groovy stored procedures, a message queue export connector, a MySQL migration utility and a host of other features.
  • Free online training at Volt University, along with Volt Vanguard certification.

Here are the details on what’s new in VoltDB v4.0. You can download it here.

Official announcement on the VoltDB blog: http://voltdb.com/announcing-voltdb-4-0-enhanced-in-memory-analytics-and-online-elasticity/

Enhanced In-Memory Analytics

VoltDB is renowned for its ability to execute very fast writes – we’ve benchmarked writes in the millions of transactions per second range, on small clusters, running on bare metal as well as cloud instances.

But fast writes without fast reads are less useful. Since its first version, VoltDB has supported transactional reads alongside writes, providing a window into fast-changing data.

At ingestion or processing time, stored procedures can transactionally perform lookups and queries as data is coming into the system, allowing for richer writes at scale. Separately, global transactional reads can trigger events, support dashboards and even live decisioning on immediate data. Mixing complex reads and writes transactionally, and at scale, has traditionally separated VoltDB from other write-heavy systems.

In 4.0, VoltDB has added both features and improved performance of analytic-focused read queries. We’re focused on helping users understand their data as soon as they have it.

First, VoltDB delivers major new SQL capabilities, now supporting SQL UNION, self/outer/explicit JOIN, CASE, HAVING, SQL IN, Group-by column functions and materialized view group-by column functions. Our SQL support is approaching SQL-92 compatibility, while also adding non-standard features to support our key use cases. For example, VoltDB can now build a materialized view that aggregates the value of a JSON field by 5-minute time windows.
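
As a rough illustration of that last example, here is a hypothetical sketch of such a view (table and column names are invented, and the exact syntax should be checked against the VoltDB 4.0 documentation); it buckets rows into 5-minute windows by integer-dividing an epoch timestamp by 300 seconds and groups on a JSON field extracted with FIELD():

```python
# Hypothetical DDL, held in a Python string; it would be applied through the
# normal VoltDB schema-update path (e.g., sqlcmd), which is not shown here.
VIEW_DDL = """
CREATE VIEW events_5min (window_id, category, total_events) AS
SELECT SINCE_EPOCH(SECOND, event_time) / 300,
       FIELD(payload, 'category'),
       COUNT(*)
FROM events
GROUP BY SINCE_EPOCH(SECOND, event_time) / 300,
         FIELD(payload, 'category');
"""
print(VIEW_DDL)
```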

Second, we’ve removed some of the transactional overhead when running many kinds of global read queries, including ad-hoc SQL. These queries are still reading a live, fully serializable view of committed data, but they’re now up to 50x faster. This directly translates into more powerful dashboards, more consumers of analytics and richer decisioning.

Online Elastic Database

As a natively clustered database, VoltDB can scale to meet the needs of almost any high-velocity application. While some users have megabytes of state, others have terabytes. While some users process hundreds of operations per second, others process millions. But what if your business is growing and you want your VoltDB cluster to grow with it? Since we shipped VoltDB 1.0, users have been asking to add nodes to the cluster without any interruption in service, to rebalance data in the background while their apps continue to work for them.

We call this feature “elasticity” and it’s shipping in VoltDB 4.0. VoltDB can now seamlessly add nodes to a running cluster, increasing storage and throughput with each new node. Since all topology changes and data movement are transactional and durable, your data is protected while the cluster is expanding or rebalancing. We’ve also carefully engineered this feature so that most customers will see no impact to their workload during expansion.

Online Operations

In addition to the new online elasticity, VoltDB also supports online catalog and schema updates.  Tables can be added or dropped. Columns, indexes and materialized views can be added, dropped and modified.  And of course, stored procedures can be added, updated, or deleted.

VoltDB upgraded its network partition and fault detection handling. In the real world, network partitions happen for a variety of reasons. Both hardware and software issues can cause a node to become unavailable, or unreachable to other nodes in a cluster. For version 4.0, VoltDB enhanced its fault detection and recovery functionality to remain available in the face of even more kinds of failure. The result is increased availability of VoltDB clusters where links might break, for example, when VoltDB nodes are running in different availability zones.

Groovy – Our First Non-Java Stored Procedure Language

In VoltDB 4.0 we’ve added the ability to code the procedure implementation in the DDL itself with inline Groovy scripts. Check out our Groovy Voter sample for a familiar example of VoltDB processing in Groovy.  We hope this is the first of many new stored procedure languages – please drop us a note if you have a favorite, or would like to contribute to this effort.

Integrations, Migrations and Miscellaneous Features

VoltDB v4.0 introduces a whole host of other features. I’ll run through them here quickly:

  • VoltDB Export allows you to transactionally push data from VoltDB into another system, similar to an ETL (extract, transform, load) process. In 3.x, we could export to systems using JDBC, as well as to flat files. VoltDB 4.0 adds a new Export connector that leverages message queues. Presently available as a beta, it lets you export to a message queue using Kafka. Feel free to contact us if you’d like early access.
  • We’ve added three new @Statistics selectors that can help identify performance hot spots in your application (see the sketch after this list). They are PROCEDUREPROFILE, identifying the percentage of execution time each procedure takes; PROCEDUREINPUT, identifying the breakdown of data flow into stored procedures; and PROCEDUREOUTPUT, identifying the result set data flow from stored procedures. These new statistics help you quickly identify which transactions are taking the most time or I/O in your application.
  • You can now get a leg up porting your MySQL database to VoltDB. Our new utility, fondly called Voltify, will extract the schema from your MySQL database and create a VoltDB catalog automatically. This utility, coupled with our high-performance CSV loader, enables you to rapidly move your MySQL database to VoltDB.
  • The VoltDB JDBC driver has been enhanced to support parameterized ad hoc SQL statements, setting query timeouts, as well as additional metadata methods.
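
As promised above, here is a sketch of pulling the new PROCEDUREPROFILE statistics from Python. It assumes the open-source VoltDB Python client (voltdbclient) and the default client port; treat the exact client API shape as an assumption and verify it against the client documentation for your version:

```python
# Query the @Statistics system procedure for the PROCEDUREPROFILE selector.
# Host, port and the client API shape are assumptions; verify locally.
from voltdbclient import FastSerializer, VoltProcedure

client = FastSerializer("localhost", 21212)
stats = VoltProcedure(client, "@Statistics",
                      [FastSerializer.VOLTTYPE_STRING,    # selector name
                       FastSerializer.VOLTTYPE_INTEGER])  # interval/delta flag
response = stats.call(["PROCEDUREPROFILE", 0])
for table in response.tables:
    print(table)
client.close()
```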

VoltDB Training

VoltDB recently rolled out a new offering, Volt University, which provides free online training on key VoltDB concepts. There are ten lessons, found at http://voltdb.com/resources/volt-university/tutorials/, that can help you get up to speed with VoltDB quickly. Additionally, should you want formal Volt Vanguard certification, we offer an official online certification course, which you can register for here: https://university.voltdb.com/

Download VoltDB 4.0

We here at VoltDB are very excited about the release and hope you are too. You can download VoltDB 4.0 here.

Couchbase Server 2.5 released

Couchbase Server 2.5 has been released. It includes the following new features and enhancements:

  • Rack Awareness (Enterprise Edition only)
  • XDCR data security (Enterprise Edition only)
  • Optimized connection management

High Availability with Rack Awareness

To ensure enterprise-class availability and reliability, master data and replicated data should be stored on different server racks. Couchbase Server 2.5 Enterprise Edition’s newly introduced Rack Awareness provides a simple, flexible and effective solution for data replication that is easy to scale and administer. With Couchbase Rack Awareness, the user can create logical groupings of Couchbase Server nodes, and replica copies of the data are automatically distributed across server nodes located on different racks. This intelligent data replication ensures that data remains available despite disruptions such as power outages, or switch or rack failures.

Rack Awareness is especially needed for applications running on public clouds, such as Amazon EC2, where customers have no control over infrastructure availability and uptime. With Couchbase Server Enterprise Edition 2.5, customers running applications on a public cloud can leverage Rack Awareness to ensure that replica data is stored on separate zones to maintain 24/7 application uptime.

Download http://www.couchbase.com/download

Hypertable version 0.9.7.16 released

Hypertable version 0.9.7.16 has been released. It brings the following changes:

  • Upgraded to C++11 compiler
  • issue 1179: Fixed insert perf problem introduced by bad commit in 0.9.7.13
  • issue 1193: Fixed split_row/end_row comparison in Range::estimate_split_row()
  • Fixed memory leak in index table mutator
  • Avoid aggressive merging during low memory mode
  • Fixed BalancePlanAuthority::change_receiver_plan_location() to properly increment generation
  • issue 1191: Fixed DEB and RPM package installation
  • Fixed alloc-dealloc-mismatch error in hypertable_ldi_select_test
  • Fixed HQL-delete test
  • Fixed Spirit parser issues
  • issue 1104: Fixed intermittent failure of issue190 test
  • issue 1123: fixed ldd.sh script
  • Got rid of INFO log message in OperationRecover::decode_state()
  • issue 1193: Replaced assert with instrumentation logging
  • issue 1189: Propagate exceptions from ~TableMutator()
  • Modified issue890 test to compile java file into build directory
  • Modified Filesystem::readdir to return vector of Dirent structures
  • Reverted “merging compactions ahead of minor compactions” commit
  • Allow arbitrary column selection for secondary index queries
  • issue 1032: Fixed COUNTER columns “wrap around” on underflow problem
  • Added NO_CACHE option to SELECT statement

Download http://hypertable.com/download/09716

Release notes: http://cdn.hypertable.com/packages/0.9.7.16/CHANGES

DeepDB

DeepDB provides simultaneous transactions and analytics (row store and column store) in the same data set, in real time. Official website: http://deep.is/

It claims to be fully transactional (ACID compliant) and introduces breakthroughs covering six fundamental attributes of its performance:

Constant-Time Indexing

  • Minimizes indexing cost, enabling highly indexed databases
  • Updates indexes in real time, both in memory and on disk
  • Uses summary indexing to achieve yottabyte scale

Segmented Column Store

  • Adds columnar attributes to table-oriented indexes
  • Embeds metadata, including statistical aggregation
  • Allows for delta updates instead of a full column rebuild

Streaming I/O

  • Massively optimized, enabling wire-speed throughput
  • Concurrent operations for updates in memory and on disk
  • Optimizations for SSD, HDD, and in-memory-only operation

Intelligent Caching

  • Eliminates on-disk tree traversals
  • Adaptive segment sizes (no fixed pages)
  • Point-read capable; retrieves only what is necessary

Adaptive Concurrency

  • Minimizes delays and wait states to maximize CPU throughput
  • Fine-grained, lightweight locking mechanisms
  • Eliminates most OS context switches

Designed for the Cloud

  • Continually optimizing system
  • Eliminates downtime for scheduled maintenance
  • Zero-touch adaptive configuration

About FoundationDB 2.0

FoundationDB 2.0 combines the power of ACID transactions with the scalability, fault tolerance, and operational elegance of distributed NoSQL databases. This release was driven by specific customer feedback for increased language support, network security, and higher-level tools for managing data within FoundationDB.

FoundationDB 2.0 adds Go and PHP to the list of languages with native FoundationDB support.

Along with the additional language and layer support, 2.0 also ships with full Transport Layer Security which encrypts all FoundationDB network traffic, enabling security and authentication between both servers and clients via a public/private key infrastructure.

Also in 2.0, monitoring improvements report more detailed information about potential low-memory scenarios even before they happen.

FoundationDB 2.0 is backward-compatible with all previous API versions, so any code that you wrote against an old version of FoundationDB will still run. There have been minimal API changes, so updating your code to the new API version will be a snap.
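
Here is a minimal sketch of that versioned-API mechanism using the Python binding (the cluster is assumed to be reachable via the default cluster file): passing 200 to api_version opts into the 2.0 API, while code written against an older release keeps working by passing its original version number.

```python
import fdb

fdb.api_version(200)  # code written against 1.x would pass e.g. 100 here
db = fdb.open()       # connects using the default cluster file

# Transactions retry automatically via the @fdb.transactional decorator.
@fdb.transactional
def set_and_get(tr):
    tr[b'hello'] = b'world'
    return tr[b'hello']

print(set_and_get(db))
```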

Download FoundationDB 2.0

Upgrade as documented here (just remember that you’ll need to upgrade both clients and servers at the same time).

More information on the Google Group.

Redis 3.0.0 beta-1 is out

Redis 3.0.0 Beta 1 (version 2.9.50) is out.

Release date: 11 Feb 2014

This is the first beta of Redis 3.0.0 (the internal version number is 2.9.50).

The following is a list of improvements in Redis 3.0, compared to Redis 2.8.

  • [NEW] Redis Cluster: a distributed implementation of a subset of Redis.
  • [NEW] New “embedded string” object encoding resulting in fewer cache misses. Big speed gain under certain workloads.
  • [NEW] WAIT command to block waiting for a write to be transmitted to the specified number of slaves (see the sketch after this list).
  • [NEW] MIGRATE connection caching. Much faster key migrations.
  • [NEW] MIGRATE new options COPY and REPLACE.
  • [NEW] CLIENT PAUSE command: stop processing client requests for a specified amount of time.
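
As a quick sketch of the new WAIT and CLIENT PAUSE commands from Python (assuming redis-py against a local server; execute_command() sends the commands verbatim, since client wrappers for them may not exist yet):

```python
import redis

r = redis.StrictRedis(host="localhost", port=6379)

r.set("key", "value")
# Block until at least 1 slave acknowledges the write, or 100 ms elapse;
# WAIT returns the number of slaves that acknowledged it.
acked = r.execute_command("WAIT", 1, 100)
print("slaves reached:", acked)

# Stop processing client requests for 1000 milliseconds.
r.execute_command("CLIENT", "PAUSE", 1000)
```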

http://redis.io/download

MongoLab on disk usage and the data structures of MongoDB

MongoLab has published two great posts that help in understanding MongoDB’s disk-space allocation and its on-disk data structures.

  1. How big is your MongoDB?
  2. Managing disk space in MongoDB

Managing disk space in MongoDB

In our previous post on MongoDB storage structure and dbStats metrics, we covered how MongoDB stores data and the differences between the dataSize, storageSize and fileSize metrics. We can now apply this knowledge to evaluate strategies for re-using MongoDB disk space.

When documents or collections are deleted, empty record blocks arise within data files. MongoDB attempts to reuse this space when possible, but it never returns this space to the file system. This behavior explains why fileSize never decreases despite deletes on a database.

If your app deletes frequently, or if your fileSize is significantly larger than the size of your data plus indexes, you can use one of the methods below to reclaim free space.

Getting your free space back

Compacting individual collections

You can compact individual collections using the compact command. This command rewrites and defragments all data in a collection, as well as all of the indexes on that collection.
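
For example, a minimal sketch from Python with pymongo (database and collection names are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["mydb"]

# Equivalent to db.runCommand({ compact: "mycollection" }) in the mongo shell.
result = db.command("compact", "mycollection")
print(result)  # e.g. {u'ok': 1.0}
```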

Important notes on compacting:

  • This operation blocks all other database activity when running and should be used only when downtime for your database is acceptable. If you are running a replica set, you can perform compaction on secondaries in order to avoid blocking the primary and use failover to make the primary a secondary before compacting it.

  • Compacting individual collections will not reduce your storage footprint on disk (i.e., your fileSize) but it will defragment the collections you compact.

Compacting one or more databases

For a single-node MongoDB deployment, you can use the db.repairDatabase() command to compact all the collections in the database. This operation rewrites all the data and indexes for each collection in the database from scratch and thereby compacts and defragments the entire database.

To compact all the databases on your server process, you can stop your mongod process and run it with the “--repair” option.
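
A short sketch of both variants (the database name is a placeholder):

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Equivalent to db.repairDatabase() in the mongo shell; blocks the database
# while it rewrites and defragments all collections and indexes.
print(client["mydb"].command("repairDatabase"))

# To repair every database on the server instead, stop mongod and restart:
#   mongod --repair [--repairpath /path/on/another/volume]
```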

Important notes on running a repair:

  • This operation blocks all other database activity when running and should be used only when downtime for your database is acceptable.

  • Running a repair requires free disk space equal to the size of your current data set plus 2 GB. You can use space in a different volume than the one your mongod is running in by specifying the “--repairpath” option.

Compacting all databases on a server by re-syncing replica set nodes

For a multi-node MongoDB deployment, you can resync a secondary from scratch to reclaim space. By resyncing each node in your replica set you effectively rewrite the data files from scratch and thereby defragment your database.

Please note that if your cluster consists of only two electable nodes, you will sacrifice high availability during the resync because the secondary is completely wiped before syncing.

If your app is sensitive to downtime, we recommend a process similar to the one we use here at MongoLab, which we call a “rolling node replacement.” This process replaces each node in your cluster in turn by bringing a new node into the cluster, replicating the data to that new node and removing the old node. In this way, your cluster can maintain the same level of redundancy during the compaction as during normal operations.

A tip about efficiently using space

usePowerOf2Sizes

Setting the usePowerOf2Sizes option is a proactive approach to reusing space in collections that experience frequent document moves or deletions. This option supersedes the default padding-factor mechanism and reduces the impact of fragmentation within the collection by allocating additional space for each document in intervals that follow the powers of 2. Setting this option for a specific collection makes it less likely that documents in that collection need to be moved when they grow in size, less likely that a document will need to be moved more than once in its lifetime, and more likely that space left by moved documents can be reused by new or other moved documents.
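
The option is set per collection via the collMod command; a minimal sketch with pymongo (names are placeholders):

```python
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["mydb"]

# Equivalent to db.runCommand({ collMod: "mycollection",
#                               usePowerOf2Sizes: true }) in the mongo shell.
print(db.command("collMod", "mycollection", usePowerOf2Sizes=True))
```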

How big is your MongoDB?

As your MongoDB grows in size, information from the db.stats() diagnostic command (or the database “Stats” tab in our management portal) becomes increasingly helpful for evaluating hardware requirements.

We frequently get questions about the dataSize, storageSize and fileSize metrics, so we want to help developers better understand how MongoDB storage works and what these particular metrics mean.

MongoDB storage structure basics

First, we’ll go over the basics of how MongoDB stores your data.

Data files

Every MongoDB instance consists of a namespace file,  journal files and data files. For our discussion, we’ll only be focusing on data files, since that is where all of the data and indexes for your database reside.

Data files store BSON documents, indexes, and MongoDB-generated metadata in structures called extents. Each data file is made up of multiple extents.

Extents

Extents are logical containers within data files used to store documents and indexes.

[Diagram: data files and extents]

The above diagram illustrates the relationship between data files and extents. Note:

  • Data and indexes are each contained in their own sets of extents; no extent will ever contain content for more than one collection
  • Data and indexes are never contained within the same extent
  • The data and indexes for a collection will usually span multiple extents
  • When a new extent is needed, MongoDB will attempt to use available space within current data files. If space cannot be found, MongoDB will create new data files.

Metrics from db.stats()

Now that we understand the basics of how MongoDB storage is organized, we can explore metrics commonly examined with db.stats(): dataSize, storageSize and fileSize.
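
A quick way to pull the three metrics discussed below, sketched with pymongo (the database name is a placeholder):

```python
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["mydb"]
stats = db.command("dbstats")  # same data as db.stats() in the mongo shell
for key in ("dataSize", "storageSize", "fileSize"):
    print(key, stats.get(key))
```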

dataSize

[Diagram: MongoDB dbStats dataSize]

The dataSize metric is the sum of the sizes (in bytes) of all the documents and padding stored in the database.

While dataSize does decrease when you delete documents, dataSize does not decrease when documents shrink because the space used by the original document has already been allocated (to that particular document) and cannot be used by other documents.

Alternatively, if a user updates a document with more data, dataSize will remain the same as long as the new document fits within its originally padded pre-allocated space.

storageSize

[Diagram: MongoDB dbStats storageSize]

The storageSize metric is equal to the size (in bytes) of all the data extents in the database. This number is larger than dataSize because it includes yet-unused space (in data extents) and space vacated by deleted or moved documents within extents.

The storageSize does not decrease as you remove or shrink documents.

fileSize

[Diagram: MongoDB dbStats fileSize]

The fileSize metric is equal to the size (in bytes) of all the data extents, index extents and yet-unused space (in data files) in the database. This metric represents the storage footprint of your database on disk. fileSize is larger than storageSize because it includes index extents and yet-unused space in data files.

While fileSize does decrease when you delete a database, fileSize does not decrease as you remove collections, documents or indexes.