SQL or NoSQL – understanding the underlying issues

I recently tried to explain that it is not one or the other: once again, "SQL or NoSQL?" is not the question, nor the choice to be made. Instead, it is the underlying issues that have to be understood and used to drive your choice.

  • Ability to scale

If your application can no longer serve its users, however good and smart it used to be, it is no longer working. So scaling, through the techniques of clustering, sharding and distributed processing, has become a must: a requirement that few RDBMSs have been able to meet. Historical reasons, the old ways, are obviously responsible: traditionally, the SQL database ran on a single machine (one big server with the fastest CPU available and all the RAM you could afford). Before scaling solutions were available, performance issues were tackled with caching techniques (memcached was created in 2003), but the problem stays the same: if your application and service stop serving their users, it is game over.
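To make "sharding" concrete, here is a minimal sketch of hash-based shard routing, assuming a fixed number of shards and MD5 as the stable hash; the function names `shard_for` and `route` are hypothetical and not taken from any particular database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer; Python's built-in hash()
    # is randomized per process for strings, so it is not stable enough here.
    return int.from_bytes(digest[:8], "big") % num_shards


def route(keys, num_shards):
    """Group keys by the shard that should store them."""
    placement = {i: [] for i in range(num_shards)}
    for key in keys:
        placement[shard_for(key, num_shards)].append(key)
    return placement


if __name__ == "__main__":
    users = [f"user:{n}" for n in range(10)]
    for shard, shard_keys in route(users, 3).items():
        print(shard, shard_keys)
```

Modulo placement is the simplest scheme, but changing the shard count remaps most keys, which is why production systems usually prefer consistent hashing or range-based partitioning.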

  • ACID – transactional database

Most applications do not need transactions: the ability for a single process to perform multiple data manipulations and then either commit this set of operations as a whole or cancel all of them at any step, thus rolling back to the initial state of the data (before your program started). This guarantee holds for every program (and program instance) accessing the database concurrently. This powerful but complex set of features is what provides so-called consistency and integrity. As I said, most applications do not need transactions, and most NoSQL databases are non-ACID and do not support them.
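The all-or-nothing behaviour described above can be sketched with Python's built-in `sqlite3` module; the `accounts` table and the `transfer` function are hypothetical examples. Using the connection as a context manager commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts: both updates commit, or neither does."""
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # Raising here cancels the debit already performed above.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)       # succeeds: both updates are committed
try:
    transfer(conn, "alice", "bob", 500)  # fails: the debit is rolled back
except ValueError:
    pass
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```

Without a transaction, the failed transfer would have left Alice's account debited with no matching credit to Bob.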

  • Data model

Traditional RDBMSs rely on the relational model, which can be overly restrictive. When modelling complex data, a strong relational model takes skill and time to create, maintain and document (for knowledge transfer). In practice, the relational data model can limit your future development, since a relational model is not easy to change. NoSQL solutions provide other data structures, such as documents, graphs and key-value pairs, that enable non-relational data models. To make a long story short, the data model (relational or not) will not make your design any easier (it remains highly critical), but it may eventually make its implementation easier.
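As an illustration of the difference, the sketch below takes the same order data in a normalized, relational shape (two tables joined by `order_id`) and folds it into one nested document per order, as a document store would hold it. The field names are hypothetical.

```python
# Relational shape: two normalized "tables" linked by a foreign key (order_id).
orders = [{"order_id": 1, "customer": "alice"}]
order_lines = [
    {"order_id": 1, "sku": "A-1", "qty": 2},
    {"order_id": 1, "sku": "B-7", "qty": 1},
]


def to_documents(orders, order_lines):
    """Denormalize orders and their lines into one nested document per order."""
    lines_by_order = {}
    for line in order_lines:
        lines_by_order.setdefault(line["order_id"], []).append(
            {"sku": line["sku"], "qty": line["qty"]}
        )
    return [
        {
            "_id": order["order_id"],
            "customer": order["customer"],
            "lines": lines_by_order.get(order["order_id"], []),
        }
        for order in orders
    ]


docs = to_documents(orders, order_lines)
# In a schemaless document store, a new attribute can be added to just the
# documents that need it, without altering a table definition first.
docs[0]["gift_wrap"] = True
```

The design work (what belongs in one document, what gets referenced) is still yours; only the implementation of that design changes.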

 

MongoDB 2.6 released

MongoDB 2.6 has been released. Its primary target is a set of major new features, but it also improves performance.

Performance improvements:

  • more efficient use of network resources
  • oplog processing is 75% faster
  • significantly improved performance for several classes of scan, sort, $in and $all operations
  • bulk write operators improve update performance by as much as 5x

Feature improvements:

  • Text Search Integration
  • Insert and Update Improvements
  • A new write protocol integrates write operations with write concerns (the protocol also provides improved support for bulk operations)
  • A new authorization model that provides the ability to create custom User-Defined Roles and the ability to specify user privileges at a collection-level granularity.
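To show what "integrating writes with write concerns" means, here is a sketch that builds an `update` write command document of the kind introduced in 2.6, assuming the command layout `update` / `updates` (each entry with `q`, `u`, `upsert`, `multi`) / `ordered` / `writeConcern`. The helper name is hypothetical; in practice a driver builds and sends this document for you.

```python
def build_update_command(collection, updates, w=1, ordered=True):
    """Build one `update` write command carrying several updates and a write concern."""
    return {
        # The command name must come first; dicts keep insertion order.
        "update": collection,
        "updates": [
            {"q": query, "u": change, "upsert": False, "multi": multi}
            for query, change, multi in updates
        ],
        "ordered": ordered,        # when True, stop at the first failing update
        "writeConcern": {"w": w},  # one acknowledgement rule for the whole batch
    }


cmd = build_update_command(
    "users",
    [
        ({"name": "alice"}, {"$set": {"active": True}}, False),
        ({"plan": "free"}, {"$inc": {"logins": 1}}, True),
    ],
    w="majority",
)
```

Batching several updates into one command, with a single write concern, is what lets bulk operations avoid one round trip per write.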

Full release notes

PostgreSQL introduced jsonb support

Binary JSON

PostgreSQL has introduced jsonb, a diamond in the crown of PostgreSQL 9.4. It is based on an elegant hash opclass for GIN, which competes with MongoDB's performance on the contains operator.
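The "contains" operator is jsonb's `@>`, the kind of query a GIN index accelerates. The sketch below is a simplified Python model of its semantics, not PostgreSQL code: objects must contain every key of the right-hand side with a contained value, arrays ignore order and duplicates, and scalars must be equal. It deliberately omits jsonb's special case that lets an array contain a bare primitive value.

```python
def jsonb_contains(left, right):
    """Simplified model of PostgreSQL's `left @> right` for parsed JSON values."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key of `right` must exist in `left` with a contained value.
        return all(k in left and jsonb_contains(left[k], v) for k, v in right.items())
    if isinstance(left, list) and isinstance(right, list):
        # Order and duplicates are ignored: each element of `right` must be
        # contained in some element of `left`.
        return all(any(jsonb_contains(item, r) for item in left) for r in right)
    if isinstance(left, (dict, list)) or isinstance(right, (dict, list)):
        return False  # mismatched structure (object vs array vs scalar)
    if isinstance(left, bool) or isinstance(right, bool):
        return left is right  # JSON booleans never equal numbers
    return left == right  # strings, numbers and null compare by value


doc = {"name": "pg", "tags": ["json", "gin"], "meta": {"ver": 9.4}}
print(jsonb_contains(doc, {"tags": ["gin"]}))    # True
print(jsonb_contains(doc, {"tags": ["mongo"]}))  # False
```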

Feature documentation: http://www.postgresql.org/docs/devel/static/datatype-json.html

Feature story: http://obartunov.livejournal.com/177247.html

DeepDB

DeepDB provides simultaneous transactions and analytics (row store and column store) on the same data set, in real time. Official website: http://deep.is/

It claims to be fully transactional (ACID compliant) and to introduce breakthroughs in six fundamental attributes of performance:

 

Constant-Time Indexing

  • Minimizes indexing cost, enabling highly indexed databases
  • Updates indexes in real time, in memory and on disk
  • Uses summary indexing to achieve yottabyte scale

Segmented Column Store

  • Adds columnar attributes to table-oriented indexes
  • Embeds metadata, including statistical aggregation
  • Allows delta updates instead of a full column rebuild

Streaming I/O

  • Massively optimized, enabling wire-speed throughput
  • Concurrent operations for updates in memory and on disk
  • Optimizations for SSD, HDD and in-memory-only operation

Intelligent Caching

  • Eliminates on-disk tree traversals
  • Adaptive segment sizes (no fixed pages)
  • Point-read capable, retrieves only what is necessary

Adaptive Concurrency

  • Minimizes delays and wait states to maximize CPU throughput
  • Fine-grained lightweight locking mechanisms
  • Eliminates most OS context switches

Designed for the Cloud

  • Continually optimizing system
  • Eliminates downtime for scheduled maintenance
  • Zero-touch adaptive configuration

 

About FoundationDB 2.0

FoundationDB 2.0 combines the power of ACID transactions with the scalability, fault tolerance, and operational elegance of distributed NoSQL databases. This release was driven by specific customer feedback for increased language support, network security, and higher-level tools for managing data within FoundationDB.

FoundationDB 2.0 adds Go and PHP to the list of languages with native FoundationDB support.

Along with the additional language and layer support, 2.0 also ships with full Transport Layer Security which encrypts all FoundationDB network traffic, enabling security and authentication between both servers and clients via a public/private key infrastructure.

Also in 2.0, monitoring improvements report more detailed information about potential low-memory scenarios even before they happen.

FoundationDB 2.0 is backward-compatible with all previous API versions, so any code you wrote against an older version of FoundationDB will still run; there have been minimal API changes, so updating your code to the new API version will be a snap.

Download FoundationDB 2.0

Upgrade as documented here (just remember that you’ll need to upgrade both clients and servers at the same time).

More information on the Google Group

Redis 3.0.0 beta-1 is out

Redis 3.0.0 Beta 1 (version 2.9.50) is out.

Release date: 11 Feb 2014

This is the first beta of Redis 3.0.0 (its official version number is 2.9.50).

The following is a list of improvements in Redis 3.0, compared to Redis 2.8.

  • [NEW] Redis Cluster: a distributed implementation of a subset of Redis.
  • [NEW] New “embedded string” object encoding, resulting in fewer cache misses. Big speed gain under certain workloads.
  • [NEW] WAIT command to block waiting for a write to be transmitted to the specified number of slaves.
  • [NEW] MIGRATE connection caching. Much faster key migrations.
  • [NEW] MIGRATE new options COPY and REPLACE.
  • [NEW] CLIENT PAUSE command: stop processing client requests for a specified amount of time.
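For readers who want to see what the new commands look like on the wire, here is a minimal sketch that frames them as RESP requests (an array of bulk strings), assuming the argument forms `WAIT <numslaves> <timeout-ms>` and `CLIENT PAUSE <timeout-ms>`. The `encode_command` helper is hypothetical; a real client library does this framing for you.

```python
def encode_command(*args) -> bytes:
    """Frame a Redis command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Block until previous writes reach at least 1 slave, or 100 ms elapse.
wait_cmd = encode_command("WAIT", 1, 100)
# Stop processing client requests for 500 ms.
pause_cmd = encode_command("CLIENT", "PAUSE", 500)
print(wait_cmd)  # b'*3\r\n$4\r\nWAIT\r\n$1\r\n1\r\n$3\r\n100\r\n'
```

WAIT replies with the number of slaves that acknowledged the writes, so the caller decides what to do if fewer than requested did so before the timeout.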

http://redis.io/download

hRaven v0.9.8

The @twitterhadoop team just released hRaven v0.9.8

hRaven collects run time data and statistics from map reduce jobs running on Hadoop clusters and stores the collected job history in an easily queryable format. For the jobs that are run through frameworks (Pig or Scalding/Cascading) that decompose a script or application into a DAG of map reduce jobs for actual execution, hRaven groups job history data together by an application construct. This allows for easier visualization of all of the component jobs’ execution for an application and more comprehensive trending and analysis over time.
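The grouping idea can be sketched in a few lines: job records that share a user, application and run are treated as one unit, and per-run totals give a trend over time. The record fields and function names below are hypothetical and do not reflect hRaven's actual HBase schema or API.

```python
from collections import defaultdict

# Hypothetical job records; field names are illustrative, not hRaven's schema.
jobs = [
    {"job_id": "job_1", "user": "etl", "app": "daily_rollup.pig", "run_id": 1392076800, "runtime_s": 120},
    {"job_id": "job_2", "user": "etl", "app": "daily_rollup.pig", "run_id": 1392076800, "runtime_s": 45},
    {"job_id": "job_3", "user": "etl", "app": "daily_rollup.pig", "run_id": 1392163200, "runtime_s": 150},
]


def group_by_run(jobs):
    """Group the map reduce jobs produced by one run of an application."""
    runs = defaultdict(list)
    for job in jobs:
        runs[(job["user"], job["app"], job["run_id"])].append(job["job_id"])
    return dict(runs)


def runtime_trend(jobs, user, app):
    """Total runtime per run of an application, ordered by run id."""
    totals = defaultdict(int)
    for job in jobs:
        if job["user"] == user and job["app"] == app:
            totals[job["run_id"]] += job["runtime_s"]
    return sorted(totals.items())


print(runtime_trend(jobs, "etl", "daily_rollup.pig"))
```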

 
Requirements
  • Apache HBase (0.94+) – a running HBase cluster is required for the hRaven data storage
  • Apache Hadoop – hRaven currently supports collection of job data on specific versions of Hadoop:
    • CDH up to CDH3u5, Hadoop 1.x up to MAPREDUCE-1016
    • Hadoop 1.x post MAPREDUCE-1016 and Hadoop 2.0 are supported in versions 0.9.4 onwards

https://github.com/twitter/hraven

 

 

Twemproxy v0.3.0 has been released

twemproxy v0.3.0 is out: bug fixes and support for SmartOS (Solaris) and BSD (OS X).

twemproxy (pronounced “two-em-proxy”), aka nutcracker is a fast and lightweight proxy for memcached and redis protocol. It was primarily built to reduce the connection count on the backend caching servers.

Features

  • Fast.
  • Lightweight.
  • Maintains persistent server connections.
  • Keeps connection count on the backend caching servers low.
  • Enables pipelining of requests and responses.
  • Supports proxying to multiple servers.
  • Supports multiple server pools simultaneously.
  • Shards data automatically across multiple servers.
  • Implements the complete memcached ASCII and redis protocol.
  • Easy configuration of server pools through a YAML file.
  • Supports multiple hashing modes including consistent hashing and distribution.
  • Can be configured to disable nodes on failures.
  • Observability through stats exposed on stats monitoring port.
  • Works with Linux, *BSD, OS X and Solaris (SmartOS)
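Consistent hashing is what lets a proxy like twemproxy add or remove a cache server without remapping almost every key. Below is a minimal sketch of a hash ring with virtual nodes, assuming MD5 and 100 points per server; it illustrates the technique in general and is not twemproxy's exact ketama implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """32-bit position on the ring, taken from the first 4 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Consistent hashing ring with several virtual points per server."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (_hash(f"{server}-{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]
ring3 = HashRing(servers)
ring4 = HashRing(servers + ["cache-d:11211"])
keys = [f"key:{i}" for i in range(1000)]
moved = [k for k in keys if ring3.server_for(k) != ring4.server_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
```

When a fourth server joins, only the keys that now land on its points move (roughly a quarter here), and every moved key goes to the new server; a plain `hash % N` scheme would instead reshuffle most keys.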

 

More details and source code available here: https://github.com/twitter/twemproxy

Dex, the Index Bot for MongoDB


Dex is a MongoDB performance tuning tool that compares queries to the available indexes in the queried collection(s) and generates index suggestions based on simple heuristics. Currently you must provide a connection URI for your database.

Dex uses the URI you provide as a helpful way to determine when an index is recommended. Dex does not take existing indexes into account when actually constructing its ideal recommendation.

Currently, Dex only recommends complete indexes, not partial indexes. Dex ignores partial indexes that the query may already use and, if no complete index is found, recommends a better one. Dex recommends partially-ordered indexes according to a rule of thumb:

Your index field order should first answer:

  1. Equivalent value checks
  2. Sort clauses
  3. Range value checks ($in, $nin, $lt/gt, $lte/gte, etc.)

Note that your data cardinality may warrant a different order than the suggested indexes.
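The rule of thumb can be sketched as a small function that orders the fields of a MongoDB-style query: equality checks first, then sort fields, then range checks. This is a hypothetical helper written for illustration, not Dex's code; the query is a plain dict and the sort is a list of `(field, direction)` pairs.

```python
RANGE_OPS = {"$in", "$nin", "$lt", "$gt", "$lte", "$gte"}


def suggest_index(query: dict, sort: list) -> list:
    """Order index fields: equality checks, then sort clauses, then range checks."""
    equality, ranges = [], []
    for field, cond in query.items():
        is_range = isinstance(cond, dict) and any(op in RANGE_OPS for op in cond)
        (ranges if is_range else equality).append(field)

    index = [(field, 1) for field in equality]
    for field, direction in sort:
        if field not in equality:
            index.append((field, direction))
    for field in ranges:
        # Skip range fields already placed by the sort clause.
        if all(field != placed for placed, _ in index):
            index.append((field, 1))
    return index


print(suggest_index({"status": "active", "age": {"$gte": 21}}, [("created", -1)]))
# [('status', 1), ('created', -1), ('age', 1)]
```

As the note above says, cardinality can justify a different order; the function only encodes the default heuristic.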

https://github.com/mongolab/dex

Mesos 0.13.0 released

Mesos 0.13 has been released; it fixes many bugs and includes the following improvements:

  • [MESOS-46] – Refactor MasterTest to use fixture
  • [MESOS-134] – Add Python documentation
  • [MESOS-140] – Unrecognized command line args should fail the process
  • [MESOS-242] – Add more tests to Dominant Share Allocator
  • [MESOS-305] – Inform the frameworks / slaves about a master failover
  • [MESOS-346] – Improve OSX configure output when deprecated headers are present.
  • [MESOS-360] – Mesos jar should be built for java 6
  • [MESOS-409] – Master detector code should stat nodes before attempting to create
  • [MESOS-472] – Separate ResourceStatistics::cpu_time into ResourceStatistics::cpu_user_time and ResourceStatistics::cpu_system_time.
  • [MESOS-493] – Expose version information in http endpoints
  • [MESOS-503] – Master should log LOST messages sent to the framework
  • [MESOS-526] – Change slave command line flag from ‘safe’ to ‘strict’
  • [MESOS-602] – Allow Mesos native library to be loaded from an absolute path
  • [MESOS-603] – Add support for better test output in newer versions of autotools

Download the most recent stable release: 0.13.0. (Release Notes)