Postgres Outperforms MongoDB and Ushers in New Developer Reality

According to EnterpriseDB’s recent benchmark, Postgres outperforms MongoDB and ushers in a new developer reality.

Postgres not only outperformed MongoDB on raw performance, it also beat MongoDB on data size requirements by approximately 25%.

EDB found that Postgres outperforms MongoDB in selecting, loading and inserting complex document data in key workloads involving 50 million records:

  • Ingestion of high volumes of data was approximately 2.1 times faster in Postgres
  • MongoDB consumed 33% more disk space
  • Data inserts took almost 3 times longer in MongoDB
  • Data selection took more than 2.5 times longer in MongoDB than in Postgres

Find the full article here

The benchmark tool is available on GitHub: https://github.com/EnterpriseDB/pg_nosql_benchmark
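To give a feel for the kind of workload being measured, here is a minimal sketch (not EDB’s pg_nosql_benchmark itself) of inserting and reading back the same JSON document in Postgres (as jsonb) and in MongoDB; the database, table, collection, and field names are illustrative.

```python
# Minimal sketch of the document insert/select pattern the benchmark exercises.
# Assumes a local Postgres and MongoDB; all names below are illustrative.
import json
import psycopg2
from pymongo import MongoClient

doc = {"name": "customer-42", "status": "active", "orders": [1, 2, 3]}

# --- Postgres: store the document in a JSONB column and query it back ---
pg = psycopg2.connect("dbname=json_tables user=postgres")
with pg, pg.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS json_table (data jsonb)")
    cur.execute("INSERT INTO json_table (data) VALUES (%s::jsonb)", (json.dumps(doc),))
    cur.execute("SELECT data FROM json_table WHERE data->>'name' = %s", ("customer-42",))
    print(cur.fetchone())

# --- MongoDB: the equivalent insert and find on a collection ---
mongo = MongoClient("mongodb://localhost:27017")
coll = mongo["benchmark"]["json_table"]
coll.insert_one(dict(doc))
print(coll.find_one({"name": "customer-42"}))
```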

IBM’s new Power8 chip technology unveiled

IBM Unveils Power8 Chip As Open Hardware. Google and other OpenPower Foundation partners have expressed interest in IBM’s Power8 chip designs and server motherboard specs, since Power8 has been designed with specific big-data handling characteristics. It is, for example, an eight-threaded processor, meaning each of the 12 cores in a CPU will coordinate the processing of eight sets of instructions at a time, a total of 96 processes. “Process” here is to be understood as a set of related instructions making up a discrete process within a program. By designating sections of an application that can run as a process and coordinating the results, a chip can accomplish more work than a single-threaded chip.

By licensing technology to partners, IBM is borrowing a tactic used by ARM in the market for chips used in smartphones and tablets. But the company faces an uphill battle.

More information:

http://openpowerfoundation.org/

http://bits.blogs.nytimes.com/

Dex, the Index Bot for MongoDB

Dex, the Index Bot

Dex is a MongoDB performance tuning tool that compares queries to the available indexes in the queried collection(s) and generates index suggestions based on simple heuristics. Currently you must provide a connection URI for your database.

Dex uses the URI you provide as a helpful way to determine when an index is recommended. Dex does not take existing indexes into account when actually constructing its ideal recommendation.

Currently, Dex only recommends complete indexes, not partial indexes. Dex ignores partial indexes that may already serve the query in favor of recommending a better, complete index. Dex recommends partially-ordered indexes according to a rule of thumb:

Your index field order should first answer:

  1. Equivalent value checks
  2. Sort clauses
  3. Range value checks ($in, $nin, $lt/$gt, $lte/$gte, etc.)

Note that your data cardinality may warrant a different order than the suggested indexes.
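As a hypothetical illustration of that rule of thumb (this is not Dex’s output), consider a query with an equality check on status, a sort on created_at, and a range check on total; the suggested compound index would order the fields accordingly.

```python
# Hypothetical example of the equality -> sort -> range ordering rule;
# collection and field names are invented for illustration.
from pymongo import MongoClient, ASCENDING, DESCENDING

coll = MongoClient()["shop"]["orders"]

# Query: equality check on "status", sort on "created_at", range check on "total".
matching = list(coll.find({"status": "shipped", "total": {"$gte": 100}})
                    .sort("created_at", DESCENDING))

# Index the rule of thumb suggests: equality field first,
# then the sort field, then the range field last.
coll.create_index([("status", ASCENDING),
                   ("created_at", DESCENDING),
                   ("total", ASCENDING)])
```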

https://github.com/mongolab/dex

Cassandra performance review

 

Original article available here

 

Four years ago, well before starting DataStax, I evaluated the then-current crop of distributed databases and explained why I chose Cassandra. In a lot of ways, Cassandra was the least mature of the options, but I chose to take a long view and wanted to work on a project that got the fundamentals right; things like documentation and distributed tests could come later.

 

2012 saw that validated in a big way, as the most comprehensive NoSQL benchmark to date was published at the VLDB conference by researchers at the University of Toronto. They concluded,

In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from 1 to 12 nodes.

As a sample, here are the throughput results from the mixed reads, writes, and (sequential) scans workload.

I encourage you to take a few minutes to skim the full results.

There are both architectural and implementation reasons for Cassandra’s dominating performance here. Let’s get down into the weeds and see what those are.

Architecture

Cassandra incorporates a number of architectural best practices that affect performance. None are unique to Cassandra, but Cassandra is the only NoSQL system that incorporates all of them.

Fully distributed: Every Cassandra machine handles a proportionate share of every activity in the system. There are no special cases like the HDFS namenode or MongoDB mongos that require special treatment or special hardware to avoid becoming a bottleneck. And with every node the same, Cassandra is far simpler to install and operate, which has long-term implications for troubleshooting.

Log-structured storage engine: A log-structured engine that avoids overwrites to turn updates into sequential i/o is essential both on hard disks (HDD) and solid-state disks (SSD). On HDD, because the seek penalty is so high; on SSD, to avoid write amplification and disk failure. This is why you see MongoDB performance go through the floor as the dataset size exceeds RAM.
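To make the append-only idea concrete, here is a toy sketch (not Cassandra’s actual engine) of a log-structured write path: writes go to an append-only commit log and an in-memory table, and are periodically flushed to immutable sorted runs instead of overwriting data in place.

```python
# Toy log-structured write path: append-only commit log + in-memory memtable,
# flushed to immutable sorted runs ("sstables"). Purely illustrative.
import json

class ToyLSM:
    def __init__(self, log_path="commitlog.jsonl"):
        self.memtable = {}
        self.log = open(log_path, "a")          # sequential appends only
        self.sstables = []                      # flushed, immutable runs

    def put(self, key, value):
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()                        # durability comes from the log
        self.memtable[key] = value              # overwrite happens only in RAM

    def flush(self):
        # Write the memtable out as a new immutable, sorted run; never rewrite old runs.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):     # newest run wins
            if key in run:
                return run[key]
        return None
```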

Tight integration with its storage engine: Voldemort and Riak support pluggable storage engines, which both limits them to a lowest-common-denominator of key/value pairs, and limits the optimizations that can be done with the distributed replication engine.

Locally-managed storage: HBase has an integrated, log-structured storage engine, but relies on HDFS for replication instead of managing storage locally. This means HBase is architecturally incapable of supporting Cassandra-style optimizations like putting the commitlog on a separate disk, or mixing SSD and HDD in a single cluster with appropriate data pinned to each.

Implementation

An architecture is only as good as its implementation. For the first years after Cassandra’s open-sourcing as an Apache project, every release was a learning experience. 0.3, 0.4, 0.5, 0.6, each attracted a new wave of users that exposed some previously unimportant weakness. Today, we estimate there are over a thousand production deployments of Cassandra, the most for any scalable database. Some are listed here. To paraphrase ESR, “With enough eyes, all performance problems are obvious.”

What are some implementation details relevant to performance? Let’s have a look at some of the options.

MongoDB

MongoDB can be a great alternative to MySQL, but it’s not really appropriate for the scale-out applications targeted by Cassandra. Still, as early members of the NoSQL category, the two do draw comparisons.

One important limitation in MongoDB is database-level locking. That is, only one writer may modify a given database at a time. Support for collection-level (a set of documents, analogous to a relational table) locking is planned. With either database- or collection-level locking, other writers or readers are locked out. Even a small number of writes can produce stalls in read performance.

Cassandra uses advanced concurrent structures to provide row-level isolation without locking. Cassandra even eliminated the need for row-level locks for index updates in the recent 1.2 release.

A more subtle MongoDB limitation is that when adding or updating a field in a document, the entire document must be re-written. If you pre-allocate space for each document, you can avoid the associated fragmentation, but even with pre-allocation updating your document gets slower as it grows.
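One workaround commonly used for this is to pad documents at creation time so that later growth happens in already-reserved space; a hedged sketch, with invented collection and field names:

```python
# Hedged sketch of document pre-allocation to limit move-on-growth,
# as discussed above. Collection and field names are illustrative.
from pymongo import MongoClient

events = MongoClient()["metrics"]["daily_events"]

# Create the document with a throwaway padding field so the on-disk record
# is allocated at (roughly) its final size...
doc_id = events.insert_one({"day": "2013-01-15",
                            "samples": [],
                            "padding": "x" * 4096}).inserted_id

# ...then drop the padding and grow into the reserved space with later updates.
events.update_one({"_id": doc_id}, {"$unset": {"padding": ""}})
events.update_one({"_id": doc_id}, {"$push": {"samples": {"t": 0, "value": 42}}})
```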

Cassandra’s storage engine only appends updated data; it never has to re-write or re-read existing data. Thus, updates to a Cassandra row or partition stay fast as your dataset grows.

Riak

Riak presents a document-based data model to the end user, but under the hood it maps everything to a key/value storage API. Thus, like MongoDB, updating any field in a document requires rewriting the whole thing.

However, Riak does emphasize the use of log-structured storage engines. Both the default BitCask backend and LevelDB are log-structured. Riak increasingly emphasizes LevelDB since BitCask does not support scan operations (which are required for indexes), but this brings its own set of problems.

LevelDB is a log-structured storage engine with a different approach to compaction than the one introduced by Bigtable. LevelDB trades more compaction i/o for less i/o at read time, which can be a good tradeoff for many workloads, but not all. Cassandra added support for LevelDB-style compaction about a year ago.

LevelDB itself is designed to be an embedded database for the likes of Chrome, and clear growing pains are evident when pressed into service as a multi-user backend for Riak. (A LevelDB configuration for Voldemort also exists.) Basho cites “one stall every 2 hours for 10 to 30 seconds”, “cases that can still cause [compaction] infinite loops,” and no way to create snapshots or backups as of the recently released Riak 1.2.

HBase

HBase’s storage engine is the most similar to Cassandra’s; both drew on Bigtable’s design early on.

But despite a later start, Cassandra’s storage engine is far ahead of HBase’s today, in large part because building on HDFS instead of locally-managed storage makes everything harder for HBase. Cassandra added online snapshots almost four years ago; HBase still has a long way to go.

HDFS also makes SSD support problematic for HBase, which is becoming increasingly relevant as SSD price/performance improves. Cassandra has excellent SSD support and even support for mixed SSD and HDD within the same cluster, with data pinned to the medium that makes the most sense for it.

Other differences that may not show up at benchmark time, but you would definitely notice in production:

HBase can’t delete data during minor compactions — you have to rewrite all the data in a region to reclaim disk space. Cassandra has deleted tombstones during minor compactions for over two years.

While you are running that major compaction, HBase gives you no way to throttle it and limit its impact on your application workload. Cassandra introduced this two years ago and continues to improve it. Dealing with local storage also lets Cassandra avoid polluting the page cache with sequential scans from compaction.

Compaction might seem like bookkeeping details, but it does impact the rest of the system. HBase limits you to two or three column families because of compaction and flushing limitations, forcing you to do sub-optimal things to your data model as a workaround.

Cassandra

I honestly think Cassandra is one to two years ahead of the competition, but I’m under no illusions that Cassandra itself is perfect. We have plenty of improvements to make still; from the recently released Cassandra 1.2 to our ticket backlog, there is no shortage of work to do.

Here are some of the areas I’d like to see Cassandra improve this year:

If working on an industry-leading, open-source database doing cutting edge performance work on the JVM sounds interesting to you, please get in touch.

Switch your databases to SSD Storage

http://highscalability.com/blog/2012/12/10/switch-your-databases-to-flash-storage-now-or-youre-doing-it.html

 

Switch Your Databases To Flash Storage. Now. Or You’re Doing It Wrong.

The economics of flash memory are staggering. If you’re not using SSD, you are doing it wrong. 

Some small applications fit entirely in memory – less than 100GB – great for in-memory solutions… If you have a dataset under 10TB, and you’re still using rotational drives, you’re doing it wrong.

With The Right Database, Your Bottleneck Is The Network Driver, Not Flash

Networks are measured in bandwidth (throughput), but if your access patterns are random and low latency is required, each request is an individual network packet. Even with the improvements in Linux network processing, we find an individual core is capable of resolving about 100,000 packets per second through the Linux core.

100,000 packets per second aligns well with the capability of flash storage at about 20,000 to 50,000 per device, and adding 4 to 10 devices fits well in current chassis. RAM is faster – in Aerospike, we can easily go past 5,000,000 TPS in main memory if we remove the network bottleneck through batching – but for most applications, batching can’t be cleanly applied.
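A quick back-of-the-envelope check of that balance, using only the figures quoted above:

```python
# Back-of-the-envelope check of the balance described above, using only the
# figures quoted in the article (per-core packet rate vs per-device flash rate).
packets_per_core = 100_000            # requests one core can resolve per second
iops_per_device = (20_000, 50_000)    # requests per second per flash device

for iops in iops_per_device:
    print(f"{packets_per_core / iops:.0f} devices at {iops} req/s saturate one core")
# -> 5 devices at 20000 req/s saturate one core
# -> 2 devices at 50000 req/s saturate one core
```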

This bottleneck still exists with high-bandwidth networks, since the bottleneck is the processing of network interrupts. As multi-queue network cards become more prevalent (not available today on many cloud servers, such as the Amazon High I/O Instances), this bottleneck will ease – and don’t think switching to UDP will help. Our experiments show TCP is 40% more efficient than UDP for small transaction use cases.

 

The Top Myths Of Flash

1. Flash is too expensive.

Flash is 10x more expensive than rotational disk. However, you’ll make up the few thousand dollars you’re spending simply by saving the cost of the meetings to discuss the schema optimizations you’ll need to try to keep your database together. Flash goes so fast that you’ll spend less time agonizing about optimizations.

2. I don’t know which flash drives are good.

Aerospike can help. We have developed and open-sourced a tool (Aerospike Certification Tool) that benchmarks drives for real-time use cases, and we’re providing our measurements for old drives. You can run these benchmarks yourself, and see which drives are best for real-time use.

3. They wear out and lose my data.

Wear patterns are an issue with flash, although rotational drives fail too. There are several answers. When a flash drive fails, you can still read the data. With a clustered database and multiple copies of the data, you gain reliability – a server-level RAID. As drives fail, you replace them. Importantly, new flash technology is available every year with higher durability, such as this year’s Intel S3700, which claims each drive can be rewritten 10 times a day for 5 years before failure. Next year may bring another generation of reliability. With a clustered solution, simply upgrade drives on machines while the cluster is online.

4. I need the speed of in-memory

Many NoSQL databases will tell you that the only path to speed is in-memory. While in-memory is faster, a database optimized for flash using the techniques below can provide millions of transactions per second with latencies under a millisecond.

Techniques For Flash Optimization

Many projects work with main memory because the developers don’t know how to unleash flash’s performance. Relational databases only speed up 2x or 3x when put on a storage layer that supports 20x more I/Os. Following are three programming techniques to significantly improve performance with flash.

1. Go parallel with multiple threads and/or AIO

Different SSD drives have different controller architectures, but in every case there are multiple controllers and multiple memory banks—or the logical equivalent. Unlike a rotational drive, the core underlying technology is parallel.

You can benchmark the amount of parallelism where particular flash devices perform optimally with ACT, but we find the sweet spot is north of 8 parallel requests, and south of 64 parallel requests per device. Make sure your code can cheaply queue hundreds, if not thousands, of outstanding requests to the storage tier. If you are using your language’s asynchronous event mechanism (such as a co-routine dispatch system), make sure it is efficiently mapping to an OS primitive like asynchronous I/O, not spawning threads and waiting.
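As a minimal sketch of the idea (not Aerospike’s implementation), the snippet below keeps a queue of concurrent positional reads against a large file or device using a thread pool; the path, block size, and queue depth are illustrative, and a production system would more likely use native asynchronous I/O.

```python
# Minimal sketch: keep a flash device busy with many concurrent random reads.
# Path, block size, and queue depth are illustrative; real systems would use
# native asynchronous I/O rather than a thread pool.
import os
import random
from concurrent.futures import ThreadPoolExecutor

PATH = "/tmp/testfile.bin"      # assumption: a large pre-created file or device
BLOCK = 4096
QUEUE_DEPTH = 32                # in the 8-64 sweet spot discussed above

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size

def read_random_block(_):
    offset = random.randrange(0, max(size - BLOCK, 1)) // BLOCK * BLOCK
    return os.pread(fd, BLOCK, offset)   # positional read; no shared file offset

with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
    results = list(pool.map(read_random_block, range(10_000)))

os.close(fd)
print(f"read {len(results)} blocks of {BLOCK} bytes")
```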

2. Don’t use someone else’s file system

File systems are very generic beasts. They have been optimized for particular uses, such as multiple names for one object and hierarchical naming, that databases, with their own key-value syntax and interfaces, do not need. The POSIX file system interface supplies only one consistency guarantee. To run at the speed of flash, you have to remove the bottleneck of existing file systems.

NoSQL Benchmark

There is probably no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks you’re trying to achieve.

Altoros Systems has performed an independent and interesting benchmark to help you sort out the current pros and cons of the different solutions, including HBase, Cassandra, Riak and MongoDB:

http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html

What makes this research unique?

Often referred to as NoSQL, non-relational databases feature elasticity and scalability in combination with a capability to store big data and work with cloud computing systems, all of which make them extremely popular. NoSQL data management systems are inherently schema-free (with no obsessive complexity and a flexible data model) and eventually consistent (complying with BASE rather than ACID). They have a simple API, serve huge amounts of data and provide high throughput.

In 2012, the number of NoSQL products reached 120-plus and the figure is still growing. That variety makes it difficult to select the best tool for a particular case. Database vendors usually measure productivity of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. We wanted to do independent and unbiased research to complement the work done by the folks at Yahoo.

Using Amazon virtual machines to ensure verifiable results and research transparency (which also helped minimize errors due to hardware differences), we have analyzed and evaluated the following NoSQL solutions:

● Cassandra, a column family store
● HBase (column-oriented, too)
● MongoDB, a document-oriented database
● Riak, a key-value store

We also tested MySQL Cluster and sharded MySQL, taking them as benchmarks.

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

MS Research crushes the world record for data sorting in 60 seconds

A new approach to managing data over a network has enabled a Microsoft Research team to set a speed record for sifting through, or “sorting,” a huge amount of data in one minute.

The team conquered what is known as the MinuteSort benchmark—a measure of data-crunching speed devised by the late Jim Gray, a renowned Microsoft Research scientist, and deemed the “World Cup” of data sorting. The MinuteSort benchmark measures how quickly data can be sorted starting and ending on disks. Sorting is a basic function in computing, demonstrating the ability of a network to move and organize data so it can be analyzed and used.

The team, led by Jeremy Elson in the Distributed Systems group at Microsoft Research Redmond, set the new sort benchmark by using a radically different approach to sorting called Flat Datacenter Storage (FDS). The team’s system sorted almost three times the amount of data (1,401 gigabytes vs. 500 gigabytes) with about one-sixth the hardware resources (1,033 disks across 250 machines vs. 5,624 disks across 1,406 machines) used by the previous record holder, a team from Yahoo! that set the mark in 2009.

Two Hundred Bytes for Everybody

To put things in perspective, in one minute, the Microsoft Research team sorted the equivalent of two 100-byte data records for every human being on the planet.

The record is significant because it points toward a new method for crunching huge amounts of data using inexpensive servers. In an age when information is increasing in enormous quantities, the ability to move and deploy it is important for everything from web searches to business analytics to understanding climate change.

In practice, heavy-duty sorting can be used by enterprises looking through huge data sets for a competitive advantage. The Internet also has made data sorting critical. Advertisements on Facebook pages, custom recommendations on Amazon, and up-to-the-second search results on Bing all result from sorting.

The award for the team’s achievement will be presented during the 2012 SIGMOD/PODS Conference, an international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results. This year’s conference occurs in Scottsdale, Ariz., from May 20 to 24.

The record-setting MinuteSort team: (from left) Jon Howell, Jeremy Elson, Ed Nightingale, Yutaka Suzue, Jinliang Fan, Johnson Apacible, and Rich Draves.

The team, formed and led by Elson, included Johnson Apacible, Rich Draves, Jinliang Fan, Owen Hofmann, Jon Howell, Ed Nightingale, Reuben Olinsky, and Yutaka Suzue.

Their approach was to take a fresh look at a relatively old model for sorting data. More than a decade ago, a network of computers would access data on a single file server, and each computer saw all of the data.

But that model didn’t scale up as data centers became larger. Researchers at Google tackled that problem in 2004, creating a data-management scheme called MapReduce. It worked by essentially sending computation to the data, rather than dragging data to a computer. It made possible computation across huge data sets using large numbers of cheap computers. In recent years, the Apache Software Foundation developed an open-source version of MapReduce dubbed Hadoop.

MapReduce and Hadoop greatly advanced the state of data sorting. But, Elson says, they still weren’t perfect.

“Some kinds of computations just can’t be expressed that way,” he says of the drag-computation-to-the-data model. “If you have two big data sets and you want to join them, you have to move the data somehow.”

Three years ago, Elson, Nightingale, and Howell had an insight into how new advances in network bandwidth could lead to a simpler model of data sorting—one in which every computer saw all of the data—while also scaling to handle massive data sets.

The solution was dubbed Flat Datacenter Storage. Elson compares FDS to an organizational chart. In a hierarchical company, employees report to a superior, then to another superior, and so on. In a “flat” organization, they basically report to everyone, and vice versa.

FDS takes advantage of another technology Microsoft Research helped develop, called full bisection bandwidth networks. If you were to draw an imaginary line through a collection of computers connected by a full bisection bandwidth network, every computer on one side of the line could send data at full speed to every computer on the other side of the line, and vice versa, no matter where the line is drawn.

Using full bisection networks, the FDS team built a system that could transfer data at two gigabytes per second on each computer for input, with another two gigabytes for output.

New Techniques Needed

“That’s 20 times as much bandwidth as most computers in data centers have today,” Elson says, “and harnessing it required novel techniques.”

With that, the team was ready to take on the MinuteSort challenge. The contest actually has two parts: an “Indy” category, in which systems can be customized for the task of sorting, and a “Daytona” category, in which systems must meet requirements for general-purpose computing—think super-sleek, open-wheel Indianapolis 500 cars versus Daytona 500 stock cars that look a little like what you see on the street.

In 2011, a team from the University of California, San Diego set a record in the Indy category, sorting 1,353 gigabytes of data in a minute. In the Daytona category, the record had been held by a team from Yahoo!, which sorted 500 gigabytes of data in a minute.

The Microsoft Research team blew past both marks. Moreover, the team beat the standing Indy-sort record using a Daytona-class system. This isn’t the first time that has happened, Elson says, but it is rare.

The record represents a total efficiency improvement of almost 16 times. Interestingly, Microsoft Research set the record using a remote file system, which is an unusual choice of architecture for sorting because it commonly is perceived to be slow. Whereas most sorting systems read data locally from disk, exchange data once over the network, and write data locally to disk, in a remote file system, data is read, exchanged, and written over the network, so each data record crosses the network three times. The team deliberately handicapped the system to demonstrate the phenomenal performance of the new FDS file-system architecture.

Thus far, the Microsoft Research team has worked with the Bing team to help Bing accelerate its search results. The Microsoft Research engineering team is partially funded by Bing and has been actively supported by Harry Shum, the Microsoft corporate vice president who leads Core Search Development.

Exciting Breakthrough

“We are very excited about the MinuteSort breakthroughs made by our Microsoft Research colleagues,” Shum says. “I look forward to taking advantage of the FDS technology to further online infrastructure for Bing and for Microsoft—and delivering even faster results to our users.”

Nightingale, co-leader of the FDS project with Elson, is working with Bing to integrate FDS to improve Bing’s efficiency and speed.

Given the ubiquity of interest in managing “big data,” the Microsoft Research work is apt to find a home in several computing fields. It could be used in the biological sciences, managing gene sequencing or helping to create new classes of drugs, or it might help in stitching together aerial photographs to give people better imagery of the planet.

The ability to sort data rapidly also will aid machine learning—the design and development of algorithms that enable computers to create predictions based on data, such as sensor data or information from databases. Microsoft Research has a big stake in machine learning, in work ranging from language processing to security applications.

“Improving big-data performance has a wide range of implications across a huge number of businesses,” Elson says. “Almost any big-data problem now becomes more efficient, which, in many cases, will be the difference between the work being economically feasible or not.”

For now, there’s also a lot of celebrating going on.

“Our hands,” Howell laughs, “are bruised from high-fiving.”

 

By Douglas Gantenbein

May 21, 2012 9:00 AM PT

Hypertable getting a new website and smashing HBase in performance

Hypertable just got a new website, check it out at http://www.hypertable.com/; the documentation section has been thoroughly revised.

A new website, but not only that: according to highscalability.com and their benchmark, Hypertable delivers 2X better throughput in most tests, while HBase fails the 41 and 167 billion record insert tests, overwhelmed by garbage collection. Both systems deliver similar results for the random read uniform test.

 

Read the full performance test at:

http://highscalability.com/blog/2012/2/7/hypertable-routs-hbase-in-performance-test-hbase-overwhelmed.html

 

Cassandra 1.0 unleashes major performance improvements

According to the latest blog post about performance on the Cassandra developer center, the 1.0 release of Cassandra delivers the following performance improvements:

  • Read performance increased by 400%!
  • Write performance increased by 40%
  • Network/non-local operations are 15% faster

 

The full article is available here:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance

Solving your performance issue when dealing with big data

  1. Foresee and understand your performance issue

When dealing with big data, you will face performance problems with even the most simple and basic operations as soon as the processing requires the whole data set to be analyzed.

This is the case, for instance, when:

  • You aggregate data to deliver summary statistics: actions such as “count”, “min”, “avg”, etc.
  • You need to sort your data

With this in mind, you can easily and quickly anticipate issues in advance and start thinking about how to solve the problem.
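For example, the summary statistics listed above can usually be computed in a single streaming pass with constant memory instead of materializing the whole data set; a minimal sketch:

```python
# Single-pass, constant-memory aggregation over a data stream: count, min, max, avg.
# The generator at the bottom stands in for whatever source actually feeds the values.
def summarize(values):
    count, total = 0, 0.0
    lo = hi = None
    for v in values:
        count += 1
        total += v
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    avg = total / count if count else None
    return {"count": count, "min": lo, "max": hi, "avg": avg}

print(summarize(x * 0.5 for x in range(1_000_000)))
```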

  2. Solving the performance issue using technical tools

Compression is often a key solution to many performance issues: it spends CPU cycles, which are currently almost always cheaper than disk and network I/O, so compression speeds up disk access and data transfer over the network, and can even allow you to keep the reduced data in memory.
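A minimal illustration of that CPU-for-I/O trade, with zlib standing in for whatever codec your stack actually provides:

```python
# Trade CPU for I/O: compress before writing/sending, decompress when reading.
# zlib stands in for whatever codec (snappy, lz4, gzip...) your stack provides.
import zlib

records = "\n".join(f"user-{i},active,2013-01-{i % 28 + 1:02d}"
                    for i in range(100_000)).encode()

compressed = zlib.compress(records, level=6)
print(f"raw: {len(records)} bytes, compressed: {len(compressed)} bytes "
      f"({len(compressed) / len(records):.0%} of original)")

restored = zlib.decompress(compressed)
assert restored == records
```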

Statistics can often be applied to speed up your algorithm and are not necessarily complex; maintaining value ranges (min, max) or value distributions may shorten your processing path.
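For instance, keeping a (min, max) pair per data block lets you skip whole blocks that cannot contain a match, in the spirit of a zone map; a toy sketch:

```python
# Toy "zone map": keep (min, max) per block so whole blocks can be skipped
# when they cannot possibly contain the value we are looking for.
blocks = [list(range(i, i + 1000)) for i in range(0, 100_000, 1000)]
stats = [(min(b), max(b)) for b in blocks]        # maintained as data is written

def find(value):
    for (lo, hi), block in zip(stats, blocks):
        if lo <= value <= hi:                     # only scan blocks that can match
            if value in block:
                return True
    return False

print(find(42_517))   # scans a single block instead of all 100 of them
```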

Caching applies when deterministic results are produced by processes that are independent of the data, or that depend on data which rarely changes and which you can assume will not change during your processing time.
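A minimal sketch of that kind of caching, where the expensive, deterministic function is a stand-in:

```python
# Cache deterministic results that do not depend on data that changes
# during the run; the expensive function below is a stand-in.
from functools import lru_cache

@lru_cache(maxsize=None)
def exchange_rate(currency: str) -> float:
    # Assume this is expensive (table load, remote call...) but stable
    # for the duration of the batch job.
    print(f"computing rate for {currency}")
    return {"EUR": 1.0, "USD": 1.31, "GBP": 0.85}[currency]

total = sum(amount * exchange_rate(cur)
            for amount, cur in [(10, "USD"), (20, "USD"), (5, "EUR")])
print(total)   # "computing rate" is printed once per currency, not once per row
```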

Avoid data type conversions, because they always consume resources.

Balance the load, parallelize the processing, and use MapReduce :)
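A toy word count in the MapReduce style, with the map phase spread across worker processes and a single reduce at the end (a sketch of the pattern, not of any particular framework):

```python
# Toy MapReduce-style word count: parallel map over chunks, then a single reduce.
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    return Counter(word for line in lines for word in line.split())

def reduce_counts(partials):
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    lines = ["big data big problems", "map reduce the big data"] * 1000
    chunks = [lines[i::4] for i in range(4)]          # balance the load across workers
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)        # parallel map phase
    print(reduce_counts(partials).most_common(3))     # single reduce phase
```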

 

  3. Solving the performance issue by giving up or resigning

We tend to reject such an approach, but sometimes it is a good exercise to step back and review why we do the things we do.

Can I approximate without significantly altering the result?

Can I use a representative data sample instead of the whole data set?

At the very least, do not avoid thinking this way: ask whether solving an easier problem or looking for an approximate result might ultimately bring you very close to the solution.