According to EnterpriseDB’s recent benchmark, Postgres Outperforms MongoDB and Ushers in New Developer Reality
Not only did Postgres outperform MongoDB on speed, but the data also required approximately 25% less disk space than in MongoDB.
EDB found that Postgres outperforms MongoDB in selecting, loading and inserting complex document data in key workloads involving 50 million records:
- Ingestion of high volumes of data was approximately 2.1 times faster in Postgres
- MongoDB consumed 33% more disk space
- Data inserts took almost 3 times longer in MongoDB
- Data selection took more than 2.5 times longer in MongoDB than in Postgres
Find the full article here
The benchmark tool is available on GitHub: https://github.com/EnterpriseDB/pg_nosql_benchmark
I recently tried to explain that it is not one or the other: SQL vs. NoSQL is, once again, not the real question, nor the choice to be made. Instead, it is the underlying issues that have to be understood and used to drive your choice.
If your application can no longer serve its users, then however good and smart it used to be, it is simply no longer working. So scaling, through the techniques of clustering, sharding, and distributed processing, has become a must, and it is one requirement that few RDBMSs have been able to implement. Historical reasons, the old ways, are obviously responsible: traditionally, an SQL database ran on a single machine (one big server with the biggest CPU available and all the RAM you could afford). Before scaling solutions became available, performance issues were tackled with caching techniques (memcached was created in 2003), but it is all the same problem: if your application and service stop serving their users, it is game over.
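The caching idea mentioned above can be sketched in a few lines of Python. This is a toy read-through cache, not memcached itself, and the `slow_query` function is invented for the illustration:

```python
import time

class ReadThroughCache:
    """Serve hot results from memory; fall back to the slow store on a miss."""
    def __init__(self, loader, ttl_seconds=60):
        self.loader = loader          # the expensive call (e.g. a SQL query)
        self.ttl = ttl_seconds
        self.store = {}               # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]           # cache hit: no database round-trip
        value = self.loader(key)      # cache miss: hit the database once
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
def slow_query(key):                  # stand-in for a slow database query
    calls.append(key)
    return key.upper()

cache = ReadThroughCache(slow_query)
cache.get("user:42")                  # first call loads from the "database"
cache.get("user:42")                  # second call is served from memory
```

The point of the sketch is the failure mode the text describes: the cache only hides load, so when it is cold or invalidated, the single database behind it is still the bottleneck.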
- ACID – transactional database
Most applications do not need transaction support: the ability for a single process to perform multiple data manipulations and finally commit this set of operations, or cancel them all at any step, thus rolling back to the initial state (before the program started). This feature is available to every program (and related instance) accessing the database concurrently. This complex set of features is what provides the so-called consistency and integrity guarantees. As I said, most applications do not need transactions, and most NoSQL databases are non-ACID and do not support them.
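As a minimal illustration of what an ACID transaction buys you, here is a sketch using Python's built-in sqlite3 module; the table and amounts are invented for the example:

```python
import sqlite3

# In-memory database for the demo; a real system would use a server-side RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Enforce an invariant mid-transaction: no negative balances allowed.
            (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                  (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        pass  # the rollback already restored the initial state

transfer(conn, "alice", "bob", 30)   # succeeds: balances become 70 / 80
transfer(conn, "alice", "bob", 500)  # fails and rolls back: still 70 / 80
```

Without the transaction, the failed second transfer would have left Alice's debit applied but the invariant check unsatisfied, exactly the inconsistency ACID semantics prevent.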
Traditional RDBMSs have relied on the relational model, which can be overly restrictive. A strong relational model for complex data requires skill and time to create, maintain, and document (in view of knowledge transfer). In practice, the relational data model will limit your future development, since you cannot easily change it. NoSQL solutions provide different data structures, such as document, graph, and key-value stores, which enable non-relational data models. To make a long story short, the data model (relational or not) will not ease your design work (which remains highly critical), but it may eventually ease its implementation.
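To make the contrast concrete, here is a sketch of the document/key-value style in pure Python; the collection and field names are invented for the illustration, and a real document store would add indexing and persistence on top:

```python
# A document store keeps self-contained records; two "rows" in the same
# collection need not share a schema.
users = {}  # key-value: user id -> document (a plain dict)

users["u1"] = {"name": "Ada", "email": "ada@example.com"}
users["u2"] = {"name": "Linus", "languages": ["C"], "address": {"country": "FI"}}

# Adding a field later requires no schema migration: just write a richer document.
users["u1"]["languages"] = ["Python", "SQL"]

# Queries become plain traversals over the documents.
polyglots = [u["name"] for u in users.values() if "languages" in u]
```

In the relational version, the nested `address` and the variable-length `languages` list would each become a separate table with foreign keys, which is exactly the up-front modeling cost the paragraph describes.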
Thursday, October 2, 2014, from 18:00 to 20:00 – Luxembourg City, Luxembourg
18:00: Introduction to Cassandra – Duy Hai DOAN (DataStax)
Cassandra is the column-oriented NoSQL database behind large companies such as Netflix, Sony Entertainment, and Apple.
The first session gives a general presentation of Cassandra and its architecture. The second session covers the data model and modeling best practices: how to move from the SQL world to the NoSQL world with Cassandra.
19:10: Tooling for Cassandra – Michaël Figuière (DataStax)
A presentation of the tools that help Java developers work efficiently with Cassandra.
According to KDnuggets, Big Data related skills led the list of top-paying technical skills (six-figure salaries) in 2013.
The study focuses on technology professionals in the U.S. who enjoyed raises over the last year (2013).
Average U.S. tech salaries increased nearly three percent to $87,811 in 2013, up from $85,619 the previous year. Technology professionals understand they can easily find ways to grow their careers in 2014, with two-thirds of respondents (65%) confident of finding a new, better position. That overwhelming confidence, matched with declining salary satisfaction (54%, down from 57%), will keep tech-powered companies on edge about their retention strategies.
Companies are willing to pay hefty amounts to professionals with Big Data skills.
According to a report released on January 29, 2014, the average salary for a professional with knowledge of and experience in the programming language R was $115,531 in 2013.
Other Big Data oriented skills, such as NoSQL, MapReduce, Cassandra, Pig, Hadoop, and MongoDB, are among the top 10 paying skills.
Statwing has published on its blog an amazing tool, based on a subset of its commercial solution.
Even so, it is a great demonstration of data visualisation and a handy online utility for exploring open data.
The import wizard:
Sample player-dataset visualization:
Attribution + ShareAlike (BY-SA)
Want to share some open data and ensure that subsequent contributions will benefit everyone?
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
- for any purpose, even commercially.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
More information available here: http://creativecommons.org/licenses/by-sa/4.0/
Sirius is a library for distributing and coordinating data updates amongst a cluster of nodes. It handles building an absolute ordering for updates that arrive in the cluster, ensuring that cluster nodes eventually receive all updates, and persisting the updates on each node. These updates are generally used to build in-memory data structures on each node, allowing applications using Sirius to have direct access to native data structures representing up-to-date data. Sirius does not, however, build these data structures itself — instead, the client application supplies a callback handler, which allows developers using Sirius to build whatever structures are most appropriate for their application.
Said another way: Sirius enables a cluster of nodes to keep developer-controlled in-memory data structures eventually consistent, allowing I/O-free access to shared information.
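The idea can be sketched in Python. This is not Sirius's actual API, just an illustration of a totally ordered update log driving a developer-supplied callback that maintains an in-memory structure; all names here are invented:

```python
# Each node replays the same totally ordered update log through a callback
# that builds whatever in-memory structure the application wants.
class Node:
    def __init__(self, handler):
        self.handler = handler        # developer-controlled callback
        self.applied = 0              # sequence number of the last update applied

    def replay(self, log):
        """Apply any updates this node has not yet seen, in log order."""
        for seq, op, key, value in log[self.applied:]:
            self.handler(op, key, value)
            self.applied = seq

def make_handler(state):
    """Build a callback that materializes updates into a plain dict."""
    def handler(op, key, value):
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return handler

# A shared, absolutely ordered log of updates (sequence numbers 1, 2, 3, ...).
log = [(1, "put", "feature_x", True),
       (2, "put", "limit", 10),
       (3, "delete", "feature_x", None)]

state_a, state_b = {}, {}
node_a = Node(make_handler(state_a))
node_b = Node(make_handler(state_b))
node_a.replay(log)
node_b.replay(log)                    # both nodes converge to the same state
```

Because every node applies the same updates in the same order, reads against `state_a` or `state_b` are plain in-memory lookups with no I/O, which is the "I/O-free access to shared information" the description refers to.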
MongoDB 2.6 has been released with major new features as its primary target, but it also improves performance:
- efficient use of network resources
- oplog processing is 75% faster
- several classes of scan, sort, $in, and $all queries are significantly faster
- bulk operators for writes improve updates by as much as 5x.
- Text Search Integration
- Insert and Update Improvements
- A new write protocol integrates write operations with write concerns (the protocol also provides improved support for bulk operations)
- A new authorization model that provides the ability to create custom User-Defined Roles and the ability to specify user privileges at a collection-level granularity.
PostgreSQL has introduced jsonb, a diamond in the crown of PostgreSQL 9.4. It is based on an elegant hash opclass for GIN, which competes with MongoDB's performance on the contains operator.
Feature’s documentation : http://www.postgresql.org/docs/devel/static/datatype-json.html
Feature’s story: http://obartunov.livejournal.com/177247.html
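The semantics of the jsonb contains operator (`@>`) can be sketched in pure Python. This is an illustration of the containment idea, not PostgreSQL's implementation, and the recursive rules below are a simplification of the real operator:

```python
def jsonb_contains(left, right):
    """Rough sketch of jsonb containment: does `left` contain `right`?"""
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key/value pair on the right must be contained on the left.
        return all(k in left and jsonb_contains(left[k], v)
                   for k, v in right.items())
    if isinstance(left, list) and isinstance(right, list):
        # Every element on the right must match some element on the left
        # (order and duplicates do not matter).
        return all(any(jsonb_contains(l, r) for l in left) for r in right)
    return left == right  # scalars compare by value

doc = {"name": "pg", "tags": ["sql", "json"], "meta": {"version": 9.4}}
jsonb_contains(doc, {"tags": ["json"]})          # True
jsonb_contains(doc, {"meta": {"version": 9.4}})  # True
jsonb_contains(doc, {"tags": ["xml"]})           # False
```

In PostgreSQL itself the equivalent query would be written against a jsonb column, e.g. `WHERE data @> '{"tags": ["json"]}'`, and the GIN index mentioned above is what makes that containment test fast at scale.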