Data story

Big Data top paying skills

According to KDnuggets, Big Data skills led the list of top-paying technical skills (six-figure salaries) in 2013.

The study focuses on technology professionals in the U.S. who enjoyed raises over the last year (2013).

Average U.S. tech salaries increased nearly three percent to $87,811 in 2013, up from $85,619 the previous year. Technology professionals understand they can easily find ways to grow their careers in 2014, with two-thirds of respondents (65%) confident of finding a new, better position. That overwhelming confidence, matched with declining salary satisfaction (54%, down from 57%), will keep tech-powered companies on edge about their retention strategies.

Companies are willing to pay hefty amounts to professionals with Big Data skills.


According to a report released on Jan 29, 2014, the average salary for a professional with knowledge of and experience in the R programming language was $115,531 in 2013.

Other Big Data-oriented skills such as NoSQL, MapReduce, Cassandra, Pig, Hadoop, and MongoDB are among the top 10 paying skills.

 

Source: KDnuggets

Visualize any public CSV on GitHub in a few clicks

Statwing has published on its blog an amazing tool, based on a subset of its commercial solution.

Even as a subset, it is a great demonstration of data visualisation and a useful online utility for exploring open data.

The import wizard:

http://blog.statwing.com/visualize-any-public-csv-on-github-in-a-few-clicks/

Sample player-dataset visualization:

https://www.statwing.com/open/datasets/2179937bfbd56f8b2731b2937bb1c2dfd92ee8fb#workspaces/15411

 

 

Attribution-ShareAlike 4.0 International

Attribution + ShareAlike (BY-SA)

 

 

Want to share some open data and ensure that subsequent contributions will benefit everyone?

 

You are free to:

  • Share: copy and redistribute the material in any medium or format
  • Adapt: remix, transform, and build upon the material

for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

 

More information available here:  http://creativecommons.org/licenses/by-sa/4.0/

Sirius: a distributed system library for managing application reference data

Sirius is a library for distributing and coordinating data updates amongst a cluster of nodes. It handles building an absolute ordering for updates that arrive in the cluster, ensuring that cluster nodes eventually receive all updates, and persisting the updates on each node. These updates are generally used to build in-memory data structures on each node, allowing applications using Sirius to have direct access to native data structures representing up-to-date data. Sirius does not, however, build these data structures itself — instead, the client application supplies a callback handler, which allows developers using Sirius to build whatever structures are most appropriate for their application.

Said another way: Sirius enables a cluster of nodes to keep developer-controlled in-memory data structures eventually consistent, allowing I/O-free access to shared information.
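The idea of an absolute update ordering combined with an application-supplied callback can be sketched in a few lines. This is an illustrative Python model only, not the Sirius API: the `Node` class, the `receive` method, the `make_dict_handler` helper, and the tuple format of an update are all assumptions made up for this example, and the sequence numbers are simply handed in rather than agreed on by a cluster.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical update format: (operation, key, value); value is None for deletes.
Update = Tuple[str, str, Optional[str]]


class Node:
    """Hypothetical cluster node: buffers updates and applies them in sequence order."""

    def __init__(self, handler: Callable[[Update], None]) -> None:
        self.handler = handler              # application-supplied callback
        self.next_seq = 1                   # next sequence number to apply
        self.pending: Dict[int, Update] = {}

    def receive(self, seq: int, update: Update) -> None:
        """Accept an update in any arrival order; apply it only when its turn comes."""
        if seq < self.next_seq:
            return                          # duplicate of an already applied update
        self.pending[seq] = update
        while self.next_seq in self.pending:
            self.handler(self.pending.pop(self.next_seq))
            self.next_seq += 1


def make_dict_handler(store: Dict[str, str]) -> Callable[[Update], None]:
    """Build a callback that maintains a plain dict as the in-memory structure."""
    def handler(update: Update) -> None:
        op, key, value = update
        if op == "put" and value is not None:
            store[key] = value
        elif op == "delete":
            store.pop(key, None)
    return handler


def demo() -> Tuple[Dict[str, str], Dict[str, str]]:
    """Deliver the same ordered log to two nodes in different arrival orders."""
    log: Dict[int, Update] = {
        1: ("put", "a", "1"),
        2: ("put", "b", "2"),
        3: ("delete", "a", None),
        4: ("put", "a", "9"),
    }
    store_a: Dict[str, str] = {}
    store_b: Dict[str, str] = {}
    node_a = Node(make_dict_handler(store_a))
    node_b = Node(make_dict_handler(store_b))
    for seq in (1, 2, 3, 4):        # in-order delivery
        node_a.receive(seq, log[seq])
    for seq in (4, 2, 3, 1, 2):     # out-of-order delivery with one duplicate
        node_b.receive(seq, log[seq])
    return store_a, store_b
```

Both nodes end up with the same dictionary once every update has arrived, whatever the arrival order, and reads are then ordinary dictionary lookups with no I/O. The application decides what the handler builds; a dict is just the simplest choice here.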

https://github.com/Comcast/sirius

IBM’s new Power8 chip technology unveiled

IBM unveils the Power8 chip as open hardware. Google and other OpenPower Foundation partners have expressed interest in IBM’s Power8 chip designs and server motherboard specs, since Power8 has been designed with some specific big-data handling characteristics. It is, for example, an eight-threaded processor, meaning each of the 12 cores in a CPU will coordinate the processing of eight sets of instructions at a time, for a total of 96 processes. A “process” here is to be understood as a set of related instructions making up a discrete unit of work within a program. By designating sections of an application that can run as separate processes and coordinating the results, a chip can accomplish more work than a single-threaded chip.
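As a rough software-side illustration of splitting work into sections and coordinating the results, here is a minimal Python sketch. The core and thread counts come from the text above; the `chunked_sum` function and its chunking scheme are assumptions for this example and say nothing about how Power8 itself schedules work. In CPython the global interpreter lock means this shows the structure of the approach rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

CORES = 12              # cores per CPU, as stated in the text
THREADS_PER_CORE = 8    # instruction streams per core, as stated in the text
HARDWARE_THREADS = CORES * THREADS_PER_CORE   # 12 * 8 = 96


def chunked_sum(values, workers=HARDWARE_THREADS):
    """Split the input into at most `workers` slices, sum each slice in a
    worker thread, then combine the partial sums."""
    if not values:
        return 0
    size = -(-len(values) // workers)   # ceiling division: items per slice
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))
```

Calling `chunked_sum(list(range(1000)))` gives the same result as the built-in `sum`, only computed slice by slice.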

By licensing technology to partners, IBM is borrowing a tactic used by ARM in the market for chips used in smartphones and tablets. But the company faces an uphill battle.

More information:

http://openpowerfoundation.org/

http://bits.blogs.nytimes.com/

MongoDB 2.6 released

MongoDB 2.6 has been released with major new features as its primary focus, but it also improves performance.

Performance improvements:

  • more efficient use of network resources
  • oplog processing is 75% faster
  • performance of several classes of scan, sort, $in and $all operations is significantly improved
  • bulk operators for writes improve updates by as much as 5x.

Feature improvements:

  • Text Search Integration
  • Insert and Update Improvements
  • A new write protocol that integrates write operations with write concerns (the protocol also provides improved support for bulk operations)
  • A new authorization model that provides the ability to create custom User-Defined Roles and to specify user privileges at collection-level granularity.

Full release notes

PostgreSQL introduced jsonb support

Binary JSON

PostgreSQL has introduced jsonb, a diamond in the crown of PostgreSQL 9.4. It is based on an elegant hash opclass for GIN, which competes with MongoDB performance for the contains operator.

Feature documentation: http://www.postgresql.org/docs/devel/static/datatype-json.html

Feature story: http://obartunov.livejournal.com/177247.html

hbase-0.98.0 has been released


This release includes:

  • several new security features, such as cell visibility labels, cell ACLs, and transparent server-side encryption.
  • significant performance improvements, such as a change to the write-ahead log threading model that provides higher transaction throughput under high load, reverse scanners, MapReduce over snapshot files, and striped compaction.

The complete list of changes in this release can be found in the release notes: http://goo.gl/y25W2h

What do you know about SQL performance?

The 3-Minute Test: What do you know about SQL performance?

“SQL-Tuning is black magic like alchemy: it consists of obscure rules, understood only by a handful of insiders.”

That is a myth. SQL databases use well-known algorithms to deliver predictable performance. It is, however, easy to write SQL queries that cannot use the most efficient algorithm and thus deliver unexpected performance.
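One common way a query ends up unable to use the most efficient algorithm is wrapping an indexed column in a function. The sketch below uses SQLite through Python's standard `sqlite3` module purely because it runs anywhere; the `users` table, the `idx_users_name` index, and the `query_plans` helper are made up for this example, and other databases report their plans differently.

```python
import sqlite3


def query_plans():
    """Return the SQLite query plans for an index-friendly and an
    index-hostile version of the same lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute("CREATE INDEX idx_users_name ON users (name)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
    )

    def plan(sql):
        # The human-readable plan step is the last column of each row.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return " | ".join(row[-1] for row in rows)

    # Plain comparison on the indexed column: the index can be searched.
    indexed = plan("SELECT * FROM users WHERE name = 'user42'")
    # Function applied to the column: the index on name no longer matches.
    wrapped = plan("SELECT * FROM users WHERE upper(name) = 'USER42'")
    conn.close()
    return indexed, wrapped
```

The first plan reports a search using `idx_users_name`, while the second falls back to scanning the whole table. Nothing obscure is going on: the index stores `name`, not `upper(name)`, so the second predicate simply has nothing to look up.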

 http://use-the-index-luke.com/3-minute-test

 

 

Parallel programming

parallel_programming
