Visualize any public CSV on GitHub in a few clicks

Statwing has published an impressive tool on its blog, based on a subset of its commercial solution.

Even as a subset, it is a great demonstration of data visualisation and a handy online utility for exploring open data.

The import wizard:

http://blog.statwing.com/visualize-any-public-csv-on-github-in-a-few-clicks/

Sample player-dataset visualization:

https://www.statwing.com/open/datasets/2179937bfbd56f8b2731b2937bb1c2dfd92ee8fb#workspaces/15411

 

 

Attribution-ShareAlike 4.0 International

Share-alike Attribution + ShareAlike (BY-SA)

 

 

Want to share some open data and ensure that subsequent contributions benefit everyone?

 

You are free to:

  • Share: copy and redistribute the material in any medium or format
  • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
  • The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

 

More information available here:  http://creativecommons.org/licenses/by-sa/4.0/

Sirius: a distributed system library for managing application reference data

Sirius is a library for distributing and coordinating data updates amongst a cluster of nodes. It handles building an absolute ordering for updates that arrive in the cluster, ensuring that cluster nodes eventually receive all updates, and persisting the updates on each node. These updates are generally used to build in-memory data structures on each node, allowing applications using Sirius to have direct access to native data structures representing up-to-date data. Sirius does not, however, build these data structures itself — instead, the client application supplies a callback handler, which allows developers using Sirius to build whatever structures are most appropriate for their application.

Said another way: Sirius enables a cluster of nodes to keep developer-controlled in-memory data structures eventually consistent, allowing I/O-free access to shared information.

https://github.com/Comcast/sirius
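The callback idea described above can be sketched in a few lines. This is not Sirius's API (Sirius is a JVM library); the names `InMemoryStore`, `replay` and the update tuples are hypothetical, and `replay` merely stands in for the ordering and delivery work that the library does for the application.

```python
from typing import Dict, List, Optional, Tuple

# One update in the cluster-wide order: (sequence number, operation, key, body).
Update = Tuple[int, str, str, Optional[bytes]]


class InMemoryStore:
    """Application-owned structure, built only through handler callbacks."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def handle_put(self, key: str, body: bytes) -> None:
        self.data[key] = body

    def handle_delete(self, key: str) -> None:
        self.data.pop(key, None)

    def handle_get(self, key: str) -> Optional[bytes]:
        # Reads are served from local memory, with no I/O.
        return self.data.get(key)


def replay(log: List[Update], handler: InMemoryStore) -> None:
    """Apply updates in their absolute order (hypothetical stand-in for the library)."""
    for _seq, op, key, body in sorted(log, key=lambda update: update[0]):
        if op == "PUT":
            handler.handle_put(key, body or b"")
        elif op == "DELETE":
            handler.handle_delete(key)


log: List[Update] = [
    (1, "PUT", "user:1", b"alice"),
    (2, "PUT", "user:2", b"bob"),
    (3, "PUT", "user:1", b"alice-v2"),
    (4, "DELETE", "user:2", None),
]

# Two nodes receive the same updates in a different arrival order.
node_a, node_b = InMemoryStore(), InMemoryStore()
replay(log, node_a)
replay(list(reversed(log)), node_b)
```

Because both nodes apply the same ordered sequence, their in-memory dictionaries end up identical, which is the eventual consistency the description refers to.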

IBM’s new Power8 chip technology unveiled

IBM unveils the Power8 chip as open hardware. Google and other OpenPower Foundation partners have expressed interest in IBM’s Power8 chip designs and server motherboard specs, since Power8 has been designed with some specific big-data handling characteristics. It is, for example, an eight-threaded processor, meaning each of the 12 cores in a CPU will coordinate the processing of eight sets of instructions at a time, for a total of 96 threads. A “thread” is to be understood here as a set of related instructions making up a discrete process within a program. By designating sections of an application that can run as threads and coordinating the results, a chip can accomplish more work than a single-threaded chip.
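The thread count above is simply cores multiplied by threads per core. The sketch below works that out and shows the general idea of splitting a job into sections whose results are then combined; the function name `parallel_sum` is hypothetical, and in CPython the global interpreter lock means this illustrates the decomposition, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

CORES_PER_CHIP = 12
THREADS_PER_CORE = 8
HARDWARE_THREADS = CORES_PER_CHIP * THREADS_PER_CORE  # 12 * 8 = 96


def parallel_sum(values: List[int], workers: int = HARDWARE_THREADS) -> int:
    """Split the input into one slice per worker, sum each slice, combine the results."""
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


total = parallel_sum(list(range(1000)))
```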

By licensing technology to partners, IBM is borrowing a tactic used by ARM in the market for chips used in smartphones and tablets. But the company faces an uphill battle.

More information:

http://openpowerfoundation.org/

http://bits.blogs.nytimes.com/

MongoDB 2.6 released

MongoDB 2.6 has been released. Major new features are its primary focus, but it also improves performance.

Performance improvements:

  • efficient use of network resources
  • oplog processing is 75% faster
  • performance of several classes of scan, sort, $in and $all operations is significantly improved
  • bulk operators for writes improve updates by as much as 5x.

Feature improvements:

  • Text Search Integration
  • Insert and Update Improvements
  • A new write protocol integrates write operations with write concerns (the protocol also provides improved support for bulk operations)
  • A new authorization model that provides the ability to create custom User-Defined Roles and the ability to specify user privileges at a collection-level granularity.
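As a rough illustration of the new write protocol, the sketch below builds the kind of command document that carries a batch of updates together with its write concern. It only constructs a plain Python dict following the shape of the 2.6 `update` command; the helper name `build_update_command` and the sample collection and fields are hypothetical, and no driver or server is involved.

```python
from typing import Any, Dict, List, Tuple

Query = Dict[str, Any]
Change = Dict[str, Any]


def build_update_command(
    collection: str,
    updates: List[Tuple[Query, Change]],
    ordered: bool = True,
    w: int = 1,
) -> Dict[str, Any]:
    """Build one command document holding many updates plus the write concern."""
    return {
        "update": collection,  # the command name must be the first key
        "updates": [
            {"q": query, "u": change, "upsert": False, "multi": False}
            for query, change in updates
        ],
        "ordered": ordered,  # stop at the first error when True
        "writeConcern": {"w": w},  # acknowledgement level travels with the batch
    }


command = build_update_command(
    "players",
    [
        ({"_id": 1}, {"$set": {"score": 10}}),
        ({"_id": 2}, {"$inc": {"score": 5}}),
    ],
    ordered=False,
)
```

Sending many operations in one command, instead of one message per write, is what makes the bulk path cheaper on the network.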

Full release notes

PostgreSQL introduced jsonb support

Binary JSON

PostgreSQL has introduced jsonb, a diamond in the crown of PostgreSQL 9.4. It is based on an elegant hash opclass for GIN, which competes with MongoDB's performance for the contains operator.

Feature documentation: http://www.postgresql.org/docs/devel/static/datatype-json.html

The story behind the feature: http://obartunov.livejournal.com/177247.html
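To make the contains operator (`@>`) more concrete, here is a simplified Python sketch of its containment rules as described in the PostgreSQL documentation. It is an illustration of the semantics only, not how PostgreSQL or the GIN index implements them; the function name `jsonb_contains` is hypothetical.

```python
from typing import Any

# In SQL the equivalent would be along the lines of:
#   CREATE INDEX ON docs USING gin (body jsonb_path_ops);
#   SELECT * FROM docs WHERE body @> '{"a": 1}';


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _scalar_eq(a: Any, b: Any) -> bool:
    """Compare JSON scalars; numbers by value, everything else by type and value."""
    if _is_number(a) and _is_number(b):
        return a == b
    return type(a) is type(b) and a == b


def jsonb_contains(left: Any, right: Any, _top: bool = True) -> bool:
    """Return True if `left` contains `right` (simplified `left @> right`)."""
    if isinstance(left, dict):
        # Every key/value pair of the right object must be contained in the left one.
        return isinstance(right, dict) and all(
            key in left and jsonb_contains(left[key], value, False)
            for key, value in right.items()
        )
    if isinstance(left, list):
        if isinstance(right, list):
            # Order and duplicates do not matter for arrays.
            return all(
                any(jsonb_contains(item, wanted, False) for item in left)
                for wanted in right
            )
        if isinstance(right, dict):
            return False
        # Special case: a top-level array may contain a primitive value.
        return _top and any(_scalar_eq(item, right) for item in left)
    # `left` is a scalar: it only contains an identical scalar.
    return not isinstance(right, (dict, list)) and _scalar_eq(left, right)
```

For example, `jsonb_contains({"a": 1, "b": 2}, {"a": 1})` holds, while `jsonb_contains({"foo": {"bar": "baz"}}, {"bar": "baz"})` does not, because the structure has to match level by level.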

hbase-0.98.0 has been released

This release includes:

  • several new security features, such as cell visibility labels, cell ACLs, and transparent server-side encryption.
  • significant performance improvements, such as a change to the write-ahead log threading model that provides higher transaction throughput under high load, reverse scanners, MapReduce over snapshot files, and striped compaction.

The complete list of changes in this release can be found in the release notes: http://goo.gl/y25W2h

What do you know about SQL performance?

The 3-Minute Test: What do you know about SQL performance?

“SQL-Tuning is black magic like alchemy: it consists of obscure rules, understood only by a handful of insiders.”

That is a myth. SQL databases use well-known algorithms to deliver predictable performance. It is, however, easy to write SQL queries that cannot use the most efficient algorithm and thus deliver unexpected performance.

 http://use-the-index-luke.com/3-minute-test
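A small experiment shows the point about queries that cannot use the most efficient algorithm. The sketch below uses SQLite from the Python standard library (chosen only because it needs no server; the table and index names are hypothetical) and compares the plan of a query on an indexed column with the plan of the same query once the column is wrapped in a function.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_last_name ON employees (last_name)")
conn.executemany(
    "INSERT INTO employees (last_name, salary) VALUES (?, ?)",
    [("Name%d" % i, i) for i in range(1000)],
)

# The predicate matches the indexed column directly: the index can be searched.
indexed_plan = query_plan(
    conn, "SELECT * FROM employees WHERE last_name = ?", ("Name42",)
)

# The column is wrapped in a function: the plain index on last_name no longer applies.
wrapped_plan = query_plan(
    conn, "SELECT * FROM employees WHERE upper(last_name) = ?", ("NAME42",)
)

print(indexed_plan)
print(wrapped_plan)
```

The first plan reports a search using `idx_last_name`, while the second falls back to scanning the whole table. The exact wording of the plan text varies between SQLite versions.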

 

 

Parallel programming

VoltDB 4.0

VoltDB 4.0 is now available!

 The highlights of VoltDB v4.0 include:

  • Enhanced in-memory analytics capabilities with a host of new SQL support.
  • Greatly improved analytic read throughput performance.
  • Clusters can grow elastically, increasing both throughput and capacity, by adding nodes to running clusters without blocking ongoing operations.
  • Support for Groovy stored procedures, a message queue export connector, a MySQL migration utility and a host of other features.
  • Online training, free, at Volt University, along with Volt Vanguard Certification.

Here are the details on what’s new in VoltDB v4.0. You can download it here.

Official announcement from the VoltDB blog: http://voltdb.com/announcing-voltdb-4-0-enhanced-in-memory-analytics-and-online-elasticity/

Enhanced In-Memory Analytics

VoltDB is renowned for its ability to execute very fast writes: we’ve benchmarked writes into the millions of transactions per second range, on small clusters, running on bare metal as well as cloud instances.

But fast writes without fast reads are less useful. Since its first version, VoltDB has allowed for transactional reads to support writes as well as provide a window into fast changing data.

At ingestion or processing time, stored procedures can transactionally perform lookups and queries as data is coming into the system, allowing for richer writes at scale. Separately, global transactional reads can trigger events, support dashboards and even live decisioning on immediate data. Mixing complex reads and writes transactionally, and at scale, has traditionally separated VoltDB from other write-heavy systems.

In 4.0, VoltDB has added both features and improved performance of analytic-focused read queries. We’re focused on helping users understand their data as soon as they have it.

First, VoltDB delivers major new SQL capabilities, now supporting SQL UNION, self/outer/explicit JOIN, CASE, HAVING, SQL IN, Group-by column functions and materialized view group-by column functions. Our SQL support is approaching SQL-92 compatibility, while also adding non-standard features to support our key use cases. For example, VoltDB can now build a materialized view that aggregates the value of a JSON field by 5-minute time windows.
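To picture what aggregating a JSON field by 5-minute time windows means, here is a plain Python sketch of the computation. It is not VoltDB DDL or SQL, only an illustration of the result such a materialized view would maintain; the function name `aggregate_by_window`, the row layout and the `amount` field are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 5 * 60  # 5-minute windows


def aggregate_by_window(
    rows: Iterable[Tuple[int, str]], field: str
) -> Dict[int, float]:
    """Sum one JSON field per window; rows are (epoch seconds, JSON text)."""
    totals: Dict[int, float] = defaultdict(float)
    for timestamp, payload in rows:
        # Truncate the timestamp to the start of its 5-minute window.
        window_start = timestamp - timestamp % WINDOW_SECONDS
        totals[window_start] += json.loads(payload)[field]
    return dict(totals)


rows = [
    (0, '{"amount": 1}'),
    (299, '{"amount": 2}'),
    (300, '{"amount": 4}'),
]
totals = aggregate_by_window(rows, "amount")
```

The first two rows fall into the window starting at 0 and the third into the window starting at 300. A materialized view keeps such totals up to date as rows arrive, instead of recomputing them on every read.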

Second, we’ve removed some of the transactional overhead when running many kinds of global read queries, including ad-hoc SQL. These queries are still reading a live, fully serializable view of committed data, but they’re now up to 50x faster. This directly translates into more powerful dashboards, more consumers of analytics and richer decisioning.

Online Elastic Database

As a natively clustered database, VoltDB can scale to meet the needs of almost any high-velocity application. While some users have megabytes of state, others have terabytes. While some users process hundreds of operations per second, others process millions. But what if your business is growing and you want your VoltDB cluster to grow with it? Since we shipped VoltDB 1.0, users have been asking to add nodes to the cluster without any interruption in service, to rebalance data in the background while their apps continue to work for them.

We call this feature “elasticity” and it’s shipping in VoltDB 4.0. VoltDB can now seamlessly add nodes to a running cluster, increasing storage and throughput with each new node. Since all topology changes and data movement are transactional and durable, your data is protected while the cluster is expanding or rebalancing. We’ve also carefully engineered this feature so that most customers will see no impact to their workload during expansion.
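One common way to let a cluster grow while moving only part of the data is consistent hashing. The sketch below is a generic illustration of that technique, offered as an assumption about how elastic growth can work in general and not as a description of VoltDB's internal mechanism; the class name `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points on the ring to spread the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # The owner is the first ring entry at or after the key's token, wrapping around.
        index = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-1", "node-2", "node-3"])
before = {key: ring.owner(key) for key in keys}

ring.add_node("node-4")  # grow the cluster
after = {key: ring.owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Only the keys taken over by the new node change owner; everything else stays where it was, which keeps rebalancing work proportional to the capacity added.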

Online Operations

In addition to the new online elasticity, VoltDB also supports online catalog and schema updates.  Tables can be added or dropped. Columns, indexes and materialized views can be added, dropped and modified.  And of course, stored procedures can be added, updated, or deleted.

VoltDB upgraded its network partition and fault detection handling. In the real world, network partitions happen for a variety of reasons. Both hardware and software issues can cause a node to become unavailable, or unreachable to other nodes in a cluster. For version 4.0, VoltDB enhanced its fault detection and recovery functionality to remain available in the face of even more kinds of failure. The result is increased availability of VoltDB clusters where links might break, for example, when VoltDB nodes are running in different availability zones.

Groovy – Our First Non-Java Stored Procedure Language

In VoltDB 4.0 we’ve added the ability to code the procedure implementation in the DDL itself with inline Groovy scripts. Check out our Groovy Voter sample for a familiar example of VoltDB processing in Groovy.  We hope this is the first of many new stored procedure languages – please drop us a note if you have a favorite, or would like to contribute to this effort.

Integrations, Migrations and Miscellaneous Features

VoltDB v4.0 introduces a whole host of other features. I’ll run through them here quickly:

  • VoltDB Export allows you to transactionally push data from VoltDB into another system, similar to an ETL (extract, transform, load) process. In 3.x, we could export to systems using JDBC, as well as to flat files. VoltDB 4.0 adds a new Export connector to leverage message queues. Presently available as a Beta, you can now export to a message queue using Kafka. Feel free to contact us if you’d like early access.
  • We’ve added three new @Statistics selectors that can help identify performance hot spots in your application.  They are PROCEDUREPROFILE, identifying the percentage of execution time each procedure takes, PROCEDUREINPUT, identifying the breakdown of data flow into stored procedures, and PROCEDUREOUTPUT, identifying the result set data flow from stored procedures.  These new statistics help you to quickly identify what transactions are taking the most time or I/O in your application.
  • You can now get a leg up porting your MySQL database to VoltDB. Our new utility, fondly called Voltify, will extract your database schema from your MySQL database and create a VoltDB catalog automatically. This utility, coupled with our high performance CSV loader, enables you to rapidly move your MySQL database to VoltDB.
  • The VoltDB JDBC driver has been enhanced to support parameterized ad hoc SQL statements, setting query timeouts, as well as additional metadata methods.

VoltDB Training

VoltDB recently rolled out a new offering, Volt University, which provides free online training on VoltDB key concepts. There are ten lessons, found at http://voltdb.com/resources/volt-university/tutorials/, that can help you come up to VoltDB-speed quickly. Additionally, should you wish to obtain formal Volt Vanguard certification, we are offering an official online certification course, which you can register for here: https://university.voltdb.com/

Download VoltDB 4.0

We here at VoltDB are very excited about the release and hope you are too. You can download VoltDB 4.0 here.