Visualize any public CSV on GitHub in a few clicks

Statwing has published an amazing tool on its blog, based on a subset of its commercial solution.

It is a cut-down version, but still a great demonstration of data visualisation and a handy online utility for exploring open data.

The import wizard:

http://blog.statwing.com/visualize-any-public-csv-on-github-in-a-few-clicks/

Sample player-dataset visualization:

https://www.statwing.com/open/datasets/2179937bfbd56f8b2731b2937bb1c2dfd92ee8fb#workspaces/15411


Attribution-ShareAlike 4.0 International

Share-alike Attribution + ShareAlike (BY-SA)


Want to share some open data and ensure that subsequent contributions will benefit everyone?


You are free to:

  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.


More information is available here: http://creativecommons.org/licenses/by-sa/4.0/

Wrangler, the smartest ETL tool and much more

Wrangler is an interactive tool for data cleaning and transformation: a smart ETL tool that lets you spend less time formatting your data and more time analyzing it. Take the time to watch the video; it's really worth it.

Official website: http://vis.stanford.edu/wrangler/

Big Data – Key Figures

Back to basics: facts and key figures about data:
  • Bad data or poor data quality costs US businesses $600 billion annually.
  • 247 billion e-mail messages are sent each day… up to 80% of them are spam.
  • Poor data or a “lack of understanding the data” is cited as the #1 reason for project cost overruns.
  • 70% of data is created by individuals – but enterprises are responsible for storing and managing 80% of it. (source)
  • Data volumes are projected to grow by 40 to 60 per cent a year, while media-intensive sectors, including financial services, will see year-on-year data growth rates of over 120 per cent.
  • Every hour, enough information is consumed by internet traffic to fill 7 million DVDs. Side by side, they’d scale Mount Everest 95 times.
  • The volume of data that businesses collect is exploding: in 15 of the US economy’s 17 sectors, for example, companies with upward of 1,000 employees store, on average, more information than the Library of Congress does (source).
  • 48 hours’ worth of video is posted on YouTube every hour of every day (source).
  • Every month 30 billion pieces of content are shared on Facebook (source).
  • By 2020 the production of data will be 44 times what we produced in 2009. (source)
  • If an average Fortune 1000 company can increase the usability of its data by just 10%, the company could expect an increase of over 2 billion dollars. (Source: InsightSquared infographic)
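Two of the figures above can be cross-checked with a little arithmetic. This is a quick sketch under stated assumptions: "side by side" is read as discs laid edge to edge, a DVD is taken as the standard 120 mm diameter, and Everest's height is assumed to be 8,848.86 m (the original source's figure isn't given).

```python
# Bullet: data produced in 2020 = 44 x the 2009 volume.
growth_factor = 44
years = 2020 - 2009
annual_growth = growth_factor ** (1 / years) - 1  # implied compound annual growth rate

# Bullet: 7 million DVDs "side by side", read here as laid edge to edge.
DVD_DIAMETER_M = 0.12       # standard 120 mm disc
EVEREST_HEIGHT_M = 8848.86  # assumed figure; the source's value is unknown
row_length_m = 7_000_000 * DVD_DIAMETER_M
everests = row_length_m / EVEREST_HEIGHT_M

print(f"implied annual growth: {annual_growth:.0%}")
print(f"DVD row: {row_length_m / 1000:.0f} km, about {everests:.0f} x Everest")
```

A 44-fold increase over the 11 years from 2009 to 2020 works out to roughly 41% compound growth per year, consistent with the projected 40 to 60 per cent; and 7 million discs edge to edge span about 840 km, close to 95 times Everest's height.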

Standard review – ISO 4217 – Currency

ISO 4217 is a standard published by the International Organization for Standardization (ISO), which delineates currency designators, country codes (alpha and numeric) and references to minor units in three tables.

An up-to-date, free data source for country and currency codes can be found here:

http://www.commondatahub.com/static/geography/currency/country_currency_codes

The ISO 4217 maintenance agency (MA), SIX Interbank Clearing, is responsible for maintaining the list of codes.
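To make the codes and minor units concrete, here is a minimal Python sketch. It hard-codes a tiny hand-picked excerpt rather than parsing the commondatahub file (whose format we don't assume), and the `Currency` type and `to_major_units` helper are hypothetical names for illustration only.

```python
from decimal import Decimal
from typing import Dict, NamedTuple


class Currency(NamedTuple):
    alpha: str        # alphabetic code, e.g. "EUR"
    numeric: str      # numeric code as a 3-digit string (leading zeros matter)
    minor_units: int  # number of digits after the decimal separator


# Tiny hand-picked excerpt for illustration, not the maintained list.
_EXCERPT = [
    Currency("USD", "840", 2),
    Currency("EUR", "978", 2),
    Currency("JPY", "392", 0),
    Currency("KWD", "414", 3),
]
BY_ALPHA: Dict[str, Currency] = {c.alpha: c for c in _EXCERPT}
BY_NUMERIC: Dict[str, Currency] = {c.numeric: c for c in _EXCERPT}


def to_major_units(amount_minor: int, alpha: str) -> Decimal:
    """Turn an integer amount in minor units (e.g. cents) into major units."""
    return Decimal(amount_minor).scaleb(-BY_ALPHA[alpha].minor_units)


print(to_major_units(12345, "USD"))  # 123.45
print(to_major_units(500, "JPY"))    # 500
print(BY_NUMERIC["978"].alpha)       # EUR
```

Keeping amounts as integers in minor units and applying the currency's exponent only when displaying them avoids floating-point rounding, and it copes with currencies that have no minor unit (JPY) or three (KWD).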

Definition of the term “Gutmann method”

The delete function in most operating systems simply marks the space occupied by the file as reusable without immediately removing any of its contents. This allows the data to be recovered.
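A toy model makes this concrete. The `ToyDisk` class below is purely hypothetical (no real file system is this simple): deleting a file only drops its name from the file table and returns its blocks to the free list, so reading the raw block afterwards still returns the old bytes.

```python
from typing import Dict, List, Set


class ToyDisk:
    """Hypothetical in-memory disk: fixed-size blocks plus a free list and a file table."""

    def __init__(self, n_blocks: int, block_size: int = 8) -> None:
        self.block_size = block_size
        self.blocks: List[bytearray] = [bytearray(block_size) for _ in range(n_blocks)]
        self.free: Set[int] = set(range(n_blocks))
        self.files: Dict[str, List[int]] = {}  # file name -> indices of its blocks

    def write(self, name: str, data: bytes) -> None:
        size = self.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        if len(chunks) > len(self.free):
            raise OSError("disk full")
        allocated = sorted(self.free)[:len(chunks)]
        for idx, chunk in zip(allocated, chunks):
            self.blocks[idx][:len(chunk)] = chunk  # overwrite only the bytes we have
            self.free.remove(idx)
        self.files[name] = allocated

    def delete(self, name: str) -> None:
        # A typical "delete": forget the name and mark its blocks reusable.
        # The block contents are NOT erased.
        for idx in self.files.pop(name):
            self.free.add(idx)

    def raw_read(self, idx: int) -> bytes:
        """Read a block directly, ignoring the file table (as a recovery tool would)."""
        return bytes(self.blocks[idx])


disk = ToyDisk(n_blocks=4)
disk.write("secret.txt", b"password")
disk.delete("secret.txt")
print("secret.txt" in disk.files)  # False: the file is gone from the file table
print(disk.raw_read(0))            # b'password': the bytes are still on the "disk"
```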


The Gutmann method is an algorithm for securely erasing the contents of computer hard drives.


Peter Gutmann is a professor at the University of Auckland in New Zealand, specialising in network security.

He has published a number of well-known papers including:

  • Secure Deletion of Data from Magnetic and Solid-State Memory, a classic paper used as a reference by many disk-wiping utilities.
  • A Cost Analysis of Windows Vista Content Protection, with the rather memorable “executive executive summary”:

The Vista Content Protection specification could very well constitute the longest suicide note in history.

  • Software Generation of Practically Strong Random Numbers


Technical overview

One standard way to recover data that has been overwritten on a hard drive is to capture and process the analog signal obtained from the drive’s read/write head prior to this analog signal being digitized. This analog signal will be close to an ideal digital signal, but the differences will reveal important information. By calculating the ideal digital signal and then subtracting it from the actual analog signal, it is possible to amplify the signal remaining after subtraction and use it to determine what had previously been written on the disk.

For example:

Analog signal:        +11.1  -8.9  +9.1 -11.1 +10.9  -9.1
Ideal Digital signal: +10.0 -10.0 +10.0 -10.0 +10.0 -10.0
Difference:            +1.1  +1.1  -0.9  -1.1  +0.9  +0.9
Previous signal:      +11    +11   -9   -11    +9    +9

This can then be done again to see the previous data written:

Recovered signal:     +11    +11   -9   -11    +9    +9
Ideal Digital signal: +10.0 +10.0 -10.0 -10.0 +10.0 +10.0
Difference:            +1    +1    +1    -1    -1    -1
Previous signal:      +10   +10   +10   -10   -10   -10
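The subtract-and-amplify steps above can be sketched in a few lines of Python. This assumes, as in the example, ideal levels of ±10 and an amplification factor of 10; both are illustrative values, not properties of a real drive, and the function names are hypothetical. Applied twice to the analog samples, it yields +11 +11 -9 -11 +9 +9 and then +10 +10 +10 -10 -10 -10, i.e. ten times each difference row.

```python
from typing import List

LEVEL = 10.0  # ideal amplitude of a written bit, as in the example
GAIN = 10.0   # amplification applied to the residual (illustrative value)


def ideal_digital(signal: List[float]) -> List[float]:
    """Snap each sample to the nearest ideal level, +LEVEL or -LEVEL."""
    return [LEVEL if s >= 0 else -LEVEL for s in signal]


def previous_layer(signal: List[float]) -> List[int]:
    """Subtract the ideal signal, then amplify what is left to estimate the earlier write."""
    residual = [s - i for s, i in zip(signal, ideal_digital(signal))]
    return [round(GAIN * r) for r in residual]  # round() absorbs float noise


analog = [11.1, -8.9, 9.1, -11.1, 10.9, -9.1]
layer1 = previous_layer(analog)  # [11, 11, -9, -11, 9, 9]
layer2 = previous_layer(layer1)  # [10, 10, 10, -10, -10, -10]
print(layer1, layer2)
```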

However, even when the disk is overwritten repeatedly with random data, it is theoretically possible to recover the previous signal. The permeability of a medium changes with the frequency of the magnetic field, which means that a lower-frequency field will penetrate deeper into the magnetic material on the drive than a high-frequency one. So a low-frequency signal will, in theory, still be detectable even after it has been overwritten hundreds of times by a high-frequency signal.

The patterns used are designed to apply alternating magnetic fields of various frequencies and phases to the drive surface, thereby approximating a degaussing of the material below the surface of the drive.
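For reference, the 35-pass schedule from Gutmann's paper can be sketched as follows: four passes of random data, the 27 fixed patterns (passes 5 to 31) in random order, then four more random passes. This is a minimal sketch that only builds the per-pass buffers; it does not write to any device, and `gutmann_schedule` and `pass_buffer` are hypothetical names. Real wiping tools also have to deal with drive caches and remapped sectors, which this ignores.

```python
import os
import random
from typing import List, Optional

# Passes 5-31 of the 35-pass table: byte patterns repeated across the sector.
FIXED_PATTERNS: List[bytes] = (
    [b"\x55", b"\xaa",                                   # passes 5-6
     b"\x92\x49\x24", b"\x49\x24\x92", b"\x24\x92\x49"]  # passes 7-9
    + [bytes([n * 0x11]) for n in range(16)]             # passes 10-25: 0x00, 0x11, ..., 0xFF
    + [b"\x92\x49\x24", b"\x49\x24\x92", b"\x24\x92\x49",  # passes 26-28
       b"\x6d\xb6\xdb", b"\xb6\xdb\x6d", b"\xdb\x6d\xb6"]  # passes 29-31
)


def gutmann_schedule(rng: random.Random) -> List[Optional[bytes]]:
    """Return the 35 passes in order; None stands for a pass of random data."""
    fixed = list(FIXED_PATTERNS)
    rng.shuffle(fixed)  # the 27 fixed passes are applied in random order
    return [None] * 4 + fixed + [None] * 4


def pass_buffer(pattern: Optional[bytes], length: int) -> bytes:
    """Build `length` bytes for one pass: random data, or the pattern repeated."""
    if pattern is None:
        return os.urandom(length)
    repeats = length // len(pattern) + 1
    return (pattern * repeats)[:length]


schedule = gutmann_schedule(random.SystemRandom())
buffers = [pass_buffer(p, 512) for p in schedule]  # one 512-byte sector per pass
```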