Definition | Data story

Definition: Availability

Availability = uptime / (uptime + downtime)

Availability from a technical perspective is mostly about being fault tolerant. Because the probability of a failure occurring increases with the number of components, the system should be able to compensate so as to not become less reliable as the number of components increases.

For example, availability rates for a given service over an entire year translate into the following allowed downtime:

Availability %              Allowed downtime per year
90% (“one nine”)            More than a month
99% (“two nines”)           Less than 4 days
99.9% (“three nines”)       Less than 9 hours
99.99% (“four nines”)       Less than an hour
99.999% (“five nines”)      ~ 5 minutes
99.9999% (“six nines”)      ~ 31 seconds
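
The downtime figures above follow directly from the availability formula. The sketch below reproduces them, assuming a 365-day year; the function names are illustrative only. The last helper assumes independent components that must all be up at once, which is one simple way to see why more components mean lower availability unless the system compensates.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def availability(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    return uptime / (uptime + downtime)


def allowed_downtime_seconds(availability_ratio: float) -> float:
    """Downtime per year permitted by an availability ratio such as 0.999."""
    return (1.0 - availability_ratio) * SECONDS_PER_YEAR


def series_availability(component_availability: float, n: int) -> float:
    """Availability of n independent components that must all be up (assumption)."""
    return component_availability ** n


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        seconds = allowed_downtime_seconds(pct / 100)
        print(f"{pct}%: {seconds / 3600:.3f} hours of downtime per year")
```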

The 4 V’s of Big Data

The challenges associated with Big Data are the “4 V’s”: Volume, Velocity, Variety, and Value.

The “4 V’s” of Big Data: Volume, Velocity, Variety, and Value. Source: Oracle.

 

  • The Volume challenge exists because most businesses generate much more data than what their systems were designed to handle.
  • The Velocity challenge exists if a company’s data analysis or data storage runs slower than its data generation. The data might arrive as customer clicks on a website or as thousands of sales transactions every second, which is a good problem to have.
  • The Variety challenge exists because of the need to process different types of data to produce the desired insights. This could include, for example, analyzing data from social networks, databases and customer service call records at the same time.
  • The Value challenge is about deriving valuable insights from data, which in my view is the most important of all the V’s. A company can usually collect all the data; the challenge is to ask the right questions to get value from it.

What exactly is big data?

Explanations vary, of course, but we might agree that big data is high-volume, high-velocity, and high-variety information that requires new tools and skills to manage.

Definition “information” term

Drucker (1988) wrote that information is data endowed with meaning and purpose.

Definition “YAML” term

YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail (RFC 2822). YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. It is available for several programming languages.

YAML is a recursive acronym for “YAML Ain’t Markup Language”. Early in its development, YAML was said to mean “Yet Another Markup Language”, but the name was later reinterpreted as a backronym to distinguish its purpose as data-oriented rather than document markup.

 

Sample document

Data structure hierarchy is maintained by outline indentation.


---
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy
    family:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     100.27
      quantity:  1

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...
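
To make the structure of the sample concrete, the sketch below builds by hand the native data a typical YAML loader would be expected to produce from it. It uses only the Python standard library and does not parse YAML itself. The variable and function names are illustrative, and the exact types (for example, a date object for the `date` field) depend on the loader, so treat them as assumptions.

```python
import datetime
import json

# One mapping, referenced twice: the anchor (&id001) and alias (*id001)
# in the sample point at the same node.
address = {
    "street": "123 Tornado Alley\nSuite 16\n",  # "|" literal block keeps line breaks
    "city": "East Centerville",
    "state": "KS",
}

invoice = {
    "receipt": "Oz-Ware Purchase Invoice",
    "date": datetime.date(2007, 8, 6),  # assumption: loader resolves dates
    "customer": {"given": "Dorothy", "family": "Gale"},
    "items": [
        {"part_no": "A4786", "descrip": "Water Bucket (Filled)",
         "price": 1.47, "quantity": 4},
        {"part_no": "E1628", "descrip": 'High Heeled "Ruby" Slippers',
         "size": 8, "price": 100.27, "quantity": 1},
    ],
    "bill-to": address,
    "ship-to": address,
    # ">" folded block joins the lines with spaces
    "specialDelivery": (
        "Follow the Yellow Brick Road to the Emerald City. "
        "Pay no attention to the man behind the curtain.\n"
    ),
}


def to_json(data) -> str:
    """Render the structure as JSON; dates are written as ISO strings."""
    return json.dumps(data, indent=2, default=str)


if __name__ == "__main__":
    print(to_json(invoice))
```

Indentation in the YAML sample corresponds to nesting here: mappings become dictionaries and the dash-prefixed entries under `items` become a list.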

 

Standard review – ISO 4217 – Currency

ISO 4217 is a standard published by the International Organization for Standardization (ISO). It defines currency designators, codes (alphabetic and numeric), and references to minor units, published in three tables.

An up-to-date, freely available data source for country and currency codes is available here:

http://www.commondatahub.com/static/geography/currency/country_currency_codes

The ISO 4217 maintenance agency (MA), SIX Interbank Clearing, is responsible for maintaining the list of codes.
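
As an illustration of how the alphabetic code, numeric code, and minor unit fit together, here is a minimal sketch. The four entries are a small hand-picked sample rather than the full ISO 4217 list, and the helper names are hypothetical.

```python
from decimal import Decimal

# Sample entries: alphabetic code -> (numeric code, number of minor-unit digits)
CURRENCIES = {
    "USD": ("840", 2),
    "EUR": ("978", 2),
    "JPY": ("392", 0),
    "KWD": ("414", 3),
}


def to_minor_units(amount: str, code: str) -> int:
    """Convert a decimal amount string to an integer count of minor units."""
    _, digits = CURRENCIES[code]
    scaled = Decimal(amount).scaleb(digits)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more than {digits} decimal places for {code}")
    return int(scaled)


def format_amount(minor_units: int, code: str) -> str:
    """Format a minor-unit integer with the currency's number of decimals."""
    _, digits = CURRENCIES[code]
    value = Decimal(minor_units).scaleb(-digits)
    return f"{code} {value:.{digits}f}"
```

Storing amounts as integers in minor units avoids floating-point rounding, and the per-currency digit count is what makes the conversion differ between, say, JPY and KWD.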

Definition “Kalman filter” term

Wikipedia gives the following definition:

The Kalman filter, also known as linear quadratic estimation (LQE), is an algorithm which uses a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those that would be based on a single measurement alone. More formally, the Kalman filter operates recursively on streams of noisy input data to produce a statistically optimal estimate of the underlying system state. The filter is named for Rudolf (Rudy) E. Kálmán, one of the primary developers of its theory.

The Kalman filter has numerous applications in technology. A common application is for guidance, navigation and control of vehicles, particularly aircraft and spacecraft. Furthermore, the Kalman filter is a widely applied concept in time series econometrics.

The algorithm works in a two-step process: in the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some amount of error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty. Because of the algorithm’s recursive nature, it can run in real time using only the present input measurements and the previously calculated state; no additional past information is required.

From a theoretical standpoint, the main assumption of the Kalman filter is that the underlying system is a linear dynamical system and that all error terms and measurements have a Gaussian distribution (often a multivariate Gaussian distribution). Extensions and generalizations to the method have also been developed, such as the Extended Kalman Filter and the Unscented Kalman filter which work on nonlinear systems. The underlying model is a Bayesian model similar to a hidden Markov model but where the state space of the latent variables is continuous and where all latent and observed variables have Gaussian distributions.
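
The two-step predict and update cycle described above can be sketched for the simplest possible case: a single scalar state that is assumed to stay roughly constant. The class name, default noise variances, and initial values are illustrative assumptions, not part of the quoted definition.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # current state estimate
    p: float  # variance (uncertainty) of the estimate
    q: float  # process noise variance (assumed)
    r: float  # measurement noise variance (assumed)

    def predict(self) -> None:
        # Constant-state model: the estimate carries over, uncertainty grows.
        self.p += self.q

    def update(self, z: float) -> None:
        # The gain weights the measurement against the prediction:
        # the more certain the prediction (small p), the smaller the gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


def run_filter(measurements, x0=0.0, p0=1.0, q=1e-4, r=0.25):
    """Feed measurements one by one; only the previous state is kept."""
    kf = ScalarKalman(x0, p0, q, r)
    estimates = []
    for z in measurements:
        kf.predict()
        kf.update(z)
        estimates.append(kf.x)
    return estimates
```

Each iteration uses only the latest measurement and the previous `(x, p)` pair, which is the recursive property that lets the filter run in real time.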

 

 

About JSON genesis

Video from IEEE Computing Conversations

Interview with Douglas Crockford about the development of JavaScript Object Notation (JSON)

 

  • Crockford is likeably humble about the origins of JSON. Rather than claiming he invented JSON, he says he discovered it:
“I don’t claim to have invented it, because it already existed in nature. I just saw it, recognized the value of it, gave it a name, and a description, and showed its benefits. I don’t claim to be the only person to have discovered it.”

 

  • Crockford tried very hard to strip unnecessary features from JSON so that it stood a better chance of being language independent. When confronted with pushback about JSON not being a “standard”, Crockford registered json.org, put up a specification that documented the data format, and declared it a standard.

 

  • Crockford wanted something that made his life easier. He needed JSON when building an application in which a client written in JavaScript had to communicate with a server written in Java. He wanted a data serialization that matched the data structures available in both programming language environments.
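
The point about serialization matching native data structures can be illustrated with a short round trip. The sketch uses Python's standard `json` module rather than the JavaScript and Java pairing from the interview, and the sample record is made up for illustration.

```python
import json

# A made-up record built only from structures JSON can express:
# objects, arrays, strings, numbers, booleans, and null.
record = {
    "name": "example",
    "tags": ["a", "b"],
    "count": 3,
    "ratio": 0.5,
    "active": True,
    "parent": None,
}


def round_trip(data):
    """Serialize to JSON text and parse it back into native structures."""
    text = json.dumps(data)
    return text, json.loads(text)


if __name__ == "__main__":
    text, restored = round_trip(record)
    print(text)
    print(restored == record)
```

Because every value maps onto a structure most languages already have, the receiving side needs no schema or custom parser to rebuild the data.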

 

IETF working on a convention for HTTP access to JSON resources

An Internet-Draft proposes “A Convention for HTTP Access to JSON Resources”.

Abstract

This document codifies a convention for accessing JSON representations of resources via HTTP.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

 

http://tools.ietf.org/html/draft-pbryan-http-json-resource-01

Steve Jobs

“Many companies forget what it means to make great products. After initial success, sales and marketing people take over and the product people eventually make their way out.”

 
