Release Consistency
Idea: Divide access to a synchronization variable into two phases: an acquire and a release. Acquire forces a requester to wait until the shared data can be accessed; release sends the requester's local values to shared memory.
Weak Consistency
- Accesses to synchronization variables are sequentially consistent.
- No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
- No data access is allowed to be performed until all previous accesses to synchronization variables have been performed.
Basic idea: You don’t care that reads and writes of a series of operations are immediately known to other processes. You just want the effect of the series as a whole to be known.
Observation: Weak consistency implies that we need to lock and unlock data (implicitly or explicitly).
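A minimal sketch of this lock/unlock behavior in Python (the class and method names are my own; this is an illustration of the idea, not a real distributed-shared-memory implementation): a process pulls a copy of shared data at acquire, works on its local view, and publishes the whole series of writes only at release.

```python
import threading

class WeakStore:
    """Illustrative model: writes stay process-local until release."""
    def __init__(self):
        self._shared = {}              # the "shared memory"
        self._lock = threading.Lock()

    def acquire(self, local):
        # Acquire phase: wait until the shared data can be accessed,
        # then pull a fresh copy into the process's local view.
        self._lock.acquire()
        local.clear()
        local.update(self._shared)

    def release(self, local):
        # Release phase: push the process's local values back to
        # shared memory, then let other processes in.
        self._shared.update(local)
        self._lock.release()

store = WeakStore()
view = {}
store.acquire(view)
view["x"] = 1          # intermediate writes are invisible to others
view["x"] = 2
store.release(view)    # only now does the effect (x = 2) become visible

store.acquire(view)
print(view["x"])       # 2
store.release(view)
```

Only the effect of the series as a whole (x = 2) ever reaches other processes; the intermediate write of 1 is never observable.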
Pipelined RAM Consistency
Writes done by a single process are received by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
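This per-writer FIFO guarantee can be illustrated with a small Python sketch (my own construction, not a real protocol): enumerate every order a reader could observe, that is, every merge of the two writers' streams that keeps each stream's internal order intact.

```python
# Writes issued by two processes, in program order.
writes_p1 = [("x", 1), ("x", 2)]
writes_p2 = [("y", 10), ("y", 20)]

def merges(a, b):
    """All orders a reader may observe: every interleaving of the
    two streams that keeps each writer's own sequence intact."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

orders = list(merges(writes_p1, writes_p2))
print(len(orders))  # 6: different readers may each pick a different order
# ...but in every legal order, each writer's own writes stay FIFO:
assert all(o.index(("x", 1)) < o.index(("x", 2)) for o in orders)
```

Two readers may observe any two of these six orders, so they can disagree about how P1's and P2's writes interleave, but never about the order within one writer's stream.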
Causal Consistency
Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order by different processes.
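One standard way to capture "potentially causally related" is a vector clock; the sketch below is my own illustration, not taken from the original text. If P2 reads the value P1 wrote and then writes a new one, the two writes are causally related and must be seen in that order everywhere; two writes issued independently are concurrent and may be seen either way.

```python
def happens_before(vc_a, vc_b):
    """Vector-clock test: a causally precedes b iff a's clock is
    componentwise <= b's and differs in at least one component."""
    return all(x <= y for x, y in zip(vc_a, vc_b)) and vc_a != vc_b

# Scenario 1: P1 writes x=1 at clock (1, 0); P2 reads that value and
# then writes x=2, so P2's write carries clock (1, 1).
w1, w2 = (1, 0), (1, 1)
assert happens_before(w1, w2)   # causally related: same order everywhere

# Scenario 2: P2 writes without ever seeing P1's write: clock (0, 1).
w3 = (0, 1)
assert not happens_before(w1, w3) and not happens_before(w3, w1)
# concurrent: different processes may observe w1 and w3 in either order
```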
Sequential Consistency
The result of any execution is the same as if the operations of all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
Note: We're talking about interleaved executions: there is some total ordering for all operations taken together.
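The definition can be turned into a brute-force check (illustrative only, exponential in the number of operations; all names are my own): a history is sequentially consistent if some interleaving that respects every process's program order explains every read.

```python
from itertools import permutations

# Program order per process; ops are ("w", var, val) or ("r", var, val),
# where a read records the value it actually returned.
p1 = [("w", "x", 1)]
p2 = [("r", "x", 1), ("r", "x", 0)]   # reads 1, then 0: suspicious

def sequentially_consistent(*procs):
    """Brute force: does any interleaving that preserves each process's
    program order make every read return the latest write?"""
    ops = [(i, j) for i, p in enumerate(procs) for j in range(len(p))]
    for perm in permutations(ops):
        # keep only interleavings that respect program order
        if any(perm.index((i, j)) > perm.index((i, j + 1))
               for i, p in enumerate(procs) for j in range(len(p) - 1)):
            continue
        mem, ok = {}, True
        for i, j in perm:
            kind, var, val = procs[i][j]
            if kind == "w":
                mem[var] = val
            elif mem.get(var, 0) != val:   # all items initialized to 0
                ok = False
                break
        if ok:
            return True
    return False

print(sequentially_consistent(p1, p2))   # False: no interleaving explains 1-then-0
print(sequentially_consistent(p1, [("r", "x", 0), ("r", "x", 1)]))  # True
```

Reading 1 and later 0 cannot be explained by any total order, while 0-then-1 can (read, write, read), which is exactly the distinction the definition draws.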
For sequential consistency it can be shown that r + w >= t, where:
- r: read time
- w: write time
- t: minimal packet transmission time between nodes
In other words, reads and writes cannot both be made arbitrarily fast: improving read performance necessarily degrades write performance, and vice versa.
Strict Consistency
Any read to a shared data item X returns the value stored by the most recent write operation on X.
Observation: It doesn’t make sense to talk about “the most recent” in a distributed environment.
Notation used in the examples:
- Assume all data items have been initialized to 0
- W(x)1: value 1 is written to x
- R(x)1: reading x returns the value 1
Note: Strict consistency is what you get in the normal sequential case, where your program does not interfere with any other program.
Consistency may be used to describe various forms of data-centric coherence models.
We will start a series of “definition term” posts focused on the various forms of “consistency”; find hereafter the summary:
Strong consistency models: Operations on shared data are synchronized:
- Strict consistency (related to time)
- Sequential consistency (what we are used to)
- Causal consistency (maintains only causal relations)
- PRAM consistency (maintains only individual ordering)
Weak consistency models: Synchronization occurs only when shared data is locked and unlocked:
- General weak consistency
- Release consistency
- Entry consistency
Observation: The weaker the consistency model, the easier it is to build a scalable solution.
Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”
Wikipedia article: http://en.wikipedia.org/wiki/Linked_Data
Linked data community homepage: http://linkeddata.org/home
Linked data visual representation:
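To make the URIs-and-RDF idea concrete, here is a toy sketch using plain Python tuples as RDF-style triples (the DBpedia-style URIs are used purely as examples; no RDF library is involved):

```python
# Each statement is a (subject, predicate, object) triple; subjects and
# predicates are URIs, so anyone on the Web can link to or dereference them.
triples = [
    ("http://dbpedia.org/resource/Berlin",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Germany"),
    ("http://dbpedia.org/resource/Germany",
     "http://dbpedia.org/ontology/capital",
     "http://dbpedia.org/resource/Berlin"),
]

def objects_of(subject, predicate):
    """Follow a link: all objects related to subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://dbpedia.org/resource/Germany",
                 "http://dbpedia.org/ontology/capital"))
# ['http://dbpedia.org/resource/Berlin']
```

Because every node is a URI rather than a local key, a triple published by one dataset can point straight into another dataset, which is the linking that the paragraph above describes.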
Data modeling is the part of the database design process that aims to identify and organize the required data logically and physically.
A data model says what information is to be contained in a database, how the information will be used, and how the items in the database will be related to each other.
It can be difficult to change a database layout once code has been written and data inserted. A well thought-out data model reduces the need for such changes. Data modeling enhances application maintainability, and future systems may re-use parts of existing models, which should lower development costs. A data modeling language is a mathematical formalism with a notation for describing data structures and a set of operations used to manipulate and validate that data. One of the most widely used methods for developing data models is the “entity-relationship model”.
A data model can be thought of as a diagram or flowchart that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, it’s an important step and shouldn’t be rushed. Well-documented models allow stakeholders to identify errors and make changes before any programming code is written.
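As a toy illustration of the entity-relationship idea (all entity and attribute names are invented for the example): entities become record types, attributes become fields, and a relationship becomes a reference from one entity to another.

```python
from dataclasses import dataclass

@dataclass
class Customer:            # entity: Customer
    customer_id: int       # attribute (key)
    name: str              # attribute

@dataclass
class Order:               # entity: Order
    order_id: int
    customer: Customer     # relationship: each Order is placed
                           # by exactly one Customer

alice = Customer(1, "Alice")
order = Order(100, alice)
print(order.customer.name)   # Alice
```

Sketching such entities and their relationships before coding is exactly the step the paragraph above argues should not be rushed.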
Data mining is sorting through data in order to identify patterns in a data set and to establish relationships.
Data mining tasks include:
- Association - looking for patterns where one event is connected to another, or where one event leads to another later event (also known as “path analysis”)
- Classification - finding new patterns, which may result in a change in the way the data is organized
- Clustering - finding and visually documenting groups of facts not previously known
- Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (also known as “predictive analytics”)
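As one concrete sketch of the clustering task, here is a toy one-dimensional k-means in plain Python (the data points are invented, and real data mining tools use far more robust algorithms):

```python
def kmeans_1d(points, k, iters=20):
    """Toy k-means on plain numbers: group points around k centers."""
    # crude initialization: pick k spread-out points
    step = max(1, len(points) // k)
    centers = sorted(points)[::step][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest center
            nearest = min(range(len(centers)),
                          key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # move each center to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)

# two obvious groups of purchase amounts: small and large baskets
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans_1d(data, 2))  # ~[1.0, 10.0]
```

The algorithm discovers the two groups (small and large baskets) without being told they exist, which is the "groups of facts not previously known" aspect of clustering.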
Data mining standards are available here: http://www.dmg.org/