Time, Clocks and the Ordering of Events in a Distributed System

Written in 1978 by Leslie Lamport, this is a must-read paper, freely available here:

http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf


Communications of the ACM 21, 7   (July 1978), 558-565.  Reprinted in several collections, including Distributed Computing: Concepts and Implementations, McEntire et al., ed.  IEEE Press, 1984.


Jim Gray once told me that he had heard two different opinions of this paper: that it’s trivial and that it’s brilliant.  I can’t argue with the former, and I am disinclined to argue with the latter.

The origin of this paper was a note titled The Maintenance of Duplicate Databases by Paul Johnson and Bob Thomas.  I believe their note introduced the idea of using message timestamps in a distributed algorithm.  I happen to have a solid, visceral understanding of special relativity (see [5]).  This enabled me to grasp immediately the essence of what they were trying to do.  Special relativity teaches us that there is no invariant total ordering of events in space-time; different observers can disagree about which of two events happened first.  There is only a partial order in which an event e1 precedes an event e2 iff e1 can causally affect e2.  I realized that the essence of Johnson and Thomas’s algorithm was the use of timestamps to provide a total ordering of events that was consistent with the causal order.  This realization may have been brilliant.  Having realized it, everything else was trivial.  Because Thomas and Johnson didn’t understand exactly what they were doing, they didn’t get the algorithm quite right; their algorithm permitted anomalous behavior that essentially violated causality.  I quickly wrote a short note pointing this out and correcting the algorithm.
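The clock rule the paper distills from this insight is simple: increment a local counter on every event, stamp outgoing messages with it, and on receipt advance the counter past the incoming timestamp. A minimal Python sketch, with illustrative names not taken from the paper:

```python
# Minimal sketch of Lamport logical clocks. Class and method names
# are illustrative; the paper states the rules, not this API.
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0

    def local_event(self):
        self.clock += 1          # rule: tick on every local event
        return self.clock

    def send(self):
        self.clock += 1
        return (self.clock, self.pid)   # timestamp travels with the message

    def receive(self, timestamp):
        msg_clock, _ = timestamp
        # rule: advance past the sender's clock, then tick
        self.clock = max(self.clock, msg_clock) + 1
        return self.clock

p, q = Process(1), Process(2)
t = p.send()        # p's clock becomes 1
q.local_event()     # q's clock becomes 1
q.receive(t)        # q's clock becomes max(1, 1) + 1 = 2
assert q.clock == 2
```

Comparing the `(clock, pid)` pairs lexicographically, with the process id breaking ties, yields exactly the total order the paper needs: one that is consistent with the causal partial order.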

It didn’t take me long to realize that an algorithm for totally ordering events could be used to implement any distributed system.  A distributed system can be described as a particular sequential state machine that is implemented with a network of processors.  The ability to totally order the input requests leads immediately to an algorithm to implement an arbitrary state machine by a network of processors, and hence to implement any distributed system.  So, I wrote this paper, which is about how to implement an arbitrary distributed state machine.  As an illustration, I used the simplest example of a distributed system I could think of: a distributed mutual exclusion algorithm.
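The state-machine idea can be sketched in a few lines: once every request carries a totally ordered timestamp, each replica sorts the requests by timestamp and applies them identically, so all replicas reach the same state. This is an assumption-laden illustration (the `apply_in_order` and `apply` names are mine, not the paper's):

```python
# Sketch of state-machine replication over totally ordered requests.
# Timestamps are (clock, pid) tuples; tuple comparison gives the total order.
def apply_in_order(requests, apply):
    state = {}
    for ts, op in sorted(requests):   # apply in timestamp order
        apply(state, op)
    return state

def apply(state, op):                 # a trivial key-value state machine
    key, value = op
    state[key] = value

# Two replicas receive the same requests in different network orders,
# yet converge, because both sort by timestamp before applying.
reqs_a = [((2, 1), ("x", "b")), ((1, 2), ("x", "a"))]
reqs_b = list(reversed(reqs_a))
assert apply_in_order(reqs_a, apply) == apply_in_order(reqs_b, apply)
```

Waiting until a request is known to be earliest among all pending requests is the hard part in a real system; the paper's mutual exclusion algorithm shows how the timestamps make that decision possible.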

This is my most often cited paper.  Many computer scientists claim to have read it.  But I have rarely encountered anyone who was aware that the paper said anything about state machines.  People seem to think that it is about either the causality relation on events in a distributed system, or the distributed mutual exclusion problem.  People have insisted that there is nothing about state machines in the paper.  I’ve even had to go back and reread it to convince myself that I really did remember what I had written.

The paper describes the synchronization of logical clocks.  As something of an afterthought, I decided to see what kind of synchronization it provided for real-time clocks.  So, I included a theorem about real-time synchronization.  I was rather surprised by how difficult the proof turned out to be.  This was an indication of what lay ahead in [62].

This paper won the 2000 PODC Influential Paper Award (later renamed the Edsger W. Dijkstra Prize in Distributed Computing).  It won an ACM SIGOPS Hall of Fame Award in 2007.

New Flashbook: DB2 10.5 with BLU Acceleration

A free ebook will be available for you to download in the coming days on this page.

Just in time for IDUG, Paul Zikopoulos and his team of co-authors have created a new ebook to help you deepen your skills with the latest release.  Here are some details about the flashbook:

Title:

DB2 10.5 with BLU Acceleration – New Dynamic In-Memory Analytics for the Era of Big Data

Authors:

Paul Zikopoulos, Matthew Huras, George Baklarz, Sam Lightstone, Aamer Sachedina

Technical editor: Roman B. Melnyk

Coverage includes:

  • Speed of Thought Analytics with new BLU Acceleration

  • Always Available Transactions with enhanced pureScale reliability

  • Unprecedented Affordability with optimization for SAP workloads

  • Future Proof Versatility with business grade NoSQL and mobile database for greater application flexibility

About the book:

If big data is an untapped natural resource, how can you find the gold dust hidden within?  Leaders realize that big data means all data, and are moving quickly to understand both structured and unstructured application data.  However, analyzing this data without impacting the performance and reliability of essential business applications can prove costly and complex.

In the new era of big data, businesses require data systems that can blend always available transactions with speed of thought analytics.  DB2 10.5 with new BLU Acceleration provides this speed, simplicity and cost efficiency while providing the ability to build next-generation applications with NoSQL features.

With this book, you’ll learn about the power and flexibility of multi-workload, multi-platform database software.  Use the comprehensive knowledge from this book to get started with the latest DB2 release by downloading the trial version.  Visit ibm.com/developerworks/downloads/im/db2/

Probability, The Analysis of Data

Probability, The Analysis of Data – Volume 1

is a free book available online that provides educational material in the area of data analysis.

http://www.theanalysisofdata.com/probability/0_2.html

  • The project features comprehensive coverage of all relevant disciplines including probability, statistics, computing, and machine learning.
  • The content is almost self-contained and includes mathematical prerequisites and basic computing concepts.
  • The R programming language is used to demonstrate the contents. Full code is available, facilitating reproducibility of experiments and letting readers experiment with variations of the code.
  • The presentation is mathematically rigorous, and includes derivations and proofs in most cases.
  • HTML versions are freely available on the website http://theanalysisofdata.com. Hardcopies are available at affordable prices.

PostgreSQL Magazine Issue #01

You can access PostgreSQL Magazine Issue #01 in various ways:

Harvard released metadata for 12 million library books into the public domain

Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API.

Official press release

Harvard’s Open Metadata policy

API details

Good reading to learn DB Architecture

Berkeley DB was the original luxury embedded database widely used by applications as their core database engine. NoSQL before NoSQL was cool.

There’s a great writeup for the architecture behind Berkeley DB in the book The Architecture of Open Source Applications. If you want to understand more about how a database works or if you are pondering how to build your own, it’s rich in detail, explanations, and lessons. Here’s the Berkeley DB chapter from the book. It covers topics like: Architectural Overview; The Access Methods: Btree, Hash, Recno, Queue; The Library Interface Layer; The Buffer Manager: Mpool; Write-ahead Logging; The Lock Manager: Lock; The Log Manager: Log; The Transaction Manager: Txn.

 

The little Redis book

The Little Redis Book is a free book introducing Redis, written by Karl Seguin with Perry Neal's assistance.

Redis is wonderfully simple, which makes it awesome to use, but I thought it would turn any book into little more than reference material. Well, I decided to give it a try and hopefully you’ll agree with me that The Little Redis Book is a solid addition to the Little family.

You can download the PDF version here. It comes in at 29 pages. I hope this helps people who are new to Redis. I also hope there’s maybe one or two useful things in here for developers already familiar with it.

The author, Karl Seguin, also wrote The Little MongoDB Book.

Links:

Big Data Now: Current Perspectives from O'Reilly Radar

This collection represents the full spectrum of data-related content we’ve published on O’Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with “What is data science?” and from there we’ve pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we’ve covered and identify a number of core data areas:

Data issues — The opportunities and ambiguities of the data space are evident in discussions around privacy, the implications of data-centric industries, and the debate about the phrase “data science” itself.

The application of data: products and processes – A “data product” can emerge from virtually any domain, including everything from data startups to established enterprises to media/journalism to education and research.

Data science and data tools — The tools and technologies that drive data science are of course essential to this space, but the varied techniques being applied are also key to understanding the big data arena.

The business of data – Take a closer look at the actions connected to data — the finding, organizing, and analyzing that provide organizations of all sizes with the information they need to compete.

 

Pages: 137

ISBN: 978-1-4493-1518-4

ISBN-10: 1-4493-1518-6

http://shop.oreilly.com/product/0636920022640.do

HBase: The Definitive Guide 1st edition

“HBase: The Definitive Guide” by Lars George

will soon be available; you can order it here: http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/

Product Description

If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can fulfill your needs. As the open source implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. HBase: The Definitive Guide provides the details you require, whether you simply want to evaluate this high-performance, non-relational database, or put it into practice right away.

HBase’s adoption rate is beginning to climb, and several IT executives are asking pointed questions about this high-capacity database. This is the only book available to give you meaningful answers.

  • Learn how to distribute large datasets across an inexpensive cluster of commodity servers
  • Develop HBase clients in many programming languages, including Java, Python, and Ruby
  • Get details on HBase’s primary storage system, HDFS—Hadoop’s distributed and replicated filesystem
  • Learn how HBase’s native interface to Hadoop’s MapReduce framework enables easy development and execution of batch jobs that can scan entire tables
  • Discover the integration between HBase and other facets of the Apache Hadoop project

About the Author

Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at various Hadoop User Group meetings, as well as large conferences such as FOSDEM in Brussels. He also started the Munich OpenHUG meetings. He now works closely with Cloudera to support Hadoop and HBase in and around Europe through technical support, consulting work, and training.


A few readings – O'Reilly definitive guide

Some O’Reilly definitive guides are available on amazon.com: