Google's F1 – distributed scalable RDBMS

 

Google has moved its advertising services from MySQL to a new database, created in-house, called F1. The new system combines the best of NoSQL and SQL approaches.

 

According to Google Research, many of the services that are critical to Google’s ad business have historically been backed by MySQL, but Google has recently migrated several of these services to F1, a new RDBMS developed at Google. The team at Google Research says that F1 gives the benefits of NoSQL systems (scalability, fault tolerance, transparent sharding, and cost benefits) with the ease of use and transactional support of an RDBMS.

 

Google Research has developed F1 to provide relational database features such as a parallel SQL query engine and transactions on a highly distributed storage system that scales on standard hardware.

 

 

 

 

The store is dynamically sharded, supports replication across data centers while keeping transactions consistent, and can deal with data center outages without losing data. The downside of keeping transactions consistent is that F1 has higher write latencies than MySQL, so the team restructured the database schemas and redeveloped the applications so that the effect of the increased latency is largely hidden from external users. Because F1 is distributed, Google says it scales easily and can support much higher throughput for batch workloads than a traditional database.

 

The database is sharded by customer, and the applications are optimized using shard awareness. When more power is needed, the database can grow by adding shards. Using shards in this way has some drawbacks, including the difficulty of rebalancing shards and the fact that you can’t carry out cross-shard transactions or joins.
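
As a rough illustration of this kind of shard awareness, here is a minimal sketch of the general pattern (not Google’s actual implementation; the host names and routing function are invented):

// Minimal sketch of customer-based shard routing (illustrative only).
const SHARD_HOSTS = ['db-shard-0', 'db-shard-1', 'db-shard-2'];   // hypothetical shard hosts

function shardFor(customerId) {
  // All of a customer's rows live on one shard, so single-customer
  // transactions and joins stay local to that shard.
  // Adding a shard changes the modulus, which is one reason
  // rebalancing existing shards is so painful.
  return SHARD_HOSTS[customerId % SHARD_HOSTS.length];
}

console.log(shardFor(1234));   // every query for customer 1234 goes to this host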

 

 

 

 

 

F1 has been co-developed with a new lower-level storage system called Spanner. This is described as a descendant of Google’s Bigtable, and as the successor to Megastore, the transactional indexed record manager Google built on top of its Bigtable NoSQL datastore. Spanner offers synchronous cross-datacenter replication (using Paxos, the consensus algorithm for fault-tolerant distributed systems). It provides snapshot reads, and performs multiple reads followed by a single atomic write to ensure transaction consistency.

 

F1 is based on sharded Spanner servers and can handle parallel reads with SQL or Map-Reduce. Google has deployed it using five replicas spread across the country to survive regional disasters. Reads are much slower than in MySQL, taking between 5 and 10 ms.

 

The SQL parallel query engine was developed from scratch to hide the remote procedure call (RPC) latency and to allow parallel and batch execution. The latency is dealt with by using a single read phase and banning serial reads, though asynchronous reads can be carried out in parallel. Writes are buffered at the client and sent as one RPC. Object-relational mapping calls are also handled carefully to avoid those that are problematic for F1.

 

The research paper on F1, presented at SIGMOD 2012, cites serial reads and for-loops that carry out one query per iteration as particular things to avoid, saying that while these hurt performance in all databases, they are disastrous on F1. In view of this, the client library is described as a very lightweight ORM – it doesn’t really have the “R”. It never uses relational or traversal joins, and all objects are loaded explicitly.
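
As a rough sketch of the access pattern the paper warns against, versus the batched alternative (the client API here is invented purely for illustration and is not F1’s real interface):

// Anti-pattern: one query, and therefore one RPC round trip, per loop iteration.
async function loadCampaignsSerially(db, campaignIds) {
  const results = [];
  for (const id of campaignIds) {
    results.push(await db.query('SELECT * FROM Campaign WHERE id = ?', [id]));
  }
  return results;
}

// Preferred: a single batched read, so the RPC latency is paid only once;
// independent reads can additionally be issued asynchronously in parallel.
async function loadCampaignsBatched(db, campaignIds) {
  return db.query('SELECT * FROM Campaign WHERE id IN (?)', [campaignIds]);
}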

 

 

 

Javascript Object Signing and Encryption (jose)

New versions of the JSON Object Signing and Encryption (JOSE) specifications are now available that incorporate working group feedback since publication of the initial versions. They are:

  • JSON Web Signature (JWS) – Digital signature/HMAC specification
  • JSON Web Encryption (JWE) – Encryption specification
  • JSON Web Key (JWK) – Public key specification
  • JSON Web Algorithms (JWA) – Algorithms and identifiers specification

The most important changes are:

  • Added a separate integrity check for encryption algorithms without an integral integrity check.
  • Defined header parameters for including JWK public keys and X.509 certificate chains directly in the header.

See the Document History section in each specification for a more detailed list of changes.

Corresponding versions of the JSON Serialization specs, which use these JOSE drafts, are also available. Besides using JSON Serializations of the cryptographic results (rather than Compact Serializations using a series of base64url-encoded values), these specifications also enable multiple digital signatures and/or HMACs to be applied to the same message, and enable the same plaintext to be encrypted to multiple recipients (see the sketch after the list below). They are:

  • JSON Web Signature JSON Serialization (JWS-JS)
  • JSON Web Encryption JSON Serialization (JWE-JS)
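
Conceptually, a JWS using the JSON Serialization carries a single payload plus an array of per-signer signature objects, roughly along these lines (the member names below are only indicative of the idea; consult the JWS-JS draft for the actual syntax):

// Indicative shape only; not copied from the draft.
const jwsJsonSerialization = {
  payload: 'eyJpc3MiOiJqb2UifQ',       // base64url-encoded payload, shared by all signers
  signatures: [
    { header: { alg: 'RS256', kid: 'signer-1' }, signature: 'BASE64URL_SIGNATURE_1' },
    { header: { alg: 'HS256', kid: 'signer-2' }, signature: 'BASE64URL_HMAC_2' }
  ]
};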

Draft 08 of the JSON Web Token (JWT) specification has been published. It uses the -01 versions of the JOSE specifications and also contains these changes:

  • Removed language that required that a JWT must have three parts. Now the number of parts is explicitly dependent upon the representation of the underlying JWS or JWE.
  • Moved the “alg”:“none” definition to the JWS spec.
  • Registered the application/jwt MIME Media Type.
  • Clarified that the order of the creation and validation steps is not significant in cases where there are no dependencies between the inputs and outputs of the steps.
  • Corrected the Magic Signatures and Simple Web Token (SWT) references.

These specifications are available at:

HTML formatted versions are available at:

DeNSo DB is out

DensoDB is a new NoSQL document database, written in C# for the .NET environment.
It’s simple, fast and reliable. More details are on GitHub: https://github.com/teamdev/DensoDB

You can use it in three different ways:

  1. InProcess: No need for service installation or a communication protocol. The fastest way to use it. You have direct access to the database memory and can manipulate objects and data very quickly.
  2. As a Service: Installed as a Windows Service, it can be used as a network document store. You can use a REST or WCF service to access it. It’s no different from the previous way of using it, but you go through a networking protocol, so it’s not as fast as the previous one.
  3. On a Mesh: Mixing the previous two usage modes with a P2P mesh network, it can easily be synchronized with other mesh nodes. It gives you the power of a distributed, scalable, fast database, in a server or server-less environment.

You can use it as a database for a stand-alone application or in a mesh to share data in a social application. The P2P protocol and synchronization rules will be transparent to your application, and you’ll be able to develop it all as if it were stand-alone and connected only to a local DB.

Features

  1. Journaled
  2. Built with the Command Query Responsibility Segregation (CQRS) pattern in mind.
  3. Stores data as BSON-like documents.
  4. Accessible via REST
  5. Accessible via WCF
  6. Peer-to-peer synchronization and event propagation enabled.
  7. Pluggable via server plugins.
  8. Uses LINQ syntax for queries.

The Theory

A document database is a database where data is stored as it is. Usually there is no need to normalize the data or change its structure, so it’s a good fit for domain-driven design, because you won’t need to think about how to persist your domain.

A document is stored in a BSON-like format. A document is self-contained: everything you need to understand its information is in the document itself, so you won’t need additional queries to resolve foreign keys or look-up table data.

You can have collections of documents. Collections are not tables; think of a collection as a set of documents categorized in the same way. A collection is schema-free, and you can store different kinds of documents in the same collection.

Every document in every collection is identified by a unique identifier. The DB uses this UID to store and access every document directly.
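
For instance, a self-contained document of the kind described above might look like this (a purely illustrative example, not DensoDB API code):

// Everything needed to understand the order is embedded in the document itself,
// and the unique identifier lets the DB store and reach it directly.
const order = {
  _id: 'f3c2a6e0-1111-4a2b-9c3d-000000000042',     // unique identifier
  customer: { name: 'Mario Rossi', email: 'mario@example.com' },
  items: [
    { sku: 'BOOK-1', quantity: 2, price: 12.50 },
    { sku: 'PEN-9',  quantity: 1, price: 1.20 }
  ],
  total: 26.20
};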

You can store your data in DeNSo as BSON documents or, more easily, you can store your entities directly.

DeNSo has a BSON serializer/deserializer able to translate virtually any entity into a BSON document and vice versa.

 

 


MongoDb Architecture explained

Great article on MongoDB Architecture from http://horicky.blogspot.fr/2012/04/mongodb-architecture.html

MongoDb Architecture

NoSQL has become a very hot topic for large web-scale deployments, where scalability and semi-structured data drive the DB requirements towards NoSQL. Many NoSQL products have evolved over the last couple of years. In my past blogs, I have covered the underlying distributed-system theory of NoSQL, as well as some specific products such as CouchDB and Cassandra/HBase.

Last Friday I was very lucky to meet Jared Rosoff from 10gen at a technical conference and discuss the technical architecture of MongoDb. I found the information very useful and want to share it with more people.

One thing that impresses me about MongoDb is that it is extremely easy to use and the underlying architecture is also very easy to understand.

Here are some simple admin steps to start and stop the MongoDb server:

# Install MongoDB and create the default data directory
mkdir -p /data/db

# Start Mongod server
.../bin/mongod # data stored in /data/db

# Start the command shell
.../bin/mongo
> show dbs
> show collections

# Remove collection
> db.person.drop()

# Stop the Mongod server from shell
> use admin
> db.shutdownServer()

Major difference from RDBMS
MongoDb differs from an RDBMS in the following ways:

  • Unlike an RDBMS record, which is “flat” (a fixed number of simply-typed fields), the basic unit of MongoDb is a “document”, which is “nested” and can contain multi-value fields (arrays, hashes).
  • Unlike an RDBMS, where all records stored in a table must conform to the table schema, documents of any structure can be stored in the same collection.
  • There is no “join” operation in the query language. Overall, data is encouraged to be organized in a more denormalized manner, and more of the burden of ensuring data consistency is pushed to the application developers.
  • There is no concept of a “transaction” in MongoDb. “Atomicity” is guaranteed only at the document level (no partial update of a document will occur).
  • There is no concept of “isolation”: any data read by one client may have its value modified by another concurrent client.

By removing some of the features that a classical RDBMS provides, MongoDb can be more lightweight and more scalable when processing big data.

Query processing
MongoDb is a document-oriented DB. In this model, data is organized as JSON documents and stored in collections. A collection can be thought of as the equivalent of a table, and a document as the equivalent of a record, in the RDBMS world.

Here are some basic examples.

# create a doc and save into a collection
> p = {firstname:"Dave", lastname:"Ho"}
> db.person.save(p)
> db.person.insert({firstname:"Ricky", lastname:"Ho"})

# Show all docs within a collection
> db.person.find()

# Iterate result using cursor
> var c = db.person.find()
> p1 = c.next()
> p2 = c.next()

To specify the search criteria, an example document containing the fields to match against needs to be provided.

> p3 = db.person.findOne({lastname:"Ho"})

Notice that in the query, the value portion needs to be determined before the query is made (in other words, it cannot be based on other attributes of the document). For example, if we have a collection of “Person” documents, it is not possible to express a query that returns the persons whose weight is larger than 10 times their height.

# Return a subset of fields (ie: projection)
> db.person.find({lastname:"Ho"}, {firstname:true})

# Delete some records
> db.person.remove({firstname:"Ricky"})

To speed up queries, indexes can be used. In MongoDb, an index is stored as a BTree structure (so range queries are automatically supported). Since the document itself is a tree, an index can be specified as a path that drills down to deeply nested levels inside the document.

# To build an index for a collection
> db.person.ensureIndex({firstname:1})

# To show all existing indexes
> db.person.getIndexes()

# To remove an index
> db.person.dropIndex({firstname:1})

# Index can be build on a path of the doc.
> db.person.ensureIndex({"address.city":1})

# A composite key can be used to build index
> db.person.ensureIndex({lastname:1, firstname:1})

An index can also be built on a multi-valued attribute such as an array. In this case, each element in the array will have a separate node in the BTree.
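
For example (the array field is invented for the illustration):

# Index on an array-valued field: each array element gets its own entry in the BTree
> db.person.insert({firstname:"Dave", hobbies:["ski", "chess"]})
> db.person.ensureIndex({hobbies:1})
> db.person.find({hobbies:"chess"})   # matches through any element of the array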

Building an index can be done in either offline foreground mode or online background mode. Foreground mode proceeds much faster, but the DB cannot be accessed while the index is being built. If the system is running as a replica set (described below), it is recommended to rotate each member DB offline and build the index in the foreground.

When there are multiple selection criteria in a query, MongoDb attempts to use one single best index to select a candidate set and then sequentially iterate through them to evaluate other criteria.

When there are multiple indexes available for a collection and a query is handled for the first time, MongoDb creates multiple execution plans (one for each available index) and lets them take turns (within a certain number of ticks) executing until the fastest plan finishes. The result of the fastest executor is returned, and the system remembers the index used by that executor. Subsequent queries will use the remembered index until a certain number of updates has happened in the collection, after which the system repeats the process to figure out which index is best at that time.

Since only one index will be used, it is important to look at the search and sorting criteria of the query and build additional composite indexes to match the query better. Maintaining an index is not without cost, as indexes need to be updated when docs are created, deleted and updated, which adds overhead to the update operations. To maintain an optimal balance, we need to periodically measure the effectiveness of each index (e.g. the read/write ratio) and delete the less efficient ones.
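
To see which plan and index a query ends up using, or to override the planner’s choice, the shell of that era provides explain() and hint() (shown here with the person collection from the earlier examples):

# Show the chosen plan, the index used and the number of scanned objects
> db.person.find({lastname:"Ho", firstname:"Ricky"}).explain()

# Force a particular index when the remembered plan is not the best one
> db.person.find({lastname:"Ho", firstname:"Ricky"}).hint({lastname:1, firstname:1})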

Storage Model
Written in C++, MongoDB uses memory-mapped files that directly map on-disk data files to in-memory byte arrays, with the data access logic implemented using pointer arithmetic. Each document collection is stored in one namespace file (which contains metadata information) as well as multiple extent data files (whose sizes grow exponentially, doubling each time).

The data structures use doubly-linked lists extensively. Each collection of data is organized as a linked list of extents, each of which represents a contiguous region of disk space. Each extent points to the head/tail of another linked list of docs. Each doc contains a linked list to other documents as well as the actual data, encoded in BSON format.

Data modification happens in place. If the modification increases the size of a record beyond its originally allocated space, the whole record is moved to a bigger region with some extra padding bytes. The padding bytes act as a growth buffer so that future expansion doesn’t necessarily require moving the data again. The amount of padding is dynamically adjusted per collection based on its modification statistics. The space occupied by the original doc is freed up and is tracked in free lists of different sizes.

As we can imagine, holes are created over time as objects are created, deleted or modified. This fragmentation hurts performance, as less data is read/written per disk I/O. Therefore, we need to run the “compact” command periodically, which copies the data into contiguous space. This “compact” operation, however, is exclusive and has to be done offline. Typically this is done in a replica setting by rotating each member offline one at a time to perform the compaction.
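
For reference, with the MongoDb versions of this era the compaction described above is triggered per collection from the shell:

# Defragment the person collection (exclusive operation; run it on an
# offline or rotated-out member as described above)
> db.runCommand({compact: "person"})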

Indexes are implemented as BTrees. Each BTree node contains a number of keys (within that node), as well as pointers to the left child BTree nodes of each key.

Data update and Transaction
To update an existing doc, we can do the following

var p1 = db.person.findOne({lastname:"Ho"})
p1["address"] = "San Jose"
db.person.save(p1)

# Do the same in one command
db.person.update({lastname:"Ho"},
 {$set:{address:"San Jose"}},
 false,
 true)

Writes by default don’t wait. There are various wait options with which the client can specify what conditions to wait for before the call returns (this is also achievable with a follow-up “getlasterror” call), such as whether the change has been persisted on disk, or has been propagated to a sufficient number of members in the replica set. MongoDb also provides a sophisticated way to assign tags to members of a replica set to reflect their physical topology, so that a customized write policy can be defined for each collection based on its reliability requirements.
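
As a concrete illustration of those wait options, a client of that era could follow a write with getlasterror and state how far the write must have propagated before the call returns:

> db.person.insert({firstname:"Jane", lastname:"Ho"})

# Wait until the write has been journaled to disk (j) and acknowledged
# by a majority of the replica set members (w), or time out after 5s
> db.runCommand({getlasterror:1, j:true, w:"majority", wtimeout:5000})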

In an RDBMS, “serializability” is a fundamental concept: the net effect of concurrently executing work units is equivalent to arranging those work units in some order of sequential execution (one at a time). Therefore, each client can act as if the DB were exclusively available to it. The underlying DB server implementation may use locks or multi-versioning to provide the isolation. However, this concept is not available in MongoDb (or in many other NoSQL stores).

In MongoDb, any data you read should be treated as a snapshot of the past, which means that by the time you look at it, it may already have been changed in the DB. Therefore, if you are making a modification based on some previously read data, the condition your modification is based on may have changed by the time you send the modification request. If this is not acceptable for your application’s consistency requirements, you may need to re-validate the condition at the time you request the modification (i.e. a “conditional modify” should be made).

Under this scheme, a “condition” is attached to the modification request so that the DB server can validate the condition before applying the modification (of course, the condition check and the modification must be atomic so that no update can happen in between). In MongoDb, this can be achieved with the “findAndModify” call.

var account = db.bank.findOne({id:1234})
var old_bal = account['balance']
var new_bal = old_bal + fund

# Pre-condition is specified in search criteria
db.bank.findAndModify({id:1234, balance:old_bal},
 {$set: {balance: new_bal}})

# Check whether the previous command succeeded
var result = db.runCommand({getlasterror:1, j:true})

if (result.err != null) {
 // retry from the beginning
}

The concept of a “transaction” is also missing in MongoDb. While MongoDb guarantees that each document is atomically modified (so no partial update will happen within a doc), if the update modifies multiple documents then there is no guarantee of atomicity across documents.

Therefore, it is the application developer’s responsibility to implement multi-update atomicity across multiple documents. We describe a common design pattern to achieve that. This technique is not specific to MongoDb and is applicable to any other NoSQL store that can at least guarantee atomicity at the single-record level.

The basic idea is to first create a separate document (called a transaction) that links together all the documents you want to modify, and then create a reverse link from each document (to be modified) back to the transaction. By carefully designing the sequence of updates to the documents and the transaction, we can achieve atomicity when modifying multiple documents.
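
A rough sketch of that pattern in the shell (the collection names, fields and state values are invented for the illustration):

# 1. Create the transaction document that links the docs to be modified
> db.tx.insert({_id:"tx1", state:"initial", src:1234, dst:5678, amount:100})

# 2. Mark each target document with a reverse link to the pending transaction
> db.accounts.update({_id:1234, tx:null}, {$set:{tx:"tx1"}})
> db.accounts.update({_id:5678, tx:null}, {$set:{tx:"tx1"}})

# 3. Apply the per-document changes (each one individually atomic),
#    clearing the reverse links as we go, then mark the transaction done
> db.accounts.update({_id:1234, tx:"tx1"}, {$inc:{balance:-100}, $unset:{tx:1}})
> db.accounts.update({_id:5678, tx:"tx1"}, {$inc:{balance:100}, $unset:{tx:1}})
> db.tx.update({_id:"tx1"}, {$set:{state:"done"}})

# On recovery, any transaction document not in the "done" state tells us
# exactly which documents may need to be rolled forward or rolled back.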

 

 

MongoDb’s web site has also described a similar technique here (based on the same concept but the implementation is slightly different).

Evernote bucking the NoSQL trend, hanging on to MySQL

In a recent blog post, Evernote explains its SQL technical choice, MySQL in fact, and details its main arguments:

Atomicity: If an API call succeeds, then 100% of the changes are completed, and if an API call fails, then none of them are committed. This means that if we fail trying to store the fourth image in your Note, there isn’t a half-formed Note in your account and incorrect monthly upload allowance calculations to charge you for the broken upload.

Consistency: At the end of any API call, the account is in a fully usable and internally consistent state. Every Note has a Notebook and none of them are “dangling.” The database won’t let us delete a Notebook that still has Notes within it, thanks to the FOREIGN KEY constraint.

Durability:  When the server says that a Notebook was created, the client can assume that it actually exists for future operations (like the createNote call). The change is durable so that the client knows that it has a consistent reflection of the state of the service at all times.

 

Read more:

http://blog.evernote.com/tech/2012/02/23/whysql/

Good reading to learn DB Architecture

Berkeley DB was the original luxury embedded database widely used by applications as their core database engine. NoSQL before NoSQL was cool.

There’s a great writeup for the architecture behind Berkeley DB in the book The Architecture of Open Source Applications. If you want to understand more about how a database works or if you are pondering how to build your own, it’s rich in detail, explanations, and lessons. Here’s the Berkeley DB chapter from the book. It covers topics like: Architectural Overview; The Access Methods: Btree, Hash, Recno, Queue; The Library Interface Layer; The Buffer Manager: Mpool; Write-ahead Logging; The Lock Manager: Lock; The Log Manager: Log; The Transaction Manager: Txn.

 

SSTable and Log Structured Storage: LevelDB


By  on February 06, 2012 originally posted here

If Protocol Buffers is the lingua franca of individual data records at Google, then the Sorted String Table (SSTable) is one of the most popular output formats for storing, processing, and exchanging datasets. As the name itself implies, an SSTable is a simple abstraction to efficiently store large numbers of key-value pairs while optimizing for high-throughput, sequential read/write workloads.

Unfortunately, the SSTable name itself has also been overloaded by the industry to refer to services that go well beyond just the sorted table, which has only added unnecessary confusion to what is a very simple and useful data structure on its own. Let’s take a closer look under the hood of an SSTable and see how LevelDB makes use of it.

SSTable: Sorted String Table

Imagine we need to process a large workload where the input is gigabytes or terabytes in size. Additionally, we need to run multiple steps on it, which must be performed by different binaries – in other words, imagine we are running a sequence of Map-Reduce jobs! Due to the size of the input, reading and writing data can dominate the running time. Hence, random reads and writes are not an option; instead, we want to stream the data in and, once we’re done, flush it back to disk as a streaming operation. This way, we can amortize the disk I/O costs. Nothing revolutionary, moving right along.

 

A “Sorted String Table” then is exactly what it sounds like: a file which contains a set of arbitrary, sorted key-value pairs. Duplicate keys are fine, there is no need for “padding” for keys or values, and keys and values are arbitrary blobs. Read the entire file sequentially and you have a sorted index. Optionally, if the file is very large, we can also prepend, or create as a standalone file, a key:offset index for fast access. That’s all an SSTable is: very simple, but also a very useful way to exchange large, sorted data segments.
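
To make the idea concrete, here is a toy sketch in JavaScript (purely illustrative; it has nothing to do with LevelDB’s actual file format):

// Write sorted key/value records sequentially and keep a key -> offset index.
const fs = require('fs');

function writeSSTable(path, records) {
  // Sort the pairs so a sequential read returns them in key order.
  const sorted = Object.entries(records).sort(([a], [b]) => (a < b ? -1 : 1));
  const index = {};                         // key -> byte offset of its record
  let offset = 0;
  const chunks = [];
  for (const [key, value] of sorted) {
    const line = key + '\t' + value + '\n'; // toy encoding; real tables use length-prefixed blobs
    index[key] = offset;
    offset += Buffer.byteLength(line);
    chunks.push(line);
  }
  fs.writeFileSync(path, chunks.join(''));
  return index;                             // in practice the index block is appended to the file itself
}

// A sequential read gives back sorted data; the index allows one seek per point lookup.
const index = writeSSTable('/tmp/table.sst', { b: 'bar', a: 'foo', c: 'baz' });
console.log(index);                         // { a: 0, b: 6, c: 12 }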

SSTable and BigTable: Fast random access?

Once an SSTable is on disk it is effectively immutable because an insert or delete would require a large I/O rewrite of the file. Having said that, for static indexes it is a great solution: read in the index, and you are always one disk seek away, or simply memmap the entire file to memory. Random reads are fast and easy.

Random writes are much harder and expensive, that is, unless the entire table is in memory, in which case we’re back to simple pointer manipulation. Turns out, this is the very problem that Google’s BigTable set out to solve: fast read/write access for petabyte datasets in size, backed by SSTables underneath. How did they do it?

SSTables and Log Structured Merge Trees

We want to preserve the fast read access which SSTables give us, but we also want to support fast random writes. Turns out, we already have all the necessary pieces: random writes are fast when the SSTable is in memory (let’s call it MemTable), and if the table is immutable then an on-disk SSTable is also fast to read from. Now let’s introduce the following conventions:

 

  1. On-disk SSTable indexes are always loaded into memory
  2. All writes go directly to the MemTable index
  3. Reads check the MemTable first and then the SSTable indexes
  4. Periodically, the MemTable is flushed to disk as an SSTable
  5. Periodically, on-disk SSTables are “collapsed together”

What have we done here? Writes are always done in memory and hence are always fast. Once the MemTable reaches a certain size, it is flushed to disk as an immutable SSTable. However, we will maintain all the SSTable indexes in memory, which means that for any read we can check the MemTable first, and then walk the sequence of SSTable indexes to find our data. Turns out, we have just reinvented the “Log-Structured Merge-Tree” (LSM Tree), described by Patrick O’Neil, and this is also the very mechanism behind “BigTable Tablets”.

LSM & SSTables: Updates, Deletes and Maintenance

This “LSM” architecture provides a number of interesting behaviors: writes are always fast regardless of the size of dataset (append-only), and random reads are either served from memory or require a quick disk seek. However, what about updates and deletes?

Once the SSTable is on disk, it is immutable, hence updates and deletes can’t touch the data. Instead, a more recent value is simply stored in the MemTable in case of an update, and a “tombstone” record is appended for deletes. Because we check the indexes in sequence, future reads will find the updated value or the tombstone record without ever reaching the older values! Finally, having hundreds of on-disk SSTables is also not a great idea, hence periodically we will run a process to merge the on-disk SSTables, at which point the update and delete records will overwrite and remove the older data.
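
Here is a toy, in-memory sketch of these conventions (purely illustrative; a real implementation also deals with write-ahead logs, index blocks and compaction policies):

// Toy LSM sketch: writes hit the MemTable, reads walk MemTable then SSTables,
// deletes are tombstones, and a full MemTable is frozen into an immutable table.
const TOMBSTONE = Symbol('tombstone');
const live = (v) => (v === TOMBSTONE ? undefined : v);

class ToyLSM {
  constructor(flushAt = 4) {
    this.flushAt = flushAt;
    this.memtable = new Map();   // mutable, in memory
    this.sstables = [];          // immutable, newest first
  }
  put(key, value) { this.memtable.set(key, value); this.maybeFlush(); }
  del(key) { this.put(key, TOMBSTONE); }            // delete = append a tombstone
  get(key) {
    if (this.memtable.has(key)) return live(this.memtable.get(key));
    for (const table of this.sstables) {            // newer values shadow older ones
      if (table.has(key)) return live(table.get(key));
    }
    return undefined;
  }
  maybeFlush() {
    if (this.memtable.size < this.flushAt) return;
    const frozen = new Map([...this.memtable].sort(([a], [b]) => (a < b ? -1 : 1)));
    this.sstables.unshift(frozen);                  // "flushed" as a sorted, immutable table
    this.memtable = new Map();
  }
}

const db = new ToyLSM();
db.put('a', 'foo'); db.put('b', 'bar'); db.del('a');
console.log(db.get('a'));  // undefined: the tombstone shadows any older value
console.log(db.get('b'));  // 'bar'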

SSTables and LevelDB

Take an SSTable, add a MemTable and apply a set of processing conventions and what you get is a nice database engine for certain type of workloads. In fact, Google’s BigTable, Hadoop’s HBase, and Cassandra amongst others are all using a variant or a direct copy of this very architecture.

Simple on the surface, but as usual, implementation details matter a great deal. Thankfully, Jeff Dean and Sanjay Ghemawat, the original contributors to the SSTable and BigTable infrastructure at Google released LevelDB earlier last year, which is more or less an exact replica of the architecture we’ve described above:

  • SSTable under the hood, MemTable for writes
  • Keys and values are arbitrary byte arrays
  • Support for Put, Get, Delete operations
  • Forward and backward iteration over data
  • Built-in Snappy compression

Designed to be the engine for IndexedDB in WebKit (aka, embedded in your browser), it is easy to embed, fast, and, best of all, it takes care of all the SSTable and MemTable flushing, merging and other gnarly details.

Working with LevelDB: Ruby

LevelDB is a library, not a standalone server or service – although you could easily implement one on top. To get started, grab your favorite language bindings (Ruby in this case), and let’s see what we can do:

require 'leveldb'   # gem install leveldb-ruby

db = LevelDB::DB.new "/tmp/db"

db.put "b", "bar"
db.put "a", "foo"
db.put "c", "baz"

puts db.get "a"     # => foo

db.each do |k, v|
  p [k, v]          # => ["a", "foo"], ["b", "bar"], ["c", "baz"]
end

db.to_a             # => [["a", "foo"], ["b", "bar"], ["c", "baz"]]

We can store keys, retrieve them, and perform a range scan, all with a few lines of code. The mechanics of maintaining the MemTables, merging the SSTables, and the rest are taken care of for us by LevelDB – nice and simple.

LevelDB in WebKit and Beyond

SSTable is a very simple and useful data structure – a great bulk input/output format. However, what makes the SSTable fast (sorted and immutable) is also what exposes its very limitations. To address this, we’ve introduced the idea of a MemTable, and a set of “log structured” processing conventions for managing the many SSTables.

All simple rules, but as always, implementation details matter, which is why LevelDB is such a nice addition to the open-source database engine stack. Chances are, you will soon find LevelDB embedded in your browser, on your phone, and in many other places. Check out the LevelDB source, scan the docs, and take it for a spin.

 

The Time For NoSQL Is Now

The Time For NoSQL Is Now is the conclusion of a recent article by Andrew C. Oliver, who has been practicing Oracle since 1998 and who once thought Oracle was magical.

 

From his long experience he also shares the following conclusions:

  • NoSQL solutions require less transformation, which brings:
    • better system performance
    • more automatic scalability
    • fewer bugs
  • In the end, applications almost always bottleneck on the DB
  • I truly think this is one technology that isn’t just marketing

 

You can read the full article available here:

http://osintegrators.com/node/76

 

Submitted by acoliver on Wed, 01/25/2012 – 11:08

In 1998 I had my first hardcore introduction to Oracle as well as HP/UX. I’d been working with SQL Server. I thought Oracle was magical. Unlike SQL Server there were so many knobs to turn! I could balance the crap out of the table spaces and rollback segments and temp segments. Nothing would ever have to be contending for IO on the same disk again (if only someone would buy enough disks)! HP/UX oddly led me to Linux. I had a graphical workstation and a Windows machine on my desk. The graphical workstation rarely remained in working order. A co-worker who had quit Red Hat, (before the IPO!) to join me at Ericsson USA, introduced me to Red Hat Linux 5.0. The great thing about it was that the X-Windows implementation allowed me to connect to the HP/UX box’s X-Windows with no problems. Eventually, I came to prefer Unix and Linux to Windows because they never seemed to crash and on relatively old hardware they performed better.

At some point, due to turnover, I became the final person around who knew how to care and feed Oracle. I knew what all the mystical ORA- numbers meant. Oracle was far better/faster for reporting than SQL Server. However, in normal business applications, not only was SQL Server cheaper but it didn’t need constant attention, care and feeding. Over the next 10 years, I watched the industry homogenize. Sure, SQL Server is around and far better than the one I used, but every large company I’ve worked with has an Oracle site license. Even the “Microsoft Shop” tends to have Oracle around for the big stuff.

I’ve made hundreds of thousands if not a million dollars tuning Oracle, caring for Oracle and scaling Oracle. I’ve written ETL processes, I’ve tuned queries, I’ve done PL/SQL procedures that call into other systems and even into caches to squeak a little more performance. I’ve scaled systems with millions of users on Oracle. The trouble with Oracle is that it is both frequently misused and that it really isn’t very scalable. I’m not saying you can’t make it scale, just saying it is expensive to do (the license costs are the tip of the iceberg) and that it is difficult work to do so.

The problem is not transactions so much as the relational model combined with the process model. You can work around these with flatter table structures and by using different instances for different data (think of the lines for voting A-N, O-R, S-Z), but that is work! Oracle RAC sorta makes things better on the read side, but with all new headaches and often with worse write performance. You can add a cache like Gemfire or Infinispan, and indeed these are good solutions for the interim where you can’t fix the Oracle. However, fundamentally you’re trying to fix a broken foundation with struts and support beams. You can shift the weight, but really you’re not addressing the actual problem.

NoSQL databases, especially document databases like MongoDB or Couch, structure the data the way your system uses it. They shard the data automatically, they store your actual JSON structure without all of the transformation like Hibernate or JPA. The less transformation the better the performance of the system, the more automatic the scalability concerns, the fewer bugs! I don’t usually jump on bandwagons: the new language of the week (all compile to the same thing, none have significant productivity gains when you have a team size of more than say 2), whooo touchpads (I want a keyboard), filesystem of the day (I kept ext3 until I switched to an SSD drive), however, this particular bandwagon truly is a game-changer and completely necessary for cloud/Internet scale applications.

Yes SQL databases are completely fine for a departmental IT system. While Open Software Integrators does a lot of varied work with applications that don’t need to handle 8 million users, I don’t personally work on those applications. Most of the systems I work on are at least dealing with thousands to hundreds of thousands and in some cases millions of users. In the end, applications almost always bottleneck on the DB and most of what I do for customers is actually DB consulting in the end. The transition to NoSQL databases will take time. We still don’t have TOAD, Crystal Reports, query language standardization and other essential tools needed for mass adoption. There will be missteps (i.e. I may need a different type of database for reporting than for my operational system), but I truly think this is one technology that isn’t just marketing.

 

Life happens in real-time

Life happens in real-time.

From breaking news to breaking servers, real-world events require petabytes of data to be collected, analyzed and acted on as those events happen. Until recently, real-time data processing was an exotic practice, relegated to narrow domains like time-series analysis for financial markets. For most of us performing analytics, and particularly web analytics, batch processing has been the dominant approach.

This paradigm is shifting: the rise of NoSQL databases

Businesses are struggling to cope with and leverage an explosion of complex and connected data. This need is driving many companies to adopt scalable, high performance NoSQL databases – a new breed of database solutions – in order to expand and enhance their data management strategies. Traditional “relational” databases will not be able to keep pace with “big data” demands as they were not designed to manage the types of relationships that are so essential in today’s applications.

According to a recent article, an Evans Data survey of 1,200 developers, 400 of whom are enterprise developers, found that 56% of enterprise developers already use schemaless databases and that 63% plan to use one in the next two years.

Why IT managers should look into Hadoop

Why should IT managers actively look into Hadoop?

Apache Hadoop, as the open-source version is known, is a data analysis application framework that allows large data sets to be processed across clusters of computers. It uses massively parallel computing power to tear through data with incredible speed compared to traditional data analysis. Using Hadoop, businesses can analyze petabytes of unstructured data far more easily than with other applications. And that’s what makes it so invaluable to a growing number of businesses, from JP Morgan Chase to eBay, Google, and Yahoo, all of which deal every day with unbelievably large amounts of data.

“Hard drives have been getting larger at a phenomenal rate and processors are always getting phenomenally faster,” Cutting says. “The question became, ‘What are we going to do with all of this ever-more-data that’s being generated?’ Classic enterprise technology wasn’t designed to handle all of this and doesn’t really take advantage of this. And enterprises want those capabilities because there’s lots of interesting data out there that they’ve been throwing away.”

In the past, companies couldn’t financially afford to keep all the data their customers and sales generated, so over time the data was archived or deleted. Now, though, cheaper storage and faster processing capabilities, matched with efficient analysis tools like Hadoop, allow large companies to save all of their valuable data. Over time, they can find new uses for it and drive sales, profits, and marketing.

“Retailers may not keep every transaction that’s ever happened in the past, but now [they] can afford to,” Cutting says. “Now they can go back and see things in the data about seasonal sales, sales based on location and demographics, and more. You can see what someone is buying in a city like Atlanta this year and compare it to last year’s hot-selling goods and you can make a special offer to them,” all using Hadoop. “You have the data and you can make much better predictions.”

For businesses of any size, this can be a huge boon – but especially so for bigger companies.

Cutting, who today is the chief architect at Cloudera, says Hadoop is still a young and evolving technology. It is still ripe for additional enterprise-needed features, such as versions that are custom-built and aimed at specific industry verticals, including financial services and medical informatics companies. “We’re beginning to see a little of that now, and I think we’ll see a lot more,” he adds.

Devise a Hadoop strategy for your enterprise

If huge quantities of unstructured data are hampering your company’s ability to get things done efficiently with your existing database and data analysis tools, you aren’t alone. Taking a hard look at Hadoop’s capabilities could go a long way toward helping you resolve your corporate data roadblocks. Hadoop doesn’t replace your existing databases, but adds powerful resources to your data handling toolbox.