Understanding Cloudera by using its VirtualBox Demo

Cloudera packages Apache projects into an integrated distribution that includes Hadoop, Sqoop, Pig, Hive, HBase, ZooKeeper, Oozie, Hue, Flume, and Whirr. The Cloudera VirtualBox demo gives you the whole platform, configured and ready to experiment with, in less than 5 minutes.

Making it easy for users to experiment with these tools increases the chances for adoption.

CDH Mac OS X VirtualBox VM



Running MongoDB on the Cloud

10gen, the creators of MongoDB, have published a video about running MongoDB in the cloud (EC2 and EBS).


In this video Jared Rosoff covers topics like scaling and performance characteristics of running MongoDB in the cloud and he also shares some best practices when using Amazon EC2.


Watch from 10gen’s website here


Couchbase Server 2.0 Tour and Demo session from CouchConf

Couchbase Server 2.0, which integrates Apache CouchDB, Membase, and Memcached into a single, powerful NoSQL database solution, was announced at CouchConf San Francisco.

If you missed the demo at CouchConf (or if you were there and just want to see it again), here is the video of the presentation:


Running time: 42:26

NoSQL Tapes top 3

@nosqltapes served over 4,000 hours of video, including:
Top 3 most-shared:
Top 3 most often finished:
  1. Cloudant nosqltap.es/5
  2. HBase nosqltap.es/24
  3. Neo4J nosqltap.es/38
Top 3 most played:
  1. Graphs nosqltap.es/17
  2. MapReduce nosqltap.es/8
  3. Dynamo nosqltap.es/15

MongoDB schema design basics

Data design is fundamental, and NoSQL databases break with the normal forms of relational database normalization, so you need to review and explore the recommended ways to model your data with your NoSQL solution.

10gen’s Richard Kreuter offers a nice presentation on MongoDB schema design, “Schema Design Basics: White Board Session”:



NoSQL tapes #23

NoSQL tapes #23 is out:

Jonathan Ellis on Cassandra and DataStax

Apache Cassandra project chair Jonathan Ellis talks about the Dynamo–Bigtable hybrid, from its origins at Facebook to the creation of support company DataStax (known as Riptano at the time). He also details the design tradeoffs and internals, as well as his take on high-profile Cassandra deployment stories.


MongoDB adopted by Craigslist

According to this MongoDB blog post: http://blog.mongodb.org/post/5545198613/mongodb-live-at-craigslist

MongoDB is now live at Craigslist, the popular classifieds and job posting community that serves 570 cities in 50 countries, where the NoSQL data store is being used to archive billions of records.

Every post in the history of the site was previously held in a large MySQL cluster. Since Craigslist had a variety of database needs moving forward, ranging from wanting to add new machines without downtime to routing around dead machines without clients failing, the development team decided to initiate a major migration to a NoSQL solution. MongoDB was the solution they chose.

Here are some basic numbers about the Craigslist MongoDB cluster from Jeremy Zawodny, one of the site’s software engineers:

We’re sizing the install for around 5 billion documents. That’s from the initial 2 billion document import we need to do plus room to grow for a few years to come. Average document size is right around 2KB. (Five billion 2KB documents is 10TB of data.) We’re getting our feet wet with MongoDB so this particular task isn’t high throughput or growing in unpredictable ways.

We can put data into MongoDB faster than we can get it out of MySQL during the migration.
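The sizing figures quoted above are simple arithmetic and can be checked with a short sketch. This is only a back-of-the-envelope calculation, not Craigslist's actual capacity planning: the function name `total_terabytes` is hypothetical, and it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and ignores indexes, replication, and storage overhead.

```python
# Back-of-the-envelope check of the sizing figures in the quote.
# Assumptions (not from the source): decimal units, and no allowance
# for indexes, replicas, or storage-engine overhead.

AVG_DOC_BYTES = 2 * 1000          # "right around 2KB" per document
INITIAL_DOCS = 2_000_000_000      # initial import
TARGET_DOCS = 5_000_000_000       # install sized for this many documents


def total_terabytes(documents: int, avg_doc_bytes: int) -> float:
    """Raw data volume in decimal terabytes (1 TB = 10**12 bytes)."""
    return documents * avg_doc_bytes / 10**12


initial_tb = total_terabytes(INITIAL_DOCS, AVG_DOC_BYTES)
target_tb = total_terabytes(TARGET_DOCS, AVG_DOC_BYTES)

print(f"initial import: {initial_tb:.1f} TB")   # initial import: 4.0 TB
print(f"sized capacity: {target_tb:.1f} TB")    # sized capacity: 10.0 TB
```

Under these assumptions the initial 2 billion documents account for about 4TB of raw data, leaving roughly 6TB of the 10TB as room to grow.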

Zawodny explains the evolution of data storage at Craigslist, how MongoDB will fit into the future of the site’s infrastructure, and why Craigslist chose MongoDB over other data stores in this video.


The Secrets of Building Realtime Big Data Systems

Nathan Marz, lead engineer at BackType, has posted slides from a presentation that discusses the “secrets” to building real-time data systems.

A couple of interesting tidbits from the presentation:

  • Essentials of a data system
    • Robust to machine failure and error
    • Low latency reads and updates
    • Scalable
    • General
    • Extensible
    • Allows ad-hoc analysis
    • Minimal maintenance
    • Debuggable
  • Batch layer is used for a majority of historical data
  • Speed layer is used for data that has not quite made it to the batch layer
  • Speed layer is transient data that eventually is overridden by the batch layer
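The relationship between the batch layer and the speed layer described above can be illustrated with a toy sketch. This is not code from the presentation: the `PageviewStore` class, its method names, and the page-view counting use case are all hypothetical, and the batch run is treated as instantaneous, which a real system could not assume.

```python
from collections import Counter


class PageviewStore:
    """Toy batch + speed layer for per-URL counts (hypothetical example)."""

    def __init__(self):
        self.master_log = []          # append-only record of every event
        self.batch_view = Counter()   # precomputed from the full history
        self.speed_view = Counter()   # only events since the last batch run

    def record(self, url):
        """Store the event and update the low-latency speed view."""
        self.master_log.append(url)
        self.speed_view[url] += 1

    def run_batch(self):
        """Recompute the batch view from all historical data, then discard
        the transient speed-layer state that the batch view now covers."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, url):
        """Answer reads by combining the batch view with the speed view."""
        return self.batch_view[url] + self.speed_view[url]


store = PageviewStore()
store.record("/a")
store.record("/a")
store.record("/b")
print(store.query("/a"))   # 2, served entirely from the speed layer

store.run_batch()
print(store.query("/a"))   # 2, now served from the batch layer

store.record("/a")
print(store.query("/a"))   # 3, batch view plus one recent event
```

Because the batch view is always rebuilt from the full log, any error in the speed layer is short-lived: it disappears the next time the batch layer overrides it.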



NoSQL Tapes – recent history and current state of NoSQL

Recent history and current state of NoSQL: video discussion with @fastip founder

NoSQL Tapes Vol. 9: Benjamin Black on NoSQL, Cloud Computing & Fast IP
Benjamin Black shares his thoughts on NoSQL and cloud computing, and sheds some light on his new company: FASTIP

The video is available on the NoSQL Tapes website:

NoSQL Tapes Vol. 2

NoSQL Tapes Vol. 2: D. Merriman & E. Horowitz on the origins of MongoDB

10gen founders Dwight Merriman & Eliot Horowitz open up about the vision behind MongoDB and share their thoughts on NoSQL as a movement and an industry.
Shot on September 13, 2010 in New York City.




More NoSQL Tapes videos are available here