Hue 2.4 unleashed the power of Hadoop. In this version, you can now search across Hadoop data just as you would run keyword searches with Google or Yahoo! In addition, a wizard lets you tweak the result snippets and tailor the search experience to your needs.
The new Hue Search app uses the regular Solr API under the hood, yet adds a remarkable list of UI features that make searching over data stored in Hadoop a breeze. It integrates with the other Hue apps, such as File Browser, so you can look at the index file in a few clicks.
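Since the app sits on top of the regular Solr API, a keyword search comes down to an HTTP request against Solr's select handler. The snippet below is only a minimal sketch of what such a request could look like, not Hue's actual code: the host, the collection name "twitter_demo", and the highlighted field "text" are assumptions made for illustration, while q, rows, start, wt, hl, and hl.fl are standard Solr query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_query_url(base_url, collection, keywords,
                         snippet_field="text", rows=10, start=0):
    """Build a Solr /select URL for a keyword search with highlighted snippets.

    base_url, collection, and snippet_field are hypothetical values chosen
    for this sketch; adjust them to match a real Solr deployment.
    """
    params = {
        "q": keywords,           # the keyword query itself
        "rows": rows,            # number of results per page
        "start": start,          # offset of the first result (paging)
        "wt": "json",            # ask Solr for a JSON response
        "hl": "true",            # enable highlighting (result snippets)
        "hl.fl": snippet_field,  # field to build the snippets from
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Assumed local Solr instance and collection name, for illustration only.
url = build_solr_query_url("http://localhost:8983/solr", "twitter_demo", "hadoop hue")
print(url)
```

Sending that URL with any HTTP client returns the matching documents plus a highlighting section, which is the kind of raw material a UI can then render as result snippets.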
Here’s a video demoing queries and results customization. The demo is based on Twitter Streaming data collected with Apache Flume and indexed in real time:
Perhaps you’ve heard about the next generation of databases roughly classified as NoSQL databases? These databases are generally much better than RDBMSs at scaling, performance, and ease of development (e.g., in NoSQL the object-relational impedance mismatch usually disappears). Unfortunately, many talks on NoSQL are very academic and general. Not this one. This session will introduce the ideas around the so-called NoSQL movement, and we’ll learn how to leverage MongoDB (a popular open source NoSQL database) to build .NET applications using LINQ as the data access language. We’ll build out a .NET application using LINQ and MongoDB in a series of interactive demos using Visual Studio 2012 and C#.
Drake is a text-based command line data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs. It automatically resolves dependencies and provides a rich set of options for controlling the workflow. It supports multiple inputs and outputs and has HDFS support built-in.
We use Drake at Factual on various internal projects. It serves as a primary way to define, run, and manage data workflows. Some core benefits we’ve seen:
Non-programmers can run Drake and fully manage a workflow
Encourages repeatability of the overall data building process
Encourages consistent organization (e.g., where supporting scripts live, and how they’re run)
Precise control over steps (for more effective testing, debugging, etc.)
Unifies different tools in a single workflow (shell commands, Ruby, Python, Clojure, pushing data to production, etc.)
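To make the idea of organizing execution around data and its dependencies more concrete, here is a rough Python sketch. It is not Drake's implementation or its workflow syntax: the Step type, the order_steps function, and all file names and commands are hypothetical. It only shows how declaring each step's inputs and outputs is enough to work out a valid run order.

```python
from collections import namedtuple

# A step declares the data it reads, the data it writes, and a command to run.
Step = namedtuple("Step", ["name", "inputs", "outputs", "command"])


def order_steps(steps):
    """Order steps so each one runs after the steps that produce its inputs."""
    # Map every output file to the step that produces it.
    producer = {out: step for step in steps for out in step.outputs}
    ordered, state = [], {}

    def visit(step):
        if state.get(step.name) == "done":
            return
        if state.get(step.name) == "visiting":
            raise ValueError(f"dependency cycle at step {step.name!r}")
        state[step.name] = "visiting"
        for path in step.inputs:
            # Inputs without a producer are treated as pre-existing source data.
            if path in producer:
                visit(producer[path])
        state[step.name] = "done"
        ordered.append(step)

    for step in steps:
        visit(step)
    return ordered


# Hypothetical workflow mixing several tools, declared in arbitrary order.
steps = [
    Step("report", ["clean.csv", "geo.csv"], ["report.txt"], "python report.py"),
    Step("clean", ["raw.csv"], ["clean.csv"], "ruby clean.rb"),
    Step("geocode", ["clean.csv"], ["geo.csv"], "sh geocode.sh"),
]
plan = [step.name for step in order_steps(steps)]
print(plan)  # ['clean', 'geocode', 'report']
```

A real tool would also have to decide which steps are out of date and actually run the commands; the sketch stops at computing the order.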
The following video provides a crash course in nine key databases: Postgres, CouchDB, MarkLogic, Riak, VoltDB, MongoDB, Neo4j, HBase and Redis. All in just 45 minutes.
Miles Pomeroy, Chad Maughan, and Jonathan Geddes spend five minutes on each database, hence the title, “9 Databases in 45 minutes.” The video proceeds in efficient and occasionally amusing fashion, with a countdown clock and a gong keeping the presentations terse.
Crockford is likeably humble about the origins of JSON. Rather than claiming he invented JSON, he says he discovered it:
“I don’t claim to have invented it, because it already existed in nature. I just saw it, recognized the value of it, gave it a name, and a description, and showed its benefits. I don’t claim to be the only person to have discovered it.”
Crockford tried very hard to strip unnecessary stuff from JSON so it stood a better chance of being language independent. When confronted with pushback about JSON not being a “standard,” Crockford registered json.org, put up a specification that documented the data format, and declared it a standard.