Kafka is a publish-subscribe messaging system developed at LinkedIn that persists messages to disk in a very performant manner. It provides the functionality of a messaging system, but with a unique design.
Version 0.8.0 brings new features such as:
- [KAFKA-50] – kafka intra-cluster replication support
- [KAFKA-188] – Support multiple data directories
- [KAFKA-202] – Make the request processing in kafka asynchronous
- [KAFKA-203] – Improve Kafka internal metrics
- [KAFKA-235] – Add a ‘log.file.age’ configuration parameter to force rotation of log files after they’ve reached a certain age
- [KAFKA-429] – Expose JMX operation to set logger level dynamically
- [KAFKA-475] – Time based log segment rollout
- [KAFKA-545] – Add a Performance Suite for the Log subsystem
- [KAFKA-546] – Fix commit() in zk consumer for compressed messages
Download version 0.8.0 here: http://kafka.apache.org/downloads.html
Dex, the Index Bot
Dex is a MongoDB performance tuning tool that compares queries to the available indexes in the queried collection(s) and generates index suggestions based on simple heuristics. Currently you must provide a connection URI for your database.
Dex uses the URI you provide to see which indexes already exist, but only as a helpful way to determine whether an index recommendation is needed; it does not take existing indexes into account when actually constructing its ideal recommendation.
Currently, Dex only recommends complete indexes, not partial indexes: a partial index the query could use is ignored in favor of a better, complete recommendation. Dex orders the fields of its suggested indexes according to a rule of thumb.
Your index field order should answer, in order:
- Equivalent value checks
- Sort clauses
- Range value checks ($in, $nin, $lt/gt, $lte/gte, etc.)
Note that your data cardinality may warrant a different order than the suggested indexes.
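That rule of thumb can be sketched in a few lines. The helper below is hypothetical, not Dex's actual code; it just orders a query's fields into an equality / sort / range key pattern:

```python
# Hypothetical sketch of the ordering rule of thumb above: equality
# checks first, then sort clauses, then range checks. Not Dex's code.

RANGE_OPS = {"$in", "$nin", "$lt", "$gt", "$lte", "$gte", "$ne"}

def suggest_index(query, sort=None):
    """Order query/sort fields into a suggested index key pattern."""
    sort = sort or []
    equality, ranges = [], []
    for field, condition in query.items():
        if isinstance(condition, dict) and RANGE_OPS & condition.keys():
            ranges.append(field)
        else:
            equality.append(field)
    sort_fields = [f for f, _ in sort if f not in equality]
    ranges = [f for f in ranges if f not in sort_fields]
    return [(f, 1) for f in equality] + list(sort) + [(f, 1) for f in ranges]

# Equality on "status", sort on "created", range on "age":
print(suggest_index({"status": "A", "age": {"$gte": 21}},
                    sort=[("created", -1)]))
# → [('status', 1), ('created', -1), ('age', 1)]
```

As the note above says, real data cardinality may warrant a different order than this heuristic produces.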
“What’s going to change in the next 10 years?” is a very interesting question, but a very common one.
A more important question is “What’s not going to change in the next 10 years?”, because you can build a business strategy around the things that are stable over time.
As Amazon’s CEO, Jeff Bezos, puts it:
“In the retail business, for instance, we know that customers want low prices, and I know that’s going to be true 10 years from now.
They want fast delivery; they want vast selection. It’s impossible to imagine a future 10 years from now where a customer comes up and says, ‘Jeff I love Amazon; I just wish the prices were a little higher,’ [or] ‘I love Amazon; I just wish you’d deliver a little more slowly.’ Impossible. And so the effort we put into those things, spinning those things up, we know the energy we put into it today will still be paying off dividends for our customers 10 years from now. When you have something that you know is true, even over the long term, you can afford to put a lot of energy into it.”
Those principles apply to our IT world too:
- users want fast and reliable results
- operators want a stable and secure platform
- developers want to focus on architecture and algorithms, not on debugging the user interface
- managers want investment and maintenance costs to go down
Mesos 0.13 has been released; it fixes many bugs and includes the following improvements:
- [MESOS-46] – Refactor MasterTest to use fixture
- [MESOS-134] – Add Python documentation
- [MESOS-140] – Unrecognized command line args should fail the process
- [MESOS-242] – Add more tests to Dominant Share Allocator
- [MESOS-305] – Inform the frameworks / slaves about a master failover
- [MESOS-346] – Improve OSX configure output when deprecated headers are present.
- [MESOS-360] – Mesos jar should be built for java 6
- [MESOS-409] – Master detector code should stat nodes before attempting to create
- [MESOS-472] – Separate ResourceStatistics::cpu_time into ResourceStatistics::cpu_user_time and ResourceStatistics::cpu_system_time.
- [MESOS-493] – Expose version information in http endpoints
- [MESOS-503] – Master should log LOST messages sent to the framework
- [MESOS-526] – Change slave command line flag from ‘safe’ to ‘strict’
- [MESOS-602] – Allow Mesos native library to be loaded from an absolute path
- [MESOS-603] – Add support for better test output in newer versions of autotools
Download the most recent stable release: 0.13.0. (Release Notes)
Availability = uptime / (uptime + downtime)
Availability from a technical perspective is mostly about being fault tolerant. Because the probability of a failure occurring increases with the number of components, the system should be able to compensate so as to not become less reliable as the number of components increases.
For example, availability rates for a given service over an entire year translate to the following:
| Availability | How much downtime is allowed per year? |
| --- | --- |
| 90% (“one nine”) | More than a month |
| 99% (“two nines”) | Less than 4 days |
| 99.9% (“three nines”) | Less than 9 hours |
| 99.99% (“four nines”) | Less than an hour |
| 99.999% (“five nines”) | ~ 5 minutes |
| 99.9999% (“six nines”) | ~ 31 seconds |
In five years, Apache Cassandra has grown into one of the most widely used NoSQL databases in the world and serves as the backbone for some of today’s most popular applications, including Facebook, Netflix, and Twitter.
The newly announced Cassandra 2.0 includes multiple new features, but perhaps the biggest is that “Cassandra 2.0 makes it easier than ever for developers to migrate from relational databases and become productive quickly.”
New features and improvements include:
- Lightweight transactions, which ensure operation linearizability similar to the serializable isolation level offered by relational databases, preventing conflicts during concurrent requests
- Triggers, which enable pushing performance-critical code close to the data it deals with, and simplify integration with event-driven frameworks like Storm
- CQL enhancements such as cursors and improved index support
- Improved compaction, keeping read performance from deteriorating under heavy write load
- Eager retries to avoid query timeouts by sending redundant requests to other replicas if too much time elapses on the original request
- Custom Thrift server implementation based on LMAX Disruptor that achieves lower message processing latencies and better throughput with flexible buffer allocation strategies
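The guarantee lightweight transactions provide is essentially compare-and-set, exposed in CQL as `INSERT ... IF NOT EXISTS` and `UPDATE ... IF`. A toy single-node sketch of that semantics (Cassandra actually implements it with Paxos across replicas; this is only an illustration):

```python
import threading

# Toy in-memory sketch of compare-and-set, the guarantee behind
# lightweight transactions. Cassandra runs Paxos across replicas to
# get the same effect in a distributed cluster; this uses one lock.

class CASStore:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Return True if the insert won; False if key already existed."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

store = CASStore()
print(store.insert_if_not_exists("user:alice", {"name": "Alice"}))  # True
print(store.insert_if_not_exists("user:alice", {"name": "Mallory"}))  # False
```

Two concurrent registrations for the same username can no longer both succeed, which is exactly the conflict the feature prevents.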
Choosing a shard key can be difficult, and the factors involved largely depend on your use case.
In fact, there is no such thing as a perfect shard key; there are design tradeoffs inherent in every decision. This presentation goes through those tradeoffs, as well as the different types of shard keys available in MongoDB, such as hashed and compound shard keys.
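One of those tradeoffs is easy to see with hashed shard keys: hashing spreads monotonically increasing values (like ObjectIds) across shards instead of piling them onto one hot shard, at the cost of efficient range queries. A toy sketch (md5 stands in here for MongoDB's internal 64-bit hash; the shard count is an assumption):

```python
import hashlib

# Toy illustration of a hashed shard key: hash the key value so that
# sequential _ids spread across shards instead of all landing on the
# last one. md5 stands in for MongoDB's internal hash function.

def shard_for(key, num_shards):
    """Pick a shard by hashing the key value."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Sequential _ids scatter across the 3 shards rather than one:
print([shard_for(_id, 3) for _id in range(6)])
```

The flip side, and the reason this is a tradeoff rather than a default, is that a range query over `_id` now has to hit every shard.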
The Mongo-Hadoop Adapter 1.1 has been released. It makes it easy to use MongoDB databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, Hadoop can process the data in parallel so that very large datasets can be processed quickly.
The Mongo-Hadoop adapter also includes support for Pig and Hive, which allow very sophisticated MapReduce workflows to be executed just by writing very simple scripts.
- Pig is a high-level scripting language for data analysis and building map/reduce workflows
- Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.
Hadoop streaming is also supported, so map/reduce functions can be written in languages other than Java. Right now the Mongo-Hadoop adapter supports streaming in Ruby, Node.js, and Python.
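The streaming contract itself is very simple: mappers and reducers read lines on stdin and emit tab-separated key/value lines on stdout, with Hadoop sorting by key between the two phases. A minimal word-count pair in that style (plain Python, not using the adapter's streaming helpers) might look like:

```python
from itertools import groupby

# Minimal Hadoop-streaming-style word count. In a real job, Hadoop
# runs the mapper and reducer as separate processes over stdin/stdout
# and sorts mapper output by key before the reduce phase.

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    pairs = (line.rsplit("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the shuffle/sort step locally:
for line in reducer(sorted(mapper(["to be or not to be"]))):
    print(line)
```

Any language that can read stdin and write stdout can play either role, which is why the adapter can offer Ruby, Node.js, and Python support on top of the same mechanism.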
How the Mongo-Hadoop Adapter Works
- The adapter examines the MongoDB Collection and calculates a set of splits from the data
- Each of the splits gets assigned to a node in the Hadoop cluster
- In parallel, Hadoop nodes pull data for their splits from MongoDB (or BSON) and process them locally
- Hadoop merges results and streams output back to MongoDB or BSON
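The split-calculation step above can be sketched as dividing a collection's key range into evenly sized chunks. This is only an illustration with an assumed fixed split size; the real adapter computes splits server-side (e.g. via MongoDB's splitVector command, or via offsets within BSON files):

```python
# Toy sketch of the split-calculation step: divide a range of document
# _ids into fixed-size splits that Hadoop nodes can pull in parallel.
# The real adapter derives splits from the data itself, not a fixed size.

def calculate_splits(min_id, max_id, split_size):
    """Yield (lower, upper) _id bounds, upper-exclusive, for each split."""
    lower = min_id
    while lower < max_id:
        upper = min(lower + split_size, max_id)
        yield (lower, upper)
        lower = upper

print(list(calculate_splits(0, 10, 4)))
# → [(0, 4), (4, 8), (8, 10)]
```

Each (lower, upper) pair becomes an input split that one Hadoop node can fetch and process independently, which is what makes the parallelism in the steps above possible.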