Twemproxy v0.3.0 has been released

twemproxy v0.3.0 is out: bug fixes and support for SmartOS (Solaris) and BSD (OS X)

twemproxy (pronounced “two-em-proxy”), also known as nutcracker, is a fast and lightweight proxy for the memcached and redis protocols. It was primarily built to reduce the connection count on the backend caching servers.

Features

  • Fast.
  • Lightweight.
  • Maintains persistent server connections.
  • Keeps connection count on the backend caching servers low.
  • Enables pipelining of requests and responses.
  • Supports proxying to multiple servers.
  • Supports multiple server pools simultaneously.
  • Shards data automatically across multiple servers.
  • Implements the complete memcached ascii and redis protocol.
  • Easy configuration of server pools through a YAML file.
  • Supports multiple hashing modes including consistent hashing and distribution.
  • Can be configured to disable nodes on failures.
  • Observability through stats exposed on stats monitoring port.
  • Works with Linux, *BSD, OS X and Solaris (SmartOS).
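
To make the sharding and consistent hashing features above more concrete, here is a minimal sketch of a consistent-hash ring. It is a generic illustration, not twemproxy's actual implementation: the `HashRing` class, the use of MD5, the 160 points per server and the server addresses are all assumptions chosen for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only, not twemproxy's code)."""

    def __init__(self, servers, points_per_server=160):
        # Each server is placed on the ring at many pseudo-random points so
        # that keys spread evenly and few keys move when a server goes away.
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(points_per_server):
                self._ring.append((self._hash(f"{server}-{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only as a well-distributed hash, not for security.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


# Hypothetical backend addresses, used only for this example.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
ring = HashRing(servers)
keys = [f"user:{n}" for n in range(1000)]
before = {key: ring.server_for(key) for key in keys}

# Drop one server: only the keys that lived on it are remapped.
smaller = HashRing(servers[:2])
moved = [key for key in keys if before[key] != smaller.server_for(key)]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The point of the design is visible in the last lines: removing a server remaps only the keys that were stored on it, whereas a plain `hash(key) % server_count` scheme would reshuffle most keys.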

 

More details and source code are available here: https://github.com/twitter/twemproxy

Twitter's fatcache available on GitHub

fatcache is memcache on SSD. Think of fatcache as a cache for your big data.

Overview

There are two ways to think of SSDs in system design. One is to think of an SSD as an extension of disk, where it plays the role of making disks fast; the other is to think of it as an extension of memory, where it plays the role of making memory fat. The latter makes sense when persistence (non-volatility) is unnecessary and data is accessed over the network. Even though memory is a thousand times faster than SSD, network-connected, SSD-backed memory makes sense if we design the system so that network latencies dominate SSD latencies by a large factor.

To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases.

Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.

SSD-backed memory presents a viable alternative for applications with large workloads that need to maintain high hit rate for high performance. SSDs have higher capacity per dollar and lower power consumption per byte, without degrading random read latency beyond network latency.

Fatcache achieves performance comparable to an in-memory cache by focusing on two design criteria:

  • Minimize disk reads on cache hit
  • Eliminate small, random disk writes

The latter is important due to SSDs’ unique write characteristics. Writes and in-place updates to SSDs degrade performance due to an erase-and-rewrite penalty and garbage collection of dead blocks. Fatcache batches small writes to obtain consistent performance and increased disk lifetime.
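
The batching idea can be sketched as follows. This is only an illustration of turning many small writes into a few large sequential ones, not fatcache's actual code: the `SlabWriter` class, the padding scheme and the slab size are assumptions, and an in-memory `io.BytesIO` stands in for the SSD.

```python
import io


class SlabWriter:
    """Buffers small items in memory and writes them out one slab at a time."""

    def __init__(self, disk, slab_size=1024 * 1024):
        self.disk = disk            # any seekable binary file object
        self.slab_size = slab_size  # hypothetical slab size (1 MB by default)
        self.buffer = bytearray()
        self.disk_writes = 0        # number of write calls issued to the disk

    def put(self, value):
        if len(value) > self.slab_size:
            raise ValueError("item is larger than a slab")
        # Flush first if the item would not fit in the current slab.
        if len(self.buffer) + len(value) > self.slab_size:
            self.flush()
        self.buffer += value

    def flush(self):
        if not self.buffer:
            return
        # Pad to a full slab so every disk write has the same large size,
        # then append it: no small writes and no in-place updates.
        padding = bytes(self.slab_size - len(self.buffer))
        self.disk.seek(0, io.SEEK_END)
        self.disk.write(bytes(self.buffer) + padding)
        self.disk_writes += 1
        self.buffer.clear()


disk = io.BytesIO()  # stands in for the SSD in this example
writer = SlabWriter(disk, slab_size=4096)
for _ in range(100):
    writer.put(b"x" * 100)  # one hundred small 100-byte items
writer.flush()
print(writer.disk_writes, "disk writes for 100 items")
```

With 100-byte items and a 4096-byte slab, 40 items fit per slab, so the 100 items reach the disk in three slab-sized writes instead of 100 small ones.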

SSD reads happen at a page-size granularity, usually 4 KB. Single page read access times are approximately 50 to 70 usec and a single commodity SSD can sustain nearly 40K read IOPS at a 4 KB page size. 70 usec read latency dictates that disk latency will overtake typical network latency after a small number of reads. Fatcache reduces disk reads by maintaining an in-memory index for all on-disk data.
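
A minimal sketch of the in-memory index idea, assuming a simple append-only layout: the index maps each key to the location of its value on disk, so a miss costs no disk read and a hit costs exactly one. The `IndexedStore` class and its fields are hypothetical, write batching is left out for brevity, and `io.BytesIO` again stands in for the SSD.

```python
import io


class IndexedStore:
    """Keeps every key's disk location in memory (illustrative only)."""

    def __init__(self, disk):
        self.disk = disk
        self.index = {}      # key -> (offset, length), held entirely in memory
        self.disk_reads = 0  # counts read operations issued to the disk

    def put(self, key, value):
        # Append the value and remember where it landed.
        offset = self.disk.seek(0, io.SEEK_END)
        self.disk.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        location = self.index.get(key)
        if location is None:
            return None      # miss is answered from memory, no disk access
        offset, length = location
        self.disk.seek(offset)
        self.disk_reads += 1
        return self.disk.read(length)


store = IndexedStore(io.BytesIO())
store.put(b"user:1", b"alice")
store.put(b"user:2", b"bob")

miss = store.get(b"user:3")
reads_after_miss = store.disk_reads
hit = store.get(b"user:2")
reads_after_hit = store.disk_reads
print(miss, hit, reads_after_miss, reads_after_hit)
```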

https://github.com/twitter/fatcache

DuckDuckGo serves 1 million searches a day

Highscalability.com has published an interview with Gabriel Weinberg, founder of DuckDuckGo and all-around startup guru, on what DDG’s architecture looks like in 2012. You will find details on how they use memcached, PostgreSQL and many other great pieces of software to serve 1 million searches a day!

Can’t searching the Open Web provide all this data? Not really. This is structured data with semantics, not an HTML page. You need a search engine that’s capable of categorizing, mapping, merging, filtering, prioritizing, searching, formatting, and disambiguating richer data sets, and you can’t do that with a keyword search. You need the kind of smarts DDG has built into their search engine. One problem, of course, is that now that data has become valuable, many grown-ups don’t want to share anymore.

The full article on highscalability.com

Security tool Nmap adds detection for NoSQL solutions

Nmap (“Network Mapper”) is a free and open source utility for network exploration or security auditing. Many systems and network administrators also find it useful for tasks such as network inventory. It’s also used to determine what hosts are available on the network and what services (application name and version) those hosts are offering.

Since the 2nd of January 2012, Nmap scripts are available to handle a few NoSQL solutions:

http://seclists.org/nmap-dev/2012/q1/11

So, using Nmap and those latest scripts, you’ll be able to detect the servers on your network running Riak, memcached or Redis, as well as the installed version.
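
To give an idea of what this kind of version detection involves, here is a rough Python sketch that asks a memcached or Redis server for its version over a plain TCP connection. It is not how the Nmap scripts themselves are written; the helper names (`probe`, `detect`, `PROBES`) are hypothetical, and the sketch assumes the server accepts unauthenticated connections.

```python
import socket


def probe(host, port, payload, timeout=2.0):
    """Send one request and return the first chunk of the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        return conn.recv(4096).decode("utf-8", errors="replace")


def parse_memcached_version(reply):
    # memcached answers "version\r\n" with a line such as "VERSION 1.4.13".
    first_line = reply.splitlines()[0] if reply else ""
    if first_line.startswith("VERSION "):
        return first_line[len("VERSION "):].strip()
    return None


def parse_redis_version(reply):
    # Redis answers "INFO\r\n" with "name:value" lines, one of which is
    # "redis_version:<version>".
    for line in reply.splitlines():
        if line.startswith("redis_version:"):
            return line.split(":", 1)[1].strip()
    return None


# Request to send and parser to apply for each service we know about.
PROBES = {
    "memcached": (b"version\r\n", parse_memcached_version),
    "redis": (b"INFO\r\n", parse_redis_version),
}


def detect(host, port):
    """Return (service, version) for the first probe that matches, else None."""
    for service, (payload, parse) in PROBES.items():
        try:
            version = parse(probe(host, port, payload))
        except OSError:
            continue  # connection refused, timed out, or reset
        if version:
            return service, version
    return None
```

For example, `detect("192.0.2.10", 6379)` would return a tuple such as `("redis", "2.4.5")` if a Redis server answers on that (placeholder) address, and `None` otherwise.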

 

 

 

MySQL 5.6.2 brings a new NoSQL interface via memcached

Architects and strategists can start looking ahead to the exciting new thing with SQL in its name: MySQL 5.6. It builds on the momentum of 5.5, and on Oracle’s investment in and commitment to MySQL, by delivering better performance and scalability, and it tries to bridge the relational SQL world and the NoSQL movement.

This release focused on:

  • Optimizer improvements for all-around query performance.
  • InnoDB improvements for higher transactional throughput.
  • New NoSQL-style memcached APIs.
  • Partitioning improvements for querying and managing huge tables.
  • Replication improvements covering many aspects.
  • Better performance monitoring by expanding the data available through the PERFORMANCE_SCHEMA.
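
Because the new interface speaks the memcached protocol, any memcached client should in principle be able to talk to it. The sketch below builds and parses raw memcached text-protocol messages to show what goes over the wire. It assumes the memcached interface is enabled and listening on memcached's usual port 11211; the function names are hypothetical and the key and value are made up.

```python
import socket


def encode_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request for a bytes value."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key):
    """Build a memcached text-protocol 'get' request for a single key."""
    return f"get {key}\r\n".encode("ascii")


def decode_get(reply):
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if reply.startswith(b"END"):
        return None
    header, _, rest = reply.partition(b"\r\n")
    # The header line looks like: VALUE <key> <flags> <bytes>
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply: {header!r}")
    length = int(parts[3])
    return rest[:length]


def roundtrip(host, port, request, timeout=2.0):
    """Send one request and return the first chunk of the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        return conn.recv(4096)


request = encode_set("user:1", b"alice")
print(request)
print(decode_get(b"VALUE user:1 0 5\r\nalice\r\nEND\r\n"))
```

Against a running server, `roundtrip("127.0.0.1", 11211, encode_get("user:1"))` would return the raw reply to pass to `decode_get`.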

 

MySQL 5.6.2 is available for download

What’s new in MySQL 5.6.2 is available here