Hypertable 0.9.7.2 released

Hypertable version 0.9.7.2 has been released.

Download here:  http://hypertable.com/download/0972/ 

Changes:

  • Fixed bug in maintenance queue causing worker thread count to drop to 1 over time
  • Fixed CellStore concurrency and divide-by-zero problems
  • issue 1038: Stop connect retry attempts to recovered server
  • Added support for .cellstore.index pseudo-table
  • Schedule log cleanup compactions ahead of merging compactions
  • Rotate starting point in maintenance scheduler to avoid compaction starvation
  • Removed absolute path in jrun script
  • Implemented pure virtual function for older CellStore format
  • issue 991: added refresh_table to the Thrift interface
  • issue 999: needs_compaction bit not getting cleared
  • htpkg – don’t build thriftbroker-only packages by default
  • Maintenance scheduler overhaul; Pause app queue if prune threshold exceeded by 20%
  • Improved CommitLog purge break log message
  • Only wait for system range recovery in RangeServer::phantom_ methods
  • Got rid of exception output for INFO messages (HT_INFO_OUT)
  • Removed dependency to log4cpp
  • Added doxygen comments to src/cc/Common
  • Added Hypertable.RangeServer.Maintenance.LowMemoryPrioritization property
  • Finished adding doxygen comments to AsyncComm
  • Fixed balance plan server assignment problem
  • Fixed hang problem in ~TableScanner on scanner errors
  • Fixed handling of missing transfer log directory in PhantomRange
  • Fixed problem with explicitly supplied timestamps and TIME_ORDER DESC
  • Changed “tar -xjv” to “tar xjv” in Capfiles
  • Checked in auto-generated code from recent Thrift changes
  • Added two-serial-failover test
  • Re-enable load balancer
  • Got rid of deadlock in AsyncComm

Hypertable getting a new website and smashing HBase in performance

Hypertable just got a new website. Check it out at http://www.hypertable.com/. The documentation section has been thoroughly reviewed.

There is more than a new website. According to a benchmark published on highscalability.com, Hypertable delivers 2X better throughput in most tests. HBase fails the 41 billion and 167 billion record insert tests, overwhelmed by garbage collection. Both systems deliver similar results for the random read uniform test.

Read the full performance test at:

http://highscalability.com/blog/2012/2/7/hypertable-routs-hbase-in-performance-test-hbase-overwhelmed.html

Hypertable 0.9.5.3 has been released

Hypertable 0.9.5.3 has been released, and can be downloaded from http://www.hypertable.com/download/

In this new version, you will find the following changes:

  • Added support for OFFSET, CELL_OFFSET
  • Fixed bug in TableInfo
  • Fixed bug in Monitoring class where the last and current server sets were incorrectly compared.
  • Added support for TIME_ORDER DESC and HyperAppHelper::create_cell_unique for the PHP microblog sample
  • Added support for chronological timestamps and unique values
  • Andy’s Hyperspace performance improvements
  • Added support for .tar.bz2 to install_pkg Capfile task
  • Changed Mac package name to include OSX version number
  • Fixed missing library dependency on Mac OSX Lion
  • Added OFFLOAD algorithm to basic load balancer, to move ranges off a list of servers.
  • Updated performance test with latest HBase and Zookeeper
  • Added regression test for issue #719
  • Fixed issue #720: LOAD DATA INTO FILE now skips empty values
  • Fixed CELL_LIMIT_CF prob in ThriftBroker; Updated documentation
  • Prefetch schemas from Hyperspace to speed up local recovery in RangeServer
  • Fixed issue #718 – cpp thrift client is now installed in /lib/cpp

New Features in Hypertable 0.9.5.3

0.9.5.3 is a patch release, but it also includes a few new features that are worth mentioning.

Pagination with OFFSET and CELL_OFFSET

With this release you can specify an OFFSET or CELL_OFFSET option to a SELECT clause to skip a number of rows or cells in the query. This is often used in combination with LIMIT and CELL_LIMIT to implement pagination, i.e. when you only display the first 20 results of a query in a web page and then let users navigate to the next or previous pages.

SELECT * FROM table OFFSET 40 LIMIT 20;

The OFFSET and CELL_OFFSET options are also available for the C++ API (ScanSpecBuilder::set_row_offset and ScanSpecBuilder::set_cell_offset) and the Thrift APIs.
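
As a rough illustration of the pagination arithmetic, the following Python sketch builds the HQL string for a given page. The helper name page_query and its parameters are hypothetical and not part of any Hypertable API; only the OFFSET and LIMIT clauses come from the release notes above.

```python
def page_query(table: str, page: int, page_size: int = 20) -> str:
    """Build an HQL SELECT for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Number of rows to skip before the requested page starts.
    offset = (page - 1) * page_size
    return f"SELECT * FROM {table} OFFSET {offset} LIMIT {page_size};"


# Page 3 with 20 rows per page skips the first 40 rows.
print(page_query("table", 3))
```

The resulting string can be passed to whichever client executes HQL in your setup. The sketch does no escaping of the table name, so it should only be fed trusted input.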

Chronological Timestamps

By default, Hypertable sorts timestamps in reverse chronological order (newest on top). Many users have asked for a way to reverse this order, so we added a new column family option, TIME_ORDER DESC. A SELECT of a column with this option returns the oldest values of each cell first. See below how you can use this behavior to create unique user IDs or to simulate “AUTO_INCREMENT” fields.

CREATE TABLE test (cf1 TIME_ORDER DESC);

For completeness’ sake we also added TIME_ORDER ASC which specifies the default behavior.
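
To make the two orderings concrete, here is a small Python toy model of how the versions of a single cell would be returned. It is only a sketch that follows the description above (ASC meaning newest first, DESC meaning oldest first); the function scan_cell and its arguments are invented for illustration and do not correspond to Hypertable code.

```python
def scan_cell(versions, time_order="ASC", max_versions=None):
    """Return cell values in scan order for a toy model of TIME_ORDER.

    versions is a list of (timestamp, value) pairs. Following the release
    notes, "ASC" is the default (newest first) and "DESC" is oldest first.
    """
    if time_order not in ("ASC", "DESC"):
        raise ValueError("time_order must be 'ASC' or 'DESC'")
    newest_first = time_order == "ASC"
    ordered = sorted(versions, key=lambda v: v[0], reverse=newest_first)
    if max_versions is not None:
        # Keep only the first max_versions entries in scan order.
        ordered = ordered[:max_versions]
    return [value for _, value in ordered]


history = [(100, "first"), (200, "second"), (300, "third")]
print(scan_cell(history))                          # ['third', 'second', 'first']
print(scan_cell(history, "DESC", max_versions=1))  # ['first']
```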

Unique Values (for a scalable “AUTO_INCREMENT”)

A common usage scenario is to create unique IDs, e.g. for users signing up to a web page or for items being stored in your catalogue. Traditional SQL databases have options like “AUTO_INCREMENT” which automatically assign a new ID to a row whenever you insert one. Usually they use a counter which is then incremented. Such a counter would be inefficient in a distributed environment since it always has to be synchronized with the other nodes. Therefore we came up with a new design, one that scales and does not require synchronization.

The first step is to create a column for the unique values. A unique value can never be overwritten once it has been created, therefore we use TIME_ORDER DESC in combination with MAX_VERSIONS 1 (because we’re not interested in values other than the oldest one).

CREATE TABLE user_profiles (user_ids TIME_ORDER DESC MAX_VERSIONS 1);

The actual insert operation consists of two parts: first insert a unique key; second verify that it was really inserted (to make sure that no other node in the cluster has inserted an identical key in the meantime).

The following HQL tries to create a unique User ID for user “alice”.

INSERT INTO user_profiles VALUES ("alice", "user_ids", "random_unique_id");
SELECT user_ids FROM user_profiles WHERE ROW = "alice";
# now verify that the SELECT returned cell value “random_unique_id”
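
The insert-then-verify pattern can be sketched with an in-memory stand-in for the column. This is a simplified model, not Hypertable client code: the class UniqueColumn and the function claim_unique are hypothetical, and "first write wins" here is decided by arrival order rather than by cell timestamp.

```python
import uuid


class UniqueColumn:
    """Toy stand-in for a TIME_ORDER DESC, MAX_VERSIONS 1 column."""

    def __init__(self):
        self._cells = {}

    def insert(self, row, value):
        # The oldest value is kept; later inserts for the same row are ignored.
        self._cells.setdefault(row, value)

    def select(self, row):
        return self._cells.get(row)


def claim_unique(column, row, candidate):
    """Insert candidate, then read back to check whether this writer won."""
    column.insert(row, candidate)
    return column.select(row) == candidate


column = UniqueColumn()
first = uuid.uuid4().hex
second = uuid.uuid4().hex
print(claim_unique(column, "alice", first))   # True: the first writer wins
print(claim_unique(column, "alice", second))  # False: value already taken
```

A writer that gets False back knows another node created the value first and can either use the stored value or pick a different row key.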

For convenience we added a new HQL function GUID() which creates globally unique IDs and which can be used to create row keys or cell values:

INSERT INTO user_profiles VALUES ("alice", "user_ids", GUID());

For even more convenience we created a new helper library (HyperAppHelper) which can create GUIDs and insert unique values. The functions are declared in HyperAppHelper/Unique.h. These functions are also exported to the Thrift interface. Our PHP microblogging sample, which implements much of Twitter’s functionality, uses this function when new users sign up.

Hypertable 0.9.5.1 released

Hypertable version 0.9.5.1 has been released and is now available for download.

Quick overview:

  • Snappy compression
  • Thrift 0.7.0
  • Stability fixes
  • Over 100 other commits.

Download: http://www.hypertable.com/download/

A Genome Sequence Analysis System Built With Hypertable

Great presentation given by Doug Judd, CEO of Hypertable Inc, during the NoSql NOW conference.
A Genome Sequence Analysis System Built With Hypertable:
http://www.slideshare.net/hypertable/genome-sequencing-with-hypertable

Hypertable 0.9.5.0 released

Hypertable 0.9.5.0 has been released and can be downloaded here.

Changes:

  • Added Asynchronous Mutators
  • Added Load Balancer
  • Fixed RSML write race condition on shutdown
  • Added FLAG_DELETE_CELL_VERSION to allow a specific version of a cell to be deleted.
  • Fixed problem with COUNTER support in MergeScanner causing corrupt keys
  • Fixed CellStoreV5 memory tracking
  • Fixed ApplicationHandler leak on timeout; RangeServer.Scanner.Ttl default to 1800s
  • Fixed Event object leak in Comm layer
  • Improved CellStore memory tracking; Fixed merging compaction scheduling; dropped MIXED prioritization
  • Added CellStore count monitoring graph for RangeServers
  • Added scanner_count graphs to monitoring interface
  • Added heap_size, heap_slack, and tracked_memory RangeServer monitoring graphs
  • Fixed bug in RangeServer where the server would crash while processing some queries.
  • Fixed bug whereby TableMutatorAsync timeout errors were being dropped.
  • [Issue 527] Partially implemented. Thrift API partially incomplete
  • Added paging statistics graphs to RangeServer monitoring page
  • Fixed support for DELETE records in LoadDataSource.cc
  • Fix to unacknowledged move cache patch
  • [issue 637] Fixed: problem with delete cell with timestamp
  • Fixed a couple of minor bugs in metalog_dump tool.
  • Added METADATA SYNC command to rsclient
  • Added assert(timeout!=0) in Comm layer; Added/Improved some log messages
  • Added Hypertable.LoadBalancer.Enable property to allow for disabling balancer
  • Added removal of Range entity from RSML at end of relinquish operation
  • Added support for move compactions; Added COMPACT command to rsclient
  • Added unacknowledged_move cache to Master to avoid repeated move
  • Made changes to LoadBalancer to ignore RangeServers that are not live yet.
  • Reduced Range split size for Balance-Mechanics tests to avoid running out of file handles.
  • Improved bloom filter regression tests
  • [issue 285] ThriftBroker requires a restart after hypertable server processes are restarted
  • Skip over RS_METRICS entries where version != 2
  • Fixed SystemInfo network rx/tx calculation
  • issue 639: Changed sort order to be numeric on RangeServer Monitoring page
  • Fixed race condition in TableMutatorAsync as well as bug in async_api_test.
  • Fixed couple of minor test issues.
  • Modified LoadMetricsRange to delete entries for “old” range after a split.
  • issue 630: use hostname instead of IP on RS page; Improved graph colors
  • Improved merge algorithm by merging long runs of CellStores < TargetSize.Minimum
  • Added complete AccessGroup information to RangeServer::dump
  • issue 636: Fixed bogus RangeServer shutdown error message
  • Eliminated superfluous memory allocations during log replay
  • Added --heapprofile to start-rangeserver.sh
  • Fixed a couple of uninitialized memory references
  • Fixed bug in ScanSpec end_row specification.
  • Changed default value of Hyperspace.LogGc.Interval to 10 mins instead of 1 hr
  • Fixed couple of possibly buggy comparisons in TableMutatorAsync code.
  • Added completed method to ResultCallback so applications can be notified when outstanding async calls are complete.
  • Added “heapcheck” command to RangeServer for dumping heap stats
  • Fixed race condition in TableMutatorAsync code.
  • Added stop_monitoring and start_monitoring cap commands.
  • Removed over-aggressive assert in TableMutatorAsyncDispatchHandler destructor.
  • Fixed a couple of warnings
  • Make Monitoring class map servers by location instead of id
  • If RangeServer.ProxyName equals “*” generate Location name using hostname+port
  • Added balancing mechanics
  • Minor fix to Thrift future calls.
  • Replaced TableMutator code with new mutator code based on TableMutatorAsync

Stability Improvements in the Hypertable 0.9.5.0 pre-release

Details on stability improvements in the Hypertable 0.9.5.0 pre-release have been posted on the blog: http://blog.hypertable.com/

We recently announced the Hypertable 0.9.5.0 pre-release.  Even though we’ve labelled it as a “pre” release, it is one of the biggest and most important Hypertable releases to date.  Among other things, it includes a complete re-write of the Master, to fix some known stability problems.  It represents a significant amount of work as can be seen by the following code change statistics:

  • 512 files changed
  • 30,633 line insertions
  • 14,354 line deletions

The following describes problems that existed in prior releases and how they were solved, and highlights other stability improvements included in the 0.9.5.0 pre-release.

Duplicate range load. In prior releases, when a Range Server decided to give up a range (e.g. after a split), it would inform the master by calling the Master::move_range() method and then record the move in its meta log (RSML).  Unfortunately, this logic contained a race condition.  If the range server called Master::move_range(), but died before it got a chance to record the move in the RSML, and then the Master was stopped (e.g. sysadmin restart of the system), all record of the move was lost.  When the RangeServer came back up, it would re-attempt to move the range, causing it to get loaded by two different range servers.  With the introduction of the Master MetaLog (MML) and a two-phase Master::move_range() operation, this problem has been resolved.
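
The post does not spell out the two-phase protocol, so the following Python sketch is only a toy model of the general idea: the Master persists the move in its meta log before acting on it, which makes a retried move harmless. The class ToyMaster, its fields, and the acknowledge_move method are all invented for illustration and are not Hypertable's implementation.

```python
class ToyMaster:
    """Toy Master whose metalog dict stands in for the persisted MML."""

    def __init__(self, metalog=None):
        # Survives restarts in this model: range id -> chosen destination.
        self.metalog = metalog if metalog is not None else {}
        # Volatile record of load requests issued by this Master instance.
        self.loads_issued = []

    def move_range(self, range_id, destination):
        """Phase 1: record the move, then issue the load exactly once."""
        if range_id not in self.metalog:
            self.metalog[range_id] = destination
            self.loads_issued.append((range_id, destination))
        # A retried call returns the destination that was already chosen.
        return self.metalog[range_id]

    def acknowledge_move(self, range_id):
        """Phase 2: the range server has logged the move, so forget it."""
        self.metalog.pop(range_id, None)


master = ToyMaster()
master.move_range("range-1", "rs2")

# The range server dies before logging the move and the Master restarts.
restarted = ToyMaster(metalog=master.metalog)
print(restarted.move_range("range-1", "rs3"))  # rs2: no second load issued
print(restarted.loads_issued)                  # []
restarted.acknowledge_move("range-1")
```

In this model the retried move is answered from the log instead of triggering a second load, which is the property the duplicate range load fix needs.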

Overlapping ranges. In prior releases, the Master would ask a range server to load a range by calling the RangeServer::load_range() method and would rely on the ALREADY_LOADED response code to handle situations where the acknowledgement was lost (e.g. range server or master died at an inopportune moment) and the RangeServer::load_range() call was re-issued.  This logic also contained a race condition.  When a range was loaded and the acknowledgement was lost, the loaded range could split before the Master re-attempted to load the range.  When the RangeServer::load_range() call was re-issued, the RangeServer happily loaded the range because it no longer contained the range in its live set (due to the split).  With the introduction of a two-phase load range operation, this problem has been resolved.

Lost updates (multiple access groups). This was a bug in how the system decided to remove commit log fragments for tables with multiple access groups.  The system computes an “earliest cached revision” value for each access group and only removes commit log fragments that contain cells whose revision number is less than that value.  In prior releases, there was a bug where the earliest cached revision of the last access group was taken for all the access groups.  In certain situations, this caused commit log fragments to be removed prematurely which resulted in data loss on system restart.  This bug has been fixed in the current release.
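
The effect of the lost-updates bug can be shown with a few numbers. The Python sketch below assumes that a fragment is safe to remove only when its newest revision is below the earliest cached revision of every access group, i.e. below the minimum across groups. That reading, the function names, and the sample data are assumptions made for illustration, not taken from the Hypertable source.

```python
def removable_fragments(fragment_max_revision, earliest_cached_revisions):
    """Fragments whose newest revision is below every group's earliest cached one."""
    threshold = min(earliest_cached_revisions)
    return [name for name, rev in fragment_max_revision.items() if rev < threshold]


def removable_fragments_buggy(fragment_max_revision, earliest_cached_revisions):
    """Models the old behavior: only the last access group's value is used."""
    threshold = earliest_cached_revisions[-1]
    return [name for name, rev in fragment_max_revision.items() if rev < threshold]


# Newest revision stored in each commit log fragment (made-up values).
fragments = {"frag0": 90, "frag1": 150, "frag2": 260}
# Earliest cached revision per access group; the last group is further ahead.
earliest = [100, 250]

print(removable_fragments(fragments, earliest))        # ['frag0']
print(removable_fragments_buggy(fragments, earliest))  # ['frag0', 'frag1']
```

In the buggy variant frag1 is removed even though the first access group may still hold revisions 100 and later only in memory, which matches the data loss on restart described above.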

Transparent Master Failover. In prior releases, if a client issued a request to the Master and the Master failed before delivering the results, the request would fail.  With the introduction of the Master MetaLog and a two-phase request sequence, a Master failover can occur mid-request and the request will complete successfully on the new Master, completely transparent to the requesting client.

Cloudera’s CDH3 Hadoop Release. A big source of stability issues in the past has been HDFS.  The well-known “sync” issue has been the biggest trouble for Hypertable, causing critical log files to effectively disappear, resulting in data loss or, worse, leaving the system in an inconsistent and inoperable state.  The CDH3 Hadoop release from Cloudera includes a number of patches to the 0.20.2 Apache release that appear to have solved the sync problem.  We tested it with Hypertable through the beta period and have found it to be stable.  The current Hypertable release is built against CDH3 and we recommend it for all Hypertable deployments.

Stability is our #1 priority followed closely by performance and scalability.  Hypertable has been in development since early 2007 and the feedback we’ve gotten over the years from our production deployments and open source community has helped Hypertable to stabilize and become a much more mature product.  We’re aggressively working towards the 1.0 release and we look forward to seeing Hypertable become the infrastructure of choice for solving big data problems.

Hypertable 0.9.5.0 "pre4" released

Version 0.9.5.0.pre4:
  • [Issue 617] Fixed superfluous RangeServer::create_scanner() request when scanning over exact range
  • Removed benchmark tasks from Capfiles, added Capfile.benchmark
  • Added support for ScanSpec in Hadoop streaming mapred
  • Added hostname to AsyncComm proxy map
  • issue 528: Allow embedded semicolons in HQL string constants
  • Added optional support for jemalloc
  • Fixed bug in scanner which causes assert when a scanner hits row limit and receives results from a cancelled scanner
  • Missed checkin of auto-generated java code
  • Fixed CellStore GC regression; Added test
  • API Change: Made Table objects auto-refresh schema by default
  • Added more info to RS_METRICS (including disk_bytes_read)
  • Support for optionally including master location hash into RangeServer proxies
  • Added support for IP addresses to be used as Hyperspace.Replication.Host strings.
  • Added is_ipv4() method to InetAddr to test if a string is an ip addr in the n.n.n.n format.
  • Fixed MergeScanner bug which failed to use MAX_VERSIONS filter correctly when value regex was used.
  • Added '' escaping/unescaping to SELECT, DUMP TABLE, LOAD DATA INFILE
  • Removed trailing slash on exclude hyperspace in Capfile rsync commands
  • Added Hyperspace.Client.Datagram.SendPort property (default=0)
  • Fixed bug in scanner timeout logic. Timers were not being reset correctly in between calls.
  • Removing dependency on BerkeleyDb libraries in Hyperspace client library.
  • Fixed Capfile install/upgrade tasks to work with standalone installs
  • Fix for rounding error in LOAD DATA INFILE timestamp parsing
  • Fix for KEYS_ONLY scan
  • Fixed bug in MergeScanner logic.

http://www.hypertable.com/download/

Hypertable 0.9.5.0 "pre3" patch released

The Hypertable 0.9.5.0 “pre3” patch has been released and is available for download.

Upgrading is recommended, as a performance regression under low-memory conditions made it into “pre2”.

Download

Hypertable website

Hypertable 0.9.5.0 pre-release

Hypertable 0.9.5.0 pre-release is a big one: 512 files changed, 30,633 insertions, 14,354 deletions

Main changes:

  • Major stability fixes
  • CDH3B4 integration
  • Better monitoring UI
  • Async query API

Click here to download