Hypertable 0.9.4.3 has been released and brings a popular feature request: regular-expression-based filtering.
Queries can now filter cells by regular expression matches on the row key, column qualifier, and value.
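To make the three filter targets concrete, here is a minimal sketch of the idea in Python. It is not Hypertable's API or query syntax: the `Cell` type, the `filter_cells` function and the sample data are hypothetical names made up for illustration, and Python's built-in `re` module stands in for RE2 (it is a backtracking engine and does not offer RE2's linear-time guarantee). The sketch also assumes an unanchored, "match anywhere" search and that all supplied patterns must match.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for a table cell."""
    row: str
    qualifier: str
    value: str


def filter_cells(
    cells: Iterable[Cell],
    row_regex: Optional[str] = None,
    qualifier_regex: Optional[str] = None,
    value_regex: Optional[str] = None,
) -> List[Cell]:
    """Keep cells whose row key, qualifier and value match the given patterns.

    A pattern left as None places no constraint on that field.
    """
    # Compile each supplied pattern once, up front.
    patterns = [
        re.compile(p) if p is not None else None
        for p in (row_regex, qualifier_regex, value_regex)
    ]
    kept = []
    for cell in cells:
        fields = (cell.row, cell.qualifier, cell.value)
        # Assumption: unanchored search, and every supplied pattern must match.
        if all(p is None or p.search(f) for p, f in zip(patterns, fields)):
            kept.append(cell)
    return kept


cells = [
    Cell("com.example/index.html", "title", "Example Home"),
    Cell("com.example/about.html", "title", "About Us"),
    Cell("org.sample/index.html", "lang", "en"),
]

# Rows under com.example/ whose value ends in "Home".
matches = filter_cells(cells, row_regex=r"^com\.example/", value_regex=r"Home$")
```

With the sample data above, only the `com.example/index.html` cell satisfies both patterns.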
The implementation is built on Google’s RE2 regular expression engine. There are a number of excellent regular expression engines to choose from, but RE2 fits Hypertable’s high-performance philosophy, and the fact that it powers Bigtable, Sawzall and a host of other Google projects made it a great candidate.
Reasons for picking RE2:
- RE2 supports most Perl/PCRE syntax, which allows us to provide feature-rich regular expression matching.
- It is fast, with run time guaranteed to be linear in the size of the input, so we don’t have to worry about ad-hoc queries hogging too much CPU. In our tests on a 110MB dataset of about 4.5M unique URLs, RE2 was 3X-50X faster than java.util.regex.Pattern.
- It uses a small, fixed amount of memory so no query can bring down the server with unbounded memory usage.
- Together, these properties let Hypertable deliver powerful pattern matching at a low hardware cost.
- The fact that RE2 powers Bigtable, Sawzall and other critical Google code speaks for itself.
Download 0.9.4.3 release here
Full announcement on Hypertable’s blog