MongoDB 2.4 rc0 has been released

MongoDB 2.4 rc0 is now available for testing.

This release includes text indexes, new and improved geospatial capabilities, a new V8 JavaScript engine and hashed shard keys. This is a development release and is not meant for production use.

Full change list:

Text Indexes

The text index type is currently an experimental feature. To use a text index, you must enable it at runtime or at startup.

Background

MongoDB 2.3.2 includes a new text index type. text indexes support boolean text search queries:

  • Any set of fields containing string data may be text indexed.
  • You may only maintain a single text index per collection.
  • text indexes are fully consistent and updated in real-time as applications insert, update, or delete documents from the database.
  • The text index and query system supports language specific stemming and stop words. Additionally:
    • Indexes and queries drop stop words (i.e. “the,” “an,” “a,” “and,” etc.)
    • MongoDB stores words stemmed during insertion, using simple suffix stemming, and includes support for a number of languages. MongoDB automatically stems text queries before beginning the query.

However, text indexes have large storage requirements and incur significant performance costs:

  • Text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
  • Building a text index is very similar to building a large multi-key index, and therefore may take longer than building a simple ordered (scalar) index.
  • text indexes will impede insertion throughput, because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
  • Some text searches may affect performance on your mongod, particularly for negation queries and phrase matches that cannot use the index as effectively as other kinds of queries.

Additionally, the current experimental implementation of text indexes has the following limitations and behaviors:

  • text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
  • MongoDB does not stem phrases or negations in text queries.
  • The index is case-insensitive.
  • A collection may only have a single text index at a time.

Warning

Do not enable or use text indexes on production systems.

Test text Indexes

The text index type is an experimental feature and you need to enable the feature before creating or accessing a text index.

To enable text indexes, issue the following command in the mongo shell:

Warning

Do not enable or use text indexes on production systems.

db.adminCommand( { setParameter: 1, textSearchEnabled: true } )

You can also start the mongod with the following invocation:

mongod --setParameter textSearchEnabled=true
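
To confirm that the parameter took effect, you can read it back with getParameter, the standard counterpart of setParameter (a minimal sketch):

db.adminCommand( { getParameter: 1, textSearchEnabled: 1 } )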

Create Text Indexes

To create a text index, use the following syntax of ensureIndex():

db.collection.ensureIndex( { <field>: "text" } )

Consider the following example:

db.collection.ensureIndex( { content: "text" } )

This text index catalogs all string data in the content field where the content field contains a string or an array of string elements. To index fields in sub-documents, you need to specify the individual fields from the sub-documents using the dot notation. A text index can include multiple fields, as in the following:

db.collection.ensureIndex( { content: "text",
                             "users.comments": "text",
                             "users.profiles": "text" } )

The default name for the index consists of the <field name> concatenated with _text for the indexed fields, as in the following:

"content_text_users.comments_text_users.profiles_text"

These indexes may run into the Index Name Length limit. To avoid creating an index with a too-long name, you can specify a name in the options parameter, as in the following:

db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text" },
                           { name: "TextIndex" } )

When creating text indexes you may specify weights for specific fields. Weights are factored into the relevance score for each document. The score for a given word in a document is the weighted sum of that word's frequency in each of the indexed fields of that document. Consider the following:

db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text" },
                           { name: "TextIndex",
                             weights: { content: 1,
                                        "users.profiles": 2 } } )

This example creates a text index on the top-level field named content and the profiles field in the users sub-documents. Furthermore, the content field has a weight of 1 and the users.profiles field has a weight of 2.

You can add one or more conventional ascending or descending index fields as a prefix or suffix of the text index. You cannot include a multi-key index field or a geospatial index field.

If you create an ascending or descending index as a prefix of a text index:

  • MongoDB will only index documents that have the prefix field (i.e. username) and
  • The text query can limit the number of index entries to review in order to perform the query.
  • All text queries using this index must include the filter option that specifies an equality condition for the prefix field or fields.

Create this index with the following operation:

db.collection.ensureIndex( { username: 1,
                             "users.profiles": "text" } )
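
With this index, every text query must pin down the prefix field with an equality condition. A sketch of such a query, assuming documents carry a username field (the value "alice" is illustrative):

db.collection.runCommand( "text", { search: "coffee",
                                    filter: { username: "alice" } } )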

Alternatively, you can create an ascending or descending index as a suffix to a text index. Then the text index can support covered queries if the text command specifies a project option.

Create this index with the following operation:

db.collection.ensureIndex( { "users.profiles": "text",
                             username: 1 } )
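
A sketch of a query this index could cover, assuming the projection returns only the indexed username field:

db.collection.runCommand( "text", { search: "coffee",
                                    project: { username: 1, _id: 0 } } )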

Finally, you may use the special wildcard field specifier (i.e. $**) to specify index weights and fields. Consider the following example that indexes any string value in every field of every document in a collection and names the index TextIndex:

db.collection.ensureIndex( { "$**": "text",
                             username: 1 },
                           { name: "TextIndex" } )

By default, an index field has a weight of 1. You may specify weights for a text index with compound fields, as in the following:

db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text",
                             comments: "text",
                             keywords: "text",
                             about: "text" },
                           { name: "TextIndex",
                             weights:
                               { content: 10,
                                 "users.profiles": 2,
                                 keywords: 5,
                                 about: 5 } } )

This index, named TextIndex, includes a number of fields, with the following weights:

  • content field that has a weight of 10,
  • users.profiles that has a weight of 2,
  • comments that has a weight of 1,
  • keywords that has a weight of 5, and
  • about that has a weight of 5.

This means that documents matching words in the content field will rank higher in the result set than matches on any other indexed field, and that matches on the users.profiles and comments fields are less likely to surface in responses than matches on the keywords and about fields.

Note

You must drop a text index using the name specified when you created the index. Alternatively, if you did not specify a name when creating the index, you can find the name using db.collection.getIndexes().
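
For example, to look up the index name and then drop the index from the earlier examples (a minimal sketch):

db.collection.getIndexes()                // lists each index with its name
db.collection.dropIndex( "TextIndex" )    // drop the text index by name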

Text Queries

MongoDB 2.3.2 introduces the text command to provide query support for text indexes. Unlike normal MongoDB queries, text returns a document rather than a cursor.

text

The text command provides an interface to search text content stored in the text index. Consider the following prototype:

db.collection.runCommand( "text", { search: <string>,
                                    filter: <document>,
                                    project: <document>,
                                    limit: <number>,
                                    language: <string> } )

The text command has the following parameters:

Parameters:

  • search (string) – A text string that MongoDB stems and uses to query the text index. In the mongo shell, to specify a phrase to match, you can either:
    • enclose the phrase in escaped double quotes and use double quotes to specify the search string, as in "\"coffee table\"", or
    • enclose the phrase in double quotes and use single quotes to specify the search string, as in '"coffee table"'
  • filter (document) – Optional. A query document to further limit the results of the query using another database field. You can use any valid MongoDB query in the filter document, except if the index includes an ascending or descending index field as a prefix. In that case, the filter is required and the filter query must be an equality match.
  • project (document) – Optional. Allows you to limit the fields returned by the query to only those specified.
  • limit (number) – Optional. Specify the maximum number of documents to include in the response. The text command sorts the results before applying the limit. The default limit is 100.
  • language (string) – Optional. Specify the language that determines the tokenization, stemming, and the stop words for the search. The default language is english.

Returns: text returns results, in descending order by score, in the form of a document. Results must fit within the BSON Document Size limit. Use the limit and the project parameters to limit the size of the result set.

The implicit connector between the terms of a multi-term search is a disjunction (OR). A search for “first second” searches for “first” or “second”. The scoring system will prefer documents that contain all terms.

However, consider the following behaviors of text queries:

  • With phrases (i.e. terms enclosed in escaped quotes), the search performs an AND with any other terms in the search string; e.g. a search for “”twinkle twinkle” little star” searches for “twinkle twinkle” and (“little” or “star”).
  • text adds all negations to the query with the logical AND operator.

Example

Consider the following examples of text queries. All examples assume that you have a text index on the field named content in a collection named collection.

  1. Create a text index on the content field to enable text search on the field:

     db.collection.ensureIndex( { content: "text" } )

  2. Search for a single word coffee:

     db.collection.runCommand( "text", { search: "coffee" } )

     This query returns documents that contain the word coffee, case-insensitive, in the content field.

  3. Search for multiple words, bake or coffee or cake:

     db.collection.runCommand( "text", { search: "bake coffee cake" } )

     This query returns documents that contain either bake or coffee or cake in the content field.

  4. Search for the exact phrase bake coffee cake:

     db.collection.runCommand( "text", { search: "\"bake coffee cake\"" } )

     This query returns documents that contain the exact phrase bake coffee cake.

  5. Search for documents that contain the words bake or coffee, but not cake:

     db.collection.runCommand( "text", { search: "bake coffee -cake" } )

     Use - as a prefix to a term to specify negation in the search string. The query returns documents that contain either bake or coffee, but not cake, all case-insensitive, in the content field. Prefixing a word with a hyphen (-) negates it:

       • The negated word filters out documents from the result set, after selecting documents.
       • A <search string> that only contains negated words returns no match.
       • A hyphenated word, such as case-insensitive, is not a negation. The text command treats the hyphen as a delimiter.

  6. Search for a single word coffee with an additional filter on the about field, but limit the results to 2 documents with the highest score and return only the comments field in the matching documents:

     db.collection.runCommand( "text", {
                                         search: "coffee",
                                         filter: { about: /desserts/ },
                                         limit: 2,
                                         project: { comments: 1, _id: 0 }
                                       }
                             )

  • The filter query document may use any of the available query operators.
  • Because the _id field is implicitly included, in order to return only the comments field, you must explicitly exclude (0) the _id field. Within the project document, you cannot mix inclusions (i.e. <fieldA>: 1) and exclusions (i.e. <fieldB>: 0), except for the _id field.

New Modular Authentication System with Support for Kerberos

Note

These features are only present in the MongoDB Subscriber Edition. To download the 2.4.0 release candidate of the Subscriber Edition, use the following resources:

An improved authentication system is a core focus of the entire 2.3 cycle. As of 2.3.2, the following components of the new authentication system are available for use in MongoDB:

  • mongod instances can authenticate users via Kerberos.
  • the mongo shell can authenticate to mongod instances using Kerberos.
  • MongoDB Clients can authenticate using Kerberos with the C++ client library and development versions of the Java and C# drivers.

Initial Support for Kerberos Authentication

Development work on this functionality is ongoing, and additional related functionality is forthcoming. To use Kerberos with MongoDB as of the 2.4.0 release candidate, consider the following requirements:

  • Add users to MongoDB as with the existing authentication mechanism:
    • Usernames must correspond to the Kerberos principal (e.g. <username>@<REALM>, as in mongodbuser@EXAMPLE.COM).
    • You must have a user document in the system.users collection with the Kerberos principal for any database to which you want to grant access.
  • Every mongod using Kerberos must have a fully resolvable, fully qualified domain name. This includes all members of replica sets.
  • Every mongod using Kerberos must have a Kerberos service principal, in the form mongodb/<fqdn>@<REALM>.
  • Each system running a mongod with Kerberos must have a keytab file, readable by the mongod, that holds key data granting access to its principal.

Starting mongod with Kerberos

To start mongod with Kerberos support (i.e. with the GSSAPI authentication mechanism), you must have a working Kerberos environment and a valid Kerberos keytab file.

To start mongod, use a command in the following form:

env KRB5_KTNAME=<path to keytab file> <mongod invocation>

You must start mongod with auth or keyFile [1] and configure the list of mechanisms in the authenticationMechanisms parameter. An actual command would resemble:

env KRB5_KTNAME=/opt/etc/mongodb.keytab \
    /opt/bin/mongod --dbpath /opt/data/db --logpath /opt/log/mongod.log --fork \
    --auth --setParameter authenticationMechanisms=GSSAPI

Replace the paths as needed for your test deployment.

If you want to enable both Kerberos and the legacy challenge-and-response authentication mechanism, append MONGO-CR to the authenticationMechanisms parameter. Consider the following example:

env KRB5_KTNAME=/opt/etc/mongodb.keytab \
    /opt/bin/mongod --dbpath /opt/data/db --logpath /opt/log/mongod.log --fork \
    --auth --setParameter authenticationMechanisms=GSSAPI,MONGO-CR

Note

If you’re having trouble getting mongod to start with Kerberos, there are a number of Kerberos-specific issues that can prevent successful authentication. As you begin troubleshooting your Kerberos deployment, ensure that:

  • You have a valid keytab file specified in the environment running the mongod.
  • The mongod is from the MongoDB Subscriber Edition.
  • DNS allows the mongod to resolve the components of the Kerberos infrastructure.
  • The system clocks of the machines running the mongod instances and the Kerberos infrastructure are synchronized.

Until you can successfully authenticate a client using Kerberos, you may want to enable the MONGO-CR authentication mechanism to provide access to the mongod instance during configuration.

[1] keyFile implies auth. You must use keyFile for replica sets.

Connecting and Authenticating MongoDB Clients Using Kerberos

To use Kerberos with the mongo shell, begin by initializing a Kerberos session with kinit. Then start a 2.3.2 mongo shell instance, and use the following sequence of operations to associate the current connection with the Kerberos session:

use $external

db.auth( { mechanism: "GSSAPI", user: "<username>@<REALM>" } )

The value of the user field must be the same principal that you initialized with kinit. This connection will acquire access in accordance with all privileges granted to this user for all databases.

See

MongoDB Security Practices and Procedures.

Default JavaScript Engine Switched to V8 from SpiderMonkey

The default JavaScript engine used throughout MongoDB (for the mongo shell, mapReduce, $where, and eval) is now V8.

serverBuildInfo.interpreterVersion

The interpreterVersion field of the document output by db.serverBuildInfo() in the mongo shell reports which JavaScript interpreter the mongod instance is running.

interpreterVersion()

The interpreterVersion() method in the mongo shell reports which JavaScript interpreter this mongo shell uses.
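
A quick way to check both from the mongo shell (a minimal sketch; the exact version strings vary by build):

db.serverBuildInfo().interpreterVersion   // interpreter used by the connected mongod
interpreterVersion()                      // interpreter used by this mongo shell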

New Geospatial Indexes with GeoJSON and Improved Spherical Geometry

Note

In 2.3.2, the index type for Spherical Geospatial Indexes became 2dsphere.

The 2.3 series adds a new type of geospatial index that supports improved spherical queries and GeoJSON. Create the index by specifying 2dsphere as the value of the field in the index specification, as in any of the following:

db.collection.ensureIndex( { geo: "2dsphere" } )

db.collection.ensureIndex( { type: 1, geo: "2dsphere" } )

db.collection.ensureIndex( { geo: "2dsphere", type: 1 } )

In the first example, you create a spherical geospatial index on the field named geo. In the second example, you create a compound index where the first field is a normal index and the second field is a spherical geospatial index. Unlike 2d indexes, fields indexed using the 2dsphere type do not have to be the first field in a compound index.

At the moment, you must store data in the fields indexed using the 2dsphere index using the GeoJSON specification. Support for storing points in the form used by the existing 2d (i.e. geospatial) indexes is forthcoming. Currently, 2dsphere indexes only support the following GeoJSON shapes:

  • Point, as in the following:

    { "type": "Point", "coordinates": [ 40, 5 ] }

  • LineString, as in the following:

    { "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }

  • Polygon, as in the following:

    {
      "type": "Polygon",
      "coordinates": [ [ [ 40, 5 ], [ 40, 6 ], [ 41, 6 ], [ 41, 5 ], [ 40, 5 ] ] ]
    }
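
For example, a document that the first ensureIndex() invocation above could index might be inserted as follows (a minimal sketch; the geo field name matches the earlier examples):

db.collection.insert( { geo: { "type": "Point", "coordinates": [ 40, 5 ] } } )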

To query 2dsphere indexes, use all current geospatial query operators plus the new $geoIntersects operator. Currently, all queries using the 2dsphere index must pass the query selector (e.g. $near, $geoIntersects) a GeoJSON document. With the exception of the GeoJSON requirement, the operation of $near is the same for 2dsphere indexes as for 2d indexes.

$geoIntersects

$geoIntersects selects all indexed geometries that intersect with the provided geometry (i.e. Point, LineString, and Polygon). You must pass $geoIntersects a document in GeoJSON format.

db.collection.find( { geo: { $geoIntersects: { $geometry: { "type": "Point", "coordinates": [ 40, 5 ] } } } } )

This query will select all indexed objects that intersect with the Point with the coordinates [ 40, 5 ]. MongoDB will return documents as intersecting if they have a shared edge.

The $geometry operator takes a single GeoJSON document.
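
As a sketch of the $near form against a 2dsphere index, assuming the geo field from the examples above and a distance in meters:

db.collection.find( { geo: { $near: { $geometry: { "type": "Point", "coordinates": [ 40, 5 ] },
                                      $maxDistance: 500 } } } )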


mongod Automatically Continues in Progress Index Builds Following Restart

If your mongod instance was building an index when it shut down or terminated, mongod will now continue building the index when the mongod restarts. Previously, the index build had to finish before mongod shut down.

To disable this behavior, the 2.3 series adds a new runtime option for mongod, noIndexBuildRetry (or --noIndexBuildRetry on the command line). noIndexBuildRetry prevents mongod from continuing to build indexes that were not finished when the mongod last shut down.

noIndexBuildRetry

By default, mongod will attempt to rebuild indexes upon start-up if mongod shut down or stopped in the middle of an index build. When enabled, this runtime option prevents that behavior.
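
For example, a start-up invocation with the retry behavior disabled might resemble the following (the dbpath is illustrative):

mongod --dbpath /opt/data/db --noIndexBuildRetry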

New Hashed Index and Sharding with a Hashed Shard Key

To support an easy-to-configure and evenly distributed shard key, version 2.3 adds a new “hashed” index type that indexes based on hashed values. This section introduces and documents both the new index type and its use in sharding:

Hashed Index

The new hashed index exists primarily to support automatically hashed shard keys. Consider the following properties of hashed indexes:

  • Hashed indexes must only have a single field, and cannot be compound indexes.
  • Fields indexed with hashed indexes must not hold arrays. Hashed indexes cannot be multikey indexes.
  • Hashed indexes cannot have a unique constraint. You may, however, create hashed indexes with the sparse property.
  • MongoDB can use the hashed index to support equality queries, but cannot use these indexes for range queries.
  • Hashed indexes offer no performance advantage over normal indexes. However, hashed indexes may be smaller than a normal index when the values of the indexed field are larger than 64 bits. [2]
  • It is possible to have a hashed and a non-hashed index on the same field: MongoDB will use the non-hashed index for range queries (see the sketch after this list).
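
A minimal sketch of that last point, assuming a records collection with a field a:

db.records.ensureIndex( { a: "hashed" } )
db.records.ensureIndex( { a: 1 } )
db.records.find( { a: "abc" } )             // equality query; can use the hashed index
db.records.find( { a: { $gt: "abc" } } )    // range query; uses the non-hashed index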

Warning

Hashed indexes round floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3 and a field that held a value of 2.2. To prevent collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit integers (and then back to floating point). Hashed indexes do not support floating point values larger than 2^53.

Create a hashed index using an operation that resembles the following:

db.records.ensureIndex( { a: "hashed" } )

This operation creates a hashed index for the records collection on the a field.

[2] The hash stored in the hashed index is 64 bits long.

Hashed Sharding

To shard a collection using a hashed shard key, issue an operation in the mongo shell that resembles the following:

sh.shardCollection( "records.active", { a: "hashed" } )

This operation shards the active collection in the records database, using a hash of the a field as the shard key. Consider the following properties when using a hashed shard key:

  • As with other kinds of shard key indexes, if your collection has data, you must create the hashed index before sharding. If your collection does not have data, sharding the collection will create the appropriate index.
  • The mongos will route all equality queries to a specific shard or set of shards; however, the mongos must route range queries to all shards.
  • When using a hashed shard key on a new collection, MongoDB automatically pre-splits the range of 64-bit hash values into chunks. By default, the initial number of chunks is equal to twice the number of shards at creation time. You can change the number of chunks created, using the numInitialChunks option, as in the following invocation of shardCollection:
    db.adminCommand( { shardCollection: "test.collection",
                       key: { a: "hashed" },
                       numInitialChunks: 2001 } )

MongoDB will only pre-split chunks in a collection when sharding empty collections. MongoDB will not create chunk splits when sharding collections that already contain data.

Warning

Avoid using hashed shard keys when the hashed field has non-integral floating point values; see hashed indexes for more information.

MoSQL live-replicating from MongoDB to PostgreSQL

MoSQL is a tool Stripe developed for live-replicating data from a MongoDB database into a PostgreSQL database. With MoSQL, you can run applications against a MongoDB database, but also maintain a live-updated mirror of your data in PostgreSQL, ready for querying with the full power of SQL.

Source: https://stripe.com/blog/announcing-mosql

Motivation

Here at Stripe, we use a number of different database technologies for both internal- and external-facing services. Over time, we’ve found ourselves with growing amounts of data in MongoDB that we would like to be able to analyze using SQL. MongoDB is great for a lot of reasons, but it’s hard to beat SQL for easy ad-hoc data aggregation and analysis, especially since virtually every developer or analyst already knows it.

An obvious solution is to periodically dump your MongoDB database and re-import into PostgreSQL, perhaps using mongoexport. We experimented with this approach, but found ourselves frustrated with the ever-growing time it took to do a full refresh. Even if most of your analyses can tolerate a day or two of delay, occasionally you want to ask ad-hoc questions about “what happened last night?”, and it’s frustrating to have to wait on a huge dump/load refresh to do that. In response, we built MoSQL, enabling us to keep a real-time SQL mirror of our Mongo data.

MoSQL does an initial import of your MongoDB collections into a PostgreSQL database, and then continues running, applying any changes to the MongoDB server in near-real-time to the PostgreSQL mirror. The replication works by tailing the MongoDB oplog, in essentially the same way Mongo’s own replication works.
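
The oplog is an ordinary capped collection in the local database, so you can peek at the stream an oplog tailer like MoSQL consumes directly from the mongo shell (a minimal sketch, assuming a replica set):

use local
db.oplog.rs.find().sort( { $natural: -1 } ).limit( 1 )   // the most recent replicated operation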

Usage

MoSQL can be installed like any other gem:

$ gem install mosql

To use MoSQL, you’ll need to create a collection map which maps your MongoDB objects to a SQL schema. We’ll use the collection from the MongoDB tutorial as an example. A possible collection map for that collection would look like:

mydb:
  things:
    :columns:
      - _id: TEXT
      - x: INTEGER
      - j: INTEGER
    :meta:
      :table: things
      :extra_props: true

Save that file as collections.yaml, start a local mongod and postgres, and run:

$ mosql --collections collections.yaml

Now, run through the MongoDB tutorial, and then open a psql shell. You’ll find all your Mongo data now available in SQL form:

postgres=# select * from things limit 5;
           _id            | x | j |   _extra_props
--------------------------+---+---+------------------
 50f445b65c46a32ca8c84a5d |   |   | {"name":"mongo"}
 50f445df5c46a32ca8c84a5e | 3 |   | {}
 50f445e75c46a32ca8c84a5f | 4 | 1 | {}
 50f445e75c46a32ca8c84a60 | 4 | 2 | {}
 50f445e75c46a32ca8c84a61 | 4 | 3 | {}
(5 rows)

mosql will continue running, syncing any further changes you make into Postgres.

For more documentation and usage information, see the README.

mongoriver

MoSQL comes from a general philosophy of preferring real-time, continuously-updating solutions to periodic batch jobs.

MoSQL is built on top of mongoriver, a general library for MongoDB oplog tailing that we developed. Along with the MoSQL release, we have also released mongoriver as open source today. If you find yourself wanting to write your own MongoDB tailer, to monitor updates to your data in near-real-time, check it out.

MongoDB 2.2.3 has been released

MongoDB 2.2.3 has been released and is available for Downloads: http://www.mongodb.org/downloads

Main changes:

  • Several fixes related to getLastError on sharded clusters
  • Support for $within without a geo-index
  • Stability and performance enhancements for mongos

Cassandra performance review

Original article available here

Four years ago, well before starting DataStax, I evaluated the then-current crop of distributed databases and explained why I chose Cassandra. In a lot of ways, Cassandra was the least mature of the options, but I chose to take a long view and wanted to work on a project that got the fundamentals right; things like documentation and distributed tests could come later.

2012 saw that validated in a big way, as the most comprehensive NoSQL benchmark to date was published at the VLDB conference by researchers at the University of Toronto. They concluded,

In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from 1 to 12 nodes.

As a sample, here are the throughput results from the mixed reads, writes, and (sequential) scans (the chart is in the original article).

I encourage you to take a few minutes to skim the full results.

There are both architectural and implementation reasons for Cassandra’s dominating performance here. Let’s get down into the weeds and see what those are.

Architecture

Cassandra incorporates a number of architectural best practices that affect performance. None are unique to Cassandra, but Cassandra is the only NoSQL system that incorporates all of them.

Fully distributed: Every Cassandra machine handles a proportionate share of every activity in the system. There are no special cases like the HDFS namenode or MongoDB mongos that require special treatment or special hardware to avoid becoming a bottleneck. And with every node the same, Cassandra is far simpler to install and operate, which has long-term implications for troubleshooting.

Log-structured storage engine: A log-structured engine that avoids overwrites to turn updates into sequential i/o is essential both on hard disks (HDD) and solid-state disks (SSD). On HDD, because the seek penalty is so high; on SSD, to avoid write amplification and disk failure. This is why you see MongoDB performance go through the floor as the dataset size exceeds RAM.

Tight integration with its storage engine: Voldemort and Riak support pluggable storage engines, which both limits them to a lowest-common-denominator of key/value pairs, and limits the optimizations that can be done with the distributed replication engine.

Locally-managed storage: HBase has an integrated, log-structured storage engine, but relies on HDFS for replication instead of managing storage locally. This means HBase is architecturally incapable of supporting Cassandra-style optimizations like putting the commitlog on a separate disk, or mixing SSD and HDD in a single cluster with appropriate data pinned to each.

Implementation

An architecture is only as good as its implementation. For the first years after Cassandra’s open-sourcing as an Apache project, every release was a learning experience. 0.3, 0.4, 0.5, 0.6, each attracted a new wave of users that exposed some previously unimportant weakness. Today, we estimate there are over a thousand production deployments of Cassandra, the most for any scalable database. Some are listed here. To paraphrase ESR, “With enough eyes, all performance problems are obvious.”

What are some implementation details relevant to performance? Let’s have a look at some of the options.

MongoDB

MongoDB can be a great alternative to MySQL, but it’s not really appropriate for the scale-out applications targeted by Cassandra. Still, as early members of the NoSQL category, the two do draw comparisons.

One important limitation in MongoDB is database-level locking. That is, only one writer may modify a given database at a time. Support for collection-level (a set of documents, analogous to a relational table) locking is planned. With either database- or collection-level locking, other writers or readers are locked out. Even a small number of writes can produce stalls in read performance.

Cassandra uses advanced concurrent structures to provide row-level isolation without locking. Cassandra even eliminated the need for row-level locks for index updates in the recent 1.2 release.

A more subtle MongoDB limitation is that when adding or updating a field in a document, the entire document must be re-written. If you pre-allocate space for each document, you can avoid the associated fragmentation, but even with pre-allocation updating your document gets slower as it grows.
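
A common mitigation, sketched in the mongo shell (collection and field names are illustrative): insert the document with a throwaway filler field of roughly its eventual size, then unset the filler so later growth reuses the reserved space.

db.posts.insert( { _id: 1, title: "hello", filler: new Array(1025).join("x") } )   // reserve ~1 KB
db.posts.update( { _id: 1 }, { $unset: { filler: 1 } } )                           // record keeps its size on disk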

Cassandra’s storage engine only appends updated data, it never has to re-write or re-read existing data. Thus, updates to a Cassandra row or partition stay fast as your dataset grows.

Riak

Riak presents a document-based data model to the end user, but under the hood it maps everything to a key/value storage API. Thus, like MongoDB, updating any field in a document requires rewriting the whole thing.

However, Riak does emphasize the use of log-structured storage engines. Both the default BitCask backend and LevelDB are log-structured. Riak increasingly emphasizes LevelDB since BitCask does not support scan operations (which are required for indexes), but this brings its own set of problems.

LevelDB is a log-structured storage engine with a different approach to compaction than the one introduced by Bigtable. LevelDB trades more compaction i/o for less i/o at read time, which can be a good tradeoff for many workloads, but not all. Cassandra added support for leveldb-style compaction about a year ago.

LevelDB itself is designed to be an embedded database for the likes of Chrome, and clear growing pains are evident when pressed into service as a multi-user backend for Riak. (A LevelDB configuration for Voldemort also exists.) Basho cites “one stall every 2 hours for 10 to 30 seconds”, “cases that can still cause [compaction] infinite loops,” and no way to create snapshots or backups as of the recently released Riak 1.2.

HBase

HBase’s storage engine is the most similar to Cassandra’s; both drew on Bigtable’s design early on.

But despite a later start, Cassandra’s storage engine is far ahead of HBase’s today, in large part because building on HDFS instead of locally-managed storage makes everything harder for HBase. Cassandra added online snapshots almost four years ago; HBase still has a long way to go.

HDFS also makes SSD support problematic for HBase, which is becoming increasingly relevant as SSD price/performance improves. Cassandra has excellent SSD support and even support for mixed SSD and HDD within the same cluster, with data pinned to the medium that makes the most sense for it.

Other differences that may not show up at benchmark time, but you would definitely notice in production:

HBase can’t delete data during minor compactions — you have to rewrite all the data in a region to reclaim disk space. Cassandra has deleted tombstones during minor compactions for over two years.

While you are running that major compaction, HBase gives you no way to throttle it and limit its impact on your application workload. Cassandra introduced this two years ago and continues to improve it. Dealing with local storage also lets Cassandra avoid polluting the page cache with sequential scans from compaction.

Compaction might seem like bookkeeping details, but it does impact the rest of the system. HBase limits you to two or three column families because of compaction and flushing limitations, forcing you to do sub-optimal things to your data model as a workaround.

Cassandra

I honestly think Cassandra is one to two years ahead of the competition, but I’m under no illusions that Cassandra itself is perfect. We have plenty of improvements to make still; from the recently released Cassandra 1.2 to our ticket backlog, there is no shortage of work to do.

Here are some of the areas I’d like to see Cassandra improve this year:

If working on an industry-leading, open-source database doing cutting edge performance work on the JVM sounds interesting to you, please get in touch.

MongoDB 2.3.2 has been released

MongoDB 2.3.2 unstable is now available … for testing!

Some highlights of this development release include:
New features:
  - SERVER-380 basic text indexing and search (!)

New operators:
  - SERVER-8008 $push to sorted array
  - SERVER-991 $push to fixed size array
  - SERVER-340 $setOnInsert modifier for upserts
  - SERVER-7781 $geoNear operator in aggregation framework
  - SERVER-6531 $within operator in aggregation framework

Changes to defaults:
  - SERVER-7779 _secondaryThrottle is now on by default
  - SERVER-7769 --objcheck is now on by default

Performance improvements:
  - SERVER-1752 Improved the performance of simple counts

MongoDB to SQL: the Query Translator

Query Translator allows you to convert MySQL queries to MongoDB syntax:

http://www.querymongo.com/

Example of a MySQL query:

SELECT person, SUM(score), AVG(score), MIN(score), MAX(score), COUNT(*)
FROM demo
WHERE score > 0 AND person IN ('bob', 'jake')
GROUP BY person;

Translated into:

db.demo.group({
    "key": {
        "person": true
    },
    "initial": {
        "sumscore": 0,
        "sumforaveragescore": 0,
        "countforaveragescore": 0,
        "minscore": 0,
        "maxscore": 0,
        "countstar": 0
    },
    "reduce": function(obj, prev) {
        prev.sumscore += obj.score;
        prev.sumforaveragescore += obj.score;
        prev.countforaveragescore++;
        prev.minscore = Math.min(prev.minscore, obj.score);
        prev.maxscore = Math.max(prev.maxscore, obj.score);
        prev.countstar++;
    },
    "finalize": function(prev) {
        prev.averagescore = prev.sumforaveragescore / prev.countforaveragescore;
        delete prev.sumforaveragescore;
        delete prev.countforaveragescore;
    },
    "cond": {
        "score": {
            "$gt": 0
        },
        "person": {
            "$in": ["bob", "jake"]
        }
    }
});
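
For comparison, the same query expressed with MongoDB 2.2's aggregation framework, which is usually preferable to group() for this kind of summary (a hand-written sketch, not output of the translator):

db.demo.aggregate( [
    { $match: { score: { $gt: 0 }, person: { $in: [ "bob", "jake" ] } } },
    { $group: { _id: "$person",
                sumscore: { $sum: "$score" },
                averagescore: { $avg: "$score" },
                minscore: { $min: "$score" },
                maxscore: { $max: "$score" },
                countstar: { $sum: 1 } } }
] )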

Intel Capital and Red Hat funding MongoDB

10gen, the MongoDB company, today announced that Intel Capital, Intel’s global investment and M&A organization, and Red Hat, the world’s leading provider of open source solutions, have made strategic investments in the company. 10gen will use the funds to further invest in product development for MongoDB and to better support its rapidly growing community and user base worldwide. The funding agreement marks Intel Capital’s first investment in the NoSQL market, and for Red Hat, expands the technical and go-to-market collaboration that exists between the two companies.

Official announcement

http://www.10gen.com/press/10gen-announces-strategic-investment-intel-capital-and-red-hat

NoSQL Benchmark

There is probably no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks you're trying to achieve.

Altoros Systems has performed an independent and interesting benchmark to help you sort out the current pros and cons of different solutions, including HBase, Cassandra, Riak, and MongoDB:

http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html

What makes this research unique?

Often referred to as NoSQL, non-relational databases feature elasticity and scalability in combination with a capability to store big data and work with cloud computing systems, all of which make them extremely popular. NoSQL data management systems are inherently schema-free (with no obsessive complexity and a flexible data model) and eventually consistent (complying with BASE rather than ACID). They have a simple API, serve huge amounts of data and provide high throughput.

In 2012, the number of NoSQL products reached 120-plus and the figure is still growing. That variety makes it difficult to select the best tool for a particular case. Database vendors usually measure productivity of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. We wanted to do independent and unbiased research to complement the work done by the folks at Yahoo.

Using Amazon virtual machines to ensure verifiable results and research transparency (which also helped minimize errors due to hardware differences), we have analyzed and evaluated the following NoSQL solutions:

● Cassandra, a column family store
● HBase (column-oriented, too)
● MongoDB, a document-oriented database
● Riak, a key-value store

We also tested MySQL Cluster and sharded MySQL, taking them as benchmarks.

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

MongoDB 2.2.1 has been released

MongoDB 2.2.1 has been released; this version fixes a few issues in 2.2.0 and is a recommended upgrade for all users.

Changes:

  • Fixed several batched oplog application issues required for db level locking
  • Fixed authentication in mixed version environments

Downloads: http://www.mongodb.org/downloads
Change Log: https://jira.mongodb.org/browse/SERVER/fixforversion/11494