From official website
Text Indexes
Note
The text index type is currently an experimental feature. To use a text index, you need to enable it at run time or startup.
Background
MongoDB 2.3.2 includes a new text index type. text indexes support boolean text search queries:
- Any set of fields containing string data may be text indexed.
- You may only maintain a single text index per collection.
- text indexes are fully consistent and updated in real-time as applications insert, update, or delete documents from the database.
- The text index and query system supports language specific stemming and stop words. Additionally:
- Indexes and queries drop stop words (i.e. “the,” “an,” “a,” “and,” etc.)
- MongoDB stores words stemmed during insertion, using simple suffix stemming, and includes support for a number of languages. MongoDB automatically stems text queries before beginning the query.
However, text indexes have large storage requirements and incur significant performance costs:
- Text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
- Building a text index is very similar to building a large multi-key index, and therefore may take longer than building a simple ordered (scalar) index.
- text indexes will impede insertion throughput, because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
- Some text searches may affect performance on your mongod, particularly for negation queries and phrase matches that cannot use the index as effectively as other kinds of queries.
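To make the storage cost concrete, the following toy sketch counts index entries in the way the list above describes: one entry per unique post-stemmed word in a field. The stop-word list and the `toyStem` suffix rule are illustrative assumptions, not the stemmer or stop-word list that MongoDB actually uses.

```javascript
// Illustrative only: STOP_WORDS and toyStem are simplified stand-ins,
// not MongoDB's real stop-word list or stemmer.
const STOP_WORDS = new Set(["the", "an", "a", "and"]);

// Hypothetical suffix stemmer: strips a few common English endings.
function toyStem(word) {
  return word.replace(/(ing|ed|s)$/, "");
}

// Returns the unique post-stemmed words for one string field value.
function indexEntries(fieldValue) {
  const words = fieldValue.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const stems = words.filter((w) => !STOP_WORDS.has(w)).map(toyStem);
  return [...new Set(stems)];
}

// Nine input words collapse to two index entries.
console.log(indexEntries("The cat and the cats walked, walking a walk"));
// prints [ 'cat', 'walk' ]
```

Every additional distinct stem in a field adds another entry, which is why long free-text fields make the index grow quickly.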
Additionally, the current experimental implementation of text indexes has the following limitations and behaviors:
- text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
- MongoDB does not stem phrases or negations in text queries.
- The index is case-insensitive.
- A collection may only have a single text index at a time.
Warning
Do not enable or use text indexes on production systems.
Test text Indexes
The text index type is an experimental feature and you need to enable the feature before creating or accessing a text index.
To enable text indexes, issue the following command in the mongo shell:
db.adminCommand( { setParameter: 1, textSearchEnabled: true } )
You can also start the mongod with the following invocation:
mongod --setParameter textSearchEnabled=true
Create Text Indexes
To create a text index, use the following syntax of ensureIndex():
db.collection.ensureIndex( { <field>: "text" } )
Consider the following example:
db.collection.ensureIndex( { content: "text" } )
This text index catalogs all string data in the content field where the content field contains a string or an array of string elements. To index fields in sub-documents, you need to specify the individual fields from the sub-documents using the dot notation. A text index can include multiple fields, as in the following:
db.collection.ensureIndex( { content: "text",
                             "users.comments": "text",
                             "users.profiles": "text" } )
The default name for the index consists of the <field name> concatenated with _text for the indexed fields, as in the following:
"content_text_users.comments_text_users.profiles_text"
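The naming rule can be sketched in a few lines of plain JavaScript. The helper name `defaultIndexName` is hypothetical; it only reproduces the concatenation described above and is not a MongoDB API.

```javascript
// Sketch: join "<field>_<type>" pairs with "_" in the order the fields
// appear in the index specification.
function defaultIndexName(spec) {
  return Object.entries(spec)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

const generatedName = defaultIndexName({
  content: "text",
  "users.comments": "text",
  "users.profiles": "text",
});
console.log(generatedName);
// prints content_text_users.comments_text_users.profiles_text
```

Because each indexed field adds to the name, an index over many fields produces a long default name.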
These indexes may run into the Index Name Length limit. To avoid creating an index with a too-long name, you can specify a name in the options parameter, as in the following:
db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text" },
                           { name: "TextIndex" } )
When creating text indexes you may specify weights for specific fields. Weights are factored into the relevance score for each document. The score for a given word in a document is the weighted sum of the frequency for each of the indexed fields in that document. Consider the following:
db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text" },
                           { name: "TextIndex",
                             weights: { content: 1,
                                        "users.profiles": 2 } } )
This example creates a text index on the top-level field named content and the profiles field in the users sub-documents. Furthermore, the content field has a weight of 1 and the users.profiles field has a weight of 2.
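As a rough illustration of "weighted sum of the frequency", the sketch below counts how often a word occurs in each indexed field and multiplies the count by that field's weight. The function `toyScore` and the sample field values are assumptions made for this example; MongoDB's real scoring may normalize or adjust the result in ways not described here.

```javascript
// Simplified scoring sketch: sum over fields of (weight * frequency).
// Fields without an explicit weight default to 1.
function toyScore(word, fields, weights = {}) {
  let score = 0;
  for (const [path, text] of Object.entries(fields)) {
    const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
    const frequency = words.filter((w) => w === word).length;
    score += (weights[path] ?? 1) * frequency;
  }
  return score;
}

// Hypothetical document content, keyed by the indexed field paths.
const sampleFields = {
  content: "coffee and cake",
  "users.profiles": "coffee lover, coffee roaster",
};

// content: 1 occurrence * weight 1; users.profiles: 2 occurrences * weight 2.
console.log(toyScore("coffee", sampleFields, { content: 1, "users.profiles": 2 }));
// prints 5
```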
You can add one or more conventional ascending or descending index fields as a prefix or suffix of the index. You cannot include a multi-key index field or a geospatial index field.
If you create an ascending or descending index as a prefix of a text index:
- MongoDB will only index documents that have the prefix field (i.e. username).
- The text query can limit the number of index entries to review in order to perform the query.
- All text queries using this index must include the filter option that specifies an equality condition for the prefix field or fields.
Create this index with the following operation:
db.collection.ensureIndex( { username: 1,
                             "users.profiles": "text" } )
Alternatively, you can create an ascending or descending index as a suffix to a text index. The text index can then support covered queries if the text command specifies a project option.
Create this index with the following operation:
db.collection.ensureIndex( { "users.profiles": "text",
                             username: 1 } )
Finally, you may use the special wild card field specifier (i.e. $**) to specify index weights and fields. Consider the following example that indexes any string value in the data of every field of every document in a collection and names it TextIndex:
db.collection.ensureIndex( { "$**": "text",
                             username: 1 },
                           { name: "TextIndex" } )
By default, an index field has a weight of 1. You may specify weights for a text index with compound fields, as in the following:
db.collection.ensureIndex( { content: "text",
                             "users.profiles": "text",
                             comments: "text",
                             keywords: "text",
                             about: "text" },
                           { name: "TextIndex",
                             weights:
                             { content: 10,
                               "users.profiles": 2,
                               keywords: 5,
                               about: 5 } } )
This index, named TextIndex, includes a number of fields, with the following weights:
- content field that has a weight of 10,
- users.profiles that has a weight of 2,
- comments that has a weight of 1,
- keywords that has a weight of 5, and
- about that has a weight of 5.
This means that matches on words in the content field contribute more to a document's score than matches in any other field in the index, and that matches in the users.profiles and comments fields contribute less than matches in the other fields.
Note
You must drop a text index using the name specified when you created the index. Alternatively, if you did not specify a name when creating the index, you can find the name using db.collection.getIndexes()
Text Queries
MongoDB 2.3.2 introduces the text command to provide query support for text indexes. Unlike normal MongoDB queries, text returns a document rather than a cursor.
text
The text command provides an interface to search text content stored in the text index. Consider the following prototype:
db.collection.runCommand( "text", { search: <string>,
                                    filter: <document>,
                                    project: <document>,
                                    limit: <number>,
                                    language: <string> } )
The text command has the following parameters:
- search (string) – A text string that MongoDB stems and uses to query the text index. In the mongo shell, to specify a phrase to match, you can either:
- enclose the phrase in escaped double quotes and use double quotes to specify the search string, as in "\"coffee table\"", or
- enclose the phrase in double quotes and use single quotes to specify the search string, as in '"coffee table"'
- filter (document) – Optional. A query document to further limit the results of the query using another database field. You can use any valid MongoDB query in the filter document, except if the index includes an ascending or descending index field as a prefix. If the index includes an ascending or descending index field as a prefix, the filter is required and the filter query must be an equality match.
- project (document) – Optional. Allows you to limit the fields returned by the query to only those specified.
- limit (number) – Optional. Specify the maximum number of documents to include in the response. The text command sorts the results before applying the limit. The default limit is 100.
- language (string) – Optional. Specify the language that determines the tokenization, stemming, and the stop words for the search. The default language is english.
Returns:
text returns results, in descending order by score, in the form of a document. Results must fit within the BSON Document Size. Use the limit and the project parameters to limit the size of the result set.
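The defaults and the prefix-field rule from the parameter list can be summarized in a small client-side sketch. The helper `buildTextArgs` and its `prefixFields` argument are hypothetical and not part of MongoDB; the server applies the defaults itself, so this only makes them explicit.

```javascript
// Hypothetical helper: fills in the documented defaults (limit 100, language
// "english") and checks that every prefix field has an equality condition.
function buildTextArgs(search, options = {}, prefixFields = []) {
  const args = { search, limit: 100, language: "english", ...options };
  const filter = args.filter || {};
  for (const field of prefixFields) {
    const cond = filter[field];
    // Treat operator documents such as { $gt: 1 } as non-equality conditions.
    const isOperatorDoc =
      cond !== null &&
      typeof cond === "object" &&
      Object.keys(cond).some((key) => key.startsWith("$"));
    // Regular expressions are pattern matches, not equality matches.
    if (cond === undefined || cond instanceof RegExp || isOperatorDoc) {
      throw new Error(`filter must contain an equality match on "${field}"`);
    }
  }
  return args;
}

// With a text index prefixed by username, the filter is mandatory.
console.log(buildTextArgs("coffee", { filter: { username: "alice" } }, ["username"]));
```

The returned object has the shape of the second argument to `db.collection.runCommand( "text", ... )` shown in the prototype.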
The implicit connector between the terms of a multi-term search is a disjunction (OR). A search for "first second" searches for "first" or "second". The scoring system will prefer documents that contain all terms.
However, consider the following behaviors of text queries:
- With phrases (i.e. terms enclosed in escaped quotes), the search performs an AND with any other terms in the search string; e.g. a search for "\"twinkle twinkle\" little star" searches for "twinkle twinkle" and ("little" or "star").
- text adds all negations to the query with the logical AND operator.
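The combination rules above (OR between plain terms, AND for phrases and negations) can be sketched as a small in-memory matcher. The functions `parseSearch` and `matches` are hypothetical and skip stemming and stop words entirely; they only illustrate the boolean logic, not how MongoDB evaluates the query against the index.

```javascript
// Split a search string into quoted phrases, negated words and plain terms.
function parseSearch(search) {
  const phrases = [];
  const rest = search.toLowerCase().replace(/"([^"]*)"/g, (_, phrase) => {
    phrases.push(phrase.trim());
    return " ";
  });
  const terms = [];
  const negations = [];
  for (const token of rest.split(/\s+/).filter(Boolean)) {
    if (token.startsWith("-") && token.length > 1) negations.push(token.slice(1));
    else terms.push(token);
  }
  return { phrases, terms, negations };
}

// phrases AND NOT negations AND (term1 OR term2 OR ...)
function matches(text, query) {
  const lower = text.toLowerCase();
  const words = new Set(lower.split(/[^a-z0-9]+/).filter(Boolean));
  if (query.negations.some((word) => words.has(word))) return false;
  if (!query.phrases.every((phrase) => lower.includes(phrase))) return false;
  if (query.terms.length > 0) return query.terms.some((term) => words.has(term));
  return query.phrases.length > 0; // nothing positive to match: no result
}

const starQuery = parseSearch('"twinkle twinkle" little star');
console.log(matches("Twinkle twinkle little bat", starQuery)); // true
console.log(matches("twinkle twinkle", starQuery));            // false
```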
Example
Consider the following examples of text queries. All examples assume that you have a text index on the field named content in a collection named collection.
- Create a text index on the content field to enable text search on the field:
db.collection.ensureIndex( { content: "text" } )
- Search for a single word coffee:
db.collection.runCommand( "text", { search: "coffee" } )
This query returns documents that contain the word coffee, case-insensitive, in the content field.
- Search for multiple words, bake or coffee or cake:
db.collection.runCommand( "text", { search: "bake coffee cake" } )
This query returns documents that contain either bake or coffee or cake in the content field.
- Search for the exact phrase bake coffee cake:
db.collection.runCommand( "text", { search: "\"bake coffee cake\"" } )
This query returns documents that contain the exact phrase bake coffee cake.
- Search for documents that contain the words bake or coffee, but not cake:
db.collection.runCommand( "text", { search: "bake coffee -cake" } )
Use the - as a prefix to terms to specify negation in the search string. The query returns documents that contain either bake or coffee, but not cake, all case-insensitive, in the content field. Prefixing a word with a hyphen (-) negates a word:
- The negated word filters out documents from the result set, after selecting documents.
- A <search string> that only contains negative words returns no match.
- A hyphenated word, such as case-insensitive, is not a negation. The text command treats the hyphen as a delimiter.
- Search for a single word coffee with an additional filter on the about field, but limit the results to 2 documents with the highest score and return only the comments field in the matching documents:
db.collection.runCommand( "text", {
    search: "coffee",
    filter: { about: /desserts/ },
    limit: 2,
    project: { comments: 1, _id: 0 }
  }
)
- The filter query document may use any of the available query operators.
- Because the _id field is implicitly included, in order to return only the comments field, you must explicitly exclude (0) the _id field. Within the project document, you cannot mix inclusions (i.e. <fieldA>: 1) and exclusions (i.e. <fieldB>: 0), except for the _id field.
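The projection rule in the last point can be expressed as a short check. `isValidProjection` is a hypothetical helper for illustration, not a MongoDB function.

```javascript
// Apart from _id, a projection must be all inclusions (1) or all exclusions (0).
function isValidProjection(project) {
  const flags = Object.entries(project)
    .filter(([field]) => field !== "_id")
    .map(([, value]) => Boolean(value));
  return flags.every((flag) => flag === flags[0]);
}

console.log(isValidProjection({ comments: 1, _id: 0 }));   // true
console.log(isValidProjection({ comments: 1, about: 0 })); // false
```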
New Modular Authentication System with Support for Kerberos
Note
These features are only present in the MongoDB Subscriber Edition. To download the 2.4.0 release candidate of the Subscriber Edition, use the following resources:
An improved authentication system is a core focus of the entire 2.3 cycle. As of 2.3.2, the following components of the new authentication system are available for use in MongoDB:
- mongod instances can authenticate users via Kerberos.
- the mongo shell can authenticate to mongod instances using Kerberos.
- MongoDB Clients can authenticate using Kerberos with the C++ client library and development versions of the Java and C# drivers.
Initial Support for Kerberos Authentication
Development work on this functionality is ongoing, and additional related functionality is forthcoming. To use Kerberos with MongoDB as of the 2.4.0 release candidate, consider the following requirements:
- Add users to MongoDB as with the existing authentication mechanism:
- Usernames must correspond to the Kerberos principal (e.g. <username>@<REALM>, as in mongodbuser@EXAMPLE.COM).
- You must have a user document in the system.users collection with the Kerberos principal for any database to which you want to grant access.
- Every mongod using Kerberos must have a fully resolvable, fully qualified domain name. This includes all members of replica sets.
- Every mongod using Kerberos must have a Kerberos service principal, in the form of: mongodb/<fqdn>@<REALM>.
- Each system running a mongod with Kerberos must have a keytab file, readable by the mongod, that holds key data granting access to its principal.
Starting mongod with Kerberos
To start mongod with Kerberos support (i.e. with the GSSAPI authentication mechanism,) you must have a working Kerberos environment and a valid Kerberos keytab file.
To start mongod, use a command in the following form:
env KRB5_KTNAME=<path to keytab file> <mongod invocation>
You must start mongod with auth or keyFile [1] and configure the list of mechanisms in the authenticationMechanisms parameter. An actual command would resemble:
env KRB5_KTNAME=/opt/etc/mongodb.keytab \
    /opt/bin/mongod --dbpath /opt/data/db --logpath /opt/log/mongod.log --fork \
    --auth --setParameter authenticationMechanisms=GSSAPI
Replace the paths as needed for your test deployment.
If you want to enable both Kerberos and the legacy challenge-and-response authentication mechanism, append MONGO-CR to the authenticationMechanisms parameter. Consider the following example:
env KRB5_KTNAME=/opt/etc/mongodb.keytab \
    /opt/bin/mongod --dbpath /opt/data/db --logpath /opt/log/mongod.log --fork \
    --auth --setParameter authenticationMechanisms=GSSAPI,MONGO-CR
Note
If you’re having trouble getting mongod to start with Kerberos, there are a number of Kerberos-specific issues that can prevent successful authentication. As you begin troubleshooting your Kerberos deployment, ensure that:
- You have a valid keytab file specified in the environment running the mongod.
- The mongod is from the MongoDB Subscriber Edition.
- DNS allows the mongod to resolve the components of the Kerberos infrastructure.
- The system clocks of the hosts running the mongod instances and the Kerberos infrastructure are synchronized.
Until you can successfully authenticate a client using Kerberos, you may want to enable the MONGO-CR authentication mechanism to provide access to the mongod instance during configuration.
Connecting and Authenticating MongoDB Clients Using Kerberos
To use Kerberos with the mongo shell, begin by initializing a Kerberos session with kinit. Then start a 2.3.2 mongo shell instance, and use the following sequence of operations to associate the current connection with the Kerberos session:
use $external
db.auth( { mechanism: "GSSAPI", user: "<username>@<REALM>" } )
The value of the user field must be the same principal that you initialized with kinit. This connection will acquire access in accordance with all privileges granted to this user for all databases.
See
MongoDB Security Practices and Procedures.
Default JavaScript Engine Switched to v8 from SpiderMonkey
The default JavaScript engine used throughout MongoDB, for the mongo shell, mapReduce, $where, and eval is now v8.
serverBuildInfo.interpreterVersion
The interpreterVersion field of the document output by db.serverBuildInfo() in the mongo shell reports which JavaScript interpreter the mongod instance is running.
interpreterVersion()
The interpreterVersion() method in the mongo shell reports which JavaScript interpreter this mongo shell uses.
New Geospatial Indexes with GeoJSON and Improved Spherical Geometry
Note
In 2.3.2, the index type for Spherical Geospatial Indexes becomes 2dsphere.
The 2.3 series adds a new type of geospatial index that supports improved spherical queries and GeoJSON. Create the index by specifying 2dsphere as the value of the field in the index specification, as in any of the following:
db.collection.ensureIndex( { geo: "2dsphere" } )
db.collection.ensureIndex( { type: 1, geo: "2dsphere" } )
db.collection.ensureIndex( { geo: "2dsphere", type: 1 } )
The first example creates a spherical geospatial index on the field named geo. The second example creates a compound index where the first field is a normal index and the second field is a spherical geospatial index. Unlike 2d indexes, fields indexed using the 2dsphere type do not have to be the first field in a compound index.
At the moment, you must store data in the fields indexed using the 2dsphere index according to the GeoJSON specification. Support for storing points in the form used by the existing 2d (i.e. geospatial) indexes is forthcoming. Currently, 2dsphere indexes only support the following GeoJSON shapes:
- Point, as in the following:
{ "type": "Point", "coordinates": [ 40, 5 ] }
- LineString, as in the following:
{ "type": "LineString", "coordinates": [ [ 40, 5 ], [ 41, 6 ] ] }
- Polygon, as in the following:
{
  "type": "Polygon",
  "coordinates": [ [ [ 40, 5 ], [ 40, 6 ], [ 41, 6 ], [ 41, 5 ], [ 40, 5 ] ] ]
}
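Before inserting documents, it can help to check on the client side that a value has one of the three supported shapes. The sketch below is a hypothetical validator (`isSupportedGeometry` is not a MongoDB function); the ring rule it applies, that a Polygon ring is closed and has at least four positions, comes from the GeoJSON specification rather than from this text.

```javascript
// A position is an array of at least two finite numbers.
function isPosition(p) {
  return Array.isArray(p) && p.length >= 2 &&
    p.every((n) => typeof n === "number" && Number.isFinite(n));
}

// A Polygon ring must be closed: first and last positions are identical.
function isClosedRing(ring) {
  if (!Array.isArray(ring) || ring.length < 4 || !ring.every(isPosition)) return false;
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
}

// Accepts only the shapes listed above: Point, LineString and Polygon.
function isSupportedGeometry(geometry) {
  if (!geometry || !Array.isArray(geometry.coordinates)) return false;
  const coords = geometry.coordinates;
  switch (geometry.type) {
    case "Point":
      return isPosition(coords);
    case "LineString":
      return coords.length >= 2 && coords.every(isPosition);
    case "Polygon":
      return coords.length >= 1 && coords.every(isClosedRing);
    default:
      return false;
  }
}

console.log(isSupportedGeometry({ type: "Point", coordinates: [40, 5] })); // true
console.log(isSupportedGeometry({ type: "MultiPoint", coordinates: [[40, 5]] })); // false
```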
To query 2dsphere indexes, use any of the current geospatial query operators or the additional $geoIntersects operator. Currently, all queries using the 2dsphere index must pass the query selector (e.g. $near, $geoIntersects) a GeoJSON document. With the exception of the GeoJSON requirement, the operation of $near is the same for 2dsphere indexes as 2d indexes.
$geoIntersects
The $geoIntersects operator selects all indexed points that intersect with the provided geometry (i.e. Point, LineString, or Polygon). You must pass $geoIntersects a document in GeoJSON format.
db.collection.find( { geo: { $geoIntersects: { $geometry: { "type": "Point", "coordinates": [ 40, 5 ] } } } } )
This query will select all indexed objects that intersect with the Point with the coordinates [ 40, 5 ]. MongoDB will return documents as intersecting if they have a shared edge.
The $geometry operator takes a single GeoJSON document.
$geometry
mongod Automatically Continues in Progress Index Builds Following Restart
If your mongod instance was building an index when it shut down or terminated, mongod will now continue building the index when the mongod restarts. Previously, the index build had to finish before mongod shut down.
To disable this behavior, the 2.3 series adds a new run time option for mongod, noIndexBuildRetry (or --noIndexBuildRetry on the command line). noIndexBuildRetry prevents mongod from continuing to rebuild indexes that were not finished building when the mongod last shut down.
noIndexBuildRetry
By default, mongod will attempt to rebuild indexes upon start-up if mongod shuts down or stops in the middle of an index build. When enabled, this run time option prevents that behavior.
New Hashed Index and Sharding with a Hashed Shard Key
To support an easy to configure and evenly distributed shard key, version 2.3 adds a new “hashed” index type that indexes based on hashed values. This section introduces and documents both the new index type and its use in sharding:
Hashed Index
The new hashed index exists primarily to support automatically hashed shard keys. Consider the following properties of hashed indexes:
- Hashed indexes must only have a single field, and cannot be compound indexes.
- Fields indexed with hashed indexes must not hold arrays. Hashed indexes cannot be multikey indexes.
- Hashed indexes cannot have a unique constraint. You may create hashed indexes with the sparse property.
- MongoDB can use the hashed index to support equality queries, but cannot use these indexes for range queries.
- Hashed indexes offer no performance advantage over normal indexes. However, hashed indexes may be smaller than a normal index when the values of the indexed field are larger than 64 bits. [2]
- It is possible to have a hashed and a non-hashed index on the same field: MongoDB will use the non-hashed index for range queries.
Warning
Hashed indexes round floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3 and 2.2. To prevent collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit integers (and then back to floating point). Hashed indexes do not support floating point values larger than 2^53.
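The collision described in the warning can be reproduced in plain JavaScript. This sketch assumes that the rounding behaves like truncation toward zero, which agrees with the 2.3 and 2.2 example; the helper names are hypothetical and the actual hash function is not modelled.

```javascript
// Assumption: the value fed to the hash is the float truncated to an integer.
// Only finite numbers are supported (BigInt throws on NaN and Infinity).
function hashedIndexInput(value) {
  return BigInt(Math.trunc(value));
}

// True when the value survives the float -> integer -> float round trip
// and stays within the 2^53 limit mentioned above.
function isSafeForHashedIndex(value) {
  return Math.abs(value) <= 2 ** 53 && Number(hashedIndexInput(value)) === value;
}

console.log(hashedIndexInput(2.3) === hashedIndexInput(2.2)); // true: both become 2
console.log(isSafeForHashedIndex(2.3)); // false
console.log(isSafeForHashedIndex(42));  // true
```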
Create a hashed index using an operation that resembles the following:
db.records.ensureIndex( { a: "hashed" } )
This operation creates a hashed index for the records collection on the a field.
[2] The hash stored in the hashed index is 64 bits long.
Hashed Sharding
To shard a collection using a hashed shard key, issue an operation in the mongo shell that resembles the following:
sh.shardCollection( "records.active", { a: "hashed" } )
This operation shards the active collection in the records database, using a hash of the a field as the shard key. Consider the following properties when using a hashed shard key:
- As with other kinds of shard key indexes, if your collection has data, you must create the hashed index before sharding. If your collection does not have data, sharding the collection will create the appropriate index.
- The mongos will route all equality queries to a specific shard or set of shards; however, the mongos must route range queries to all shards.
- When using a hashed shard key on a new collection, MongoDB automatically pre-splits the range of 64-bit hash values into chunks. By default, the initial number of chunks is equal to twice the number of shards at creation time. You can change the number of chunks created, using the numInitialChunks option, as in the following invocation of shardCollection:
db.adminCommand( { shardCollection: "test.collection",
                   key: { a: "hashed" },
                   numInitialChunks: 2001 } )
MongoDB will only pre-split chunks in a collection when sharding empty collections. MongoDB will not create chunk splits when sharding collections that already have data.
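To illustrate what pre-splitting the 64-bit hash range means, the sketch below computes evenly spaced split points. The function `initialSplitPoints` is hypothetical, and the equal-width layout is an assumption; the boundaries MongoDB actually chooses may differ.

```javascript
// Sketch: divide the signed 64-bit range [-2^63, 2^63) into equal-width chunks
// and return the boundaries between them. The default chunk count follows the
// "twice the number of shards" rule described above.
function initialSplitPoints(numShards, numInitialChunks = 2 * numShards) {
  const min = -(2n ** 63n);
  const step = (2n ** 64n) / BigInt(numInitialChunks);
  const points = [];
  for (let i = 1; i < numInitialChunks; i++) {
    points.push(min + BigInt(i) * step);
  }
  return points;
}

// Two shards give four chunks, separated by three split points.
console.log(initialSplitPoints(2).map(String));
// prints [ '-4611686018427387904', '0', '4611686018427387904' ]
```

With `numInitialChunks: 2001`, as in the invocation above, the same calculation yields 2000 split points.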
Warning
Avoid using hashed shard keys when the hashed field has non-integral floating point values. See hashed indexes for more information.