
Text Indexes

New in version 2.4.

MongoDB provides text indexes to support text search of string content in documents of a collection.

text indexes can include any field whose value is a string or an array of string elements. To perform queries that access the text index, use the $text query operator.
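As a minimal sketch of such a query, the following mongo shell snippet searches a reviews collection that is assumed to already have a text index. The collection name and the search term "coffee" are illustrative assumptions. The check on db only lets the snippet load outside the shell without error.

```javascript
// Query document: $text takes a $search string with the terms to look for.
// The term "coffee" is illustrative only.
var textQuery = { $text: { $search: "coffee" } };

// `db` exists only inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    db.reviews.find(textQuery).forEach(printjson);
}
```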

Changed in version 2.6: MongoDB enables the text search feature by default. In MongoDB 2.4, you need to enable the text search feature manually to create text indexes and perform text search.
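For a 2.4 deployment, one way to enable the feature on a running instance is the setParameter command with the textSearchEnabled parameter, sketched below. This applies to MongoDB 2.4 only and is not needed in 2.6. The sketch assumes a shell session that is allowed to run admin commands.

```javascript
// Command document for MongoDB 2.4 only; 2.6 enables text search by default.
// The command name (setParameter) must be the first key.
var enableTextSearch = { setParameter: 1, textSearchEnabled: true };

// `db` exists only inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    printjson(db.adminCommand(enableTextSearch));
}
```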

Create Text Index

To create a text index, use the db.collection.ensureIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:

db.reviews.ensureIndex( { comments: "text" } )

A collection can have at most one text index.

For examples of creating text indexes on multiple fields, see Create a text Index.
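Because a collection can have only one text index, covering several fields means listing them all in a single index document. The sketch below assumes the reviews documents also carry a subject field, which is a hypothetical name used for illustration.

```javascript
// One index document naming every field to be covered by the text index.
// "subject" is a hypothetical second string field.
var multiFieldSpec = { subject: "text", comments: "text" };

// `db` exists only inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    db.reviews.ensureIndex(multiFieldSpec);
}
```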

Supported Languages and Stop Words

MongoDB supports text search for various languages. text indexes drop language-specific stop words (in English, for example, “the”, “an”, “a”, and “and”) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.

If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.
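As a sketch, the language can be passed through the default_language option when the index is created. The articles collection and its body field are hypothetical names chosen so that the example does not collide with the single text index allowed on reviews.

```javascript
// Index document and options; "none" turns off stop words and stemming.
var plainSpec = { body: "text" };
var plainOptions = { default_language: "none" };

// `db` exists only inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
    db.articles.ensureIndex(plainSpec, plainOptions);
}
```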

If the index language is English, text indexes are case-insensitive for non-diacritics; i.e. case-insensitive for [A-Za-z].

To specify a language for the text index, see Specify a Language for Text Index.

sparse Property

text indexes are sparse by default and ignore the sparse: true option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. On insert, MongoDB still stores such a document but does not add an entry for it to the text index.

For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys have no effect on whether the index references the document.
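The rule above can be modeled in a few lines of plain JavaScript. This is only an illustration of the three conditions stated in this section (missing field, null, empty array), not MongoDB's implementation, and the function name is hypothetical.

```javascript
// Returns true when, per the rule above, a document would get no entry
// in a text index on `textField`. Other index keys are deliberately ignored.
function skippedByTextIndex(doc, textField) {
    var value = doc[textField];
    if (value === undefined || value === null) {
        return true;                       // field missing or null
    }
    if (Array.isArray(value) && value.length === 0) {
        return true;                       // empty array
    }
    return false;
}

// For a compound index { dept: 1, comments: "text" }, only "comments" matters.
var withText = skippedByTextIndex({ dept: "kitchen", comments: "nice" }, "comments");
var withoutText = skippedByTextIndex({ dept: "kitchen" }, "comments");
var emptyArray = skippedByTextIndex({ comments: [] }, "comments");
```

Here withText is false, while withoutText and emptyArray are both true.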

Restrictions

Text Search and Hints

You cannot use hint() if the query includes a $text query expression.

Compound Index

A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:

  • A compound text index cannot include any other special index types, such as multi-key or geospatial index fields.
  • If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.

See Limit the Number of Entries Scanned.
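The second restriction can be sketched as follows. The inventory collection and its dept and description fields are hypothetical. The index places an ascending key before the text key, so the query supplies an equality condition on dept alongside the $text expression.

```javascript
// Compound index: an ascending key precedes the text key.
var compoundSpec = { dept: 1, description: "text" };

// The query must match the preceding key by equality to use $text.
var compoundQuery = { dept: "kitchen", $text: { $search: "green" } };

// `db` exists only inside the mongo shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
    db.inventory.ensureIndex(compoundSpec);
    db.inventory.find(compoundQuery).forEach(printjson);
}
```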

Drop a Text Index

To drop a text index, pass the name of the index to the db.collection.dropIndex() method. To get the name of the index, run the getIndexes() method.

For information on the default naming scheme for text indexes as well as overriding the default name, see Specify Name for text Index.
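A minimal sketch of the two steps: look through the getIndexes() output for the index whose key document contains the value "text", then pass its name to dropIndex(). The helper name and the sample index documents are illustrative assumptions, and the sample shows only the key and name fields.

```javascript
// Returns the name of the first index whose key document has a "text" value,
// or null when the list contains no text index.
function findTextIndexName(indexes) {
    for (var i = 0; i < indexes.length; i++) {
        var key = indexes[i].key;
        for (var field in key) {
            if (key[field] === "text") {
                return indexes[i].name;
            }
        }
    }
    return null;
}

// Illustrative sample shaped like getIndexes() output (fields trimmed).
var sampleIndexes = [
    { key: { _id: 1 }, name: "_id_" },
    { key: { _fts: "text", _ftsx: 1 }, name: "comments_text" }
];
var sampleName = findTextIndexName(sampleIndexes);

// `db` exists only inside the mongo shell; skip the calls elsewhere.
if (typeof db !== "undefined") {
    var textIndexName = findTextIndexName(db.reviews.getIndexes());
    if (textIndexName !== null) {
        db.reviews.dropIndex(textIndexName);
    }
}
```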

Storage Requirements and Performance Costs

text indexes have the following storage requirements and performance costs:

  • text indexes change the space allocation method for all future record allocations in a collection to usePowerOf2Sizes.
  • text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
  • Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
  • When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
  • text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
  • text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more efficiently when the entire collection fits in RAM.
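To get a rough feel for the index size described above, the following sketch counts unique words per indexed field of one document. It is an approximation under stated assumptions: it lowercases and splits on non-alphanumeric characters, applies no stop-word list and no stemming, and does not reproduce MongoDB's actual tokenizer. The function name and sample document are hypothetical.

```javascript
// Counts unique lowercase tokens in each listed field and sums the counts.
// No stop words are dropped and no stemming is applied.
function countUniqueTokens(doc, fields) {
    var total = 0;
    fields.forEach(function (field) {
        var value = doc[field];
        var strings = Array.isArray(value) ? value : [value];
        var seen = {};
        strings.forEach(function (s) {
            if (typeof s !== "string") {
                return;                    // skip missing or non-string values
            }
            s.toLowerCase().split(/[^a-z0-9]+/).forEach(function (token) {
                if (token.length > 0) {
                    seen[token] = true;
                }
            });
        });
        total += Object.keys(seen).length;
    });
    return total;
}

var sampleDoc = { subject: "Great coffee", comments: "great coffee, great service" };
var entryEstimate = countUniqueTokens(sampleDoc, ["subject", "comments"]);
// subject contributes 2 tokens, comments contributes 3, so entryEstimate is 5
```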