- A special virtual collection that exposes MongoDB’s database
commands. To use database commands, see Issue Commands.
A field required in every MongoDB document. The _id
field must have a unique value. You can
think of the _id field as the document’s primary key.
If you create a new document without an _id field, MongoDB
automatically creates the field and assigns a unique ObjectId.
- An expression in the aggregation framework that
maintains state between documents in the aggregation
pipeline. For a list of accumulator operations, see
- admin database
- A privileged database. Users
must have access to the admin database to run certain
administrative commands. For a list of administrative commands,
see Instance Administration Commands.
- Any of a variety of operations that reduce and summarize large
sets of data. MongoDB’s aggregate() and
mapReduce() methods are two
examples of aggregation operations. For more information, see
- aggregation framework
- The set of MongoDB operators that let you calculate aggregate
values without having to use map-reduce. For a list of
operators, see Aggregation Reference.
- A member of a replica set that exists solely to vote in
elections. Arbiters do not replicate data. See
Replica Set Arbiter.
- A data structure commonly used by database management systems to
store indexes. MongoDB uses B-trees for its indexes.
- An internal MongoDB process that runs in the context of a
sharded cluster and manages the migration of chunks. Administrators must disable the balancer for all
maintenance operations on a sharded cluster. See
Sharded Collection Balancing.
A serialization format used to store documents and make remote
procedure calls in MongoDB. “BSON” is a portmanteau of the words
“binary” and “JSON”. Think of BSON as a binary representation
of JSON documents.
- BSON types
- The set of types supported by the BSON serialization
format. For a list of BSON types, see BSON Types.
- CAP Theorem
- Given three properties of computing systems, consistency,
availability, and partition tolerance, a distributed computing
system can provide any two of these features, but never all
three.
- capped collection
- A fixed-sized collection that automatically
overwrites its oldest entries when it reaches its maximum size.
The MongoDB oplog that is used in replication is a
capped collection. See Capped Collections.
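
The overwrite behavior can be pictured with a small Python sketch. It is an analogy only: a real capped collection is bounded by size, whereas this hypothetical example bounds a `collections.deque` by entry count.

```python
from collections import deque

# Hypothetical stand-in for a capped collection that holds at most 3 entries.
# (A count limit is a simplification of MongoDB's fixed-size behavior.)
capped = deque(maxlen=3)

for n in range(1, 6):
    capped.append({"op": n})  # once full, each append evicts the oldest entry

remaining = [doc["op"] for doc in capped]
print(remaining)
```

Only the most recent entries survive, which is why a structure like this suits a rolling log of operations.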
- A calculated value used to ensure data integrity.
The md5 algorithm is sometimes used as a checksum.
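
As a rough illustration of a checksum, the Python sketch below computes an md5 digest with the standard `hashlib` module and compares digests to detect a changed copy. The function name `md5_checksum` and the sample data are assumptions made for this example.

```python
import hashlib

def md5_checksum(data: bytes) -> str:
    """Return the md5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

original = b"hello world"
stored = md5_checksum(original)

# Recompute the digest on each received copy and compare it to the stored one.
intact = md5_checksum(b"hello world") == stored
corrupted = md5_checksum(b"hello w0rld") == stored
print(intact, corrupted)
```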
- A contiguous range of shard key values within a particular
shard. Chunk ranges are inclusive of the lower boundary
and exclusive of the upper boundary. MongoDB splits chunks when
they grow beyond the configured chunk size, which by default is
64 megabytes. MongoDB migrates chunks when a shard contains too
many chunks of a collection relative to other shards. See
Data Partitioning and Sharding Mechanics.
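
The boundary rule (lower bound inclusive, upper bound exclusive) can be sketched in Python as follows. The integer shard key values and the helper names `chunk_contains` and `find_chunk` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[int, int]  # (lower, upper) bounds on the shard key

def chunk_contains(chunk: Chunk, key: int) -> bool:
    """A chunk includes its lower boundary and excludes its upper boundary."""
    lower, upper = chunk
    return lower <= key < upper

def find_chunk(chunks: List[Chunk], key: int) -> Optional[Chunk]:
    """Return the chunk whose range holds key, or None if no range matches."""
    for chunk in chunks:
        if chunk_contains(chunk, key):
            return chunk
    return None

chunks = [(0, 100), (100, 200), (200, 300)]
print(find_chunk(chunks, 100))  # the key 100 belongs to the second chunk
```

Because the ranges are half-open, adjacent chunks never overlap and a boundary value always belongs to exactly one chunk.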
- The application layer that uses a database for data persistence
and storage. Drivers provide the interface
level between the application layer and the database server.
- See sharded cluster.
- A grouping of MongoDB documents. A collection
is the equivalent of an RDBMS table. A collection exists
within a single database. Collections do not enforce a
schema. Documents within a collection can have different fields.
Typically, all documents in a collection have a similar or related
purpose. See What is a namespace in MongoDB?.
- compound index
- An index consisting of two or more keys. See
- config database
- An internal database that holds the metadata associated with a
sharded cluster. Applications and administrators should
not modify the config database in the course of normal
operation. See Config Database.
- config server
- A mongod instance that stores all the metadata
associated with a sharded cluster. A production sharded
cluster requires three config servers, each on a separate machine.
See Config Servers.
- control script
- A simple shell script, typically located in the /etc/rc.d or
/etc/init.d directory, and used by the system’s initialization
process to start, restart or stop a daemon process.
- An acronym for the fundamental operations of a database: Create,
Read, Update, and Delete. See MongoDB CRUD Operations.
- A text-based data format consisting of comma-separated values.
This format is commonly used to exchange data between relational
databases, since the format is well-suited to tabular data. You can
import CSV files using mongoimport.
- A pointer to the result set of a query. Clients can
iterate through a cursor to retrieve results. By default, cursors
time out after 10 minutes of inactivity. See
- The conventional name for a background, non-interactive
process.
- data-center awareness
- A property that allows clients to address members in a system
based on their locations. Replica sets
implement data-center awareness using tagging. See
Data Center Awareness.
- A physical container for collections.
Each database gets its own set of files on the file
system. A single MongoDB server typically has multiple
databases.
- database command
- A MongoDB operation, other than an insert, update, remove, or
query. For a list of database commands, see
Database Commands. To use database commands, see
- database profiler
- A tool that, when enabled, keeps a record of all long-running
operations in a database’s system.profile collection. The
profiler is most often used to diagnose slow queries. See
- A set of values used to define measurements on the earth. MongoDB
uses the WGS84 datum in certain geospatial
calculations. See Geospatial Indexes and Queries.
- The location of MongoDB’s data file storage. See
- delayed member
- A replica set member that cannot become primary and
applies operations at a specified delay. The delay is useful for
protecting data from human error (e.g. unintentionally deleted
databases) or updates that have unforeseen effects on the
production database. See Delayed Replica Set Members.
- diagnostic log
- A verbose log of operations stored in the dbpath.
- A record in a MongoDB collection and the basic unit of
data in MongoDB. Documents are analogous to JSON objects
but exist in the database in a more type-rich format known as
BSON. See Documents.
- dot notation
- MongoDB uses the dot notation to access the elements of an array
and to access the fields of a subdocument. See
- The process of removing or “shedding” chunks from
one shard to another. Administrators must drain shards
before removing them from the cluster. See
Remove Shards from an Existing Sharded Cluster.
- A client library for interacting with MongoDB in a particular
language. See MongoDB Drivers and Client Libraries.
- The process by which members of a replica set select a
primary on startup and in the event of a failure. See
Replica Set Elections.
- eventual consistency
- A property of a distributed system that allows changes to the
system to propagate gradually. In a database system, this means
that readable members are not required to reflect the latest
writes at all times. In MongoDB, reads to a primary have
strict consistency; reads to secondaries have eventual
consistency.
- In the context of the aggregation framework, expressions are
the stateless transformations that operate on the data that passes
through a pipeline. See Aggregation Concepts.
- The process that allows a secondary member of a
replica set to become primary in the event of a
failure. See Replica Set High Availability.
A name-value pair in a document. A document has
zero or more fields. Fields are analogous to columns in relational
databases.
- A system level networking filter that restricts access based on,
among other things, IP address. Firewalls form a part of an
effective network security strategy. See
- A system call that flushes all dirty, in-memory pages to
disk. MongoDB calls fsync() on its database files at least
every 60 seconds. See fsync.
- A geohash value is a binary representation of the location on a
coordinate grid. See Calculation of Geohash Values for 2d Indexes.
A geospatial data interchange format based on JavaScript
Object Notation (JSON). GeoJSON is used in
geospatial queries. For
supported GeoJSON objects, see Location Data.
For the GeoJSON format specification, see
- Data that relates to geographical location. In MongoDB, you may
store, index, and query data according to geographical parameters.
See Geospatial Indexes and Queries.
A convention for storing large files in a MongoDB database. All of
the official MongoDB drivers support this convention, as does the
mongofiles program. See GridFS.
- hashed shard key
- A special type of shard key that uses a hash of the value
in the shard key field to distribute documents among members of
the sharded cluster. See Hashed Index.
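
The idea of distributing documents by a hash of the shard key value can be sketched in Python. This is a simplified illustration, not MongoDB's actual hashing or chunk assignment: the md5-derived 64-bit hash, the modulo placement, and the names `hashed_key` and `shard_for` are all assumptions made for this example.

```python
import hashlib

def hashed_key(value: str) -> int:
    """Map a shard key value to a 64-bit integer (illustrative hash only)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(value: str, shard_count: int) -> int:
    """Pick a shard index from the hash (a simplification of range-based chunks)."""
    return hashed_key(value) % shard_count

# Similar key values hash to unrelated integers, so they spread across shards.
placements = [shard_for(f"user{n}", 4) for n in range(100)]
print(sorted(set(placements)))
```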
- haystack index
- A geospatial index that enhances searches by creating
“buckets” of objects grouped by a second criterion. See
- A replica set member that cannot become primary
and is invisible to client applications. See
Hidden Replica Set Members.
- The quality of an operation to produce the same result given the
same input, whether run once or run multiple times.
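
A short Python sketch of the distinction, using plain dictionaries as hypothetical documents: setting a field to a fixed value is idempotent, while incrementing a counter is not.

```python
def set_status(doc: dict, status: str) -> dict:
    """Idempotent: repeating the call leaves the same result."""
    return {**doc, "status": status}

def increment_views(doc: dict) -> dict:
    """Not idempotent: every call changes the result."""
    return {**doc, "views": doc.get("views", 0) + 1}

doc = {"views": 0}
once = set_status(doc, "active")
twice = set_status(once, "active")
print(once == twice)  # applying set_status again changes nothing
```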
- A data structure that optimizes queries. See Index Concepts.
- initial sync
- The replica set operation that replicates data from an
existing replica set member to a new or restored replica set
member. See Initial Sync.
- A revision to the IP (Internet Protocol) standard that
provides a significantly larger address space to more effectively
support the number of hosts on the contemporary Internet.
- The international date format used by mongo
to display dates. The format is: YYYY-MM-DD HH:MM.SS.millis.
- A popular scripting language originally designed for web
browsers. The MongoDB shell and certain server-side functions use
- A sequential, binary transaction log used to bring the database
into a valid state in the event of a hard shutdown.
Journaling writes data first to the journal and then to the core
data files. MongoDB enables journaling by default for 64-bit
builds of MongoDB version 2.0 and newer. Journal files are
pre-allocated and exist as files in the data directory. See
JavaScript Object Notation. A human-readable, plain text format
for expressing structured data with support in many programming
languages. For more information, see http://www.json.org.
Certain MongoDB tools render an approximation of MongoDB
BSON documents in JSON format. See
MongoDB Extended JSON.
- JSON document
- A JSON document is a collection of fields and values in a
structured format. For sample JSON documents, see
- JSON with Padding. Refers to a method of injecting JSON
into applications. Presents potential security concerns.
- legacy coordinate pairs
- The format used for geospatial data prior to MongoDB
version 2.4. This format stores geospatial data as points on a
planar coordinate system (e.g. [ x, y ]). See
Geospatial Indexes and Queries.
- A LineString is defined by an array of two or more positions. A
closed LineString with four or more positions is called a
LinearRing, as described in the GeoJSON LineString specification:
http://geojson.org/geojson-spec.html#linestring. To use a
LineString in MongoDB, see
Store GeoJSON Objects.
- Logical volume manager. LVM is a program that abstracts disk
images from physical devices and provides a number of raw disk
manipulation and snapshot capabilities useful for system
management. For information on LVM and MongoDB, see
Backup and Restore Using LVM on a Linux System.
- A data processing and aggregation paradigm consisting of a “map”
phase that selects data and a “reduce” phase that transforms the
data. In MongoDB, you can run arbitrary aggregations over data
using map-reduce. For map-reduce implementation, see
Map-Reduce. For all approaches to aggregation,
see Aggregation Concepts.
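
The two phases can be sketched in plain Python. This shows the general paradigm only, not MongoDB's map-reduce implementation; the sample `orders` data and the function names are hypothetical.

```python
from collections import defaultdict

orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]

def map_phase(docs):
    """Select data: emit a (key, value) pair for each document."""
    for doc in docs:
        yield doc["customer"], doc["amount"]

def reduce_phase(pairs):
    """Transform data: combine all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

totals = reduce_phase(map_phase(orders))
print(totals)
```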
- mapping type
- A structure in programming languages that associates keys with
values, where keys may nest other pairs of keys and values
(e.g. dictionaries, hashes, maps, and associative arrays).
The properties of these structures depend on the language
specification and implementation. Generally the order of keys in
mapping types is arbitrary and not guaranteed.
- The database that receives all writes in a conventional
master-slave replication. In MongoDB, replica
sets replace master-slave replication for most use
cases. For more information on master-slave replication, see
Master Slave Replication.
- A hashing algorithm used to efficiently provide
reproducible unique strings to identify and checksum
data. MongoDB uses md5 to identify chunks of data for
GridFS. See filemd5.
- Multipurpose Internet Mail Extensions. A standard set of type and
encoding definitions used to declare the encoding and type of data
in multiple data storage, transmission, and email contexts. The
mongofiles tool provides an option to specify a MIME
type to describe a file inserted into GridFS storage.
- The MongoDB shell. The mongo process starts the MongoDB
shell as a daemon connected to either a mongod or
mongos instance.
See mongo and mongo Shell Methods.
- The MongoDB database server. The mongod process starts
the MongoDB server as a daemon. The MongoDB server manages data
requests and formats and manages background operations. See
- An open-source document-based database system. “MongoDB” derives
from the word “humongous” because of the database’s ability to
scale up with ease and hold very large amounts of data. MongoDB
stores documents in collections within databases.
- The routing and load balancing process that acts as an interface
between an application and a MongoDB sharded cluster. See
- The canonical name for a collection or index in MongoDB.
The namespace is a combination of the database name and
the name of the collection or index, like so:
[database-name].[collection-or-index-name]. All documents
belong to a namespace. See What is a namespace in MongoDB?.
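
A minimal Python sketch of composing and splitting a namespace string. The helper names are hypothetical; the split is made at the first dot on the assumption that the database name itself contains no dot, so that collection names such as system.profile stay intact.

```python
from typing import Tuple

def make_namespace(database: str, collection: str) -> str:
    """Join a database name and a collection or index name with a dot."""
    return f"{database}.{collection}"

def split_namespace(namespace: str) -> Tuple[str, str]:
    """Split at the first dot only, since the second part may contain dots."""
    database, _, collection = namespace.partition(".")
    return database, collection

print(make_namespace("test", "users"))
print(split_namespace("test.system.profile"))
```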
- natural order
- The order that a database stores documents on disk. Typically,
the order of documents on disk reflects insertion order, except
when a document moves internally because an update operation
increases its size. In capped collections, documents do not move internally, and therefore
insertion order and natural order are identical in capped
collections. MongoDB returns documents in forward natural order
for a find() query with no parameters.
MongoDB returns documents in reverse natural order for a
find() query sorted with a parameter of $natural:-1. See
- A special 12-byte BSON type that guarantees uniqueness
within the collection. The ObjectId is generated based on
timestamp, machine ID, process ID, and a process-local incremental
counter. MongoDB uses ObjectId values as the default values for
_id fields.
- A keyword beginning with a $ used to express an update,
complex query, or data transformation. For example, $gt is the
query language’s “greater than” operator. For available operators,
- A capped collection that stores an ordered history of
logical writes to a MongoDB database. The oplog is the
basic mechanism enabling replication in MongoDB.
See Replica Set Oplog.
- ordered query plan
- A query plan that returns results in the order consistent with the
sort() order. See
- The extra space allocated to a document on disk to prevent
moving the document when it grows as the result of update
operations. See Padding Factor.
- padding factor
- An automatically-calibrated constant used to determine how much
extra space MongoDB should allocate per document container on disk.
A padding factor of 1 means that MongoDB will allocate only the
amount of space needed for the document. A padding factor of 2
means that MongoDB will allocate twice the amount of space
required by the document. See
- page fault
- The event that occurs when a process requests stored data
(i.e. a page) from memory that the operating system has moved to
disk. See What are page faults?.
- A distributed system architecture that splits data into ranges.
Sharding uses partitioning. See
- passive member
- A member of a replica set that cannot become primary
because its priority is
0. See Priority 0 Replica Set Members.
- A packet-capture format used by mongosniff to record
packets captured from network interfaces and display them as
human-readable MongoDB operations. See Options.
- A process identifier. UNIX-like systems assign a unique integer
PID to each running process. You can use a PID to inspect a
running process and send signals to it. See
/proc File System.
- A communication channel in UNIX-like systems allowing independent
processes to send and receive data. In the UNIX shell, piped
operations allow users to direct the output of one command into
the input of another.
A series of operations in an aggregation process.
See Aggregation Concepts.
- A single coordinate pair as described in the GeoJSON Point
specification: http://geojson.org/geojson-spec.html#point. To
use a Point in MongoDB, see
Store GeoJSON Objects.
An array of LinearRing coordinate arrays, as
described in the GeoJSON Polygon specification:
http://geojson.org/geojson-spec.html#polygon. For Polygons
with multiple rings, the first must be the exterior ring and
any others must be interior rings or holes.
MongoDB does not permit the exterior ring to self-intersect.
Interior rings must be fully contained within the outer loop and
cannot intersect or overlap with each other. See
Store GeoJSON Objects.
- A per-collection setting that changes and normalizes the way
MongoDB allocates space for each document, in an effort to
maximize storage reuse and to reduce fragmentation. This is the
default for TTL Collections. See
- An operation performed before inserting data that divides the
range of possible shard key values into chunks to facilitate easy
insertion and high write throughput. In some cases pre-splitting
expedites the initial distribution of documents in a sharded
cluster by manually dividing the collection rather than waiting
for the MongoDB balancer to do so. See
Create Chunks in a Sharded Cluster.
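
The idea of dividing a key range ahead of time can be sketched in Python. This is an illustration under stated assumptions, not a MongoDB command: it assumes integer shard key values, evenly sized chunks, and the hypothetical helpers `split_points` and `to_chunks`.

```python
from typing import List, Tuple

def split_points(lower: int, upper: int, chunk_count: int) -> List[int]:
    """Return the boundaries that divide [lower, upper) into equal chunks."""
    step = (upper - lower) // chunk_count
    return [lower + step * i for i in range(1, chunk_count)]

def to_chunks(lower: int, upper: int, points: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive boundaries into (lower, upper) chunk ranges."""
    bounds = [lower, *points, upper]
    return list(zip(bounds[:-1], bounds[1:]))

points = split_points(0, 100, 4)
chunks = to_chunks(0, 100, points)
print(points)
print(chunks)
```

With the ranges defined before any data arrives, inserts can land in several chunks from the start instead of a single initial chunk.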
- In a replica set, the primary member is the current
master instance, which receives all write operations.
- primary key
- A record’s unique immutable identifier. In an RDBMS, the primary
key is typically an integer stored in each row’s id field.
In MongoDB, the _id field holds a document’s primary
key, which is usually a BSON ObjectId.
- primary shard
- The shard that holds all the un-sharded collections. See
- A configurable value that helps determine which members in
a replica set are most likely to become primary.
- A document given to a query that specifies which fields
MongoDB returns in the result set. See Limit Fields to Return from a Query. For a
list of projection operators, see
- A read request. MongoDB uses a JSON-like query language
that includes a variety of query operators with
names that begin with a $ character. In the mongo
shell, you can issue queries using the find() and
findOne() methods. See
- query optimizer
- A process that generates query plans. For each query, the
optimizer generates a plan that matches the query to the index
that will return results as efficiently as possible. The
optimizer reuses the query plan each time the query runs. If a
collection changes significantly, the optimizer creates a new
query plan. See Query Plans.
- Relational Database Management System. A database management
system based on the relational model, typically using
SQL as the query language.
- read lock
- In the context of a reader-writer lock, a lock that, while held,
allows concurrent readers but no writers. See
What type of locking does MongoDB use?.
- read preference
- A setting that determines how clients direct read operations. Read
preference affects all replica sets, including shards. By default,
MongoDB directs reads to primaries for
strict consistency. However, you may also direct reads to
secondaries for eventually consistent reads. See Read Preference.
- record size
- The space allocated for a document including the padding. For more
information on padding, see Padding Factor.
- A replica set member status indicating that a member
is not ready to begin normal activities of a secondary or primary.
Recovering members are unavailable for reads.
- replica pairs
- The precursor to MongoDB replica sets.
Deprecated since version 1.6.
- replica set
- A cluster of MongoDB servers that implements master-slave
replication and automated failover. MongoDB’s recommended
replication strategy. See Replication.
- A feature allowing multiple database servers to share the same
data, thereby ensuring redundancy and facilitating load balancing.
- replication lag
- The length of time between the last operation in the
primary’s oplog and the last operation
applied to a particular secondary. In general, you want to
keep replication lag as small as possible. See Replication
- resident memory
- The subset of an application’s memory currently stored in
physical RAM. Resident memory is a subset of virtual memory,
which includes memory mapped to physical RAM and to disk.
- An API design pattern centered around the idea of resources and the
CRUD operations that apply to them. Typically REST is
implemented over HTTP. MongoDB provides a simple HTTP REST
interface that allows HTTP clients to run commands against the
server. See REST Interface and REST API.
- A process that reverts write operations to ensure the consistency
of all replica set members. See Rollbacks During Replica Set Failover.
- A replica set member that replicates the contents of the
master database. Secondary members may handle read requests, but
only the primary members can handle write operations. See
- secondary index
- A database index that improves query performance by
minimizing the amount of work that the query engine must perform
to fulfill a query. See Indexes.
- set name
- The arbitrary name given to a replica set. All members of a
replica set must have the same name specified with the
replSet setting or the --replSet option.
- A single mongod instance or replica set that
stores some portion of a sharded cluster’s total data set. In production, all shards should be
replica sets. See Shards.
- shard key
- The field MongoDB uses to distribute documents among members of a
sharded cluster. See Shard Keys.
- sharded cluster
- The set of nodes comprising a sharded MongoDB
deployment. A sharded cluster consists of three config processes,
one or more replica sets, and one or more mongos
routing processes. See Sharded Cluster Components.
- A database architecture that partitions data by key ranges and
distributes the data among two or more database instances.
Sharding enables horizontal scaling. See Sharding.
- shell helper
- A method in the mongo shell that provides a more concise
syntax for a database command. Shell helpers
improve the general interactive experience. See
mongo Shell Methods.
- single-master replication
- A replication topology where only a single database
instance accepts writes. Single-master replication ensures
consistency and is the replication topology employed by MongoDB.
See Replica Set Primary.
- A read-only database that replicates operations from a
master database in conventional master/slave replication.
In MongoDB, replica sets replace
master/slave replication for most use cases. However, for
information on master/slave replication, see
Master Slave Replication.
- The division between chunks in a sharded
cluster. See Chunk Splits in a Sharded Cluster.
- Structured Query Language (SQL) is a common special-purpose
programming language used for interaction with a relational
database, including access control, insertions,
updates, queries, and deletions. There are some similar
elements in the basic SQL syntax supported by different database
vendors, but most implementations have their own dialects, data
types, and interpretations of proposed SQL standards. Complex
SQL is generally not directly portable between major
RDBMS products. SQL is often used as a
metonym for relational databases.
- Solid State Disk. A high-performance disk drive that uses solid
state electronics for persistence, as opposed to the rotating platters
and movable read/write heads used by traditional mechanical hard drives.
- An instance of mongod that is running as a single
server and not as part of a replica set. To convert a
standalone into a replica set, see
Convert a Standalone to a Replica Set.
- strict consistency
- A property of a distributed system requiring that all members
always reflect the latest changes to the system. In a database
system, this means that any system that can provide data must
reflect the latest writes at all times. In MongoDB, reads from a
primary have strict consistency; reads from secondary
members have eventual consistency.
- The replica set operation where members replicate data
from the primary. Sync first occurs when MongoDB creates
or restores a member, which is called initial sync. Sync
then occurs continually to keep the member updated with changes to
the replica set’s data. See Replica Set Data Synchronization.
- On UNIX-like systems, a logging process that provides a uniform
standard for servers and processes to submit logging information.
MongoDB provides an option to send output to the host’s syslog
system. See syslog.
- A label applied to a replica set member or shard and used by
clients to issue data-center-aware operations. For more information
on using tags with replica sets and with shards, see the following
sections of this manual: Tag Sets
and Behavior and Operations.
- A text-based data format consisting of tab-separated values.
This format is commonly used to exchange data between relational
databases, since the format is well-suited to tabular data. You can
import TSV files using mongoimport.
- Stands for “time to live” and represents an expiration time or
period for a given piece of information to remain in a cache or
other temporary storage before the system deletes it or ages it
out. MongoDB has a TTL collection feature. See
Expire Data from Collections by Setting TTL.
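
The expiration rule behind a time to live can be sketched in Python. This models only the general TTL concept, not how MongoDB removes documents; the function name `is_expired` and the one-hour TTL are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

def is_expired(created_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    """An item expires once its age reaches the time-to-live period."""
    return now >= created_at + timedelta(seconds=ttl_seconds)

created = datetime(2024, 1, 1, tzinfo=timezone.utc)

still_live = is_expired(created, 3600, created + timedelta(minutes=59))
aged_out = is_expired(created, 3600, created + timedelta(hours=1))
print(still_live, aged_out)
```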
- unique index
- An index that enforces uniqueness for a particular field across
a single collection. See Unique Indexes.
- unordered query plan
- A query plan that returns results in an order inconsistent with the
sort() order. See Query Plans.
- An operation that will either update the first document matched by
a query or insert a new document if none matches. The new document
will have the fields implied by the operation. You perform upserts
with the update() operation. See
- virtual memory
- An application’s working memory, typically residing on both
disk and in physical RAM.
- The default datum MongoDB uses to calculate geometry over
an Earth-like sphere. MongoDB uses the WGS84 datum for
geospatial queries on GeoJSON objects. See
the “EPSG:4326: WGS 84” specification:
- working set
- The data that MongoDB uses most often. This data is preferably
held in RAM, solid-state drive (SSD), or other fast media. See
What is the working set?.
- write concern
- Specifies whether a write operation has succeeded. Write concern
allows your application to detect insertion errors or unavailable
mongod instances. For replica sets, you can configure write concern to confirm replication to a
specified number of members. See Write Concern.
- write lock
- A lock on the database for a given writer. When a process writes
to the database, it takes an exclusive write lock to prevent other
processes from writing or reading. For more information on locks,
see FAQ: Concurrency.
- The process within the sharding system that ensures that writes
issued to a shard that is not responsible for the
relevant chunk get applied to the proper shard. For related
information, see What does writebacklisten in the log mean? and