Glossary
- $cmd
- A virtual collection that exposes MongoDB’s database commands.
- _id
- A field containing a unique ID, typically a BSON ObjectId. If not specified, this value is automatically assigned upon the creation of a new document. You can think of the _id as the document’s primary key.
- accumulator
- An expression in the aggregation framework that maintains state between documents in the aggregation pipeline. See $group for a list of accumulator operations.
- admin database
- A privileged database named admin. Users must have access to this database to run certain administrative commands. See administrative commands for more information and Administration Commands for a list of these commands.
- aggregation
- Any of a variety of operations that reduce and summarize large sets of data. SQL’s GROUP BY and MongoDB’s map-reduce are two examples of aggregation functions.
- aggregation framework
The MongoDB aggregation framework provides a means to calculate aggregate values without having to use map-reduce.
- arbiter
A member of a replica set that exists solely to vote in elections. Arbiters do not replicate data.
- balancer
- An internal MongoDB process that runs in the context of a sharded cluster and manages the migration of chunks. Administrators must disable the balancer for all maintenance operations on a sharded cluster.
- box
- MongoDB’s geospatial indexes and querying system allow you to build queries around rectangles on two-dimensional coordinate systems. These queries use the $box operator to define a shape using the lower-left and the upper-right coordinates.
- BSON
A serialization format used to store documents and make remote procedure calls in MongoDB. “BSON” is a portmanteau of the words “binary” and “JSON”. Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. For a detailed spec, see bsonspec.org.
See also
The Data Type Fidelity section.
- BSON types
The set of types supported by the BSON serialization format. The following types are available:
Type                     Number
Double                   1
String                   2
Object                   3
Array                    4
Binary data              5
Object id                7
Boolean                  8
Date                     9
Null                     10
Regular Expression       11
JavaScript               13
Symbol                   14
JavaScript (with scope)  15
32-bit integer           16
Timestamp                17
64-bit integer           18
Min key                  255
Max key                  127
- btree
- A data structure used by most database management systems to store indexes. MongoDB uses b-trees for its indexes.
- CAP Theorem
- Given three properties of computing systems, consistency, availability, and partition tolerance, a distributed computing system can provide any two of these features, but never all three.
- capped collection
A fixed-sized collection. Once they reach their fixed size, capped collections automatically overwrite their oldest entries. MongoDB’s oplog replication mechanism depends on capped collections. Developers may also use capped collections in their applications.
See also
The Capped Collections page.
- checksum
- A calculated value used to ensure data integrity. The md5 algorithm is sometimes used as a checksum.
- chunk
- In the context of a sharded cluster, a chunk is a contiguous range of shard key values assigned to a particular shard. Chunk ranges are inclusive of the lower boundary and exclusive of the upper boundary. By default, chunks are 64 megabytes or less. When they grow beyond the configured chunk size, a mongos splits the chunk into two chunks.
- circle
- MongoDB’s geospatial indexes and querying system allow you to build queries around circles on two-dimensional coordinate systems. These queries use the $within operator and the $center operator to define a circle using the center and the radius of the circle.
- client
- The application layer that uses a database for data persistence and storage. Drivers provide the interface level between the application layer and the database server.
- cluster
- A set of mongod instances running in conjunction to increase database availability and performance. See sharding and replication for more information on the two different approaches to clustering with MongoDB.
- collection
Collections are groupings of BSON documents. Collections do not enforce a schema, but they are otherwise mostly analogous to RDBMS tables.
The documents within a collection need not have the exact same set of fields, but typically all documents in a collection have a similar or related purpose for an application.
All collections exist within a single database. The namespace for collections within a database is flat.
See What is a namespace in MongoDB? and BSON Documents for more information.
- compound index
- An index consisting of two or more keys. See Indexing Overview for more information.
- config database
- One of three mongod instances that store all of the metadata associated with a sharded cluster.
- control script
- A simple shell script, typically located in the /etc/rc.d or /etc/init.d directory and used by the system’s initialization process to start, restart and stop a daemon process.
- control script
- A script used by a UNIX-like operating system to start, stop, or restart a daemon process. On most systems, you can find these scripts in the /etc/init.d/ or /etc/rc.d/ directories.
- CRUD
- Create, read, update, and delete. The fundamental operations of any database.
- CSV
- A text-based data format consisting of comma-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import CSV files using mongoimport.
- cursor
- In MongoDB, a cursor is a pointer to the result set of a query that clients can iterate through to retrieve results. By default, cursors time out after 10 minutes of inactivity.
- daemon
- The conventional name for a background, non-interactive process.
- data-center awareness
A property that allows clients to address members in a system based upon their location.
Replica sets implement data-center awareness using tagging. See Data Center Awareness for more information.
- database
- A physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically serves multiple databases.
- database command
Any MongoDB operation other than an insert, update, remove, or query. MongoDB exposes commands as queries against the special $cmd collection. For example, the implementation of count for MongoDB is a command.
See also
Database Commands Quick Reference for a full list of database commands in MongoDB
- database profiler
A tool that, when enabled, keeps a record of all long-running operations in a database’s system.profile collection. The profiler is most often used to diagnose slow queries.
- dbpath
Refers to the location of MongoDB’s data file storage. The default dbpath is /data/db. Other common data paths include /srv/mongodb and /var/lib/mongodb.
- delayed member
A member of a replica set that cannot become primary and applies operations at a specified delay. This delay is useful for protecting data from human error (e.g. unintentionally deleted databases) or updates that have unforeseen effects on the production database.
- diagnostic log
mongod can create a verbose log of operations with the mongod --diaglog option or through the diagLogging command. The mongod creates this log in the directory specified to mongod --dbpath. The name of the file is diaglog.<time-in-hex>, where “<time-in-hex>” reflects the initiation time of logging as a hexadecimal string.
Warning
Setting the diagnostic level to 0 will cause mongod to stop writing data to the diagnostic log file. However, the mongod instance will continue to keep the file open, even if it is no longer writing data to the file. If you want to rename, move, or delete the diagnostic log you must cleanly shut down the mongod instance before doing so.
See also
mongod --diaglog, diaglog, and diagLogging.
- document
- A record in a MongoDB collection, and the basic unit of data in MongoDB. Documents are analogous to JSON objects, but exist in the database in a more type-rich format known as BSON.
- dot notation
MongoDB uses the dot notation to access the elements of an array and to access the fields of a subdocument.
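As a rough illustration of how a dotted path is read, the following Python sketch walks plain dicts and lists. It is not MongoDB's implementation; the resolve helper and the sample document are hypothetical and exist only for this example.

```python
def resolve(document, path):
    """Follow a dotted path through nested dicts (subdocuments) and lists (arrays)."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            # Inside an array, the path component is a zero-based index.
            current = current[int(part)]
        else:
            # Inside a subdocument, the path component is a field name.
            current = current[part]
    return current


# Hypothetical document used only for illustration.
doc = {
    "contribs": ["Turing machine", "Turing test"],
    "name": {"first": "Alan", "last": "Turing"},
}

print(resolve(doc, "contribs.1"))  # second array element
print(resolve(doc, "name.last"))   # field of a subdocument
```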
To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position.
To access a field of a subdocument with dot-notation, you concatenate the subdocument name with the dot (.) and the field name.
- draining
The process of removing or “shedding” chunks from one shard to another. Administrators must drain shards before removing them from the cluster.
- driver
A client implementing the communication protocol required for talking to a server. The MongoDB drivers provide language-idiomatic methods for interfacing with MongoDB.
- election
In the context of replica sets, an election is the process by which members of a replica set select primaries on startup and in the event of failures.
- eventual consistency
- A property of a distributed system allowing changes to the system to propagate gradually. In a database system, this means that readable members are not required to reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondaries have eventual consistency.
- expression
In the context of the aggregation framework, expressions are the stateless transformations that operate on the data that passes through the pipeline.
- failover
The process that allows one of the secondary members in a replica set to become primary in the event of a failure.
- field
- A name-value pair in a document. Documents have zero or more fields. Fields are analogous to columns in relational databases.
- firewall
- A system-level networking filter that restricts access based on, among other things, IP address. Firewalls form part of an effective network security strategy.
- fsync
- A system call that flushes all dirty, in-memory pages to disk. MongoDB calls fsync() on its database files at least every 60 seconds.
- Geohash
- A geohash value is a binary representation of a location on a coordinate grid.
- geospatial
- Data that relates to geographical location. In MongoDB, you may index or store geospatial data according to geographical parameters and reference specific coordinates in queries.
- GridFS
A convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the mongofiles program.
- haystack index
- In the context of geospatial queries, haystack indexes enhance searches by creating “buckets” of objects grouped by a second criterion. For example, you might want all geospatial searches to first select along a non-geospatial dimension and then match on location.
- hidden member
A member of a replica set that cannot become primary and is not advertised as part of the set in the database command isMaster, which prevents it from receiving read-only queries depending on read preference.
See also
Hidden Member, isMaster, db.isMaster, and local.system.replset.members[n].hidden.
- idempotent
.- idempotent
- When calling an idempotent operation on a value or state, the operation only affects the value once. Thus, the operation can safely run multiple times without unwanted side effects. In the context of MongoDB, oplog entries must be idempotent to support initial synchronization and recovery from certain failure situations. Thus, MongoDB can safely apply oplog entries more than once without any ill effects.
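To make the distinction concrete, here is a small Python sketch that contrasts an idempotent change (setting a field to a fixed value) with a non-idempotent one (incrementing it). The helper names and the sample document are hypothetical, and the sketch does not model the actual oplog format.

```python
def apply_set(doc, field, value):
    """Idempotent: the result is the same however many times it is applied."""
    updated = dict(doc)
    updated[field] = value
    return updated


def apply_increment(doc, field, amount):
    """Not idempotent: every application changes the value again."""
    updated = dict(doc)
    updated[field] = updated.get(field, 0) + amount
    return updated


start = {"_id": 1, "count": 5}

set_once = apply_set(start, "count", 6)
set_twice = apply_set(set_once, "count", 6)
inc_once = apply_increment(start, "count", 1)
inc_twice = apply_increment(inc_once, "count", 1)

print(set_once == set_twice)  # True: replaying the set is harmless
print(inc_once == inc_twice)  # False: replaying the increment is not
```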
- index
- A data structure that optimizes queries. See Indexing Overview for more information.
- initial sync
- The replica set operation that replicates data from an existing replica set member to a new or restored replica set member.
- IPv6
- A revision to the IP (Internet Protocol) standard that provides a significantly larger address space to more effectively support the number of hosts on the contemporary Internet.
- ISODate
- The international date format used by mongo to display dates, e.g. YYYY-MM-DD HH:MM.SS.millis.
- JavaScript
- A popular scripting language originally designed for web browsers. The MongoDB shell and certain server-side functions use a JavaScript interpreter.
- journal
A sequential, binary transaction log used to bring the database into a consistent state in the event of a hard shutdown. MongoDB enables journaling by default for 64-bit builds of MongoDB version 2.0 and newer. Journal files are pre-allocated and will exist as three 1GB files in the data directory. To make journal files smaller, use smallfiles.
When enabled, MongoDB writes data first to the journal and then to the core data files. MongoDB commits to the journal within 100ms, which is configurable using the journalCommitInterval runtime option.
To force mongod to commit to the journal more frequently, you can specify j:true. When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.
See also
The Journaling page.
- JSON
- JavaScript Object Notation. A human-readable, plain text format for expressing structured data with support in many programming languages.
- JSON document
A JSON document is a collection of fields and values in a structured format. The following is a sample JSON document with two fields:
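A minimal sketch in Python, with hypothetical field names and values, that builds such a two-field document and serializes it to JSON text:

```python
import json

# Hypothetical two-field document; the field names and values are illustrative only.
sample = {"name": "MongoDB", "type": "database"}

# Serialize the document to JSON text.
text = json.dumps(sample)
print(text)  # {"name": "MongoDB", "type": "database"}
```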
- JSONP
- JSON with Padding. Refers to a method of injecting JSON into applications. Presents potential security concerns.
- LVM
- Logical volume manager. LVM is a program that abstracts disk images from physical devices, and provides a number of raw disk manipulation and snapshot capabilities useful for system management.
- map-reduce
A data processing and aggregation paradigm consisting of a “map” phase that selects data, and a “reduce” phase that transforms the data. In MongoDB, you can run arbitrary aggregations over data using map-reduce.
See also
The Map-Reduce page for more information regarding MongoDB’s map-reduce implementation, and Aggregation Framework for another approach to data aggregation in MongoDB.
- master
- In conventional master/slave replication, the master database receives all writes. The slave instances replicate from the master instance in real time.
- md5
md5 is a hashing algorithm used to efficiently provide reproducible unique strings to identify and checksum data. MongoDB uses md5 to identify chunks of data for GridFS.
- MIME
- “Multipurpose Internet Mail Extensions.” A standard set of type and encoding definitions used to declare the encoding and type of data in multiple data storage, transmission, and email contexts.
- mongo
The MongoDB Shell. mongo connects to mongod and mongos instances, allowing administration, management, and testing. mongo has a JavaScript interface.
- mongod
The program implementing the MongoDB database server. This server typically runs as a daemon.
- MongoDB
- The document-based database server described in this manual.
- mongos
The routing and load balancing process that acts as an interface between an application and a MongoDB sharded cluster.
- multi-master replication
- A replication method where multiple database instances can accept write operations to the same data set at any time. Multi-master replication exchanges increased concurrency and availability for a relaxed consistency semantic. MongoDB ensures consistency and, therefore, does not provide multi-master replication.
- namespace
- The canonical name for a collection or index in MongoDB. The namespace is a combination of the database name and the name of the collection or index, like so: [database-name].[collection-or-index-name]. All documents belong to a namespace.
- natural order
The order in which a database stores documents on disk. Typically, the order of documents on disk reflects insertion order, except when documents move internally because of document growth due to update operations. However, capped collections guarantee that insertion order and natural order are identical.
When you execute find() with no parameters, the database returns documents in forward natural order. When you execute find() and include sort() with a parameter of $natural:-1, the database returns documents in reverse natural order.
- ObjectId
- A special 12-byte BSON type that has a high probability of being unique when generated. The most significant bytes of an ObjectId represent the time of the ObjectId’s creation. MongoDB uses ObjectId values as the default values for _id fields.
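Because the leading four bytes of an ObjectId hold a big-endian count of seconds since the Unix epoch, the creation time can be recovered from its hexadecimal string form. The following Python sketch uses only the standard library; the function name and the sample value are illustrative, not part of any driver API.

```python
from datetime import datetime, timezone


def objectid_creation_time(oid_hex):
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hexadecimal digits")
    # The first 4 bytes (8 hex digits) are seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Sample ObjectId value used only for illustration.
created = objectid_creation_time("507f1f77bcf86cd799439011")
print(created.isoformat())  # 2012-10-17T21:13:27+00:00
```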
- operator
- A keyword beginning with a $ used to express a complex query, update, or data transformation. For example, $gt is the query language’s “greater than” operator. See the Query, Update, and Projection Operators Quick Reference for more information about the available operators.
- oplog
A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.
- ordered query plan
Query plan that returns results in the order consistent with the sort() order.
- padding
- The extra space allocated to a document on disk to prevent moving the document when it grows as the result of update() operations.
- padding factor
- An automatically-calibrated constant used to determine how much extra space MongoDB should allocate per document container on disk. A padding factor of 1 means that MongoDB will allocate only the amount of space needed for the document. A padding factor of 2 means that MongoDB will allocate twice the amount of space required by the document.
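The arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and truncating the result to whole bytes is an assumption of this sketch rather than documented behavior.

```python
def allocated_size(document_bytes, padding_factor):
    """Space reserved for a document: its size multiplied by the padding factor."""
    if padding_factor < 1:
        raise ValueError("this sketch assumes a padding factor of at least 1")
    # Assumption: truncate to a whole number of bytes.
    return int(document_bytes * padding_factor)


print(allocated_size(100, 1))    # 100: only the space the document needs
print(allocated_size(100, 2))    # 200: twice the space the document needs
print(allocated_size(100, 1.5))  # 150: half of the document size as padding
```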
- page fault
The event that occurs when a process requests stored data (i.e. a page) from memory that the operating system has moved to disk.
- partition
- A distributed system architecture that splits data into ranges. Sharding is a kind of partitioning.
- pcap
- A packet capture format used by mongosniff to record packets captured from network interfaces and display them as human-readable MongoDB operations.
- PID
- A process identifier. On UNIX-like systems, a unique integer PID is assigned to each running process. You can use a PID to inspect a running process and send signals to it.
- pipe
- A communication channel in UNIX-like systems allowing independent processes to send and receive data. In the UNIX shell, piped operations allow users to direct the output of one command into the input of another.
- pipeline
The series of operations in the aggregation process.
- polygon
- MongoDB’s geospatial indexes and querying system allow you to build queries around multi-sided polygons on two-dimensional coordinate systems. These queries use the $within operator and a sequence of points that define the corners of the polygon.
- powerOf2Sizes
A per-collection setting that changes and normalizes the way that MongoDB allocates space for each document in an effort to maximize storage reuse and reduce fragmentation. This is the default for TTL Collections. See collMod and usePowerOf2Sizes for more information.
New in version 2.2.
- pre-splitting
- An operation, performed before inserting data that divides the range of possible shard key values into chunks to facilitate easy insertion and high write throughput. When deploying a sharded cluster, in some cases pre-splitting will expedite the initial distribution of documents among shards by manually dividing the collection into chunks rather than waiting for the MongoDB balancer to create chunks during the course of normal operation.
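As a sketch of the idea only, the following Python function computes evenly spaced split points for a hypothetical integer shard key range. It does not call any MongoDB command; the function name, the key range, and the even spacing are assumptions made for illustration.

```python
def split_points(low, high, chunks):
    """Return the interior boundaries that divide [low, high) into equal-width chunks."""
    if chunks < 1 or high <= low:
        raise ValueError("need at least one chunk and a non-empty key range")
    width = (high - low) / chunks
    # One boundary between each pair of neighbouring chunks.
    return [low + round(width * i) for i in range(1, chunks)]


# Hypothetical shard key range 0..1000 divided into 4 chunks.
points = split_points(0, 1000, 4)
print(points)  # [250, 500, 750]
```

Each returned value would mark the lower boundary of one chunk and the upper boundary of the chunk before it.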
- primary
- In a replica set, the primary member is the current master instance, which receives all write operations.
- primary key
- A record’s unique, immutable identifier. In an RDBMS, the primary key is typically an integer stored in each row’s id field. In MongoDB, the _id field holds a document’s primary key, which is usually a BSON ObjectId.
- primary shard
- For a database where sharding is enabled, the primary shard holds all un-sharded collections.
- priority
In the context of replica sets, priority is a configurable value that helps determine which members in a replica set are most likely to become primary.
- projection
- A document given to a query that specifies which fields MongoDB will return from the documents in the result set.
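The effect of a projection can be imitated on plain Python dicts. This sketch handles only inclusion-style projections on top-level fields and keeps _id unless it is explicitly excluded; the project helper and the sample document are hypothetical.

```python
def project(document, projection):
    """Return a copy of document containing only the fields the projection includes."""
    wanted = {name for name, keep in projection.items() if keep}
    # _id is returned by default unless the projection sets it to 0.
    if projection.get("_id", 1):
        wanted.add("_id")
    return {key: value for key, value in document.items() if key in wanted}


# Hypothetical document used only for illustration.
person = {"_id": 1, "name": "Ada", "age": 36, "city": "London"}

print(project(person, {"name": 1}))            # {'_id': 1, 'name': 'Ada'}
print(project(person, {"name": 1, "_id": 0}))  # {'name': 'Ada'}
```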
- query
- A read request. MongoDB queries use a JSON-like query language that includes a variety of query operators with names that begin with a $ character. In the mongo shell, you can issue queries using the db.collection.find() and db.collection.findOne() methods.
- query optimizer
For each query, the MongoDB query optimizer generates a query plan that matches the query to the index that produces the fastest results. The optimizer then uses the query plan each time the mongod receives the query. If a collection changes significantly, the optimizer creates a new query plan.
- RDBMS
- Relational Database Management System. A database management system based on the relational model, typically using SQL as the query language.
- read preference
A setting on the MongoDB drivers that determines how the clients direct read operations. Read preference affects all replica sets including shards. By default, drivers direct all reads to primaries for strict consistency. However, you may also direct reads to secondaries for eventually consistent reads.
- read-lock
- In the context of a reader-writer lock, a lock that while held allows concurrent readers, but no writers.
- record size
- The space allocated for a document including the padding.
- recovering
- A replica set member status indicating that a member is not ready to begin normal activities of a secondary or primary. Recovering members are unavailable for reads.
- replica pairs
The precursor to the MongoDB replica sets.
Deprecated since version 1.6.
- replica set
A cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB’s recommended replication strategy.
- replication
A feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing. MongoDB supports two flavors of replication: master-slave replication and replica sets.
See also
replica set, sharding, Replication, and Replica Set Fundamental Concepts.
- replication lag
The length of time between the last operation in the primary’s oplog and the last operation applied to a particular secondary or slave. In general, you want to keep replication lag as small as possible.
- resident memory
- The subset of an application’s memory currently stored in physical RAM. Resident memory is a subset of virtual memory, which includes memory mapped to physical RAM and to disk.
- REST
- An API design pattern centered around the idea of resources and the CRUD operations that apply to them. Typically implemented over HTTP. MongoDB provides a simple HTTP REST interface that allows HTTP clients to run commands against the server.
- rollback
- A process that, in certain replica set situations, reverts write operations to ensure the consistency of all replica set members.
- secondary
- In a replica set, the secondary members are the current slave instances that replicate the contents of the master database. Secondary members may handle read requests, but only the primary members can handle write operations.
- secondary index
- A database index that improves query performance by minimizing the amount of work that the query engine must perform to fulfill a query.
- set name
In the context of a replica set, the set name refers to an arbitrary name given to a replica set when it’s first configured. All members of a replica set must have the same name specified with the replSet setting (or --replSet option for mongod).
See also
replication, Replication and Replica Set Fundamental Concepts.
- shard
A single replica set that stores some portion of a sharded cluster’s total data set. See sharding.
See also
The documents in the Sharding section of the manual.
- shard key
- In a sharded collection, a shard key is the field that MongoDB uses to distribute documents among members of the sharded cluster.
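A simplified picture of how a shard key value selects a shard, written as a Python sketch. The chunk boundaries and shard names are hypothetical, and the lookup follows the convention that chunk ranges include their lower boundary and exclude their upper boundary. It is not how mongos is implemented.

```python
from bisect import bisect_right

# Hypothetical chunk layout: each chunk starts at its lower bound (inclusive)
# and ends just before the next chunk's lower bound (exclusive).
CHUNK_LOWER_BOUNDS = [float("-inf"), 100, 200]
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]


def shard_for(shard_key_value):
    """Return the shard that owns the chunk containing this shard key value."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_SHARDS[index]


print(shard_for(99))   # shardA: below the boundary at 100
print(shard_for(100))  # shardB: the lower boundary is inclusive
print(shard_for(250))  # shardC: the last chunk has no upper boundary here
```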
- sharded cluster
The set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of three config processes, one or more replica sets, and one or more mongos routing processes.
See also
The documents in the Sharding section of the manual.
- sharding
A database architecture that enables horizontal scaling by splitting data into key ranges among two or more replica sets. This architecture is also known as “range-based partitioning.” See shard.
See also
The documents in the Sharding section of the manual.
- shell helper
A number of database commands have “helper” methods in the mongo shell that provide a more concise syntax and improve the general interactive experience.
- single-master replication
- A replication topology where only a single database instance accepts writes. Single-master replication ensures consistency and is the replication topology employed by MongoDB.
- slave
- In conventional master/slave replication, slaves are read-only instances that replicate operations from the master database. Data read from slave instances may not be completely consistent with the master. Therefore, applications requiring consistent reads must read from the master database instance.
- split
- The division between chunks in a sharded cluster.
- SQL
- Structured Query Language (SQL) is a common special-purpose programming language used for interaction with a relational database, including access control as well as inserting, updating, querying, and deleting data. There are some similar elements in the basic SQL syntax supported by different database vendors, but most implementations have their own dialects, data types, and interpretations of proposed SQL standards. Complex SQL is generally not directly portable between major RDBMS products. SQL is often used as a metonym for relational databases.
- SSD
- Solid State Disk. A high-performance disk drive that uses solid state electronics for persistence, as opposed to the rotating platters and movable read/write heads used by traditional mechanical hard drives.
- standalone
- In MongoDB, a standalone is an instance of mongod that is running as a single server and not as part of a replica set.
- strict consistency
- A property of a distributed system requiring that all members always reflect the latest changes to the system. In a database system, this means that any system that can provide data must reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondary members have eventual consistency.
- sync
The replica set operation where members replicate data from the primary. Replica sets synchronize data at two different points:
- Initial sync occurs when MongoDB creates new databases on a new or restored replica set member, populating the member with the replica set’s data.
- “Replication” occurs continually after initial sync and keeps the member updated with changes to the replica set’s data.
- syslog
- On UNIX-like systems, a logging process that provides a uniform standard for servers and processes to submit logging information.
- tag
- One or more labels applied to a given replica set member that clients may use to issue data-center aware operations.
- TSV
- A text-based data format consisting of tab-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import TSV files using mongoimport.
- TTL
- Stands for “time to live,” and represents an expiration time or period for a given piece of information to remain in a cache or other temporary storage system before the system deletes it or ages it out.
- unique index
- An index that enforces uniqueness for a particular field across a single collection.
- unordered query plan
Query plan that returns results in an order inconsistent with the sort() order.
- upsert
- A kind of update that either updates the first document matched in the provided query selector or, if no document matches, inserts a new document having the fields implied by the query selector and the update operation.
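The behavior can be sketched against an in-memory list of dicts. This is an illustration of the semantics only, assuming equality matching on top-level fields; the upsert function below is hypothetical and is not a driver call.

```python
def upsert(collection, query, changes):
    """Update the first document matching query, or insert a new one if none matches."""
    for doc in collection:
        if all(doc.get(field) == value for field, value in query.items()):
            doc.update(changes)
            return "updated"
    # No match: the new document carries the query fields plus the changes.
    new_doc = dict(query)
    new_doc.update(changes)
    collection.append(new_doc)
    return "inserted"


# Hypothetical in-memory "collection" used only for illustration.
people = []
print(upsert(people, {"name": "Ada"}, {"age": 36}))  # inserted
print(upsert(people, {"name": "Ada"}, {"age": 37}))  # updated
print(people)  # [{'name': 'Ada', 'age': 37}]
```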
- virtual memory
- An application’s working memory, typically residing both on disk and in physical RAM.
- working set
- The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.
- write concern
Specifies whether a write operation has succeeded. Write concern allows your application to detect insertion errors or unavailable mongod instances. For replica sets, you can configure write concern to confirm replication to a specified number of members.
See also
Write Concern, Write Operations, and Write Concern for Replica Sets.
- write-lock
- A lock on the database for a given writer. When a process writes to the database, it takes an exclusive write-lock to prevent other processes from writing or reading.
- writeBacks
- The process within the sharding system that ensures that writes issued to a shard that isn’t responsible for the relevant chunk get applied to the proper shard.