MongoDB Manual Contents

Install MongoDB

Installation Guides

MongoDB runs on most platforms, and supports 32-bit and 64-bit architectures. 10gen, the MongoDB makers, provides both binaries and packages. Choose your platform below:

Install MongoDB on Red Hat Enterprise, CentOS, or Fedora Linux

Synopsis

This tutorial outlines the basic installation process for deploying MongoDB on Red Hat Enterprise Linux, CentOS Linux, Fedora Linux and related systems. This procedure uses .rpm packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .rpm packages for easy installation and management for users of CentOS, Fedora and Red Hat Enterprise Linux systems. While some of these distributions include their own MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

Package Options

The 10gen repository contains two packages:

  • mongo-10gen-server

    This package contains the mongod and mongos daemons from the latest stable release and associated configuration and init scripts. Additionally, you can use this package to install daemons from a previous release of MongoDB.

  • mongo-10gen

    This package contains all MongoDB tools from the latest stable release. Additionally, you can use this package to install tools from a previous release of MongoDB. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.

Install MongoDB
Configure Package Management System (YUM)

Create a /etc/yum.repos.d/10gen.repo file to hold information about your repository. If you are running a 64-bit system (recommended), place the following configuration in the /etc/yum.repos.d/10gen.repo file:

[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1

If you are running a 32-bit system, which isn’t recommended for production deployments, place the following configuration in the /etc/yum.repos.d/10gen.repo file:

[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
enabled=1
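
The two repository definitions above differ only in the last component of baseurl. As a minimal sketch, not part of the 10gen packages, the following shell functions select that component from a machine name and print the matching definition. The function names tengen_repo_arch and tengen_repo_file are hypothetical, and mapping every 32-bit x86 machine name to the i686 repository is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the architecture directory used in the 10gen baseurl.
tengen_repo_arch() {
    case "$1" in
        x86_64) echo "x86_64" ;;
        i386|i486|i586|i686) echo "i686" ;;   # assumption: all 32-bit x86 use i686
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the repository definition for the given machine name.
tengen_repo_file() {
    arch=$(tengen_repo_arch "$1") || return 1
    cat <<EOF
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/$arch
gpgcheck=0
enabled=1
EOF
}

# Preview the 64-bit definition. On a real host, pass "$(uname -m)" instead
# and redirect the output to /etc/yum.repos.d/10gen.repo as root.
tengen_repo_file x86_64
```

Printing the definition before writing it lets you review the result without touching /etc.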
Install Packages

Issue the following command (as root or with sudo) to install the latest stable version of MongoDB and the associated tools:

yum install mongo-10gen mongo-10gen-server

When this command completes, you have successfully installed MongoDB!

Manage Installed Versions

You can use the mongo-10gen and mongo-10gen-server packages to install previous releases of MongoDB. To install a specific release, append the version number, as in the following example:

yum install mongo-10gen-2.2.3 mongo-10gen-server-2.2.3

This installs the 2.2.3 release of the mongo-10gen and mongo-10gen-server packages. You can specify any available version of MongoDB; however, yum will upgrade the mongo-10gen and mongo-10gen-server packages when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades.

To pin a package, add the following line to your /etc/yum.conf file:

exclude=mongo-10gen,mongo-10gen-server
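
If you script this step, it helps to add the line only once. The following is a minimal sketch under that assumption; the function name pin_mongo_packages is hypothetical, and the sketch does not merge with an exclude= line that already lists other packages.

```shell
#!/bin/sh
# Hypothetical helper: add the exclude line to a yum configuration file
# unless an identical line is already present.
pin_mongo_packages() {
    conf=$1
    line='exclude=mongo-10gen,mongo-10gen-server'
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Demonstrate on a scratch file; use /etc/yum.conf (as root) on a real host.
scratch=$(mktemp)
printf '[main]\ngpgcheck=1\n' > "$scratch"
pin_mongo_packages "$scratch"
pin_mongo_packages "$scratch"   # the second call leaves the file unchanged
cat "$scratch"
```

To allow upgrades again, remove the exclude line from the file.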
Configure MongoDB

These packages configure MongoDB using the /etc/mongod.conf file in conjunction with the control script. You can find the init script at /etc/rc.d/init.d/mongod.

This MongoDB instance will store its data files in /var/lib/mongo and its log files in /var/log/mongo, and run using the mongod user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongo and /var/log/mongo directories.
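
As a sketch of the adjustment this note describes, the following function hands a list of directories to another account. The function name grant_mongod_dirs and the account name dbuser in the comment are hypothetical; the demonstration uses scratch directories and the current user so that it runs without root.

```shell
#!/bin/sh
# Hypothetical helper: give ownership of the data and log directories
# to the account that will run mongod.
grant_mongod_dirs() {
    user=$1
    shift
    for dir in "$@"; do
        # -R also covers existing data files and logs inside the directory.
        chown -R "$user" "$dir" || return 1
    done
}

# Demonstrate on scratch directories with the current user. On a real host,
# run as root, for example:
#   grant_mongod_dirs dbuser /var/lib/mongo /var/log/mongo
# where "dbuser" stands for whatever account you chose.
data_dir=$(mktemp -d)
log_dir=$(mktemp -d)
grant_mongod_dirs "$(id -u)" "$data_dir" "$log_dir"
ls -ldn "$data_dir" "$log_dir"
```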

Control MongoDB

Warning

With the introduction of systemd in Fedora 15, the control scripts included in the packages available in the 10gen repository are not compatible with Fedora systems. A correction is forthcoming; see SERVER-7285 for more information. In the meantime, use your own control scripts or install using the procedure outlined in Install MongoDB on Linux.

Start MongoDB

Start the mongod process by issuing the following command (as root, or with sudo):

service mongod start

You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongo/mongod.log.
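
One way to script this check is to search the log for a startup message. The sketch below assumes that mongod writes the same "waiting for connections" message to its log file that the Windows tutorial in this manual describes for console output; the function name mongod_log_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log file contains the startup message.
# Assumption: mongod logs "waiting for connections" once it is ready.
mongod_log_ready() {
    grep -q 'waiting for connections' "$1" 2>/dev/null
}

log=/var/log/mongo/mongod.log
if mongod_log_ready "$log"; then
    echo "mongod appears to be running"
else
    echo "no startup message found in $log" >&2
fi
```

If the log file accumulates output across restarts, a message from an earlier start also matches, so compare its timestamp with the time you issued the start command.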

Optionally, you can ensure that MongoDB starts following a system reboot by issuing the following command (with root privileges):

chkconfig mongod on
Stop MongoDB

Stop the mongod process by issuing the following command (as root, or with sudo):

service mongod stop
Restart MongoDB

You can restart the mongod process by issuing the following command (as root, or with sudo):

service mongod restart

Follow the state of this process by watching the /var/log/mongo/mongod.log file for errors or important messages from the server.

Control mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongod script referenced above to derive your own mongos control script.

SELinux Considerations

You must configure SELinux to allow MongoDB to start on Fedora systems. Administrators have two options:

  • enable access to the relevant ports (e.g. 27017) for SELinux. See Interfaces and Port Numbers for more information on MongoDB’s default ports.
  • disable SELinux entirely. This requires a system reboot and may have larger implications for your deployment.
Using MongoDB

Among the tools included in the mongo-10gen package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that document.

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install MongoDB on Ubuntu

Synopsis

This tutorial outlines the basic installation process for installing MongoDB on Ubuntu Linux systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Ubuntu systems. Although Ubuntu does include MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

Note

If you use an older version of Ubuntu that does not use Upstart (i.e. any version before 9.10 “Karmic”), please follow the instructions in the Install MongoDB on Debian tutorial.

Package Options

The 10gen repository provides the mongodb-10gen package, which contains the latest stable release. Additionally you can install previous releases of MongoDB.

You cannot install this package concurrently with the mongodb, mongodb-server, or mongodb-clients packages provided by Ubuntu.

Install MongoDB
Configure Package Management System (APT)

The Ubuntu package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create a /etc/apt/sources.list.d/10gen.list file using the following command.

echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/10gen.list

Now issue the following command to reload your repository:

sudo apt-get update
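
The repository line differs by init system: this tutorial uses the ubuntu-upstart repository, while the Debian tutorial uses debian-sysvinit. As a small sketch, the following function prints the matching line for either case. The function name tengen_apt_line and the keywords upstart and sysvinit are hypothetical labels chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the sources.list entry for an init system.
# "upstart" selects the Ubuntu repository from this tutorial; "sysvinit"
# selects the repository that the Debian tutorial uses.
tengen_apt_line() {
    case "$1" in
        upstart)  repo="ubuntu-upstart" ;;
        sysvinit) repo="debian-sysvinit" ;;
        *) echo "unknown init system: $1" >&2; return 1 ;;
    esac
    echo "deb http://downloads-distro.mongodb.org/repo/$repo dist 10gen"
}

# Preview the entry; pipe it to
#   sudo tee /etc/apt/sources.list.d/10gen.list
# once it looks right.
tengen_apt_line upstart
```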
Install Packages

Issue the following command to install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

Manage Installed Versions

You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release, append the version number to the package name, as in the following example:

sudo apt-get install mongodb-10gen=2.2.3

This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however, apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades.

To pin MongoDB at the currently installed version, issue the following command at the system prompt:

echo "mongodb-10gen hold" | sudo dpkg --set-selections
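
To confirm that the hold took effect, you can inspect the output of dpkg --get-selections, which lists each package name followed by its selection state. The following is a minimal sketch; the function name is_held is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg --get-selections` output on standard input
# and succeed if the named package is marked "hold".
is_held() {
    awk -v pkg="$1" '$1 == pkg && $2 == "hold" { found = 1 } END { exit !found }'
}

# Check the real package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    if dpkg --get-selections | is_held mongodb-10gen; then
        echo "mongodb-10gen is held"
    else
        echo "mongodb-10gen is not held"
    fi
fi
```

To release the hold later, send the same kind of line to dpkg --set-selections with install in place of hold.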
Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You can find the control script at /etc/init.d/mongodb.

This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

Controlling MongoDB
Starting MongoDB

You can start the mongod process by issuing the following command:

sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

As needed, you may stop the mongod process by issuing the following command:

sudo service mongodb stop
Restarting MongoDB

You may restart the mongod process by issuing the following command:

sudo service mongodb restart
Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install MongoDB on Debian

Synopsis

This tutorial outlines the basic installation process for installing MongoDB on Debian systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Debian systems. Although some Debian releases include their own MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

Note

This tutorial applies to both Debian systems and versions of Ubuntu Linux prior to 9.10 “Karmic” which do not use Upstart. Other Ubuntu users will want to follow the Install MongoDB on Ubuntu tutorial.

Package Options

The 10gen repository provides the mongodb-10gen package, which contains the latest stable release. Additionally you can install previous releases of MongoDB.

You cannot install this package concurrently with the mongodb, mongodb-server, or mongodb-clients packages that your release of Debian may include.

Install MongoDB
Configure Package Management System (APT)

The Debian package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create a /etc/apt/sources.list.d/10gen.list file using the following command.

echo 'deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen' | sudo tee /etc/apt/sources.list.d/10gen.list

Now issue the following command to reload your repository:

sudo apt-get update
Install Packages

Issue the following command to install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB!

Manage Installed Versions

You can use the mongodb-10gen package to install previous versions of MongoDB. To install a specific release, append the version number to the package name, as in the following example:

sudo apt-get install mongodb-10gen=2.2.3

This will install the 2.2.3 release of MongoDB. You can specify any available version of MongoDB; however, apt-get will upgrade the mongodb-10gen package when a newer version becomes available. Use the following pinning procedure to prevent unintended upgrades.

To pin MongoDB at the currently installed version, issue the following command at the system prompt:

echo "mongodb-10gen hold" | sudo dpkg --set-selections
Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You can find the control script at /etc/init.d/mongodb.

This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note

If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

Controlling MongoDB
Starting MongoDB

Issue the following command to start mongod:

sudo /etc/init.d/mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

Issue the following command to stop mongod:

sudo /etc/init.d/mongodb stop
Restarting MongoDB

Issue the following command to restart mongod:

sudo /etc/init.d/mongodb restart
Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:

mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install MongoDB on Linux

Synopsis

10gen provides compiled versions of MongoDB for use on Linux that provide a simple option for users who cannot use packages. This tutorial outlines the basic installation of MongoDB using these compiled versions and an initial usage guide.

Download MongoDB

Note

You should place the MongoDB binaries in a central location on the file system that is easy to access and control. Consider /opt or /usr/local/bin.

In a terminal session, begin by downloading the latest release. In most cases you will want to download the 64-bit version of MongoDB.

curl http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.4.3.tgz > mongodb.tgz

If you need to run the 32-bit version, use the following command.

curl http://downloads.mongodb.org/linux/mongodb-linux-i686-2.4.3.tgz > mongodb.tgz
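
The two download commands above differ only in the architecture part of the file name. As a sketch, the following builds the URL from an architecture and a version and prints the matching curl command. The function name mongodb_tarball_url is hypothetical, and the sketch assumes that other releases follow the same naming pattern as 2.4.3.

```shell
#!/bin/sh
# Hypothetical helper: build a download URL following the pattern of the
# two commands above. Assumption: other versions use the same naming.
mongodb_tarball_url() {
    arch=$1
    version=$2
    echo "http://downloads.mongodb.org/linux/mongodb-linux-$arch-$version.tgz"
}

# Pick the architecture from the machine name reported by `uname -m`.
case "$(uname -m)" in
    x86_64) machine=x86_64 ;;
    i?86)   machine=i686 ;;
    *)      machine= ;;
esac

if [ -n "$machine" ]; then
    # Print the command rather than running it, so you can review it first.
    echo "curl $(mongodb_tarball_url "$machine" 2.4.3) > mongodb.tgz"
else
    echo "no build listed in this tutorial for $(uname -m)" >&2
fi
```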

Once you’ve downloaded the release, issue the following command to extract the files from the archive:

tar -zxvf mongodb.tgz

Optional

You may use the following command to copy the extracted folder into a more generic location.

cp -R -n mongodb-linux-*-2.4.3/ mongodb

You can find the mongod binary, and the binaries of all the associated MongoDB utilities, in the bin/ directory within the extracted directory.

Using MongoDB

Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, use the following command:

mkdir -p /data/db

Note

Ensure that the system account that will run the mongod process has read and write permissions to this directory. If mongod runs under the mongodb user account, issue the following command to change the owner of this folder:

chown mongodb /data/db

If you use an alternate location for your data directory, ensure that this user can write to your chosen data path.

You can specify an alternate data path with the --dbpath option to mongod. Create that directory with the mkdir command above, substituting your chosen path.
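
As a minimal sketch of using an alternate data path, the following creates a directory, confirms that the current account can write to it, and prints the matching mongod command. The function name prepare_dbpath is hypothetical, and the scratch location is only for demonstration; choose a permanent path on a real host.

```shell
#!/bin/sh
# Hypothetical helper: create a data directory and confirm that the
# current account can write to it.
prepare_dbpath() {
    mkdir -p "$1" || return 1
    [ -w "$1" ] || { echo "$1 is not writable" >&2; return 1; }
}

# Scratch location for demonstration only.
dbpath="$(mktemp -d)/db"
if prepare_dbpath "$dbpath"; then
    echo "start the server with: mongod --dbpath $dbpath"
fi
```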

The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use.

For testing purposes, you can start a mongod directly in the terminal without creating a control script:

mongod --config /etc/mongod.conf

Note

The above command assumes that the mongod binary is accessible via your system’s search path, and that you have created a default configuration file located at /etc/mongod.conf.
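
If you do not yet have a configuration file, the following sketch writes a small one. It uses only the dbpath, logpath, and logappend settings that this manual mentions; the function name write_mongod_conf and the log file location are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal configuration file.
# Arguments: target file, data directory, log file.
write_mongod_conf() {
    conf=$1
    data=$2
    log=$3
    cat > "$conf" <<EOF
dbpath=$data
logpath=$log
logappend=true
EOF
}

# Write to a scratch file first; copy it to /etc/mongod.conf as root
# once the values are right. The log path here is only an example.
scratch_conf=$(mktemp)
write_mongod_conf "$scratch_conf" /data/db /var/log/mongod.log
cat "$scratch_conf"
```

The account that runs mongod needs write access to both the data directory and the log file location.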

Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:

./bin/mongo

Note

The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record:

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install MongoDB on OS X

Platform Support

MongoDB only supports OS X versions 10.6 (Snow Leopard) and later.

Changed in version 2.4.

Synopsis

This tutorial outlines the basic installation process for deploying MongoDB on Macintosh OS X systems. This tutorial provides two main methods of installing the MongoDB server (i.e. “mongod”) and associated tools: first using the community package management tools, and second using builds of MongoDB provided by 10gen.

Install with Package Management

Both community package management tools, Homebrew and MacPorts, require some initial setup and configuration. This configuration is beyond the scope of this document. You only need to use one of these tools.

If you want to use package management, and do not already have a system installed, Homebrew is typically easier and simpler to use.

Homebrew

Homebrew installs binary packages based on published “formulae.” Issue the following command at the system shell to update the brew package manager:

brew update

Use the following command to install the MongoDB package into your Homebrew system.

brew install mongodb

Later, if you need to upgrade MongoDB, you can issue the following sequence of commands to update the MongoDB installation on your system:

brew update
brew upgrade mongodb
MacPorts

MacPorts distributes build scripts that allow you to easily build packages and their dependencies on your own system. The compilation process can take a significant period of time depending on your system’s capabilities and existing dependencies. Issue the following command in the system shell:

port install mongodb
Using MongoDB from Homebrew and MacPorts

The packages installed with Homebrew and MacPorts contain no control scripts or interaction with the system’s process manager.

If you have configured Homebrew or MacPorts correctly, including setting your PATH, the MongoDB applications and utilities will be accessible from the system shell. Start the mongod process in a terminal (for testing or development) or using a process management tool.

mongod

Then open the mongo shell by issuing the following command at the system prompt:

mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record.

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install from 10gen Builds

10gen provides compiled binaries of all MongoDB software for OS X, which may provide a more straightforward installation process.

Download MongoDB

In a terminal session, begin by downloading the latest release. Use the following command at the system prompt:

curl http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.4.3.tgz > mongodb.tgz

Note

The mongod process will not run on older Macintosh computers with PowerPC (i.e. non-Intel) processors.

Once you’ve downloaded the release, issue the following command to extract the files from the archive:

tar -zxvf mongodb.tgz

Optional

You may use the following command to move the extracted folder into a more generic location.

mv -n mongodb-osx-[platform]-[version]/ /path/to/new/location/

Replace [platform] with i386 or x86_64 depending on your system and the version you downloaded, and [version] with 2.4.3 or the version of MongoDB that you are installing.

You can find the mongod binary, and the binaries of all the associated MongoDB utilities, in the bin/ directory within the archive.

Using MongoDB from 10gen Builds

Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory and set the appropriate permissions, use the following commands:

sudo mkdir -p /data/db
sudo chown `id -u` /data/db

You can specify an alternate path for data files using the --dbpath option to mongod.

The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin directory for easier use.

For testing purposes, you can start a mongod directly in the terminal without creating a control script:

mongod --config /etc/mongod.conf

Note

This command assumes that the mongod binary is accessible via your system’s search path, and that you have created a default configuration file located at /etc/mongod.conf.

Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt from inside the directory where you extracted mongo:

./bin/mongo

Note

The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record:

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Install MongoDB on Windows

Synopsis

This tutorial provides a method for installing and running the MongoDB server (i.e. “mongod.exe”) on the Microsoft Windows platform through the Command Prompt and outlines the process for setting up MongoDB as a Windows Service.

Operating MongoDB with Windows is similar to MongoDB on other platforms. Most components share the same operational patterns.

Procedure

Important

If you are running any edition of Windows Server 2008 R2 or Windows 7, please install a hotfix to resolve an issue with memory mapped files on Windows.

Download MongoDB for Windows

Download the latest production release of MongoDB from the MongoDB downloads page.

There are three builds of MongoDB for Windows:

  • MongoDB for Windows Server 2008 R2 edition (i.e. 2008R2) only runs on Windows Server 2008 R2, Windows 7 64-bit, and newer versions of Windows. This build takes advantage of recent enhancements to the Windows Platform and cannot operate on older versions of Windows.
  • MongoDB for Windows 64-bit runs on any 64-bit version of Windows newer than Windows XP, including Windows Server 2008 R2 and Windows 7 64-bit.
  • MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows XP. 32-bit versions of MongoDB are only intended for older systems and for use in testing and development systems.

Changed in version 2.2: MongoDB does not support Windows XP. Please use a more recent version of Windows to use more recent releases of MongoDB.

Note

Always download the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB will not work with 32-bit Windows.

32-bit versions of MongoDB are suitable only for testing and evaluation purposes and only support databases smaller than 2GB.

You can find the architecture of your version of Windows platform using the following command in the Command Prompt:

wmic os get osarchitecture

In Windows Explorer, find the MongoDB download file, typically in the default Downloads directory. Extract the archive to C:\ by right clicking on the archive and selecting Extract All and browsing to C:\.

Note

The folder name will be either:

C:\mongodb-win32-i386-[version]

Or:

C:\mongodb-win32-x86_64-[version]

In both examples, replace [version] with the version of MongoDB downloaded.

Set up the Environment

Start the Command Prompt by selecting the Start Menu, then All Programs, then Accessories, then right click Command Prompt, and select Run as Administrator from the popup menu. In the Command Prompt, issue the following commands:

cd \
move C:\mongodb-win32-* C:\mongodb

Note

MongoDB is self-contained and does not have any other system dependencies. You can install and run MongoDB from any folder you choose (e.g. D:\test\mongodb).

MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db. Create this folder using the Command Prompt. Issue the following command sequence:

md data
md data\db

Note

You may specify an alternate path for \data\db with the dbpath setting for mongod.exe, as in the following example:

C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data

If your path includes spaces, enclose the entire path in double quotation marks, for example:

C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
Start MongoDB

To start MongoDB, execute from the Command Prompt:

C:\mongodb\bin\mongod.exe

This will start the main MongoDB database process. The waiting for connections message in the console output indicates that the mongod.exe process is running successfully.

Note

Depending on the security level of your system, Windows will issue a Security Alert dialog box about blocking “some features” of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select Private Networks, such as my home or work network and click Allow access. For additional information on security and MongoDB, please read the Security Practices and Management page.

Warning

Do not allow mongod.exe to be accessible to public networks without running in “Secure Mode” (i.e. auth). MongoDB is designed to be run in “trusted environments” and the database does not enable authentication or “Secure Mode” by default.

Connect to MongoDB using the mongo.exe shell. Open another Command Prompt and issue the following command:

C:\mongodb\bin\mongo.exe

Note

Executing the command start C:\mongodb\bin\mongo.exe will automatically start the mongo.exe shell in a separate Command Prompt window.

The mongo.exe shell will connect to mongod.exe running on the localhost interface and port 27017 by default. At the mongo.exe prompt, issue the following two commands to insert a record in the test collection of the default test database and then retrieve that record:

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods.” If you want to develop applications using .NET, see the documentation of C# and MongoDB for more information.

MongoDB as a Windows Service

New in version 2.0.

Set up MongoDB as a Windows Service so that the database will start automatically following each reboot cycle.

Note

mongod.exe added support for running as a Windows service in version 2.0, and mongos.exe added support for running as a Windows Service in version 2.1.1.

Configure the System

You should specify two options when running MongoDB as a Windows Service: a path for the log output (i.e. logpath) and a configuration file.

  1. Create a specific directory for MongoDB log files:

    md C:\mongodb\log
  2. Create a configuration file for the logpath option for MongoDB in the Command Prompt by issuing this command:

    echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg

While these steps are optional, creating a specific location for log files and using the configuration file are good practice.

Note

Consider setting the logappend option. If you do not, mongod.exe will delete the contents of the existing log file when starting.

Changed in version 2.2: The default logpath and logappend behavior changed in the 2.2 release.

Install and Run the MongoDB Service

Run all of the following commands in Command Prompt with “Administrative Privileges”:

  1. To install the MongoDB service:

    C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install

    Modify the path to the mongod.cfg file as needed. For the --install option to succeed, you must specify a logpath setting or the --logpath run-time option.

  2. To run the MongoDB service:

    net start MongoDB
    

Note

If you wish to use an alternate path for your dbpath, specify it in the config file (e.g. C:\mongodb\mongod.cfg) that you specified in the --install operation. You may also specify --dbpath on the command line; however, always prefer the configuration file.

If the dbpath directory does not exist, mongod.exe will not be able to start. The default value for dbpath is \data\db.

Stop or Remove the MongoDB Service
  • To stop the MongoDB service:

    net stop MongoDB
    
  • To remove the MongoDB service:

    C:\mongodb\bin\mongod.exe --remove

Install MongoDB Enterprise

New in version 2.2.

MongoDB Enterprise is available on four platforms and contains support for several features related to security and monitoring.

Required Packages

Changed in version 2.4: MongoDB Enterprise requires libgsasl.

To use MongoDB Enterprise, you must install several prerequisites. The names of the packages vary by distribution and are as follows:

  • Ubuntu 12.04 requires libssl0.9.8, libgsasl7, snmp, and snmpd. Issue a command such as the following to install these packages:

    sudo apt-get install libssl0.9.8 libgsasl7 snmp snmpd
    
  • Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, libgsasl, net-snmp, net-snmp-libs, and net-snmp-utils. To download libgsasl you must enable the EPEL repository by issuing the following sequence of commands to add and update the system repositories:

    sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    
    sudo yum update -y
    

    When you have installed and updated the EPEL repositories, issue the following command to install these packages:

    sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils libgsasl
    
  • SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and snmp-mibs. Issue a command such as the following to install these packages:

    sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
    

    Note

    For the 2.4 release, MongoDB Enterprise for SUSE requires libgsasl, which is not available in the default repositories for SUSE.

Install MongoDB Enterprise Binaries

When you have installed the required packages and downloaded the Enterprise packages, you can install the packages using the same procedure as a standard installation of MongoDB on Linux systems.

Download and Extract Package

Use the sequence of commands below to download and extract MongoDB Enterprise packages appropriate for your distribution:

Ubuntu 12.04

curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-subscription-ubuntu1204-2.4.3.tgz > mongodb.tgz
tar -zxvf mongodb.tgz
cp -R -n mongodb-linux-x86_64-subscription-ubuntu1204-2.4.3/ mongodb

Red Hat Enterprise Linux 6.x

curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-subscription-rhel62-2.4.3.tgz > mongodb.tgz
tar -zxvf mongodb.tgz
cp -R -n mongodb-linux-x86_64-subscription-rhel62-2.4.3/ mongodb

Amazon Linux AMI

curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-subscription-amzn64-2.4.3.tgz > mongodb.tgz
tar -zxvf mongodb.tgz
cp -R -n mongodb-linux-x86_64-subscription-amzn64-2.4.3/ mongodb

SUSE Enterprise Linux

curl http://downloads.10gen.com/linux/mongodb-linux-x86_64-subscription-suse11-2.4.3.tgz > mongodb.tgz
tar -zxvf mongodb.tgz
cp -R -n mongodb-linux-x86_64-subscription-suse11-2.4.3/ mongodb

Running and Using MongoDB

Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, use the following command:

mkdir -p /data/db

Note

Ensure that the system account that will run the mongod process has read and write permissions to this directory. If mongod runs under the mongodb user account, issue the following command to change the owner of this folder:

chown mongodb /data/db

If you use an alternate location for your data directory, ensure that this user can write to your chosen data path.

You can specify, and create, an alternate path using the --dbpath option to mongod and the above command.

The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use.

For testing purposes, you can start a mongod directly in the terminal without creating a control script:

mongod --config /etc/mongod.conf

Note

The above command assumes that the mongod binary is accessible via your system’s search path, and that you have created a default configuration file located at /etc/mongod.conf.

Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:

./bin/mongo

Note

The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database and then retrieve that record:

> db.test.save( { a: 1 } )
> db.test.find()

See also

“mongo” and “mongo Shell Methods”

Further Reading

As you begin to use MongoDB, consider the Getting Started with MongoDB and MongoDB Tutorials resources. To read about features only available in MongoDB Enterprise, consider: Monitor MongoDB with SNMP and Deploy MongoDB with Kerberos Authentication.

After you have installed MongoDB, consider the following documents as you begin to learn about MongoDB:

Getting Started with MongoDB

This tutorial provides an introduction to basic database operations using the mongo shell. mongo is a part of the standard MongoDB distribution and provides a full JavaScript environment with complete access to the JavaScript language and all standard functions as well as a full database interface for MongoDB. See the mongo JavaScript API documentation and the mongo shell JavaScript Method Reference.

The tutorial assumes that you’re running MongoDB on a Linux or OS X operating system and that you have a running database server; MongoDB does support Windows and provides a Windows distribution with identical operation. For instructions on installing MongoDB and starting the database server see the appropriate installation document.

Connect to a Database

In this section you connect to the database server, which runs as mongod, and begin using the mongo shell to select a logical database within the database instance and access the help text in the mongo shell.

Connect to a mongod

From a system prompt, start mongo by issuing the mongo command, as follows:

mongo

By default, mongo looks for a database server listening on port 27017 on the localhost interface. To connect to a server on a different port or interface, use the --port and --host options.

Select a Database

After starting the mongo shell, your session will use the test database for context by default. At any time, issue the following operation at the mongo prompt to report the current database:

db

db returns the name of the current database.

  1. From the mongo shell, display the list of databases with the following operation:

    show dbs
    
  2. Switch to a new database named mydb with the following operation:

    use mydb
    
  3. Confirm that your session has the mydb database as context, using the db operation, which returns the name of the current database as follows:

    db
    

At this point, if you issue the show dbs operation again, it will not include mydb, because MongoDB will not create a database until you insert data into that database. The Create a Collection and Insert Documents section describes the process for inserting data.

New in version 2.4: show databases also returns a list of databases.

Display mongo Help

At any point you can access help for the mongo shell using the following operation:

help

Furthermore, you can append the .help() method to some JavaScript methods, any cursor object, as well as the db and db.collection objects to return additional help information.

Create a Collection and Insert Documents

In this section, you insert documents into a new collection named things within the new database named mydb.

MongoDB will create collections and databases implicitly upon their first use: you do not need to create the database or collection before inserting data. Furthermore, because MongoDB uses dynamic schemas, you do not need to specify the structure of your documents before inserting them into the collection.

Insert Individual Documents
  1. From the mongo shell, confirm that the current context is the mydb database with the following operation:

    db
    
  2. If mongo does not return mydb for the previous operation, set the context to the mydb database with the following operation:

    use mydb
    
  3. Create two documents, named j and k, with the following sequence of JavaScript operations:

    j = { name : "mongo" }
    k = { x : 3 }
    
  4. Insert the j and k documents into the collection things with the following sequence of operations:

    db.things.insert( j )
    db.things.insert( k )
    

    When you insert the first document, the mongod will create both the mydb database and the things collection.

  5. Confirm that the collection named things exists using the following operation:

    show collections
    

    The mongo shell will return the list of the collections in the current (i.e. mydb) database. At this point, the only collection is things. All mongod databases also have a system.indexes collection.

  6. Confirm that the documents exist in the collection things by issuing a query on the things collection, using the find() method in an operation that resembles the following:

    db.things.find()
    

    This operation returns the following results. Your ObjectId values will be different:

    { "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
    { "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
    

    All MongoDB documents must have an _id field with a unique value. These operations do not explicitly specify a value for the _id field, so mongo creates a unique ObjectId value for the field before inserting it into the collection.

Insert Multiple Documents Using a For Loop
  1. From the mongo shell, add more documents to the things collection using the following for loop:

    for (var i = 1; i <= 20; i++) db.things.insert( { x : 4 , j : i } )
    
  2. Query the collection by issuing the following command:

    db.things.find()
    

    The mongo shell displays the first 20 documents in the collection. Your ObjectId values will be different:

    { "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
    { "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
    { "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
    { "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
    { "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
    { "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
    { "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
    { "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
    { "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
    { "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
    { "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
    { "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
    
  3. The find() method returns a cursor. To iterate the cursor and return more documents, use the it operation in the mongo shell. The mongo shell will exhaust the cursor and return the following documents:

    { "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
    { "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
    

For more information on inserting new documents, see the insert() documentation.

Working with the Cursor

When you query a collection, MongoDB returns a “cursor” object that contains the results of the query. The mongo shell then iterates over the cursor to display the results. Rather than returning all results at once, the shell iterates over the cursor 20 times to display the first 20 results and then waits for a request to iterate over the remaining results. This prevents mongo from displaying thousands or millions of results at once.

The it operation allows you to iterate over the next 20 results in the shell. In the previous procedure, the cursor only contained two more documents, and so only two more documents displayed.
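
As a rough illustration of the batching described above, the following plain JavaScript sketch (written for Node.js, not for the mongo shell) models a cursor over an in-memory array. The SketchCursor class and the nextBatch function are hypothetical names invented for this example; only hasNext() and next() mirror method names used in this tutorial, and nothing here talks to a real database.

```javascript
// In-memory stand-in for a cursor; NOT the mongo shell's cursor class.
class SketchCursor {
  constructor(docs) {
    this.docs = docs;
    this.position = 0;
  }
  // True while documents remain to be read.
  hasNext() {
    return this.position < this.docs.length;
  }
  // Return the next document and advance the cursor.
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.docs[this.position++];
  }
}

// Read up to batchSize documents, as the shell does for the first
// display and again for each `it`.
function nextBatch(cursor, batchSize = 20) {
  const batch = [];
  while (batch.length < batchSize && cursor.hasNext()) {
    batch.push(cursor.next());
  }
  return batch;
}

// 22 documents, as in the tutorial: two single inserts plus twenty from the loop.
const docs = [{ name: "mongo" }, { x: 3 }];
for (let i = 1; i <= 20; i++) docs.push({ x: 4, j: i });

const cursor = new SketchCursor(docs);
const firstBatch = nextBatch(cursor);  // what find() displays
const secondBatch = nextBatch(cursor); // what `it` displays
console.log(firstBatch.length, secondBatch.length, cursor.hasNext());
```

With 22 documents, the first batch holds 20 documents and the second holds the remaining two, after which the cursor is exhausted.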

The procedures in this section show other ways to work with a cursor. For comprehensive documentation on cursors, see Iterate the Returned Cursor.

Iterate over the Cursor with a Loop
  1. In the MongoDB JavaScript shell, query the things collection and assign the resulting cursor object to the c variable:

    var c = db.things.find()
    
  2. Print the full result set by using a while loop to iterate over the c variable:

    while ( c.hasNext() ) printjson( c.next() )
    

    The hasNext() function returns true if the cursor has more documents to return. The next() method returns the next document. The printjson() method renders the document in a JSON-like format.

    The result of this operation follows, although your ObjectId values will be different:

    { "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
    { "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
    { "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
    { "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
    { "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
    { "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
    { "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
    { "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
    { "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
    { "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
    { "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
    { "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
    { "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
    { "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
    
Use Array Operations with the Cursor

You can manipulate a cursor object as if it were an array. Consider the following procedure:

  1. In the mongo shell, query the things collection and assign the resulting cursor object to the c variable:

    var c = db.things.find()
    
  2. To find the document at the array index 4, use the following operation:

    printjson( c [ 4 ] )
    

    MongoDB returns the following:

    { "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
    

    When you access documents in a cursor using the array index notation, mongo first calls the cursor.toArray() method and loads into RAM all documents returned by the cursor. The index is then applied to the resulting array. This operation iterates the cursor completely and exhausts the cursor.

    For very large result sets, mongo may run out of available memory.

For more information on the cursor, see Iterate the Returned Cursor.
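
The cost of array-style access can be sketched in plain JavaScript (Node.js, not the mongo shell). The ArrayLikeCursor class and its at() method are hypothetical and exist only for this example; at(4) stands in for the c[4] notation and models the behavior described above, where the whole result set is loaded before the index is applied.

```javascript
// Hypothetical in-memory cursor; it only models the behavior described above.
class ArrayLikeCursor {
  constructor(docs) {
    this.docs = docs;
    this.position = 0;
    this.loaded = null; // filled on the first array-style access
  }
  hasNext() {
    return this.position < this.docs.length;
  }
  next() {
    return this.docs[this.position++];
  }
  // Drain every remaining document into an array held in memory.
  toArray() {
    if (this.loaded === null) {
      this.loaded = [];
      while (this.hasNext()) this.loaded.push(this.next());
    }
    return this.loaded;
  }
  // Stand-in for c[index]: materialize everything first, then index.
  at(index) {
    return this.toArray()[index];
  }
}

const docs = [{ name: "mongo" }, { x: 3 }];
for (let i = 1; i <= 20; i++) docs.push({ x: 4, j: i });

const c = new ArrayLikeCursor(docs);
const fifth = c.at(4);
console.log(JSON.stringify(fifth)); // the document at array index 4
console.log(c.hasNext());           // false: the cursor is exhausted
```

A single indexed read leaves nothing for hasNext() to return, which is why this style is best kept for small result sets.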

Query for Specific Documents

MongoDB has a rich query system that allows you to select and filter the documents in a collection along specific fields and values. See Query Document and Read for a full account of queries in MongoDB.

In this procedure, you query for specific documents in the things collection by passing a “query document” as a parameter to the find() method. A query document specifies the criteria the query must match to return a document.

To query for specific documents, do the following:

  1. In the mongo shell, query for all documents where the name field has a value of mongo by passing the { name : "mongo" } query document as a parameter to the find() method:

    db.things.find( { name : "mongo" } )
    

    MongoDB returns one document that matches this criterion. The ObjectId value will be different:

    { "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
    
  2. Query for all documents where x has a value of 4 by passing the { x : 4 } query document as a parameter to find():

    db.things.find( { x : 4 } )
    

    MongoDB returns the following result set:

    { "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
    { "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
    { "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
    { "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
    { "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
    { "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
    { "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
    { "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
    { "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
    { "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
    { "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
    { "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
    { "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }
    

    ObjectId values are always unique.

  3. Query for all documents where x has a value of 4, as in the previous query, but return only the value of j. MongoDB will also return the _id field, unless explicitly excluded. To do this, add the { j : 1 } document as the projection in the second parameter to find(). This operation would resemble the following:

    db.things.find( { x : 4 } , { j : 1 } )
    

    MongoDB returns the following results:

    { "_id" : ObjectId("4c220a42f3924d31102bd856"), "j" : 1 }
    { "_id" : ObjectId("4c220a42f3924d31102bd857"), "j" : 2 }
    { "_id" : ObjectId("4c220a42f3924d31102bd858"), "j" : 3 }
    { "_id" : ObjectId("4c220a42f3924d31102bd859"), "j" : 4 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85a"), "j" : 5 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85b"), "j" : 6 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85c"), "j" : 7 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85d"), "j" : 8 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85e"), "j" : 9 }
    { "_id" : ObjectId("4c220a42f3924d31102bd85f"), "j" : 10 }
    { "_id" : ObjectId("4c220a42f3924d31102bd860"), "j" : 11 }
    { "_id" : ObjectId("4c220a42f3924d31102bd861"), "j" : 12 }
    { "_id" : ObjectId("4c220a42f3924d31102bd862"), "j" : 13 }
    { "_id" : ObjectId("4c220a42f3924d31102bd863"), "j" : 14 }
    { "_id" : ObjectId("4c220a42f3924d31102bd864"), "j" : 15 }
    { "_id" : ObjectId("4c220a42f3924d31102bd865"), "j" : 16 }
    { "_id" : ObjectId("4c220a42f3924d31102bd866"), "j" : 17 }
    { "_id" : ObjectId("4c220a42f3924d31102bd867"), "j" : 18 }
    { "_id" : ObjectId("4c220a42f3924d31102bd868"), "j" : 19 }
    { "_id" : ObjectId("4c220a42f3924d31102bd869"), "j" : 20 }
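
To make the roles of the query document and the projection document concrete, here is a simplified model in plain JavaScript (Node.js). It is a sketch under stated assumptions, not how MongoDB evaluates queries: the matches, project, and find functions are hypothetical, matching is limited to exact equality on top-level fields, the projection handles inclusion only, and the _id strings are made up in place of ObjectId values.

```javascript
// Equality-only matching: every field in the query must equal the
// same field in the document. Real queries support far more than this.
function matches(doc, query) {
  return Object.keys(query).every((field) => doc[field] === query[field]);
}

// Inclusion-only projection: keep _id (unless set to 0) plus fields marked 1.
function project(doc, projection) {
  const out = {};
  if ("_id" in doc && projection._id !== 0) out._id = doc._id;
  for (const field of Object.keys(projection)) {
    if (projection[field] === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Filter first, then shape each matching document.
function find(collection, query = {}, projection = null) {
  const selected = collection.filter((doc) => matches(doc, query));
  return projection ? selected.map((doc) => project(doc, projection)) : selected;
}

// Sample data with made-up _id strings in place of ObjectId values.
const things = [
  { _id: "a", name: "mongo" },
  { _id: "b", x: 3 },
  { _id: "c", x: 4, j: 1 },
  { _id: "d", x: 4, j: 2 },
];

const byName = find(things, { name: "mongo" });
const jOnly = find(things, { x: 4 }, { j: 1 });
console.log(byName);
console.log(jOnly);
```

The second call keeps _id and j but drops x, mirroring the shape of the projected results shown above.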
    
Return a Single Document from a Collection

With the db.collection.findOne() method you can return a single document from a MongoDB collection. The findOne() method takes the same parameters as find(), but returns a document rather than a cursor.

To retrieve one document from the things collection, issue the following command:

db.things.findOne()

For more information on querying for documents, see the Read and Read Operations documentation.

Limit the Number of Documents in the Result Set

You can constrain the size of the result set to increase performance by limiting the amount of data your application must receive over the network.

To specify the maximum number of documents in the result set, call the limit() method on a cursor, as in the following command:

db.things.find().limit(3)

MongoDB will return the following result, with different ObjectId values:

{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
Next Steps with MongoDB

For more information on manipulating the documents in a database as you continue to learn MongoDB, consider the following resources:

Release Notes

You should always install the latest, stable version of MongoDB. Stable versions have an even-numbered minor version number. For example, v2.4 is stable, v2.2 and v2.0 were previously the stable versions, while v2.1 and v2.3 are development versions.
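
The even-numbered rule above can be checked mechanically. The following plain JavaScript sketch (Node.js) applies it to a version string; isStableSeries is a hypothetical helper written for this example and encodes only the convention stated in this section.

```javascript
// Returns true when the minor version number (the second component) is even.
function isStableSeries(version) {
  const parts = version.replace(/^v/, "").split(".");
  const minor = Number.parseInt(parts[1], 10);
  if (Number.isNaN(minor)) throw new Error("no minor version in: " + version);
  return minor % 2 === 0;
}

console.log(isStableSeries("v2.4"));  // true
console.log(isStableSeries("2.4.3")); // true
console.log(isStableSeries("v2.3"));  // false
```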

Administration

The documentation in this section outlines core administrative tasks and practices that operators of MongoDB will want to consider.

Run-time Database Configuration

The command line and configuration file interfaces provide MongoDB administrators with a large number of options and settings for controlling the operation of the database system. This document provides an overview of common configurations and examples of best-practice configurations for common use cases.

While both interfaces provide access to the same collection of options and settings, this document primarily uses the configuration file interface. If you run MongoDB using a control script or installed from a package for your operating system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by checking the content of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control scripts start the mongod with the appropriate configuration file (see below).

To start a MongoDB instance using this configuration, issue a command in one of the following forms:

mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf

Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database instance.

Configure the Database

Consider the following basic configuration:

fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true

For most standalone servers, this is a sufficient base configuration. It makes several assumptions, but consider the following explanation:

  • fork is true, which enables a daemon mode for mongod, which detaches (i.e. “forks”) MongoDB from the current session and allows you to run the database as a conventional server.

  • bind_ip is 127.0.0.1, which forces the server to only listen for requests on the localhost IP. Only bind to secure interfaces that the application-level systems can access with access control provided by system network filtering (i.e. “firewall”).

  • port is 27017, which is the default MongoDB port for database instances. MongoDB can bind to any port. You can also filter access based on port using network filtering tools.

    Note

    UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.

  • quiet is true. This disables all but the most critical entries in output/log file. In normal operation this is the preferable operation to avoid log noise. In diagnostic or testing situations, set this value to false. Use setParameter to modify this setting during run time.

  • dbpath is /srv/mongodb, which specifies where MongoDB will store its data files. /srv/mongodb and /var/lib/mongodb are popular locations. The user account that mongod runs under will need read and write access to this directory.

  • logpath is /var/log/mongodb/mongod.log, which is where mongod will write its output. If you do not set this value, mongod writes all output to standard output (i.e. stdout).

  • logappend is true, which ensures that mongod does not overwrite an existing log file following the server start operation.

  • journal is true, which enables journaling. Journaling ensures single instance write-durability. 64-bit builds of mongod enable journaling by default. Thus, this setting may be redundant.

Given the default configuration, some of these values may be redundant. However, in many situations explicitly stating the configuration increases overall system intelligibility.
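
Because the configuration file and the command line expose the same settings, the basic configuration above maps onto command-line options. The plain JavaScript sketch below (Node.js) shows that mapping for this particular file. configToArgs is a hypothetical helper, not a MongoDB tool, and it rests on two simplifying assumptions: lines starting with # are treated as comments, and settings with the value false are simply dropped rather than translated.

```javascript
// Turn "name = value" lines into the equivalent command-line arguments.
function configToArgs(configText) {
  const args = [];
  for (const rawLine of configText.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a "name = value" line
    const name = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (value === "true") {
      args.push("--" + name);        // boolean switch, e.g. --fork
    } else if (value !== "false") {
      args.push("--" + name, value); // option that takes a value
    }
  }
  return args;
}

// The basic configuration from this section.
const baseConfig = [
  "fork = true",
  "bind_ip = 127.0.0.1",
  "port = 27017",
  "quiet = true",
  "dbpath = /srv/mongodb",
  "logpath = /var/log/mongodb/mongod.log",
  "logappend = true",
  "journal = true",
].join("\n");

const commandLine = "mongod " + configToArgs(baseConfig).join(" ");
console.log(commandLine);
```

The sketch is only meant to show how the two interfaces correspond for these eight settings; keeping them in the configuration file remains the approach this document uses.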

Security Considerations

The following collection of configuration options are useful for limiting access to a mongod instance. Consider the following:

bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
nounixsocket = true
auth = true

Consider the following explanation for these configuration decisions:

  • bind_ip has three values: 127.0.0.1, the localhost interface; 10.8.0.10, a private IP address typically used for local networks and VPN interfaces; and 192.168.4.24, a private network interface typically used for local networks.

    Because production MongoDB instances need to be accessible from multiple database servers, it is important to bind MongoDB to multiple interfaces that are accessible from your application servers. At the same time it’s important to limit these interfaces to interfaces controlled and protected at the network layer.

  • nounixsocket is true, which disables the UNIX socket that is otherwise enabled by default. This limits access on the local system. This is desirable when running MongoDB on systems with shared access, but in most situations has minimal impact.

  • auth is true, which enables the authentication system within MongoDB. If enabled, you will need to log in by connecting over the localhost interface for the first time to create user credentials.

Replication and Sharding Configuration

Replication Configuration

Replica set configuration is straightforward, and only requires that the replSet have a value that is consistent among all members of the set. Consider the following:

replSet = set0

Use descriptive names for sets. Once configured use the mongo shell to add hosts to the replica set.

To enable authentication for the replica set, add the following option:

keyFile = /srv/mongodb/keyfile

New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets.

Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary, but must be the same on all members of the replica set and on the mongos instances that connect to the set. The key file must be less than one kilobyte in size and may only contain characters in the base64 set, and the file must not have group or “world” permissions on UNIX systems.
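
One possible way to produce content that satisfies the size and character rules above is sketched below in plain JavaScript (Node.js), using the standard crypto module. generateKeyFileContent and isAcceptableKeyContent are hypothetical helpers written for this example. Two assumptions are made: the default of 741 random bytes is chosen only because it encodes to 988 base64 characters with no padding, and the check conservatively rejects the "=" padding character. Writing the file with owner-only permissions is left out of the sketch.

```javascript
const crypto = require("crypto");

// 741 random bytes encode to 988 base64 characters, which stays under 1 KB.
function generateKeyFileContent(byteCount = 741) {
  return crypto.randomBytes(byteCount).toString("base64");
}

// Check the two content rules stated above: size and character set.
function isAcceptableKeyContent(content) {
  const underOneKilobyte = Buffer.byteLength(content, "utf8") < 1024;
  const base64Only = /^[A-Za-z0-9+\/]+$/.test(content);
  return underOneKilobyte && base64Only;
}

const keyContent = generateKeyFileContent();
console.log(keyContent.length);                  // 988
console.log(isAcceptableKeyContent(keyContent)); // true
```

The same content would then need to be copied to every member of the replica set.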

See also

The “Replica set Reconfiguration” section for information regarding the process for changing replica set configuration during operation.

Additionally, consider the “Replica Set Security” section for information on configuring authentication with replica sets.

Finally, see the “Replication” index and the “Replica Set Fundamental Concepts” document for more information on replication in MongoDB and replica set configuration in general.

Sharding Configuration

Sharding requires a number of mongod instances with different configurations. The config servers store the cluster’s metadata, while the cluster distributes data among one or more shard servers.

Set up one or three “config server” instances as normal mongod instances, and then add the following configuration options:

configsvr = true

bind_ip = 10.8.0.12
port = 27001

This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there are no port conflicts, and that your config server is accessible from all of your “mongos” and “mongod” instances.

To set up shards, configure two or more mongod instances using your base configuration, adding the shardsvr setting:

shardsvr = true

Finally, to establish the cluster, configure at least one mongos process with the following settings:

configdb = 10.8.0.12:27001
chunkSize = 64

You can specify multiple configdb instances by specifying hostnames and ports in the form of a comma separated list. In general, avoid modifying the chunkSize from the default value of 64, [1] and ensure this setting is consistent among all mongos instances.

[1]Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal.
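
The comma separated form of the configdb value can be assembled as in the following plain JavaScript sketch (Node.js). buildConfigdb is a hypothetical helper that enforces the "one or three" config server rule from this section. The addresses 10.8.0.13 and 10.8.0.14 are made up for illustration; only 10.8.0.12:27001 comes from the example above.

```javascript
// Build the value for the configdb setting from a list of config servers.
function buildConfigdb(servers) {
  if (servers.length !== 1 && servers.length !== 3) {
    throw new Error("expected one or three config servers, got " + servers.length);
  }
  return servers.map((s) => s.host + ":" + s.port).join(",");
}

const single = buildConfigdb([{ host: "10.8.0.12", port: 27001 }]);

// The second and third addresses are invented for this example.
const triple = buildConfigdb([
  { host: "10.8.0.12", port: 27001 },
  { host: "10.8.0.13", port: 27001 },
  { host: "10.8.0.14", port: 27001 },
]);

console.log("configdb = " + single);
console.log("configdb = " + triple);
```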

See also

The “Sharding” section of the manual for more information on sharding and cluster configuration.

Run Multiple Database Instances on the Same System

In many cases running multiple instances of mongod on a single system is not recommended. On some types of deployments [2] and for testing purposes you may need to run more than one mongod on a single system.

In these cases, use a base configuration for each instance, but consider the following configuration values:

dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid

The dbpath value controls the location of the mongod instance’s data directory. Ensure that each database has a distinct and well labeled data directory. The pidfilepath controls where the mongod process places its process id file. As this tracks the specific mongod process, it is crucial that the file be unique and well labeled to make it easy to start and stop these processes.

Create additional control scripts and/or adjust your existing MongoDB configuration and control script as needed to control these processes.

[2]Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod instances. Additionally, you may find that multiple databases with small working sets may function acceptably on a single system.
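
When several instances share a host, generating the per-instance values from one template helps keep them distinct. The plain JavaScript sketch below (Node.js) shows one way to do this. instanceConfig is a hypothetical helper; the db0, db1 naming follows the example above, while giving each instance its own port (base port plus index) is an assumption added here, since two instances cannot listen on the same port of the same interface.

```javascript
// Produce a config fragment per instance with a distinct dbpath,
// pidfilepath, and port.
function instanceConfig(index, baseDir = "/srv/mongodb", basePort = 27017) {
  const name = "db" + index;
  return [
    "dbpath = " + baseDir + "/" + name + "/",
    "pidfilepath = " + baseDir + "/" + name + ".pid",
    "port = " + (basePort + index),
  ].join("\n");
}

const first = instanceConfig(0);
const second = instanceConfig(1);
console.log(first);
console.log(second);
```

Each fragment would be appended to a copy of the base configuration, one file per instance.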

Diagnostic Configurations

The following configuration options control various mongod behaviors for diagnostic purposes. The following settings have default values that are tuned for general production purposes:

slowms = 50
profile = 3
verbose = true
diaglog = 3
objcheck = true
cpu = true

Use the base configuration and add these options as needed if you are experiencing an unknown issue or performance problem:

  • slowms configures the threshold for the database profiler to consider a query “slow.” The default value is 100 milliseconds. Set a lower value if the database profiler does not return useful results. See Optimization Strategies for MongoDB for more information on optimizing operations in MongoDB.

  • profile sets the database profiler level. The profiler is not active by default because of the possible impact of the profiler itself on performance. Unless this setting has a value, queries are not profiled.

  • verbose enables a verbose logging mode that modifies mongod output and increases logging to include a greater number of events. Only use this option if you are experiencing an issue that is not reflected in the normal logging level. If you require additional verbosity, consider the following options:

    v = true
    vv = true
    vvv = true
    vvvv = true
    vvvvv = true
    

    Each additional level v adds additional verbosity to the logging. The verbose option is equal to v = true.

  • diaglog enables diagnostic logging. Level 3 logs all read and write operations.

  • objcheck forces mongod to validate all requests from clients upon receipt. Use this option to ensure that invalid requests are not causing errors, particularly when running a database with untrusted clients. This option may affect database performance.

  • cpu forces mongod to report the percentage of the last interval spent in write-lock. The interval is typically 4 seconds, and each output line in the log includes both the actual interval since the last report and the percentage of time spent in write lock.

Backup and Recovery Operations for MongoDB

Backup Strategies for MongoDB Systems

Backups are an important part of any operational disaster recovery plan. A good backup plan must be able to capture data in a consistent and usable state, and operators must be able to automate both the backup and the recovery operations. Also test all components of the backup system to ensure that you can recover backed up data as needed. If you cannot effectively restore your database from the backup, then your backups are useless. This document addresses higher level backup strategies. For more information on specific backup procedures, consider the following documents:

Backup Considerations

As you develop a backup strategy for your MongoDB deployment consider the following factors:

  • Geography. Ensure that you move some backups away from your primary database infrastructure.
  • System errors. Ensure that your backups can survive situations where hardware failures or disk errors impact the integrity or availability of your backups.
  • Production constraints. Backup operations themselves sometimes require substantial system resources. It is important to consider the time of the backup schedule relative to peak usage and maintenance windows.
  • System capabilities. Some of the block-level snapshot tools require special support on the operating-system or infrastructure level.
  • Database configuration. Replication and sharding can affect the process and impact of the backup implementation. See Sharded Cluster Backup Considerations and Replica Set Backup Considerations.
  • Actual requirements. You may be able to save time, effort, and space by including only crucial data in the most frequent backups and backing up less crucial data less frequently.
Approaches to Backing Up MongoDB Systems

There are two main methodologies for backing up MongoDB instances: creating binary “dumps” of the database using mongodump, or creating filesystem-level snapshots. Both methodologies have advantages and disadvantages:

  • binary database dumps are comparatively small, because they don’t include index content, pre-allocated free space, or record padding. However, it’s impossible to capture a copy of a running system that reflects a single moment in time using a binary dump.
  • filesystem snapshots, sometimes called block level backups, produce larger backup sizes, but complete quickly and can reflect a single moment in time on a running system. However, snapshot systems require filesystem and operating system support and tools.

The best option depends on the requirements of your deployment and disaster recovery needs. Typically, filesystem snapshots are preferred because of their accuracy and simplicity; however, mongodump is a viable option used often to generate backups of MongoDB systems.

The following documents provide details and procedures on the two approaches:

In some cases, taking backups is difficult or impossible because of large data volumes, distributed architectures, and data transmission speeds. In these situations, increase the number of members in your replica set or sets.

Backup Strategies for MongoDB Deployments
Sharded Cluster Backup Considerations

Important

To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.

Sharded clusters, as distributed systems, complicate backup operations. True point-in-time backups are only possible when stopping all write activity from the application. To create a precise moment-in-time snapshot of a cluster, stop all application write activity to the database, capture a backup, and allow write operations to the database only after the backup is complete.

However, you can capture a backup of a cluster that approximates a point-in-time backup by capturing a backup from a secondary member of the replica sets that provide the shards in the cluster at roughly the same moment. If you decide to use an approximate-point-in-time backup method, ensure that your application can operate using a copy of the data that does not reflect a single moment in time.

The following documents describe sharded cluster related backup procedures:

Replica Set Backup Considerations

In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It is possible to lock a single secondary database and then create a backup from that instance. When you unlock the database, the secondary will catch up with the primary. You may also choose to deploy a dedicated hidden member for backup purposes.

If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.

For any cluster, using a non-primary node to create backups is particularly advantageous in that the backup operation does not affect the performance of the primary. Replication itself provides some measure of redundancy. Nevertheless, keeping point-in-time backups of your cluster to provide for disaster recovery and as an additional layer of protection is crucial.

For an overview of backup strategies and considerations for all MongoDB deployments, see Backup Strategies for MongoDB Systems. For practical instructions and example backup procedures consider the following documents:

Backup and Recovery Procedures

Use mongodump and mongorestore to Backup and Restore MongoDB Databases

This document describes the process for writing the entire contents of your MongoDB instance to a file in a binary format. If disk-level snapshots are not available, this approach provides the best option for full system database backups. If your system has disk level snapshot capabilities, consider the backup methods described in Use Filesystem Snapshots to Backup and Restore MongoDB Databases.

Backup a Database with mongodump
Basic mongodump Operations

The mongodump utility can back up data by either:

  • connecting to a running mongod or mongos instance, or
  • accessing data files without an active instance.

The utility can create a backup for an entire server, database or collection, or can use a query to backup just part of a collection.

When you run mongodump without any arguments, the command connects to the local database instance (e.g. 127.0.0.1 or localhost) on port 27017 and creates a database backup named dump/ in the current directory.

To backup data from a mongod or mongos instance running on the same machine and on the default port of 27017 use the following command:

mongodump

Note

The format of data created by the mongodump tool from the 2.2 distribution or later is different and incompatible with earlier versions of mongod.

To limit the amount of data included in the database dump, you can specify --db and --collection as options to the mongodump command. For example:

mongodump --collection collection --db test

This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.

You can also specify the source of the data and the location of the output. For example:

mongodump --dbpath /data/db/ --out /data/backup/
mongodump --host mongodb.example.net --port 27017

The first command reads the data files in /data/db/ directly and writes the dump to /data/backup/. With the second command, mongodump will write BSON files that hold a copy of data accessible via the mongod listening on port 27017 of the mongodb.example.net host.
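
To dump several collections from one database, the --db and --collection options can be wrapped in a small loop. The following is a minimal sketch, not part of the MongoDB distribution: the dump_cmd helper and the collection names users and orders are hypothetical, and the helper only prints each command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the mongodump command for one collection.
# It only echoes the command; remove "echo" to actually run it.
dump_cmd() {
    db="$1"
    collection="$2"
    echo mongodump --db "$db" --collection "$collection"
}

# Print one command per collection (the names are placeholders).
for c in users orders; do
    dump_cmd test "$c"
done
```

Printing the commands first keeps the sketch safe to try on a system where mongodump is not installed.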

Point in Time Operation Using Oplogs

Use the --oplog option with mongodump to collect the oplog entries to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects a consistent and specific moment in time.

Create Backups Without a Running mongod Instance

If your MongoDB instance is not running, you can use the --dbpath option to specify the location of your MongoDB instance’s database files. mongodump reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration. Consider the following example:

mongodump --dbpath /srv/mongodb
Create Backups from Non-Local mongod Instances

The --host and --port options for mongodump allow you to connect to and backup from a remote host. Consider the following example:

mongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/mongodump-2012-10-24

On any mongodump command you may, as above, specify username and password credentials to authenticate to the database.
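
The output directory in the example above embeds the date of the backup. A script might generate that name as sketched below. The backup_dir helper is hypothetical, the base path /opt/backup and the host and port are taken from the example, and the final command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: build a dated output directory name such as
# /opt/backup/mongodump-2012-10-24 from a base path and a date string.
backup_dir() {
    base="$1"
    day="${2:-$(date +%Y-%m-%d)}"   # default to today's date
    echo "$base/mongodump-$day"
}

out="$(backup_dir /opt/backup)"
# Print the command for review instead of running it.
echo mongodump --host mongodb1.example.net --port 3017 --out "$out"
```

Passing the date as an optional second argument makes the helper easy to check against a known value.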

Restore a Database with mongorestore

The mongorestore utility restores a binary backup created by mongodump. By default, mongorestore looks for a database backup in the dump/ directory.

The mongorestore utility can restore data either by:

  • connecting to a running mongod or mongos directly, or
  • writing to a local database path without use of a running mongod.

The mongorestore utility can restore either an entire database backup or a subset of the backup.

A mongorestore command that connects to an active mongod or mongos has the following prototype form:

mongorestore --port <port number> <path to the backup>

A mongorestore command that writes to data files without using a running mongod has the following prototype form:

mongorestore --dbpath <local database path> <path to the backup>

Consider the following example:

mongorestore dump-2012-10-25/

Here, mongorestore imports the database backup in the dump-2012-10-25 directory to the mongod instance running on the localhost interface.

Restore Point in Time Oplog Backup

If you created your database dump using the --oplog option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:

mongorestore --oplogReplay

You may also consider using the mongorestore --objcheck option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop option to drop each collection from the database before restoring from backups.
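
The --oplog and --oplogReplay options only make sense as a pair, so it can help to generate both commands from the same backup directory. The sketch below does that. The oplog_pair helper and the directory name are hypothetical, both commands are printed rather than executed, and the sketch assumes the dump runs directly against a replica set member.

```shell
#!/bin/sh
# Hypothetical helper: print a matching dump/restore pair for a
# point-in-time backup. Both commands are echoed, not executed.
oplog_pair() {
    dir="$1"
    echo mongodump --oplog --out "$dir"
    echo mongorestore --oplogReplay "$dir"
}

oplog_pair /opt/backup/mongodump-2012-10-24
```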

Restore a Subset of data from a Binary Database Dump

mongorestore also includes the ability to apply a filter to all input before inserting it into the new database. Consider the following example:

mongorestore --filter '{"field": 1}'

Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. ') to prevent the filter from interacting with your shell environment.

Restore without a Running mongod

mongorestore can write data to MongoDB data files without needing to connect to a mongod directly.

mongorestore --dbpath /srv/mongodb --journal

Here, mongorestore restores the database dump located in the dump/ folder into the data files located at /srv/mongodb. Additionally, the --journal option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.

See also

mongodump and mongorestore.

Restore Backups to Non-Local mongod Instances

By default, mongorestore connects to a MongoDB instance running on the localhost interface (e.g. 127.0.0.1) and on the default port (27017). If you want to restore to a different host or port, use the --host and --port options.

Consider the following example:

mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mongodump-2012-10-24

As above, you may specify username and password credentials if your mongod requires authentication.

Use Filesystem Snapshots to Backup and Restore MongoDB Databases

This document describes a procedure for creating backups of MongoDB systems using system-level tools, such as LVM or a storage appliance, as well as the corresponding restoration strategies.

These filesystem snapshots, or “block-level” backup methods use system level tools to create copies of the device that holds MongoDB’s data files. These methods complete quickly and work reliably, but require more system configuration outside of MongoDB.

Snapshots Overview

Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to “hard links.” As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result the snapshot only stores modified data.

After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data.

Snapshots have the following limitations:

  • The database must be in a consistent or recoverable state when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files.

    If all writes are not on disk when the backup occurs, the backup will not reflect these changes. If writes are in progress when the backup occurs, the data files will reflect an inconsistent state. With journaling all data-file states resulting from in-progress writes are recoverable; without journaling you must flush all pending writes to disk before running the backup operation and must ensure that no writes occur during the entire backup procedure.

    If you do use journaling, the journal must reside on the same volume as the data.

  • Snapshots create an image of an entire disk. Unless you need to back up your entire system, consider isolating your MongoDB data files, journal (if applicable), and configuration on one logical disk that doesn’t contain any other data.

    Alternately, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data.

  • Copy data from snapshots onto other systems to ensure that data is safe from site failures.

  • Although different snapshot methods provide different capabilities, the LVM method outlined below does not provide any capacity for capturing incremental backups.

Snapshots With Journaling

If your mongod instance has journaling enabled, then you can use any kind of file system or volume/block level snapshot tool to create backups.

If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment.

Note

Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB.

Snapshots with Amazon EBS in a RAID 10 Configuration

If your deployment depends on Amazon’s Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform’s snapshot tool. As an alternative, you can do one of the following:

Backup and Restore Using LVM on a Linux System

This section provides an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system the following steps provide a high level overview of the backup operation.

Note

Only use the following procedure as a guideline for a backup system and infrastructure. Production backup systems must consider a number of application specific requirements and factors unique to specific environments.

Create a Snapshot

To create a snapshot with LVM, issue a command as root in the following format:

lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group.

This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and paths to your system’s volume groups and devices may vary slightly depending on your operating system’s LVM configuration.

The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of the data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the creation of the snapshot (i.e. /dev/vg0/mdb-snap01).

Warning

Ensure that you create snapshots with enough space to account for data growth, particularly for the period of time that it takes to copy data out of the system or to a temporary image.

If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create another.
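
One rough way to choose the --size value is to multiply the expected volume of changed data by the time the snapshot must stay alive, then add headroom. The sketch below makes that arithmetic explicit. The snapshot_size_mb helper and the sample figures (20 MB of changed data per minute, 30 minutes to copy the data, a safety factor of 2) are assumptions for illustration, not measurements or MongoDB recommendations.

```shell
#!/bin/sh
# Hypothetical helper: estimate the snapshot size in megabytes.
#   $1  changed data per minute, in MB (assumed; measure your own system)
#   $2  minutes needed to copy the data out of the snapshot
#   $3  safety factor for unexpected write bursts
snapshot_size_mb() {
    echo $(( $1 * $2 * $3 ))
}

size="$(snapshot_size_mb 20 30 2)"   # 20 * 30 * 2 = 1200
# Print the resulting command for review instead of running it.
echo lvcreate --size "${size}M" --snapshot --name mdb-snap01 /dev/vg0/mongodb
```

The estimate is only as good as the write-rate figure, so err on the side of a larger snapshot when in doubt.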

The snapshot will exist when the command returns. You can restore directly from the snapshot at any time or by creating a new logical volume and restoring from this snapshot to the alternate image.

While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing backup data. Snapshots typically depend and reside on the same storage infrastructure as the original disk images. Therefore, it’s crucial that you archive these snapshots and store them elsewhere.

Archive a Snapshot

After creating a snapshot, mount the snapshot and move the data to separate storage. Your system might try to compress the backup images as you move them offline. The following procedure fully archives the data from the snapshot:

umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz

The above command sequence does the following:

  • Ensures that the /dev/vg0/mdb-snap01 device is not mounted.

  • Performs a block level copy of the entire snapshot image using the dd command and compresses the result in a gzipped file in the current working directory.

    Warning

    This command will create a large gz file in your current working directory. Make sure that you run this command in a file system that has enough free space.
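
Before relying on an archive, it may be worth checking that the compressed file is intact. The sketch below uses gzip -t, which tests a gzip file without extracting it. The verify_archive helper is hypothetical, and the demonstration runs against a small temporary file that stands in for mdb-snap01.gz.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the gzip archive is readable.
verify_archive() {
    gzip -t "$1" 2>/dev/null
}

# Demonstration on a throwaway file (stands in for mdb-snap01.gz).
tmp="$(mktemp -d)"
echo "sample data" | gzip > "$tmp/sample.gz"
if verify_archive "$tmp/sample.gz"; then
    echo "archive OK"
else
    echo "archive damaged"
fi
rm -rf "$tmp"
```

A passing check shows only that the gzip stream is well formed. It does not show that the snapshot inside it was consistent.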

Restore a Snapshot

To restore a snapshot created with the above method, issue the following sequence of commands:

lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb

The above sequence does the following:

  • Creates a new logical volume named mdb-new, in the /dev/vg0 volume group. The path to the new device will be /dev/vg0/mdb-new.

    Warning

    This volume will have a maximum size of 1 gigabyte. The original file system must have had a total size of 1 gigabyte or smaller, or else the restoration will fail.

    Change 1G to your desired volume size.

  • Uncompresses and unarchives the mdb-snap01.gz into the mdb-new disk image.

  • Mounts the mdb-new disk image to the /srv/mongodb directory. Modify the mount point to correspond to your MongoDB data file location, or other location as needed.
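
Because the new volume must be at least as large as the uncompressed image, it can help to measure the archive before choosing the --size value. The sketch below counts the uncompressed bytes and rounds up to whole megabytes. Both helper names are hypothetical, the demonstration uses a small temporary file in place of mdb-snap01.gz, and decompressing a real snapshot just to count bytes may take a long time.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes the archive expands to.
uncompressed_bytes() {
    gzip -d -c "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: round a byte count up to whole megabytes
# (1 MB is taken as 1048576 bytes here).
bytes_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Demonstration on a throwaway 3,000,000-byte file.
tmp="$(mktemp -d)"
head -c 3000000 /dev/zero | gzip > "$tmp/sample.gz"
mb="$(bytes_to_mb "$(uncompressed_bytes "$tmp/sample.gz")")"
# Print the resulting command for review instead of running it.
echo lvcreate --size "${mb}M" --name mdb-new vg0
rm -rf "$tmp"
```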

Note

The restored snapshot will have a stale mongod.lock file. If you do not remove this file from the snapshot, MongoDB may assume that the stale lock file indicates an unclean shutdown. If you’re running with journaling enabled, and you do not use db.fsyncLock(), you do not need to remove the mongod.lock file. If you use db.fsyncLock() you will need to remove the lock.

Restore Directly from a Snapshot

To restore a backup without writing to a compressed gz file, use the following sequence of commands:

umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Remote Backup Storage

You can implement off-system backups using the combined process and SSH.

This sequence is identical to procedures explained above, except that it archives and compresses the backup on a remote system using SSH.

Consider the following procedure:

umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com 'gzip > /opt/backup/mdb-snap01.gz'
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Create Backups on Instances that do not have Journaling Enabled

If your mongod instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a functional backup of a consistent state is more complicated. As described in this section, you must flush all writes to disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, then for your backup use a secondary which is not receiving reads (i.e. hidden member).

  1. To flush writes to disk and to “lock” the database (to prevent further writes), issue the db.fsyncLock() method in the mongo shell:

    db.fsyncLock();
    
  2. Perform the backup operation described in Create a Snapshot.

  3. To unlock the database after the snapshot has completed, use the following command in the mongo shell:

    db.fsyncUnlock();
    

    Note

    Changed in version 2.0: MongoDB 2.0 added db.fsyncLock() and db.fsyncUnlock() helpers to the mongo shell. Prior to this version, use the fsync command with the lock option, as follows:

    db.runCommand( { fsync: 1, lock: true } );
    db.runCommand( { fsync: 1, lock: false } );
    

    Note

    The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:

    db.setProfilingLevel(0)
    

    Warning

    Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod may block some reads, including those from mongodump, when a queued write operation waits behind the fsync lock.
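
The lock, snapshot, and unlock steps above can be sketched as one script that releases the lock even when the snapshot command fails. This is a minimal sketch under several assumptions: the run and locked_snapshot helpers are hypothetical, the mongo shell is invoked non-interactively with --eval against the default local instance, the lock can be released from a separate mongo invocation, and the lvcreate arguments are those from Create a Snapshot. By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Lock the database, take the snapshot, and unlock afterwards even if
# the snapshot fails. Returns the exit status of the snapshot command.
locked_snapshot() {
    run mongo --eval 'db.fsyncLock()' || return 1
    status=0
    run lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb || status=$?
    run mongo --eval 'db.fsyncUnlock()'
    return "$status"
}

locked_snapshot
```

Set DRY_RUN=0 only after reviewing the printed commands against your own volume names.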

Copy Databases Between Instances
Synopsis

MongoDB provides the copydb and clone database commands to support migrations of entire logical databases between mongod instances. With these commands you can copy data between instances with a simple interface without the need for an intermediate stage. The db.cloneDatabase() and db.copyDatabase() provide helpers for these operations in the mongo shell.

Data migrations that require an intermediate stage or that involve more than one database instance are beyond the scope of this tutorial. copydb and clone are better suited to use cases such as the following:

  • data migrations,
  • data warehousing, and
  • seeding test environments.

Also consider the Backup Strategies for MongoDB Systems and Import and Export MongoDB Data documentation for more related information.

Note

copydb and clone do not produce point-in-time snapshots of the source database. Write traffic to the source or destination database during the copy process will result in divergent data sets.

Considerations
  • You must run copydb or clone on the destination server.
  • You cannot use copydb or clone with databases that have a sharded collection in a sharded cluster, or any database via a mongos.
  • You can use copydb or clone with databases that do not have sharded collections in a cluster when you’re connected directly to the mongod instance.
  • You can run copydb or clone commands on a secondary member of a replica set, with properly configured read preference.
  • Each destination mongod instance must have enough free disk space on the destination server for the database you are copying. Use the db.stats() operation to check the size of the database on the source mongod instance. For more information, see db.stats().
Processes
Copy and Rename a Database

To copy a database from one MongoDB instance to another and rename the database in the process, use the copydb command, or the db.copyDatabase() helper in the mongo shell.

Use the following procedure to copy the database named test on server db0.example.net to the server named db1.example.net and rename it to records in the process:

  • Verify that the database, test exists on the source mongod instance running on the db0.example.net host.

  • Connect to the destination server, running on the db1.example.net host, using the mongo shell.

  • Model your operation on the following command:

    db.copyDatabase( "test", "records", "db0.example.net" )
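
If you prefer to run the copy from a script rather than an interactive session, the same call can be passed to the mongo shell with --eval. The sketch below only builds and prints such a command line. The copydb_cmd helper is hypothetical, and the host and database names are those used in this procedure.

```shell
#!/bin/sh
# Hypothetical helper: print a mongo command that connects to the
# destination host and copies a database from the source host.
copydb_cmd() {
    dest_host="$1"; from_db="$2"; to_db="$3"; from_host="$4"
    echo "mongo --host $dest_host --eval 'db.copyDatabase(\"$from_db\", \"$to_db\", \"$from_host\")'"
}

copydb_cmd db1.example.net test records db0.example.net
```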
    
Rename a Database

You can also use copydb or the db.copyDatabase() helper to:

  • rename a database within a single MongoDB instance or
  • create a duplicate database for testing purposes.

Use the following procedure to rename the test database to records on a single mongod instance:

  • Connect to the mongod using the mongo shell.

  • Model your operation on the following command:

    db.copyDatabase( "test", "records" )
    
Copy a Database with Authentication

To copy a database from a source MongoDB instance that has authentication enabled, you can specify authentication credentials to the copydb command or the db.copyDatabase() helper in the mongo shell.

In the following operation, you will copy the test database from the mongod running on db0.example.net to the records database on the local instance (e.g. db1.example.net.) Because the mongod instance running on db0.example.net requires authentication for all connections, you will need to pass db.copyDatabase() authentication credentials, as in the following procedure:

  • Connect to the destination mongod instance running on the db1.example.net host using the mongo shell.

  • Issue the following command:

    db.copyDatabase( "test", "records", "db0.example.net", "<username>", "<password>")
    

Replace <username> and <password> with your authentication credentials.

Clone a Database

The clone command copies a database between mongod instances like copydb; however, clone preserves the database name from the source instance on the destination mongod.

For many operations, clone is functionally equivalent to copydb, but it has a simpler syntax and a narrower use. The mongo shell provides the db.cloneDatabase() helper as a wrapper around clone.

You can use the following procedure to clone a database from the mongod instance running on db0.example.net to the mongod running on db1.example.net:

  • Connect to the destination mongod instance running on the db1.example.net host using the mongo shell.

  • Issue the following command to specify the name of the database you want to copy:

    use records
    
  • Use the following operation to initiate the clone operation:

    db.cloneDatabase( "db0.example.net" )
    
Recover MongoDB Data following Unexpected Shutdown

If MongoDB does not shut down cleanly [1], the on-disk representation of the data files will likely reflect an inconsistent state which could lead to data corruption. [2]

To prevent data inconsistency and corruption, always shut down the database cleanly and use durability journaling. MongoDB writes data to the journal, by default, every 100 milliseconds, such that MongoDB can always recover to a consistent state even in the case of an unclean shutdown due to power loss or other system failure.

If you are not running as part of a replica set and do not have journaling enabled, use the following procedure to recover data that may be in an inconsistent state. If you are running as part of a replica set, you should always restore from a backup or restart the mongod instance with an empty dbpath and allow MongoDB to perform an initial sync to restore the data.

See also

The Administration documents, including Replica Set Syncing, and the documentation on the repair, repairpath, and journal settings.

[1]To ensure a clean shut down, use the mongod --shutdown option, your control script, “Control-C” (when running mongod in interactive mode,) or kill $(pidof mongod) or kill -2 $(pidof mongod).
[2]You can also use the db.collection.validate() method to test the integrity of a single collection. However, this process is time consuming, and without journaling you can safely assume that the data is in an invalid state and you should either run the repair operation or resync from an intact member of the replica set.
Process
Indications

When you are aware of a mongod instance running without journaling that stops unexpectedly and you’re not running with replication, you should always run the repair operation before starting MongoDB again. If you’re using replication, then restore from a backup and allow replication to perform an initial sync to restore data.

If the mongod.lock file in the data directory specified by dbpath, /data/db by default, is not a zero-byte file, then mongod will refuse to start, and you will find a message that contains the following line in your MongoDB log or output:

Unclean shutdown detected.

This indicates that you need to remove the lockfile and run repair. If you run repair when the mongod.lock file exists without the mongod --repairpath option, you will see a message that contains the following line:

old lock file: /data/db/mongod.lock. probably means unclean shutdown

You must remove the lockfile and run the repair operation before starting the database normally using the following procedure:

Overview

Warning

Recovering a member of a replica set.

Do not use this procedure to recover a member of a replica set. Instead you should either restore from a backup or perform an initial sync using data from an intact member of the set, as described in Resync a Member of a Replica Set.

There are two processes to repair data files that result from an unexpected shutdown:

  1. Use the --repair option in conjunction with the --repairpath option. mongod will read the existing data files, and write the existing data to new data files. This does not modify or alter the existing data files.

    You do not need to remove the mongod.lock file before using this procedure.

  2. Use the --repair option. mongod will read the existing data files, write the existing data to new files and replace the existing, possibly corrupt, files with new files.

    You must remove the mongod.lock file before using this procedure.

Note

--repair functionality is also available in the shell with the db.repairDatabase() helper for the repairDatabase command.

Procedures

To repair your data files using the --repairpath option to preserve the original data files unmodified:

  1. Start mongod using --repair to read the existing data files.

    mongod --dbpath /data/db --repair --repairpath /data/db0
    

    When this completes, the new repaired data files will be in the /data/db0 directory.

  2. Start mongod using the following invocation to point the dbpath at /data/db0:

    mongod --dbpath /data/db0
    

    Once you confirm that the data files are operational you may delete or archive the data files in the /data/db directory.

To repair your data files without preserving the original files, do not use the --repairpath option, as in the following procedure:

  1. Remove the stale lock file:

    rm /data/db/mongod.lock
    

    Replace /data/db with your dbpath where your MongoDB instance’s data files reside.

    Warning

    After you remove the mongod.lock file you must run the --repair process before using your database.

  2. Start mongod using --repair to read the existing data files.

    mongod --dbpath /data/db --repair
    

    When this completes, the repaired data files will replace the original data files in the /data/db directory.

  3. Start mongod using the following invocation to point the dbpath at /data/db:

    mongod --dbpath /data/db
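
The lock file check and the non-destructive repair command from the procedures above can be sketched as a small script. The stale_lock and repair_cmd helpers are hypothetical, the paths /data/db and /data/db0 are those used in the examples, and the sketch assumes that no mongod process is currently running against the data directory, since a running instance also keeps a non-empty lock file. The repair command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if mongod.lock exists and is not empty.
# Assumes mongod is NOT running against this data directory.
stale_lock() {
    [ -s "$1/mongod.lock" ]
}

# Hypothetical helper: print the repair command that writes repaired
# files to a separate directory and leaves the originals unmodified.
repair_cmd() {
    echo mongod --dbpath "$1" --repair --repairpath "$2"
}

dbpath=/data/db
if stale_lock "$dbpath"; then
    repair_cmd "$dbpath" /data/db0
else
    echo "no stale lock file in $dbpath"
fi
```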
    
mongod.lock

In normal operation, you should never remove the mongod.lock file and start mongod. Instead, use one of the above methods to recover the database and remove the lock files. In dire situations you can remove the lockfile, and start the database using the possibly corrupt files, and attempt to recover data from the database; however, it’s impossible to predict the state of the database in these situations.

If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup or, if running as a replica set, restore by performing an initial sync using data from an intact member of the set, as described in Resync a Member of a Replica Set.

Backup and Restore Sharded Clusters

Backup a Small Sharded Cluster with mongodump
Overview

If your sharded cluster holds a small data set, you can create a backup of the entire cluster by connecting mongodump to a mongos. Use this approach only if your backup infrastructure can capture the entire backup in a reasonable amount of time and if you have a storage system that can hold the complete MongoDB data set.

Read Sharded Cluster Backup Considerations for a high-level overview of important considerations as well as a list of alternate backup tutorials.

Important

By default mongodump issues its queries to the non-primary nodes.

Procedure
Capture Data

Note

If you use mongodump without specifying a database or collection, mongodump will capture collection data and the cluster meta-data from the config servers.

You cannot use the --oplog option for mongodump when capturing data from mongos. This option is only available when running directly against a replica set member.

You can perform a backup of a sharded cluster by connecting mongodump to a mongos. Use the following operation at your system’s prompt:

mongodump --host mongos3.example.net --port 27017

mongodump will write BSON files that hold a copy of data stored in the sharded cluster accessible via the mongos listening on port 27017 of the mongos3.example.net host.

Restore Data

Backups created with mongodump do not reflect the chunks or the distribution of data in the sharded collection or collections. Like all mongodump output, these backups contain separate directories for each database and BSON files for each collection in that database.

You can restore mongodump output to any MongoDB instance, including a standalone, a replica set, or a new sharded cluster. When restoring data to a sharded cluster, you must deploy and configure sharding before restoring data from the backup. See Deploy a Sharded Cluster for more information.

Create Backup of a Sharded Cluster with Filesystem Snapshots
Overview

This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance. An alternate procedure uses mongodump to create binary database dumps when file-system snapshots are not available. See Create Backup of a Sharded Cluster with Database Dumps for the alternate procedure.

See Sharded Cluster Backup Considerations for a high-level overview of backing up a sharded cluster as well as links to other tutorials that provide alternate procedures.

Important

To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.

Procedure

In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.

For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard.

  1. Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() method in the mongo shell, and see the Disable the Balancer procedure.

    Warning

    It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks may migrate while recording backups.

  2. Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod instances in as short an interval as possible.

    To lock or freeze a sharded cluster, you must:

    • Use the db.fsyncLock() method in the mongo shell connected to a single secondary member of the replica set that provides each shard.
    • Shut down one of the config servers to prevent all metadata changes during the backup process.
  3. Use mongodump to back up one of the config servers. This backs up the cluster’s metadata. You only need to back up one config server, as they all hold the same data.

    Issue this command against one of the config mongod instances or via the mongos:

    mongodump --db config
    
  4. Back up the replica set members of the shards that you locked. You may back up the shards in parallel. For each shard, create a snapshot. Use the procedures in Use Filesystem Snapshots to Backup and Restore MongoDB Databases.

  5. Unlock all locked replica set members of each shard using the db.fsyncUnlock() method in the mongo shell.

  6. Restore the balancer with the sh.startBalancer() method according to the Disable the Balancer procedure.

    Use the following command sequence when connected to the mongos with the mongo shell:

    use config
    sh.startBalancer()
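
The ordering of the steps above can be sketched as a small driver. This is only an illustration and not part of MongoDB: the function name runSnapshotBackup and the ops callbacks (stopBalancer, lockMember, dumpConfig, snapshot, unlockMember, startBalancer) are hypothetical placeholders that you would implement with sh.stopBalancer(), db.fsyncLock(), mongodump, your snapshot tool, db.fsyncUnlock(), and sh.startBalancer(). The sketch omits shutting down a config server.

```javascript
// Hypothetical driver (not a MongoDB API): runs the backup steps in order.
// The `finally` block unlocks every locked member and restarts the balancer
// even when a step fails part-way through.
function runSnapshotBackup(shardMembers, ops) {
  const locked = [];
  ops.stopBalancer();                 // step 1: stop the balancer
  try {
    for (const member of shardMembers) {
      ops.lockMember(member);         // step 2: lock one secondary per shard
      locked.push(member);
    }
    ops.dumpConfig();                 // step 3: back up the config database
    for (const member of locked) {
      ops.snapshot(member);           // step 4: filesystem snapshot per shard
    }
  } finally {
    for (const member of locked) {
      ops.unlockMember(member);       // step 5: unlock the locked members
    }
    ops.startBalancer();              // step 6: restore the balancer
  }
}
```

The design point is the try/finally: a failed snapshot should not leave a secondary locked or the balancer disabled.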
    
Create Backup of a Sharded Cluster with Database Dumps
Overview

This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses mongodump to create dumps of the mongod instance. An alternate procedure uses file system snapshots to capture the backup data, and may be more efficient in some situations if your system configuration allows file system backups. See Create Backup of a Sharded Cluster with Filesystem Snapshots.

See Sharded Cluster Backup Considerations for a high-level overview of backing up a sharded cluster as well as links to other tutorials that provide alternate procedures.

Important

To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.

Procedure

In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the backups; otherwise the backup will only approximate a moment in time.

For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the cluster by taking the backup from a secondary member of the replica set that provides each shard.

  1. Disable the balancer process that equalizes the distribution of data among the shards. To disable the balancer, use the sh.stopBalancer() method in the mongo shell, and see the Disable the Balancer procedure.

    Warning

    It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks migrate while recording backups.

  2. Lock one member of each replica set in each shard so that your backups reflect the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod instances in as short an interval as possible.

    To lock or freeze a sharded cluster, you must:

    • Shut down one member of each replica set.

      Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog for more information.

    • Shut down one of the config servers to prevent all metadata changes during the backup process.

  3. Use mongodump to back up one of the config servers. This backs up the cluster’s metadata. You only need to back up one config server, as they all hold the same data.

    Issue this command against one of the config mongod instances or via the mongos:

    mongodump --journal --db config
    
  4. Back up the replica set members of the shards that you shut down, using mongodump and specifying the --dbpath option. You may back up the shards in parallel. Consider the following invocation:

    mongodump --journal --dbpath /data/db/ --out /data/backup/
    

    You must run this command on the system where the mongod ran. This operation will use journaling and create a dump of the entire mongod instance with data files stored in /data/db/. mongodump will write the output of this dump to the /data/backup/ directory.

  5. Restart all stopped replica set members of each shard as normal and allow them to catch up with the state of the primary.

  6. Restore the balancer with the sh.startBalancer() method according to the Disable the Balancer procedure.

    Use the following command sequence when connected to the mongos with the mongo shell:

    use config
    sh.startBalancer()
    
Restore a Single Shard
Overview

Restoring a single shard from backup with other unaffected shards requires a number of special considerations and practices. This document outlines the additional tasks you must perform when restoring a single shard.

Consider the following resources on backups in general as well as backup and restoration of sharded clusters specifically:

Procedure

Always restore sharded clusters as a whole. When you restore a single shard, keep in mind that the balancer process might have moved chunks to or from this shard since the last backup. If that’s the case, you must manually move those chunks, as described in this procedure.

  1. Restore the shard as you would any other mongod instance. See Backup Strategies for MongoDB Systems for overviews of these procedures.
  2. For all chunks that migrate away from this shard, you do not need to do anything at this time. You do not need to delete these documents from the shard because the chunks are automatically filtered out from queries by mongos. You can remove these documents from the shard, if you like, at your leisure.
  3. For chunks that migrate to this shard after the most recent backup, you must manually recover the chunks using backups of other shards, or some other source. To determine what chunks have moved, view the changelog collection in the Config Database.
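
As a rough sketch of step 3, the function below filters changelog entries for chunk migrations that landed on the restored shard after the backup was taken. The function name chunksMovedToShard is hypothetical, and the document shape it reads (what, time, ns, and details.from, details.to, details.min, details.max) is an assumption about the changelog collection; check the entries in your own Config Database before relying on it.

```javascript
// Hypothetical helper: lists chunks that were migrated onto `shard` after
// `backupTime`. The field names used here are assumptions about the shape
// of documents in the changelog collection.
function chunksMovedToShard(changelog, shard, backupTime) {
  return changelog
    .filter((entry) => entry.what === "moveChunk.commit")
    .filter((entry) => entry.time > backupTime && entry.details.to === shard)
    .map((entry) => ({
      ns: entry.ns,
      min: entry.details.min,
      max: entry.details.max,
      from: entry.details.from,
    }));
}
```

Each returned entry names the namespace, the chunk bounds, and the shard the chunk came from, which is where to look for a backup that still holds those documents.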
Restore Sharded Clusters
Overview

The procedure outlined in this document addresses how to restore an entire sharded cluster. For information on related backup procedures consider the following tutorials which describe backup procedures in greater detail:

The exact procedure used to restore a database depends on the method used to capture the backup. See the Backup Strategies for MongoDB Systems document for an overview of backups with MongoDB, as well as Sharded Cluster Backup Considerations which provides an overview of the high level concepts important for backing up sharded clusters.

Procedure
  1. Stop all mongod and mongos processes.

  2. If shard hostnames have changed, you must manually update the shards collection in the Config Database to use the new hostnames. Do the following:

    1. Start the three config servers by issuing commands similar to the following, using values appropriate to your configuration:

      mongod --configsvr --dbpath /data/configdb --port 27019
      
    2. Restore the Config Database on each config server.

    3. Start one mongos instance.

    4. Update the Config Database collection named shards to reflect the new hostnames.

  3. Restore the following:

    • Data files for each server in each shard. Because replica sets provide each production shard, restore all the members of the replica set or use the other standard approaches for restoring a replica set from backup. See the Restore a Snapshot and Restore a Database with mongorestore sections for details on these procedures.
    • Data files for each config server, if you have not already done so in the previous step.
  4. Restart all the mongos instances.

  5. Restart all the mongod instances.

  6. Connect to a mongos instance from a mongo shell and use the db.printShardingStatus() method to ensure that the cluster is operational, as follows:

    db.printShardingStatus()
    show collections
    
Schedule Backup Window for Sharded Clusters
Overview

In a sharded cluster, the balancer process is responsible for distributing sharded data around the cluster, so that each shard has roughly the same amount of data.

However, when creating backups from a sharded cluster it is important that you disable the balancer while taking backups to ensure that no chunk migrations affect the content of the backup captured by the backup procedure. Using the procedure outlined in the section Disable the Balancer you can manually stop the balancer process temporarily. As an alternative you can use this procedure to define a balancing window so that the balancer is always disabled during your automated backup operation.

Procedure

If you have an automated backup schedule, you can disable all balancing operations for a period of time. For instance, consider the following command:

use config
db.settings.update( { _id : "balancer" }, { $set : { activeWindow : { start : "6:00", stop : "23:00" } } }, true )

This operation configures the balancer to run between 6:00am and 11:00pm, server time. Schedule your backup operation to run and complete outside of this time. Ensure that the backup can complete outside the window when the balancer is running and that the balancer can effectively balance the collection among the shards in the window allotted to each.
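
To check a planned backup slot against the balancing window, a small calculation like the following may help. It is a sketch under assumptions: the helper names toMinutes, inWindow, and backupFits are hypothetical, and it treats the window as including the start time and excluding the stop time, which is a modelling choice rather than documented balancer behavior.

```javascript
// Converts "H:MM" or "HH:MM" to minutes after midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// True when `time` falls inside the window [start, stop).
// Also handles windows that wrap past midnight (e.g. 23:00 to 6:00).
function inWindow(time, start, stop) {
  const t = toMinutes(time);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? (t >= s && t < e) : (t >= s || t < e);
}

// True when a backup that starts at `backupStart` and runs for
// `durationMin` minutes finishes before the balancer window next opens.
function backupFits(backupStart, durationMin, start, stop) {
  if (inWindow(backupStart, start, stop)) return false;
  const untilOpen = (toMinutes(start) - toMinutes(backupStart) + 1440) % 1440;
  return durationMin <= untilOpen;
}
```

With the window from the example above ("6:00" to "23:00"), a backup that starts at 23:30 has 390 minutes before the balancer may run again.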

Data Center Awareness

MongoDB provides a number of features that allow application developers and database administrators to customize the behavior of a sharded cluster or replica set deployment so that MongoDB may be more “data center aware,” or allow operational and location-based separation.

MongoDB also supports segregation based on functional parameters, to ensure that certain mongod instances are only used for reporting workloads or that certain high-frequency portions of a sharded collection only exist on specific shards.

Consider the following documents:

Operational Segregation in MongoDB Operations and Deployments

Operational Overview

MongoDB includes a number of features that allow database administrators and developers to segregate application operations to MongoDB deployments by functional or geographical groupings.

This capability provides “data center awareness,” which allows applications to target MongoDB deployments with consideration of the physical location of the mongod instances. MongoDB supports segmentation of operations across different dimensions, which may include multiple data centers and geographical regions in multi-data center deployments, or racks, networks, or power circuits in single data center deployments.

MongoDB also supports segregation of database operations based on functional or operational parameters, to ensure that certain mongod instances are only used for reporting workloads or that certain high-frequency portions of a sharded collection only exist on specific shards.

Specifically, with MongoDB, you can:

  • ensure write operations propagate to specific members of a replica set, or to specific members of replica sets.
  • ensure that specific members of a replica set respond to queries.
  • ensure that specific ranges of your shard key balance onto and reside on specific shards.
  • combine the above features in a single distributed deployment, on a per-operation (for read and write operations) and per-collection (for chunk distribution in sharded clusters) basis.

For full documentation of these features, see the following documentation in the MongoDB Manual:

  • Read Preferences, which controls how drivers help applications target read operations to members of a replica set.
  • Write Concerns, which controls how MongoDB ensures that write operations propagate to members of a replica set.
  • Replica Set Tags, which control how applications create and interact with custom groupings of replica set members to create custom application-specific read preferences and write concerns.
  • Tag Aware Sharding, which allows MongoDB administrators to define an application-specific balancing policy, to control how documents belonging to specific ranges of a shard key distribute to shards in the sharded cluster.

See also

Before adding operational segregation features to your application and MongoDB deployment, become familiar with all documentation of replication and sharding, particularly Replica Set Fundamental Concepts and Sharded Cluster Overview.

Tag Aware Sharding

For sharded clusters, MongoDB makes it possible to associate specific ranges of a shard key with a specific shard or subset of shards. This association dictates the policy of the cluster balancer process as it balances the chunks around the cluster. This capability enables the following deployment patterns:

  • isolating a specific subset of data on a specific set of shards.
  • controlling the balancing policy so that, in a geographically distributed cluster, the most relevant portions of the data set reside on the shards with the greatest proximity to the application servers.

This document describes the behavior, operation, and use of tag aware sharding in MongoDB deployments.

Note

Shard key range tags are entirely distinct from replica set member tags.

Hash-based sharding does not support tag-aware sharding.

Behavior and Operations

Tags in a sharded cluster are pieces of metadata that dictate the policy and behavior of the cluster balancer. Using tags, you may associate individual shards in a cluster with one or more tags. Then, you can assign this tag string to a range of shard key values for a sharded collection. When migrating a chunk, the balancer will select a destination shard based on the configured tag ranges.

The balancer migrates chunks in tagged ranges to shards with those tags, if tagged shards are not balanced. [1]

Note

Because a single chunk may span different tagged shard key ranges, the balancer may migrate chunks to tagged shards that contain values that exceed the upper bound of the selected tag range.

Example

Given a sharded collection with two configured tag ranges, such that:

  • Shard key values between 100 and 200 have tags to direct corresponding chunks to shards tagged NYC.
  • Shard Key values between 200 and 300 have tags to direct corresponding chunks to shards tagged SFO.

In this cluster, the balancer will migrate a chunk with shard key values ranging between 150 and 220 to a shard tagged NYC, since 150 is closer to 200 than 300.

After configuring tags on the shards and ranges of the shard key, the cluster may take some time to reach the proper distribution of data, depending on the division of chunks (i.e. splits) and the current distribution of data in the cluster. Once configured, the balancer will respect tag ranges during future balancing rounds.

[1]To migrate chunks in a tagged environment, the balancer selects a target shard with a tag range that has an upper bound that is greater than the migrating chunk’s lower bound. If a shard with a matching tagged range exists, the balancer will migrate the chunk to that shard.
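
The example above can be modelled with a few lines of JavaScript. This is a simplified reading of the rule, not the balancer's actual implementation: the function name tagForChunk is hypothetical, shard key values are plain numbers, and the sketch picks the tag range that contains the chunk's lower bound.

```javascript
// Simplified model: return the tag of the range that contains the chunk's
// lower bound, or null when no tag range applies. Ranges include their
// lower bound and exclude their upper bound.
function tagForChunk(chunkMin, tagRanges) {
  const match = tagRanges.find(
    (range) => range.min <= chunkMin && chunkMin < range.max
  );
  return match ? match.tag : null;
}

const exampleRanges = [
  { min: 100, max: 200, tag: "NYC" },
  { min: 200, max: 300, tag: "SFO" },
];
```

Under this model a chunk covering 150 to 220 maps to NYC because its lower bound, 150, falls in the NYC range, even though part of the chunk extends into the SFO range.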

Administer and Manage Shard Tags

In a sharded cluster, you can use tags to associate specific ranges of a shard key with a specific shard or subset of shards.

Tag a Shard

Associate tags with a particular shard using the sh.addShardTag() method when connected to a mongos instance. A single shard may have multiple tags, and multiple shards may also have the same tag.

Example

The following example adds the tag NYC to two shards, and the tags SFO and NRT to a third shard:

sh.addShardTag("shard0000", "NYC")
sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")
sh.addShardTag("shard0002", "NRT")

You may remove tags from a particular shard using the sh.removeShardTag() method when connected to a mongos instance, as in the following example, which removes the NRT tag from a shard:

sh.removeShardTag("shard0002", "NRT")
Tag a Shard Key Range

To assign a tag to a range of shard keys use the sh.addTagRange() method when connected to a mongos instance. Any given shard key range may only have one assigned tag. You cannot overlap defined ranges, or tag the same range more than once.

Example

Consider a collection named users in the records database, sharded by the zipcode field. The following operations assign:

  • two ranges of zip codes in Manhattan and Brooklyn the NYC tag
  • one range of zip codes in San Francisco the SFO tag
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")

Note

Shard ranges are always inclusive of the lower value and exclusive of the upper boundary.
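
Because ranges include the lower value and exclude the upper boundary, two ranges that share an endpoint do not overlap. The sketch below checks a candidate range against existing ones before you call sh.addTagRange(). The names rangesOverlap and canAddRange are hypothetical, and the comparison uses plain JavaScript string ordering, which matches the zip code example only because all values are digit strings of equal length.

```javascript
// True when two half-open ranges [min, max) share at least one value.
function rangesOverlap(a, b) {
  return a.min < b.max && b.min < a.max;
}

// True when `candidate` overlaps none of the `existing` ranges.
function canAddRange(existing, candidate) {
  return existing.every((range) => !rangesOverlap(range, candidate));
}

const nycRanges = [
  { min: "10001", max: "10281" },
  { min: "11201", max: "11240" },
];
```

For instance, a candidate range starting at "10281" is compatible with the first NYC range, while one starting at "10200" is not.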

Remove a Tag From a Shard Key Range

The mongod does not provide a helper for removing a tag range. You may delete tag assignment from a shard key range by removing the corresponding document from the tags collection of the config database.

Each document in the tags collection holds the namespace of the sharded collection and a minimum shard key value.

Example

The following example removes the NYC tag assignment for the range of zip codes within Manhattan:

use config
db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" }}, tag: "NYC" })
View Existing Shard Tags

The output from sh.status() lists tags associated with a shard, if any, for each shard. A shard’s tags exist in the shard’s document in the shards collection of the config database. To return all shards with a specific tag, use a sequence of operations that resemble the following, which will return only those shards tagged with NYC:

use config
db.shards.find({ tags: "NYC" })

You can find tag ranges for all namespaces in the tags collection of the config database. The output of sh.status() displays all tag ranges. To return all shard key ranges tagged with NYC, use the following sequence of operations:

use config
db.tags.find({ tag: "NYC" })

Deploy a Geographically Distributed Replica Set

This tutorial outlines the process for deploying a replica set with members in multiple locations. The tutorial addresses three-member sets, four-member sets, and sets with more than four members.

For appropriate background, see Replica Set Fundamental Concepts and Replica Set Architectures and Deployment Patterns. For related tutorials, see Deploy a Replica Set and Add Members to a Replica Set.

Overview

While replica sets provide basic protection against single-instance failure, when all of the members of a replica set reside in a single facility, the replica set is still susceptible to some classes of errors in that facility including power outages, networking distortions, and natural disasters. To protect against these classes of failures, deploy a replica set with one or more members in a geographically distinct facility or data center.

Requirements

For a three-member replica set you need two instances in a primary facility (hereafter, “Site A”) and one member in a secondary facility (hereafter, “Site B”.) Site A should be the same facility or very close to your primary application infrastructure (i.e. application servers, caching layer, users, etc.)

For a four-member replica set you need two members in Site A, two members in Site B (or one member in Site B and one member in Site C,) and a single arbiter in Site A.

For replica sets with additional members in the secondary facility or with multiple secondary facilities, the requirements are the same as above but with the following notes:

  • Ensure that a majority of the voting members are within Site A. This includes secondary-only members and arbiters. For more information on the need to keep the voting majority on one site, see Elections.
  • If you deploy a replica set with an even number of members, deploy an arbiter in Site A. The arbiter must be in Site A to keep the majority there.
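
The requirement that Site A holds a majority of the votes is simple arithmetic, sketched below. The function name siteHoldsMajority and the member shape ({ host, site, votes }) are hypothetical and used only for illustration; the sketch assumes each member casts one vote unless votes says otherwise.

```javascript
// True when the members located at `site` hold more than half of all votes.
// Each member counts for one vote unless it carries an explicit `votes` value.
function siteHoldsMajority(members, site) {
  const votesOf = (member) => (member.votes === undefined ? 1 : member.votes);
  const total = members.reduce((sum, m) => sum + votesOf(m), 0);
  const atSite = members
    .filter((m) => m.site === site)
    .reduce((sum, m) => sum + votesOf(m), 0);
  return atSite > total / 2;
}
```

With two members in each of Site A and Site B, neither site holds a majority (2 of 4). Adding an arbiter in Site A gives it 3 of 5 votes.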

For all configurations in this tutorial, deploy each replica set member on a separate system. Although you may deploy more than one replica set member on a single system, doing so reduces the redundancy and capacity of the replica set. Such deployments are typically for testing purposes and beyond the scope of this tutorial.

Procedures
Deploy a Distributed Three-Member Replica Set

A geographically distributed three-member deployment has the following features:

  • Each member of the replica set resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.

  • Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:

    • mongodb0.example.net
    • mongodb1.example.net
    • mongodb2.example.net

    Configure DNS names appropriately, or set up your systems’ /etc/hosts file to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.

  • Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:

    • Establish a virtual private network between the systems in Site A and Site B to encrypt all traffic between the sites and keep it private. Ensure that your network topology routes all traffic between members within a single site over the local area network.

    • Configure authentication using auth and keyFile, so that only servers and processes with authentication can connect to the replica set.

    • Configure networking and firewall rules so that only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment is permitted.

      See also

      For more information on security and firewalls, see Security.

  • Specify run-time configuration on each system in a configuration file stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options.

    For each MongoDB instance, use the following configuration, with values set appropriate to your systems:

    port = 27017
    
    bind_ip = 10.8.0.10
    
    dbpath = /srv/mongodb/
    
    fork = true
    
    replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net
    

    Modify bind_ip to reflect a secure interface on your system that is able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. “VPN”) to permit this access.

    Note

    The portion of the replSet following the / provides a “seed list” of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet option resemble:

    replSet = rs0
    

    For more documentation on the above run time configurations, as well as additional configuration options, see Configuration File Options.
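
To make the structure of the replSet value concrete, the following sketch splits it into the set name and the optional seed list. The function name parseReplSet is hypothetical and only illustrates the format described in the note above; it is not how mongod itself parses the option.

```javascript
// Splits a replSet value such as "rs0/host1,host2" into its set name and
// seed list. A value without "/" yields an empty seed list.
function parseReplSet(value) {
  const slash = value.indexOf("/");
  if (slash === -1) {
    return { name: value, seeds: [] };
  }
  const seeds = value
    .slice(slash + 1)
    .split(",")
    .filter((seed) => seed.length > 0);
  return { name: value.slice(0, slash), seeds: seeds };
}
```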

To deploy a geographically distributed three-member set:

  1. On each system start the mongod process by issuing a command similar to the following:

    mongod --config /etc/mongodb.conf
    

    Note

    In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

  2. Open a mongo shell connected to one of the mongod instances:

    mongo
    
  3. Use the rs.initiate() method on one member to initiate a replica set consisting of the current member and using the default configuration:

    rs.initiate()
    
  4. Display the current replica configuration:

    rs.conf()
    
  5. Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:

    rs.add("mongodb1.example.net")
    rs.add("mongodb2.example.net")
    
  6. Make sure that you have configured the member located in Site B (i.e. mongodb2.example.net) as a secondary-only member:

    1. Issue the following command to determine the _id value for mongodb2.example.net:

      rs.conf()
      
    2. In the members array, save the _id value. The example in the next step assumes this value is 2.

    3. In the mongo shell connected to the replica set’s primary, issue a command sequence similar to the following:

      cfg = rs.conf()
      cfg.members[2].priority = 0
      rs.reconfig(cfg)
      

      Note

      In some situations, the rs.reconfig() shell method can force the current primary to step down and cause an election. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

    After these commands return you have a geographically distributed three-member replica set.

  7. To check the status of your replica set, issue rs.status().
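
Step 6 assumes that the member's position in the members array matches the value you noted. A position in the array and an _id value can differ, so one option is to look the member up by hostname instead. The sketch below does that on a plain configuration object. The function name setSecondaryOnly is hypothetical, and the code uses modern JavaScript syntax that an older mongo shell may not accept.

```javascript
// Finds the member whose host (ignoring any ":port" suffix) equals
// `hostname`, sets its priority to 0, and returns its array index.
function setSecondaryOnly(cfg, hostname) {
  const index = cfg.members.findIndex(
    (member) => member.host.split(":")[0] === hostname
  );
  if (index === -1) {
    throw new Error("no replica set member with host " + hostname);
  }
  cfg.members[index].priority = 0;
  return index;
}
```

In a shell session you would pass the object returned by rs.conf() to this function and then hand the modified object to rs.reconfig().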

See also

The documentation of the following shell functions for more information:

Deploy a Distributed Four-Member Replica Set

A geographically distributed four-member deployment has the following features:

  • Each member of the replica set, except for the arbiter (see below), resides on its own machine, and the MongoDB processes all bind to port 27017, which is the standard MongoDB port.

  • Each member of the replica set must be accessible by way of resolvable DNS or hostnames in the following scheme:

    • mongodb0.example.net
    • mongodb1.example.net
    • mongodb2.example.net
    • mongodb3.example.net

    Configure DNS names appropriately, or set up your systems’ /etc/hosts file to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.

  • One host (e.g. mongodb3.example.net) will be an arbiter and can run on a system that is also used for an application server or some other shared purpose.

  • There are three possible architectures for this replica set:

    • Two members in Site A, two secondary-only members in Site B, and an arbiter in Site A.
    • Three members in Site A and one secondary-only member in Site B.
    • Two members in Site A, one secondary-only member in Site B, one secondary-only member in Site C, and an arbiter in site A.

    In most cases the first architecture is preferable because it is the least complex.

  • Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:

    • Establish a virtual private network between the systems in Site A and Site B (and Site C if it exists) to encrypt all traffic between the sites and keep it private. Ensure that your network topology routes all traffic between members within a single site over the local area network.

    • Configure authentication using auth and keyFile, so that only servers and processes with authentication can connect to the replica set.

    • Configure networking and firewall rules so that only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment is permitted.

      See also

      For more information on security and firewalls, see Security.

  • Specify run-time configuration on each system in a configuration file stored in /etc/mongodb.conf or in a related location. Do not specify run-time configuration through command line options.

    For each MongoDB instance, use the following configuration, with values set appropriate to your systems:

    port = 27017
    
    bind_ip = 10.8.0.10
    
    dbpath = /srv/mongodb/
    
    fork = true
    
    replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net,mongodb3.example.net
    

    Modify bind_ip to reflect a secure interface on your system that is able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. “VPN”) to permit this access.

    Note

    The portion of the replSet following the / provides a “seed list” of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet option resemble:

    replSet = rs0
    

    For more documentation on the above run-time configuration settings, as well as additional configuration options, see the configuration options reference (/reference/configuration-options).
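
As an illustration, the per-member configuration files described above differ only in bind_ip, so you could generate them from a template. The following Python sketch is a minimal example under stated assumptions: the render_config helper is hypothetical, every address other than 10.8.0.10 is invented for the example, and the replSet value uses the short form without a seed list.

```python
# Hypothetical helper: renders the configuration shown above for one member.
def render_config(bind_ip, repl_set="rs0", port=27017, dbpath="/srv/mongodb/"):
    lines = [
        "port = %d" % port,
        "bind_ip = %s" % bind_ip,
        "dbpath = %s" % dbpath,
        "fork = true",
        "replSet = %s" % repl_set,
    ]
    return "\n".join(lines) + "\n"


# Assumed addresses; only 10.8.0.10 appears in the example above.
members = {
    "mongodb0.example.net": "10.8.0.10",
    "mongodb1.example.net": "10.8.0.11",
}

# One configuration text per member, keyed by host name.
configs = {host: render_config(ip) for host, ip in members.items()}
print(configs["mongodb0.example.net"])
```

Write each rendered text to /etc/mongodb.conf on the matching host, after adjusting the values to your systems.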

To deploy a geographically distributed four-member set:

  1. On each system start the mongod process by issuing a command similar to the following:

    mongod --config /etc/mongodb.conf
    

    Note

    In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

  2. Open a mongo shell connected to this host:

    mongo
    
  3. Use rs.initiate() to initiate a replica set consisting of the current member and using the default configuration:

    rs.initiate()
    
  4. Display the current replica configuration:

    rs.conf()
    
  5. Add the remaining members to the replica set by issuing a sequence of commands similar to the following. The example commands assume the current primary is mongodb0.example.net:

    rs.add("mongodb1.example.net")
    rs.add("mongodb2.example.net")
    rs.add("mongodb3.example.net")
    
  6. In the same shell session, issue the following command to add the arbiter (e.g. mongodb4.example.net):

    rs.addArb("mongodb4.example.net")
    
  7. Make sure that you have configured each member located in Site B (e.g. mongodb3.example.net) as a secondary-only member:

    1. Issue the following command to determine the _id value for the member:

      rs.conf()
      
    2. In the members array, save the _id value. The example in the next step assumes this value is 2.

    3. In the mongo shell connected to the replica set’s primary, issue a command sequence similar to the following:

      cfg = rs.conf()
      cfg.members[2].priority = 0
      rs.reconfig(cfg)
      

      Note

      In some situations, the rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

    After these commands return you have a geographically distributed four-member replica set.

  8. To check the status of your replica set, issue rs.status().

See also

See the documentation of the shell functions used in this procedure (rs.initiate(), rs.conf(), rs.add(), rs.addArb(), rs.reconfig(), and rs.status()) for more information.

Deploy a Distributed Set with More than Four Members

The procedure for deploying a geographically distributed set with more than four members is similar to the above procedures, with the following differences:

  • Never deploy more than seven voting members.
  • Use the procedure for a four-member set if you have an even number of members (see Deploy a Distributed Four-Member Replica Set). Ensure that Site A always has a majority of the members by deploying the arbiter within Site A. For six-member sets, deploy at least three voting members in addition to the arbiter in Site A, and the remaining members in alternate sites.
  • Use the procedure for a three-member set if you have an odd number of members (see Deploy a Distributed Three-Member Replica Set). Ensure that Site A always has a majority of the members of the set. For example, if a set has five members, deploy three members within the primary facility and two members in other facilities.
  • If you have a majority of the members of the set outside of Site A and the network partitions to prevent communication between sites, the current primary in Site A will step down, even if none of the members outside of Site A are eligible to become primary.
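
The majority rule in the list above comes down to simple arithmetic. The following Python sketch is only an illustration: the site_has_majority function and the dictionary layout (site name mapped to number of voting members, arbiter included) are assumptions made for this example, not part of MongoDB.

```python
MAX_VOTING_MEMBERS = 7  # limit stated in the list above


def site_has_majority(votes_by_site, site="A"):
    """Return True if `site` holds a strict majority of all votes."""
    total = sum(votes_by_site.values())
    if total > MAX_VOTING_MEMBERS:
        raise ValueError("more than %d voting members" % MAX_VOTING_MEMBERS)
    return 2 * votes_by_site.get(site, 0) > total


# Six-member set: three voting members plus the arbiter in Site A,
# the remaining three members in another site.
six_plus_arbiter = {"A": 4, "B": 3}
# Five-member set: three members in the primary facility, two elsewhere.
five_members = {"A": 3, "B": 2}
# Layout to avoid: most members sit outside Site A.
risky_layout = {"A": 2, "B": 3}

for layout in (six_plus_arbiter, five_members, risky_layout):
    print(layout, site_has_majority(layout))
```

The first two layouts report True and the third reports False, which matches the last item in the list: if the network partitions, a primary in Site A cannot keep a majority in the third layout.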

Additionally, consider the Write Concern and Read Preference documents, which address capabilities related to data center awareness.

Journaling

MongoDB uses write ahead logging to an on-disk journal to guarantee write operation durability and to provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation to the journal. If MongoDB should terminate or encounter an error before it can write the changes from the journal to the data files, MongoDB can re-apply the write operation and maintain a consistent state.

Without a journal, if mongod exits unexpectedly, you must assume your data is in an inconsistent state, and you must run either repair or, preferably, resync from a clean member of the replica set.

With journaling enabled, if mongod stops unexpectedly, the program can recover everything written to the journal, and the data remains in a consistent state. By default, the greatest extent of lost writes, i.e. those not made to the journal, is the set of writes made in the last 100 milliseconds. See journalCommitInterval for more information on the default.

With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the dataset plus the “write working set.” The “write working set” is the amount of unique data you expect to see written between re-mappings of the private view. For information on views, see Storage Views used in Journaling.

Important

Changed in version 2.0: For 64-bit builds of mongod, journaling is enabled by default. For other platforms, see journal.

Procedures

Enable Journaling

Changed in version 2.0: For 64-bit builds of mongod, journaling is enabled by default.

To enable journaling, start mongod with the --journal command line option.

If no journal files exist when mongod starts, it must preallocate new journal files. During this operation, mongod does not listen for connections until preallocation completes; on some systems this may take several minutes. During this period the database is not available to your applications or the mongo shell.

Disable Journaling

Warning

Do not disable journaling on production systems. If your mongod instance stops unexpectedly without shutting down cleanly for any reason (e.g. power failure) and you are not running with journaling, then you must recover from an unaffected replica set member or backup, as described in repair.

To disable journaling, start mongod with the --nojournal command line option.

Get Commit Acknowledgment

You can get commit acknowledgment with the getLastError command and the j option. For details, see Write Concern Reference.

Avoid Preallocation Lag

To avoid preallocation lag, you can preallocate files in the journal directory by copying them from another instance of mongod.

Preallocated files do not contain data. It is safe to later remove them. But if you restart mongod with journaling, mongod will create them again.

Example

The following sequence preallocates journal files for an instance of mongod running on port 27017 with a database path of /data/db.

For demonstration purposes, the sequence starts by creating a set of journal files in the usual way.

  1. Create a temporary directory into which to create a set of journal files:

    mkdir ~/tmpDbpath
    
  2. Create a set of journal files by starting a mongod instance that uses the temporary directory:

    mongod --port 10000 --dbpath ~/tmpDbpath --journal
    
  3. When you see the following log output, indicating mongod has the files, press CONTROL+C to stop the mongod instance:

    web admin interface listening on port 11000
    
  4. Preallocate journal files for the new instance of mongod by moving the journal files from the data directory of the existing instance to the data directory of the new instance:

    mv ~/tmpDbpath/journal /data/db/
    
  5. Start the new mongod instance:

    mongod --port 27017 --dbpath /data/db --journal
    
Monitor Journal Status

Use the following commands and methods to monitor journal status:

  • serverStatus

    The serverStatus command returns database status information that is useful for assessing performance.

  • journalLatencyTest

    Use journalLatencyTest to measure how long it takes on your volume to write to the disk in an append-only fashion. You can run this command on an idle system to get a baseline sync time for journaling. You can also run this command on a busy system, where the sync time may be higher if the journal directory is on the same volume as the data files.

    The journalLatencyTest command also provides a way to check if your disk drive is buffering writes in its local cache. If the number is very low (i.e., less than 2 milliseconds) and the drive is non-SSD, the drive is probably buffering writes. In that case, enable cache write-through for the device in your operating system, unless you have a disk controller card with battery backed RAM.

Change the Group Commit Interval

Changed in version 2.0.

You can set the group commit interval using the --journalCommitInterval command line option. The allowed range is 2 to 300 milliseconds.

Lower values increase the durability of the journal at the expense of disk performance.

Recover Data After Unexpected Shutdown

On a restart after a crash, MongoDB replays all journal files in the journal directory before the server becomes available. If MongoDB must replay journal files, mongod notes these events in the log output.

There is no reason to run repairDatabase in these situations.

Journaling Internals

When running with journaling, MongoDB stores and applies write operations in memory and in the journal before the changes are in the data files.

Journal Files

With journaling enabled, MongoDB creates a journal directory within the directory defined by dbpath, which is /data/db by default. The journal directory holds journal files, which contain write-ahead redo logs. The directory also holds a last-sequence-number file. A clean shutdown removes all the files in the journal directory.

Journal files are append-only files and have file names prefixed with j._. When a journal file holds 1 gigabyte of data, MongoDB creates a new journal file. Once MongoDB applies all the write operations in the journal files, it deletes these files. Unless you write many bytes of data per second, the journal directory should contain only two or three journal files.

To limit the size of each journal file to 128 megabytes, use the smallfiles run time option when starting mongod.

To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory is on a different filesystem.

Important

If you place the journal on a different filesystem from your data files you cannot use a filesystem snapshot to capture consistent backups of a dbpath directory.

Note

Depending on your file system, you might experience a preallocation lag the first time you start a mongod instance with journaling enabled. MongoDB preallocates journal files if it is faster on your file system to create files of a pre-defined size. Preallocation might take several minutes, during which you will not be able to connect to the database. This is a one-time preallocation and does not occur with future invocations.

To avoid preallocation lag, see Avoid Preallocation Lag.

Storage Views used in Journaling

Journaling adds three storage views to MongoDB.

The shared view stores modified data for upload to the MongoDB data files. The shared view is the only view with direct access to the MongoDB data files. When running with journaling, mongod asks the operating system to map your existing on-disk data files to the shared view in memory. The operating system maps the files but does not load them. MongoDB later loads data files to the shared view as needed.

The private view stores data for use in read operations. MongoDB maps the private view to the shared view, and the private view is the first place MongoDB applies new write operations.

The journal is an on-disk view that stores new write operations after MongoDB applies the operations to the private view but before it applies them to the data files. The journal provides durability. If the mongod instance were to crash without having applied the writes to the data files, the journal could replay the writes to the shared view for eventual upload to the data files.

How Journaling Records Write Operations

MongoDB copies the write operations to the journal in batches called group commits. See journalCommitInterval for more information on the default commit interval. These “group commits” help minimize the performance impact of journaling.

Journaling stores raw operations that allow MongoDB to reconstruct the following:

  • document insertion/updates
  • index modifications
  • changes to the namespace files

As write operations occur, MongoDB writes the data to the private view in RAM and then copies the write operations in batches to the journal. The journal stores the operations on disk to ensure durability. MongoDB adds the operations as entries on the journal’s forward pointer. Each entry describes which bytes the write operation changed in the data files.

MongoDB next applies the journal’s write operations to the shared view. At this point, the shared view becomes inconsistent with the data files.

At default intervals of 60 seconds, MongoDB asks the operating system to flush the shared view to disk. This brings the data files up-to-date with the latest write operations.

When MongoDB flushes write operations to the data files, MongoDB removes the write operations from the journal’s behind pointer. The behind pointer always trails the forward pointer.

As part of journaling, MongoDB routinely asks the operating system to remap the shared view to the private view, for consistency.

Note

The interaction between the shared view and the on-disk data files is similar to how MongoDB works without journaling, which is that MongoDB asks the operating system to flush in-memory changes back to the data files every 60 seconds.
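
The write path described in this section can be sketched as a toy model. The following Python class is illustrative only and is not MongoDB's implementation: the class, its method names, and the use of dictionaries for the views are all assumptions made for the example. It shows why writes that reached the journal survive a crash while writes that were only in the private view do not.

```python
class ToyJournaledStore:
    """Toy model of the journaling write path; not MongoDB's implementation."""

    def __init__(self):
        self.data_files = {}    # durable on-disk data files
        self.journal = []       # durable, append-only (key, value) entries
        self.shared_view = {}   # in-memory view backed by the data files
        self.private_view = {}  # first place new writes are applied
        self.pending = []       # applied to the private view, not yet journaled

    def write(self, key, value):
        self.private_view[key] = value
        self.pending.append((key, value))

    def group_commit(self):
        # Copy the batch to the journal, then apply it to the shared view.
        self.journal.extend(self.pending)
        for key, value in self.pending:
            self.shared_view[key] = value
        self.pending = []

    def flush(self):
        # Flush the shared view to the data files; the journaled
        # operations are now in the data files and can be removed.
        self.data_files = dict(self.shared_view)
        self.journal = []

    def crash_and_recover(self):
        # Memory is lost; only data_files and journal survive.
        self.shared_view = dict(self.data_files)
        for key, value in self.journal:  # replay the journal
            self.shared_view[key] = value
        self.private_view = dict(self.shared_view)
        self.pending = []


store = ToyJournaledStore()
store.write("a", 1)
store.group_commit()
store.flush()            # "a" is in the data files
store.write("b", 2)
store.group_commit()     # "b" is journaled but not flushed
store.write("c", 3)      # "c" exists only in the private view
store.crash_and_recover()
print(sorted(store.shared_view.items()))  # [('a', 1), ('b', 2)]
```

In this model the write of "c" is lost because the crash happened before the next group commit, which corresponds to the window bounded by journalCommitInterval.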

Connect to MongoDB with SSL

This document outlines the use and operation of MongoDB’s SSL support. SSL allows MongoDB clients to support encrypted connections to mongod instances.

Note

The default distribution of MongoDB does not contain support for SSL. To use SSL, you must either build MongoDB locally passing the “--ssl” option to scons or use MongoDB Enterprise.

These instructions outline the process for getting started with SSL and assume that you have already installed a build of MongoDB that includes SSL support and that your client driver supports SSL.

Configure mongod and mongos for SSL

Combine SSL Certificate and Key File

Before you can use SSL, you must have a .pem file that contains the public key certificate and private key. MongoDB can use any valid SSL certificate. To generate a self-signed certificate and private key, use a command that resembles the following:

cd /etc/ssl/
openssl req -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.key

This operation generates a new, self-signed certificate with no passphrase that is valid for 365 days. Once you have the certificate, concatenate the certificate and private key to a .pem file, as in the following example:

cat mongodb-cert.key mongodb-cert.crt > mongodb.pem
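
Before pointing mongod at the combined file, it can help to confirm that it really contains both parts. The following Python sketch is a hedged example: the function names are hypothetical, and it only looks for PEM "BEGIN" markers; it does not parse or validate the certificate or the key.

```python
import re

# Matches PEM block headers such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(text):
    """Return the labels of all PEM blocks found in `text`, in order."""
    return PEM_BEGIN.findall(text)


def looks_like_combined_pem(text):
    """True if `text` has a certificate block and a private key block."""
    types = pem_block_types(text)
    has_cert = "CERTIFICATE" in types
    has_key = any(label.endswith("PRIVATE KEY") for label in types)
    return has_cert and has_key


# Placeholder content standing in for mongodb.pem; not real key material.
sample = (
    "-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n"
    "-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n"
)
print(pem_block_types(sample))
print(looks_like_combined_pem(sample))
```

To check a real file, read its contents (for example, the mongodb.pem created above) and pass the text to looks_like_combined_pem.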

Set Up mongod and mongos with SSL Certificate and Key

To use SSL in your MongoDB deployment, include the sslOnNormalPorts and sslPEMKeyFile run-time options with mongod and mongos.

Consider the following syntax for mongod:

mongod --sslOnNormalPorts --sslPEMKeyFile <pem>

For example, given an SSL certificate located at /etc/ssl/mongodb.pem, configure mongod to use SSL encryption for all connections with the following command:

mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem

Note

  • Specify <pem> with the full path name to the certificate.

  • If the private key portion of the <pem> is encrypted, specify the encryption password with the sslPEMKeyPassword option.

  • You may also specify these options in the configuration file, as in the following example:

    sslOnNormalPorts = true
    sslPEMKeyFile = /etc/ssl/mongodb.pem
    

To connect to mongod and mongos instances using SSL, the mongo shell and MongoDB tools must include the --ssl option. See SSL Configuration for Clients for more information on connecting to mongod and mongos running with SSL.

Set Up mongod and mongos with Certificate Validation

To set up mongod or mongos for SSL encryption using an SSL certificate signed by a certificate authority, include the following run-time options during startup:

  • sslOnNormalPorts
  • sslPEMKeyFile with the name of the .pem file that contains the signed SSL certificate and key.
  • sslCAFile with the name of the .pem file that contains the root certificate chain from the Certificate Authority.

Consider the following syntax for mongod:

mongod --sslOnNormalPorts --sslPEMKeyFile <pem> --sslCAFile <ca>

For example, given a signed SSL certificate located at /etc/ssl/mongodb.pem and the certificate authority file at /etc/ssl/ca.pem, you can configure mongod for SSL encryption as follows:

mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem

Note

  • Specify the <pem> file and the <ca> file with either the full path name or the relative path name.

  • If the <pem> is encrypted, specify the encryption password with the sslPEMKeyPassword option.

  • You may also specify these options in the configuration file, as in the following example:

    sslOnNormalPorts = true
    sslPEMKeyFile = /etc/ssl/mongodb.pem
    sslCAFile = /etc/ssl/ca.pem
    

To connect to mongod and mongos instances using SSL, the mongo shell and MongoDB tools must include both the --ssl and --sslPEMKeyFile options. See SSL Configuration for Clients for more information on connecting to mongod and mongos running with SSL.

Block Revoked Certificates for Clients

To prevent clients with revoked certificates from connecting, include the sslCRLFile option to specify a .pem file that contains revoked certificates.

For example, the following mongod with SSL configuration includes the sslCRLFile setting:

mongod --sslOnNormalPorts --sslCRLFile /etc/ssl/ca-crl.pem --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem

Clients with revoked certificates in the /etc/ssl/ca-crl.pem will not be able to connect to this mongod instance.

Validate Only if a Client Presents a Certificate

In most cases it is important to ensure that clients present valid certificates. However, if you have clients that cannot present a client certificate, or you are transitioning to using a certificate authority, you may want to validate certificates only from clients that present one.

If you want to bypass validation for clients that don’t present certificates, include the sslWeakCertificateValidation run-time option with mongod and mongos. If the client does not present a certificate, no validation occurs. These connections, though not validated, are still encrypted using SSL.

For example, consider the following mongod with an SSL configuration that includes the sslWeakCertificateValidation setting:

mongod --sslOnNormalPorts --sslWeakCertificateValidation --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem

Then, clients can connect either with the option --ssl and no certificate or with the option --ssl and a valid certificate. See SSL Configuration for Clients for more information on SSL connections for clients.

Note

If the client presents a certificate, the certificate must be a valid certificate.

All connections, including those that have not presented certificates, are encrypted using SSL.
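
The validation rules in this section can be summarized as a small decision function. The following Python sketch only restates the rules from the text and does not show how mongod implements validation. The function and its parameters are hypothetical, and the behavior without sslCAFile is an assumption based on the encryption-only setup described earlier.

```python
def server_accepts(ca_file_set, weak_validation, client_cert):
    """Decide whether an SSL client may connect.

    client_cert is None (no certificate presented), "valid", or "invalid".
    """
    if not ca_file_set:
        # Assumption: without sslCAFile the server only encrypts traffic
        # and does not check client certificates.
        return True
    if client_cert is None:
        # No certificate: allowed only with sslWeakCertificateValidation.
        return weak_validation
    # A presented certificate must always be valid.
    return client_cert == "valid"


for cert in (None, "valid", "invalid"):
    print(cert,
          server_accepts(True, False, cert),   # sslCAFile only
          server_accepts(True, True, cert))    # plus weak validation
```

The only row that differs between the two configurations is the client without a certificate, which is exactly the case that sslWeakCertificateValidation relaxes.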

Run in FIPS Mode

If your mongod or mongos is running on a system with an OpenSSL library configured with the FIPS 140-2 module, you can run mongod or mongos in FIPS mode, with the sslFIPSMode setting.

SSL Configuration for Clients

Clients must have support for SSL to work with a mongod or a mongos instance that has SSL support enabled. The current versions of the Python, Java, Ruby, Node.js, .NET, and C++ drivers have support for SSL, with full support coming in future releases of other drivers.

mongo SSL Configuration

For SSL connections, you must use the mongo shell built with SSL support or distributed with MongoDB Enterprise. To support SSL, mongo has the following settings:

  • --ssl
  • --sslPEMKeyFile with the name of the .pem file that contains the SSL certificate and key.
  • --sslCAFile with the name of the .pem file that contains the certificate from the Certificate Authority.
  • --sslPEMKeyPassword option if the client certificate-key file is encrypted.

Connect to MongoDB Instance with SSL Encryption

To connect to a mongod or mongos instance that requires only SSL encryption, start the mongo shell with --ssl, as in the following:

mongo --ssl

Connect to MongoDB Instance that Requires Client Certificates

To connect to a mongod or mongos that requires CA-signed client certificates, start the mongo shell with --ssl and the --sslPEMKeyFile option to specify the signed certificate-key file, as in the following:

mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem

Connect to MongoDB Instance that Validates when Presented with a Certificate

To connect to a mongod or mongos instance that only requires valid certificates when the client presents a certificate, start the mongo shell either with the --ssl option and no certificate or with the --ssl option and a valid signed certificate.

For example, if mongod is running with weak certificate validation, both of the following mongo shell clients can connect to that mongod:

mongo --ssl
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem

Important

If the client presents a certificate, the certificate must be valid.

MMS

The MMS agent must also connect via SSL in order to gather its statistics. Because the agent already uses SSL for its communication with the MMS servers, this is just a matter of enabling SSL support in MMS itself on a per-host basis.

Use the “Edit” host button (i.e. the pencil) on the Hosts page in the MMS console. SSL support is currently enabled on a group-by-group basis by 10gen.

Please see the MMS Manual for more information about MMS configuration.

PyMongo

Add the “ssl=True” parameter to a PyMongo MongoClient to create a MongoDB connection to an SSL MongoDB instance:

from pymongo import MongoClient
c = MongoClient(host="mongodb.example.net", port=27017, ssl=True)

To connect to a replica set, use the following operation:

from pymongo import MongoReplicaSetClient
c = MongoReplicaSetClient("mongodb.example.net:27017",
                          replicaSet="mysetname", ssl=True)

PyMongo also supports an “ssl=true” option for the MongoDB URI:

mongodb://mongodb.example.net:27017/?ssl=true

Java

Consider the following example “SSLApp.java” class file:

import com.mongodb.*;
import javax.net.ssl.SSLSocketFactory;

public class SSLApp {

    public static void main(String args[])  throws Exception {

        MongoClientOptions o = new MongoClientOptions.Builder()
                .socketFactory(SSLSocketFactory.getDefault())
                .build();

        MongoClient m = new MongoClient("localhost", o);

        DB db = m.getDB( "test" );
        DBCollection c = db.getCollection( "foo" );

        System.out.println( c.findOne() );
    }
}

Ruby

The recent versions of the Ruby driver have support for connections to SSL servers. Install the latest version of the driver with the following command:

gem install mongo

Then connect to a standalone instance, using the following form:

require 'rubygems'
require 'mongo'

# The driver's classes live in the Mongo module.
connection = Mongo::MongoClient.new('localhost', 27017, :ssl => true)

Replace connection with the following if you’re connecting to a replica set:

# Pass all seed members as "host:port" strings in a single array.
connection = Mongo::MongoReplicaSetClient.new(['localhost:27017', 'localhost:27018'],
                                              :ssl => true)

Here, mongod instances run on “localhost:27017” and “localhost:27018”.

Node.JS (node-mongodb-native)

In the node-mongodb-native driver, use the following invocation to connect to a mongod or mongos instance via SSL:

var db1 = new Db(MONGODB, new Server("127.0.0.1", 27017,
                                     { auto_reconnect: false, poolSize:4, ssl:ssl } ));

To connect to a replica set via SSL, use the following form:

var replSet = new ReplSetServers( [
    new Server( RS.host, RS.ports[1], { auto_reconnect: true } ),
    new Server( RS.host, RS.ports[0], { auto_reconnect: true } ),
    ],
  {rs_name:RS.name, ssl:ssl}
);

.NET

As of release 1.6, the .NET driver supports SSL connections with mongod and mongos instances. To connect using SSL, you must add an option to the connection string, specifying ssl=true as follows:

var connectionString = "mongodb://localhost/?ssl=true";
var server = MongoServer.Create(connectionString);

The .NET driver will validate the certificate against the local trusted certificate store, in addition to encrypting the connection to the server. This behavior may produce issues during testing if the server uses a self-signed certificate. If you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the .NET driver from validating the certificate, as follows:

var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false";
var server = MongoServer.Create(connectionString);

Monitor MongoDB with SNMP

New in version 2.2.

Enterprise Feature

This feature is only available in MongoDB Enterprise.

This document outlines the use and operation of MongoDB’s SNMP extension, which is only available in MongoDB Enterprise.

Prerequisites

Install MongoDB Enterprise

MongoDB Enterprise

Included Files

The Enterprise packages contain the following files:

  • MONGO-MIB.txt:

    The MIB file that describes the data (i.e. schema) for MongoDB’s SNMP output

  • mongod.conf:

    The SNMP configuration file for reading the SNMP output of MongoDB. This file configures the community names, permissions, access controls, etc.

Required Packages

To use SNMP, you must install several prerequisites. The names of the packages vary by distribution and are as follows:

  • Ubuntu 11.04 requires libssl0.9.8, snmp-mibs-downloader, snmp, and snmpd. Issue a command such as the following to install these packages:

    sudo apt-get install libssl0.9.8 snmp snmpd snmp-mibs-downloader
    
  • Red Hat Enterprise Linux 6.x series and Amazon Linux AMI require libssl, net-snmp, net-snmp-libs, and net-snmp-utils. Issue a command such as the following to install these packages:

    sudo yum install libssl net-snmp net-snmp-libs net-snmp-utils
    
  • SUSE Enterprise Linux requires libopenssl0_9_8, libsnmp15, slessp1-libsnmp15, and snmp-mibs. Issue a command such as the following to install these packages:

    sudo zypper install libopenssl0_9_8 libsnmp15 slessp1-libsnmp15 snmp-mibs
    

Configure SNMP

Install MIB Configuration Files

Ensure that the MIB directory /usr/share/snmp/mibs exists. If not, issue the following command:

sudo mkdir -p /usr/share/snmp/mibs

Use the following command to create a symbolic link:

sudo ln -s [/path/to/mongodb/distribution/]MONGO-MIB.txt /usr/share/snmp/mibs/

Replace [/path/to/mongodb/distribution/] with the path to your MONGO-MIB.txt configuration file.

Copy the mongod.conf file into the /etc/snmp directory with the following command:

cp mongod.conf /etc/snmp/mongod.conf

Start Up

You can control MongoDB Enterprise using default or custom control scripts, just as with any other mongod.

Use the following command to view all SNMP options available in your MongoDB:

mongod --help | grep snmp

The above command should return the following output:

Module snmp options:
  --snmp-subagent       run snmp subagent
  --snmp-master         run snmp as master

Ensure that the following directories exist:

  • /data/db/ (This is the path where MongoDB stores the data files.)
  • /var/log/mongodb/ (This is the path where MongoDB writes the log output.)

If they do not, issue the following command:

mkdir -p /var/log/mongodb/ /data/db/

Start the mongod instance with the following command:

mongod --snmp-master --port 3001 --fork --dbpath /data/db/  --logpath /var/log/mongodb/1.log

Optionally, you can set these options in a configuration file.

To check if mongod is running with SNMP support, issue the following command:

ps -ef | grep 'mongod --snmp'

The command should return output that includes the following line. This indicates that the proper mongod instance is running:

systemuser 31415 10260  0 Jul13 pts/16   00:00:00 mongod --snmp-master --port 3001 # [...]

Test SNMP

Check for the snmp agent process listening on port 1161 with the following command:

sudo lsof -i :1161

The command should return the following output:

COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mongod  9238 sysadmin   10u  IPv4  96469      0t0  UDP localhost:health-polling

Similarly, this command:

netstat -an | grep 1161

should return the following output:

udp        0      0 127.0.0.1:1161              0.0.0.0:*

Run snmpwalk Locally

snmpwalk provides tools for retrieving and parsing the SNMP data according to the MIB. If you installed all of the required packages above, your system will have snmpwalk.

Issue the following command to collect data from mongod using SNMP:

snmpwalk -m MONGO-MIB -v 2c -c mongodb 127.0.0.1:1161 1.3.6.1.4.1.37601

You may also choose to specify the path to the MIB file:

snmpwalk -m /usr/share/snmp/mibs/MONGO-MIB -v 2c -c mongodb 127.0.0.1:1161 1.3.6.1.4.1.37601

Use this command only to ensure that you can retrieve and validate SNMP data from MongoDB.

Troubleshooting

Always check the logs for errors if something does not run as expected; see the log at /var/log/mongodb/1.log. The presence of the following line indicates that the mongod cannot read the /etc/snmp/mongod.conf file:

[SNMPAgent] warning: error starting SNMPAgent as master err:1
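
If you check the log regularly, a short script can look for this warning. The following Python sketch is a minimal example: the find_snmp_errors function is hypothetical, and the first line of the sample log is invented for illustration. Only the warning line comes from the text above.

```python
# Substring of the warning line shown above.
SNMP_ERROR_MARKER = "error starting SNMPAgent"


def find_snmp_errors(log_lines):
    """Return the log lines that report an SNMP agent start-up error."""
    return [line.rstrip("\n") for line in log_lines if SNMP_ERROR_MARKER in line]


sample_log = [
    "[initandlisten] waiting for connections on port 3001\n",  # invented line
    "[SNMPAgent] warning: error starting SNMPAgent as master err:1\n",
]
errors = find_snmp_errors(sample_log)
print(errors)
```

To scan the real log, open /var/log/mongodb/1.log and pass the file object to find_snmp_errors, since the function accepts any iterable of lines.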

Manage mongod Processes

MongoDB runs as a standard program. You can start MongoDB from a command line by issuing the mongod command and specifying options. For a list of options, see mongod. MongoDB can also run as a Windows service. For details, see MongoDB as a Windows Service. To install MongoDB, see Install MongoDB.

The following examples assume the directory containing the mongod process is in your system paths. The mongod process is the primary database process that runs on an individual server. mongos provides a coherent MongoDB interface equivalent to a mongod from the perspective of a client. The mongo binary provides the administrative shell.

This page discusses the mongod process; however, some portions of this document may be applicable to mongos instances.

Start mongod

By default, MongoDB stores data in the /data/db directory. On Windows, MongoDB stores data in C:\data\db. On all platforms, MongoDB listens for connections from clients on port 27017.

To start MongoDB using all defaults, issue the following command at the system shell:

mongod

Specify a Data Directory

If you want mongod to store data files at a path other than /data/db you can specify a dbpath. The dbpath must exist before you start mongod. If it does not exist, create the directory and set the permissions so that mongod can read and write data to this path. For more information on permissions, see the security operations documentation.

To specify a dbpath for mongod to use as a data directory, use the --dbpath option. The following invocation will start a mongod instance and store data in the /srv/mongodb path:

mongod --dbpath /srv/mongodb/

Specify a TCP Port

Only a single process can listen for connections on a given port of a network interface at a time. If you run multiple mongod processes on a single machine, or have other processes that must use this port, you must assign each a different port to listen on for client connections.

To specify a port to mongod, use the --port option on the command line. The following command starts mongod listening on port 12345:

mongod --port 12345

Use the default port number when possible, to avoid confusion.

Start mongod as a Daemon

To run a mongod process as a daemon (i.e. fork) and write its output to a log file, use the --fork and --logpath options. You must create the log directory; however, mongod will create the log file if it does not exist.

The following command starts mongod as a daemon and records log output to /var/log/mongodb.log.

mongod --fork --logpath /var/log/mongodb.log

Additional Configuration Options

For an overview of common configurations and examples of configurations for common use cases, see Run-time Database Configuration.

Stop mongod

To stop a mongod instance not running as a daemon, press Control+C. MongoDB stops when all ongoing operations are complete and does a clean exit, flushing and closing all data files.

To stop a mongod instance running in the background or foreground, issue the shutdownServer() helper in the mongo shell. Use the following sequence:

  1. To open the mongo shell for a mongod instance running on the default port of 27017, issue the following command:

    mongo
    
  2. To switch to the admin database and shutdown the mongod instance, issue the following commands:

    use admin
    db.shutdownServer()
    

You may only use db.shutdownServer() when connected to the mongod and authenticated to the admin database or, on systems without authentication, when connected via the localhost interface.

Alternately, you can shut down the mongod instance by sending it a signal, as described in Sending a UNIX INT or TERM Signal.

mongod Shutdown and Replica Sets

If the mongod is the primary in a replica set, the shutdown process has the following steps:

  1. Check how up-to-date the secondaries are.
  2. If no secondary is within 10 seconds of the primary, mongod will return a message that it will not shut down. You can pass the shutdown command a timeoutSecs argument to wait for a secondary to catch up.
  3. If there is a secondary within 10 seconds of the primary, the primary will step down and wait for the secondary to catch up.
  4. After 60 seconds or once the secondary has caught up, the primary will shut down.

If there is no up-to-date secondary and you want the primary to shut down, issue the shutdown command with the force argument, as in the following mongo shell operation:

db.adminCommand({shutdown : 1, force : true})
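The decision described in the steps above can be sketched as a small function. This is only an illustration of the documented behavior, not MongoDB's implementation: the function name and its arguments are hypothetical, and it ignores the timeoutSecs waiting period.

```javascript
// Illustrative sketch only; primaryWillShutDown is a hypothetical helper.
// secondaryLagSeconds: how far each secondary is behind the primary, in seconds.
// force: mirrors the force argument of the shutdown command.
function primaryWillShutDown(secondaryLagSeconds, force) {
  if (force) {
    return true; // force skips the check on the secondaries
  }
  for (var i = 0; i < secondaryLagSeconds.length; i++) {
    if (secondaryLagSeconds[i] <= 10) {
      return true; // a secondary is within 10 seconds of the primary
    }
  }
  return false; // no secondary is close enough, so mongod refuses to shut down
}
```

For example, primaryWillShutDown([30, 45], false) returns false, while the same call with force set to true returns true.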

To keep checking the secondaries for a specified number of seconds if none are immediately up-to-date, issue shutdown with the timeoutSecs argument. If any of the secondaries catch up within the allotted time, the primary will shut down. If no secondaries catch up, it will not shut down.

The following command issues shutdown with timeoutSecs set to 5:

db.adminCommand({shutdown : 1, timeoutSecs : 5})

Alternately you can use the timeoutSecs argument with the shutdownServer() method:

db.shutdownServer({timeoutSecs : 5})

Sending a UNIX INT or TERM Signal

You can cleanly stop mongod using a SIGINT or SIGTERM signal on UNIX-like systems. Either ^C for a non-daemon mongod instance, kill -2 <pid>, or kill -15 <pid> will cleanly terminate the mongod instance.

Terminating a mongod instance that is not running with journaling with kill -9 <pid> (i.e. SIGKILL) will probably cause data corruption.

To recover data in situations where mongod instances running without journaling have not terminated cleanly, see Recover MongoDB Data following Unexpected Shutdown.

Rotate Log Files

Overview

Log rotation archives the current log file and starts a new one. Specifically, log rotation renames the current log file by appending a timestamp to the filename, [1] opens a new log file, and finally closes the old log. MongoDB will only rotate logs when you use the logRotate command or send the process a SIGUSR1 signal, as described in this procedure.

See also

For information on logging, see the Process Logging section.

Procedure

The following steps create and rotate a log file:

  1. Start a mongod with verbose logging, with appending enabled, and with the following log file:

    mongod -v --logpath /var/log/mongodb/server1.log --logappend

  2. In a separate terminal, list the matching files:

    ls /var/log/mongodb/server1.log*

    For results, you get:

    server1.log
    
  3. Rotate the log file using one of the following methods.

    • From the mongo shell, issue the logRotate command from the admin database:

      use admin
      db.runCommand( { logRotate : 1 } )
      

      This is the only available method to rotate log files on Windows systems.

    • From the UNIX shell, rotate logs for a single process by issuing the following command:

      kill -SIGUSR1 <mongod process id>
      
    • From the UNIX shell, rotate logs for all mongod processes on a machine by issuing the following command:

      killall -SIGUSR1 mongod
      
  4. List the matching files again:

    ls /var/log/mongodb/server1.log*

    For results you get something similar to the following. The timestamps will be different.

    server1.log  server1.log.2011-11-24T23-30-00
    

    The example results indicate a log rotation performed at exactly 11:30 pm on November 24th, 2011 UTC, which is the local time offset by the local time zone. The original log file is the one with the timestamp. The new log is the server1.log file.

    If you issue a second logRotate command an hour later, then an additional file would appear when listing matching files, as in the following example:

    server1.log  server1.log.2011-11-24T23-30-00  server1.log.2011-11-25T00-30-00
    

    This operation does not modify the server1.log.2011-11-24T23-30-00 file created earlier, while server1.log.2011-11-25T00-30-00 is the previous server1.log file, renamed. server1.log is a new, empty file that receives all new log output.

[1] MongoDB renders this timestamp in UTC (GMT), formatted as an ISODate.
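As a rough illustration of the naming scheme shown in the procedure, the following sketch derives the archived file name from a log path and a rotation time. The function name is hypothetical, and the sketch assumes a JavaScript environment that provides Date.prototype.toISOString(); it mirrors the example file names above rather than MongoDB's own code.

```javascript
// Hypothetical helper: builds the name an archived log file would receive,
// following the pattern in the example listing (UTC timestamp, ':' written as '-').
function rotatedLogName(logPath, rotationTime) {
  // toISOString() yields e.g. "2011-11-24T23:30:00.000Z"; keep the first 19 characters.
  var stamp = rotationTime.toISOString().slice(0, 19).replace(/:/g, "-");
  return logPath + "." + stamp;
}

// Rotation at 23:30:00 UTC on November 24th, 2011 (months are zero-based in Date.UTC).
var archived = rotatedLogName("/var/log/mongodb/server1.log",
                              new Date(Date.UTC(2011, 10, 24, 23, 30, 0)));
```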

Monitoring for MongoDB

Monitoring is a critical component of all database administration. A firm grasp of MongoDB’s reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB’s normal operational parameters will allow you to diagnose issues as you encounter them, rather than waiting for a crisis or failure.

This document provides an overview of the available tools and data provided by MongoDB as well as an introduction to diagnostic strategies, and suggestions for monitoring instances in MongoDB’s replica sets and sharded clusters.

Note

10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS documentation for more information.

Monitoring Tools

There are two primary methods for collecting data regarding the state of a running MongoDB instance. First, there is a set of tools distributed with MongoDB that provide real-time reporting of activity on the database. Second, several database commands return statistics regarding the current database state with greater fidelity. Each method allows you to collect data that answers a different set of questions, and each is useful in different contexts.

This section provides an overview of these utilities and statistics, along with an example of the kinds of questions that each method is most suited to help you address.

Utilities

The MongoDB distribution includes a number of utilities that return statistics about instances’ performance and activity quickly. These are typically most useful for diagnosing issues and assessing normal operation.

mongotop

mongotop tracks and reports the current read and write activity of a MongoDB instance. mongotop provides per-collection visibility into use. Use mongotop to verify that activity and use match expectations. See the mongotop manual for details.

mongostat

mongostat captures and returns counters of database operations. mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual for details.

REST Interface

MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the localhost interface using the port numbered 1000 more than the database port. In default configurations the REST interface is accessible on port 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
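The port arithmetic is simple enough to express directly. The helper below is a hypothetical sketch that only applies the "database port plus 1000" rule stated above:

```javascript
// Hypothetical helper: the REST interface listens 1000 ports above the database port.
function restInterfaceUrl(host, databasePort) {
  return "http://" + host + ":" + (databasePort + 1000);
}

var defaultUrl = restInterfaceUrl("localhost", 27017);
```

With the default database port of 27017 this produces the http://localhost:28017 address given above; a mongod started with --port 12345 would expose the page on port 13345.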

Statistics

MongoDB provides a number of commands that return statistics about the state of the MongoDB instance. These data may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance.

serverStatus

Access serverStatus data by way of the serverStatus command. The document returned by this command contains a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index accesses. The command returns quickly and does not impact MongoDB performance.

While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by serverStatus.

replSetGetStatus

View the replSetGetStatus data with the replSetGetStatus command (rs.status() from the shell). The document returned by this command reflects the state and configuration of the replica set. Use this data to ensure that replication is properly configured, and to check the connections between the current host and the members of the replica set.

dbStats

The dbStats data is accessible by way of the dbStats command (db.stats() from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.

collStats

The collStats data is accessible using the collStats command (db.printCollectionStats() from the shell). It provides statistics that resemble dbStats on the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.

Introspection Tools

In addition to status reporting, MongoDB provides a number of introspection tools that you can use to diagnose and analyze performance and operational conditions. Consider the following documentation:

Third Party Tools

A number of third party monitoring tools have support for MongoDB, either directly, or through their own plugins.

Self Hosted Monitoring Tools

These are monitoring tools that you must install, configure and maintain on your own servers, usually open source.

Tool | Plugin | Description
Ganglia | mongodb-ganglia | Python script to report operations per second, memory usage, btree statistics, master/slave status and current connections.
Ganglia | gmond_python_modules | Parses output from the serverStatus and replSetGetStatus commands.
Motop | None | Realtime monitoring tool for several MongoDB servers. Shows current operations ordered by durations every second.
mtop | None | A top like tool.
Munin | mongo-munin | Retrieves server statistics.
Munin | mongomon | Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB).
Munin | munin-plugins Ubuntu PPA | Some additional munin plugins not in the main distribution.
Nagios | nagios-plugin-mongodb | A simple Nagios check script, written in Python.
Zabbix | mikoomi-mongodb | Monitors availability, resource utilization, health, performance and other important metrics.

Also consider dex, an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes to make indexing recommendations.

Hosted (SaaS) Monitoring Tools

These are monitoring tools provided as a hosted service, usually on a subscription billing basis.

Name | Notes
Scout | Several plugins including: MongoDB Monitoring, MongoDB Slow Queries and MongoDB Replica Set Monitoring.
Server Density | Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps.

Process Logging

During normal operation, mongod and mongos instances report information that reflects current operation to standard output or to a log file. The following runtime settings control these options.

  • quiet. Limits the amount of information written to the log or output.

  • verbose. Increases the amount of information written to the log or output.

    You can also specify this as v (as in -v). Set multiple v, as in vvvv = True, for higher levels of verbosity. You can also change the verbosity of a running mongod or mongos instance with the setParameter command.

  • logpath. Enables logging to a file, rather than standard output. Specify the full path to the log file as the value of this setting.

  • logappend. Adds information to a log file instead of overwriting the file.

Note

You can specify these configuration options as command line arguments to mongod or mongos.
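To make the mapping between these settings and the command line concrete, the sketch below turns a settings object into an argument list. The function and the shape of the settings object are hypothetical; the flags themselves (--quiet, -v, --logpath, --logappend) are the ones named in this document.

```javascript
// Hypothetical helper: converts logging settings into mongod/mongos arguments.
// settings.verbose is assumed to be a count of v's (1 for -v, 4 for -vvvv).
function loggingArguments(settings) {
  var args = [];
  if (settings.quiet) {
    args.push("--quiet");
  }
  if (settings.verbose) {
    args.push("-" + new Array(settings.verbose + 1).join("v"));
  }
  if (settings.logpath) {
    args.push("--logpath", settings.logpath);
  }
  if (settings.logappend) {
    args.push("--logappend");
  }
  return args;
}

var commandLine = "mongod " + loggingArguments({
  verbose: 1,
  logpath: "/var/log/mongodb/server1.log",
  logappend: true
}).join(" ");
```

The resulting string matches the invocation used in the log rotation procedure: mongod -v --logpath /var/log/mongodb/server1.log --logappend.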

Additionally, the following database commands affect logging:

Diagnosing Performance Issues

Degraded performance in MongoDB can be the result of an array of causes, and is typically a function of the relationship among the quantity of data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of time the database spends in a lock state.

In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability of hardware on the host system for virtualized environments. Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database may be operating at capacity and that it is time to add additional capacity to the database.

Locks

MongoDB uses a locking system to ensure consistency. However, if certain operations are long-running, or a queue forms, performance slows as requests and operations wait for the lock. Because lock related slow downs can be intermittent, look to the data in the globalLock section of the serverStatus response to assess if the lock has been a challenge to your performance. If globalLock.currentQueue.total is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance.

If globalLock.totalTime is high in context of uptime then the database has existed in a lock state for a significant amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of long running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
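A script can apply these checks to a serverStatus document. The following is a minimal sketch, assuming the document exposes globalLock.currentQueue.total and globalLock.ratio as described above; the function name and the threshold values are hypothetical and should be tuned to your own deployment's normal operating range.

```javascript
// Hypothetical helper: flags lock metrics that exceed caller-supplied limits.
// status is a serverStatus document, e.g. the result of db.serverStatus() in the mongo shell.
function assessLocks(status, limits) {
  var globalLock = status.globalLock;
  return {
    queueIsHigh: globalLock.currentQueue.total > limits.maxQueued,
    ratioIsHigh: globalLock.ratio > limits.maxRatio
  };
}

// Sample document with made-up values, used here in place of a live server.
var sampleLockStatus = { globalLock: { ratio: 0.8, currentQueue: { total: 3 } } };
var lockReport = assessLocks(sampleLockStatus, { maxQueued: 50, maxRatio: 0.5 });
```

Because lock-related slowdowns can be intermittent, a single reading means little; run a check like this at intervals and look for values that stay high.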

Memory Usage

Because MongoDB uses memory mapped files to store data, given a data set of sufficient size, the MongoDB process will allocate all memory available on the system for its use. Because of the way operating systems function, the amount of allocated RAM is not a useful reflection of MongoDB’s state.

While this is part of the design, and affords MongoDB superior performance, the memory mapped files make it difficult to determine if the amount of RAM is sufficient for the data set. Consider memory usage statuses to better understand MongoDB’s memory utilization. Check the resident memory use (i.e. mem.resident): if this exceeds the amount of system memory and there’s a significant amount of data on disk that isn’t in RAM, you may have exceeded the capacity of your system.

Also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of system memory, some operations will incur page faults and require disk access to read data from virtual memory, with deleterious effects on performance.
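The two comparisons above can be sketched as follows. The helper is hypothetical, and it assumes that mem.resident and mem.mapped are reported in megabytes and that you supply the amount of system memory in the same unit.

```javascript
// Hypothetical helper: compares serverStatus memory figures with system RAM.
// Assumes all values are in megabytes.
function assessMemory(status, systemMemoryMB) {
  return {
    residentExceedsRam: status.mem.resident > systemMemoryMB,
    mappedExceedsRam: status.mem.mapped > systemMemoryMB
  };
}

// Made-up sample values for a host with 8192 MB of RAM.
var memoryReport = assessMemory({ mem: { resident: 6000, mapped: 20000 } }, 8192);
```

In this sample only mappedExceedsRam is true, which suggests that parts of the data set cannot be held in RAM at once.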

Page Faults

Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults value in the serverStatus command. This data is only available on Linux systems.

Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk, which can have a number of underlying causes. In many situations, MongoDB’s read locks will “yield” after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency, and in high volume systems this also improves overall throughput.

If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If this is not possible, you may want to consider deploying a sharded cluster and/or adding one or more shards to your deployment to distribute load among mongod instances.
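A rate is usually more informative than the raw counter. The sketch below computes page faults per second from two serverStatus documents, on the assumption that extra_info.page_faults only increases while mongod runs; the function name and sample values are hypothetical.

```javascript
// Hypothetical helper: page faults per second between two serverStatus samples.
function pageFaultsPerSecond(earlier, later, elapsedSeconds) {
  var delta = later.extra_info.page_faults - earlier.extra_info.page_faults;
  return delta / elapsedSeconds;
}

// Two made-up samples taken 60 seconds apart.
var faultRate = pageFaultsPerSecond(
  { extra_info: { page_faults: 100 } },
  { extra_info: { page_faults: 400 } },
  60
);
```

Here 300 faults over 60 seconds gives a rate of 5 per second. Whether that is a problem depends on the baseline for your own system.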

Number of Connections

In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm the ability of the server to handle requests which can produce performance irregularities. Check the following fields in the serverStatus document:

  • globalLock.activeClients contains a counter of the total number of clients with active operations in progress or queued.
  • connections is a container for the following two fields:
    • current: the total number of clients currently connected to the database instance.
    • available: the total number of unused connections available for new clients.
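These two fields can be combined into a single utilization figure. The helper below is a hypothetical sketch that treats current plus available as the total number of connections the instance can accept:

```javascript
// Hypothetical helper: fraction of the connection capacity currently in use.
function connectionUtilization(status) {
  var current = status.connections.current;
  var available = status.connections.available;
  return current / (current + available);
}

// Made-up sample: 200 connections in use, 600 still available.
var utilization = connectionUtilization({ connections: { current: 200, available: 600 } });
```

The sample yields 0.25, meaning a quarter of the available connections are in use.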

Note

Unless constrained by system-wide limits, MongoDB has a hard connection limit of 20,000 connections. You can modify system limits using the ulimit command, or by editing your system’s /etc/sysctl file.

If requests are high because there are many concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.

Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without a corresponding workload, is often indicative of a driver or other configuration error.

Database Profiling

MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the profiler by setting the profile value using the following command in the mongo shell:

db.setProfilingLevel(1)

See also

See the documentation of db.setProfilingLevel() for more information about this command.

Note

Because the database profiler can have an impact on the performance, only enable profiling for strategic intervals and as minimally as possible on production systems.

You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded cluster.

The following profiling levels are available:

Level Setting
0 Off. No profiling.
1 On. Only includes slow operations.
2 On. Includes all operations.

See the output of the profiler in the system.profile collection of your database. You can specify the slowms setting to set a threshold above which the profiler considers operations “slow” and thus included in the level 1 profiling data. You may configure slowms at runtime, as an argument to the db.setProfilingLevel() operation.

Additionally, mongod records all “slow” queries to its log, as defined by slowms. The data in system.profile does not persist between mongod restarts.

You can view the profiler’s output by issuing the show profile command in the mongo shell, or by querying the system.profile collection with the following operation:

db.system.profile.find( { millis : { $gt : 100 } } )

This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is above the slowms threshold.

See also

Optimization Strategies for MongoDB addresses strategies that may improve the performance of your database queries and operations.

Replication and Monitoring

The primary administrative concern that requires monitoring with replica sets, beyond the requirements for any MongoDB instance, is “replication lag.” This refers to the amount of time that it takes a write operation on the primary to replicate to a secondary. Some very small delay period may be acceptable; however, as replication lag grows, two significant problems emerge:

  • First, operations that have occurred in the period of lag are not replicated to one or more secondaries. If you’re using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
  • Second, if the replication lag exceeds the length of the operation log (oplog) then MongoDB will have to perform an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. In normal circumstances this is uncommon given the typical size of the oplog, but it’s an issue to be aware of.

For causes of replication lag, see Replication Lag.

Replication issues are most often the result of network connectivity issues between members or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica, use the replSetGetStatus or the following helper in the shell:

rs.status()

See the replSetGetStatus document for a more in-depth overview of this output. In general, watch the value of optimeDate. Pay particular attention to the difference in time between the primary and the secondary members.
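One way to track that difference is to compute it from the status document. The following sketch assumes each entry in the members array carries name, stateStr and optimeDate fields, and that optimeDate is a Date; the function name and the sample host names are hypothetical.

```javascript
// Hypothetical helper: seconds each secondary trails the primary, keyed by member name.
// status is a replSetGetStatus document, e.g. the result of rs.status() in the mongo shell.
function replicationLagSeconds(status) {
  var primary = null;
  var i;
  for (i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      primary = status.members[i];
    }
  }
  if (primary === null) {
    return null; // no primary visible, so lag cannot be computed
  }
  var lags = {};
  for (i = 0; i < status.members.length; i++) {
    var member = status.members[i];
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lags[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lags;
}

// Made-up sample with one secondary six seconds behind.
var sampleLags = replicationLagSeconds({
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-12-09T12:00:10Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-12-09T12:00:04Z") }
  ]
});
```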

The size of the operation log is only configurable during the first run using the --oplogSize argument to the mongod command, or preferably the oplogSize in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default sized oplog.

By default the oplog is 5% of total available disk space on 64-bit systems.

Sharding and Monitoring

In most cases the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. Additionally, clusters require monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.

See also

See the Sharding page for more information.

Config Servers

The config database provides a map of documents to shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, some sharding operations like moving chunks and starting mongos instances become unavailable. However, clusters remain accessible from already-running mongos instances.

Because inaccessible configuration servers can have a serious impact on the availability of a sharded cluster, you should monitor the configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.

Balancing and Chunk Distribution

The most effective sharded cluster deployments require that chunks are evenly balanced among the shards. MongoDB has a background balancer process that distributes data such that chunks are always optimally distributed among the shards. Issue the db.printShardingStatus() or sh.status() command to the mongos by way of the mongo shell. This returns an overview of the entire cluster including the database name, and a list of the chunks.

Stale Locks

In nearly every case, all locks used by the balancer are automatically released when they become stale. However, because any long lasting lock can block future balancing, it’s important to ensure that all locks are legitimate. To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:

use config
db.locks.find()

For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos, takes a special “balancer” lock that prevents other balancing activity from transpiring. Use the following command, also to the config database, to check the status of the “balancer” lock.

db.locks.find( { _id : "balancer" } )

If this lock exists, make sure that the balancer process is actively using this lock.
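To judge whether a lock has been held for a suspiciously long time, a script can look at the lock document's age. The sketch below rests on assumptions not stated in this document: that the lock document has a numeric state field where 0 means unlocked, and a when field holding the Date at which the lock was taken. Verify both against your own config database before relying on it. The function name is hypothetical.

```javascript
// Hypothetical helper: reports whether a lock document is held and for how long.
// Assumes lock.state === 0 means unlocked and lock.when is a Date (unverified assumptions).
function describeBalancerLock(lock, now) {
  if (!lock || lock.state === 0) {
    return { held: false, ageSeconds: 0 };
  }
  return { held: true, ageSeconds: (now - lock.when) / 1000 };
}

// Made-up sample: a lock taken 90 seconds before the check.
var lockInfo = describeBalancerLock(
  { _id: "balancer", state: 2, when: new Date("2013-01-01T00:00:00Z") },
  new Date("2013-01-01T00:01:30Z")
);
```

In the mongo shell, the first argument could be the document returned by querying the locks collection for the "balancer" lock, as shown above.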

Analyze Performance of Database Operations

The database profiler collects fine grained data about MongoDB write operations, cursors, and database commands on a running mongod instance. You can enable profiling on a per-database or per-instance basis. The profiling level is also configurable when enabling profiling.

The database profiler writes all the data it collects to the system.profile collection, which is a capped collection. See Database Profiler Output for an overview of the data in the system.profile documents created by the profiler.

This document outlines a number of key administration options for the database profiler. For additional related information, consider the following resources:

Profiling Levels

The following profiling levels are available:

  • 0 - the profiler is off and does not collect any data.

  • 1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds.

    You can modify the threshold for “slow” operations with the slowms runtime option or the setParameter command. See the Specify the Threshold for Slow Operations section for more information.

  • 2 - collects profiling data for all database operations.
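The effect of the three levels can be summarized in a few lines. This is an illustrative sketch of the rules listed above, with a hypothetical function name, not the profiler's actual code:

```javascript
// Hypothetical helper: would an operation be recorded at this profiling level?
// level: 0, 1 or 2; slowms: threshold in milliseconds; opMillis: how long the operation took.
function shouldProfile(level, slowms, opMillis) {
  if (level === 2) {
    return true; // level 2 records every operation
  }
  if (level === 1) {
    return opMillis > slowms; // level 1 records only slow operations
  }
  return false; // level 0: the profiler is off
}
```

With the default threshold of 100 milliseconds, shouldProfile(1, 100, 150) returns true and shouldProfile(1, 100, 50) returns false.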

Enable Database Profiling and Set the Profiling Level

You can enable database profiling from the mongo shell or through a driver using the profile command. This section will describe how to do so from the mongo shell. See your driver documentation if you want to control the profiler from within your application.

When you enable profiling, you also set the profiling level. The profiler records data in the system.profile collection. MongoDB creates the system.profile collection in a database after you enable profiling for that database.

To enable profiling and set the profiling level, use the db.setProfilingLevel() helper in the mongo shell, passing the profiling level as a parameter. For example, to enable profiling for all database operations, consider the following operation in the mongo shell:

db.setProfilingLevel(2)

The shell returns a document showing the previous level of profiling. The "ok" : 1 key-value pair indicates the operation succeeded:

{ "was" : 0, "slowms" : 100, "ok" : 1 }

To verify the new setting, see the Check Profiling Level section.

Specify the Threshold for Slow Operations

The threshold for slow operations applies to the entire mongod instance. When you change the threshold, you change it for all databases on the instance.

Important

Changing the slow operation threshold for the database profiler also affects the profiling subsystem’s slow operation threshold for the entire mongod instance. Always set the threshold to the highest useful value.

By default the slow operation threshold is 100 milliseconds. Databases with a profiling level of 1 will log operations slower than 100 milliseconds.

To change the threshold, pass two parameters to the db.setProfilingLevel() helper in the mongo shell. The first parameter sets the profiling level for the current database, and the second sets the default slow operation threshold for the entire mongod instance.

For example, the following command sets the profiling level for the current database to 0, which disables profiling, and sets the slow-operation threshold for the mongod instance to 20 milliseconds. Any database on the instance with a profiling level of 1 will use this threshold:

db.setProfilingLevel(0,20)

Check Profiling Level

To view the profiling level, issue the following from the mongo shell:

db.getProfilingStatus()

The shell returns a document similar to the following:

{ "was" : 0, "slowms" : 100 }

The was field indicates the current level of profiling.

The slowms field indicates how long an operation must run, in milliseconds, to pass the “slow” threshold. MongoDB will log operations that take longer than the threshold if the profiling level is 1. For an explanation of profiling levels, see Profiling Levels.

To return only the profiling level, use the db.getProfilingLevel() helper in the mongo shell, as in the following:

db.getProfilingLevel()

Disable Profiling

To disable profiling, use the following helper in the mongo shell:

db.setProfilingLevel(0)

Enable Profiling for an Entire mongod Instance

For development purposes in testing environments, you can enable database profiling for an entire mongod instance. The profiling level applies to all databases provided by the mongod instance.

To enable profiling for a mongod instance, pass the following parameters to mongod at startup or within the configuration file:

mongod --profile=1 --slowms=15

This sets the profiling level to 1, which collects profiling data for slow operations only, and defines slow operations as those that last longer than 15 milliseconds.

See also

profile and slowms.

Database Profiling and Sharding

You cannot enable profiling on a mongos instance. To enable profiling in a sharded cluster, you must enable profiling for each mongod instance in the cluster.

View Profiler Data

The database profiler logs information about database operations in the system.profile collection.

To view profiling information, query the system.profile collection. To view example queries, see Example Profiler Data Queries.

For an explanation of the output data, see Database Profiler Output.

Example Profiler Data Queries

This section displays example queries to the system.profile collection. For an explanation of the query output, see Database Profiler Output.

To return the most recent 10 log entries in the system.profile collection, run a query similar to the following:

db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()

To return all operations except command operations ($cmd), run a query similar to the following:

db.system.profile.find( { op: { $ne : 'command' } } ).pretty()

To return operations for a particular collection, run a query similar to the following. This example returns operations in the mydb database’s test collection:

db.system.profile.find( { ns : 'mydb.test' } ).pretty()

To return operations slower than 5 milliseconds, run a query similar to the following:

db.system.profile.find( { millis : { $gt : 5 } } ).pretty()

To return information from a certain time range, run a query similar to the following:

db.system.profile.find(
                       {
                        ts : {
                              $gt : new ISODate("2012-12-09T03:00:00Z") ,
                              $lt : new ISODate("2012-12-09T03:40:00Z")
                             }
                       }
                      ).pretty()
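
If you run time-range queries like this often, it can help to build the filter document programmatically. The following is a minimal sketch in plain JavaScript. The name profileTimeRangeQuery and its parameters are hypothetical, not part of the mongo shell. It uses Date rather than ISODate so that it also runs outside the shell, and it assumes that the JavaScript engine parses ISO 8601 date strings.

```javascript
// Hypothetical helper (not part of the mongo shell): build the filter
// document for a system.profile query over a time window.
function profileTimeRangeQuery(startIso, endIso, minMillis) {
    var query = {
        ts: { $gt: new Date(startIso), $lt: new Date(endIso) }
    };
    // Optionally restrict the results to operations slower than minMillis.
    if (typeof minMillis === "number") {
        query.millis = { $gt: minMillis };
    }
    return query;
}

var profileQuery = profileTimeRangeQuery(
    "2012-12-09T03:00:00Z", "2012-12-09T03:40:00Z", 5
);
// In the mongo shell, you could then run:
//   db.system.profile.find( profileQuery ).sort( { millis : -1 } ).pretty()
```

The optional third argument combines the time range with the slow-operation filter shown earlier in this section.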

The following example looks at the time range, suppresses the user field from the output to make it easier to read, and sorts the results by how long each operation took to run:

db.system.profile.find(
                       {
                         ts : {
                               $gt : new ISODate("2011-07-12T03:00:00Z") ,
                               $lt : new ISODate("2011-07-12T03:40:00Z")
                              }
                       },
                       { user : 0 }
                      ).sort( { millis : -1 } )

Show the Five Most Recent Events

On a database that has profiling enabled, the show profile helper in the mongo shell displays the 5 most recent operations that took at least 1 millisecond to execute. Issue show profile from the mongo shell, as follows:

show profile

Profiler Overhead

When enabled, profiling has a minor effect on performance. The system.profile collection is a capped collection with a default size of 1 megabyte. A collection of this size can typically store several thousand profile documents, but some applications may use more or less profiling data per operation.

To change the size of the system.profile collection, you must:

  1. Disable profiling.
  2. Drop the system.profile collection.
  3. Create a new system.profile collection.
  4. Re-enable profiling.

For example, to create a new system.profile collection that’s 4000000 bytes, use the following sequence of operations in the mongo shell:

db.setProfilingLevel(0)

db.system.profile.drop()

db.createCollection( "system.profile", { capped: true, size:4000000 } )

db.setProfilingLevel(1)
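
The sequence above can be wrapped in a function. The following is a sketch that assumes db is the database object from the mongo shell. The name resizeProfileCollection is hypothetical; the methods it calls are the same ones used in the sequence above.

```javascript
// Hypothetical helper: recreate system.profile with a new capped size.
// db is assumed to be the mongo shell's database object.
function resizeProfileCollection(db, sizeBytes, level) {
    db.setProfilingLevel(0);                    // 1. Disable profiling.
    db.system.profile.drop();                   // 2. Drop the old collection.
    db.createCollection("system.profile", {     // 3. Create the new one.
        capped: true,
        size: sizeBytes
    });
    db.setProfilingLevel(level);                // 4. Re-enable profiling.
}

// In the mongo shell: resizeProfileCollection(db, 4000000, 1)
```

Dropping the collection discards the existing profile documents, so copy any data that you still need before you resize.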

Import and Export MongoDB Data

This document provides an overview of the import and export programs included in the MongoDB distribution. These tools are useful when you want to back up or export a portion of your data without capturing the state of the entire database, or for simple data ingestion cases. For more complex data migration tasks, you may want to write your own import and export scripts using a client driver to interact with the database itself. For disaster recovery protection and routine database backup operations, use full database instance backups.

Warning

Because these tools primarily operate by interacting with a running mongod instance, they can impact the performance of your running database.

Not only do these processes create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed data, causing a deterioration in performance for the database’s regular workload.

mongoimport and mongoexport do not reliably preserve all rich BSON data types, because BSON is a superset of JSON. Thus, mongoimport and mongoexport cannot represent BSON data accurately in JSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See MongoDB Extended JSON for more information.

See also

See the “Backup Strategies for MongoDB Systems” document for more information on backing up MongoDB instances. Additionally, consider the following references for commands addressed in this document:

If you want to transform and process data once you’ve imported it in MongoDB consider the documents in the Aggregation section, including:

Data Type Fidelity

JSON does not have the following data types that exist in BSON documents: data_binary, data_date, data_timestamp, data_regex, data_oid and data_ref. As a result, any tool that decodes BSON documents into JSON suffers some loss of fidelity.
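
The loss of fidelity is easy to reproduce with plain JSON. The following sketch uses a JavaScript Date and the standard JSON functions rather than mongoexport itself, so it illustrates the general problem and not the exact output of the MongoDB tools:

```javascript
// A document with a date value, as an application might hold it in memory.
var original = { created: new Date(0), count: 1 };

// Serialize to JSON text and parse it back, as a naive export/import would.
var roundTripped = JSON.parse(JSON.stringify(original));

var wasDate = original.created instanceof Date;          // true
var isStillDate = roundTripped.created instanceof Date;  // false
var typeAfterRoundTrip = typeof roundTripped.created;    // "string"
```

After the round trip, the field holds the string "1970-01-01T00:00:00.000Z". Nothing in the JSON text indicates that the value was originally a date.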

If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list of types contains examples of how MongoDB represents BSON types in JSON.

  • data_binary

    { "$binary" : "<bindata>", "$type" : "<t>" }
    

    <bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte indicating the data type.

  • data_date

    Date( <date> )
    

    <date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch.

  • data_timestamp

    Timestamp( <t>, <i> )
    

    <t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.

  • data_regex

    /<jRegex>/<jOptions>
    

    <jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters. <jOptions> is a string that may contain only the characters g, i, m, and s.

  • data_oid

    ObjectId( "<id>" )
    

    <id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named “_id.”

  • data_ref

    DBRef( "<name>", "<id>" )
    

    <name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string.

Data Import, Export, and Backup Operations

For resilient and non-disruptive backups, use a file system or block-level disk snapshot function, such as the methods described in the “Backup Strategies for MongoDB Systems” document. The tools and operations discussed in this document are useful for some kinds of backups.

By contrast, use import and export tools to back up a small subset of your data or to move data to or from a third-party system. These backups may capture a small crucial set of data or a frequently modified section of data, for extra insurance, or for ease of access. No matter how you decide to import or export your data, consider the following guidelines:

  • Label files so that you can identify what point in time the export or backup reflects.
  • Labeling should describe the contents of the backup and reflect the subset of the data corpus captured in the backup or export.
  • Do not create or apply exports if the backup process itself will have an adverse effect on a production system.
  • Make sure that exports and backups reflect a consistent data state. Export or backup processes can impact data integrity (i.e. type fidelity) and consistency if updates continue during the backup process.
  • Test backups and exports by restoring and importing to ensure that the backups are useful.

Human Intelligible Import/Export Formats

This section describes a process to import/export your database, or a portion thereof, to a file in a JSON or CSV format.

See also

The mongoimport and mongoexport documents contain complete documentation of these tools. If you have questions about the function and parameters of these tools not covered here, please refer to these documents.

If you want to simply copy a database or collection from one instance to another, consider using the copydb, clone, or cloneCollection commands, which may be more suited to this task. The mongo shell provides the db.copyDatabase() method.

These tools may also be useful for importing data into a MongoDB database from third party applications.

Collection Export with mongoexport

With the mongoexport utility you can create a backup file. In its simplest invocation, the command takes the following form:

mongoexport --collection collection --out collection.json

This will export all documents in the collection named collection into the file collection.json. Without the output specification (i.e. “--out collection.json”), mongoexport writes output to standard output (i.e. “stdout”). You can further narrow the results by supplying a query filter using the “--query” option, and limit results to a single database using the “--db” option. For instance:

mongoexport --db sales --collection contacts --query '{"field": 1}'

This command returns all documents in the sales database’s contacts collection, with a field named field with a value of 1. Enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment. The resulting documents will return on standard output.
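
If you launch mongoexport from a script rather than from an interactive shell, you can avoid quoting problems by passing the query as a single argument. The following sketch builds an argument list using only the options described in this section. The name buildMongoexportArgs is hypothetical, and how you start the process (for example, with the Node.js child_process.execFile function) is an assumption that this document does not cover.

```javascript
// Hypothetical helper: build the argument list for a mongoexport invocation.
function buildMongoexportArgs(options) {
    var args = ["--db", options.db, "--collection", options.collection];
    if (options.query) {
        // Serialize the filter as JSON. Because the string is passed as one
        // argument, it needs no shell quoting.
        args.push("--query", JSON.stringify(options.query));
    }
    if (options.out) {
        // Without --out, mongoexport writes to standard output.
        args.push("--out", options.out);
    }
    return args;
}

var exportArgs = buildMongoexportArgs({
    db: "sales",
    collection: "contacts",
    query: { field: 1 },
    out: "contacts.json"
});
```

The file name contacts.json is only an example value.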

By default, mongoexport returns one JSON document per MongoDB document. Specify the “--jsonArray” argument to return the export as a single JSON array. Use the “--csv” option to return the result in CSV (comma separated values) format.

If your mongod instance is not running, you can use the “--dbpath” option to specify the location of your MongoDB instance’s database files. See the following example:

mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/

This reads the data files directly. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongoexport in this configuration.

The “--host” and “--port” options allow you to specify a non-local host to connect to capture the export. Consider the following example:

mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection contacts --out mdb1-examplenet.json

On any mongoexport command you may specify username and password credentials, as above.

Collection Import with mongoimport

To restore a backup taken with mongoexport, use mongoimport. Most of the arguments to mongoexport also exist for mongoimport. Consider the following command:

mongoimport --collection collection --file collection.json

This imports the contents of the file collection.json into the collection named collection. If you do not specify a file with the “--file” option, mongoimport accepts input over standard input (i.e. “stdin”).

If you specify the “--upsert” option, all mongoimport operations will attempt to update existing documents in the database and insert other documents. This option will cause some performance impact depending on your configuration.

You can specify the database option --db to import these documents to a particular database. If your MongoDB instance is not running, use the “--dbpath” option to specify the location of your MongoDB instance’s database files. Consider using the “--journal” option to ensure that mongoimport records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport in this configuration.

Use the “--ignoreBlanks” option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.

Linux ulimit Settings

The Linux kernel provides a system to limit and control the number of threads, connections, and open files on a per-process and per-user basis. These limits prevent single users from using too many system resources. Sometimes, these limits, as configured by the distribution developers, are too low for MongoDB and can cause a number of issues in the course of normal MongoDB operation. Generally, MongoDB should be the only user process on a system, to prevent resource contention.

Resource Utilization

mongod and mongos each use threads and file descriptors to track connections and manage internal operations. This section outlines the general resource utilization patterns for MongoDB. Use these figures in combination with the actual information about your deployment and its use to determine ideal ulimit settings.

Generally, all mongod and mongos instances, like other processes:

  • track each incoming connection with a file descriptor and a thread.
  • track each internal thread or pthread as a system process.

mongod

  • 1 file descriptor for each data file in use by the mongod instance.
  • 1 file descriptor for each journal file used by the mongod instance when journal is true.
  • In replica sets, each mongod maintains a connection to all other members of the set.

mongod uses background threads for a number of internal processes, including TTL collections, replication, and replica set health checks, which may require a small number of additional resources.
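
As a rough illustration of how these figures add up, the following sketch sums the file descriptors listed above for a single mongod. The function name and its parameters are hypothetical, and the result is only a lower bound: it ignores the background threads mentioned above and anything else the process opens, such as log files.

```javascript
// Hypothetical estimate of the file descriptors a single mongod needs,
// based only on the items listed in this section.
function estimateMongodFileDescriptors(opts) {
    // Each mongod keeps a connection to every other replica set member.
    var otherMembers = Math.max((opts.replicaSetMembers || 1) - 1, 0);
    return opts.connections            // one per incoming connection
         + opts.dataFiles              // one per data file in use
         + (opts.journalFiles || 0)    // one per journal file, if journaling
         + otherMembers;
}

var estimate = estimateMongodFileDescriptors({
    connections: 1000,
    dataFiles: 40,
    journalFiles: 3,
    replicaSetMembers: 3
});
// estimate is 1000 + 40 + 3 + 2 = 1045
```

All input values here are made up for the example. Use figures from your own deployment and leave generous headroom when you choose a limit.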

mongos

In addition to the threads and file descriptors for client connections, mongos must maintain connections to all config servers and all shards, which includes all members of all replica sets.

For mongos, consider the following behaviors:

  • mongos instances maintain a connection pool to each shard so that the mongos can reuse connections and quickly fulfill requests without needing to create new connections.

  • You can limit the number of incoming connections using the maxConns run-time option.

    By restricting the number of incoming connections you can prevent a cascade effect where the mongos creates too many connections on the mongod instances.

    Note

    You cannot set maxConns to a value higher than 20000.

Review and Set Resource Limits

ulimit

You can use the ulimit command at the system prompt to check system limits, as in the following example:

$ ulimit -a
-t: cpu time (seconds)         unlimited
-f: file size (blocks)         unlimited
-d: data seg size (kbytes)     unlimited
-s: stack size (kbytes)        8192
-c: core file size (blocks)    0
-m: resident set size (kbytes) unlimited
-u: processes                  192276
-n: file descriptors           21000
-l: locked-in-memory size (kb) 40000
-v: address space (kb)         unlimited
-x: file locks                 unlimited
-i: pending signals            192276
-q: bytes in POSIX msg queues  819200
-e: max nice                   30
-r: max rt priority            65
-N 15:                         unlimited

ulimit refers to the per-user limitations for various resources. Therefore, if your mongod instance executes as a user that is also running multiple processes, or multiple mongod processes, you might see contention for these resources. Also, be aware that the processes value (i.e. -u) refers to the combined number of distinct processes and sub-process threads.

You can change ulimit settings by issuing a command in the following form:

ulimit -n <value>

For many distributions of Linux, you can change other limits by replacing the -n option with any other option shown in the output of ulimit -a. See your operating system documentation for the precise procedure for changing system limits on running systems.

Note

After changing the ulimit settings, you must restart the process to take advantage of the modified settings. You can use the /proc file system to see the current limitations on a running process.

Depending on your system’s configuration and default settings, any change to system limits made using ulimit may revert following a system restart. Check your distribution and operating system documentation for more information.

/proc File System

Note

This section applies only to Linux operating systems.

The /proc file-system stores the per-process limits in the file system object located at /proc/<pid>/limits, where <pid> is the process’s PID or process identifier. You can use the following bash function to return the content of the limits object for a process or processes with a given name:

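return-limits(){

     # Print /proc/<pid>/limits for every running process with each given name.
     for process in "$@"; do
          # ps pads PIDs with spaces; the unquoted expansion in the inner
          # loop splits the list on whitespace.
          process_pids=$(ps -C "$process" -o pid --no-headers)

          if [ -z "$process_pids" ]; then
             echo "[no $process running]"
          else
             for pid in $process_pids; do
                   echo "[$process #$pid -- limits]"
                   cat "/proc/$pid/limits"
             done
          fi

     done

}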

You can copy and paste this function into a current shell session or load it as part of a script. Call the function with one of the following invocations:

return-limits mongod
return-limits mongos
return-limits mongod mongos

The output of the first command may resemble the following:

[mongod #6809 -- limits]
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8720000              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             192276               192276               processes
Max open files            1024                 4096                 files
Max locked memory         40960000             40960000             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       192276               192276               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         30                   30
Max realtime priority     65                   65
Max realtime timeout      unlimited            unlimited            us
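
If you want to check these values from a script, the limits output can be parsed into a lookup table. The following is a minimal sketch in plain JavaScript. The name parseProcLimits is hypothetical, and the parser assumes that columns are separated by two or more spaces, as in the output above.

```javascript
// Hypothetical parser for the text of /proc/<pid>/limits.
function parseProcLimits(text) {
    var limits = {};
    var lines = text.split("\n");
    for (var i = 0; i < lines.length; i++) {
        // Columns are padded with runs of spaces; names contain single spaces.
        var columns = lines[i].trim().split(/\s{2,}/);
        // Skip the header row and any line that is not a limit entry.
        if (columns.length < 3 || columns[0] === "Limit") {
            continue;
        }
        limits[columns[0]] = {
            soft: columns[1],
            hard: columns[2],
            units: columns[3] || ""   // some rows have no units column
        };
    }
    return limits;
}

var sampleLimits = [
    "Limit                     Soft Limit           Hard Limit           Units",
    "Max open files            1024                 4096                 files",
    "Max nice priority         30                   30"
].join("\n");

var openFiles = parseProcLimits(sampleLimits)["Max open files"];
// openFiles.soft is "1024" and openFiles.hard is "4096"
```

Values stay as strings because a limit can be the word unlimited. Convert them to numbers only after you check for that case.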

Production Notes

This page details system configurations that affect MongoDB, especially in production.

Backups

To make backups of your MongoDB database, please refer to Backup Strategies for MongoDB Systems.

Networking

Always run MongoDB in a trusted environment, with network rules that prevent access from all unknown machines, systems, or networks. As with any sensitive system dependent on network access, your MongoDB deployment should only be accessible to specific systems that require access: application servers, monitoring services, and other MongoDB components.

See documents in the Security section for additional information, specifically:

For Windows users, consider the Windows Server Technet Article on TCP Configuration when deploying MongoDB on Windows.

MongoDB on Linux

If you use the Linux kernel, the MongoDB user community has recommended Linux kernel 2.6.36 or later for running MongoDB in production.

Because MongoDB preallocates its database files before using them and because MongoDB uses very large files on average, you should use the Ext4 or XFS file systems if using the Linux kernel:

  • If you use the Ext4 file system, use at least version 2.6.23 of the Linux Kernel.
  • If you use the XFS file system, use at least version 2.6.25 of the Linux Kernel.
  • If you are using a Red Hat derived distribution, use at least version 2.6.245.el5 of the Linux Kernel.

For MongoDB on Linux use the following recommended configurations:

  • Turn off atime for the storage volume with the database files.
  • Set the file descriptor limit and the user process limit above 20,000, according to the suggestions in Linux ulimit Settings. A low ulimit affects MongoDB under heavy use and can produce unexpected errors.
  • Do not use hugepages virtual memory pages; MongoDB performs better with normal virtual memory pages.
  • Disable NUMA in your BIOS. If that is not possible, see NUMA.
  • Ensure that readahead settings for the block devices that store the database files are acceptable. See the Readahead section.
  • Use NTP to synchronize time among your hosts. This is especially important in sharded clusters.

Readahead

For random access use patterns, set readahead values low. For example, setting readahead to a small value such as 32 (16KB) often works well.
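
The relationship between the readahead setting and its size in kilobytes is simple arithmetic. The following sketch assumes 512-byte sectors, which is consistent with the figure above (32 sectors is 16KB). The names SECTOR_BYTES and readaheadKilobytes are hypothetical.

```javascript
// Assumption: the readahead value counts 512-byte sectors.
var SECTOR_BYTES = 512;

// Convert a readahead setting (in sectors) to kilobytes.
function readaheadKilobytes(sectors) {
    return sectors * SECTOR_BYTES / 1024;
}

var smallReadahead = readaheadKilobytes(32);   // 16
var largerReadahead = readaheadKilobytes(64);  // 32
```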

MongoDB on Virtual Environments

This section describes considerations when running MongoDB in some of the more common virtual environments.

EC2

MongoDB is compatible with EC2 and requires no configuration changes specific to the environment.

VMWare

MongoDB is compatible with VMWare. Some in the MongoDB community have run into issues with VMWare’s memory overcommit feature and suggest disabling the feature.

You can clone a virtual machine running MongoDB. You might use this to spin up a new virtual host that will be added as a member of a replica set. If journaling is enabled, the clone snapshot will be consistent. If not using journaling, stop mongod, clone, and then restart.

OpenVZ

The MongoDB community has encountered issues running MongoDB on OpenVZ.

Disk and Storage Systems

Swap

Configure swap space for your systems. Having swap can prevent issues with memory contention and can prevent the OOM Killer on Linux systems from killing mongod. Because of the way mongod maps memory files to memory, the operating system will never store MongoDB data in swap.

RAID

Most MongoDB deployments should use disks backed by RAID-10.

RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.

RAID-0 provides good write performance but provides limited availability, and reduced performance on read operations, particularly using Amazon’s EBS volumes: as a result, avoid RAID-0 with MongoDB deployments.

Remote Filesystems

Some versions of NFS perform very poorly with MongoDB and NFS is not recommended for use with MongoDB. Performance problems arise when both the data files and the journal files are hosted on NFS: you may experience better performance if you place the journal on local or iSCSI volumes. If you must use NFS, add the following NFS options to your /etc/fstab file: bg, nolock, and noatime.

Many MongoDB deployments work successfully with Amazon’s Elastic Block Store (EBS) volumes. EBS volumes have certain intrinsic performance characteristics that users should consider.

Hardware Requirements and Limitations

MongoDB is designed specifically with commodity hardware in mind and has few hardware requirements or limitations. MongoDB core components run on little-endian hardware, primarily x86/x86_64 processors. Client libraries (i.e. drivers) can run on big or little endian systems.

When installing hardware for MongoDB, consider the following:

  • As with all software, more RAM and a faster CPU clock speed are important to productivity.
  • Because databases do not perform high amounts of computation, increasing the number of cores helps but does not provide a high level of marginal return.
  • MongoDB has good results and good price/performance with SATA SSD (Solid State Disk) and with PCI (Peripheral Component Interconnect).
  • Commodity (SATA) spinning drives are often a good option, because the speed increase for random I/O with more expensive drives is not that dramatic (only on the order of 2x). Spending that money on SSDs or RAM may be more effective.

MongoDB on NUMA Hardware

MongoDB and NUMA, Non-Uniform Memory Access, do not work well together. When running MongoDB on NUMA hardware, disable NUMA for MongoDB and run with an interleave memory policy. NUMA can cause a number of operational problems with MongoDB, including slow performance for periods of time or high system processor usage.

Note

On Linux, MongoDB version 2.0 and greater checks these settings on start up and prints a warning if the system is NUMA-based.

To disable NUMA for MongoDB, use the numactl command and start mongod in the following manner:

numactl --interleave=all /usr/bin/local/mongod

Adjust the proc settings using the following command:

echo 0 > /proc/sys/vm/zone_reclaim_mode

To fully disable NUMA you must perform both operations. However, you can change zone_reclaim_mode without restarting mongod. For more information, see documentation on /proc/sys/vm.

See The MySQL “swap insanity” problem and the effects of NUMA post, which describes the effects of NUMA on databases. This blog post addresses the impact of NUMA for MySQL; however, the issues for MongoDB are similar. The post introduces NUMA and its goals, and illustrates how these goals are not compatible with production databases.

Performance Monitoring

iostat

On Linux, use the iostat command to check if disk I/O is a bottleneck for your database. Specify a number of seconds when running iostat to avoid displaying stats covering the time since server boot.

For example:

iostat -xm 2

Use the mount command to see what device your data directory resides on.

Key fields from iostat:

  • %util: the most useful field for a quick check. It indicates what percent of the time the device/drive is in use.
  • avgrq-sz: average request size. Smaller numbers for this value reflect more random I/O operations.

bwm-ng

bwm-ng is a command-line tool for monitoring network use. If you suspect a network-based bottleneck, you may use bwm-ng to begin your diagnostic process.

Production Checklist

64-bit Builds for Production

Always use 64-bit builds in production. MongoDB uses memory mapped files. See the 32-bit limitations for more information.

32-bit builds exist to support use on development machines and also for other miscellaneous things such as replica set arbiters.

BSON Document Size Limit

There is a BSON Document Size limit – at the time of this writing 16MB per document. If you have large objects, use GridFS instead.

Set Appropriate Write Concern for Write Operations

See Write Concern for more information.

Dynamic Schema

Data in MongoDB has a dynamic schema. Collections do not enforce document structure. This facilitates iterative development and polymorphism. However, collections often hold documents with highly homogeneous structures. See Data Modeling Considerations for MongoDB Applications for more information.

Some operational considerations include:

  • the exact set of collections to be used
  • the indexes to be used, which are created explicitly except for the _id index
  • shard key declarations, which are explicit and quite important as it is hard to change shard keys later

One very simple rule-of-thumb is not to import data from a relational database unmodified: you will generally want to “roll up” certain data into richer documents that use some embedding of nested documents and arrays (and/or arrays of subdocuments).

Updates by Default Affect Only one Document

Set the multi parameter to true to update() multiple documents that meet the query criteria. The mongo shell syntax is:

db.records.update(my_query, my_update_expression, bool_upsert, bool_multi)

Set bool_multi to true when updating many documents. Otherwise, only the first matching document is updated.
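
Because upsert and multi are positional arguments, it is easy to pass them in the wrong order. One option is a small wrapper that fixes their positions. This is a sketch: updateAllMatching is a hypothetical name, and collection is assumed to be a mongo shell collection object such as db.records.

```javascript
// Hypothetical wrapper: update every document that matches the query.
// collection is assumed to be a mongo shell collection (e.g. db.records).
function updateAllMatching(collection, query, updateExpression) {
    var upsert = false;  // do not insert a document when nothing matches
    var multi = true;    // update all matching documents, not only the first
    return collection.update(query, updateExpression, upsert, multi);
}

// In the mongo shell:
//   updateAllMatching(db.records, { status: "old" }, { $set: { status: "archived" } })
```

The field name status and its values are placeholders for illustration.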

Case Sensitive Strings

MongoDB strings are case sensitive. So a search for "joe" will not find "Joe".

Consider:

Type Sensitive Fields

MongoDB data, which is JSON-style (specifically, BSON format), has several data types.

Consider the following document which has a field x with the string value "123":

{ x : "123" }

Then the following query which looks for a number value 123 will not return that document:

db.mycollection.find( { x : 123 } )

Locking

Older versions of MongoDB used a “global lock”; use MongoDB v2.2+ for better results. See the Concurrency page for more information.

Packages

Be sure you have the latest stable release if you are using a package manager. You can see what is current on the Downloads page, even if you then choose to install via a package manager.

Use Odd Number of Replica Set Members

Replica sets perform consensus elections. Use either an odd number of members (e.g., three) or else use an arbiter to get up to an odd number of votes.
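
The arithmetic behind this recommendation can be sketched as follows, assuming that an election requires a strict majority of the votes in the set. The function names are hypothetical.

```javascript
// Smallest number of votes that forms a strict majority.
function majority(votes) {
    return Math.floor(votes / 2) + 1;
}

// Number of voting members that can be lost while a majority remains.
function toleratedFailures(votes) {
    return votes - majority(votes);
}

var withThree = toleratedFailures(3);  // 1
var withFour = toleratedFailures(4);   // 1
var withFive = toleratedFailures(5);   // 2
```

A fourth voting member does not let the set survive more failures than three members do, which is why an even number of votes adds cost without adding fault tolerance.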

Don’t disable journaling

See Journaling for more information.

Keep Replica Set Members Up-to-Date

This is important as MongoDB replica sets support automatic failover. Thus you want your secondaries to be up-to-date. You have a few options here:

  1. Monitoring and alerts for any lagging can be done via various means. MMS shows a graph of replica set lag.
  2. Using getLastError with w:'majority', you will get a timeout or no return if a majority of the set is lagging. This is thus another way to guard against lag and get some reporting back of its occurrence.
  3. Or, if you want to fail over manually, you can set your secondaries to priority:0 in their configuration. Then manual action would be required for a failover. This is practical for a small cluster; for a large cluster you will want automation.

Additionally, see information on replica set rollbacks.

Additional Deployment Considerations

  • Pick your shard keys carefully! There is no way to modify a shard key on a collection that is already sharded.
  • You cannot shard an existing collection over 256 gigabytes. To shard large amounts of data, create a new empty sharded collection, and ingest the data from the source collection using an application level import operation.
  • Unique indexes are not enforced across shards except for the shard key itself. See Enforce Unique Keys for Sharded Collections.
  • Consider pre-splitting a sharded collection before a massive bulk import. Usually this isn’t necessary, but on a large bulk import it is helpful.
  • Use security/auth mode if you need it. By default auth is not enabled and mongod assumes a trusted environment.
  • You do not have fully generalized transactions. Create rich documents and read the preceding link and consider the use case – often there is a good fit.
  • Disable NUMA for best results. If you have NUMA enabled, mongod will print a warning when it starts.
  • Avoid excessive prefetch/readahead on the filesystem. Check your prefetch settings. Note that on Linux the parameter is in sectors, not bytes. 32KBytes (a setting of 64 sectors) is pretty reasonable.
  • Check ulimit settings.
  • Use SSD if available and economical. Spinning disks can work well, but SSDs’ capacity for random I/O operations works well with the update model of mongod. See Remote Filesystems for more info.
  • Ensure that clients keep reasonable pool sizes to avoid overloading the connection tracking capacity of a single mongod or mongos instance.

Use Database Commands

The MongoDB command interface provides access to all non-CRUD database operations. Fetching server stats, initializing a replica set, and running a map-reduce job are all accomplished with commands.

See Database Commands for a list of all commands sorted by function, and Database Commands for a list of all commands sorted alphabetically.

Database Command Form

You specify a command first by constructing a standard BSON document whose first key is the name of the command. For example, specify the isMaster command using the following BSON document:

{ isMaster: 1 }

Issue Commands

The mongo shell provides a helper method for running commands called db.runCommand(). The following operation in mongo runs the above command:

db.runCommand( { isMaster: 1 } )

Many drivers provide an equivalent for the db.runCommand() method. Internally, running commands with db.runCommand() is equivalent to a special query against the $cmd collection.

Many common commands have their own shell helpers or wrappers in the mongo shell and drivers, such as the db.isMaster() method in the mongo JavaScript shell.

admin Database Commands

You must run some commands on the admin database. Normally, these operations resemble the following:

use admin
db.runCommand( {buildInfo: 1} )

However, there’s also a command helper that automatically runs the command in the context of the admin database:

db._adminCommand( {buildInfo: 1} )

Command Responses

All commands return, at minimum, a document with an ok field indicating whether the command has succeeded:

{ 'ok': 1 }

Failed commands return the ok field with a value of 0.
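
A script can use the ok field to stop as soon as a command fails. The following is a minimal sketch. The name assertCommandOk is hypothetical, and the sketch assumes that a failed response may carry an errmsg field that describes the problem.

```javascript
// Hypothetical helper: return the response if the command succeeded,
// otherwise throw an error.
function assertCommandOk(response) {
    if (!response || response.ok !== 1) {
        // errmsg is assumed to be present on most failed responses.
        var reason = (response && response.errmsg) ? response.errmsg : "unknown error";
        throw new Error("command failed: " + reason);
    }
    return response;
}

// In the mongo shell:
//   var info = assertCommandOk( db.runCommand( { isMaster: 1 } ) )
```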

MongoDB Tutorials

This page lists the tutorials available as part of the MongoDB Manual. In addition to these documents, you can refer to the introductory MongoDB Tutorial. If there is a process or pattern that you would like to see included here, please open a Jira Case.

Administration

Security

The documentation in this section outlines basic security, risk management, and access control, and includes specific tasks for configuring firewalls, authentication, and system privileges. User roles in MongoDB provide granular control over user authorization and access.

If you believe you have discovered a vulnerability in MongoDB, please see Create a Vulnerability Report.

Security Concepts and Strategies

Security Practices and Management

This document describes risk mitigation in MongoDB deployments. As with all software running in a networked environment, administrators of MongoDB must consider security and risk exposures for a MongoDB deployment. There are no magic solutions for risk mitigation, and maintaining a secure MongoDB deployment is an ongoing process. This document takes a Defense in Depth approach to securing MongoDB deployments and addresses a number of different methods for managing risk and reducing risk exposure.

The intent of a Defense In Depth approach is to ensure there are no exploitable points of failure in your deployment that could allow an intruder or untrusted party to access the data stored in the MongoDB database. The easiest and most effective way to reduce the risk of exploitation is to run MongoDB in a trusted environment, limit access, follow a system of least privilege, and follow best development and deployment practices. See the Strategies for Reducing Risk section.

For an outline of all security, authentication, and authorization documentation, see Security.

Strategies for Reducing Risk

The most effective way to reduce risk for MongoDB deployments is to run your entire MongoDB deployment, including all MongoDB components (i.e. mongod, mongos and application instances) in a trusted environment. Trusted environments use the following strategies to control access:

  • using network filter (e.g. firewall) rules that block all connections from unknown systems to MongoDB components.
  • binding mongod and mongos instances to specific IP addresses to limit accessibility.
  • limiting MongoDB programs to non-public local networks and virtual private networks.

You may further reduce risk by:

  • requiring authentication for access to MongoDB instances.
  • requiring strong, complex, single purpose authentication credentials. This should be part of your internal security policy.
  • deploying a model of least privilege, where all users have only the amount of access they need to accomplish required tasks, and no more.
  • following the best application development and deployment practices, which includes: validating all inputs, managing sessions, and application-level access control.

Continue reading this document for more information on specific strategies and configurations to help reduce the risk exposure of your application.

Vulnerability Notification

10gen takes the security of MongoDB and associated products very seriously. If you discover a vulnerability in MongoDB or another 10gen product, or would like to know more about our vulnerability reporting and response process, see the Create a Vulnerability Report document.

Runtime Security Configuration

For configuration settings that affect security, see Security Considerations.

Networking Risk Exposure
Interfaces and Port Numbers

The following list includes all default ports used by MongoDB:

27017
This is the default port for mongod and mongos instances. You can change this port with port or --port.
27018
This is the default port when running with the --shardsvr runtime option or the shardsvr setting.
27019
This is the default port when running with the --configsvr runtime option or the configsvr setting.
28017
This is the default port for the web status page. This is always accessible at a port that is 1000 greater than the port determined by port.
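The relationship between the main port and the web status page port can be sketched in a few lines of Python. The dictionary keys and the function name http_status_port are hypothetical labels chosen for illustration; the port numbers come from the list above.

```python
# Default ports from the list above; the keys are hypothetical labels.
DEFAULT_PORTS = {
    "mongod": 27017,
    "mongos": 27017,
    "shardsvr": 27018,
    "configsvr": 27019,
}


def http_status_port(port):
    """Return the web status page port for an instance listening on `port`."""
    # The status page listens 1000 above the main port.
    return port + 1000


for role, port in DEFAULT_PORTS.items():
    print(role, port, http_status_port(port))
```

If you change the main port with port or --port, the status page port moves with it, so any firewall rule for the status page needs the same adjustment.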

By default MongoDB programs (i.e. mongos and mongod) will bind to all available network interfaces (i.e. IP addresses) on a system. The next section outlines various runtime options that allow you to limit access to MongoDB programs.

Network Interface Limitation

You can limit the network exposure with the following configuration options:

  • the nohttpinterface setting for mongod and mongos instances.

    Disables the “home” status page, which would run on port 28017 by default. The status interface is read-only by default. You may also specify this option on the command line as mongod --nohttpinterface or mongos --nohttpinterface. Authentication does not control or affect access to this interface.

    Important

    Disable the HTTP interface for production deployments by setting nohttpinterface. If you do leave this interface enabled, you should only allow trusted clients to access this port. See Firewalls.

  • the port setting for mongod and mongos instances.

    Changes the main port on which the mongod or mongos instance listens for connections. Changing the port does not meaningfully reduce risk or limit exposure.

    You may also specify this option on the command line as mongod --port or mongos --port.

    Whatever port you attach mongod and mongos instances to, you should only allow trusted clients to connect to this port. See Firewalls.

  • the rest setting for mongod.

    Enables a fully interactive administrative REST interface, which is disabled by default. The status interface, which is enabled by default, is read-only. This configuration makes that interface fully interactive. The REST interface does not support any authentication and you should always restrict access to this interface to only allow trusted clients to connect to this port.

    You may also enable this interface on the command line as mongod --rest.

    Important

    Disable this option for production deployments. If you do leave this interface enabled, you should only allow trusted clients to access this port.

  • the bind_ip setting for mongod and mongos instances.

    Limits the network interfaces on which MongoDB programs will listen for incoming connections. You can also specify a number of interfaces by passing bind_ip a comma separated list of IP addresses. You can use the mongod --bind_ip and mongos --bind_ip options on the command line at run time to limit the network accessibility of a MongoDB program.

    Important

    Make sure that your mongod and mongos instances are only accessible on trusted networks. If your system has more than one network interface, bind MongoDB programs to the private or internal network interface.
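To illustrate how these options combine, the following Python sketch assembles a mongod argument list. The function name mongod_args, its parameters, and the address 10.0.0.5 are hypothetical; only the flags --port, --bind_ip, and --nohttpinterface come from the text above. The sketch builds the argument list and does not start a process.

```python
def mongod_args(port=27017, bind_ips=None, http_interface=False):
    """Build an argument list for mongod from the settings described above.

    The function and parameter names are hypothetical; the flags are the
    ones named in this section.
    """
    args = ["mongod", "--port", str(port)]
    if bind_ips:
        # bind_ip accepts a comma separated list of IP addresses.
        args += ["--bind_ip", ",".join(bind_ips)]
    if not http_interface:
        # Turn off the status page unless it was explicitly requested.
        args.append("--nohttpinterface")
    return args


# 10.0.0.5 is an example private address, not a recommended value.
print(" ".join(mongod_args(bind_ips=["127.0.0.1", "10.0.0.5"])))
```

Defaulting http_interface to False reflects the recommendation to disable the status page in production; it is a choice made for this sketch.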

Firewalls

Firewalls allow administrators to filter and control access to a system by providing granular control over network communications. For administrators of MongoDB, the following capabilities are important:

  • limiting incoming traffic on a specific port to specific systems.
  • limiting incoming traffic from untrusted hosts.

On Linux systems, the iptables interface provides access to the underlying netfilter firewall. On Windows systems, the netsh command line interface provides access to the underlying Windows Firewall. For additional information about firewall configuration, see Configure Linux iptables Firewall for MongoDB and Configure Windows netsh Firewall for MongoDB.

For best results and to minimize overall exposure, ensure that only traffic from trusted sources can reach mongod and mongos instances and that the mongod and mongos instances can only connect to trusted outputs.

See also

For MongoDB deployments on Amazon’s web services, see the Amazon EC2 page, which addresses Amazon’s Security Groups and other EC2-specific security features.

Virtual Private Networks

Virtual private networks, or VPNs, make it possible to link two networks over an encrypted and limited-access trusted network. Typically MongoDB users who use VPNs use SSL rather than IPSEC VPNs for performance reasons.

Depending on configuration and implementation, VPNs provide for certificate validation and a choice of encryption protocols, which requires a rigorous level of authentication and identification of all clients. Furthermore, because VPNs provide a secure tunnel, using a VPN connection to control access to your MongoDB instance can prevent tampering and “man-in-the-middle” attacks.

Operations

Always run the mongod or mongos process as a unique user with the minimum required permissions and access. Never run a MongoDB program as a root or administrative user. The system users that run the MongoDB processes should have robust authentication credentials that prevent unauthorized or casual access.

To further limit the environment, you can run the mongod or mongos process in a chroot environment. Both user-based access restrictions and chroot configuration follow recommended conventions for administering all daemon processes on Unix-like systems.

You can disable anonymous access to the database by enabling MongoDB authentication. See Access Control.

Interfaces

Simply limiting access to a mongod is not sufficient for totally controlling risk exposure. Consider the recommendations in the following section for limiting exposure to other interface-related risks.

JavaScript and the Security of the mongo Shell

Be aware of the following capabilities and behaviors of the mongo shell:

  • mongo will evaluate a JavaScript expression passed to the mongo --eval option. The mongo shell does not validate the JavaScript input to --eval.

  • mongo will evaluate a .mongorc.js file before starting. You can disable this behavior by passing the mongo --norc option.

    On Linux and Unix systems, mongo reads the .mongorc.js file from $HOME/.mongorc.js (i.e. ~/.mongorc.js), and Windows mongo.exe reads the .mongorc.js file from %HOME%\.mongorc.js or %HOMEDRIVE%%HOMEPATH%\.mongorc.js.

HTTP Status Interface

The HTTP status interface provides a web-based interface that includes a variety of operational data, logs, and status reports regarding the mongod or mongos instance. The HTTP interface is always available on the port numbered 1000 greater than the primary mongod port. By default this is 28017, but is indirectly set using the port option which allows you to configure the primary mongod port.

Without the rest setting, this interface is entirely read-only, and limited in scope; nevertheless, this interface may represent an exposure. To disable the HTTP interface, set the nohttpinterface run time option or the --nohttpinterface command line option.

REST API

The REST API to MongoDB provides additional information and write access on top of the HTTP Status interface. The REST interface is disabled by default, and is not recommended for production use.

While the REST API does not provide any support for insert, update, or remove operations, it does provide administrative access, and its accessibility represents a vulnerability in a secure environment.

If you must use the REST API, please control and limit access to the REST API. The REST API does not include any support for authentication, even when running with auth enabled.

See the following documents for instructions on restricting access to the REST API interface:

Data Encryption

To support audit requirements, you may need to encrypt data stored in MongoDB. For best results you can encrypt this data in the application layer, by encrypting the content of fields that hold secure data.

Additionally, 10gen has a partnership with Gazzang to encrypt and secure sensitive data within MongoDB. The solution encrypts data in real time and Gazzang provides advanced key management that ensures only authorized processes can access this data. The Gazzang software ensures that the cryptographic keys remain safe and ensures compliance with standards including HIPAA, PCI-DSS, and FERPA. For more information consider the following resources:

Access Control

MongoDB provides support for authentication and authorization by storing a user’s credentials and privileges in a database’s system.users collection. MongoDB provisions authentication and access on a per-database level. Users exist in the context of a single logical database.

For MongoDB Enterprise installations, MongoDB also provides support for authentication using a Kerberos service. See Deploy MongoDB with Kerberos Authentication.

Authentication

MongoDB provides support for basic authentication by:

  • storing user credentials in a database’s system.users collection, and
  • providing the auth and keyFile configuration settings to enable authentication for a given mongod or mongos instance.

Authentication is disabled by default.

To enable authentication, see the following:

Authorization

MongoDB supports role-based access to databases and database operations by storing each user’s roles in a privilege document in the system.users collection. For a description of privilege documents and of available roles, see User Privilege Roles in MongoDB.

Changed in version 2.4: The schema of system.users changed to accommodate a more sophisticated user privilege model, as defined in privilege documents.

The system.users collection is protected to prevent privilege escalation attacks. To access the collection, you must have the userAdmin or userAdminAnyDatabase role.

To assign user roles, you must first create an admin user in the database. Then you create additional users, assigning them appropriate user roles.

To assign user roles, see the following:

User Roles in the admin Database

The admin database provides roles not available in other databases, including a role that effectively makes a user a MongoDB system superuser. See Database Administration Roles and Administrative Roles.

Authentication to One Database at a Time

You can log in as only one user for a given database, including the admin database. If you authenticate to a database as one user and later authenticate on the same database as a different user, the second authentication invalidates the first. Logging into a different database, however, does not invalidate authentication on other databases.

Tutorials

Network Security

Configure Linux iptables Firewall for MongoDB

On contemporary Linux systems, the iptables program provides methods for managing the Linux Kernel’s netfilter or network packet filtering capabilities. These firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk exposure by limiting the hosts that can connect to a system.

This document outlines basic firewall configurations for iptables firewalls on Linux. Use these approaches as a starting point for your larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see Security Practices and Management.

See also

For MongoDB deployments on Amazon’s web services, see the Amazon EC2 page, which addresses Amazon’s Security Groups and other EC2-specific security features.

Overview

Rules in iptables configurations fall into chains, which describe the process for filtering and processing specific streams of traffic. Chains have an order, and packets must pass through earlier rules in a chain to reach later rules. This document addresses only the following two chains:

INPUT
Controls all incoming traffic.
OUTPUT
Controls all outgoing traffic.

Given the default ports of all MongoDB processes, you must configure networking rules that permit only required communication between your application and the appropriate mongod and mongos instances.

Be aware that, by default, the policy of iptables is to allow all connections and traffic unless explicitly disabled. The configuration changes outlined in this document will create rules that explicitly allow traffic from specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed. When you have properly configured your iptables rules to allow only the traffic that you want to permit, you can Change Default Policy to DROP.

Patterns

This section contains a number of patterns and examples for configuring iptables for use with MongoDB deployments. If you have configured different ports using the port configuration setting, you will need to modify the rules accordingly.

Traffic to and from mongod Instances

This pattern is applicable to all mongod instances running as standalone instances or as part of a replica set.

The goal of this pattern is to explicitly allow traffic to the mongod instance from the application server. In the following examples, replace <ip-address> with the IP address of the application server:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT

The first rule allows all incoming traffic from <ip-address> on port 27017, which allows the application server to connect to the mongod instance. The second rule allows outgoing traffic from the mongod to reach the application server.
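Because the same pair of rules recurs with different ports throughout this document, it can help to see the pattern parameterized. The following Python sketch builds the two command strings for a given source address and port; the function name mongod_iptables_rules is hypothetical, 198.51.100.55 is only an example address, and the commands are printed rather than executed.

```python
def mongod_iptables_rules(source, port=27017):
    """Return the INPUT and OUTPUT rules shown above as command strings.

    `source` takes the place of <ip-address>; the function name is
    hypothetical. The commands are only built, never executed.
    """
    inbound = (
        f"iptables -A INPUT -s {source} -p tcp --destination-port {port} "
        "-m state --state NEW,ESTABLISHED -j ACCEPT"
    )
    outbound = (
        f"iptables -A OUTPUT -d {source} -p tcp --source-port {port} "
        "-m state --state ESTABLISHED -j ACCEPT"
    )
    return [inbound, outbound]


for rule in mongod_iptables_rules("198.51.100.55"):
    print(rule)
```

Passing 27018 or 27019 as the port yields the corresponding rules for shard servers and config servers.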

Optional

If you have only one application server, you can replace <ip-address> with the IP address itself, such as 198.51.100.55. You can also express this using CIDR notation as 198.51.100.55/32. If you want to permit a larger block of possible IP addresses, you can allow traffic from a /24 using one of the following specifications for the <ip-address>:

10.10.10.10/24
10.10.10.10/255.255.255.0
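To check which addresses a specification covers before using it in a rule, you can use Python's standard ipaddress module, as in the sketch below. It assumes that the firewall applies the mask to the address in the same way that ipaddress does with strict=False.

```python
import ipaddress

# strict=False permits host bits to be set, as in 10.10.10.10/24.
prefix_form = ipaddress.ip_network("10.10.10.10/24", strict=False)
netmask_form = ipaddress.ip_network("10.10.10.10/255.255.255.0", strict=False)
single_host = ipaddress.ip_network("198.51.100.55/32")

print(prefix_form)                  # the /24 block containing 10.10.10.10
print(prefix_form == netmask_form)  # whether both notations name the same block
print(prefix_form.num_addresses)    # number of addresses covered by the /24
print(single_host.num_addresses)    # number of addresses covered by the /32
```

A membership test such as ipaddress.ip_address("10.10.10.200") in prefix_form shows whether a particular application server falls inside the permitted block.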

Traffic to and from mongos Instances

mongos instances provide query routing for sharded clusters. Clients connect to mongos instances, which behave from the client’s perspective as mongod instances. In turn, the mongos connects to all mongod instances that are components of the sharded cluster.

Use the same iptables command to allow traffic to and from these instances as you would for the mongod instances that are members of the replica set. Take the configuration outlined in the Traffic to and from mongod Instances section as an example.

Traffic to and from a MongoDB Config Server

Config servers host the config database that stores metadata for sharded clusters. Each production cluster has three config servers, initiated using the mongod --configsvr option. [1] Config servers listen for connections on port 27019. As a result, add the following iptables rules to the config server to allow incoming and outgoing connections on port 27019, for connection to the other config servers.

iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27019 -m state --state ESTABLISHED -j ACCEPT

Replace <ip-address> with the address or address space of all the mongod instances that provide config servers.

Additionally, config servers need to allow incoming connections from all of the mongos instances in the cluster and all mongod instances in the cluster. Add rules that resemble the following:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 27019 -m state --state NEW,ESTABLISHED -j ACCEPT

Replace <ip-address> with the address of the mongos instances and the shard mongod instances.

[1] You can also run a config server by setting the configsvr option in a configuration file.

Traffic to and from a MongoDB Shard Server

Shard servers run as mongod --shardsvr. [2] Because the default port number when running with shardsvr is 27018, you must configure the following iptables rules to allow traffic to and from each shard:

iptables -A INPUT -s <ip-address> -p tcp --destination-port 27018 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT

Replace the <ip-address> specification with the IP addresses of all mongod instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:

  • all mongod instances in the shard’s replica sets.
  • all mongod instances in other shards. [3]

Furthermore, shards need to be able to make outgoing connections to:

  • all mongos instances.
  • all mongod instances in the config servers.

Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos instances:

iptables -A OUTPUT -d <ip-address> -p tcp --source-port 27018 -m state --state ESTABLISHED -j ACCEPT

[2] You can also specify the shard server option using the shardsvr setting in the configuration file. Shard members are also often conventional replica sets using the default port.
[3] All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.

Provide Access For Monitoring Systems

  1. The mongostat diagnostic tool, when running with the --discover option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos instances.

  2. If your monitoring system needs access to the HTTP interface, insert the following rule to the chain:

    iptables -A INPUT -s <ip-address> -p tcp --destination-port 28017 -m state --state NEW,ESTABLISHED -j ACCEPT
    

    Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.

    Optional

    For shard server mongod instances running with shardsvr, the rule would resemble the following:

    iptables -A INPUT -s <ip-address> -p tcp --destination-port 28018 -m state --state NEW,ESTABLISHED -j ACCEPT
    

    For config server mongod instances running with configsvr, the rule would resemble the following:

    iptables -A INPUT -s <ip-address> -p tcp --destination-port 28019 -m state --state NEW,ESTABLISHED -j ACCEPT

Change Default Policy to DROP

The default policy for iptables chains is to allow all traffic. After completing all iptables configuration changes, you must change the default policy to DROP so that all traffic that isn’t explicitly allowed as above will not be able to reach components of the MongoDB deployment. Issue the following commands to change this policy:

iptables -P INPUT DROP

iptables -P OUTPUT DROP

Manage and Maintain iptables Configuration

This section contains a number of basic operations for managing and using iptables. There are various front end tools that automate some aspects of iptables configuration, but at the core all iptables front ends provide the same basic functionality:

Make all iptables Rules Persistent

By default, all iptables rules are stored only in memory. When your system restarts, your firewall rules will revert to their defaults. When you have tested a rule set and have confirmed that it effectively controls traffic, you can use the following operations to make the rule set persistent.

On Red Hat Enterprise Linux, Fedora Linux, and related distributions you can issue the following command:

service iptables save

On Debian, Ubuntu, and related distributions, you can use the following command to dump the iptables rules to the /etc/iptables.conf file:

iptables-save > /etc/iptables.conf

Run the following operation to restore the network rules:

iptables-restore < /etc/iptables.conf

Place this command in your rc.local file, or in the /etc/network/if-up.d/iptables file with other similar operations.

List all iptables Rules

To list all currently applied iptables rules, use the following operation at the system shell:

iptables -L

Flush all iptables Rules

If you make a configuration mistake when entering iptables rules or simply need to revert to the default rule set, you can use the following operation at the system shell to flush all rules:

iptables -F

If you’ve already made your iptables rules persistent, you will need to repeat the appropriate procedure in the Make all iptables Rules Persistent section.

Configure Windows netsh Firewall for MongoDB

On Windows Server systems, the netsh program provides methods for managing the Windows Firewall. These firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk exposure by limiting the hosts that can connect to a system.

This document outlines basic Windows Firewall configurations. Use these approaches as a starting point for your larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see Security Practices and Management.

See also

Windows Firewall documentation from Microsoft.

Overview

Windows Firewall processes rules in an order determined by rule type, parsing them in the following order:

  1. Windows Service Hardening
  2. Connection security rules
  3. Authenticated Bypass Rules
  4. Block Rules
  5. Allow Rules
  6. Default Rules

By default, the policy in Windows Firewall allows all outbound connections and blocks all incoming connections.

Given the default ports of all MongoDB processes, you must configure networking rules that permit only required communication between your application and the appropriate mongod.exe and mongos.exe instances.

The configuration changes outlined in this document will create rules which explicitly allow traffic from specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed.

You can configure the Windows Firewall using the netsh command line tool or through a Windows application. On Windows Server 2008 this application is Windows Firewall With Advanced Security in Administrative Tools. On previous versions of Windows Server, access the Windows Firewall application in the System and Security control panel.

The procedures in this document use the netsh command line tool.

Patterns

This section contains a number of patterns and examples for configuring Windows Firewall for use with MongoDB deployments. If you have configured different ports using the port configuration setting, you will need to modify the rules accordingly.

Traffic to and from mongod.exe Instances

This pattern is applicable to all mongod.exe instances running as standalone instances or as part of a replica set. The goal of this pattern is to explicitly allow traffic to the mongod.exe instance from the application server.

netsh advfirewall firewall add rule name="Open mongod port 27017" dir=in action=allow protocol=TCP localport=27017

This rule allows all incoming traffic to port 27017, which allows the application server to connect to the mongod.exe instance.

Windows Firewall also allows enabling network access for an entire application rather than to a specific port, as in the following example:

netsh advfirewall firewall add rule name="Allowing mongod" dir=in action=allow program="C:\mongodb\bin\mongod.exe"

You can allow all access for a mongos.exe server, with the following invocation:

netsh advfirewall firewall add rule name="Allowing mongos" dir=in action=allow program="C:\mongodb\bin\mongos.exe"

Traffic to and from mongos.exe Instances

mongos.exe instances provide query routing for sharded clusters. Clients connect to mongos.exe instances, which behave from the client’s perspective as mongod.exe instances. In turn, the mongos.exe connects to all mongod.exe instances that are components of the sharded cluster.

Use the same Windows Firewall command to allow traffic to and from these instances as you would for the mongod.exe instances that are members of the replica set.

netsh advfirewall firewall add rule name="Open mongod shard port 27018" dir=in action=allow protocol=TCP localport=27018

Traffic to and from a MongoDB Config Server

Configuration servers host the config database that stores metadata for sharded clusters. Each production cluster has three configuration servers, initiated using the mongod --configsvr option. [1] Configuration servers listen for connections on port 27019. As a result, add the following Windows Firewall rules to the config server to allow incoming and outgoing connections on port 27019, for connection to the other config servers.

netsh advfirewall firewall add rule name="Open mongod config svr port 27019" dir=in action=allow protocol=TCP localport=27019

Additionally, config servers need to allow incoming connections from all of the mongos.exe instances in the cluster and all mongod.exe instances in the cluster. Add rules that resemble the following:

netsh advfirewall firewall add rule name="Open mongod config svr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27019

Replace <ip-address> with the addresses of the mongos.exe instances and the shard mongod.exe instances.
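The inbound rules in this document share one shape, differing only in name, port, and the optional remoteip restriction. The following Python sketch builds such a command string; the function name netsh_allow_inbound is hypothetical, 198.51.100.55 is only an example address, and the command is printed rather than executed.

```python
def netsh_allow_inbound(name, port, remote_ip=None):
    """Return a netsh command string that allows inbound TCP on `port`.

    The function name is hypothetical. When `remote_ip` is given, the
    rule is limited to that address, matching the remoteip form above.
    """
    parts = [
        "netsh advfirewall firewall add rule",
        f'name="{name}"',
        "dir=in",
        "action=allow",
        "protocol=TCP",
    ]
    if remote_ip is not None:
        parts.append(f"remoteip={remote_ip}")
    parts.append(f"localport={port}")
    return " ".join(parts)


print(netsh_allow_inbound("Open mongod config svr inbound", 27019, "198.51.100.55"))
```

Omitting remote_ip produces a rule that accepts connections from any host, so supplying it is the more restrictive choice.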

[1] You can also run a config server by setting the configsvr option in a configuration file.

Traffic to and from a MongoDB Shard Server

Shard servers run as mongod --shardsvr. [2] Because the default port number when running with shardsvr is 27018, you must configure the following Windows Firewall rules to allow traffic to and from each shard:

netsh advfirewall firewall add rule name="Open mongod shardsvr inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=27018
netsh advfirewall firewall add rule name="Open mongod shardsvr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018

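Replace the <ip-address> specification with the IP addresses of all mongod.exe instances. This allows you to permit incoming and outgoing traffic between all shards, including constituent replica set members, to:

  • all mongod.exe instances in the shard’s replica sets.
  • all mongod.exe instances in other shards. [3]

Furthermore, shards need to be able to make outgoing connections to:

  • all mongos.exe instances.
  • all mongod.exe instances in the config servers.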

Create a rule that resembles the following, and replace the <ip-address> with the address of the config servers and the mongos.exe instances:

netsh advfirewall firewall add rule name="Open mongod config svr outbound" dir=out action=allow protocol=TCP remoteip=<ip-address> localport=27018

[2] You can also specify the shard server option using the shardsvr setting in the configuration file. Shard members are also often conventional replica sets using the default port.
[3] All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.

Provide Access For Monitoring Systems

  1. The mongostat diagnostic tool, when running with the --discover option, needs to be able to reach all components of a cluster, including the config servers, the shard servers, and the mongos.exe instances.

  2. If your monitoring system needs access to the HTTP interface, add the following rule:

    netsh advfirewall firewall add rule name="Open mongod HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28017
    

    Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface. For all deployments, you should restrict access to this port to only the monitoring instance.

    Optional

    For shard server mongod.exe instances running with shardsvr, the rule would resemble the following:

    netsh advfirewall firewall add rule name="Open mongos HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28018
    

    For config server mongod.exe instances running with configsvr, the rule would resemble the following:

    netsh advfirewall firewall add rule name="Open mongod configsvr HTTP monitoring inbound" dir=in action=allow protocol=TCP remoteip=<ip-address> localport=28019

Manage and Maintain Windows Firewall Configurations

This section contains a number of basic operations for managing and using netsh. While you can use the GUI front ends to manage the Windows Firewall, all core functionality is accessible from netsh.

Delete all Windows Firewall Rules

To delete the firewall rule allowing mongod.exe traffic:

netsh advfirewall firewall delete rule name="Open mongod port 27017" protocol=tcp localport=27017

netsh advfirewall firewall delete rule name="Open mongod shard port 27018" protocol=tcp localport=27018

List All Windows Firewall Rules

To return a list of all Windows Firewall rules:

netsh advfirewall firewall show rule name=all

Reset Windows Firewall

To reset the Windows Firewall rules:

netsh advfirewall reset
Backup and Restore Windows Firewall Rules

To simplify administration of a larger collection of systems, you can easily export firewall rules from one server and import them on other servers:

Export all firewall rules with the following command:

netsh advfirewall export "C:\temp\MongoDBfw.wfw"

Replace "C:\temp\MongoDBfw.wfw" with a path of your choosing. You can use a command in the following form to import a file created using this operation:

netsh advfirewall import "C:\temp\MongoDBfw.wfw"
Create a Vulnerability Report

If you believe you have discovered a vulnerability in MongoDB or a related product or have experienced a security incident related to MongoDB, please report the issue so that 10gen can respond appropriately and work to prevent additional issues in the future.

To report an issue, use either jira.mongodb.org (preferred) or email. 10gen responds to vulnerability notifications within 48 hours.

Information to Provide

All vulnerability reports should contain as much information as possible so 10gen can move quickly to resolve the issue. In particular, please include the following:

  • The name of the product.
  • Common Vulnerability information, if applicable, including:
    • CVSS (Common Vulnerability Scoring System) Score.
    • CVE (Common Vulnerability and Exposures) Identifier.
  • Contact information, including an email address and/or phone number, if applicable.
Create the Report in Jira

10gen prefers jira.mongodb.org for all communication regarding MongoDB and related products.

Submit a ticket in the Core Server Security project at: https://jira.mongodb.org/browse/SECURITY/. The ticket number will become the reference identification for the issue for the lifetime of the issue. You can use this identifier for tracking purposes.

Send the Report via Email

While Jira is preferred, you may also report vulnerabilities via email to security@10gen.com.

You may encrypt email using the 10gen public key at http://docs.mongodb.org/10gen-gpg-key.asc.

10gen responds to vulnerability reports sent via email with a response email that contains a reference number for a Jira ticket posted to the SECURITY project.

Evaluation of a Vulnerability Report

10gen validates all submitted vulnerabilities and uses Jira to track all communications regarding a vulnerability, including requests for clarification or additional information. If needed, 10gen representatives set up a conference call to exchange information regarding the vulnerability.

Disclosure

10gen requests that you do not publicly disclose any information regarding the vulnerability or exploit the issue until 10gen has had the opportunity to analyze the vulnerability, to respond to the notification, and to notify key users, customers, and partners.

The amount of time required to validate a reported vulnerability depends on the complexity and severity of the issue. 10gen takes all reported vulnerabilities very seriously and will always ensure that there is a clear and open channel of communication with the reporter.

After validating an issue, 10gen coordinates public disclosure of the issue with the reporter in a mutually agreed timeframe and format. If required or requested, the reporter of a vulnerability will receive credit in the published security bulletin.

Access Control

Enable Authentication

Enable authentication using the auth or keyFile settings. Use auth for standalone instances, and keyFile with replica sets and sharded clusters. keyFile implies auth and allows members of a MongoDB deployment to authenticate internally.

Authentication requires at least one administrator user in the admin database. You can create the user before enabling authentication or after enabling authentication.

Also consider the password hashing issue, described in Password Hashing Insecurity, which affects version 2.2 and earlier.

Procedures

You can enable authentication using either of the following procedures:

Create the Administrator Credentials and then Enable Authentication
  1. Start the mongod or mongos instance without the auth or keyFile setting.
  2. Create the administrator user as described in Create a User Administrator.
  3. Re-start the mongod or mongos instance with the auth or keyFile setting.
Enable Authentication and then Create Administrator
  1. Start the mongod or mongos instance with the auth or keyFile setting.
  2. Connect to the instance on the same system so that you can authenticate using the localhost exception.
  3. Create the administrator user as described in Create a User Administrator.
Query Authenticated Users

If you have the userAdmin or userAdminAnyDatabase role on a database, you can query authenticated users in that database with the following operation:

db.system.users.find()
Create a User Administrator

In a MongoDB deployment, users with either the userAdmin or userAdminAnyDatabase roles are effective administrative “superusers”. Users with either of these roles can create and modify any other users and can assign them any privileges. The user also can grant itself any privileges. In production deployments, this user should have no other roles and should only administer users and privileges.

This should be the first user created for a MongoDB deployment. This user can then create all other users in the system.

Important

The userAdminAnyDatabase user can grant itself and any other user full access to the entire MongoDB instance. The credentials to log in as this user should be carefully controlled.

Users with the userAdmin and userAdminAnyDatabase privileges are not the same as the UNIX root superuser in that these roles confer no additional access beyond user administration. These users cannot perform administrative operations or read or write data without first granting themselves additional permissions.

Note

The userAdmin is a database specific privilege, and only grants a user the ability to administer users on a single database. However, for the admin database, userAdmin allows a user the ability to gain userAdminAnyDatabase, and so for the admin database only these roles are effectively the same.

Create a User Administrator
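  1. Connect to the mongod or mongos by either authenticating as an existing user with the userAdmin or userAdminAnyDatabase role, or by using the localhost exception described in Authenticate with Full Administrative Access via Localhost.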

  2. Switch to the admin database:

    db = db.getSiblingDB('admin')
    
  3. Add the user with either the userAdmin role or userAdminAnyDatabase role, and only that role, by issuing a command similar to the following, where <username> is the username and <password> is the password:

    db.addUser( { user: "<username>",
                  pwd: "<password>",
                  roles: [ "userAdminAnyDatabase" ] } )
    
Authenticate with Full Administrative Access via Localhost

If there are no users for the admin database, you can connect with full administrative access via the localhost interface. This bypass exists to support bootstrapping new deployments. This approach is useful, for example, if you want to run mongod or mongos with authentication before creating your first user.

To authenticate via localhost, connect to the mongod or mongos from a client running on the same system. Your connection will have full administrative access.

To disable the localhost bypass, set the enableLocalhostAuthBypass parameter using setParameter during startup:

mongod --setParameter enableLocalhostAuthBypass=0

Note

For versions of MongoDB 2.2 prior to 2.2.4, if mongos is running with keyFile, then all users connecting over the localhost interface must authenticate, even if there aren’t any users in the admin database. Connections on localhost are not correctly granted full access on sharded systems that run those versions.

MongoDB 2.2.4 resolves this issue.

Note

In version 2.2, you cannot add the first user to a sharded cluster using the localhost connection. If you are running a 2.2 sharded cluster and want to enable authentication, you must deploy the cluster and add the first user to the admin database before restarting the cluster to run with keyFile.

Add a User to a Database

To add a user to a database you must authenticate to that database as a user with the userAdmin or userAdminAnyDatabase role. If you have not first created a user with one of those roles, do so as described in Create a User Administrator.

When adding a user to multiple databases, you must give the user a unique username and password combination for each database. See Password Hashing Insecurity for important security information.

To add a user, pass the db.addUser() method a well formed privilege document that contains the user’s credentials and privileges. The db.addUser() method adds the document to the database’s system.users collection.

For the structure of a privilege document, see system.users. For descriptions of user roles, see User Privilege Roles in MongoDB.

Example

The following creates a user named Alice in the products database and gives her readWrite and dbAdmin privileges.

use products
db.addUser( { user: "Alice",
              pwd: "Moon1234",
              roles: [ "readWrite", "dbAdmin" ]
            } )

Example

The following creates a user named Bob in the admin database. The privilege document uses Bob’s credentials from the products database and assigns him userAdmin privileges.

use admin
db.addUser( { user: "Bob",
              userSource: "products",
              roles: [ "userAdmin" ]
            } )

Example

The following creates a user named Carlos in the admin database and gives him readWrite access to the config database, which lets him change certain settings for sharded clusters, such as to disable the balancer.

db = db.getSiblingDB('admin')
db.addUser( { user: "Carlos",
              pwd: "Moon1234",
              roles: [ "clusterAdmin" ],
              otherDBRoles: { config: [ "readWrite" ] }
            } )

Only the admin database supports the otherDBRoles field.

Generate a Key File

This section describes how to generate a key file to store authentication information. After generating a key file, specify the key file using the keyFile option when starting a mongod or mongos instance.

A key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or world permissions on UNIX systems. Key file permissions are not checked on Windows systems.

Generate a Key File on a Windows System

Use the following openssl command at the system shell to generate pseudo-random content for a key file for deployments with Windows components:

openssl rand -base64 741
Generate a Key File on a Linux or Unix System

Use the following openssl command at the system shell to generate pseudo-random content for a key file for systems that do not have Windows components (i.e. OS X, Unix, or Linux systems):

openssl rand -base64 753
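
The two byte counts above keep the generated file just under the one kilobyte limit. The sketch below works through the arithmetic. It assumes that openssl wraps base64 output at 64 characters per line and that a line break takes one byte on Unix-like systems and two bytes on Windows; both are assumptions about the tool's output rather than statements from this manual, and the function name is hypothetical.

```javascript
// Hypothetical helper: size in bytes of `openssl rand -base64 <n>` output,
// assuming 64-character lines and a fixed line-break width.
function base64FileSize(randomBytes, newlineBytes) {
  // base64 encodes every 3 input bytes as 4 output characters (padded).
  const chars = Math.ceil(randomBytes / 3) * 4;
  // Each line, including the last, is assumed to end with a line break.
  const lines = Math.ceil(chars / 64);
  return chars + lines * newlineBytes;
}

const unixSize = base64FileSize(753, 1);    // LF line breaks
const windowsSize = base64FileSize(741, 2); // CRLF line breaks

console.log(unixSize, windowsSize); // prints "1020 1020"
```

Under these assumptions both files come to 1020 bytes, which is below the 1024 byte limit.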
Key File Properties

Be aware that MongoDB strips whitespace characters (e.g. x0d, x09, and x20,) for cross-platform convenience. As a result, the following operations produce identical keys:

echo -e "my secret key" > key1
echo -e "my secret key\n" > key2
echo -e "my    secret    key" > key3
echo -e "my\r\nsecret\r\nkey\r\n" > key4
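
To see why these four files are equivalent, the sketch below removes whitespace from each file's content and compares the results. It illustrates the behavior described above and is not MongoDB's implementation. The function name is hypothetical, and the strings mirror what the echo commands write, including the trailing newline that echo appends.

```javascript
// Hypothetical stand-in for the whitespace stripping described above.
function stripWhitespace(content) {
  return content.replace(/\s/g, "");
}

// Contents written by the four echo commands, trailing newline included.
const keyFiles = [
  "my secret key\n",
  "my secret key\n\n",
  "my    secret    key\n",
  "my\r\nsecret\r\nkey\r\n\n"
];

const normalized = keyFiles.map(stripWhitespace);
const allIdentical = normalized.every(function (key) {
  return key === normalized[0];
});

console.log(normalized[0], allIdentical); // prints "mysecretkey true"
```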
Deploy MongoDB with Kerberos Authentication

New in version 2.4.

MongoDB Enterprise supports authentication using a Kerberos service to manage the authentication process. Kerberos is an industry standard authentication protocol for large client/server systems. With Kerberos, MongoDB and application ecosystems can take advantage of existing authentication infrastructure and processes.

Setting up and configuring a Kerberos deployment is beyond the scope of this document. In order to use MongoDB with Kerberos, you must have a properly configured Kerberos deployment and the ability to generate a valid keytab file for each mongod instance in your MongoDB deployment.

Note

The following assumes that you have a valid Kerberos keytab file for your realm accessible on your system. The examples below assume that the keytab file is valid and is located at /opt/mongodb/mongod.keytab and is only accessible to the user that runs the mongod process.

Process Overview

To run MongoDB with Kerberos support, you must:

  • Configure a Kerberos service principal for each mongod and mongos instance in your MongoDB deployment.
  • Generate and distribute keytab files for each MongoDB component (i.e. mongod and mongos) in your deployment. Ensure that you only transmit keytab files over secure channels.
  • Optional. Start the mongod instance without auth and create users inside of MongoDB that you can use to bootstrap your deployment.
  • Start mongod and mongos with the KRB5_KTNAME environment variable as well as a number of required run time options.
  • If you did not create Kerberos user accounts, you can use the localhost exception to create users at this point until you create the first user on the admin database.
  • Authenticate clients, including the mongo shell, using Kerberos.
Operations
Create Users and Privilege Documents

For every user that you want to be able to authenticate using Kerberos, you must create corresponding privilege documents in the system.users collection to provision access to users. Consider the following document:

{
  user: "application/reporting@EXAMPLE.NET",
  roles: ["read"],
  userSource: "$external"
}

This grants the Kerberos user principal application/reporting@EXAMPLE.NET read only access to a database. The userSource $external reference allows mongod to consult an external source (i.e. Kerberos) to authenticate this user.

In the mongo shell you can pass the db.addUser() a user privilege document to provision access to users, as in the following operation:

db = db.getSiblingDB("records")
db.addUser( {
              "user": "application/reporting@EXAMPLE.NET",
              "roles": [ "read" ],
              "userSource": "$external"
            } )

This operation grants the Kerberos user application/reporting@EXAMPLE.NET read access to the records database.

To remove a user's access, use the remove() method, as in the following example:

db.system.users.remove( { user: "application/reporting@EXAMPLE.NET" } )

To modify a user document, use update operations on documents in the system.users collection.

Start mongod with Kerberos Support

Once you have provisioned privileges to users in the mongod, and obtained a valid keytab file, you must start mongod using a command in the following form:

env KRB5_KTNAME=<path to keytab file> <mongod invocation>

For successful operation with mongod use the following run time options in addition to your normal default configuration options:

  • --setParameter with the authenticationMechanisms=GSSAPI argument to enable support for Kerberos.
  • --auth to enable authentication.
  • --keyFile to allow components of a single MongoDB deployment to communicate with each other, if needed to support replica set and sharded cluster operations. keyFile implies auth.

For example, consider the following invocation:

env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    /opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
    --fork --logpath /opt/mongodb/log/mongod.log \
    --auth --setParameter authenticationMechanisms=GSSAPI

You can also specify these options using the configuration file, as in the following:

# /opt/mongodb/mongod.conf, Example configuration file.

fork = true
auth = true

dbpath = /opt/mongodb/data
logpath = /opt/mongodb/log/mongod.log
setParameter = authenticationMechanisms=GSSAPI

To use this configuration file, start mongod as in the following:

env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    /opt/mongodb/bin/mongod --config /opt/mongodb/mongod.conf

To start a mongos instance using Kerberos, you must create a Kerberos service principal and deploy a keytab file for this instance, and then start the mongos with the following invocation:

env KRB5_KTNAME=/opt/mongodb/mongos.keytab \
    /opt/mongodb/bin/mongos \
    --configdb shard0.example.net,shard1.example.net,shard2.example.net \
    --setParameter authenticationMechanisms=GSSAPI \
    --keyFile /opt/mongodb/mongos.keyfile

If you encounter problems when trying to start mongod or mongos, please see the troubleshooting section for more information.

Important

Before users can authenticate to MongoDB using Kerberos you must create users and grant them privileges within MongoDB. If you have not created users when you start MongoDB with Kerberos you can use the localhost authentication exception to add users. See the Create Users and Privilege Documents section and the User Privilege Roles in MongoDB document for more information.

Authenticate mongo Shell with Kerberos

To connect to a mongod instance using the mongo shell you must begin by using the kinit program to initialize and authenticate a Kerberos session. Then, start a mongo instance, and use the db.auth() method, to authenticate against the special $external database, as in the following operation:

use $external
db.auth( { mechanism: "GSSAPI", user: "application/reporting@EXAMPLE.NET" } )

Alternately, you can authenticate using command line options to mongo, as in the following equivalent example:

mongo --authenticationMechanism=GSSAPI \
      --authenticationDatabase='$external' \
      --username application/reporting@EXAMPLE.NET

These operations authenticate the Kerberos principal name application/reporting@EXAMPLE.NET to the connected mongod. The connection then automatically acquires all available privileges as needed.

Use MongoDB Drivers to Authenticate with Kerberos

At the time of release, the C++, Java, C#, and Python drivers all provide support for Kerberos authentication to MongoDB. Consider the following tutorials for more information:

Troubleshooting
Kerberos Configuration Checklist

If you’re having trouble getting mongod to start with Kerberos, there are a number of Kerberos-specific issues that can prevent successful authentication. As you begin troubleshooting your Kerberos deployment, ensure that:

  • The mongod is from MongoDB Enterprise.
  • You have a valid keytab file specified in the environment running the mongod. For the mongod instance running on the db0.example.net host, the service principal should be mongodb/db0.example.net.
  • DNS allows the mongod to resolve the components of the Kerberos infrastructure. You should have both A and PTR records (i.e. forward and reverse DNS) for the system that runs the mongod instance.
  • The canonical system hostname of the system that runs the mongod instance is the resolvable fully qualified domain for this host. Test system hostname resolution with the hostname -f command at the system prompt.
  • Both the Kerberos KDC and the system running the mongod instance are able to resolve each other using DNS. [1]
  • The system clocks of the hosts running the mongod instances and the Kerberos infrastructure are synchronized. Time differences greater than 5 minutes will prevent successful authentication.
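
The last item in the checklist can be checked mechanically. The following is a minimal sketch, assuming you already have the two clock readings as millisecond timestamps; how you obtain them is outside the scope of this example. The function name and the sample values are hypothetical.

```javascript
// Maximum clock difference from the checklist above: 5 minutes.
const MAX_SKEW_MS = 5 * 60 * 1000;

// Hypothetical helper: true when two clock readings (milliseconds since
// the epoch) differ by no more than the allowed skew.
function withinAllowedSkew(mongodClockMs, kdcClockMs) {
  return Math.abs(mongodClockMs - kdcClockMs) <= MAX_SKEW_MS;
}

// Sample readings: one host 4 minutes behind the KDC, one 6 minutes ahead.
const kdcClock = Date.UTC(2013, 0, 1, 12, 0, 0);
const fourMinutesBehind = kdcClock - 4 * 60 * 1000;
const sixMinutesAhead = kdcClock + 6 * 60 * 1000;

console.log(withinAllowedSkew(fourMinutesBehind, kdcClock)); // prints "true"
console.log(withinAllowedSkew(sixMinutesAhead, kdcClock));   // prints "false"
```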

If you still encounter problems with Kerberos, you can start both mongod and mongo (or another client) with the environment variable KRB5_TRACE set to different files to produce more verbose logging of the Kerberos process to help further troubleshooting, as in the following example:

env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
    KRB5_TRACE=/opt/mongodb/log/mongodb-kerberos.log \
    /opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
    --fork --logpath /opt/mongodb/log/mongod.log \
    --auth --setParameter authenticationMechanisms=GSSAPI
[1] By default, Kerberos attempts to resolve hosts using the content of the /etc/krb5.conf file before using DNS to resolve hosts.
Common Error Messages

In some situations, MongoDB will return error messages from the GSSAPI interface if there is a problem with the Kerberos service.

GSSAPI error in client while negotiating security context.

This error occurs on the client and reflects insufficient credentials or a malicious attempt to authenticate.

If you receive this error ensure that you’re using the correct credentials and the correct fully qualified domain name when connecting to the host.

GSSAPI error acquiring credentials.

This error only occurs when attempting to start the mongod or mongos and reflects improper configuration of system hostname or a missing or incorrectly configured keytab file. If you encounter this problem, consider all the items in the Kerberos Configuration Checklist, in particular:

  • examine the keytab file, with the following command:

    klist -k <keytab>
    

    Replace <keytab> with the path to your keytab file.

  • check the configured hostname for your system, with the following command:

    hostname -f
    

    Ensure that this name matches the name in the keytab file, or use the saslHostName parameter to pass MongoDB the correct hostname.

Enable the Traditional MongoDB Authentication Mechanism

For testing and development purposes, you can enable the Kerberos (i.e. GSSAPI) authentication mechanism in combination with the traditional MongoDB challenge/response authentication mechanism (i.e. MONGODB-CR), using the following setParameter run-time option:

mongod --setParameter authenticationMechanisms=GSSAPI,MONGODB-CR

Warning

All keyFile internal authentication between members of a replica set or sharded cluster still uses the MONGODB-CR authentication mechanism, even if MONGODB-CR is not enabled. All client authentication will still use Kerberos.

Reference

User Privilege Roles in MongoDB

New in version 2.4.

In version 2.4, MongoDB adds support for the following user roles:

Roles

Changed in version 2.4.

Roles in MongoDB provide users with a set of specific privileges on specific logical databases. Users may have multiple roles and may have different roles on different logical databases. Roles only grant privileges and never limit access: if a user has read and readWriteAnyDatabase permissions on the records database, that user will be able to write data to the records database.
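
The following sketch illustrates the additive behavior described above. It models only the two roles from the example and is not MongoDB's authorization code; the function and the privilege table are hypothetical.

```javascript
// Hypothetical table: what each role allows on the records database.
const rolePrivileges = {
  read: ["read"],
  readWriteAnyDatabase: ["read", "write"]
};

// Effective privileges are the union of everything the roles grant.
// No role removes a privilege that another role grants.
function effectivePrivileges(roles) {
  const granted = new Set();
  roles.forEach(function (role) {
    (rolePrivileges[role] || []).forEach(function (privilege) {
      granted.add(privilege);
    });
  });
  return Array.from(granted).sort();
}

const privileges = effectivePrivileges(["read", "readWriteAnyDatabase"]);
console.log(privileges.join(",")); // prints "read,write"
```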

Note

By default, MongoDB 2.4 is backwards-compatible with the MongoDB 2.2 access control roles. You can explicitly disable this backwards-compatibility by setting the supportCompatibilityFormPrivilegeDocuments option to 0 during startup, as in the following command-line invocation of MongoDB:

mongod --setParameter supportCompatibilityFormPrivilegeDocuments=0

In general, you should set this option if your deployment does not need to support legacy user documents. Typically legacy user documents are only useful during the upgrade process and while you migrate applications to the updated privilege document form.

See privilege documents and Delegated Credentials for MongoDB Authentication for more information about permissions and authentication in MongoDB.

Database User Roles
read

Provides users with the ability to read data from any collection within a specific logical database. This includes find() and the following database commands:

readWrite

Provides users with the ability to read from or write to any collection within a specific logical database. Users with readWrite have access to all of the operations available to read users, as well as the following basic write operations: insert(), remove(), and update().

Additionally, users with the readWrite role have access to the following database commands:

Database Administration Roles
dbAdmin

Provides the ability to perform the following set of administrative operations within the scope of this logical database.

Furthermore, only dbAdmin has the ability to read the system.profile collection.

userAdmin

Allows users to read and write data to the system.users collection of the user’s database. Users with this role will be able to modify permissions for existing users and create new users. userAdmin does not restrict the permissions that a user can grant, and a userAdmin user can grant privileges to themselves or other users in excess of the userAdmin user’s current privileges.

Important

userAdmin is effectively the superuser role for a specific database. Users with userAdmin can grant themselves all privileges. However, userAdmin does not explicitly authorize a user for any privileges beyond user administration.

Note

The userAdmin is a database specific privilege, and only grants a user the ability to administer users on a single database. However, for the admin database, userAdmin allows a user the ability to gain userAdminAnyDatabase, and so for the admin database only these roles are effectively the same.

Administrative Roles
clusterAdmin

clusterAdmin grants access to several administration operations that affect or present information about the whole system, rather than just a single database. These privileges include but are not limited to replica set and sharded cluster administrative functions.

clusterAdmin is only applicable on the admin database.

Specifically, users with the clusterAdmin role have access to the following operations:

Any Database Roles

Note

You must specify the following “any” database roles on the admin database. These roles apply to all databases in a mongod instance and are roughly equivalent to their single-database equivalents.

If you add any of these roles to a user privilege document outside of the admin database, the privilege will have no effect. However, only the specification of the roles must occur in the admin database: with delegated authentication credentials, users can gain these privileges by authenticating to another database.

readAnyDatabase

readAnyDatabase provides users with the same read-only permissions as read, except it applies to all logical databases in the MongoDB environment.

readWriteAnyDatabase

readWriteAnyDatabase provides users with the same read and write permissions as readWrite, except it applies to all logical databases in the MongoDB environment.

userAdminAnyDatabase

userAdminAnyDatabase provides users with the same access to user administration operations as userAdmin, except it applies to all logical databases in the MongoDB environment.

Important

Because users with userAdminAnyDatabase and userAdmin have the ability to create and modify permissions in addition to their own level of access, these roles are effectively the MongoDB system superuser. However, userAdminAnyDatabase and userAdmin do not explicitly authorize a user for any privileges beyond user administration.

dbAdminAnyDatabase

dbAdminAnyDatabase provides users with the same access to database administration operations as dbAdmin, except it applies to all logical databases in the MongoDB environment.

Combined Access

Some operations are only available to users that have multiple roles. Consider the following:

sh.status()
Requires clusterAdmin and read access to the config database.
applyOps, eval [1]
Requires readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase and clusterAdmin (on the admin database.)
[1] The mongo shell provides db.eval() as a helper for the eval command. As a wrapper, db.eval() requires the same privileges.
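
The following is a minimal sketch of checking the combined requirement for applyOps and eval, assuming the user's roles on the admin database are available as an array of role names. The function name is hypothetical and the check is an illustration, not MongoDB's authorization logic.

```javascript
// Roles that applyOps and eval require together, per the list above.
const EVAL_ROLES = [
  "readWriteAnyDatabase",
  "userAdminAnyDatabase",
  "dbAdminAnyDatabase",
  "clusterAdmin"
];

// Hypothetical helper: true only when every required role is present.
function hasAllRoles(userRoles, requiredRoles) {
  return requiredRoles.every(function (role) {
    return userRoles.indexOf(role) !== -1;
  });
}

console.log(hasAllRoles(["clusterAdmin"], EVAL_ROLES));            // prints "false"
console.log(hasAllRoles(EVAL_ROLES.concat(["read"]), EVAL_ROLES)); // prints "true"
```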

system.users Privilege Documents

Changed in version 2.4.

Overview

The documents in the <database>.system.users collection store credentials and user privilege information used by the authentication system to provision access to users in the MongoDB system. See User Privilege Roles in MongoDB for more information about access roles, and Security for an overview of security in MongoDB.

Data Model
<database>.system.users

Changed in version 2.4.

Documents in the <database>.system.users collection store credentials and user roles for users who have access to the database. Consider the following prototypes of user privilege documents:

{
   user: "<username>",
   pwd: "<hash>",
   roles: []
}
{
   user: "<username>",
   userSource: "<database>",
   roles: []
}

Note

The pwd and userSource fields are mutually exclusive. A single document cannot contain both.

The following privilege document with the otherDBRoles field is only supported on the admin database:

{
   user: "<username>",
   userSource: "<database>",
   otherDBRoles: {
      <database0> : [],
      <database1> : []
   },
   roles: []
}
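
The constraints on these prototypes can be expressed as a small check. The following sketch is for illustration only and is not the validation that MongoDB performs; the function name and the error strings are hypothetical.

```javascript
// Hypothetical check of the rules stated above for a privilege document
// stored in the system.users collection of `database`.
function checkPrivilegeDocument(doc, database) {
  const problems = [];
  if (typeof doc.user !== "string" || doc.user.length === 0) {
    problems.push("user must be a non-empty string");
  }
  if ("pwd" in doc && "userSource" in doc) {
    problems.push("pwd and userSource are mutually exclusive");
  }
  if (!Array.isArray(doc.roles)) {
    problems.push("roles must be an array");
  }
  if ("otherDBRoles" in doc && database !== "admin") {
    problems.push("otherDBRoles is only supported on the admin database");
  }
  return problems;
}

const valid = checkPrivilegeDocument(
  { user: "Bob", userSource: "products", roles: ["userAdmin"] }, "admin");
const invalid = checkPrivilegeDocument(
  { user: "Carlos", pwd: "<hash>", userSource: "products", roles: [],
    otherDBRoles: { config: ["readWrite"] } }, "records");

console.log(valid.length, invalid.length); // prints "0 2"
```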

Consider the content of the following fields in the system.users documents:

<database>.system.users.user

user is a string that identifies each user. Users exist in the context of a single logical database; however, users from one database may obtain access in another database by way of the otherDBRoles field on the admin database, the userSource field, or the Any Database Roles.

<database>.system.users.pwd

pwd holds a hashed shared secret used to authenticate the user. The pwd field is mutually exclusive with the userSource field.

<database>.system.users.roles

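roles holds an array of user roles. The available roles are:

  • read
  • readWrite
  • dbAdmin
  • userAdmin
  • clusterAdmin
  • readAnyDatabase
  • readWriteAnyDatabase
  • userAdminAnyDatabase
  • dbAdminAnyDatabase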

See Roles for full documentation of all available user roles.

<database>.system.users.userSource

A string that holds the name of the database that contains the credentials for the user. If userSource is $external, then MongoDB will use an external resource, such as Kerberos, for authentication credentials.

Note

In the current release, the only external authentication source is Kerberos, which is only available in MongoDB Enterprise.

Use userSource to ensure that a single user’s authentication credentials are only stored in a single location in a mongod instance’s data.

A userSource and user pair identifies a unique user in a MongoDB system.

admin.system.users.otherDBRoles

A document that holds one or more fields. The name of each field is the name of a database in the MongoDB instance, and the value of each field is a list of the roles this user has on that database. Consider the following example:

{
  user: "admin",
  userSource: "$external",
  roles: [ "clusterAdmin"],
  otherDBRoles:
  {
    config: [ "read" ],
    records: [ "dbAdmin" ]
  }
}

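This user has the following privileges:

  • clusterAdmin on the admin database
  • read on the config database
  • dbAdmin on the records database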
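
The following is a minimal sketch of how the roles and otherDBRoles fields of this document map to per-database roles. The helper name is hypothetical, the sketch ignores the "any database" roles, and the dbAdmin role is spelled as in the Roles section.

```javascript
// The example document, as stored in admin.system.users.
const adminUser = {
  user: "admin",
  userSource: "$external",
  roles: ["clusterAdmin"],
  otherDBRoles: {
    config: ["read"],
    records: ["dbAdmin"]
  }
};

// Hypothetical helper: roles that an admin.system.users document
// grants on a given database.
function rolesOn(doc, database) {
  if (database === "admin") {
    return doc.roles;
  }
  return (doc.otherDBRoles && doc.otherDBRoles[database]) || [];
}

console.log(rolesOn(adminUser, "admin"));   // [ 'clusterAdmin' ]
console.log(rolesOn(adminUser, "config"));  // [ 'read' ]
console.log(rolesOn(adminUser, "records")); // [ 'dbAdmin' ]
console.log(rolesOn(adminUser, "test"));    // []
```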

Delegated Credentials for MongoDB Authentication

New in version 2.4.

With a new document format in the system.users collection, MongoDB now supports the ability to delegate authentication credentials to other sources and databases. The userSource field in these documents forces MongoDB to use another source for credentials.

Consider the following document in a system.users collection in a database named accounts:

{
   user: "application0",
   pwd: "YvuolxMtaycghk2GMrzmImkG4073jzAw2AliMRul",
   roles: []
}

Then, for every database that the application0 user requires access to, add documents to the system.users collection that resemble the following:

{
   user: "application0",
   roles: ['readWrite'],
   userSource: "accounts"
}

To gain privileges on databases where the application0 user has access, you must first authenticate to the accounts database.
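
The delegation can be pictured as a lookup that follows userSource to the database holding the credentials. The sketch below models the system.users collections as plain arrays. The sales database and the function name are hypothetical, and this is not how MongoDB resolves users internally.

```javascript
// system.users collections keyed by database name, using the documents
// from the example plus a hypothetical "sales" database.
const systemUsers = {
  accounts: [
    { user: "application0",
      pwd: "YvuolxMtaycghk2GMrzmImkG4073jzAw2AliMRul", roles: [] }
  ],
  sales: [
    { user: "application0", roles: ["readWrite"], userSource: "accounts" }
  ]
};

// Hypothetical helper: name of the database that holds the credentials
// for `user` on `database`, or null if the user is not defined there.
function credentialDatabase(users, database, user) {
  const doc = (users[database] || []).find(function (candidate) {
    return candidate.user === user;
  });
  if (!doc) {
    return null;
  }
  // A document with userSource delegates; otherwise pwd is stored locally.
  return doc.userSource || database;
}

console.log(credentialDatabase(systemUsers, "sales", "application0"));    // prints "accounts"
console.log(credentialDatabase(systemUsers, "accounts", "application0")); // prints "accounts"
console.log(credentialDatabase(systemUsers, "sales", "nobody"));          // prints "null"
```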

Disable Legacy Privilege Documents

By default, MongoDB 2.4 supports both the new role-based privilege documents and the privilege documents used in version 2.2 and earlier. MongoDB assumes any privilege document without a roles field is a 2.2 or earlier document.

To ensure that mongod instances will only provide access to users defined with the new role-based privilege documents, use the following setParameter run-time option:

mongod --setParameter supportCompatibilityFormPrivilegeDocuments=0

Password Hashing Insecurity

In version 2.2 and earlier:

  • the read-write users of a database all have access to the system.users collection, which contains the user names and user password hashes. [1]

    Note

    In 2.4, only users with the userAdmin role have access to the system.users collection.

  • if a user has the same password for multiple databases, the hash will be the same. A malicious user could exploit this to gain access on a second database using a different user’s credentials.

As a result, always use unique username and password combinations for each database.

[1] Read-only users do not have access to the system.users collection.
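
The sketch below shows why a reused password yields the same stored hash on every database. It assumes that the legacy hash is the MD5 digest of the string "<username>:mongo:<password>", which is how the MONGODB-CR scheme is commonly described; treat that formula as an assumption rather than a statement from this manual. The function name and sample credentials are hypothetical.

```javascript
const crypto = require("crypto");

// Assumed form of the legacy password hash: MD5 of
// "<username>:mongo:<password>". The database name is not an input.
function legacyPasswordHash(username, password) {
  return crypto.createHash("md5")
    .update(username + ":mongo:" + password, "utf8")
    .digest("hex");
}

// The same credentials on two databases produce the same stored hash...
const onProducts = legacyPasswordHash("Alice", "Moon1234");
const onRecords = legacyPasswordHash("Alice", "Moon1234");

// ...while a different password produces a different hash.
const otherPassword = legacyPasswordHash("Alice", "Sun5678");

console.log(onProducts === onRecords);     // prints "true"
console.log(onProducts === otherPassword); // prints "false"
```

Because the database name does not take part in the hash under this assumption, anyone who can read the hash in one database's system.users collection learns the hash stored in every other database where the same credentials are reused.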

Thanks to Will Urbanski, from Dell SecureWorks, for identifying this issue.

Core MongoDB Operations (CRUD)

CRUD stands for create, read, update, and delete, which are the four core database operations used in database driven application development. The CRUD Operations for MongoDB section provides an introduction to each class of operation along with complete examples of each operation. The documents in the Read and Write Operations in MongoDB section provide a higher level overview of the behavior and available functionality of these operations.

Read and Write Operations in MongoDB

The Read Operations and Write Operations documents provide higher level introductions to, and descriptions of, the behavior of read and write operations for MongoDB deployments. The BSON Documents document provides an overview of documents and document-orientation in MongoDB.

Read Operations

Read operations include all operations that return a cursor in response to application requests for data (i.e. queries), and also include a number of aggregation operations that do not return a cursor but have properties similar to queries. These commands include aggregate, count, and distinct.

This document describes the syntax and structure of the queries applications use to request data from MongoDB and how different factors affect the efficiency of reads.

Note

All of the examples in this document use the mongo shell interface. All of these operations are available in an idiomatic interface for each language by way of the MongoDB Driver. See your driver documentation for full API documentation.

Queries in MongoDB

In the mongo shell, the find() and findOne() methods perform read operations. The find() method has the following syntax: [1]

db.collection.find( <query>, <projection> )
  • The db.collection object specifies the database and collection to query. All queries in MongoDB address a single collection.

    You can enter db in the mongo shell to return the name of the current database. Use the show collections operation in the mongo shell to list the current collections in the database.

  • Queries in MongoDB are BSON objects that use a set of query operators to describe query parameters.

    The <query> argument of the find() method holds this query document. A read operation without a query document will return all documents in the collection.

  • The <projection> argument describes the result set in the form of a document. Projections specify or limit the fields to return.

    Without a projection, the operation will return all fields of the documents. Specify a projection if your documents are large, or when your application only needs a subset of the available fields.

  • The order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort (sort()).

For example, the following operation on the inventory collection selects all documents where the type field equals 'food' and the price field has a value less than 9.95. The projection limits the response to the item and qty fields and, by default, the _id field:

db.inventory.find( { type: 'food', price: { $lt: 9.95 } },
                   { item: 1, qty: 1 } )

The findOne() method is similar to the find() method except the findOne() method returns a single document from a collection rather than a cursor. The method has the syntax:

db.collection.findOne( <query>, <projection> )

For additional documentation and examples of the main MongoDB read operators, refer to the Read page of the Core MongoDB Operations (CRUD) section.

[1] db.collection.find() is a wrapper for the more formal query structure with the $query operator.
Query Document

This section provides an overview of the query document for MongoDB queries. See the preceding section for more information on queries in MongoDB.

The following examples demonstrate the key properties of the query document in MongoDB queries, using the find() method from the mongo shell, and a collection of documents named inventory:

  • An empty query document ({}) selects all documents in the collection:

    db.inventory.find( {} )
    

    Not specifying a query document to the find() method is equivalent to specifying an empty query document. Therefore the following operation is equivalent to the previous operation:

    db.inventory.find()
    
  • A single-clause query selects all documents in a collection where a field has a certain value. These are simple “equality” queries.

    In the following example, the query selects all documents in the collection where the type field has the value snacks:

    db.inventory.find( { type: "snacks" } )
    
  • A single-clause query document can also select all documents in a collection given a condition or set of conditions for one field in the collection’s documents. Use the query operators to specify conditions in a MongoDB query.

    In the following example, the query selects all documents in the collection where the value of the type field is either 'food' or 'snacks':

    db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
    

    Note

    Although you can express this query using the $or operator, choose the $in operator rather than the $or operator when performing equality checks on the same field.

  • A compound query can specify conditions for more than one field in the collection’s documents. Implicitly, a logical AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection that match all the conditions.

    In the following example, the query document specifies an equality match on a single field, followed by a range of values for a second field using a comparison operator:

    db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
    

    This query selects all documents where the type field has the value 'food' and the value of the price field is less than ($lt) 9.95.

  • Using the $or operator, you can specify a compound query that joins each clause with a logical OR conjunction so that the query selects the documents in the collection that match at least one condition.

    In the following example, the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:

    db.inventory.find( { $or: [ { qty: { $gt: 100 } },
                                { price: { $lt: 9.95 } } ]
                       } )
    
  • With additional clauses, you can specify precise conditions for matching documents. In the following example, the compound query document selects all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:

    db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } },
                                              { price: { $lt: 9.95 } } ]
                        } )
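
    The implicit AND and the $or conjunction described above can be modeled outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB's implementation: the matches() and matchesCondition() helpers and the sampleInventory data are hypothetical, and the sketch supports only top-level equality, $lt, $gt, $in, and $or.

```javascript
// Simplified, in-memory model of query document matching (not MongoDB code).
// Supports top-level equality, $lt, $gt, $in, and $or only.
function matchesCondition(value, condition) {
  // A plain value (not an operator document) is an equality test.
  if (condition === null || typeof condition !== 'object' || Array.isArray(condition)) {
    return value === condition;
  }
  // Every operator in the condition document must hold.
  return Object.keys(condition).every(function (op) {
    var operand = condition[op];
    switch (op) {
      case '$lt': return value < operand;
      case '$gt': return value > operand;
      case '$in': return operand.indexOf(value) !== -1;
      default: throw new Error('unsupported operator in this sketch: ' + op);
    }
  });
}

function matches(doc, query) {
  // Top-level clauses are joined by an implicit AND.
  return Object.keys(query).every(function (key) {
    if (key === '$or') {
      // $or matches when at least one sub-query matches.
      return query.$or.some(function (sub) { return matches(doc, sub); });
    }
    return matchesCondition(doc[key], query[key]);
  });
}

// Hypothetical sample data.
var sampleInventory = [
  { item: 'apple',  type: 'food',    qty: 150, price: 12.00 },
  { item: 'bread',  type: 'food',    qty: 20,  price: 3.50 },
  { item: 'cheese', type: 'food',    qty: 10,  price: 15.00 },
  { item: 'fork',   type: 'utensil', qty: 500, price: 1.25 }
];

var query = { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] };
var matchedItems = sampleInventory
  .filter(function (doc) { return matches(doc, query); })
  .map(function (doc) { return doc.item; });

console.log(matchedItems); // [ 'apple', 'bread' ]
```

    In this data, 'cheese' has the right type but satisfies neither $or clause, and 'fork' satisfies an $or clause but fails the type clause, so neither matches.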
    
Subdocuments

When the field holds an embedded document (i.e. subdocument), you can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using dot notation, to specify values for individual fields in the subdocument:

  • Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.

    In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:

    db.inventory.find( {
                         producer: {
                                     company: 'ABC123',
                                     address: '123 Street'
                                   }
                       }
                     )
    
  • Equality matches for specific fields within subdocuments select documents when the subdocument contains a field that matches the specified value.

    In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields:

    db.inventory.find( { 'producer.company': 'ABC123' } )
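
    The difference between the two forms can be sketched in plain JavaScript. This is a simplified model rather than MongoDB's implementation; the exactSubdocumentMatch() and dotNotationMatch() helpers are hypothetical and compare only scalar field values.

```javascript
// Simplified model of the two subdocument matching styles (not MongoDB code).

// Whole-subdocument equality: same fields, same values, same field order.
function exactSubdocumentMatch(actual, expected) {
  var actualKeys = Object.keys(actual);
  var expectedKeys = Object.keys(expected);
  return actualKeys.length === expectedKeys.length &&
    expectedKeys.every(function (key, i) {
      return actualKeys[i] === key && actual[key] === expected[key];
    });
}

// Dot notation: follow the path and compare one field; other fields are ignored.
function dotNotationMatch(doc, path, expected) {
  var value = path.split('.').reduce(function (current, part) {
    return current !== null && typeof current === 'object' ? current[part] : undefined;
  }, doc);
  return value === expected;
}

var wanted = { company: 'ABC123', address: '123 Street' };
var sameOrder = { producer: { company: 'ABC123', address: '123 Street' } };
var otherOrder = { producer: { address: '123 Street', company: 'ABC123' } };
var extraField = { producer: { company: 'ABC123', address: '123 Street', phone: '555-0100' } };

console.log(exactSubdocumentMatch(sameOrder.producer, wanted));       // true
console.log(exactSubdocumentMatch(otherOrder.producer, wanted));      // false
console.log(exactSubdocumentMatch(extraField.producer, wanted));      // false
console.log(dotNotationMatch(extraField, 'producer.company', 'ABC123')); // true
```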
    
Arrays

When the field holds an array, you can query for values in the array, and if the array holds subdocuments, you can query for specific fields within the subdocuments using dot notation:

  • Equality matches can specify an entire array, to select an array that matches exactly. In the following example, the query matches all documents where the value of the field tags is an array and holds three elements, 'fruit', 'food', and 'citrus', in this order:

    db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
    
  • Equality matches can specify a single element in the array. The query matches if the array contains at least one element with the specified value. In the following example, the query matches all documents where the value of the field tags is an array that contains, as one of its elements, the element 'fruit':

    db.inventory.find( { tags: 'fruit' } )
    

    Equality matches can also select documents by values in an array using the array index (i.e. position) of the element in the array. In the following example, the query uses dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit':

    db.inventory.find( { 'tags.0' : 'fruit' } )
    

In the following examples, consider an array that contains subdocuments:

  • If you know the array index of the subdocument, you can specify the document using the subdocument’s position.

    The following example selects all documents where the memos field contains an array whose first element (i.e. the element at index 0) is a subdocument with the field by with the value 'shipping':

    db.inventory.find( { 'memos.0.by': 'shipping' } )
    
  • If you do not know the index position of the subdocument, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the subdocument.

    The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':

    db.inventory.find( { 'memos.by': 'shipping' } )
    
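  • To match by multiple fields in the subdocuments, you can use either dot notation or the $elemMatch operator. The two forms differ: with dot notation, different subdocuments in the array may satisfy different conditions, while $elemMatch requires a single subdocument to satisfy all of the conditions.

    The following example uses dot notation to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and at least one subdocument, not necessarily the same one, that contains the field by equal to 'shipping':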

    db.inventory.find(
                       {
                         'memos.memo': 'on time',
                         'memos.by': 'shipping'
                       }
                     )
    

    The following example uses $elemMatch to query for documents where the value of the memos field is an array that has at least one subdocument that contains the field memo equal to 'on time' and the field by equal to 'shipping':

    db.inventory.find( { memos: {
                                  $elemMatch: {
                                                memo : 'on time',
                                                by: 'shipping'
                                              }
                                }
                       }
                     )
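
    The two multi-field forms above do not always select the same documents: dot notation lets different array elements satisfy different clauses, while $elemMatch requires one element to satisfy all of them. The following plain JavaScript sketch models that difference; it is not MongoDB's implementation, and the helper names and sample arrays are hypothetical.

```javascript
// Simplified model (not MongoDB code) of two ways to query an array of subdocuments.

// Dot notation: each clause may be satisfied by a different array element.
function matchesWithDotNotation(memos, conditions) {
  return Object.keys(conditions).every(function (field) {
    return memos.some(function (memo) { return memo[field] === conditions[field]; });
  });
}

// $elemMatch: one single element must satisfy every clause.
function matchesWithElemMatch(memos, conditions) {
  return memos.some(function (memo) {
    return Object.keys(conditions).every(function (field) {
      return memo[field] === conditions[field];
    });
  });
}

var conditions = { memo: 'on time', by: 'shipping' };

// One element satisfies both clauses.
var sameElement = [ { memo: 'on time', by: 'shipping' }, { memo: 'approved', by: 'billing' } ];
// The clauses are satisfied only by two different elements.
var splitElements = [ { memo: 'on time', by: 'billing' }, { memo: 'delayed', by: 'shipping' } ];

console.log(matchesWithDotNotation(sameElement, conditions));   // true
console.log(matchesWithElemMatch(sameElement, conditions));     // true
console.log(matchesWithDotNotation(splitElements, conditions)); // true
console.log(matchesWithElemMatch(splitElements, conditions));   // false
```

    When every condition must hold for the same subdocument, the $elemMatch form is the one that expresses it.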
    

Refer to the Query, Update and Projection Operators document for the complete list of query operators.

Result Projections

The projection specification limits the fields to return for all matching documents. Restricting the fields to return can minimize network transit costs and the costs of deserializing documents in the application layer.

The second argument to the find() method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is, by default, included in the result set. To exclude the _id field from the result set, you need to specify in the projection document the exclusion of the _id field (i.e. { _id: 0 }).

Note

You cannot combine inclusion and exclusion semantics in a single projection with the exception of the _id field.

Consider the following projection specifications in find() operations:

  • If you specify no projection, the find() method returns all fields of all documents that match the query.

    db.inventory.find( { type: 'food' } )
    

    This operation will return all documents in the inventory collection where the value of the type field is 'food'.

  • A projection can explicitly include several fields. In the following operation, the find() method returns all documents that match the query, but includes only the item and qty fields in each returned document. The results also include the _id field:

    db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
    
  • You can remove the _id field from the results by specifying its exclusion in the projection, as in the following example:

    db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id:0 } )
    

    This operation returns all documents that match the query, and only includes the item and qty fields in the result set.

  • To exclude a single field or group of fields you can use a projection in the following form:

    db.inventory.find( { type: 'food' }, { type:0 } )
    

    This operation returns all documents where the value of the type field is food, but does not include the type field in the output.

    With the exception of the _id field you cannot combine inclusion and exclusion statements in projection documents.

The $elemMatch and $slice projection operators provide more control when projecting only a portion of an array.
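
The inclusion and exclusion rules for projections can be summarized in a few lines of plain JavaScript. This is a minimal sketch of the rules stated in this section, not MongoDB's implementation: the project() helper is hypothetical and handles top-level fields only, without the $elemMatch or $slice operators.

```javascript
// Simplified model of find() projections on top-level fields (not MongoDB code).
function project(doc, projection) {
  var fields = Object.keys(projection).filter(function (f) { return f !== '_id'; });
  var included = fields.filter(function (f) { return Boolean(projection[f]); });
  var excluded = fields.filter(function (f) { return !projection[f]; });
  if (included.length > 0 && excluded.length > 0) {
    throw new Error('cannot mix inclusion and exclusion (except for _id)');
  }
  // _id is returned unless the projection explicitly excludes it.
  var keepId = !('_id' in projection) || Boolean(projection._id);
  var inclusionMode = included.length > 0 ||
    (fields.length === 0 && Boolean(projection._id));

  var result = {};
  Object.keys(doc).forEach(function (field) {
    var keep;
    if (field === '_id') {
      keep = keepId;
    } else if (inclusionMode) {
      keep = included.indexOf(field) !== -1;
    } else {
      keep = excluded.indexOf(field) === -1;
    }
    if (keep) { result[field] = doc[field]; }
  });
  return result;
}

var doc = { _id: 1, item: 'bread', type: 'food', qty: 20 };

console.log(project(doc, { item: 1, qty: 1 }));         // { _id: 1, item: 'bread', qty: 20 }
console.log(project(doc, { item: 1, qty: 1, _id: 0 })); // { item: 'bread', qty: 20 }
console.log(project(doc, { type: 0 }));                 // { _id: 1, item: 'bread', qty: 20 }
```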

Indexes

Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process, thereby simplifying the work associated with fulfilling queries within MongoDB. The indexes themselves are a special data structure that MongoDB maintains when inserting or modifying documents, and any given index can support and optimize specific queries and sort operations, and allow for more efficient storage utilization. For more information about indexes in MongoDB see: Indexes and Indexing Overview.

You can create indexes using the db.collection.ensureIndex() method in the mongo shell, as in the following prototype operation:

db.collection.ensureIndex( { <field1>: <order>, <field2>: <order>, ... } )
  • The field specifies the field to index. The field may be a field from a subdocument, using dot notation to specify subdocument fields.

    You can create an index on a single field or a compound index that includes multiple fields in the index.

  • The order option specifies either ascending ( 1 ) or descending ( -1 ).

    MongoDB can read the index in either direction. In most cases, you only need to specify indexing order to support sort operations in compound queries.

Covering a Query

An index covers a query (making it a covered query) when:

  • all the fields in the query are part of that index, and
  • all the fields returned in the documents that match the query are in the same index.

For these queries, MongoDB does not need to inspect documents outside of the index, which is often more efficient than inspecting entire documents.

Example

Given a collection inventory with the following index on the type and item fields:

{ type: 1, item: 1 }

This index will cover the following query on the type and item fields, which returns only the item field:

db.inventory.find( { type: "food", item:/^c/ },
                   { item: 1, _id: 0 } )

However, this index will not cover the following query, which returns the item field and the _id field:

db.inventory.find( { type: "food", item:/^c/ },
                   { item: 1 } )

See Create Indexes that Support Covered Queries for more information on the behavior and use of covered queries.
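
The two conditions for a covered query can be expressed as a small check in plain JavaScript. This is a sketch of the rule as stated above, not MongoDB's query planner: the indexCovers() helper is hypothetical, it handles inclusion projections on top-level fields only, and it treats a query without an inclusion projection as not covered.

```javascript
// Simplified check (not MongoDB code): can an index cover a query?
function indexCovers(indexSpec, query, projection) {
  var indexedFields = Object.keys(indexSpec);
  function isIndexed(field) { return indexedFields.indexOf(field) !== -1; }

  // Fields explicitly included by the projection.
  var included = Object.keys(projection).filter(function (field) {
    return field !== '_id' && Boolean(projection[field]);
  });
  // Simplification: without an inclusion projection, whole documents are needed.
  if (included.length === 0) { return false; }

  // _id is returned by default unless the projection excludes it.
  var idExcluded = ('_id' in projection) && !projection._id;
  var returned = idExcluded ? included : included.concat('_id');

  // Condition 1: every queried field is in the index.
  // Condition 2: every returned field is in the same index.
  return Object.keys(query).every(isIndexed) && returned.every(isIndexed);
}

var index = { type: 1, item: 1 };
var query = { type: 'food', item: /^c/ };

console.log(indexCovers(index, query, { item: 1, _id: 0 })); // true
console.log(indexCovers(index, query, { item: 1 }));         // false
```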

Measuring Index Use

The explain() cursor method allows you to inspect the operation of the query system, and is useful for analyzing the efficiency of queries, and for determining how the query uses the index. Call the explain() method on a cursor returned by find(), as in the following example:

db.inventory.find( { type: 'food' } ).explain()

Note

Only use explain() to test the query operation, and not the timing of query performance. Because explain() attempts multiple query plans, it does not reflect accurate query performance.

If the above operation could not use an index, the output of explain() would resemble the following:

{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 5,
  "nscannedObjects" : 4000006,
  "nscanned" : 4000006,
  "nscannedObjectsAllPlans" : 4000006,
  "nscannedAllPlans" : 4000006,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 2,
  "nChunkSkips" : 0,
  "millis" : 1591,
  "indexBounds" : { },
  "server" : "mongodb0.example.net:27017"
}

The BasicCursor value in the cursor field confirms that this query does not use an index. The explain.nscannedObjects value shows that MongoDB must scan 4,000,006 documents to return only 5 documents. To increase the efficiency of the query, create an index on the type field, as in the following example:

db.inventory.ensureIndex( { type: 1 } )

Run the explain() operation, as follows, to test the use of the index:

db.inventory.find( { type: 'food' } ).explain()

Consider the results:

{
  "cursor" : "BtreeCursor type_1",
  "isMultiKey" : false,
  "n" : 5,
  "nscannedObjects" : 5,
  "nscanned" : 5,
  "nscannedObjectsAllPlans" : 5,
  "nscannedAllPlans" : 5,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  "indexBounds" : { "type" : [
                                [ "food",
                                  "food" ]
                             ] },
  "server" : "mongodb0.example.net:27017"
}

The BtreeCursor value of the cursor field indicates that the query used an index. This query:

  • returned 5 documents, as indicated by the n field;

  • scanned 5 documents from the index, as indicated by the nscanned field;

  • then read 5 full documents from the collection, as indicated by the nscannedObjects field.

    Although the query uses an index to find the matching documents, if indexOnly is false then an index could not cover the query: MongoDB could not both match the query conditions and return the results using only this index. See Create Indexes that Support Covered Queries for more information.
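
When comparing explain() output before and after adding an index, it can help to reduce each result to a few values. The following plain JavaScript sketch does this for the fields shown above; the summarizeExplain() helper is hypothetical, and the two input objects copy only the relevant fields from the sample output in this section.

```javascript
// Summarize the explain() fields discussed above (hypothetical helper).
function summarizeExplain(explainOutput) {
  return {
    // A cursor value beginning with "BtreeCursor" indicates index use.
    usedIndex: explainOutput.cursor.indexOf('BtreeCursor') === 0,
    // indexOnly is true only when the index covers the query.
    covered: explainOutput.indexOnly,
    // Items scanned for every document returned; closer to 1 is better.
    scannedPerReturned: explainOutput.n > 0 ? explainOutput.nscanned / explainOutput.n : null
  };
}

var withoutIndex = { cursor: 'BasicCursor', n: 5, nscanned: 4000006, indexOnly: false };
var withIndex = { cursor: 'BtreeCursor type_1', n: 5, nscanned: 5, indexOnly: false };

console.log(summarizeExplain(withoutIndex).usedIndex);          // false
console.log(summarizeExplain(withoutIndex).scannedPerReturned); // 800001.2
console.log(summarizeExplain(withIndex).usedIndex);             // true
console.log(summarizeExplain(withIndex).scannedPerReturned);    // 1
```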

Query Optimization

The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. The query optimizer occasionally reevaluates query plans as the content of the collection changes to ensure optimal query plans.

To create a new query plan, the query optimizer:

  1. runs the query against several candidate indexes in parallel.

  2. records the matches in a common results buffer or buffers.

    If an index returns a result already returned by another index, the optimizer skips the duplicate match. In the case of the two buffers, both buffers are de-duped.

  3. stops the testing of candidate plans and selects an index when one of the following events occurs:

    • An unordered query plan has returned all the matching results; or
    • An ordered query plan has returned all the matching results; or
    • An ordered query plan has returned a threshold number of matching results:
      • Version 2.0: Threshold is the query batch size. The default batch size is 101.
      • Version 2.2: Threshold is 101.

The selected index becomes the index specified in the query plan; future iterations of this query or queries with the same query pattern will use this index. Query pattern refers to query select conditions that differ only in the values, as in the following two queries with the same query pattern:

db.inventory.find( { type: 'food' } )
db.inventory.find( { type: 'utensil' } )
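
One way to picture a query pattern is as the query document with its values removed. The following plain JavaScript sketch builds such a string; the queryPattern() helper and its output format are hypothetical and do not reflect how MongoDB represents patterns internally. Sorting the field names is also an assumption of this sketch.

```javascript
// Hypothetical representation of a query pattern: not MongoDB's internal format.
function queryPattern(query) {
  var pattern = {};
  Object.keys(query).sort().forEach(function (field) {
    var condition = query[field];
    // An operator document is a plain object whose keys all start with "$".
    var isOperatorDoc = condition !== null && typeof condition === 'object' &&
      !Array.isArray(condition) && !(condition instanceof RegExp) &&
      Object.keys(condition).length > 0 &&
      Object.keys(condition).every(function (key) { return key.charAt(0) === '$'; });
    // Keep the operator names, drop the values being compared.
    pattern[field] = isOperatorDoc ? Object.keys(condition).sort().join(',') : 'equality';
  });
  return JSON.stringify(pattern);
}

console.log(queryPattern({ type: 'food' }));    // {"type":"equality"}
console.log(queryPattern({ type: 'food' }) === queryPattern({ type: 'utensil' }));      // true
console.log(queryPattern({ type: 'food' }) === queryPattern({ price: { $lt: 9.95 } })); // false
```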

To manually compare the performance of a query using more than one index, you can use the hint() and explain() methods in conjunction, as in the following prototype:

db.collection.find().hint().explain()

The following operations each run the same query but will reflect the use of the different indexes:

db.inventory.find( { type: 'food' } ).hint( { type: 1 } ).explain()
db.inventory.find( { type: 'food' } ).hint( { type: 1, name: 1 }).explain()

These operations return statistics regarding the execution of the query. For more information on the output of explain(), see cursor.explain().

Note

If you run explain() without including hint(), the query optimizer reevaluates the query and runs against multiple indexes before returning the query statistics.

As collections change over time, the query optimizer deletes a query plan and reevaluates the plan after any of the following events:

  • the collection receives 1,000 write operations.
  • the reIndex command rebuilds the index.
  • you add or drop an index.
  • the mongod process restarts.

For more information, see Indexing Strategies.

Query Operations that Cannot Use Indexes Effectively

Some query operations cannot use indexes effectively or cannot use indexes at all. Consider the following situations:

  • The inequality operators $nin and $ne are not very selective, as they often match a large portion of the index.

    As a result, in most cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.

  • Queries that specify regular expressions, with inline JavaScript regular expressions or $regex operator expressions, cannot use an index, with one exception: a regular expression anchored to the beginning of a string (e.g. /^c/) can use an index.

Cursors

The find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times [2] to print up to the first 20 documents that match the query, as in the following example:

db.inventory.find( { type: 'food' } );

When you assign the cursor returned by find() to a variable:

  • you can call the cursor variable in the shell to iterate up to 20 times [2] and print the matching documents, as in the following example:

    var myCursor = db.inventory.find( { type: 'food' } );
    
    myCursor
    
  • you can use the cursor method next() to access the documents, as in the following example:

    var myCursor = db.inventory.find( { type: 'food' } );
    var myDocument = myCursor.hasNext() ? myCursor.next() : null;
    
    if (myDocument) {
        var myItem = myDocument.item;
        print(tojson(myItem));
    }
    

    As an alternative print operation, consider the printjson() helper method to replace print(tojson()):

    if (myDocument) {
        var myItem = myDocument.item;
        printjson(myItem);
    }
    
  • you can use the cursor method forEach() to iterate the cursor and access the documents, as in the following example:

    var myCursor =  db.inventory.find( { type: 'food' } );
    
    myCursor.forEach(printjson);
    

See JavaScript cursor methods and your driver documentation for more information on cursor methods.

[2] You can use the DBQuery.shellBatchSize to change the number of iterations from the default value 20. See Executing Queries for more information.
Iterator Index

In the mongo shell, you can use the toArray() method to iterate the cursor and return the documents in an array, as in the following:

var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];

The toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts the cursor.

Additionally, some drivers provide access to the documents by using an index on the cursor (i.e. cursor[index]). This is a shortcut for first calling the toArray() method and then using an index on the resulting array.

Consider the following example:

var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];

The myCursor[3] expression is equivalent to the following example:

myCursor.toArray()[3];
Cursor Behaviors

Consider the following behaviors related to cursors:

  • By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout wire protocol flag in your query; however, you should then either close the cursor manually or exhaust the cursor. In the mongo shell, you can set the noTimeout flag:

    var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
    

    See your driver documentation for information on setting the noTimeout flag. See Cursor Flags for a complete list of available cursor flags.

  • Because the cursor is not isolated during its lifetime, intervening write operations may result in a cursor that returns a single document [3] more than once. To handle this situation, see the information on snapshot mode.

  • The MongoDB server returns the query results in batches:

    • For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes. To override the default size of the batch, see batchSize() and limit().

    • For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort and will return all documents in the first batch.

    • Batch size will not exceed the maximum BSON document size.

    • As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() will perform a getmore operation to retrieve the next batch.

      To see how many documents remain in the batch as you iterate the cursor, you can use the objsLeftInBatch() method, as in the following example:

      var myCursor = db.inventory.find();
      
      var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
      
      myCursor.objsLeftInBatch();
      
  • You can use the command cursorInfo to retrieve the following information on cursors:

    • total number of open cursors
    • size of the client cursors in current use
    • number of timed out cursors since the last server restart

    Consider the following example:

    db.runCommand( { cursorInfo: 1 } )
    

    The result from the command returns the following document:

    {
      "totalOpen" : <number>,
      "clientCursors_size" : <number>,
      "timedOut" : <number>,
      "ok" : 1
    }
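
The batching behavior described in the list above can be illustrated with a small in-memory model. The following plain JavaScript sketch is not the shell or driver implementation: SketchCursor is a hypothetical class, and it uses a fixed number of documents per batch, whereas the server limits subsequent batches by size in bytes.

```javascript
// Simplified model of cursor batching (not the shell or driver implementation).
function SketchCursor(allResults, batchSize) {
  this.remaining = allResults.slice(); // results still held by the "server"
  this.batch = [];                     // results already sent to the client
  this.batchSize = batchSize;          // assumption: fixed document count per batch
  this.getMoreCount = 0;
  this.fetchBatch();                   // the initial query returns the first batch
}

SketchCursor.prototype.fetchBatch = function () {
  this.batch = this.remaining.splice(0, this.batchSize);
};

SketchCursor.prototype.hasNext = function () {
  if (this.batch.length === 0 && this.remaining.length > 0) {
    this.getMoreCount += 1; // stands in for a getmore operation
    this.fetchBatch();
  }
  return this.batch.length > 0;
};

SketchCursor.prototype.next = function () {
  if (!this.hasNext()) { throw new Error('cursor exhausted'); }
  return this.batch.shift();
};

SketchCursor.prototype.objsLeftInBatch = function () {
  return this.batch.length;
};

var results = [];
for (var i = 0; i < 250; i++) { results.push({ _id: i }); }

var cursor = new SketchCursor(results, 101);
cursor.next();
var leftAfterFirst = cursor.objsLeftInBatch();
console.log(leftAfterFirst); // 100

var count = 1;
while (cursor.hasNext()) { cursor.next(); count += 1; }
console.log(count);               // 250
console.log(cursor.getMoreCount); // 2
```

With 250 results and 101 documents per batch, the model needs the initial batch plus two further fetches (101, 101, and 48 documents).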
    
[3] A single document relative to the value of the _id field. A cursor cannot return the same document more than once if the document has not changed.
Cursor Flags

The mongo shell provides the following cursor flags:

  • DBQuery.Option.tailable
  • DBQuery.Option.slaveOk
  • DBQuery.Option.oplogReplay
  • DBQuery.Option.noTimeout
  • DBQuery.Option.awaitData
  • DBQuery.Option.exhaust
  • DBQuery.Option.partial
Aggregation

Changed in version 2.2.

MongoDB can perform some basic data aggregation operations on results before returning data to the application. These operations are not queries; they use database commands rather than queries, and they do not return a cursor. However, they still require MongoDB to read data.

Running aggregation operations on the database side can be more efficient than running them in the application layer and can reduce the amount of data MongoDB needs to send to the application. These aggregation operations include basic grouping, counting, and even processing data using a map reduce framework. Additionally, in 2.2 MongoDB provides a complete aggregation framework for richer aggregation operations.

The aggregation framework provides users with a “pipeline” like framework: documents enter from a collection and then pass through a series of steps by a sequence of pipeline operators that manipulate and transform the documents until they’re output at the end. The aggregation framework is accessible via the aggregate command or the db.collection.aggregate() helper in the mongo shell.

For more information on the aggregation framework see Aggregation.

Additionally, MongoDB provides a number of simple data aggregation operations for more basic data aggregation operations:

Architecture
Read Operations from Sharded Clusters

Sharded clusters allow you to partition a data set among a cluster of mongod in a way that is nearly transparent to the application. See the Sharding section of this manual for additional information about these deployments.

For a sharded cluster, you issue all operations to one of the mongos instances associated with the cluster. mongos instances route operations to the mongod in the cluster and behave like mongod instances to the application. Read operations to a sharded collection in a sharded cluster are largely the same as operations to a replica set or standalone instances. See the section on Read Operations in Sharded Clusters for more information.

In sharded deployments, the mongos instance routes the queries from the clients to the mongod instances that hold the data, using the cluster metadata stored in the config database.

For sharded collections, if queries do not include the shard key, the mongos must direct the query to all shards in the cluster. These scatter-gather queries can be inefficient, particularly on larger clusters, and are infeasible for routine operations.

For more information on read operations in sharded clusters, consider the following resources:

Read Operations from Replica Sets

Replica sets use read preferences to determine where and how to route read operations to members of the replica set. By default, MongoDB always reads data from a replica set’s primary. You can modify that behavior by changing the read preference mode.

You can configure the read preference mode on a per-connection or per-operation basis to allow reads from secondaries to:

  • reduce latency in multi-data-center deployments,
  • improve read throughput by distributing high read-volumes (relative to write volume),
  • support backup operations, and/or
  • allow reads during failover situations.

Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary, and the state of secondaries will trail the primary by some amount of time. Often, applications don’t rely on this kind of strict consistency, but application developers should always consider the needs of their application before setting read preference.

For more information on read preference or on the read preference modes, see Read Preference and Read Preference Modes.

Write Operations

All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.

This document introduces the write operators available in MongoDB and presents strategies to increase the efficiency of writes in applications.

Write Operators

For information on write operators and how to write data to a MongoDB database, see the following pages:

For information on specific methods used to perform write operations in the mongo shell, see the following:

For information on how to perform write operations from within an application, see the MongoDB Drivers and Client Libraries documentation or the documentation for your client library.

Write Concern

Write concern is a quality of every write operation issued to a MongoDB deployment, and describes the amount of concern the application has for the outcome of the write operation. With weak or disabled write concern, the application can send a write operation to MongoDB and then continue without waiting for a response from the database. With stronger write concerns, write operations wait until MongoDB acknowledges or confirms a successful write operation. MongoDB provides different levels of write concern to better address the specific needs of applications.

Note

The driver write concern change created a new connection class in all of the MongoDB drivers, called MongoClient, with a different default write concern. See the release notes for this change, and the release notes for the driver you’re using, for more information about your driver’s release.

Bulk Inserts

In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.

The insert() method, when passed an array of documents, performs a bulk insert and inserts each document atomically. Drivers provide their own interface for this kind of operation.

New in version 2.2: insert() in the mongo shell supports bulk inserts.

Bulk insert can significantly increase performance by amortizing write concern costs. In the drivers, you can configure write concern for batches rather than on a per-document level.

Drivers also have a ContinueOnError option in their insert operation, so that the bulk operation will continue to insert remaining documents in a batch even if an insert fails.

Note

New in version 2.0: Support for ContinueOnError depends on version 2.0 of the core mongod and mongos components.

If the bulk insert process generates more than one error in a batch job, the client will only receive the most recent error. All bulk operations to a sharded collection run with ContinueOnError, which applications cannot disable. See the Strategies for Bulk Inserts in Sharded Clusters section for more information on considerations for bulk inserts in sharded clusters.
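The effect of ContinueOnError on a batch can be pictured with a small in-memory model. This is only a sketch: the bulkInsert function is hypothetical, a Map stands in for the collection, and a duplicate _id is the only failure it models. It is not driver code.

```javascript
// Toy in-memory model of a bulk insert; not driver code.
// Assumption: the only failure mode modeled is a duplicate _id.
function bulkInsert(collection, docs, continueOnError) {
  let lastError = null; // only the most recent error is kept
  let inserted = 0;
  for (const doc of docs) {
    if (collection.has(doc._id)) {
      lastError = "duplicate key: " + doc._id;
      if (!continueOnError) break; // stop at the first failure
      continue;                    // skip the failed document, keep going
    }
    collection.set(doc._id, doc);
    inserted += 1;
  }
  return { inserted, lastError };
}

const docs = [{ _id: 1 }, { _id: 1 }, { _id: 2 }, { _id: 2 }, { _id: 3 }];
const stopOnError = bulkInsert(new Map(), docs, false);
const keepGoing = bulkInsert(new Map(), docs, true);
console.log(stopOnError); // { inserted: 1, lastError: 'duplicate key: 1' }
console.log(keepGoing);   // { inserted: 3, lastError: 'duplicate key: 2' }
```

In the second call two inserts fail, but the caller sees only the later error, which matches the "most recent error" behavior described above.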

For details on performing bulk inserts in your application, see your driver documentation. Also consider the following resources: Sharded Clusters, Strategies for Bulk Inserts in Sharded Clusters, and Import and Export MongoDB Data.

Indexing

After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. [1]

In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however, when optimizing write performance, be careful when creating new indexes, evaluate the existing indexes on the collection, and ensure that your queries actually use them.

For more information on indexes in MongoDB, see Indexes and Indexing Strategies.

[1] For inserts and updates to un-indexed fields, the overhead for sparse indexes is less than for non-sparse indexes. Also, for non-sparse indexes, updates that don’t change the record size have less indexing overhead.
Isolation

When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.

No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator.

To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits.

Updates

Each document in a MongoDB collection has allocated record space which includes the entire document and a small amount of padding. This padding makes it possible for update operations to increase the size of a document slightly without causing the document to outgrow the allocated record size.

Documents in MongoDB can grow up to the full maximum BSON document size. However, when documents outgrow their allocated record size, MongoDB must allocate a new record and move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use data models that minimize the need for document growth when possible.

For complete examples of update operations, see Update.

Padding Factor

If an update operation does not cause the document to increase in size, MongoDB can apply the update in-place. Some updates change the size of the document. For example, using the $push operator to append a sub-document to an array can cause the top level document to grow beyond its allocated space.

When documents grow, MongoDB relocates the document on disk to a location with enough contiguous space to hold the document. These relocations take longer than in-place updates, particularly if the collection has indexes, because MongoDB must update all index entries. If the collection has many indexes, the move will impact write throughput.

To minimize document movements, MongoDB employs padding. MongoDB adaptively learns if documents in a collection tend to grow, and if they do, adds a paddingFactor so that the documents have room to grow on subsequent writes. The paddingFactor indicates the padding for new inserts and moves.

New in version 2.2: You can use the collMod command with the usePowerOf2Sizes flag so that MongoDB allocates document space in sizes that are powers of 2. This helps ensure that MongoDB can efficiently reuse the space freed as a result of deletions or document relocations. As with all padding, using document space allocations with power of 2 sizes minimizes, but does not eliminate, document movements.
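As a rough illustration of power of 2 allocation, the following sketch rounds a requested size up to the next power of 2. The nextPowerOf2 function is hypothetical and deliberately simplified: it ignores any minimum or maximum record size that the server may apply.

```javascript
// Simplified illustration: round a requested size up to the next power of 2.
// Assumption: ignores any minimum or maximum record size the server applies.
// In the mongo shell, the flag itself would be set with a command such as:
//   db.runCommand( { collMod: "myCollection", usePowerOf2Sizes: true } )
function nextPowerOf2(bytes) {
  let size = 1;
  while (size < bytes) size *= 2;
  return size;
}

console.log(nextPowerOf2(700));  // 1024
console.log(nextPowerOf2(1024)); // 1024
console.log(nextPowerOf2(1025)); // 2048
```

Because many different document sizes round to the same allocation, a slot freed by one document is more likely to fit another.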

To check the current paddingFactor on a collection, you can run the db.collection.stats() operation in the mongo shell, as in the following example:

db.myCollection.stats()

Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor, for example:

padding size = (paddingFactor - 1) * <document size>.

For example, a paddingFactor of 1.0 specifies no padding whereas a paddingFactor of 1.5 specifies a padding size of 0.5 or 50 percent (50%) of the document size.
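The formula above can be written as a small function. The function name and the document sizes below are illustrative only.

```javascript
// padding size = (paddingFactor - 1) * <document size>
function paddingSize(paddingFactor, documentSize) {
  return (paddingFactor - 1) * documentSize;
}

console.log(paddingSize(1.0, 2000)); // 0    (no padding)
console.log(paddingSize(1.5, 2000)); // 1000 (50% of the document size)
console.log(paddingSize(1.5, 400));  // 200  (same factor, smaller document)
```

The last two calls use the same paddingFactor but yield different padding, because the padding is relative to the size of each document.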

Because the paddingFactor is relative to the size of each document, you cannot calculate the exact amount of padding for a collection based on the average document size and padding factor.

If an update operation causes the document to decrease in size, for instance if you perform an $unset or a $pop update, the document remains in place and effectively has more padding. If the document remains this size, the space is not reclaimed until you perform a compact or a repairDatabase operation.

Note

The following operations remove padding: compact, repairDatabase, and replica set initial sync.

However, with the compact command, you can run the command with a paddingFactor or a paddingBytes parameter.

Padding is also removed if you use mongoexport from a collection. If you use mongoimport into a new collection, mongoimport will not add padding. If you use mongoimport with an existing collection with padding, mongoimport will not affect the existing padding.

When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection’s padding factor grows. Padding does not affect in-place updates, and after compact, repairDatabase, and replica set initial sync the collection will require less storage.

Architecture
Replica Sets

In replica sets, all write operations go to the set’s primary, which applies the write operation then records the operations on the primary’s operation log or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set are continuously replicating the oplog and applying the operations to themselves in an asynchronous process.
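The flow described above can be sketched with a toy model. This is not how MongoDB is implemented: the member objects, the upsert-style operations, and the function names are all assumptions made for illustration.

```javascript
// Toy model, not MongoDB internals: operations are { _id, value } upserts.
function makeMember() {
  return { data: new Map(), applied: 0 };
}

function apply(member, op) {
  member.data.set(op._id, op.value);
}

const oplog = [];
const primary = makeMember();
const secondary = makeMember();

function writeToPrimary(op) {
  apply(primary, op); // the primary applies the write...
  oplog.push(op);     // ...then records it in the oplog
}

function replicate(member) {
  // A secondary applies, in order, the entries it has not applied yet.
  while (member.applied < oplog.length) {
    apply(member, oplog[member.applied]);
    member.applied += 1;
  }
}

writeToPrimary({ _id: 1, value: "a" });
writeToPrimary({ _id: 1, value: "b" });
console.log(secondary.data.get(1)); // undefined (not replicated yet)
replicate(secondary);
console.log(secondary.data.get(1)); // b
```

Between the writes and the call to replicate, the secondary trails the primary, which is the asynchronous gap the text refers to.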

Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate: this can cause the secondary’s state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly for failover, in the form of rollbacks, and for general read consistency.

To help avoid this issue, you can customize the write concern to return confirmation of the write operation to another member [2] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
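One way to apply this pattern from the mongo shell is sketched below. The function name, the collection name, and the wtimeout value are hypothetical; db is assumed to be the shell's database object, and the sketch assumes a replica set with at least two data-bearing members.

```javascript
// Sketch for the mongo shell. `db` is the shell's database object;
// the function name and the timeout value are hypothetical.
function loadWithPeriodicAck(db, collectionName, docs, every) {
  var coll = db.getCollection(collectionName);
  var acks = 0;
  docs.forEach(function (doc, i) {
    coll.insert(doc);
    if ((i + 1) % every === 0) {
      // Wait until a second member (w: 2) has the write, or time out.
      db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 });
      acks += 1;
    }
  });
  return acks; // number of times the load paused for acknowledgment
}

// Example call from the shell (names are illustrative):
//   loadWithPeriodicAck(db, "events", docs, 1000)
```

A larger value for every pauses less often and favors throughput; a smaller value keeps secondaries closer to the primary.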

For more information on replica sets and write operations, see Replica Acknowledged, Oplog, Oplog Internals, and Change the Size of the Oplog.

[2] Calling getLastError intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary.
Sharded Clusters

In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
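Range-based routing can be pictured as a lookup in a table of chunk ranges. The sketch below is a toy model: the chunk boundaries, the shard names, and the findShard function are invented for illustration, and each chunk is assumed to cover its lower bound but not its upper bound.

```javascript
// Toy routing table: each chunk covers [min, max) of the shard key.
// Shard names and boundaries are made up for illustration.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" },
];

function findShard(shardKeyValue) {
  const chunk = chunks.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  return chunk.shard;
}

console.log(findShard(42));  // shard0000
console.log(findShard(100)); // shard0001
console.log(findShard(250)); // shard0000
```

With this layout, every key value of 200 or more lands on the same chunk, which is one way a shard key choice can concentrate writes.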

For more information, see Sharded Cluster Administration and Bulk Inserts.

Write Concern Reference

Overview

Write concern is a quality of every write operation issued to a MongoDB deployment, and describes the amount of concern the application has for the outcome of the write operation. With weak or disabled write concern, the application can send a write operation to MongoDB and then continue without waiting for a response from the database. With stronger write concerns, write operations wait until MongoDB acknowledges or confirms a successful write operation. MongoDB provides different levels of write concern to better address the specific needs of applications.

See also

Write Concern for an introduction to write concern in MongoDB.

Available Write Concern

To provide write concern, drivers issue the getLastError command after a write operation and receive a document with information about the last operation. This document’s err field contains either:

  • null, which indicates the write operations have completed successfully, or
  • a description of the last error encountered.

The definition of a “successful write” depends on the arguments specified to getLastError, or in replica sets, the configuration of getLastErrorDefaults. When deciding the level of write concern for your application, see the introduction to Write Concern.

The getLastError command has the following options to configure write concern requirements:

  • j or “journal” option

    This option confirms that the mongod instance has written the data to the on-disk journal and ensures data is not lost if the mongod instance shuts down unexpectedly. Set to true to enable, as shown in the following example:

    db.runCommand( { getLastError: 1, j: true } )

    If you set journal to true, and the mongod does not have journaling enabled, as with nojournal, then getLastError will provide basic receipt acknowledgment, and will include a jnote field in its return document.

  • w option

    This option provides the ability to disable write concern entirely and specifies the write concern for replica sets. See Write Concern Considerations for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod instance or on the primary in a replica set.

    The w option takes the following values:

    • -1:

      Disables all acknowledgment of write operations, and suppresses all errors, including network and socket errors.

    • 0:

      Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.

      Note

      If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that mongod acknowledge the write operation.

    • 1:

      Provides acknowledgment of write operations on a standalone mongod or the primary in a replica set.

    • A number greater than 1:

      Guarantees that write operations have propagated successfully to the specified number of replica set members including the primary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.

    • majority:

      Confirms that write operations have propagated to the majority of the configured replica set: a majority of the set’s configured members must acknowledge the write operation before it succeeds. This ensures that the write operation will never be subject to a rollback in the course of normal operation, and furthermore allows you to avoid hard coding assumptions about the size of your replica set into your application.

    • A tag set:

      By specifying a tag set you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
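As a worked example of the majority value, the following sketch computes how many acknowledgments a majority requires. It assumes a simple count of configured members; the majority function is illustrative and not part of any MongoDB API.

```javascript
// Assumption: a majority is more than half of the configured members.
function majority(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

console.log(majority(3)); // 2
console.log(majority(4)); // 3
console.log(majority(5)); // 3
```

Using majority instead of a fixed number means the application does not need to change when the set grows from three members to five.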

getLastError also supports a wtimeout setting, which allows clients to specify a timeout for the write concern. If you don’t specify wtimeout and the mongod cannot fulfill the write concern, getLastError will block, potentially forever.
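The options described in this section combine into a single command document. The helper below is a hypothetical convenience, not part of the shell or any driver; it only assembles the document that would be passed to db.runCommand().

```javascript
// Hypothetical helper: assemble a getLastError command document
// from the j, w, and wtimeout options described in this section.
function buildGetLastError(options) {
  const cmd = { getLastError: 1 };
  if (options.j === true) cmd.j = true;
  if (options.w !== undefined) cmd.w = options.w;
  if (options.wtimeout !== undefined) cmd.wtimeout = options.wtimeout; // milliseconds
  return cmd;
}

console.log(buildGetLastError({ w: "majority", wtimeout: 5000 }));
// { getLastError: 1, w: 'majority', wtimeout: 5000 }
console.log(buildGetLastError({ j: true }));
// { getLastError: 1, j: true }
```

Setting wtimeout alongside any w value greater than 1 keeps the call from blocking indefinitely when too few members are available.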

For more information on write concern and replica sets, see Write Concern for Replica Sets.

In sharded clusters, mongos instances will pass write concern on to the shard mongod instances.

Fundamental Concepts for Document Databases

BSON Documents

MongoDB is a document-based database system, and as a result, all records, or data, in MongoDB are documents. Documents are the default representation of most user accessible data structures in the database. Documents provide structure for data in the following MongoDB contexts:

Structure

MongoDB documents are BSON objects with support for the full range of BSON types; however, BSON documents are conceptually similar to JSON objects, and have the following structure:

{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}

Having support for the full range of BSON types, MongoDB documents may contain field and value pairs where the value can be another document, an array, an array of documents as well as the basic types such as Double, String, and Date. See also BSON Type Considerations.

Consider the following document that contains values of varying types:

var mydoc = {
               _id: ObjectId("5099803df3f4948bd2f98391"),
               name: { first: "Alan", last: "Turing" },
               birth: new Date('Jun 23, 1912'),
               death: new Date('Jun 07, 1954'),
               contribs: [ "Turing machine", "Turing test", "Turingery" ],
               views : NumberLong(1250000)
            }

The document contains the following fields:

  • _id that holds an ObjectId.
  • name that holds a subdocument that contains the fields first and last.
  • birth and death, which both have Date types.
  • contribs that holds an array of strings.
  • views that holds a value of NumberLong type.

All field names are strings in BSON documents. Be aware that there are some restrictions on field names for BSON documents: field names cannot contain null characters or dots (.), and cannot start with a dollar sign ($).

Note

BSON documents may have more than one field with the same name; however, most MongoDB interfaces represent MongoDB documents with a structure (e.g. a hash table) that does not support duplicate field names. If you need to manipulate documents that have more than one field with the same name, see your driver’s documentation for more information.

Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate keys to an existing user document.

Type Operators

To determine the type of fields, the mongo shell provides the following operators:

  • instanceof returns a boolean to test if a value has a specific type.
  • typeof returns the type of a field.

Example

Consider the following operations using instanceof and typeof:

  • The following operation tests whether the _id field is of type ObjectId:

    mydoc._id instanceof ObjectId
    

    The operation returns true.

  • The following operation returns the type of the _id field:

    typeof mydoc._id
    

    In this case typeof will return the more generic object type rather than ObjectId type.

Dot Notation

MongoDB uses the dot notation to access the elements of an array and to access the fields of a subdocument.

To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position:

'<array>.<index>'

To access a field of a subdocument with dot-notation, you concatenate the subdocument name with the dot (.) and the field name:

'<subdocument>.<field>'
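To make the two forms concrete, the sketch below resolves a dot-notation path against a plain JavaScript object. The getByPath function is hypothetical and only walks the path one key at a time; it does not reproduce how MongoDB matches queries against arrays.

```javascript
// Hypothetical helper: follow a dot-notation path through a document.
function getByPath(doc, path) {
  return path.split(".").reduce(function (value, key) {
    return value === undefined || value === null ? undefined : value[key];
  }, doc);
}

const mydoc = {
  name: { first: "Alan", last: "Turing" },
  contribs: ["Turing machine", "Turing test", "Turingery"],
};

console.log(getByPath(mydoc, "name.first"));  // Alan
console.log(getByPath(mydoc, "contribs.1"));  // Turing test
console.log(getByPath(mydoc, "name.middle")); // undefined
```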

See also

  • Subdocuments for dot notation examples with subdocuments.
  • Arrays for dot notation examples with arrays.
Document Types in MongoDB
Record Documents

Most documents in MongoDB collections store data from users’ applications.

These documents have the following attributes:

  • The maximum BSON document size is 16 megabytes.

    The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.

  • Documents have the following restrictions on field names:

    • The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
    • The field names cannot start with the $ character.
    • The field names cannot contain the . character.
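An application could check field names against these rules before sending a document. The sketch below is a hypothetical client-side check, not something the server or drivers expose; it also includes the rule stated earlier that field names cannot contain null characters.

```javascript
// Hypothetical client-side check of the field name restrictions.
function fieldNameProblems(name) {
  const problems = [];
  if (name.indexOf("\0") !== -1) problems.push("contains a null character");
  if (name.charAt(0) === "$") problems.push("starts with $");
  if (name.indexOf(".") !== -1) problems.push("contains .");
  return problems;
}

console.log(fieldNameProblems("views"));      // []
console.log(fieldNameProblems("$set"));       // [ 'starts with $' ]
console.log(fieldNameProblems("name.first")); // [ 'contains .' ]
```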

Note

Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.

The following document specifies a record in a collection:

{
  _id: 1,
  name: { first: 'John', last: 'Back