Replication Introduction

Replication is the process of synchronizing data across multiple servers.

Purpose of Replication

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

In some cases, you can use replication to increase read capacity, because clients can send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.

Replication in MongoDB

A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.

The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency for all reads from the primary. To support replication, the primary logs all changes to its data set in its oplog. See primary for more information.

Diagram of default routing of reads and writes to the primary.

The secondaries replicate the primary’s oplog and apply the operations to their own data sets, so that the secondaries’ data sets reflect the primary’s data set. If the primary is unavailable, the replica set will elect a secondary to be primary. By default, clients read from the primary; however, clients can specify a read preference to send read operations to secondaries. Reads from secondaries may return data that does not reflect the current state of the primary. See secondaries for more information.
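As a rough illustration of specifying a read preference, the sketch below builds a connection string that asks a driver to prefer secondaries for reads. The host names and the set name rs0 are hypothetical placeholders; replicaSet and readPreference are standard connection string options, but the code only assembles the string and does not connect to anything.

```python
from urllib.parse import urlencode

# Hypothetical hosts and replica set name, for illustration only.
hosts = [
    "db1.example.net:27017",
    "db2.example.net:27017",
    "db3.example.net:27017",
]
options = {
    "replicaSet": "rs0",
    # Read from a secondary when one is available, otherwise from the primary.
    "readPreference": "secondaryPreferred",
}

# Build the connection string a driver would accept.
uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

Leaving out readPreference keeps the default behavior described above, in which reads go to the primary.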

Diagram of a three-member replica set that consists of a primary and two secondaries.

You may add an extra mongod instance to a replica set as an arbiter. Arbiters do not maintain a data set. Arbiters only exist to vote in elections. If your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary. Arbiters do not require dedicated hardware. See arbiter for more information.
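The arithmetic behind the advice on even-sized sets can be sketched as follows. The function names are made up for this example, and the sketch assumes that every member counted holds exactly one vote.

```python
def votes_needed(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can be lost while a majority remains."""
    return voting_members - votes_needed(voting_members)


# Compare set sizes: members, votes needed, failures tolerated.
for n in (2, 3, 4, 5):
    print(n, votes_needed(n), tolerated_failures(n))
```

Under these assumptions a two-member set cannot lose any member and still elect a primary, while adding an arbiter as a third voter lets the set tolerate one failure. Growing from three to four voters raises the majority from two to three without tolerating any additional failures.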

Diagram of a replica set that consists of a primary, a secondary, and an arbiter.

An arbiter will always be an arbiter. A primary may step down and become a secondary. A secondary may become the primary during an election.

Asynchronous Replication

Secondaries apply operations from the primary asynchronously. Because secondaries apply operations after the primary does, a replica set can continue to function without some of its members. As a result, however, secondaries may not return the most current data to clients.
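The lag described above can be pictured with a toy in-memory model. This is only a sketch: the Primary and Secondary classes, the (key, value) shape of an oplog entry, and the sync_from method are inventions for this example and do not reflect how mongod stores or ships its oplog.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (key, value) write operations

    def write(self, key, value):
        # Apply the write locally and record it for the secondaries.
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already applied

    def sync_from(self, primary):
        # Replay only the entries this member has not applied yet.
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


primary = Primary()
secondary = Secondary()

primary.write("status", "draft")
secondary.sync_from(primary)
primary.write("status", "published")  # not yet replicated

stale = secondary.data["status"]      # still the older value
secondary.sync_from(primary)
fresh = secondary.data["status"]      # now matches the primary
print(stale, fresh)
```

Between the second write and the second sync, a read sent to the secondary sees the older value, which is the trade-off clients accept when they read from secondaries.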

See Replica Set Oplog and Replica Set Data Synchronization for more information. See Read Preference for more on read operations and secondaries.

Automatic Failover

When a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary. The first secondary that receives a majority of the votes becomes primary.
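A heavily simplified sketch of this failover rule follows. It assumes that every member holds one vote, that all reachable voters back the same candidate, and that the first reachable secondary stands for election. The Member class and elect_new_primary function are hypothetical, and real elections take more factors into account.

```python
from dataclasses import dataclass

ELECTION_TIMEOUT_SECONDS = 10  # the threshold quoted in the text above


@dataclass
class Member:
    name: str
    role: str  # "primary", "secondary", or "arbiter"
    seconds_since_heartbeat: float  # time since the other members heard from it


def elect_new_primary(members):
    """Return the name of the new primary, or None if no election is
    needed or no candidate can gather a majority of votes."""
    primary = next(m for m in members if m.role == "primary")
    if primary.seconds_since_heartbeat <= ELECTION_TIMEOUT_SECONDS:
        return None  # the primary is still considered reachable

    # Reachable members can vote; arbiters vote but cannot become primary.
    voters = [
        m for m in members
        if m is not primary
        and m.seconds_since_heartbeat <= ELECTION_TIMEOUT_SECONDS
    ]
    candidates = [m for m in voters if m.role == "secondary"]
    majority = len(members) // 2 + 1

    if not candidates or len(voters) < majority:
        return None  # no majority is reachable, so no primary is elected
    return candidates[0].name  # toy rule: all voters back the first candidate


members = [
    Member("A", "primary", 15.0),
    Member("B", "secondary", 1.0),
    Member("C", "secondary", 2.0),
]
print(elect_new_primary(members))
```

In this three-member example the two remaining secondaries form a majority of the three votes, so one of them takes over. If a second member were also unreachable, the function would return None, reflecting that the remaining member cannot gather a majority.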

Diagram of an election of a new primary. In a three-member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary.

See Replica Set Elections and Rollbacks During Replica Set Failover for more information.

Additional Features

Replica sets provide a number of options to support application needs. For example, you may deploy a replica set with members in multiple data centers, or control the outcome of elections by adjusting the priority of some members. Replica sets also support dedicated members for reporting, disaster recovery, or backup functions.
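To make the priority option more concrete, the sketch below writes a replica set configuration as a plain Python dictionary. The set name and host names are hypothetical, and the two helper functions are inventions for this example. The priority and hidden member fields follow the replica set configuration document, but nothing here is sent to a server.

```python
# Hypothetical configuration for a three-member set.
replica_set_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017", "priority": 2},
        {"_id": 1, "host": "db2.example.net:27017", "priority": 1},
        # Priority 0 means this member can never become primary; hidden
        # keeps it out of normal client reads, e.g. for reporting.
        {"_id": 2, "host": "reports.example.net:27017",
         "priority": 0, "hidden": True},
    ],
}


def electable_hosts(config):
    """Hosts that are eligible to become primary (priority above 0)."""
    return [m["host"] for m in config["members"] if m.get("priority", 1) > 0]


def hidden_members_are_valid(config):
    """Hidden members are expected to have priority 0."""
    return all(
        m.get("priority", 1) == 0
        for m in config["members"]
        if m.get("hidden")
    )


print(electable_hosts(replica_set_config))
print(hidden_members_are_valid(replica_set_config))
```

Giving one member a higher priority than the others makes it the preferred primary, while a priority of 0 reserves a member for roles such as reporting or backup.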

See Priority 0 Replica Set Members, Hidden Replica Set Members and Delayed Replica Set Members for more information.