OPTIONS

Troubleshoot Replica Sets

This section describes common strategies for troubleshooting replica set deployments.

Check Replica Set Status

To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see replSetGetStatus.

Note

The rs.status() method is a wrapper that runs the replSetGetStatus database command.

Check the Replication Lag

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

To check the current length of replication lag:

  • In a mongo shell connected to the primary, call the rs.printSlaveReplicationInfo() method.

    Returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:

    source: m1.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary
    source: m2.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary
    

    A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the slaveDelay value.

    Note

    The rs.status() method is a wrapper around the replSetGetStatus database command.

  • Monitor the rate of replication by watching the oplog time in the “replica” graph in the MongoDB Management Service. For more information see the documentation for MMS.

Possible causes of replication lag include:

  • Network Latency

    Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.

    Use tools including ping to test latency between set members and traceroute to expose the routing of packets network endpoints.

  • Disk Throughput

    If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system.)

    Use system-level tools to assess disk status, including iostat or vmstat.

  • Concurrency

    In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in replica set write concern. This prevents write operations from returning if replication cannot keep up with the write load.

    Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

  • Appropriate Write Concern

    If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.

    To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or an another interval to provide an opportunity for secondaries to catch up with the primary.

    For more information see:

Test Connections Between all Members

All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations prevent normal and required connectivity, which can block replication.

Consider the following example of a bidirectional test of networking:

Example

Given a replica set with three members running on three separate hosts:

  • m1.example.net
  • m2.example.net
  • m3.example.net
  1. Test the connection from m1.example.net to the other hosts with the following operation set m1.example.net:

    mongo --host m2.example.net --port 27017
    
    mongo --host m3.example.net --port 27017
    
  2. Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:

    mongo --host m1.example.net --port 27017
    
    mongo --host m3.example.net --port 27017
    

    You have now tested the connection between m2.example.net and m1.example.net in both directions.

  3. Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:

    mongo --host m1.example.net --port 27017
    
    mongo --host m2.example.net --port 27017
    

If any connection, in any direction fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.

Socket Exceptions when Rebooting More than One Secondary

When you reboot members of a replica set, ensure that the set is able to elect a primary during the maintenance. This means ensuring that a majority of the set’s ‘votes are available.

When a set’s active members can no longer form a majority, the set’s primary steps down and becomes a secondary. The former primary closes all open connections to client applications. Clients attempting to write to the former primary receive socket exceptions and Connection reset errors until the set can elect a primary.

Example

Given a three-member replica set where every member has one vote, the set can elect a primary only as long as two members can connect to each other. If two you reboot the two secondaries once, the primary steps down and becomes a secondary. Until the at least one secondary becomes available, the set has no primary and cannot elect a new primary.

For more information on votes, see Replica Set Elections. For related information on connection errors, see Does TCP keepalive time affect sharded clusters and replica sets?.

Check the Size of the Oplog

A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.

To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the rs.printReplicationInfo() method.

The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:

configured oplog size:   10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time:  Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time:   Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now:                     Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)

The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold minimum 24 hours of operations; however, many users prefer to have 72 hours or even a week’s work of operations.

For more information on how oplog size affects operations, see:

Note

You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.

To change oplog size, see the Change the Size of the Oplog tutorial.

Oplog Entry Timestamp Error

Consider the following error in mongod output and logs:

replSet error fatal couldn't query the local local.oplog.rs collection.  Terminating mongod after 30 seconds.
<timestamp> [rsStart] bad replSet oplog entry?

Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp.

Check the type of the ts value using the following two queries against the oplog collection:

db = db.getSiblingDB("local")
db.oplog.rs.find().sort({$natural:-1}).limit(1)
db.oplog.rs.find({ts:{$type:17}}).sort({$natural:-1}).limit(1)

The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type operator allows you to select BSON type 17, is the Timestamp data type.

If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.

Example

If the first query returns this as the last oplog entry:

{ "ts" : {t: 1347982456000, i: 1},
  "h" : NumberLong("8191276672478122996"),
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 4 } }

And the second query returns this as the last entry where ts has the Timestamp type:

{ "ts" : Timestamp(1347982454000, 1),
  "h" : NumberLong("6188469075153256465"),
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 3 } }

Then the value for the ts field in the last oplog entry is of the wrong data type.

To set the proper type for this value and resolve this issue, use an update operation that resembles the following:

db.oplog.rs.update( { ts: { t:1347982456000, i:1 } },
                    { $set: { ts: new Timestamp(1347982456000, 1)}})

Modify the timestamp values as needed based on your oplog entry. This operation may take some period to complete because the update must scan and pull the entire oplog into memory.

Duplicate Key Error on local.slaves

The duplicate key on local.slaves error, occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following.

exception: E11000 duplicate key error index: local.slaves.$_id_  dup key: { : ObjectId('<object ID>') } 0ms

This is a benign error and does not affect replication operations on the secondary or slave.

To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:

use local
db.slaves.drop()

The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.