Backups are an important part of any operational disaster recovery plan. A good backup plan must be able to capture data in a consistent and usable state, and operators must be able to automate both the backup and the recovery operations. Test all components of the backup system to ensure that you can recover backed-up data as needed. If you cannot effectively restore your database from the backup, then your backups are useless. This document addresses higher-level backup strategies. For more information on specific backup procedures, consider the following documents:
As you develop a backup strategy for your MongoDB deployment, consider the following factors:
There are two main methodologies for backing up MongoDB instances: creating binary “dumps” of the database using mongodump, or creating filesystem-level snapshots. Both methodologies have advantages and disadvantages:
The best option depends on the requirements of your deployment and your disaster recovery needs. Typically, filesystem snapshots are preferred because of their accuracy and simplicity; however, mongodump is a viable option that is often used to generate backups of MongoDB systems.
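As a rough illustration of automating the mongodump approach, the following Python sketch assembles and runs a mongodump command line. The helper names (build_mongodump_command, run_mongodump) and the default host and port are assumptions made for this example and are not part of MongoDB's tooling; only the --host, --port, --out, --oplog, and --gzip options are taken from mongodump itself.

```python
import subprocess
from typing import List


def build_mongodump_command(
    out_dir: str,
    host: str = "localhost",   # assumed default, adjust for your deployment
    port: int = 27017,         # assumed default mongod port
    oplog: bool = False,
    gzip: bool = False,
) -> List[str]:
    """Assemble the argument list for a mongodump run (hypothetical helper)."""
    cmd = ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]
    if oplog:
        # --oplog also records writes that happen while the dump runs.
        # It applies to full-instance dumps taken from a replica set member.
        cmd.append("--oplog")
    if gzip:
        # --gzip compresses the dumped files.
        cmd.append("--gzip")
    return cmd


def run_mongodump(cmd: List[str]) -> None:
    """Run mongodump and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the construction step easy to test and log; a scheduler or cron job could then call run_mongodump(build_mongodump_command("/path/to/backup-dir", oplog=True)), where the output path is a placeholder.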
The following documents provide details and procedures on the two approaches:
In some cases, taking backups is difficult or impossible because of large data volumes, distributed architectures, and data transmission speeds. In these situations, increase the number of members in your replica set or sets.
In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It is possible to lock a single secondary database and then create a backup from that instance. When you unlock the database, the secondary will catch up with the primary. You may also choose to deploy a dedicated hidden member for backup purposes.
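The lock, back up, and unlock sequence on a secondary can be sketched as a Python context manager. This is a minimal sketch, assuming a PyMongo-style client connected directly to the secondary and a server version that supports the fsync (with lock) and fsyncUnlock admin commands; the name locked_for_backup and the host in the usage comment are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def locked_for_backup(client):
    """Flush pending writes and block new ones on the connected member,
    then release the lock on exit, even if the backup step fails."""
    # Sends {fsync: 1, lock: true} to the admin database.
    client.admin.command("fsync", lock=True)
    try:
        yield client
    finally:
        # Sends {fsyncUnlock: 1}; the secondary then catches up with the primary.
        client.admin.command("fsyncUnlock")


# Usage sketch (hypothetical host; requires the pymongo package):
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://secondary.example.net:27017",
#                        directConnection=True)
#   with locked_for_backup(client):
#       ...  # take the filesystem snapshot or copy the data files here
```

Placing the unlock in a finally clause is the important design choice: a failed snapshot should not leave the secondary locked and falling further behind the primary.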
If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.
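Turning off the balancer around a cluster backup might look like the following sketch. It assumes a PyMongo-style client connected to a mongos and a MongoDB version that provides the balancerStatus, balancerStop, and balancerStart admin commands; the function name run_with_balancer_stopped and the backup_step callable are hypothetical names introduced for this example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_balancer_stopped(mongos_client, backup_step: Callable[[], T]) -> T:
    """Stop the balancer, run backup_step, then restore the balancer.

    The balancer is only restarted if it was enabled beforehand, so a
    deliberately disabled balancer stays disabled.
    """
    admin = mongos_client.admin
    # balancerStatus reports mode "full" when the balancer is enabled.
    was_enabled = admin.command("balancerStatus").get("mode") == "full"
    if was_enabled:
        admin.command("balancerStop")
    try:
        return backup_step()
    finally:
        if was_enabled:
            admin.command("balancerStart")
```

Here backup_step would wrap whatever per-shard procedure you use, for example locking and snapshotting one secondary in each shard's replica set.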
For any cluster, using a non-primary node to create backups is particularly advantageous because the backup operation does not affect the performance of the primary. Replication itself provides some measure of redundancy. Nevertheless, it is crucial to keep point-in-time backups of your cluster, both for disaster recovery and as an additional layer of protection.