
Google Compute Engine

Overview

This document outlines a process you can use to deploy MongoDB on Google Compute Engine (GCE). It begins with selecting the appropriate instance and storage configuration, then addresses the process for deploying a MongoDB system. Finally, the tutorial includes strategies for backup and restore as well as additional production considerations.

Instance Recommendations

GCE offers several different instance types across four categories:

  • Standard
  • Shared Core
  • High Memory
  • High CPU

When deploying MongoDB, consider the following recommendations based on deployment type and component.

  • For development deployments, deploy mongod on Standard or High Memory instances.
  • For production deployments, deploy mongod on High Memory instances.
  • For sharded clusters, run mongos on the instances that run your application; however, it can also be run as a standalone process on Standard n1-standard-1 instances.
  • Sharded cluster config servers should be deployed on Standard n1-standard-1 instances.

Note

Shared Core instances are not recommended because they only provide intermittent access to the underlying CPU. High CPU instances are also not recommended because MongoDB is typically memory and I/O bound.
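
For example, a production mongod might run on a High Memory machine type such as n1-highmem-4. The following is a minimal sketch only, reusing the project, zone, and image from the rest of this tutorial; the instance name is illustrative and disks would still be attached as described under Instance Configuration below.

gcutil addinstance mongodb-prod-1 \
       --project=mongodb-project \
       --zone=us-central1-b \
       --machine_type=n1-highmem-4 \
       --image=projects/centos-cloud/global/images/centos-6-v20131120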

Storage Considerations

GCE offers persistent disk storage to go along with your instances, either as a single root volume or as multiple disks attached to an instance. GCE’s persistent disks (PD) already stripe data across a very large number of physical volumes, so there is no need to stripe them yourself. MongoDB journal data is small, and putting it on its own disk means either creating a small disk with insufficient performance or creating a large disk that goes mostly unused. Put your MongoDB journal files on the same disk as your data; placing the journal files on a small persistent disk will dramatically decrease the performance of database writes.

There are multiple storage configurations possible with GCE persistent disks, each with its own tradeoffs. The most performant configuration is to use a single volume to hold the MongoDB data, journal, and log files. To determine the right storage configuration for your deployment, prototype and test your workload.

Performance

GCE persistent disk performance profiles are primarily based on allocated disk size. At higher disk sizes performance can also scale based on the number of cores within an instance. For more information, refer to “Persistent Disk Performance” within the GCE Disks documentation.

Instance Configuration

The following steps can be used to deploy MongoDB on a single GCE instance. The instance will be configured as follows:

  • CentOS 6
  • MongoDB 2.4.x installed via Yum
  • A single additional persistent disk holding MongoDB data, journal, and log files
  • Updated read-ahead value for the storage disk
  • Updated ulimit settings
  • Updated TCP keep-alive settings

Note

To get started, you should have the Google Cloud SDK tools installed and set up to work with your account, as well as an initial project created. For more information, refer to the Google Cloud SDK installation instructions, Managing Authentication and Credentials, and Quick Start.
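
As a quick check that gcutil is configured to work with your project (the project name below is the placeholder used throughout this tutorial), list the instances in the project:

gcutil listinstances --project=mongodb-project

If the command prompts for authentication or fails, revisit the SDK setup steps referenced above.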

First create the persistent disk that will be the root storage device for the instance using the gcutil command.

gcutil adddisk mongodb-root \
       --project=mongodb-project \
       --size_gb=10 \
       --zone=us-central1-b

Next create the persistent disk that will hold MongoDB data, journal, and logs.

gcutil adddisk mongodb-storage \
       --project=mongodb-project \
       --size_gb=500 \
       --zone=us-central1-b

With the disks now created, the next step is to launch an instance with these volumes attached.

gcutil addinstance mongodb-instance \
       --project=mongodb-project --zone=us-central1-b \
       --machine_type=n1-standard-2 \
       --image=projects/centos-cloud/global/images/centos-6-v20131120 \
       --disk=mongodb-root,mode=rw,boot --disk=mongodb-storage,mode=rw

The above command launches an n1-standard-2 instance running CentOS in the us-central1-b zone with two disks attached: one as the instance boot volume and the other for MongoDB data, journal, and log files.

Once the instance is created (check its status using the gcutil getinstance command), connect to it via SSH.

gcutil ssh mongodb-instance

Use yum to update the system and then add in the MongoDB yum repo.

sudo yum -y update

echo "[MongoDB]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1" | sudo tee -a /etc/yum.repos.d/mongodb.repo

Then install MongoDB:

sudo yum install -y mongo-10gen-server

Now locate the persistent disks previously attached to the instance.

ls -l /dev/disk/by-id/google-*
# ...
lrwxrwxrwx. 1 root root  9 Mar  3 20:38 google-mongodb-storage -> ../../sdb
lrwxrwxrwx. 1 root root  9 Mar  3 20:38 google-mongodb-root -> ../../sda

Now create the mount point to hold MongoDB data, journal, and log files.

sudo mkdir /data

Next, format and mount the disk using the GCE safe_format_and_mount tool (skip the root volume, which was formatted and attached at instance boot). This tool formats the disk only if it is unformatted and then mounts it. For more information, refer to the Formatting Disks documentation.

sudo /usr/share/google/safe_format_and_mount \
     -m "mkfs.ext4 -F -o defaults,auto,noatime,noexec" /dev/sdb /data

Finally set the appropriate ownership of the storage volume (“owner:group” should be mongod:mongod).

sudo chown mongod:mongod /data

Optionally, to persist this storage mount you can either call safe_format_and_mount (as invoked above) in a startup script or persist the mount information to /etc/fstab:

echo '/dev/sdb /data ext4 defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
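
Alternatively, for the startup-script approach, one minimal sketch on CentOS 6 is to append the safe_format_and_mount invocation to /etc/rc.local so that it runs at boot (this assumes /dev/sdb remains the device name of the storage disk):

echo '/usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F -o defaults,auto,noatime,noexec" /dev/sdb /data' \
 | sudo tee -a /etc/rc.local

Use one approach or the other, not both.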

Storage setup is now complete. Next, configure the following MongoDB parameters by editing the configuration file /etc/mongod.conf:

dbpath = /data
logpath = /data/mongod.log

Optionally, if you don’t want MongoDB to start at boot you can issue the following command:

sudo chkconfig mongod off

By default, CentOS uses ulimit settings that are not appropriate for MongoDB. To set ulimit to match the documented ulimit settings, use the following steps:

sudo nano /etc/security/limits.conf
# add these lines to /etc/security/limits.conf:
mongod soft nofile 64000
mongod hard nofile 64000
mongod soft nproc 32000
mongod hard nproc 32000

sudo nano /etc/security/limits.d/90-nproc.conf
# add these lines to /etc/security/limits.d/90-nproc.conf:
mongod soft nproc 32000
mongod hard nproc 32000

Additionally, the default read-ahead settings on persistent disks are not optimized for MongoDB. As noted in the read-ahead settings from the Production Notes, read-ahead should be reduced to approximately 32 blocks (or 16 KB) of data; blockdev --setra takes a value in 512-byte sectors, so 32 sectors corresponds to 16 KB. The following command sets the read-ahead appropriately (repeat for each data volume as needed):

sudo blockdev --setra 32 /dev/sdb

To make this change persistent across system boots, issue the following command:

echo 'ACTION=="add", KERNEL=="sdb", ATTR{bdi/read_ahead_kb}="16"' \
 | sudo tee -a /etc/udev/rules.d/85-mongod.rules

The default TCP keepalive time on Linux is too long for MongoDB deployments and needs to be changed. The following commands update the value to 300 seconds and persist the setting across reboots:

echo 300 | sudo tee /proc/sys/net/ipv4/tcp_keepalive_time
echo "net.ipv4.tcp_keepalive_time = 300" | sudo tee -a /etc/sysctl.conf

To start MongoDB, issue the following command:

sudo service mongod start

Now connect to the MongoDB instance using the mongo shell. You should see output similar to the following:

$ mongo
MongoDB shell version: 2.4.9
connecting to: test
>

To have MongoDB start automatically at boot issue the following command:

sudo chkconfig mongod on

Backup and Restore

GCE offers the ability to take snapshots of your persistent disks and to create new persistent disks from those snapshots. The underlying mechanism uses differential snapshots, which allow for better performance and lower storage costs. Snapshots are also a global resource, meaning they are available in any region in which your project is set up.

To back up your MongoDB data using GCE snapshots, use the following procedure. First, use db.fsyncLock() from the mongo shell to force mongod to flush all pending writes to disk and lock it against further writes.

db.fsyncLock()
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1
}

Next force the system to sync the filesystem and flush disk buffers:

sudo sync

Now execute the snapshot command:

gcutil addsnapshot mongodb-storage-snapshot \
        --project=mongodb-project \
        --source_disk=mongodb-storage

Once the snapshot has been completed, unlock mongod via the mongo shell:

db.fsyncUnlock()

To view snapshot details, use the following command:

gcutil getsnapshot mongodb-storage-snapshot
+----------------------+-------------------------------------+
| name                 | mongodb-storage-snapshot            |
| description          |                                     |
| creation-time        | 2014-03-03T15:13:50.209-08:00       |
| status               | READY                               |
| disk-size-gb         | 500                                 |
| storage-bytes        |                                     |
| storage-bytes-status |                                     |
| source-disk          | us-central1-b/disks/mongodb-storage |
+----------------------+-------------------------------------+

To restore MongoDB data from a GCE snapshot, use the following procedure. First, if mongod is running, stop it.

sudo service mongod stop

Next create a new persistent disk using the snapshot as the source and attach it to a running instance.

gcutil adddisk mongodb-storage-2 \
       --source_snapshot=mongodb-storage-snapshot \
       --project=mongodb-project \
       --zone=us-central1-b
gcutil attachdisk mongodb-instance \
       --disk=mongodb-storage-2,mode=rw \
       --project=mongodb-project

After the disk attaches, mount it at /data as described in the instance configuration steps above.
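
A minimal remount sketch, assuming the restored disk appears under /dev/disk/by-id/ as google-mongodb-storage-2 (device paths may differ on your instance):

# Unmount the old volume first if it is still mounted at /data
sudo umount /data
# safe_format_and_mount will not reformat a disk that already contains a filesystem
sudo /usr/share/google/safe_format_and_mount \
     -m "mkfs.ext4 -F -o defaults,auto,noatime,noexec" \
     /dev/disk/by-id/google-mongodb-storage-2 /data

Then restart mongod: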

sudo service mongod start

Deployment Notes

Networking

Networks in GCE are a global resource, meaning they are available in any region in which your project is set up. By default, all new instances have the following connections enabled:

  • Traffic between instances in the same network, over any port and any protocol
  • Incoming SSH (port 22) access from anywhere

Unless otherwise specified during gcutil addinstance, new instances are assigned to the default network that exists within your account, and an instance can only belong to a single network. For more information on creating and managing networks, refer to the Networking and Firewalls documentation.
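
For illustration, the following is a sketch of adding a firewall rule with gcutil, for example to allow MongoDB traffic (port 27017) from a specific source range. The rule name, source range, and exact flag spellings are assumptions; check gcutil help addfirewall for the options supported by your SDK version.

gcutil addfirewall allow-mongodb \
       --project=mongodb-project \
       --network=default \
       --allowed=tcp:27017 \
       --allowed_ip_sources=203.0.113.0/24

Most deployments should not expose port 27017 outside the network; traffic between instances on the same network is already permitted by default.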

Regions and Zones

GCE provides different regions and zones to provide control over where data is stored and used. Certain resources exist only at the zone, region, or global level. For example, instances and disks are zone-specific, while snapshots and networks are globally accessible. For more information, refer to the Regions and Zones documentation.

When deploying MongoDB for high availability within GCE, we strongly recommend deploying across multiple zones or multiple regions. Introducing multiple regions into your deployment may increase transfer costs and latency of network access between individual nodes so it is best to prototype this approach prior to a production deployment.
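
For example, a three-member replica set might place each member's instance in a different zone. The following is a sketch only, reusing the machine image from this tutorial; the instance names, machine type, and zones are illustrative, the third member is placed in a second region only to show the multi-region option, and each instance still needs the storage and configuration steps described above.

gcutil addinstance mongodb-rs-1 \
       --project=mongodb-project --zone=us-central1-a \
       --machine_type=n1-highmem-2 \
       --image=projects/centos-cloud/global/images/centos-6-v20131120
gcutil addinstance mongodb-rs-2 \
       --project=mongodb-project --zone=us-central1-b \
       --machine_type=n1-highmem-2 \
       --image=projects/centos-cloud/global/images/centos-6-v20131120
gcutil addinstance mongodb-rs-3 \
       --project=mongodb-project --zone=europe-west1-a \
       --machine_type=n1-highmem-2 \
       --image=projects/centos-cloud/global/images/centos-6-v20131120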

For more information on high availability in GCE, refer to Designing Robust Systems.
