New in version 3.0: MongoDB adds support for additional storage engines. MongoDB’s original storage engine, known as mmapv1 remains the default in 3.0, but the new wiredTiger engine is available and can offer additional flexibility and improved throughput for many workloads.
MongoDB stores data in the form of BSON documents, which are rich mappings of keys, or field names, to values. BSON supports a rich collection of types, and fields in BSON documents may hold arrays of values or embedded documents. All documents in MongoDB must be less than 16MB, which is the BSON document size.
All documents are part of a collection, which are a logical groupings of documents in a MongoDB database. The documents in a collection share a set of indexes, and typically these documents share common fields and structure.
In MongoDB the database construct is a group of related collections. Each database has a distinct set of data files and can contain a large number of collections. A single MongoDB deployment may have many databases.
WiredTiger Storage Engine¶
New in version 3.0.
WiredTiger is a storage engine that is optionally available in the 64-bit build of MongoDB 3.0. It excels at read and insert workloads as well as more complex update workloads.
Document Level Locking¶
With WiredTiger, all write operations happen within the context of a document level lock. As a result, multiple clients can modify more than one document in a single collection at the same time. With this very granular concurrency control, MongoDB can more effectively support workloads with read, write and updates as well as high-throughput concurrent workloads.
WiredTiger uses a write-ahead transaction log in combination with checkpoints to ensure data persistence. With WiredTiger, by default MongoDB will commit a checkpoint to disk every 60 seconds, or when there are 2 gigabytes of data to write. The checkpoint thresholds are configurable. Between and during checkpoints the data files are always valid.
The WiredTiger journal persists all data modifications between checkpoints. If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint. By default the WiredTiger journal is compressed using the snappy algorithm.
You can disable journaling by setting storage.journal.enabled to false, which can reduce the overhead of maintaining the journal. For standalone instances, not using the journal means that you will lose some data modifications when MongoDB exits unexpectedly between checkpoints. For members of replica sets, the replication process may provide sufficient durability guarantees.
MongoDB supports compression for all collections and indexes using both block and prefix compression. Compression minimizes storage use at the expense of additional CPU.
By default, all indexes with the WiredTiger engine use prefix compression. Also, by default all collections with WiredTiger use block compression with the snappy algorithm. Compression with zlib is also available.
You can modify the default compression settings for all collections and indexes. Compression is also configurable on a per-collection and per-index basis during collection and index creation.
For most workloads, the default compression settings balance storage efficiency and processing requirements.
MMAPv1 Storage Engine¶
MMAPv1 is MongoDB’s original storage engine based on memory mapped files. It excels at workloads with high volume inserts, reads, and in-place updates. MMAPv1 is the default storage engine in MongoDB 3.0 and all previous versions.
In order to ensure that all modifications to a MongoDB data set are durably written to disk, MongoDB records all modifications to a journal that it writes to disk more frequently than it writes the data files. The journal allows MongoDB to successfully recover data from data files after a mongod instance exits without flushing all changes.
See Journaling Mechanics for more information about the journal in MongoDB.
Record Storage Characteristics¶
All records are contiguously located on disk, and when a document becomes larger than the allocated record, MongoDB must allocate a new record. New allocations require MongoDB to move a document and update all indexes that refer to the document, which takes more time than in-place updates and leads to storage fragmentation.
Changed in version 3.0.0.
By default, MongoDB uses Power of 2 Sized Allocations so that every document in MongoDB is stored in a record which contains the document itself and extra space, or padding. Padding allows the document to grow as the result of updates while minimizing the likelihood of reallocations.
Record Allocation Strategies¶
MongoDB supports multiple record allocation strategies that determine how mongod adds padding to a document when creating a record. Because documents in MongoDB may grow after insertion and all records are contiguous on disk, the padding can reduce the need to relocate documents on disk following updates. Relocations are less efficient than in-place updates and can lead to storage fragmentation. As a result, all padding strategies trade additional space for increased efficiency and decreased fragmentation.
Different allocation strategies support different kinds of workloads: the power of 2 allocations are more efficient for insert/update/delete workloads; while exact fit allocations is ideal for collections without update and delete workloads.
Power of 2 Sized Allocations¶
Changed in version 3.0.0.
MongoDB 3.0 uses the power of 2 sizes allocation as the default record allocation strategy for MMAPv1. With the power of 2 sizes allocation strategy, each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ... 2MB). For documents larger than 2MB, the allocation is rounded up to the nearest multiple of 2MB.
The power of 2 sizes allocation strategy has the following key properties:
- Can efficiently reuse freed records to reduce fragmentation. Quantizing record allocation sizes into a fixed set of sizes increases the probability that an insert will fit into the free space created by an earlier document deletion or relocation.
- Can reduce moves. The added padding space gives a document room to grow without requiring a move. In addition to saving the cost of moving, this results in less updates to indexes. Although the power of 2 sizes strategy can minimize moves, it does not eliminate them entirely.
No Padding Allocation Strategy¶
Changed in version 3.0.0.
For collections whose workloads do not change the document sizes, such as workloads that consist of insert-only operations or update operations that do not increase document size (such as incrementing a counter), you can disable the power of 2 allocation using the collMod command with the noPadding flag or the db.createCollection() method with the noPadding option.
Prior to version 3.0.0, MongoDB used an allocation strategy that included a dynamically calculated padding as a factor of the document size.
Capped collections are fixed-size collections that support high-throughput operations that store records in insertion order. Capped collections work like circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.
See Capped Collections for more information.