OPTIONS

Data Model Design

Effective data models support your application needs. The key consideration for the structure of your documents is the decision to embed or to use references.

Embedded Data Models

With MongoDB, you may embed related data in a single structure or document. These schema are generally known as “denormalized” models, and take advantage of MongoDB’s rich documents. Consider the following diagram:

Data model with embedded fields that contain all related information.

Data model with embedded fields that contain all related information.

Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations.

In general, use embedded data models when:

In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.

However, embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation. See Document Growth for details. Furthermore, documents in MongoDB must be smaller than the maximum BSON document size. For bulk binary data, consider GridFS.

To interact with embedded documents, use dot notation to “reach into” embedded documents. See query for data in arrays and query data in sub-documents for more examples on accessing data in arrays and embedded documents.

Normalized Data Models

Normalized data models describe relationships using references between documents.

Data model using references to link documents. Both the ``contact`` document and the ``access`` document contain a reference to the ``user`` document.

Data model using references to link documents. Both the contact document and the access document contain a reference to the user document.

In general, use normalized data models:

  • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.

References provides more flexibility than embedding. However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more roundtrips to the server.

See Model One-to-Many Relationships with Document References for an example of referencing. For examples of various tree models using references, see Model Tree Structures.