OPTIONS

Category Hierarchy

Overview

This document provides the basic design for modeling a product hierarchy stored in MongoDB as well as a collection of common operations for interacting with this data that will help you begin to write an E-commerce product category hierarchy.

See also

Product Catalog

Solution

To model a product category hierarchy, this solution keeps each category in its own document that also has a list of its ancestors or “parents.” This document uses music genres as the basis of its examples:

Initial category hierarchy

Initial category hierarchy

Because these kinds of categories change infrequently, this model focuses on the operations needed to keep the hierarchy up-to-date rather than the performance profile of update operations.

Schema

This schema has the following properties:

  • A single document represents each category in the hierarchy.
  • An ObjectId identifies each category document for internal cross-referencing.
  • Each category document has a human-readable name and a URL compatible slug field.
  • The schema stores a list of ancestors for each category to facilitate displaying a query and its ancestors using only a single query.

Consider the following prototype:

{ "_id" : ObjectId("4f5ec858eb03303a11000002"),
  "name" : "Modal Jazz",
  "parent" : ObjectId("4f5ec858eb03303a11000001"),
  "slug" : "modal-jazz",
  "ancestors" : [
         { "_id" : ObjectId("4f5ec858eb03303a11000001"),
        "slug" : "bop",
        "name" : "Bop" },
         { "_id" : ObjectId("4f5ec858eb03303a11000000"),
           "slug" : "ragtime",
           "name" : "Ragtime" } ]
}

Operations

This section outlines the category hierarchy manipulations that you may need in an E-Commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Read and Display a Category

Querying

Use the following option to read and display a category hierarchy. This query will use the slug field to return the category information and a “bread crumb” trail from the current category to the top level category.

category = db.categories.find(
    {'slug':slug},
    {'_id':0, 'name':1, 'ancestors.slug':1, 'ancestors.name':1 })

Indexing

Create a unique index on the slug field with the following operation on the Python/PyMongo console:

>>> db.categories.ensure_index('slug', unique=True)

Add a Category to the Hierarchy

To add a category you must first determine its ancestors. Take adding a new category “Swing” as a child of “Ragtime”, as below:

Adding a category

Adding a category

The insert operation would be trivial except for the ancestors. To define this array, consider the following helper function:

def build_ancestors(_id, parent_id):
    parent = db.categories.find_one(
        {'_id': parent_id},
        {'name': 1, 'slug': 1, 'ancestors':1})
    parent_ancestors = parent.pop('ancestors')
    ancestors = [ parent ] + parent_ancestors
    db.categories.update(
        {'_id': _id},
        {'$set': { 'ancestors': ancestors } })

You only need to travel “up” one level in the hierarchy to get the ancestor list for “Ragtime” that you can use to build the ancestor list for “Swing.” Then create a document with the following set of operations:

doc = dict(name='Swing', slug='swing', parent=ragtime_id)
swing_id = db.categories.insert(doc)
build_ancestors(swing_id, ragtime_id)

Note

Since these queries and updates all selected based on _id, you only need the default MongoDB-supplied index on _id to support this operation efficiently.

Change the Ancestry of a Category

This section address the process for reorganizing the hierarchy by moving “bop” under “swing” as follows:

Change the parent of a category

Change the parent of a category

Procedure

Update the bop document to reflect the change in ancestry with the following operation:

db.categories.update(
    {'_id':bop_id}, {'$set': { 'parent': swing_id } } )

The following helper function, rebuilds the ancestor fields to ensure correctness. [1]

def build_ancestors_full(_id, parent_id):
    ancestors = []
    while parent_id is not None:
        parent = db.categories.find_one(
            {'_id': parent_id},
            {'parent': 1, 'name': 1, 'slug': 1, 'ancestors':1})
        parent_id = parent.pop('parent')
        ancestors.append(parent)
    db.categories.update(
        {'_id': _id},
        {'$set': { 'ancestors': ancestors } })

You can use the following loop to reconstruct all the descendants of the “bop” category:

for cat in db.categories.find(
    {'ancestors._id': bop_id},
    {'parent': 1}):
    build_ancestors_full(cat['_id'], cat['parent'])
[1]Your application cannot guarantee that the ancestor list of a parent category is correct, because MongoDB may process the categories out-of-order.

Indexing

Create an index on the ancestors._id field to support the update operation.

db.categories.ensure_index('ancestors._id')

Rename a Category

To a rename a category you need to both update the category itself and also update all the descendants. Consider renaming “Bop” to “BeBop” as in the following figure:

Rename a category

Rename a category

First, you need to update the category name with the following operation:

db.categories.update(
    {'_id':bop_id}, {'$set': { 'name': 'BeBop' } } )

Next, you need to update each descendant’s ancestors list:

db.categories.update(
    {'ancestors._id': bop_id},
    {'$set': { 'ancestors.$.name': 'BeBop' } },
    multi=True)

This operation uses:

  • the positional operation $ to match the exact “ancestor” entry that matches the query, and
  • the multi option to update all documents that match this query.

Note

In this case, the index you have already defined on ancestors._id is sufficient to ensure good performance.

Sharding

For most deployments, sharding this collection has limited value because the collection will be very small. If you do need to shard, because most updates query the _id field, this field is a suitable shard key. Shard the collection with the following operation in the Python/PyMongo console.

>>> db.command('shardCollection', 'categories', {
...     'key': {'_id': 1} })
{ "collectionsharded" : "categories", "ok" : 1 }