MongoDB Method: db.collection.aggregate()

Summary

Method Namedb.collection.aggregate()
UsageCalculate aggregated value
Group aggregate

NOTES

  • Aggregation pipeline is the recommended way for processing data that needs multi level processing.
  • We should use aggregation pipeline, instead of map-reduce. As the aggregation pipeline performs much better and more readable(easier to understand).

Signature

db.collection.aggregate(pipeline, options)

Usage

Process or calculate aggregated values from a collection or view. The processing steps are mentioned in different stages.

We can perform operations like grouping, filtering, and/or any general calculation using the aggregation pipeline. The processing is defined in stages, and the processed data of one stage works as an input(source of truth) for the next stage.

MongoDB Aggregate Method Processing in Steps
MongoDB Aggregate Method Processing in Steps

NOTES

  • We can use the aggregate method for calculation data for reports and analytics.

Arguments

ParameterDescriptionTypeOptional
pipelineSteps for processing aggregationarray
optionsSome special options for the aggregationdocumentyes

options” argument parameters

“options” argument accepts a document, and the following options are allowed in the document.

ParameterDescriptionTypeDefaultOptional
explainExplain pipeline processbooleanyes
autoDiskUseAllow writing to disk. booleanyes
cursorInitial batch sizedocumentyes
maxTimeMSMax time limit for execution(in milliseconds)integeryes
bypassDocumentValidationIgnore the vlaidation of the documentbooleanyes
readConcernDefine read concern documentyes
writeConcerndefine write concerndocumentyes
collationSpecify collationdocumentyes
hintSpecify which index to be used for this operaitonstring or documentyes
commentComment for future referencestringyes
letDefine list of variablesdocumentyes

NOTES

  • If allowDiskUse is set to true, then the aggregation operation can write data to disk, in dbPath/_tmp directory.
  • readConcern levels are-
    • local
    • available
    • majority
    • linearizable

Return Value

Return valueCase for the return valueType
cursorOn success this method returns a cursor to the documents that matches the pipeline criteriascursor

Data set for examples

We are using some data demoing different options and stages of aggregation. Use the following data if you want to follow along-

# Insert some data to MongoDB
# for checking db.customer.aggregate() method

db.customer.insertMany([
  {
    name: "Izabella Conroy",
    email: "izabella.conroy@hotmail.com",
    age: 19,
    phone: "108-679-2154",
    address: {
      street: "11 North Vineyard Drive",
      city: "Minneapolis",
      state: "Minnesota",
      postalCode: "19426",
      country: "US"
    },
    profileCompletenessScore: 30,
    areaOfWork: "healthcare",
    leadSource: "socal media",
    tag: [
      "vip",
      "golf",
      "frequent flyer"
    ]
  },
  {
    name: "Lambert Purdy",
    email: "lambert@gmail.com",
    age: 28,
    phone: "(610) 489-3633",
    address: {
      street: "305 2nd Ave",
      city: "Collegeville",
      state: "Minnesota",
      postalCode: "81007",
      country: "US"
    },
    profileCompletenessScore: 40,
    areaOfWork: "software",
    leadSource: "website",
    tag: [
      "opportunity",
      "urgent"
    ]
  },
  {
    name: "Alisa Parker",
    email: "alisa@example.com",
    age: 72,
    phone: "768-319-1054",
    address: {
      street: "8532 Ingalls Circle",
      city: "Arvada",
      state: "CO",
      postalCode: "80003",
      country: "CA"
    },
    profileCompletenessScore: 60,
    areaOfWork: "construction",
    leadSource: "email",
    tag: [
      "urgent"
    ]
  },
  {
    name: "Emanuel Parker",
    email: "emanuel@test.com",
    age: 45,
    phone: "(706) 554-9951",
    address: {
      street: "534 Liberty St",
      city: "Waynesboro",
      state: "Georgia",
      postalCode: "30830",
      country: "US"
    },
    profileCompletenessScore: 35,
    areaOfWork: "legal",
    leadSource: "call",
    tag: [
      "vip",
      "frequent flyer"
    ]
  },
  {
    name: "Derrick Blick",
    email: "derric@test.com",
    age: 56,
    phone: "(703) 931-6700",
    address: {
      street: "3101 Park Center Dr",
      city: "Alexandria",
      state: "Vermont",
      postalCode: "22302",
      country: "US"
    },
    profileCompletenessScore: 90,
    areaOfWork: "sales",
    leadSource: "email",
    tag: [
      "golf"
    ]
  }
])

Aggregation Pipeline Stages

We pass one or more stages as the pipeline. An array of states are passed, and the stages are processed one by one.

Let’s check the different stages we can pass-

$match

Specific condition/filter for the documents. These are direct filters to filter out our desired documents from collections.

$match accepts a document, and in the document we specify the conditions. This $match condition works similar to db.collection.find() which can be used to get desired documents at any stage.

Here is how a $match stage would look like-

{ $match: { <query to match documents, this query is similar to the options we use for find method> } }

Example #1:

# Match specific field
bigboxcode> db.customer.aggregate([
  { $match: { email: 'alisa@example.com' } }
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc482'),
    name: 'Alisa Parker',
    email: 'alisa@example.com',
    age: 72,
    phone: '768-319-1054',
    address: {
      street: '8532 Ingalls Circle',
      city: 'Arvada',
      state: 'CO',
      postalCode: '80003',
      country: 'CA'
    },
    profileCompletenessScore: 60,
    areaOfWork: 'construction',
    leadSource: 'email',
    tag: [ 'urgent' ]
  }
]

Example #2:

# Match multiple field in a single stage
bigboxcode> db.customer.aggregate([
  { $match: {
    name: 'Alisa Parker',
    'address.country': 'CA'
  }}
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc482'),
    name: 'Alisa Parker',
    email: 'alisa@example.com',
    age: 72,
    phone: '768-319-1054',
    address: {
      street: '8532 Ingalls Circle',
      city: 'Arvada',
      state: 'CO',
      postalCode: '80003',
      country: 'CA'
    },
    profileCompletenessScore: 60,
    areaOfWork: 'construction',
    leadSource: 'email',
    tag: [ 'urgent' ]
  }
]

Example #3:

# Use $gt operation in $match
bigboxcode> db.customer.aggregate([ 
  { 
    $match: { 
      profileCompletenessScore: { $gt: 80 } 
    } 
  }
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc484'),
    name: 'Derrick Blick',
    email: 'derric@test.com',
    age: 56,
    phone: '(703) 931-6700',
    address: {
      street: '3101 Park Center Dr',
      city: 'Alexandria',
      state: 'Vermont',
      postalCode: '22302',
      country: 'US'
    },
    profileCompletenessScore: 90,
    areaOfWork: 'sales',
    leadSource: 'email',
    tag: [ 'golf' ]
  }
]

Example #4:

# Use $or condition in aggregate $match
bigboxcode> db.customer.aggregate([ 
  { $match: { 
    $or: [
      {profileCompletenessScore: { $gt: 80 } } , 
      {'address.country': 'CA'}
    ]
  }}
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc482'),
    name: 'Alisa Parker',
    email: 'alisa@example.com',
    age: 72,
    phone: '768-319-1054',
    address: {
      street: '8532 Ingalls Circle',
      city: 'Arvada',
      state: 'CO',
      postalCode: '80003',
      country: 'CA'
    },
    profileCompletenessScore: 60,
    areaOfWork: 'construction',
    leadSource: 'email',
    tag: [ 'urgent' ]
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc484'),
    name: 'Derrick Blick',
    email: 'derric@test.com',
    age: 56,
    phone: '(703) 931-6700',
    address: {
      street: '3101 Park Center Dr',
      city: 'Alexandria',
      state: 'Vermont',
      postalCode: '22302',
      country: 'US'
    },
    profileCompletenessScore: 90,
    areaOfWork: 'sales',
    leadSource: 'email',
    tag: [ 'golf' ]
  }
]

Example #5:

# Use regex for matching
bigboxcode> db.customer.aggregate([ 
  { 
    $match: { 
      name: /^al.*/i
    } 
  }
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc482'),
    name: 'Alisa Parker',
    email: 'alisa@example.com',
    age: 72,
    phone: '768-319-1054',
    address: {
      street: '8532 Ingalls Circle',
      city: 'Arvada',
      state: 'CO',
      postalCode: '80003',
      country: 'CA'
    },
    profileCompletenessScore: 60,
    areaOfWork: 'construction',
    leadSource: 'email',
    tag: [ 'urgent' ]
  }
]

$count

Get the count of the documents obtained from previous step. We just pass the name/key, that we want the count value’s key to be.

Here is how a $count stage would look like-

{ $count: <key name of the count result value> }

Example #1:

# Count as the first stage counts all documents
bigboxcode> db.customer.aggregate([
  { $count: "my_customer_count" }
])

output:
[ { my_customer_count: 5 } ]

Example #2:

# Match and count
bigboxcode> db.customer.aggregate([
  { $match: { 'address.country': 'US' } },
  { $count: 'cusomters_from_us' }
])

output:
[ { cusomters_from_us: 4 } ]

Example #3:

# Match with $gt on age and then count
bigboxcode> db.customer.aggregate([ 
  { 
    $match: { 
      age: {$gt: 40 }
    }
  }, 
  { $count: "cusotmer_over_40" }
])

output:
[ { cusotmer_over_40: 3 } ]

$group

Apply grouping in an aggregation. This is equivalent to the SQL GROUP BY, as the same results can be achived by $group in MongoDB.

Here is how a $group stage would look like-

{ 
  $group: {
    _id: <key for grouping documents>,

    <some_output_field_name_1>: { <accumulator_operator_1>: <some_expression_1>},
    <some_output_field_name_2>: { <accumulator_operator_2>: <some_expression_2>},
    .
    .
    .
    .
  }
}

NOTES

The following accumulator can be used in the $group stage-

  • $accumulator
  • addToSet
  • $avg
  • $bottom
  • $bottomN
  • $count
  • $first
  • $firstN
  • $last
  • $lastN
  • $max
  • $maxN
  • $median
  • $mergeObjects
  • $min
  • $minN
  • $percentile
  • $push
  • $stdDebPop
  • $stdDevSamp
  • $sum
  • $top
  • $topN

Example #1:

# $group all documents and calculate sum of profileCompletenessScore
#
# Equivalent SQL: 
#   SELECT 
#     SUM(profileCompletenessScore) AS score_sum 
#   FROM customer
bigboxcode> db.customer.aggregate([
  { $group: {
    _id: null,
    score_sum: { $sum: "$profileCompletenessScore" }
  }}
])

output:
[ { _id: null, score_sum: 255 } ]

Example #2:

# Group by country
# 
# Equivalent SQL: 
#   SELECT 
#     country
#   FROM customer
#   GROUP BY country
bigboxcode> db.customer.aggregate([
  {
    $group: {
      _id: "$address.country"
    }
  }
])

output:
[ { _id: 'US' }, { _id: 'CA' } ]

Example #3:

# Group by country and process average, max, min, etc.
# 
# Equivalent SQL: 
#   SELECT 
#     count(*) AS total_customers,
#
#     GROUP_CONCAT(profileCompletenessScore) AS profile_socres,
#     AVG(profileCompletenessScore) AS avg_profile_score,
#     MIN(profileCompletenessScore) AS min_profile_score,
#     MAX(profileCompletenessScore) AS max_profile_score,
#
#     GROUP_CONCAT(age) AS ages,
#     AVG(age) AS average_age,
#     MIN(age) AS min_age,
#     MAX(age) AS max_age,
#     
#   FROM customer
#   GROUP BY country
bigboxcode> db.customer.aggregate([
  {
    $group: {
      _id: "$address.country",
      total_customers: { $sum: 1},

      profile_socres: { $push: '$profileCompletenessScore'},
      avg_profile_score: { $avg: '$profileCompletenessScore'},
      min_profile_score: { $min: '$profileCompletenessScore'},
      max_profile_score: { $max: '$profileCompletenessScore'},

      ages: { $push: '$age'},
      average_age: { $avg: '$age'},
      min_age: { $min: '$age'},
      max_age: { $max: '$age'},
    }
  }
])

output:
[
  {
    _id: 'US',
    total_customers: 4,
    profile_socres: [ 30, 40, 35, 90 ],
    avg_profile_score: 48.75,
    min_profile_score: 30,
    max_profile_score: 90,
    ages: [ 19, 28, 45, 56 ],
    average_age: 37,
    min_age: 19,
    max_age: 56
  },
  {
    _id: 'CA',
    total_customers: 1,
    profile_socres: [ 60 ],
    avg_profile_score: 60,
    min_profile_score: 60,
    max_profile_score: 60,
    ages: [ 72 ],
    average_age: 72,
    min_age: 72,
    max_age: 72
  }
]

Example #4:

# Group by country and process average, max, min, etc.
# Then filter the result with match
# 
# Equivalent SQL: 
#   SELECT 
#     count(*) AS total_customers,
#
#     GROUP_CONCAT(profileCompletenessScore) AS profile_socres,
#     AVG(profileCompletenessScore) AS avg_profile_score,
#     MIN(profileCompletenessScore) AS min_profile_score,
#     MAX(profileCompletenessScore) AS max_profile_score,
#
#     GROUP_CONCAT(age) AS ages,
#     AVG(age) AS average_age,
#     MIN(age) AS min_age,
#     MAX(age) AS max_age,
#     
#   FROM customer
#   GROUP BY country
#   HAVING
#     average_age <= 50
#     AND total_customers > 2
bigboxcode> db.customer.aggregate([
  {
    $group: {
      _id: "$address.country",
      total_customers: { $sum: 1},

      profile_socres: { $push: '$profileCompletenessScore'},
      avg_profile_score: { $avg: '$profileCompletenessScore'},
      min_profile_score: { $min: '$profileCompletenessScore'},
      max_profile_score: { $max: '$profileCompletenessScore'},

      ages: { $push: '$age'},
      average_age: { $avg: '$age'},
      min_age: { $min: '$age'},
      max_age: { $max: '$age'},
    }
  },
  { $match: {$and: [{ average_age: { $lte: 50 }}, {total_customers: {$gt: 2}}]}}
])

output:
[
  {
    _id: 'US',
    total_customers: 4,
    profile_socres: [ 30, 40, 35, 90 ],
    avg_profile_score: 48.75,
    min_profile_score: 30,
    max_profile_score: 90,
    ages: [ 19, 28, 45, 56 ],
    average_age: 37,
    min_age: 19,
    max_age: 56
  }
]

Example #5:

# Group by country and featch the original/root record
# Access the root element with $$ROOT
bigboxcode> db.customer.aggregate([
  {
    $group: {
      _id: "$address.country",
      customers: {$push: "$$ROOT"}
    }
  }
])

output:
[
  {
    _id: 'US',
    customers: [
      {
        _id: ObjectId('65f6da229de0b3faa1ffc480'),
        name: 'Izabella Conroy',
        email: 'izabella.conroy@hotmail.com',
        age: 19,
        phone: '108-679-2154',
        address: {
          street: '11 North Vineyard Drive',
          city: 'Minneapolis',
          state: 'Minnesota',
          postalCode: '19426',
          country: 'US'
        },
        profileCompletenessScore: 30,
        areaOfWork: 'healthcare',
        leadSource: 'socal media',
        tag: [ 'vip', 'golf', 'frequent flyer' ]
      },
      {
        _id: ObjectId('65f6da229de0b3faa1ffc481'),
        name: 'Lambert Purdy',
        email: 'lambert@gmail.com',
        age: 28,
        phone: '(610) 489-3633',
        address: {
          street: '305 2nd Ave',
          city: 'Collegeville',
          state: 'Minnesota',
          postalCode: '81007',
          country: 'US'
        },
        profileCompletenessScore: 40,
        areaOfWork: 'software',
        leadSource: 'website',
        tag: [ 'opportunity', 'urgent' ]
      },
      {
        _id: ObjectId('65f6da229de0b3faa1ffc483'),
        name: 'Emanuel Parker',
        email: 'emanuel@test.com',
        age: 45,
        phone: '(706) 554-9951',
        address: {
          street: '534 Liberty St',
          city: 'Waynesboro',
          state: 'Georgia',
          postalCode: '30830',
          country: 'US'
        },
        profileCompletenessScore: 35,
        areaOfWork: 'legal',
        leadSource: 'call',
        tag: [ 'vip', 'frequent flyer' ]
      },
      {
        _id: ObjectId('65f6da229de0b3faa1ffc484'),
        name: 'Derrick Blick',
        email: 'derric@test.com',
        age: 56,
        phone: '(703) 931-6700',
        address: {
          street: '3101 Park Center Dr',
          city: 'Alexandria',
          state: 'Vermont',
          postalCode: '22302',
          country: 'US'
        },
        profileCompletenessScore: 90,
        areaOfWork: 'sales',
        leadSource: 'email',
        tag: [ 'golf' ]
      }
    ]
  },
  {
    _id: 'CA',
    customers: [
      {
        _id: ObjectId('65f6da229de0b3faa1ffc482'),
        name: 'Alisa Parker',
        email: 'alisa@example.com',
        age: 72,
        phone: '768-319-1054',
        address: {
          street: '8532 Ingalls Circle',
          city: 'Arvada',
          state: 'CO',
          postalCode: '80003',
          country: 'CA'
        },
        profileCompletenessScore: 60,
        areaOfWork: 'construction',
        leadSource: 'email',
        tag: [ 'urgent' ]
      }
    ]
  }
]

Example #6:

# Group by country and extract all(in record) postalCode codes of a country
bigboxcode> db.customer.aggregate([
  {
    $group: {
      _id: "$address.country",
      zip_codes: {$push: "$$ROOT.address.postalCode"}
    }
  }
])


output:
[
  { _id: 'CA', zip_codes: [ '80003' ] },
  { _id: 'US', zip_codes: [ '19426', '81007', '30830', '22302' ] }
]

$sort

Use a $sort stage for sorting the obtained result.

Here is how a $sort stage would look like-

{ 
  $sort: {
    <field_name_1>: <sort_order_of_field_1>,
    <field_name_2>: <sort_order_of_field_2>,
    .
    .
    .
    .
  }
}

The following sort order can be used in the $sort stage-

ValueDescription
1Ascending
-1Descending
{ $meta: “textScore” }Order by “textScore” metadata, in descending order

NOTES

  • The maximum number of keys/field names (for sort order) that can be passed to $sort is 32 keys/field names.
  • If the sorting field has a duplicate value, then there can be some inconsistency in the result. To fix this inconsistency add the _id, or some unique field if required.
  • $sort can use a maximum of 100 MegaBytes of memory while processing. If more memory is required, then it uses a temporary file on disk. In the latest version, the disk usage is by default enabled, but if you need then you can use option {allowDiskUse: true}.
  • When a $limit is used in an aggregation after $sort, and the number of processed documents is not changed after the $sort processing, then the optimizer combines these two while processing it internally.

Example #1:

# Sort by name
# Added projection just for better result visiablity
bigboxcode> db.customer.aggregate([
  { $sort: {name: 1}},
  { $project: { name: 1, phone: 1, age: 1}}
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc482'),
    name: 'Alisa Parker',
    age: 72,
    phone: '768-319-1054'
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc484'),
    name: 'Derrick Blick',
    age: 56,
    phone: '(703) 931-6700'
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc483'),
    name: 'Emanuel Parker',
    age: 45,
    phone: '(706) 554-9951'
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc480'),
    name: 'Izabella Conroy',
    age: 19,
    phone: '108-679-2154'
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc481'),
    name: 'Lambert Purdy',
    age: 28,
    phone: '(610) 489-3633'
  }
]

Example #2:

# Sort by country and age
# Added projection just for better result visiablity
bigboxcode> db.customer.aggregate([
  { $sort: {'address.country': 1, age: -1}},
  { $project: { _id: 0, name: 1, phone: 1, age: 1, 'country': '$address.country': }}
])

output:
[
  { name: 'Alisa Parker', age: 72, phone: '768-319-1054', country: 'CA' },
  { name: 'Derrick Blick', age: 56, phone: '(703) 931-6700', country: 'US' },
  { name: 'Emanuel Parker', age: 45, phone: '(706) 554-9951', country: 'US' },
  { name: 'Lambert Purdy', age: 28, phone: '(610) 489-3633', country: 'US' },
  { name: 'Izabella Conroy', age: 19, phone: '108-679-2154', country: 'US' }
]

$limit

Use $limit to restrict the number of documents, for the next stage. Works similar to SQL LIMIT clause.

The usage is simple, we just have to pass an integer value to $limit. The integer can be any positive 64-bint integer. Here is what a $limit stage would look like-

{ $limit: <64-bit positive integer value> }

NOTES

  • When a $limit is used in an aggregation after $sort, and the number of processed documents is not changed after the $sort processing, then the optimizer combines these two while processing it internally.

Example #1:

# Limit the resulting documents
bigboxcode> db.customer.aggregate([
  { $limit: 2 }
])

output:
[
  {
    _id: ObjectId('65f6da229de0b3faa1ffc480'),
    name: 'Izabella Conroy',
    email: 'izabella.conroy@hotmail.com',
    age: 19,
    phone: '108-679-2154',
    address: {
      street: '11 North Vineyard Drive',
      city: 'Minneapolis',
      state: 'Minnesota',
      postalCode: '19426',
      country: 'US'
    },
    profileCompletenessScore: 30,
    areaOfWork: 'healthcare',
    leadSource: 'socal media',
    tag: [ 'vip', 'golf', 'frequent flyer' ]
  },
  {
    _id: ObjectId('65f6da229de0b3faa1ffc481'),
    name: 'Lambert Purdy',
    email: 'lambert@gmail.com',
    age: 28,
    phone: '(610) 489-3633',
    address: {
      street: '305 2nd Ave',
      city: 'Collegeville',
      state: 'Minnesota',
      postalCode: '81007',
      country: 'US'
    },
    profileCompletenessScore: 40,
    areaOfWork: 'software',
    leadSource: 'website',
    tag: [ 'opportunity', 'urgent' ]
  }
]

Example #2:

# sort and Limit the resulting documents
bigboxcode> db.customer.aggregate([
  { $sort: {'address.country': 1, age: -1}},
  { $project: { _id: 0, name: 1, phone: 1, age: 1, 'country': '$address.country' }},
  { $limit: 2 }
])

output:
[
  {
    name: 'Alisa Parker',
    age: 72,
    phone: '768-319-1054',
    country: 'CA'
  },
  {
    name: 'Derrick Blick',
    age: 56,
    phone: '(703) 931-6700',
    country: 'US'
  }
]

Source Code

Use the following links to get the source code used in this article-

Related Methods

CommandDetails
db.collection.getIndexes() Method Details

Leave a Comment


The reCAPTCHA verification period has expired. Please reload the page.