Joining 10 Collections in MongoDB

I encountered the following scenario: I’m running an aggregation on the MasterCollection collection, and I’m “joining” it with 9 other collections in that aggregation.

In the end, I’m merging everything back into the same MasterCollection. The aggregation took 30 minutes to run, which is not acceptable. We have a single MongoDB instance (version 7) with 16 GB of RAM, running in a Docker container.

The MasterCollection has 1015787 documents, with an average document size of 1.8 kB. Stats for the other collections (collection name, number of documents, average document size):

collection1  1016878  40 B
collection2  0        0 B
collection3  232      94 B
collection4  10289    97 B
collection5  10289    97 B
collection6  1747     102 B
collection7  1326     103 B
collection8  1016878  42 B
collection9  1016878  58 B
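As a rough sanity check on these figures, the total uncompressed data size can be estimated by multiplying document counts by average sizes. This is only a back-of-the-envelope sketch: it assumes 1 kB means 1000 bytes, ignores indexes and storage-engine overhead, and assumes the two rows labelled just "collection" are collection1 and collection7.

```javascript
// Figures copied from the stats above. Assumptions: 1 kB = 1000 bytes, and
// the two rows named "collection" are collection1 and collection7.
const stats = [
  { name: 'MasterCollection', docs: 1015787, avgBytes: 1800 },
  { name: 'collection1', docs: 1016878, avgBytes: 40 },
  { name: 'collection2', docs: 0, avgBytes: 0 },
  { name: 'collection3', docs: 232, avgBytes: 94 },
  { name: 'collection4', docs: 10289, avgBytes: 97 },
  { name: 'collection5', docs: 10289, avgBytes: 97 },
  { name: 'collection6', docs: 1747, avgBytes: 102 },
  { name: 'collection7', docs: 1326, avgBytes: 103 },
  { name: 'collection8', docs: 1016878, avgBytes: 42 },
  { name: 'collection9', docs: 1016878, avgBytes: 58 },
];

// Sum of (document count x average document size) over all collections.
const totalBytes = stats.reduce((sum, c) => sum + c.docs * c.avgBytes, 0);

console.log(`Estimated total: ${(totalBytes / 1e9).toFixed(2)} GB`);
```

Under these assumptions the whole data set comes to roughly 2 GB, which is small relative to the 16 GB of RAM mentioned above, so raw data volume alone probably does not explain a 30-minute run.
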

Compound indexes are created for the fields that are used in the lookups.

My aggregation looks like this:

MasterCollection.aggregate([
  {
    $project: {
      _id: 1,
      field1: 1,
      field2: 1,
      field3: 1,
    },
  },
  {
    $lookup: {
      from: 'collection1',
      localField: '_id',
      foreignField: '_id',
      as: 'collection1',
    },
  },
  {
    $lookup: {
      from: 'collection8',
      localField: '_id',
      foreignField: '_id',
      as: 'collection8',
    },
  },
  {
    $lookup: {
      from: 'collection9',
      localField: '_id',
      foreignField: '_id',
      as: 'collection9',
    },
  },
  {
    $lookup: {
      from: 'collection2',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection2: 1,
          },
        },
      ],
      as: 'collection2',
    },
  },
  {
    $lookup: {
      from: 'collection3',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection3: 1,
          },
        },
      ],
      as: 'collection3',
    },
  },
  {
    $lookup: {
      from: 'collection4',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection4: 1,
          },
        },
      ],
      as: 'collection4',
    },
  },
  {
    $lookup: {
      from: 'collection5',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection5: 1,
          },
        },
      ],
      as: 'collection5',
    },
  },
  {
    $lookup: {
      from: 'collection6',
      let: {
        field1Id: '$field1',
        field3Id: '$field3',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field3', '$$field3Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection6: 1,
          },
        },
      ],
      as: 'collection6',
    },
  },
  {
    $lookup: {
      from: 'collection7',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection7: 1,
          },
        },
      ],
      as: 'collection7',
    },
  },
  // $unwind stages: one for each joined collection (omitted for brevity)
  {
    $project: {
      _id: 1,
      // project from each collection
    },
  },
  {
    $merge: {
      into: 'MasterCollection',
      on: '_id',
      whenMatched: 'merge',
      whenNotMatched: 'discard',
    },
  },
], { allowDiskUse: true })

Do you have any suggestions on how to improve this aggregation?

I have already tried adjusting the indexes, using $facet, and splitting the whole aggregation into chunks with $skip and $limit; none of these improved the execution time.
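One variation on the chunking attempt, sketched here under assumptions rather than as a tested fix: batching by _id range with a leading $match instead of $skip and $limit, since $skip still has to walk past the skipped documents on every run. The helper name idRangeMatches and the numeric boundaries are hypothetical; the question does not say what type the _id values have.

```javascript
// Hypothetical helper: turns a sorted list of _id boundaries into one
// $match stage per batch, covering the whole _id range without gaps.
function idRangeMatches(boundaries) {
  const stages = [];
  for (let i = 0; i <= boundaries.length; i++) {
    const range = {};
    if (i > 0) range.$gte = boundaries[i - 1]; // lower bound, inclusive
    if (i < boundaries.length) range.$lt = boundaries[i]; // upper bound, exclusive
    // With no boundaries at all, fall back to a match-everything stage.
    const filter = Object.keys(range).length > 0 ? { _id: range } : {};
    stages.push({ $match: filter });
  }
  return stages;
}

// Placeholder boundaries; real values would come from the actual _id values.
const batches = idRangeMatches([250000, 500000, 750000]);
console.log(JSON.stringify(batches));
```

Each batch would then run as its own aggregation, with the corresponding $match stage placed first, ahead of the $project and $lookup stages.
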

edited Jul 10, 2024 at 14:23

aneroid

asked Jul 10, 2024 at 13:01

Boros Gergo

Refactor your schema to merge all the collections into one. Lookups/joins are costly in MongoDB. You should reconsider why the data is scattered across different collections in the first place.

– ray
Commented Jul 10, 2024 at 14:22

For the collections whose _id consists of two fields, you can try combining them into a single variable for the match: let: { collId: { field1: "$field1", field2: "$field2" } } and then $match: { $expr: { $eq: ["$_id", "$$collId"] } }. This may improve performance by using the _id index. Otherwise, I think the index is only used partially for field1, followed by a scan for field2.

– aneroid
Commented Jul 10, 2024 at 15:44

Review your database design! MongoDB is not a relational database; some NoSQL databases do not support joins at all. Typically, an application has far fewer collections than the equivalent application running on a relational RDBMS has tables.

– Wernfried Domscheit
Commented Jul 10, 2024 at 18:22

@ray Thanks for the suggestion. I managed to get rid of collection9 and include that field in the MasterCollection. However, the data in the other collections represents different counts. Initially we wrote these counts back to the MasterCollection, but the performance was much worse.

– Boros Gergo
Commented Jul 11, 2024 at 7:04

@aneroid Thanks for the suggestion, I'll look into this.

– Boros Gergo
Commented Jul 11, 2024 at 7:05
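aneroid's suggestion in the comments, matching the whole compound _id at once instead of its sub-fields, could be sketched roughly as follows. The helper name compoundIdLookup is hypothetical, and the sketch assumes the foreign _id is stored as an embedded document whose keys appear in exactly the order given. Variables declared in let are only visible inside $expr, so the match is written with $expr and $eq. Whether this actually improves index use would still need to be confirmed with explain().

```javascript
// Hypothetical helper: builds one $lookup stage that compares the foreign
// compound _id as a whole document instead of field by field.
function compoundIdLookup(from, idFields, projectedField) {
  // Build e.g. { field1: '$field1', field2: '$field2' } in the given order.
  // Embedded-document equality is sensitive to field order, so idFields must
  // list the keys in the order they are stored inside the foreign _id.
  const collId = {};
  for (const name of idFields) {
    collId[name] = '$' + name;
  }
  return {
    $lookup: {
      from,
      let: { collId },
      pipeline: [
        // Variables from `let` can only be referenced inside $expr.
        { $match: { $expr: { $eq: ['$_id', '$$collId'] } } },
        { $project: { _id: 0, [projectedField]: 1 } },
      ],
      as: from,
    },
  };
}

// Example: a replacement for the collection4 lookup in the question.
const stage = compoundIdLookup('collection4', ['field1', 'field2'], 'fieldFromCollection4');
console.log(JSON.stringify(stage, null, 2));
```

The same helper could generate the stages for collection2 through collection7, passing ['field1', 'field3'] for collection6, which would also remove most of the repetition in the pipeline.
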
