MongoDB Grouping and counting by composite field

Question

Here's what my records look like (obtained with collection.find().limit(1)):

[
  {
    "_id": {"$oid": "..."},
    "husband.firstName": "John",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "...": "..."
  }
]

The husband and wife fields only contain firstName and secondName.

I want to count how often each husband-and-wife name combination occurs.

I imagine the results in the form of something like:

[
  {
    "husband.firstName": "John",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "count": "456"
  },
  {
    "husband.firstName": "Jack",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "count": "123"
  }
]

I'm using Python and pymongo so I have tried the following:

pipeline = [
    {
        "$group": {
            "_id": {
                "husband": "$husband",
                "wife": "$wife"
            },
            "count": {"$sum": 1}
        }
    },
    {
        "$sort": {"count": -1}
    },
]

However, this returns a single group with an empty _id:

[
  {
    "_id": "{}",
    "count": 47553
  }
]

I also tried grouping by the fields separately, but the result was the same.

The problem is your field format: the field names contain dots, so you don't actually have $husband or $wife fields. Compare mongoplayground.net/p/RDHwPfEWNpW and mongoplayground.net/p/0uyZnMcymNc
– cmgchess, Commented Jul 29, 2024 at 16:59
If your data really has dots in the field names, you should change it, since that is not recommended.
– cmgchess, Commented Jul 29, 2024 at 17:10
If you have no real choice, you might have to use helpers such as $getField: mongoplayground.net/p/ABFRPQ4_S7A (see mongodb.com/docs/manual/core/dot-dollar-considerations)
– cmgchess, Commented Jul 29, 2024 at 17:18
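
To make the point in the comments concrete, here is a rough pure-Python illustration of why everything lands in one group. It is not how the server is implemented: resolve_path is a hypothetical helper that only mimics the way a field path such as "$husband" is looked up, and the sample documents are made up to match the shape in the question.

```python
from collections import Counter

# Two sample documents shaped like the ones in the question (values are illustrative).
docs = [
    {"husband.firstName": "John", "husband.secondName": "Smith",
     "wife.firstName": "Alice", "wife.secondName": "Watson"},
    {"husband.firstName": "Jack", "husband.secondName": "Smith",
     "wife.firstName": "Alice", "wife.secondName": "Watson"},
]

def resolve_path(doc, path):
    """Hypothetical helper: walk a dotted path through nested dicts,
    roughly the way "$a.b" is treated as field b inside embedded document a."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the path does not exist in this document
        current = current[part]
    return current

# Grouping on "$husband" and "$wife": neither key exists in any document,
# so every document produces the same (missing, missing) group key.
group_keys = Counter(
    (resolve_path(d, "husband"), resolve_path(d, "wife")) for d in docs
)
```

Both documents end up under the same key, which corresponds to the single group with the empty _id shown in the question. Note that resolve_path(docs[0], "husband.firstName") also finds nothing, because the path is split at the dot before any lookup happens.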

ray · Accepted Answer · 2024-07-29 17:14:07Z

2

As pointed out in the comments by @cmgchess, the main difficulty in your case is the dots in your field names. You may want to refactor your schema to something like this:

{
  "husband": {
    "firstName": "John",
    "secondName": "Smith"
  },
  "wife": {
    "firstName": "Alice",
    "secondName": "Watson"
  }
}
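
If you do go the refactoring route, the conversion itself could be sketched in Python along these lines. nest_dotted_keys is a hypothetical helper (not part of pymongo), and it assumes no document has both a plain "husband" key and a dotted "husband.firstName" key at the same time.

```python
def nest_dotted_keys(doc):
    """Return a copy of doc in which keys such as "husband.firstName"
    become nested dicts: {"husband": {"firstName": ...}}.
    Keys without a dot are copied unchanged."""
    nested = {}
    for key, value in doc.items():
        parts = key.split(".")
        target = nested
        # Create (or reuse) one nested dict per leading path segment.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested

# Illustrative document in the current (dotted) shape.
flat_doc = {
    "husband.firstName": "John",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
}
nested_doc = nest_dotted_keys(flat_doc)
```

Once the nested documents have been written back (for example one replace_one call per document, not shown here), grouping on "$husband" and "$wife" as in the original pipeline should behave as expected.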

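Nevertheless, for your current schema, you can work around it with $getField (available since MongoDB 5.0). A path such as "$husband.firstName" is read as the firstName field inside an embedded husband document, so it never matches a top-level key that literally contains a dot; $getField looks the key up by its exact name.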

db.collection.aggregate([
  {
    "$group": {
      "_id": {
        "husband": {
          "firstName": {
            "$getField": {
              "field": "husband.firstName",
              "input": "$$ROOT"
            }
          },
          "secondName": {
            "$getField": {
              "field": "husband.secondName",
              "input": "$$ROOT"
            }
          }
        },
        "wife": {
          "firstName": {
            "$getField": {
              "field": "wife.firstName",
              "input": "$$ROOT"
            }
          },
          "secondName": {
            "$getField": {
              "field": "wife.secondName",
              "input": "$$ROOT"
            }
          }
        }
      },
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$sort": {
      "count": -1
    }
  }
])
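
Since the question uses Python and pymongo, the same pipeline can be written as plain Python dicts. This is a minimal sketch: get_field and flatten_group are hypothetical helpers introduced here for illustration, and the collection object is assumed to exist already. flatten_group reshapes each result into the flat form asked for in the question.

```python
def get_field(name):
    """Build a $getField expression that reads a top-level key whose
    name contains a dot (hypothetical helper, not part of pymongo)."""
    return {"$getField": {"field": name, "input": "$$ROOT"}}

pipeline = [
    {
        "$group": {
            "_id": {
                "husband": {
                    "firstName": get_field("husband.firstName"),
                    "secondName": get_field("husband.secondName"),
                },
                "wife": {
                    "firstName": get_field("wife.firstName"),
                    "secondName": get_field("wife.secondName"),
                },
            },
            "count": {"$sum": 1},
        }
    },
    {"$sort": {"count": -1}},
]

def flatten_group(group):
    """Turn one aggregation result into the flat shape from the question,
    e.g. {"husband.firstName": "John", ..., "count": 456}."""
    flat = {
        f"{person}.{part}": value
        for person, names in group["_id"].items()
        for part, value in names.items()
    }
    flat["count"] = group["count"]
    return flat

# With a pymongo collection (assumed to be named `collection`):
# results = [flatten_group(g) for g in collection.aggregate(pipeline)]
```

The count comes back as a number rather than a string, so convert it afterwards if you really need the quoted form.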

1 Comment

cmgchess Over a year ago

You can even shorten the $getField a bit: mongoplayground.net/p/ABFRPQ4_S7A
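
One way to shorten it, sketched in Python, is the shorthand form of $getField, which takes just the field name and reads from the current document. I have not verified that this is exactly what the linked playground does; PEOPLE, NAME_PARTS and short_pipeline are names made up for this example.

```python
PEOPLE = ("husband", "wife")
NAME_PARTS = ("firstName", "secondName")

# Shorthand $getField: only the field name is given, and the value is
# read from the document currently being processed.
group_id = {
    person: {part: {"$getField": f"{person}.{part}"} for part in NAME_PARTS}
    for person in PEOPLE
}

short_pipeline = [
    {"$group": {"_id": group_id, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# With a pymongo collection (assumed to be named `collection`):
# results = list(collection.aggregate(short_pipeline))
```

Building the _id with a comprehension also keeps the pipeline short if more name parts are added later.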
