179

Is there a query for calculating how many distinct values a field contains in DB.

f.e I have a field for country and there are 8 types of country values (spain, england, france, etc...)

If someone adds more documents with a new country I would like the query to return 9.

Is there easier way then group and count?

3

10 Answers 10

305

MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

There is a shell db.collection.distinct() helper as well:

> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]

> db.countries.distinct('country').length
4

As noted in the MongoDB documentation:

Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.

Sign up to request clarification or add additional context in comments.

8 Comments

this doesn't really work if your number of distinct values is too high... if you were looking at distinct names of people in the world or something. do you have an answer that scales?
1+ for length. i was struggling to find something like that. Thanks.
I don´t know why they don´t use count() there as well
@MarianKlühspies - because it's just a javascript array, which uses the length property to count the number of elements.
This doesn't work when you have millions or billions of records.....
|
172

Here is example of using aggregation API. To complicate the case we're grouping by case-insensitive words from array property of the document.

db.articles.aggregate([
    {
        $match: {
            keywords: { $not: {$size: 0} }
        }
    },
    { $unwind: "$keywords" },
    {
        $group: {
            _id: {$toLower: '$keywords'},
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);

that give result such as

{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }

5 Comments

Logged in just to + this answer. Thanks! btw if you are doing it on a unique field, just remove the unwind line.
@RichieRich, unwind is necessary because the code is grouping individual values of an array field which matches how distinct works.
@Paul what Richie said is that if the grouping is done just "regular" field (string, int etc.) then you don't need the unwind step. Isn't it correct?
@guyarad unwind is necessary when working with arrays.
+1 for the answer, exactly the thing I was working on, however distinct has its own charms but this is just gold :) -- anyhow I have to read more about aggregates to achieve desired set of results to filter data
34

With MongoDb 3.4.4 and newer, you can leverage the use of $arrayToObject operator and a $replaceRoot pipeline to get the counts.

For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:

db.users.aggregate([
    { "$group": {
        "_id": { "$toLower": "$role" },
        "count": { "$sum": 1 }
    } },
    { "$group": {
        "_id": null,
        "counts": {
            "$push": { "k": "$_id", "v": "$count" }
        }
    } },
    { "$replaceRoot": {
        "newRoot": { "$arrayToObject": "$counts" }
    } }    
])

Example Output

{
    "user" : 67,
    "superuser" : 5,
    "admin" : 4,
    "moderator" : 12
}

1 Comment

This is not the answer to the question, but it is helpful nonetheless. I wonder how this performs compared to .distinct().
23

I wanted a more concise answer and I came up with the following using the documentation at aggregates and group

db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])

2 Comments

@Kr1 how to add the value of each country to get the total?
thank you vert much exactly what I need and clean to read.
10

You can leverage on Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or programmatically, if you're coding in Node.js/io.js too.

Sample

For each distinct value of field counts the occurrences in documents optionally filtered by query

> db.users.distinctAndCount('name', {name: /^a/i})

{
  "Abagail": 1,
  "Abbey": 3,
  "Abbie": 1,
  ...
}

The field parameter could be an array of fields

> db.users.distinctAndCount(['name','job'], {name: /^a/i})

{
  "Austin,Educator" : 1,
  "Aurelia,Educator" : 1,
  "Augustine,Carpenter" : 1,
  ...
}

4 Comments

how would i import this in node?
require("./script.js"), i suppose
right, but I wasn't able to get the functions inside. How do I use them. They are defined as db.protoptype.distinctAndCount
There's a how-to section in the repo's readme (RTFM!1!!1!) basically, put the .mongorc.jsfile into your home dir. Done.
10

To find distinct in field_1 in collection but we want some WHERE condition too than we can do like following :

db.your_collection_name.distinct('field_1', {WHERE condition here and it should return a document})

So, find number distinct names from a collection where age > 25 will be like :

db.your_collection_name.distinct('names', {'age': {"$gt": 25}})

Hope it helps!

Comments

10

I use this query:

var collection = "countries"; var field = "country"; 
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})

Output:

countries, England: 3536
countries, France: 238
countries, Australia: 1044
countries, Spain: 16

This query first distinct all the values, and then count for each one of them the number of occurrences.

4 Comments

Can you please tell me how to write this same query in php laravel?
what is host here in this query ?
@HeenaPatil Good catch! I had two bugs in the query, I fixed it now. The hosts was the name of my collection in my db... sorry for that. The other issue that I also fixed tried to call db.collection which I fixed to db[collection]. Thanks!
that is what I am looking for, but I was hoping there is a one line query to handle it, Thanks anyway.
4

If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:

db.users.aggregate([
  { $group: { _id: '$country' } },
  { $count: 'countOfUniqueCountries' }
]);

Comments

1

Case 1 If you are using find() then the solution for that is:

const getDistinctCountryCount = async (req, res) => {
  const countryCount = await(await db.model.distinct('country')).length;
  console.log("countryCount: ", countryCount);
}

You have to first await the query result, then await to get the length of it.

Case 2 If you are using Aggregate then the solution for that is:

db.model.aggregate([
  { $group: { _id: '$country' } },
  { $count: 'countOfDistinchCountries' }
])

Comments

1

Shortest query I've found to get a sorted count of all distincts:

db.myCollection.aggregate([{
    "$group": {"_id": "$myField", "count":{"$sum": 1}}
  },
  { "$sort": {"count": -1}}
])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.