0

I'm using MongoDB version 4.0.0 with DocumentDB.

I have a collection called transactions. That collection has this two indices:

{
  "creation_date": -1
}
{
  "project_id": 1,
  "mid.id": 1  
}

I'm running the following query:

db.transactions.explain('executionStats').find({
  "project_id": "1",
  "mid.id": {
    $in: [/* array with 47 ids */]
  }
}).sort({creation_date: -1}).skip(0).limit(50)

The execution stats for that code are the following:

{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "myDB.transactions",
    "winningPlan": {
      "stage": "SUBSCAN",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "inputStage": {
          "stage": "IXSCAN",
          "indexName": "creation_date_idx",
          "direction": "forward"
        }
      }
    }
  },
  "executionStats": {
    "executionSuccess": true,
    "executionTimeMillis": "445731.524",
    "planningTimeMillis": "1.487",
    "executionStages": {
      "stage": "SUBSCAN",
      "nReturned": "44",
      "executionTimeMillisEstimate": "445729.856",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "nReturned": "44",
        "executionTimeMillisEstimate": "445729.787",
        "inputStage": {
          "stage": "IXSCAN",
          "nReturned": "44",
          "executionTimeMillisEstimate": "445729.765",
          "indexName": "creation_date_idx",
          "direction": "forward"
        }
      }
    }
  },
  "serverInfo": {
    "host": "127.0.0.1",
    "port": 27017,
    "version": "4.0.0"
  },
  "ok": 1,
  "operationTime": {
    "$timestamp": "7231419170506670081"
  }
}

It chose the creation_date index and took 445 seconds to perform the query!

If I try, instead, to hint the query to force using the project_id and mid.id index, here is the winning plan:

{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "myDB.transactions",
    "winningPlan": {
      "stage": "SUBSCAN",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "inputStage": {
          "stage": "SORT",
          "sortPattern": {
            "creation_date": -1
          },
          "inputStage": {
            "stage": "IXSCAN",
            "indexName": "project_id_mid.id_idx",
            "direction": "forward"
          }
        }
      }
    }
  },
  "executionStats": {
    "executionSuccess": true,
    "executionTimeMillis": "10.966",
    "planningTimeMillis": "1.615",
    "executionStages": {
      "stage": "SUBSCAN",
      "nReturned": "44",
      "executionTimeMillisEstimate": "9.310",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "nReturned": "44",
        "executionTimeMillisEstimate": "9.292",
        "inputStage": {
          "stage": "SORT",
          "nReturned": "44",
          "executionTimeMillisEstimate": "9.287",
          "sortPattern": {
            "creation_date": -1
          },
          "inputStage": {
            "stage": "IXSCAN",
            "nReturned": "44",
            "executionTimeMillisEstimate": "9.208",
            "indexName": "project_id_mid.id_idx",
            "direction": "forward"
          }
        }
      }
    }
  },
  "serverInfo": {
    "host": "127.0.0.1",
    "port": 27017,
    "version": "4.0.0"
  },
  "ok": 1,
  "operationTime": {
    "$timestamp": "7231419990845423617"
  }
}

It takes 10 milliseconds to perform the query. So... why is DocumentDB choosing the creation_date index by default? Is something wrong with my query?

Some other things I tried:

  • If I run the query with MongoDB Compass pointing to DocumentDB, the query works fine without me having to hint anything.
  • If I reduce the size of the mid.id array to 14 items, it works
2
  • Did you try to re-create the index? Commented May 10, 2023 at 6:00
  • @WernfriedDomscheit Yes, and nothing changes :( Commented May 10, 2023 at 6:22

1 Answer 1

3

You are forcing the query planner to choose between an index that minimized the number of documents examined, and an index that avoids an in-memory sort. Which performs better when testing the plans will depend on things like which index has the most results near the beginning of the index, and the one that avoids the in-memory sort gets a bonus for reduced memory usage.

If you create an index that will do both, like:

{
  "project_id": 1,
  "mid.id": 1,
  "creation_date": -1
}

It should win, hands down, every time.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.