I have a table Tours with a Partition Key (PK) id, which is unique, and a Global Secondary Index (GSI) on geohash, which has repeated values.

I need to get all the items whose geohash is in the array geohashNeighbors, and I am using the following query:

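// Assumes the AWS SDK for JavaScript v3 document client, since the values
// below are plain strings rather than { S: '...' } attribute values.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
import { DynamoDBDocumentClient, QueryCommand } from '@aws-sdk/lib-dynamodb'

const dynamodb = DynamoDBDocumentClient.from(new DynamoDBClient({}))

const geohashNeighbors = ['geo1', 'geo2', 'geo3']

// One Query against the GSI per geohash, all sent concurrently
const queryPromises = geohashNeighbors.map(geohash =>
    dynamodb.send(new QueryCommand({
        TableName: 'Tours',
        IndexName: 'geohash_index',
        KeyConditionExpression: 'geohash = :geohash',
        ExpressionAttributeValues: {
            ':geohash': geohash
        }
    }))
)

// One response per geohash; the matching items are in each response's Items
const results = await Promise.all(queryPromises)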

As you can see, I am sending one request per item in the array, but I would like to send only one request and get all the items at once.

This is what I tried:

  • I could not find anything like KeyConditionExpression: 'geohash IN geohashNeighbors'; it seems DynamoDB does not provide an IN-style search in key expressions like Postgres does.

  • I tried BatchGetItem, but it does not work with QueryCommand, only with GetCommand, which only works for the PK, not for the GSI.

  • I tried TransactGetItems, but it requires the PK for each item, and I do not have the id, only the geohash.

  • I tried begins_with(), but it does not work for the GSI, only the PK.

  • I do not want to run a Scan and filter the hashes; that would defeat the purpose of adding an index and would probably be slower.

I come from the relational DB world and have only a few weeks of experience with DynamoDB. I searched for hours and tried a few ideas, but either DynamoDB is lacking basic operations or I am missing something (probably the second).

1 Answer

Fundamentally, this is not how DynamoDB works. DynamoDB's "tables" and "indexes" have almost nothing in common with SQL's "tables" and "indexes". It can take a while to "unlearn" the relational mindset.


All data is stored in a partition, which can only be found by its Partition Key. There is no concept of "nearby" or "similar" partitions; you either know the Partition Key or not. (Imagine sorting your address book by the SHA-1 hash of people's names; there's no way to find "everyone named John", because the hashes will have nothing in common.)

This is true in both Tables and Secondary Indexes - the "index" is essentially a second copy of the table, with exactly the same structure. (Unlike the base Table, a Secondary Index allows multiple items to share the same key values, but that doesn't change how you access them.)

If your Table or Index has only a Partition Key, there are exactly two ways of finding an item:

  1. Query it by knowing its Partition Key
  2. Scan all items, in no particular order, testing each one against a filter

If your Table or Index has a Partition Key plus a Sort Key, there are exactly three ways of finding an item:

  1. Query it by knowing its Partition Key and Sort Key
  2. Scan all items, in no particular order, testing each one against a filter
  3. Query a single Partition, by knowing its Partition Key; then retrieve some or all items in that partition, in the order of their Sort Keys

You can run any of those in a loop, either locally, or using batch operations in the DynamoDB API; but every read or write ultimately needs to identify items using one of those methods.
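
As a rough sketch of the "run it in a loop" option, the helper below issues one Query per partition key and keeps following LastEvaluatedKey until each partition is exhausted. The names queryPartition and queryManyPartitions are made up for illustration, and sendQuery is an assumed stand-in for something like params => docClient.send(new QueryCommand(params)) with the AWS SDK v3 document client.

```javascript
// Hypothetical helper: read every item in one partition, following pagination.
// `sendQuery` is assumed to take Query parameters and resolve to a response
// shaped like DynamoDB's: { Items, LastEvaluatedKey }.
async function queryPartition(sendQuery, baseParams, keyName, keyValue) {
  const items = [];
  let ExclusiveStartKey; // undefined on the first page
  do {
    const page = await sendQuery({
      ...baseParams,
      KeyConditionExpression: '#pk = :pk',
      ExpressionAttributeNames: { '#pk': keyName },
      ExpressionAttributeValues: { ':pk': keyValue },
      ExclusiveStartKey,
    });
    items.push(...(page.Items ?? []));
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  return items;
}

// One Query per partition key, run concurrently. This is still N requests,
// not one: each Query can only ever target a single partition.
async function queryManyPartitions(sendQuery, baseParams, keyName, keyValues) {
  const perKey = await Promise.all(
    keyValues.map(value => queryPartition(sendQuery, baseParams, keyName, value))
  );
  return perKey.flat();
}
```

With the question's setup, baseParams would be { TableName: 'Tours', IndexName: 'geohash_index' } and keyName would be 'geohash'. The loop does not reduce the number of requests; it only makes sure no page of results is silently dropped.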


AWS has quite a good section of the user manual about "Data modelling for DynamoDB". The key idea is that you design your Tables and Indexes around how you're going to use the data, not its abstract structure.

So, if your use case involves finding "neighbours" according to some algorithm, you need those neighbours to be together in a single partition. That probably involves calculating a Partition Key which groups them in some way, and a Sort Key which identifies sub-sets within that group. That can be either the Partition Key + Sort Key of the main Table, or of a Secondary Index.

I don't know much about geohashes, but a quick search suggests they're designed so that geographically close items are also lexically close, so you might be able to use:

  • Partition Key: First n characters of geohash
  • Sort Key: Full geohash

Once you have that, you can use a Query to:

  • Target one partition
  • Target multiple items within that partition; ideally, a continuous range based on Sort Key

You might still need to Query two partitions because the values you want are "on the border"; but you won't need a Query for every item.
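
To make that concrete, here is a minimal sketch of how the keys could be derived and how a list of wanted geohashes could be grouped so that each partition is queried only once. The attribute names GeoHashPrefix and Geohash, the helper names, and the prefix length of 2 are all assumptions for illustration; the right prefix length depends on how your data is distributed.

```javascript
// Assumed prefix length for the Partition Key; tune it to your data.
const PREFIX_LENGTH = 2;

// Derive the two key attributes for an item from its full geohash.
function toKeys(geohash, prefixLength = PREFIX_LENGTH) {
  return {
    GeoHashPrefix: geohash.slice(0, prefixLength), // Partition Key
    Geohash: geohash,                              // Sort Key
  };
}

// Group the geohashes you want to look up by partition, so each partition
// is queried once no matter how many wanted geohashes fall inside it.
function groupByPrefix(geohashes, prefixLength = PREFIX_LENGTH) {
  const groups = new Map();
  for (const geohash of geohashes) {
    const prefix = geohash.slice(0, prefixLength);
    if (!groups.has(prefix)) groups.set(prefix, []);
    groups.get(prefix).push(geohash);
  }
  return groups;
}
```

For example, groupByPrefix(['AA123', 'AA128', 'AB123']) yields two groups ("AA" with two geohashes, "AB" with one), so three lookups become two Queries.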


For example, the data (in the table or GSI) could look like this:

GeoHashPrefix (PK)   Geohash (SK)   Something Else
AA                   AA123          blah blah
AA                   AA128          rhubarb custard
AA                   AA135          foo bar
AA                   AA200          left right
AB                   AB123          me you

Then finding all geohashes starting with "AA" would take just a single Query with a KeyConditionExpression of:

  • GeoHashPrefix = "AA"

To find only those beginning with "AA1", you'd specify an extra condition on the Sort Key within that partition:

  • GeoHashPrefix = "AA"
  • begins_with(Geohash, "AA1")

Or to find all geohashes between "AA125" and "AA135" (inclusive):

  • GeoHashPrefix = "AA"
  • Geohash BETWEEN "AA125" AND "AA135"
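
Written out as Query parameters, the three conditions above might look like the sketch below. It assumes a table named 'Tours' keyed on GeoHashPrefix and Geohash, and plain JavaScript values as accepted by the AWS SDK v3 document client; the function names are hypothetical. Each returned object would be passed to new QueryCommand(...).

```javascript
// Assumed names: table 'Tours', attributes GeoHashPrefix (PK) and Geohash (SK).
// If this key layout lives in a GSI rather than the table, add IndexName too.
const TABLE_NAME = 'Tours';

// Every item in one partition, e.g. queryByPrefix('AA')
function queryByPrefix(prefix) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: 'GeoHashPrefix = :prefix',
    ExpressionAttributeValues: { ':prefix': prefix },
  };
}

// Items in one partition whose Sort Key starts with a string,
// e.g. queryBeginsWith('AA', 'AA1')
function queryBeginsWith(prefix, start) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression:
      'GeoHashPrefix = :prefix AND begins_with(Geohash, :start)',
    ExpressionAttributeValues: { ':prefix': prefix, ':start': start },
  };
}

// Items in one partition whose Sort Key lies in an inclusive range,
// e.g. queryBetween('AA', 'AA125', 'AA135')
function queryBetween(prefix, low, high) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression:
      'GeoHashPrefix = :prefix AND Geohash BETWEEN :low AND :high',
    ExpressionAttributeValues: { ':prefix': prefix, ':low': low, ':high': high },
  };
}
```

In all three, the Partition Key condition is an equality; only the Sort Key condition may use begins_with or BETWEEN.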

A final note is that DynamoDB is not intended to be a universal data store. If the data you have is not suited for partitioning, or the access you need cannot be optimised on that basis, you should pick a different technology.

For instance, ElasticSearch / OpenSearch has optimised indexes for searching geospatial data with various operators. There are also extensions for relational databases that add "GIS" functionality, such as PostGIS for PostgreSQL, which is supported by Amazon Aurora.

7 Comments

geohashes are hierarchical from what I looked up. So DDB might be a good choice; searching on "dynamodb geohash" turns up some useful ideas. Alternatively, the OP might consider AWS Aurora which supports actual geospatial data
Yes, ElasticSearch/OpenSearch came to my mind as a possibility if what the OP actually wants is a fast geo search. I've added both suggestions as a postscript to the answer.
IMSoP, thanks for your comprehensive answer. It's fairly clear to me how DDB works; I understand how the partitions and indexes work. Leaving geohashes to the side, what I am ultimately trying to do is search the index by a series of exact matches. The solution I found was to loop and send X requests against the index, but there must be a way to send all the requests in one call, have DDB match the keys on its side, and return everything in one response. Something like BatchGetItem but for a Query, since GetItem only works for the PK. Is there anything that comes to mind?
Since, as you correctly point out, geohashes are incremental strings, the optimal solution would be to use begins_with("abcd", "abcdefg"), but begins_with() only works for the PK, not for the GSI. Do you know of something that can do the trick with a GSI? Google and AI are not giving me much hope. Although I think I understand how indexes work, I do not follow why PKs and GSIs are treated differently. In the end, a GSI is the PK of another partition with a copy of the data, right?
I'm not sure what you mean by using begins_with; using it where? As I explained in the answer, you can't find "all partitions whose key begins with..." except by scanning all items, because partition keys are hashed, not sorted. But you can make a partition for all geohashes that begin with the same string, and use begins_with or between on the Sort Key inside that partition to narrow the matches further.
I've added an example of exactly how that would look. Remember that the Query must always select a single partition but can select multiple items within that partition.
One further clarification, a GSI is not "a partition", it is a set of partitions, just like the main Table. Think of them like sets of boxes: the Table and the GSI can arrange items into different boxes, but they're always in boxes. So to find an item, you tell the API 1) which set of boxes (Table or GSI); 2) which single box to look in; 3) how to find the right item(s) in that box.
