Fundamentally, this is not how DynamoDB works. DynamoDB's "tables" and "indexes" have almost nothing in common with SQL's "tables" and "indexes". It can take a while to "unlearn" the relational mindset.
All data is stored in a partition, which can only be found by its Partition Key. There is no concept of "nearby" or "similar" partitions; you either know the Partition Key or not. (Imagine sorting your address book by the SHA-1 hash of people's names; there's no way to find "everyone named John", because the hashes will have nothing in common.)
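To make the address-book analogy concrete (this is purely an illustration, not a DynamoDB API):

```python
import hashlib

# The two "John"s share a prefix as strings, but their SHA-1 hashes share
# nothing, so there is no way to range-scan a hash-keyed store for them.
for name in ["John Smith", "John Jones", "Jane Doe"]:
    print(f"{name:12} -> {hashlib.sha1(name.encode()).hexdigest()[:12]}")
```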
That key-only lookup applies to both Tables and Secondary Indexes - the "index" is essentially a second copy of the table, with exactly the same structure. (Secondary Indexes are allowed to contain items with duplicate keys, but that doesn't change how you access them.)
If your Table or Index has only a Partition Key, there are exactly two ways of finding an item:
- Query it by knowing its Partition Key
- Scan all items, in no particular order, testing each one against a filter
If your Table or Index has a Partition Key plus a Sort Key, there are exactly three ways of finding an item (all three are sketched in code after this list):
- Query it by knowing its Partition Key and Sort Key
- Scan all items, in no particular order, testing each one against a filter
- Query a single Partition, by knowing its Partition Key; then retrieve some or all items in that partition, in the order of their Sort Keys
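As a rough sketch of those three access paths using boto3 (the table name Orders and the key names CustomerId / OrderDate are assumptions for illustration, not anything DynamoDB requires):

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

# Hypothetical table: Partition Key "CustomerId", Sort Key "OrderDate"
table = boto3.resource("dynamodb").Table("Orders")

# 1. Fetch one item by its full primary key (Partition Key + Sort Key)
item = table.get_item(
    Key={"CustomerId": "cust-42", "OrderDate": "2023-06-01"}
).get("Item")

# 2. Scan all items, filtering after the read; the filter does NOT
#    reduce read cost, because every item is still read
scanned = table.scan(FilterExpression=Attr("Total").gt(100))["Items"]

# 3. Query one partition; items come back ordered by Sort Key
orders = table.query(
    KeyConditionExpression=Key("CustomerId").eq("cust-42"),
    ScanIndexForward=True,  # ascending Sort Key order
)["Items"]
```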
You can run any of those in a loop, either locally, or using batch operations in the DynamoDB API; but every read or write ultimately needs to identify items using one of those methods.
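For example, BatchGetItem (shown here via the low-level client, with the table and key names carried over from the hypothetical sketch above) still requires the full key of every item up front; it only saves network round trips:

```python
import boto3

client = boto3.client("dynamodb")

# Each requested item must be identified by its complete primary key
response = client.batch_get_item(
    RequestItems={
        "Orders": {  # hypothetical table from the sketch above
            "Keys": [
                {"CustomerId": {"S": "cust-42"}, "OrderDate": {"S": "2023-06-01"}},
                {"CustomerId": {"S": "cust-42"}, "OrderDate": {"S": "2023-06-02"}},
            ]
        }
    }
)
items = response["Responses"]["Orders"]
```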
AWS has quite a good section in its user manual on "Data modelling for DynamoDB". The key idea is that you design your Tables and Indexes around how you're going to use the data, not around its abstract structure.
So, if your use case involves finding "neighbours" according to some algorithm, you need those neighbours to be stored together in a single partition. That probably means calculating a Partition Key which groups them in some way, and a Sort Key which identifies subsets within that group. That can be either the Partition Key + Sort Key of the main Table, or of a Secondary Index.
I don't know much about geohashes, but a quick search suggests they're designed so that geographically close items are also lexically close, so you might be able to use something like this (a helper is sketched in code after the list):
- Partition Key: First n characters of geohash
- Sort Key: Full geohash
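A minimal helper for that scheme (the prefix length n is a tuning choice for your data density, not something DynamoDB dictates, and the attribute names are just for illustration):

```python
def geohash_keys(geohash: str, prefix_len: int = 2) -> dict:
    """Derive the grouping Partition Key and full-geohash Sort Key.

    A shorter prefix means bigger partitions (fewer Queries, more items
    per Query); a longer prefix means the opposite.
    """
    return {
        "GeoHashPrefix": geohash[:prefix_len],  # Partition Key
        "GeoHash": geohash,                     # Sort Key
    }

print(geohash_keys("AA123"))  # {'GeoHashPrefix': 'AA', 'GeoHash': 'AA123'}
```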
Once you have that, you can use a Query to:
- Target one partition
- Target multiple items within that partition; ideally, a contiguous range based on the Sort Key
You might still need to Query two partitions because the values you want are "on the border"; but you won't need a Query for every item.
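A sketch of that border case, assuming the key scheme above and a hypothetical Locations table; it handles a range that straddles at most one partition boundary:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Locations")  # hypothetical name

def geohash_range(lo: str, hi: str, prefix_len: int = 2) -> list:
    """Fetch items with lo <= GeoHash <= hi, even when the range crosses
    from one prefix partition into the next (e.g. AA9.. to AB0..).
    Assumes the two endpoint prefixes are the only partitions involved."""
    prefixes = {lo[:prefix_len], hi[:prefix_len]}  # one or two partitions
    items = []
    for prefix in sorted(prefixes):
        response = table.query(
            KeyConditionExpression=Key("GeoHashPrefix").eq(prefix)
            & Key("GeoHash").between(lo, hi)
        )
        items.extend(response["Items"])
    return items
```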
For example, the data (in the table or GSI) could look like this:
| GeoHashPrefix (PK) | Geohash (SK) | Something | Else |
|--------------------|--------------|-----------|---------|
| AA                 | AA123        | blah      | blah    |
| AA                 | AA128        | rhubarb   | custard |
| AA                 | AA135        | foo       | bar     |
| AA                 | AA200        | left      | right   |
| AB                 | AB123        | me        | you     |
Then finding all geohashes starting with "AA" would be a single Query, with a KeyConditionExpression of:

GeoHashPrefix = "AA"

To find only those beginning with "AA1", you'd add an extra condition on the Sort Key within that partition:

GeoHashPrefix = "AA" AND begins_with(GeoHash, "AA1")

Or to find all geohashes between "AA125" and "AA135" (inclusive):

GeoHashPrefix = "AA" AND GeoHash BETWEEN "AA125" AND "AA135"
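In boto3, those conditions map directly onto the condition builder; a sketch, with the same assumed table name as above:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Locations")  # hypothetical name

# All geohashes starting with "AA": Partition Key alone
all_aa = table.query(
    KeyConditionExpression=Key("GeoHashPrefix").eq("AA")
)["Items"]

# Only those beginning with "AA1"
aa1 = table.query(
    KeyConditionExpression=Key("GeoHashPrefix").eq("AA")
    & Key("GeoHash").begins_with("AA1")
)["Items"]

# Between "AA125" and "AA135", inclusive
between = table.query(
    KeyConditionExpression=Key("GeoHashPrefix").eq("AA")
    & Key("GeoHash").between("AA125", "AA135")
)["Items"]
```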
A final note is that DynamoDB is not intended to be a universal data store. If the data you have is not suited for partitioning, or the access you need cannot be optimised on that basis, you should pick a different technology.
For instance, Elasticsearch / OpenSearch have optimised indexes for searching geospatial data, with support for various spatial operators. There are also extensions for relational databases that add "GIS" functionality, such as PostGIS for PostgreSQL, which is supported by AWS Aurora.