I currently have tens of thousands of items I need to fetch from DynamoDB in a single call. In my testing, with just over 16K items, it takes around 750-1000 ms for DynamoDB to return all the data. That's not terrible, but I know the item counts are going to grow, and that is a long time for a customer-facing, event-driven query.
I am running parallel tasks over batches of 100 keys (the BatchGetItem limit) using batchGetItem:
IEnumerable<string> ids;
using (var client = GetDynamoDbClient())
{
    var config = new DynamoDBContextConfig
    {
        TableNamePrefix = "test",
        Conversion = DynamoDBEntryConversion.V2,
        ConsistentRead = false,
    };
    var context = new DynamoDBContext(client, config);

    // 100 keys per batch -- the BatchGetItem limit
    var idBatches = Split(ids.ToList(), BatchRequestSizeLimit);

    // One BatchGet per chunk, all executed concurrently
    var batchTasks = idBatches.Select(async batch =>
    {
        var batchGet = context.CreateBatchGet<MyTable>();
        batch.ForEach(id => batchGet.AddKey(id));
        await batchGet.ExecuteAsync().ConfigureAwait(false);
        return batchGet.Results;
    });

    var results = await Task.WhenAll(batchTasks).ConfigureAwait(false);
}
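(For completeness: `Split` is just a chunking helper. Mine looks roughly like this; the exact implementation doesn't matter as long as it yields consecutive sub-lists of at most `batchSize` items.)

```csharp
using System.Collections.Generic;
using System.Linq;

// Chunks a list into consecutive sub-lists of at most batchSize items,
// e.g. 250 items with batchSize 100 -> lists of 100, 100, 50.
static List<List<T>> Split<T>(List<T> items, int batchSize)
{
    return items
        .Select((item, index) => new { item, index })
        .GroupBy(x => x.index / batchSize)
        .Select(g => g.Select(x => x.item).ToList())
        .ToList();
}
```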
I think I have done everything I can to make it as fast as possible; I have tried every configuration of AmazonDynamoDBConfig I can think of.
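To make that concrete, the client construction inside GetDynamoDbClient looks roughly like this (the specific property values here are placeholders, not the exact settings I've landed on):

```csharp
var clientConfig = new AmazonDynamoDBConfig
{
    RegionEndpoint = RegionEndpoint.USEast1, // illustrative region
    MaxErrorRetry = 3,                       // SDK retry count on throttles/errors
    Timeout = TimeSpan.FromSeconds(5),       // per-request HTTP timeout
};
var client = new AmazonDynamoDBClient(clientConfig);
```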
I think DynamoDB might be suffering some kind of query fatigue: in my traces I can see a large burst of simultaneous queries, then only one or two, then another burst. The image below is from Datadog, showing all the DynamoDB calls in a single trace.
Overall, has anyone tried this throughput level before, and is there anything I'm missing that would make these parallel reads faster?
Thanks
