I currently need to fetch tens of thousands of items from DynamoDB in a single call. In my testing, with just over 16K items, it takes around 750-1000 ms for DynamoDB to return all the data. That's not terrible, but I know they are going to ask for more, and that is a long time for a customer-event-driven query.

I am running the BatchGetItem requests as parallel tasks, in batches of 100 items.

        IEnumerable<string> ids;
        
        using (var client = GetDynamoDbClient())
        {
            DynamoDBContextConfig config = new DynamoDBContextConfig
            {
                TableNamePrefix = "test",
                Conversion = DynamoDBEntryConversion.V2,
                ConsistentRead = false,
            };
            
            DynamoDBContext context = new DynamoDBContext(client, config);
            var IdsBatchedLists = Split(ids.ToList(), BatchRequestSizeLimit); //100 items per batch
            var batchTasks = IdsBatchedLists.Select(async list =>
            {
                var batchRequest = context.CreateBatchGet<MyTable>();
                list.ForEach(x => batchRequest.AddKey(x));

                await batchRequest.ExecuteAsync().ConfigureAwait(false);
                return batchRequest.Results;
            });

            var results = await Task.WhenAll(batchTasks).ConfigureAwait(false);
        }
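
The `Split` helper is not shown above. As a minimal sketch of what such a helper could look like (the class name `BatchHelpers` and the exact signature are assumptions for illustration, not the code actually in use), it only needs to cut the ID list into consecutive chunks of at most `BatchRequestSizeLimit` items:

```csharp
using System;
using System.Collections.Generic;

public static class BatchHelpers
{
    // Hypothetical stand-in for the Split helper, which the question does not show.
    // Splits the source list into consecutive batches of at most batchSize items.
    public static List<List<T>> Split<T>(List<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        for (int i = 0; i < source.Count; i += batchSize)
        {
            // The last batch may be shorter than batchSize.
            int count = Math.Min(batchSize, source.Count - i);
            batches.Add(source.GetRange(i, count));
        }
        return batches;
    }
}
```

With 16,000 IDs and a batch size of 100, this would yield 160 batches, so the `Select` above would start 160 batch requests.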

I think I have done everything I can to make it as fast as possible (I have tried every configuration I can think of for AmazonDynamoDBConfig).

I suspect DynamoDB might be suffering from some kind of query fatigue. In my traces, I see a large number of simultaneous queries, then only one or two, then another group. The image is from DataDog and shows all the DynamoDB calls in a single trace.

[Image: DataDog trace showing the DynamoDB calls]

Overall, has anyone tried this level of throughput before, and is there anything I could be missing that would make the parallel fetch faster?

Thanks

  • 16k records in a second comes out to about 0.06 ms per record. That's really very good. I'd look at why you need 16k records for your event, as that seems huge. Commented May 12 at 16:18
  • Mostly due to design failures in other areas. Commented May 12 at 16:19
  • You may have to look at something like DAX to speed things up then. I've not used it, but in general it sounds like you need an in-memory cache. Commented May 12 at 16:24
  • Do you have the available compute for that level of parallelism? If you're not being throttled by DynamoDB, I'd suspect your slowness is caused by queuing threads. Commented May 12 at 17:19
  • Yeah, the system has enough juice for it. Any limiting on the parallelization just slows the process down more. Commented May 12 at 17:58

1 Answer

I'd agree that retrieving 16K items at once from DDB isn't a good use case.

But as far as the behavior you are seeing, I suspect the issue is that you're not actually running in parallel.

Try using Parallel.ForEachAsync to spread the tasks across multiple threads. By default it uses one per CPU, but that is controllable with

ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 3
};

You should end up with something like this:

await Parallel.ForEachAsync(batchRequestList, parallelOptions, async (batchRequest, cancellationToken) =>
{
    // ExecuteAsync returns a plain Task; the items end up in batchRequest.Results
    await batchRequest.ExecuteAsync(cancellationToken).ConfigureAwait(false);
    var response = batchRequest.Results;
    // do something to store the response
    // note that multiple threads will be involved,
    // so look at System.Collections.Concurrent
    // maybe ConcurrentBag?
});
// Should have all the results at this point
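
To illustrate the "store the response" part, here is a minimal, self-contained sketch of collecting results from `Parallel.ForEachAsync` into a `ConcurrentBag`. It assumes .NET 6 or later. `FetchBatchAsync` is a hypothetical stand-in that only simulates a round trip and is not part of the AWS SDK. In real code you would call `ExecuteAsync` on the batch request there and read its `Results`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelFetchSketch
{
    // Hypothetical stand-in for one batch round trip; it only simulates latency.
    private static async Task<List<string>> FetchBatchAsync(List<string> ids, CancellationToken ct)
    {
        await Task.Delay(10, ct).ConfigureAwait(false);
        return ids.Select(id => "item-" + id).ToList();
    }

    // Runs at most maxParallelism batches at a time and gathers every item.
    public static async Task<List<string>> FetchAllAsync(
        IEnumerable<List<string>> batches, int maxParallelism)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        await Parallel.ForEachAsync(batches, options, async (batch, ct) =>
        {
            var items = await FetchBatchAsync(batch, ct).ConfigureAwait(false);
            foreach (var item in items)
            {
                results.Add(item);
            }
        }).ConfigureAwait(false);

        return results.ToList();
    }
}
```

Note that `ConcurrentBag` does not preserve insertion order, so if the order of the items matters you would need to sort afterwards or key the results by ID.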

3 Comments

What about GSIs? If the 16K items had the same one, would it be faster?
I don't think it'd be faster, but possibly a bit easier, as DDB Scan has built-in support for parallelization.
I gave it a shot, and yeah, it's actually slower since it's sequential rather than parallel. I don't want to do any of the synthetic sharding that would allow pulling in parallel.
