Is your feature request related to a problem? Please describe.
Writing to the online store (DynamoDB) currently uses a single thread, so writing billions of records takes extremely long (on the order of 12+ hours).
Describe the solution you'd like
The ability to parallelize writes to the online store across multiple threads, and ideally across multiple nodes.
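To make the request concrete, here is a rough sketch of the kind of threaded write path I have in mind. It is not the project's current API: `parallel_write`, `chunked`, and the `write_batch` callable are hypothetical names, and `write_batch` stands in for whatever actually sends one batch to DynamoDB (for example, a thin wrapper around a boto3 batch write). The default batch size of 25 matches the per-request item limit of DynamoDB's BatchWriteItem. The number of in-flight batches is capped so the input is consumed lazily instead of being held in memory all at once.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` rows without materializing the input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parallel_write(
    rows: Iterable[T],
    write_batch: Callable[[List[T]], int],  # hypothetical: writes one batch, returns item count
    batch_size: int = 25,                   # BatchWriteItem accepts at most 25 items per request
    max_workers: int = 16,
    max_in_flight: int = 64,
) -> int:
    """Write rows in batches from a thread pool, keeping a bounded number of batches queued."""
    written = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(rows, batch_size):
            # Block until at least one batch finishes once the window is full,
            # so memory use stays proportional to max_in_flight * batch_size.
            if len(pending) >= max_in_flight:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                written += sum(f.result() for f in done)
            pending.add(pool.submit(write_batch, batch))
        # Drain whatever is still running; result() re-raises any write error.
        written += sum(f.result() for f in pending)
    return written
```

Threads seem like a reasonable fit here because the writes are network-bound, and unlike a process pool they do not need a copy of the data per worker. Retries for throttled or unprocessed items would have to live inside `write_batch`, and spreading the work across nodes would need partitioning of the input on top of this.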
Describe alternatives you've considered
I tried using multiprocessing.Pool to parallelize writes on a single node with 96 cores, but I am hitting memory limits.
Historically, I have used AWS Glue to ingest data from S3->DynamoDB (using DynamoDB as a sink), and it works quite well.
Additional context
N/A