Add Streaming module #1085
Open
Labels
feature-parity, feature-request, needs-more-information
To close the feature gap with Powertools for Python, we should add a streaming module. This would let us handle datasets larger than the available memory as streaming data, for instance, transforming CSVs on the fly.
Within Lambda, processing S3 objects larger than the allocated memory can lead to out-of-memory errors or timeouts. For cost efficiency, your S3 objects may be encoded and compressed in various formats (gzip, CSV, zip files, etc.), which increases the amount of non-business logic you have to write and adds reliability risks.
The Streaming utility makes this process easier by fetching parts of your data as you consume it and transparently applying data transformations to the data stream. This lets you process one, a few, or all rows of a large dataset while consuming only a few MB of memory.
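To make the idea concrete, here is a minimal sketch of the underlying technique using only the Python standard library. It is not the Powertools API: the function name `stream_gzip_csv_rows` is hypothetical, and an in-memory `io.BytesIO` stands in for the body of an S3 object. The point is only that decompression and CSV parsing can be layered over a byte stream so that rows are produced as they are consumed.

```python
import csv
import gzip
import io
from typing import BinaryIO, Dict, Iterator


def stream_gzip_csv_rows(raw: BinaryIO) -> Iterator[Dict[str, str]]:
    """Lazily yield CSV rows from a gzip-compressed binary stream.

    Hypothetical helper for illustration only; not part of any Powertools release.
    """
    # GzipFile decompresses incrementally as bytes are read from it.
    with gzip.GzipFile(fileobj=raw, mode="rb") as decompressed:
        # TextIOWrapper decodes bytes to text in buffered chunks;
        # newline="" is the setting the csv module expects.
        text = io.TextIOWrapper(decompressed, encoding="utf-8", newline="")
        # DictReader pulls one record at a time from the text layer.
        yield from csv.DictReader(text)


# Assumption: this in-memory buffer stands in for a large remote object body.
payload = gzip.compress(b"id,name\n1,alpha\n2,beta\n3,gamma\n")

rows = stream_gzip_csv_rows(io.BytesIO(payload))
first_row = next(rows)  # only the first record is parsed at this point
rows.close()            # stop early; the remaining rows are never parsed
print(first_row)
```

Because the function is a generator, a caller that needs only the first row can stop after one `next()` call. A real implementation would additionally need a seekable or range-fetching reader over S3 so that the raw bytes themselves are fetched on demand.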
Python version: https://awslabs.github.io/aws-lambda-powertools-python/2.4.0/utilities/streaming/