
Conversation

@graytaylor0
Member

Description

Adds documentation for the new DynamoDB source metrics, as well as the limitation that each Data Prepper instance can process at most 150 DynamoDB stream shards in parallel.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following the Developer Certificate of Origin and signing off on your commits, please check here.

…d limitation on shard processing

Signed-off-by: Taylor Gray <tylgry@amazon.com>
@github-actions

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.


## Limitations

* Each Data Prepper instance is limited to process a maximum of 150 DynamoDB stream shards in parallel at one time. You should configure the number of Data Prepper instances as ceil(totalOpenShards.max / 150) to prevent high latency or data loss.
Member

Can we improve this? `ceil(totalOpenShards.max / 150)`

Collaborator

@kolchfa-aws Dec 19, 2025

Here's a rewrite suggestion:

Each Data Prepper instance can process up to 150 DynamoDB stream shards in parallel. To prevent high latency and data loss, set the number of Data Prepper instances to the maximum number of open shards divided by 150 (rounded up to the nearest integer).

Member

@dlvenable left a comment

Technically, this looks good. But, I think we should improve the way we describe the number of instances to use.

Collaborator

@kolchfa-aws left a comment

Thanks, @graytaylor0! Please see my suggestions and let me know if you have any questions.


### Gauges

* `totalOpenShards`: The number of open shards found on the DynamoDB stream. Open shards are shards that do not have an EndingSequenceNumber yet.
Collaborator

Suggested change
* `totalOpenShards`: The number of open shards found on the DynamoDB stream. Open shards are shards that do not have an EndingSequenceNumber yet.
The `dynamodb` source includes the following gauges:
* `totalOpenShards`: The number of open shards in the DynamoDB stream. Open shards are shards that are not assigned an `EndingSequenceNumber`.

### Gauges

* `totalOpenShards`: The number of open shards found on the DynamoDB stream. Open shards are shards that do not have an EndingSequenceNumber yet.
* `activeShardsInProcessing`: The number of shards that are currently being processed by Data Prepper
Collaborator

Suggested change
* `activeShardsInProcessing`: The number of shards that are currently being processed by Data Prepper
* `activeShardsInProcessing`: The number of shards currently being processed by Data Prepper.


## Limitations

* Each Data Prepper instance is limited to process a maximum of 150 DynamoDB stream shards in parallel at one time. You should configure the number of Data Prepper instances as ceil(totalOpenShards.max / 150) to prevent high latency or data loss.
Collaborator

Suggested change
* Each Data Prepper instance is limited to process a maximum of 150 DynamoDB stream shards in parallel at one time. You should configure the number of Data Prepper instances as ceil(totalOpenShards.max / 150) to prevent high latency or data loss.
Note the following limitations:
* Each Data Prepper instance can process up to 150 DynamoDB stream shards in parallel. To prevent high latency and data loss, set the number of Data Prepper instances to the maximum number of open shards divided by 150 (rounded up to the nearest integer).
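The sizing rule in the suggestion above is a simple ceiling division. As a quick illustration (a hypothetical helper, not part of Data Prepper itself), the calculation could be sketched as:

```python
import math

# Documented limit: maximum number of DynamoDB stream shards a single
# Data Prepper instance can process in parallel.
MAX_SHARDS_PER_INSTANCE = 150

def required_instances(max_open_shards: int) -> int:
    """Return the number of Data Prepper instances needed to cover the
    peak number of open shards (the totalOpenShards.max metric value)."""
    return math.ceil(max_open_shards / MAX_SHARDS_PER_INSTANCE)

print(required_instances(150))  # 1 instance covers up to 150 shards
print(required_instances(151))  # 2 instances once the limit is exceeded
print(required_instances(400))  # 3 instances for 400 open shards
```

For example, a stream that peaks at 400 open shards needs ceil(400 / 150) = 3 instances.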

@kolchfa-aws added the Doc review (PR: Doc review in progress) and backport 3.4 labels Dec 19, 2025