-
Notifications
You must be signed in to change notification settings - Fork 696
Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: stu/fix_job_retry
Are you sure you want to change the base?
Conversation
Per reviewer feedback, publishRetryMessage is now called at the end of the processing loop for each message that is NOT marked as invalid. This prevents duplicate retry messages for the same event, since invalid messages already go back to the SQS queue for retry. Key changes: - Track validMessagesForRetry separately from all messages - Only messages that pass job queued check are added to validMessagesForRetry - publishRetryMessage is called after runner creation, not before - Messages marked as invalid (e.g., max runners reached, creation failed) are excluded from retry message publishing Tests updated to reflect new behavior: - publishRetryMessage is called AFTER runner creation - Messages marked invalid do not get retry messages published - All test scenarios updated with proper mock runner creation Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Address code review feedback: - Use Set for invalidMessageIds to improve lookup performance from O(n) to O(1) - Extract duplicate retry message publishing logic into helper function - Maintain both invalidMessages array (for return value) and Set (for fast lookups) This improves performance when processing large message batches and makes the code more maintainable by eliminating duplication. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty? |
Simplified the retry message logic based on reviewer feedback: - Renamed `validMessagesForRetry` to `queuedMessages` (clearer intent) - Renamed `invalidMessageIds` to `rejectedMessageIds` (more accurate) - Removed `invalidMessages` array duplication - now using only Set - Removed `publishRetryMessagesForValid` helper function (inline instead) - Convert Set to array only at return statement Result: clearer naming, less complexity, same functionality. Net reduction of 6 lines while maintaining all tests passing. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
You're right! I've simplified the code: Naming improvements:
Complexity reduction:
Fixed in commit b5399f6. |
|
@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main |
Remove branch restriction from lambda.yml workflow to support stacked PRs and feature branch workflows. The workflow will now run on any PR that modifies lambda code or the workflow itself. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Updated the lambda workflow to run on all PRs by removing the Fixed in commit cba7bbf. |
|
|
||
| it('calls publishRetryMessage for each valid message when job is queued', async () => { | ||
| const messages = createTestMessages(3); | ||
| mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef']); // Create all requested runners |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot why is this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ensures all 3 requested runners are created successfully. With the new behavior, publishRetryMessage is only called for messages that aren't rejected. The default mock only creates 1 runner, so when 3 are requested, 2 messages would be marked rejected (creation failed) and wouldn't get retry messages. By mocking all 3 runners, we test that retry messages are published for all successfully processed messages.
| }); | ||
|
|
||
| it('calls publishRetryMessage for each message in a multi-runner scenario', async () => { | ||
| mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef', 'i-11111', 'i-22222']); // Create all requested runners |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot why this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason - ensures all 5 requested runners are created so no messages are marked rejected. This tests that publishRetryMessage is called for each message when runner creation succeeds.
| it('calls publishRetryMessage before runner creation', async () => { | ||
| it('calls publishRetryMessage after runner creation', async () => { | ||
| const messages = createTestMessages(1); | ||
| mockCreateRunner.mockResolvedValue(['i-12345']); // Create the requested runner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot why this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensures the requested runner is created successfully. This test verifies the order of operations (runner creation before retry message publishing), and needs a successful creation to ensure the retry message is actually published.
Per reviewer feedback,
publishRetryMessagewas being called before runner creation, causing duplicate retry attempts for messages that would be marked invalid. Invalid messages already return to the SQS queue for retry, so publishing retry messages for them creates two retry mechanisms for the same event.Changes
queuedMessagesarray populated only with messages that pass job queued checkpublishRetryMessageafter runner creation, not beforerejectedMessageIdsSet (max runners reached, creation failed)Set<string>for O(1) rejected message lookups instead of O(n) array includesrejectedMessageIdsSet as source of truth, converted to array only at returnExample
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.