Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966

Copilot · 2025-12-18T20:42:58Z

Per reviewer feedback, publishRetryMessage was being called before runner creation, causing duplicate retry attempts for messages that would be marked invalid. Invalid messages already return to the SQS queue for retry, so publishing retry messages for them creates two retry mechanisms for the same event.

Changes

Track queued messages separately: Introduced queuedMessages array populated only with messages that pass job queued check
Defer retry publishing: Call publishRetryMessage after runner creation, not before
Skip rejected messages: Only publish retry messages for messages not in rejectedMessageIds Set (max runners reached, creation failed)
Performance: Use Set<string> for O(1) rejected message lookups instead of O(n) array includes
Simplified implementation: Single rejectedMessageIds Set as source of truth, converted to array only at return
CI workflow update: Modified lambda workflow to run on all PRs, not just PRs targeting main, to support stacked PRs and feature branch workflows

Example

// Before: Called early, includes messages that will become invalid
for (const message of messages) {
  if (await isJobQueued(client, message)) {
    scaleUp++;
    await publishRetryMessage(message);  // Too early
  }
}

// After: Called at end, excludes rejected messages
const queuedMessages = [];
for (const message of messages) {
  if (await isJobQueued(client, message)) {
    scaleUp++;
    queuedMessages.push(message);
  }
}
// ... runner creation, marking rejected messages ...
for (const message of queuedMessages) {
  if (!rejectedMessageIds.has(message.messageId)) {
    await publishRetryMessage(message);
  }
}

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Per reviewer feedback, publishRetryMessage is now called at the end of the processing loop for each message that is NOT marked as invalid. This prevents duplicate retry messages for the same event, since invalid messages already go back to the SQS queue for retry. Key changes: - Track validMessagesForRetry separately from all messages - Only messages that pass job queued check are added to validMessagesForRetry - publishRetryMessage is called after runner creation, not before - Messages marked as invalid (e.g., max runners reached, creation failed) are excluded from retry message publishing Tests updated to reflect new behavior: - publishRetryMessage is called AFTER runner creation - Messages marked invalid do not get retry messages published - All test scenarios updated with proper mock runner creation Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>

Address code review feedback: - Use Set for invalidMessageIds to improve lookup performance from O(n) to O(1) - Extract duplicate retry message publishing logic into helper function - Maintain both invalidMessages array (for return value) and Set (for fast lookups) This improves performance when processing large message batches and makes the code more maintainable by eliminating duplication. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>

github-actions · 2025-12-18T21:03:50Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

npalm · 2025-12-18T21:28:22Z

@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty?

Simplified the retry message logic based on reviewer feedback: - Renamed `validMessagesForRetry` to `queuedMessages` (clearer intent) - Renamed `invalidMessageIds` to `rejectedMessageIds` (more accurate) - Removed `invalidMessages` array duplication - now using only Set - Removed `publishRetryMessagesForValid` helper function (inline instead) - Convert Set to array only at return statement Result: clearer naming, less complexity, same functionality. Net reduction of 6 lines while maintaining all tests passing. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>

Copilot · 2025-12-18T21:33:08Z

@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty?

You're right! I've simplified the code:

Naming improvements:

validMessagesForRetry → queuedMessages (clearer - these are messages with queued jobs)
invalidMessageIds → rejectedMessageIds (more accurate terminology)

Complexity reduction:

Removed duplicate tracking: was using both invalidMessages array AND invalidMessageIds Set
Now using only rejectedMessageIds Set, converting to array at return
Removed publishRetryMessagesForValid helper function - inlined the simple loop instead
Net result: 6 fewer lines, clearer intent

Fixed in commit b5399f6.

npalm · 2025-12-18T23:08:25Z

@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main

Remove branch restriction from lambda.yml workflow to support stacked PRs and feature branch workflows. The workflow will now run on any PR that modifies lambda code or the workflow itself. Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>

Copilot · 2025-12-18T23:10:23Z

@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main

Updated the lambda workflow to run on all PRs by removing the branches: - main restriction. Now it will trigger on any PR that modifies lambda code or the workflow itself, supporting stacked PRs and feature branch workflows.

Fixed in commit cba7bbf.

npalm · 2025-12-19T09:47:45Z

lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts


  it('calls publishRetryMessage for each valid message when job is queued', async () => {
    const messages = createTestMessages(3);
+    mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef']); // Create all requested runners


@copilot why is this added?

This ensures all 3 requested runners are created successfully. With the new behavior, publishRetryMessage is only called for messages that aren't rejected. The default mock only creates 1 runner, so when 3 are requested, 2 messages would be marked rejected (creation failed) and wouldn't get retry messages. By mocking all 3 runners, we test that retry messages are published for all successfully processed messages.

npalm · 2025-12-19T09:48:53Z

lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts

  });

  it('calls publishRetryMessage for each message in a multi-runner scenario', async () => {
+    mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef', 'i-11111', 'i-22222']); // Create all requested runners


@copilot why this line?

Same reason - ensures all 5 requested runners are created so no messages are marked rejected. This tests that publishRetryMessage is called for each message when runner creation succeeds.

npalm · 2025-12-19T09:48:59Z

lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts

-  it('calls publishRetryMessage before runner creation', async () => {
+  it('calls publishRetryMessage after runner creation', async () => {
    const messages = createTestMessages(1);
+    mockCreateRunner.mockResolvedValue(['i-12345']); // Create the requested runner


@copilot why this line?

Ensures the requested runner is created successfully. This test verifies the order of operations (runner creation before retry message publishing), and needs a successful creation to ensure the retry message is actually published.

Initial plan

3e24ec6

Copilot AI assigned Copilot and npalm Dec 18, 2025

Copilot AI mentioned this pull request Dec 18, 2025

fix: job retry mechanism not triggering #4961

Open

Copilot started work on behalf of npalm December 18, 2025 20:43 View session

Copilot AI and others added 2 commits December 18, 2025 20:54

Copilot AI changed the title ~~[WIP] Address feedback on job retry mechanism fix~~ Move publishRetryMessage to end of processing loop to avoid duplicate retries Dec 18, 2025

Copilot AI requested a review from npalm December 18, 2025 20:59

Copilot finished work on behalf of npalm December 18, 2025 20:59

Copilot started work on behalf of npalm December 18, 2025 21:28 View session

Copilot finished work on behalf of npalm December 18, 2025 21:35

Copilot started work on behalf of npalm December 18, 2025 23:08 View session

Copilot finished work on behalf of npalm December 18, 2025 23:11

npalm reviewed Dec 19, 2025

View reviewed changes

Copilot started work on behalf of npalm December 19, 2025 09:49 View session

Copilot AI requested a review from npalm December 19, 2025 09:56

Copilot finished work on behalf of npalm December 19, 2025 09:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966

Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966

Uh oh!

Copilot AI commented Dec 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

npalm commented Dec 18, 2025

Uh oh!

Copilot AI commented Dec 18, 2025

Uh oh!

npalm commented Dec 18, 2025

Uh oh!

Copilot AI commented Dec 18, 2025

Uh oh!

npalm Dec 19, 2025

Uh oh!

Copilot AI Dec 19, 2025

Uh oh!

npalm Dec 19, 2025

Uh oh!

Copilot AI Dec 19, 2025

Uh oh!

npalm Dec 19, 2025

Uh oh!

Copilot AI Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966

Are you sure you want to change the base?

Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966

Uh oh!

Conversation

Copilot AI commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Example

Uh oh!

github-actions bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

npalm commented Dec 18, 2025

Uh oh!

Copilot AI commented Dec 18, 2025

Uh oh!

npalm commented Dec 18, 2025

Uh oh!

Copilot AI commented Dec 18, 2025

Uh oh!

npalm Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

npalm Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

npalm Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Dec 18, 2025 •

edited

Loading

github-actions bot commented Dec 18, 2025 •

edited

Loading