Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 18, 2025

Per reviewer feedback, publishRetryMessage was being called before runner creation, causing duplicate retry attempts for messages that would be marked invalid. Invalid messages already return to the SQS queue for retry, so publishing retry messages for them creates two retry mechanisms for the same event.

Changes

  • Track queued messages separately: Introduced queuedMessages array populated only with messages that pass job queued check
  • Defer retry publishing: Call publishRetryMessage after runner creation, not before
  • Skip rejected messages: Only publish retry messages for messages not in rejectedMessageIds Set (max runners reached, creation failed)
  • Performance: Use Set<string> for O(1) rejected message lookups instead of O(n) array includes
  • Simplified implementation: Single rejectedMessageIds Set as source of truth, converted to array only at return
  • CI workflow update: Modified lambda workflow to run on all PRs, not just PRs targeting main, to support stacked PRs and feature branch workflows

Example

// Before: Called early, includes messages that will become invalid
for (const message of messages) {
  if (await isJobQueued(client, message)) {
    scaleUp++;
    await publishRetryMessage(message);  // Too early
  }
}

// After: Called at end, excludes rejected messages
const queuedMessages = [];
for (const message of messages) {
  if (await isJobQueued(client, message)) {
    scaleUp++;
    queuedMessages.push(message);
  }
}
// ... runner creation, marking rejected messages ...
for (const message of queuedMessages) {
  if (!rejectedMessageIds.has(message.messageId)) {
    await publishRetryMessage(message);
  }
}

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits December 18, 2025 20:54
Per reviewer feedback, publishRetryMessage is now called at the end of the
processing loop for each message that is NOT marked as invalid. This prevents
duplicate retry messages for the same event, since invalid messages already go
back to the SQS queue for retry.

Key changes:
- Track validMessagesForRetry separately from all messages
- Only messages that pass job queued check are added to validMessagesForRetry
- publishRetryMessage is called after runner creation, not before
- Messages marked as invalid (e.g., max runners reached, creation failed) are
  excluded from retry message publishing

Tests updated to reflect new behavior:
- publishRetryMessage is called AFTER runner creation
- Messages marked invalid do not get retry messages published
- All test scenarios updated with proper mock runner creation

Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Address code review feedback:
- Use Set for invalidMessageIds to improve lookup performance from O(n) to O(1)
- Extract duplicate retry message publishing logic into helper function
- Maintain both invalidMessages array (for return value) and Set (for fast lookups)

This improves performance when processing large message batches and makes the
code more maintainable by eliminating duplication.

Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Copilot AI changed the title [WIP] Address feedback on job retry mechanism fix Move publishRetryMessage to end of processing loop to avoid duplicate retries Dec 18, 2025
Copilot AI requested a review from npalm December 18, 2025 20:59
@github-actions
Copy link
Contributor

github-actions bot commented Dec 18, 2025

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@npalm
Copy link
Member

npalm commented Dec 18, 2025

@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty?

Simplified the retry message logic based on reviewer feedback:
- Renamed `validMessagesForRetry` to `queuedMessages` (clearer intent)
- Renamed `invalidMessageIds` to `rejectedMessageIds` (more accurate)
- Removed `invalidMessages` array duplication - now using only Set
- Removed `publishRetryMessagesForValid` helper function (inline instead)
- Convert Set to array only at return statement

Result: clearer naming, less complexity, same functionality.
Net reduction of 6 lines while maintaining all tests passing.

Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty?

You're right! I've simplified the code:

Naming improvements:

  • validMessagesForRetryqueuedMessages (clearer - these are messages with queued jobs)
  • invalidMessageIdsrejectedMessageIds (more accurate terminology)

Complexity reduction:

  • Removed duplicate tracking: was using both invalidMessages array AND invalidMessageIds Set
  • Now using only rejectedMessageIds Set, converting to array at return
  • Removed publishRetryMessagesForValid helper function - inlined the simple loop instead
  • Net result: 6 fewer lines, clearer intent

Fixed in commit b5399f6.

@npalm
Copy link
Member

npalm commented Dec 18, 2025

@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main

Remove branch restriction from lambda.yml workflow to support stacked PRs
and feature branch workflows. The workflow will now run on any PR that
modifies lambda code or the workflow itself.

Co-authored-by: npalm <11609620+npalm@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main

Updated the lambda workflow to run on all PRs by removing the branches: - main restriction. Now it will trigger on any PR that modifies lambda code or the workflow itself, supporting stacked PRs and feature branch workflows.

Fixed in commit cba7bbf.


it('calls publishRetryMessage for each valid message when job is queued', async () => {
const messages = createTestMessages(3);
mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef']); // Create all requested runners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot why is this added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures all 3 requested runners are created successfully. With the new behavior, publishRetryMessage is only called for messages that aren't rejected. The default mock only creates 1 runner, so when 3 are requested, 2 messages would be marked rejected (creation failed) and wouldn't get retry messages. By mocking all 3 runners, we test that retry messages are published for all successfully processed messages.

});

it('calls publishRetryMessage for each message in a multi-runner scenario', async () => {
mockCreateRunner.mockResolvedValue(['i-12345', 'i-67890', 'i-abcdef', 'i-11111', 'i-22222']); // Create all requested runners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot why this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason - ensures all 5 requested runners are created so no messages are marked rejected. This tests that publishRetryMessage is called for each message when runner creation succeeds.

it('calls publishRetryMessage before runner creation', async () => {
it('calls publishRetryMessage after runner creation', async () => {
const messages = createTestMessages(1);
mockCreateRunner.mockResolvedValue(['i-12345']); // Create the requested runner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot why this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensures the requested runner is created successfully. This test verifies the order of operations (runner creation before retry message publishing), and needs a successful creation to ensure the retry message is actually published.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants