Skip to content

feat(core): make leader election timings configurable#7377

Open
k8s-1 wants to merge 1 commit into
crossplane:mainfrom
k8s-1:feat/configurable-leader-election-timings
Open

feat(core): make leader election timings configurable#7377
k8s-1 wants to merge 1 commit into
crossplane:mainfrom
k8s-1:feat/configurable-leader-election-timings

Conversation

@k8s-1
Copy link
Copy Markdown

@k8s-1 k8s-1 commented May 11, 2026

Description of your changes

Adds --leader-election-lease-duration, --leader-election-renew-deadline, and --leader-election-retry-period flags and equivalent env vars to the core and rbac manager start subcommands.

  • 60s / 50s / 2s for core, which preserves existing behaviour
  • 60s / 50s / 2s for rbac, up from the controller-runtime defaults of 15s / 10s / 2s for consistency with core

Motivated by clusters with degraded API server connectivity where the hardcoded renew deadline is exceeded under load causing controller crashes.

Note: reformatted surrounding struct tags for consistency (gofmt).

Fixes #7376

I have:

  • Read and followed Crossplane's contribution process.
  • Run ./nix.sh flake check to ensure this PR is ready for review.
    - [ ] Added or updated unit tests.
    - [ ] Added or updated e2e tests.
    - [ ] Linked a PR or a docs tracking issue to document this change.
    - [ ] Added backport release-x.y labels to auto-backport this PR.
    - [ ] Followed the API promotion workflow if this PR introduces, removes, or promotes an API.

Need help with this checklist? See the cheat sheet.

@k8s-1 k8s-1 requested review from a team and negz as code owners May 11, 2026 20:06
@k8s-1 k8s-1 requested a review from adamwg May 11, 2026 20:06
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e24ad540-2356-4456-bb4c-8f8685acf497

📥 Commits

Reviewing files that changed from the base of the PR and between 8c232d5 and 592a95c.

📒 Files selected for processing (2)
  • cmd/crossplane/core/core.go
  • cmd/crossplane/rbac/rbac.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/crossplane/rbac/rbac.go
  • cmd/crossplane/core/core.go

📝 Walkthrough

Walkthrough

The PR exposes leader election timing flags (LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, LeaderElectionRetryPeriod) in core and RBAC start commands (with env var bindings) and wires them into the controller-runtime manager options (LeaseDuration, RenewDeadline, RetryPeriod).

Changes

Leader Election Timing Configuration

Layer / File(s) Summary
Core Controller - Leader Election Configuration
cmd/crossplane/core/core.go
Start command adds flags LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, LeaderElectionRetryPeriod (with env var bindings). Manager options use these fields via pointers for LeaseDuration, RenewDeadline, and RetryPeriod.
RBAC Controller - Leader Election Configuration
cmd/crossplane/rbac/rbac.go
Start command adds the same leader-election timing flags with defaults and env var support. Manager options are updated to set LeaseDuration, RenewDeadline, and RetryPeriod from these fields.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

"I nibble on flags beneath the moon,
Leasing time so leaders stay in tune,
Deadlines softened, retries align,
Controllers hum and logs feel fine,
— a rabbit's hop for better uptime 🐇"

Would you like me to call out the exact default values and env var names inline in the review to verify they match the linked issue?

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: making leader election timings configurable. It is descriptive, under 72 characters (53), and directly reflects the primary objective of the PR.
Description check ✅ Passed The description is detailed and directly related to the changeset, explaining the motivation, specific flags added, default values, and the problem being solved.
Linked Issues check ✅ Passed The PR successfully implements the requested feature from #7376 by adding CLI flags and environment variables for leader election timing configuration to both core and rbac controllers.
Out of Scope Changes check ✅ Passed All changes are directly related to the linked issue objective. The minor reformatting of struct tags (gofmt) is a standard, incidental improvement and not out of scope.
Breaking Changes ✅ Passed The PR adds optional CLI flags with defaults to private internal startCommand structs. No existing public fields/flags are removed, renamed, or made required. No breaking changes detected.
Feature Gate Requirement ✅ Passed PR adds configurable options for existing leader election timings, not new experimental features. No changes to apis/**. Flags not marked as experimental/alpha.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/crossplane/core/core.go`:
- Around line 104-107: Add the same leader-election timing validation in the
core manager's Run method: when LeaderElection is true, validate that
LeaderElectionLeaseDuration > LeaderElectionRenewDeadline and
LeaderElectionRenewDeadline > LeaderElectionRetryPeriod and return an error (or
exit) with a clear message if violated; update the Run function (where manager
startup occurs) to perform these checks using the LeaderElectionLeaseDuration,
LeaderElectionRenewDeadline, and LeaderElectionRetryPeriod fields so the core
controller enforces the documented constraints consistently with the RBAC
controller.

In `@cmd/crossplane/rbac/rbac.go`:
- Around line 52-56: Add startup validation in the Run method to enforce the
documented leader election timing relationships: when LeaderElection is true,
verify LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline; if either check fails,
return a clear error (e.g., "leader election renew deadline must be less than
lease duration" or "leader election retry period must be less than renew
deadline") before calling ctrl.NewManager so misconfigurations are caught early.
Ensure you reference the struct fields
ProviderClusterRole/LeaderElection/LeaderElectionLeaseDuration/LeaderElectionRenewDeadline/LeaderElectionRetryPeriod
and perform the checks only when LeaderElection is enabled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec049bdd-98ad-4562-a24d-e4ec605b7d4c

📥 Commits

Reviewing files that changed from the base of the PR and between f018aac and 9639704.

📒 Files selected for processing (2)
  • cmd/crossplane/core/core.go
  • cmd/crossplane/rbac/rbac.go

Comment thread cmd/crossplane/core/core.go Outdated
Comment on lines +104 to +107
LeaderElection bool `default:"false" env:"LEADER_ELECTION" help:"Use leader election for the controller manager." short:"l"`
LeaderElectionLeaseDuration time.Duration `default:"60s" env:"LEADER_ELECTION_LEASE_DURATION" help:"Duration a leader lease is valid. Must be greater than the renew deadline."`
LeaderElectionRenewDeadline time.Duration `default:"50s" env:"LEADER_ELECTION_RENEW_DEADLINE" help:"Duration the leader must renew the lease before it expires. Must be greater than the retry period."`
LeaderElectionRetryPeriod time.Duration `default:"2s" env:"LEADER_ELECTION_RETRY_PERIOD" help:"How often the leader and candidates retry lease operations."`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Same timing constraint validation concern as RBAC.

The same validation concern I mentioned for the RBAC controller applies here—the help text documents timing constraints that aren't enforced by the code. Would you like to add similar validation in this file's Run method?

This would ensure both the core and RBAC managers validate their leader election configuration consistently, preventing runtime surprises for operators.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/crossplane/core/core.go` around lines 104 - 107, Add the same
leader-election timing validation in the core manager's Run method: when
LeaderElection is true, validate that LeaderElectionLeaseDuration >
LeaderElectionRenewDeadline and LeaderElectionRenewDeadline >
LeaderElectionRetryPeriod and return an error (or exit) with a clear message if
violated; update the Run function (where manager startup occurs) to perform
these checks using the LeaderElectionLeaseDuration, LeaderElectionRenewDeadline,
and LeaderElectionRetryPeriod fields so the core controller enforces the
documented constraints consistently with the RBAC controller.

Comment thread cmd/crossplane/rbac/rbac.go Outdated
Comment on lines +52 to +56
ProviderClusterRole string `help:"A ClusterRole enumerating the permissions provider packages may request." name:"provider-clusterrole"`
LeaderElection bool `env:"LEADER_ELECTION" help:"Use leader election for the controller manager." name:"leader-election" short:"l"`
LeaderElectionLeaseDuration time.Duration `default:"60s" env:"LEADER_ELECTION_LEASE_DURATION" help:"Duration a leader lease is valid. Must be greater than the renew deadline."`
LeaderElectionRenewDeadline time.Duration `default:"50s" env:"LEADER_ELECTION_RENEW_DEADLINE" help:"Duration the leader must renew the lease before it expires. Must be greater than the retry period."`
LeaderElectionRetryPeriod time.Duration `default:"2s" env:"LEADER_ELECTION_RETRY_PERIOD" help:"How often the leader and candidates retry lease operations."`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Consider validating timing constraints at startup.

The help text documents important timing relationships (lease > renew > retry), but there's no code validation to enforce these constraints. If a user misconfigures these values (e.g., setting renew deadline > lease duration), controller-runtime may produce unclear errors at runtime.

Would it be helpful to add validation in the Run method before calling ctrl.NewManager? This would provide immediate, user-friendly feedback about configuration issues. For example:

if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration {
    return errors.New("leader election renew deadline must be less than lease duration")
}
if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline {
    return errors.New("leader election retry period must be less than renew deadline")
}

This validation would only apply when leader election is enabled, of course.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/crossplane/rbac/rbac.go` around lines 52 - 56, Add startup validation in
the Run method to enforce the documented leader election timing relationships:
when LeaderElection is true, verify LeaderElectionRenewDeadline <
LeaderElectionLeaseDuration and LeaderElectionRetryPeriod <
LeaderElectionRenewDeadline; if either check fails, return a clear error (e.g.,
"leader election renew deadline must be less than lease duration" or "leader
election retry period must be less than renew deadline") before calling
ctrl.NewManager so misconfigurations are caught early. Ensure you reference the
struct fields
ProviderClusterRole/LeaderElection/LeaderElectionLeaseDuration/LeaderElectionRenewDeadline/LeaderElectionRetryPeriod
and perform the checks only when LeaderElection is enabled.

@k8s-1 k8s-1 force-pushed the feat/configurable-leader-election-timings branch from 9639704 to 8c232d5 Compare May 11, 2026 21:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
cmd/crossplane/rbac/rbac.go (1)

54-56: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Validate leader election timing constraints at startup.

The help text documents timing constraints (lease duration > renew deadline > retry period), but they're not enforced in code. Misconfigured values could cause unclear runtime errors.

Consider adding validation in the Run method before calling ctrl.NewManager:

if c.LeaderElection {
    if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration {
        return errors.New("leader election renew deadline must be less than lease duration")
    }
    if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline {
        return errors.New("leader election retry period must be less than renew deadline")
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/crossplane/rbac/rbac.go` around lines 54 - 56, Add runtime validation of
the leader election timing fields in the Run method: when LeaderElection is
true, check that LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline, and return clear errors
if those constraints fail; update the Run function to perform these checks
before calling ctrl.NewManager, referencing the struct fields LeaderElection,
LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, and
LeaderElectionRetryPeriod.
cmd/crossplane/core/core.go (1)

106-108: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Validate leader election timing constraints at startup.

The help text documents timing relationships (lease duration > renew deadline > retry period), but there's no validation to enforce these constraints. Misconfigured values could produce unclear runtime errors from controller-runtime.

Consider adding validation in the Run method before calling ctrl.NewManager:

if c.LeaderElection {
    if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration {
        return errors.New("leader election renew deadline must be less than lease duration")
    }
    if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline {
        return errors.New("leader election retry period must be less than renew deadline")
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/crossplane/core/core.go` around lines 106 - 108, Add startup validation
in the Run method to enforce the documented leader election timing constraints:
when LeaderElection is true, check that LeaderElectionRenewDeadline <
LeaderElectionLeaseDuration and LeaderElectionRetryPeriod <
LeaderElectionRenewDeadline and return descriptive errors if violated; reference
the fields LeaderElection, LeaderElectionLeaseDuration,
LeaderElectionRenewDeadline, LeaderElectionRetryPeriod and perform these checks
before calling ctrl.NewManager so misconfigurations fail fast with clear
messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@cmd/crossplane/core/core.go`:
- Around line 106-108: Add startup validation in the Run method to enforce the
documented leader election timing constraints: when LeaderElection is true,
check that LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline and return descriptive
errors if violated; reference the fields LeaderElection,
LeaderElectionLeaseDuration, LeaderElectionRenewDeadline,
LeaderElectionRetryPeriod and perform these checks before calling
ctrl.NewManager so misconfigurations fail fast with clear messages.

In `@cmd/crossplane/rbac/rbac.go`:
- Around line 54-56: Add runtime validation of the leader election timing fields
in the Run method: when LeaderElection is true, check that
LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline, and return clear errors
if those constraints fail; update the Run function to perform these checks
before calling ctrl.NewManager, referencing the struct fields LeaderElection,
LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, and
LeaderElectionRetryPeriod.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 364184a0-66e2-4885-8bda-be8785f82df3

📥 Commits

Reviewing files that changed from the base of the PR and between 9639704 and 8c232d5.

📒 Files selected for processing (2)
  • cmd/crossplane/core/core.go
  • cmd/crossplane/rbac/rbac.go

@k8s-1 k8s-1 changed the title feat(core): make leader election timings configurable feat: make leader election timings configurable May 11, 2026
@k8s-1 k8s-1 changed the title feat: make leader election timings configurable feat(core): make leader election timings configurable May 11, 2026
@k8s-1 k8s-1 force-pushed the feat/configurable-leader-election-timings branch from 8c232d5 to 961030e Compare May 12, 2026 06:04
Adds --leader-election-lease-duration, --leader-election-renew-deadline,
and --leader-election-retry-period flags (and equivalent env vars) to both
the core and RBAC manager start commands. Defaults preserve existing
behaviour (60s/50s/2s for core; 60s/50s/2s for RBAC, up from the
controller-runtime defaults of 15s/10s/2s for consistency).

Motivated by clusters with degraded API server connectivity where the
hardcoded renew deadline is exceeded under load, causing leader loss and
controller crashes.

Signed-off-by: k8s-1 <176627561+k8s-1@users.noreply.github.com>
@k8s-1 k8s-1 force-pushed the feat/configurable-leader-election-timings branch from 961030e to 592a95c Compare May 12, 2026 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: make leader election timings configurable

1 participant