feat(core): make leader election timings configurable#7377
Conversation
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThe PR exposes leader election timing flags (LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, LeaderElectionRetryPeriod) in core and RBAC start commands (with env var bindings) and wires them into the controller-runtime manager options (LeaseDuration, RenewDeadline, RetryPeriod). ChangesLeader Election Timing Configuration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Would you like me to call out the exact default values and env var names inline in the review to verify they match the linked issue? 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/crossplane/core/core.go`:
- Around line 104-107: Add the same leader-election timing validation in the
core manager's Run method: when LeaderElection is true, validate that
LeaderElectionLeaseDuration > LeaderElectionRenewDeadline and
LeaderElectionRenewDeadline > LeaderElectionRetryPeriod and return an error (or
exit) with a clear message if violated; update the Run function (where manager
startup occurs) to perform these checks using the LeaderElectionLeaseDuration,
LeaderElectionRenewDeadline, and LeaderElectionRetryPeriod fields so the core
controller enforces the documented constraints consistently with the RBAC
controller.
In `@cmd/crossplane/rbac/rbac.go`:
- Around line 52-56: Add startup validation in the Run method to enforce the
documented leader election timing relationships: when LeaderElection is true,
verify LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline; if either check fails,
return a clear error (e.g., "leader election renew deadline must be less than
lease duration" or "leader election retry period must be less than renew
deadline") before calling ctrl.NewManager so misconfigurations are caught early.
Ensure you reference the struct fields
ProviderClusterRole/LeaderElection/LeaderElectionLeaseDuration/LeaderElectionRenewDeadline/LeaderElectionRetryPeriod
and perform the checks only when LeaderElection is enabled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ec049bdd-98ad-4562-a24d-e4ec605b7d4c
📒 Files selected for processing (2)
cmd/crossplane/core/core.gocmd/crossplane/rbac/rbac.go
| LeaderElection bool `default:"false" env:"LEADER_ELECTION" help:"Use leader election for the controller manager." short:"l"` | ||
| LeaderElectionLeaseDuration time.Duration `default:"60s" env:"LEADER_ELECTION_LEASE_DURATION" help:"Duration a leader lease is valid. Must be greater than the renew deadline."` | ||
| LeaderElectionRenewDeadline time.Duration `default:"50s" env:"LEADER_ELECTION_RENEW_DEADLINE" help:"Duration the leader must renew the lease before it expires. Must be greater than the retry period."` | ||
| LeaderElectionRetryPeriod time.Duration `default:"2s" env:"LEADER_ELECTION_RETRY_PERIOD" help:"How often the leader and candidates retry lease operations."` |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Same timing constraint validation concern as RBAC.
The same validation concern I mentioned for the RBAC controller applies here—the help text documents timing constraints that aren't enforced by the code. Would you like to add similar validation in this file's Run method?
This would ensure both the core and RBAC managers validate their leader election configuration consistently, preventing runtime surprises for operators.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/crossplane/core/core.go` around lines 104 - 107, Add the same
leader-election timing validation in the core manager's Run method: when
LeaderElection is true, validate that LeaderElectionLeaseDuration >
LeaderElectionRenewDeadline and LeaderElectionRenewDeadline >
LeaderElectionRetryPeriod and return an error (or exit) with a clear message if
violated; update the Run function (where manager startup occurs) to perform
these checks using the LeaderElectionLeaseDuration, LeaderElectionRenewDeadline,
and LeaderElectionRetryPeriod fields so the core controller enforces the
documented constraints consistently with the RBAC controller.
| ProviderClusterRole string `help:"A ClusterRole enumerating the permissions provider packages may request." name:"provider-clusterrole"` | ||
| LeaderElection bool `env:"LEADER_ELECTION" help:"Use leader election for the controller manager." name:"leader-election" short:"l"` | ||
| LeaderElectionLeaseDuration time.Duration `default:"60s" env:"LEADER_ELECTION_LEASE_DURATION" help:"Duration a leader lease is valid. Must be greater than the renew deadline."` | ||
| LeaderElectionRenewDeadline time.Duration `default:"50s" env:"LEADER_ELECTION_RENEW_DEADLINE" help:"Duration the leader must renew the lease before it expires. Must be greater than the retry period."` | ||
| LeaderElectionRetryPeriod time.Duration `default:"2s" env:"LEADER_ELECTION_RETRY_PERIOD" help:"How often the leader and candidates retry lease operations."` |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Consider validating timing constraints at startup.
The help text documents important timing relationships (lease > renew > retry), but there's no code validation to enforce these constraints. If a user misconfigures these values (e.g., setting renew deadline > lease duration), controller-runtime may produce unclear errors at runtime.
Would it be helpful to add validation in the Run method before calling ctrl.NewManager? This would provide immediate, user-friendly feedback about configuration issues. For example:
if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration {
return errors.New("leader election renew deadline must be less than lease duration")
}
if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline {
return errors.New("leader election retry period must be less than renew deadline")
}This validation would only apply when leader election is enabled, of course.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/crossplane/rbac/rbac.go` around lines 52 - 56, Add startup validation in
the Run method to enforce the documented leader election timing relationships:
when LeaderElection is true, verify LeaderElectionRenewDeadline <
LeaderElectionLeaseDuration and LeaderElectionRetryPeriod <
LeaderElectionRenewDeadline; if either check fails, return a clear error (e.g.,
"leader election renew deadline must be less than lease duration" or "leader
election retry period must be less than renew deadline") before calling
ctrl.NewManager so misconfigurations are caught early. Ensure you reference the
struct fields
ProviderClusterRole/LeaderElection/LeaderElectionLeaseDuration/LeaderElectionRenewDeadline/LeaderElectionRetryPeriod
and perform the checks only when LeaderElection is enabled.
9639704 to
8c232d5
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
cmd/crossplane/rbac/rbac.go (1)
54-56: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winValidate leader election timing constraints at startup.
The help text documents timing constraints (lease duration > renew deadline > retry period), but they're not enforced in code. Misconfigured values could cause unclear runtime errors.
Consider adding validation in the
Runmethod before callingctrl.NewManager:if c.LeaderElection { if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration { return errors.New("leader election renew deadline must be less than lease duration") } if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline { return errors.New("leader election retry period must be less than renew deadline") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/crossplane/rbac/rbac.go` around lines 54 - 56, Add runtime validation of the leader election timing fields in the Run method: when LeaderElection is true, check that LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and LeaderElectionRetryPeriod < LeaderElectionRenewDeadline, and return clear errors if those constraints fail; update the Run function to perform these checks before calling ctrl.NewManager, referencing the struct fields LeaderElection, LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, and LeaderElectionRetryPeriod.cmd/crossplane/core/core.go (1)
106-108: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winValidate leader election timing constraints at startup.
The help text documents timing relationships (lease duration > renew deadline > retry period), but there's no validation to enforce these constraints. Misconfigured values could produce unclear runtime errors from controller-runtime.
Consider adding validation in the
Runmethod before callingctrl.NewManager:if c.LeaderElection { if c.LeaderElectionRenewDeadline >= c.LeaderElectionLeaseDuration { return errors.New("leader election renew deadline must be less than lease duration") } if c.LeaderElectionRetryPeriod >= c.LeaderElectionRenewDeadline { return errors.New("leader election retry period must be less than renew deadline") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/crossplane/core/core.go` around lines 106 - 108, Add startup validation in the Run method to enforce the documented leader election timing constraints: when LeaderElection is true, check that LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and LeaderElectionRetryPeriod < LeaderElectionRenewDeadline and return descriptive errors if violated; reference the fields LeaderElection, LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, LeaderElectionRetryPeriod and perform these checks before calling ctrl.NewManager so misconfigurations fail fast with clear messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@cmd/crossplane/core/core.go`:
- Around line 106-108: Add startup validation in the Run method to enforce the
documented leader election timing constraints: when LeaderElection is true,
check that LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline and return descriptive
errors if violated; reference the fields LeaderElection,
LeaderElectionLeaseDuration, LeaderElectionRenewDeadline,
LeaderElectionRetryPeriod and perform these checks before calling
ctrl.NewManager so misconfigurations fail fast with clear messages.
In `@cmd/crossplane/rbac/rbac.go`:
- Around line 54-56: Add runtime validation of the leader election timing fields
in the Run method: when LeaderElection is true, check that
LeaderElectionRenewDeadline < LeaderElectionLeaseDuration and
LeaderElectionRetryPeriod < LeaderElectionRenewDeadline, and return clear errors
if those constraints fail; update the Run function to perform these checks
before calling ctrl.NewManager, referencing the struct fields LeaderElection,
LeaderElectionLeaseDuration, LeaderElectionRenewDeadline, and
LeaderElectionRetryPeriod.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 364184a0-66e2-4885-8bda-be8785f82df3
📒 Files selected for processing (2)
cmd/crossplane/core/core.gocmd/crossplane/rbac/rbac.go
8c232d5 to
961030e
Compare
Adds --leader-election-lease-duration, --leader-election-renew-deadline, and --leader-election-retry-period flags (and equivalent env vars) to both the core and RBAC manager start commands. Defaults preserve existing behaviour (60s/50s/2s for core; 60s/50s/2s for RBAC, up from the controller-runtime defaults of 15s/10s/2s for consistency). Motivated by clusters with degraded API server connectivity where the hardcoded renew deadline is exceeded under load, causing leader loss and controller crashes. Signed-off-by: k8s-1 <176627561+k8s-1@users.noreply.github.com>
961030e to
592a95c
Compare
Description of your changes
Adds
--leader-election-lease-duration,--leader-election-renew-deadline, and--leader-election-retry-periodflags and equivalent env vars to the core and rbac manager start subcommands.Motivated by clusters with degraded API server connectivity where the hardcoded renew deadline is exceeded under load causing controller crashes.
Note: reformatted surrounding struct tags for consistency (gofmt).
Fixes #7376
I have:
./nix.sh flake checkto ensure this PR is ready for review.- [ ] Added or updated unit tests.- [ ] Added or updated e2e tests.- [ ] Linked a PR or a docs tracking issue to document this change.- [ ] Addedbackport release-x.ylabels to auto-backport this PR.- [ ] Followed the API promotion workflow if this PR introduces, removes, or promotes an API.Need help with this checklist? See the cheat sheet.