-
Notifications
You must be signed in to change notification settings - Fork 41.6k
Fix potential dryrun failure when NodeLocalCRISocket reaches GA #135100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: HirazawaUi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
I've opened a draft PR where I removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i said the same on @carlory 's PR.
i really don't think GetNodeRegistration should take dry-run into account.
IMO it should do the following
- get from annotation (after the NodeLocalCRISocket goes GA this is removed)
- if there is no annotation, get from file
- if there is no file, print a warning and try to auto-detect
- if it can't autodetect, print a warning and return default
use klog.Warningf.
If so, |
|
NVM, i deleted my last comment. |
|
Haha, maybe sometimes it's better to communicate a bit slower. The reason @carlory and I were so quick to respond might be that it's very late in our time zone :) |
|
Given the latest changes in this PR, we no longer need PR #135090, right? Because a "real" socket will always be returned now regardless. |
@carlory in that case you should revert your PR to not compare socket names and use the dryRun argument passed to the preflight check, like you did before ...what a mess. i think making the GetNodeRegistration dry-run aware just feels wrong. |
|
LGTM label has been added. Git tree hash: b1ba6edfec8431917af59cc57e8b70f62872ba96
|
no worries. yes, we can have non-blocking presubmit e2e jobs for k/k. two jobs that we can add are |
Okay, I will do it tomorrow |
https://github.com/neolit123/kubeadm-test/actions/runs/19074447580/job/54485941263 options:
let's continue tomorrow |
Based on this PR, I tested it with this option. The ci is green. https://github.com/neolit123/kubeadm-test/actions/runs/19088637811/job/54534316923 |
This comment was marked as off-topic.
This comment was marked as off-topic.
ok, let's use this option then, because it solves the issue with respect to the FG and doesn't force us to introduce the dryrun option for the preflight check. |
e50c1ee to
8ee3fe2
Compare
|
I directly removed the Are there any other potential failure scenarios? I haven't thought of any at the moment. Although there are only two days left until the code freeze, I'm already confident enough to merge this PR. Alternatively, should we keep it on hold and wait until we've thought it through more thoroughly before merging? @neolit123 @carlory WDYT? |
sgtm, thanks for testing and updating it the only thing that i am not so sure about is having the autodetect in the function. iirc, we explicitly didn't want to have something like that in the past. the purpose of the cluster functions was to get existing config, so if we can't get existing config for the socket from annotation or instance file, then something must be wrong. yet, i don't think this fallback to autodetect logic is too problematic. |
|
I also ran a presubmit-upgrade-latest test: https://github.com/neolit123/kubeadm-test/actions/runs/19107770508 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
LGTM label has been added. Git tree hash: b323eaa06227efe8d8aa6d2fc5c611127bbe44ae
|
|
@HirazawaUi: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest (we need to land this fix in 1.35) |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The dryrun will fail once NodeLocalCRISocket reaches GA.
Specifically, if we remove the
kubeadm.alpha.kubernetes.io/cri-socketannotation upon GA, the dryrun will fail in theGetNodeRegistrationfunction. This occurs because the dryrun process cannot access theinstance-config.yamlfile, and we will have also removed the annotation as planned.Which issue(s) this PR is related to:
Fixes kubernetes/kubeadm#3257
Fixes #135086
Special notes for your reviewer:
I was contemplating a minimal solution by using the following approach before calling FetchInitConfigurationFromCluster:
However, after implementing the change, I found it made the code more cumbersome. So I rolled it back and ended up using a more straightforward approach to implement it. :)
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: