Skip to content

Conversation

@aojea
Copy link
Member

@aojea aojea commented Oct 6, 2023

Kubelet, when using cloud provider external, initializes temporary the node addresses using the --node-ip flag values if set, until the cloud provider overrides it.

This behavior has undesired consequences if the cloud-provider addresses are different than the original ones, specially for hostNetwork pods, that inherit these addresses from the Node.

Since some cloud-providers depend on this behavior, in order to keep backward compatibility, assume that the specifying addresses via the node-ip flags means that the intent is to keep the existing behavior to temporary initialize the addresses.

If the node-ips are the unspecified addresses or are not set, then wait for the external cloud provider to set the node addresses.

/kind bug
/kind documentation
/kind feature

Fixes #120720

Kubelet, when using cloud provider external, initializes temporary the node addresses using the --node-ip flag values if set, until the cloud provider overrides it.

Kubelet, if using cloud provider external, initializes temporary
the node addresses using the non-cloud provider logic, until the
cloud provider overrides it.

This behavior has undesired consequences if the cloud-provider addresses
are different than the original ones, specially for hostNetwork pods,
that inherit these addresses from the Node.

Since some cloud-providers depend on this behavior, in order to keep
backward compatibility, assume that the specifying addresses via
the node-ip flags means that the intent is to keep the existing
behavior to temporary initialize the addresses.

If the node-ips are the unspecified addresses or are not set, then
wait for the external cloud provider to set the node addresses.

Change-Id: I3a3895f9b830769f9658e6a03f058c914c438a09
Signed-off-by: Antonio Ojea <aojea@google.com>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 6, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 6, 2023
@aojea
Copy link
Member Author

aojea commented Oct 6, 2023

/assign @andrewsykim @danwinship @thockin
/cc @liggitt

I think that this approach solves the backwards compatiblity problem and the bug #120720

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 6, 2023
@k8s-ci-robot k8s-ci-robot requested a review from liggitt October 6, 2023 14:08
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 6, 2023
@aojea
Copy link
Member Author

aojea commented Oct 6, 2023

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 6, 2023
@pacoxu
Copy link
Member

pacoxu commented Oct 6, 2023

/cc @SergeyKanzhelev
/assign @mrunalp

@aojea
Copy link
Member Author

aojea commented Oct 6, 2023

/cc @tzneal @lubronzhan

dkoshkin added a commit to nutanix-cloud-native/cluster-api-runtime-extensions-nutanix that referenced this pull request Apr 26, 2024
**What problem does this PR solve?**:
Adds AWS CCM support for Kubernetes v1.29. While testing I also found an
upstream change kubernetes/kubernetes#121028 and
to fix that added `hostNetwork: true` to the CCM pods. Upstream CAPA
templates and the Nutanix CCM already use `hostNetwork: true`, so it
made sense to go with that approach here too.

**Which issue(s) this PR fixes**:
Fixes #

**How Has This Been Tested?**:
<!--
Please describe the tests that you ran to verify your changes.
Provide output from the tests and any manual steps needed to replicate
the tests.
-->
Created an AWS v1.29.4 cluster with the following lookup envs:
```
AMI_LOOKUP_BASEOS=rocky-9.1
AMI_LOOKUP_FORMAT=konvoy-ami-{{.BaseOS}}-release-?{{.K8sVersion}}-*
AMI_LOOKUP_ORG=999867407951
```

**Special notes for your reviewer**:
<!--
Use this to provide any additional information to the reviewers.
This may include:
- Best way to review the PR.
- Where the author wants the most review attention on.
- etc.
-->
@aojea
Copy link
Member Author

aojea commented Jul 29, 2024

For people coming to this issue, and trying to clarify one important issue because most of the problems are related to wrong deployment architectures that broke these deployments by this change in behavior.

Since the move to --cloud-provider=external, the cloud-controller-manager is responsible of some important loops of the cluster that previously ran in the kube-controller-manager, like the node bootstrap controller, the node lifecycle controller, the service loadbalancers, the node ipam .....

IMPORTANT deploy the CLOUD-CONTROLLER-MANAGER the same as yout kube-controller-manager, typically as an static pod on the control plane nodes or you may end with stability problems. Most people is workarounding the problem using the --node-ip flag, but if this controller is running as a pod in any node of the cluster it can be impacted by other workloads

@elmiko
Copy link
Contributor

elmiko commented Jul 31, 2024

@aojea we were talking about this comment in the SIG cloud provider meeting today and we think it would be nice to have a more formal discussion about deployment patterns and the possible advantages and disadvantages of different styles. would you be willing to collaborate on a blog post to go into greater detail on your comment here about static pod deployment?

happy to followup on slack or emails, whatever is most convenient.

@aojea
Copy link
Member Author

aojea commented Jul 31, 2024

happy to followup on slack or emails, whatever is most convenient.

happy to collaborate, let's follow up on slack

wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 15, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 20, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 20, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 21, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 22, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 22, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 26, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
wedaly pushed a commit to Azure/AgentBaker that referenced this pull request Aug 27, 2024
In k8s >= 1.29 kubelet no longer sets node internalIP when using external cloud provider
kubernetes/kubernetes#121028

This regresses node startup performance in Azure CNI Overlay and Podsubnet clusters, which require the node to be
assigned an internal IP before configuring pod networking.
To improve node startup performance, explicitly set `--node-ip` to the IP returned from IMDS so kubelet sets
the internal IP when it registers the node.
If this fails, skip setting --node-ip, which is safe because cloud-node-manager will assign it later anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kubelet: lookup node address logic for external provider assign wrong PodIPs to hostNetwork pods