enhancement: Add API improvements#341
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive set of improvements to the OpenPERouter API. The changes focus on refining resource semantics, ensuring future-proof dual-stack support, and enhancing the overall usability and maintainability of the CRDs. By consolidating breaking changes and moving validation logic into the schema, the API becomes more robust and easier to consume. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request outlines a series of API improvements for OpenPERouter CRDs, focusing on breaking redesigns for improved semantics, dual-stack support, and the migration of validation logic from webhooks to CEL schemas. The review feedback identifies issues in the proposed CEL rules for localCIDRs and gatewayIPs, where the current logic fails to validate single-item lists and uses incorrect method calls for CIDR objects.
e417a80 to
ce55809
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request proposes a series of API improvements for OpenPERouter, including breaking redesigns to address technical debt, security enhancements like removing plaintext passwords, and the adoption of inclusive terminology. The proposal also outlines a shift toward CEL-based schema validation to replace redundant webhook checks. Feedback indicates that the proposed Route Target regex is overly restrictive by excluding IPv4-based formats and identifies several factual inaccuracies regarding line references and existing checks in the documentation's validation table.
| // +kubebuilder:validation:XValidation:rule="!has(self.features) || !self.features.exists(f, f == 'EBGPMultiHop') || !has(self.bfd) || !has(self.bfd.modes) || !self.bfd.modes.exists(m, m == 'Echo')",message="echo mode cannot be used with multi-hop sessions (RFC 5883)" | ||
| ``` | ||
|
|
||
| Replace `AutoCreate` on bridge configs with a `lifecycle` enum: |
There was a problem hiding this comment.
Not sure how much I like this part actually ..
I do understand the appeal of modelling "auto create" as lifecycle : managed.
Can we entertain the idea of alternate naming ? I understand why the change, but, tbh, it looked simpler before imho.
There was a problem hiding this comment.
Not sure how much I like this part actually ..
I do understand the appeal of modelling "auto create" as lifecycle : managed.
Can we entertain the idea of alternate naming ? I understand why the change, but, tbh, it looked simpler before imho.
So we have two things here, one if the stuff about booleans not very welcome on the kubernetes api conventions (I suppose we agree on that).
Then we have the naming, I am not really 100% ok with lifetime either, other alternatives were "policy", "provisioning", "management", "mode", "ownership".
There was a problem hiding this comment.
I mean ... at the end of the day, some things are just what they are: this knob creates a bridge if one does NOT exist. It doesn't manage any lifecycle (e.g. the bridge is NOT deleted once the VNI is ...), or anything fancy like that.
ownership maybe ... but then that's clunky, since we expect the USER to clean this up manually afterwards.
There was a problem hiding this comment.
I mean ... at the end of the day, some things are just what they are: this knob creates a bridge if one does NOT exist. It doesn't manage any lifecycle (e.g. the bridge is NOT deleted once the VNI is ...), or anything fancy like that.
@maiqueb so first we need to see if we agree not "booleans not welcome on API" convention.
There was a problem hiding this comment.
I'm just saying in this one instance, I think the boolean can make sense ...
But, we may want to decide the behavior we want going forward, which will impact what we do now.
There was a problem hiding this comment.
Ok so after some talking looks like openperouter do also delete the bridge so the managed/lifcycle concept make sense, I have push a change stated that and the enum field is managementPolicy
| **Problem:** `VNI` and `ASN` lack immutability enforcement (unlike | ||
| `L2GatewayIPs` and `LocalCIDR`). | ||
|
|
||
| **Proposal:** Add CEL immutability rules: |
There was a problem hiding this comment.
makes sense to me to have the entire spec immutable.
actually, I'd say the only thing that makes sense to be mutable (and not that sure ...) is the node selector.
There was a problem hiding this comment.
makes sense to me to have the entire spec immutable.
actually, I'd say the only thing that makes sense to be mutable (and not that sure ...) is the node selector.
Maybe we can start with full immutability (included node selector) and if users need mutability we can implement per attribute, that will not break the API since.
There was a problem hiding this comment.
yeah, that'd be backwards compatible.
For me, it makes sense to make the vnis immutable. @fedepaol do you disagree ? Do you have strong opinions here ?
There was a problem hiding this comment.
+1 for mutable node-selector, this is pretty handy for maintenance and load balancing.
There was a problem hiding this comment.
I don't care about backward compatibility for now, but I would not be super strict in immutability.
Users must understand how things work, and we should allow changing things where it's possible unless there's a technical constraint to that.
So, I'd like to be able to amend the neighbor ip or the asn without having to delete and recreate, it's even more controller friendly.
ce55809 to
fedbd9e
Compare
27c4db1 to
9dac816
Compare
RamLavi
left a comment
There was a problem hiding this comment.
Hey, overall changes look great! Added a few comments.
| // one IPv6). The first element is the primary address family. | ||
| // +kubebuilder:validation:MinItems=1 | ||
| // +kubebuilder:validation:MaxItems=2 | ||
| // +kubebuilder:validation:XValidation:rule="self.all(x, isCIDR(x)) && (size(self) == 2 ? cidr(self[0]).ip().family() != cidr(self[1]).ip().family() : true)",message="localCIDRs must contain valid CIDRs and at most one of each family" |
There was a problem hiding this comment.
The isCIDR() and cidr() CEL functions are available starting Kubernetes 1.31+ (CEL library v2).
Will openperouter run on older clusters?
There was a problem hiding this comment.
I think it makes sense to limit openperouter to run on a subset of k8s versions. We'd have to document that in the user facing docs though.
There was a problem hiding this comment.
@maiqueb exactly. We just need to decide what we support and properly document it.
There was a problem hiding this comment.
This is a comparation of using 1.31+ features against not using it, so maybe is not worth it ?
With CEL CIDR library (requires K8s 1.31+)
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=2
// +kubebuilder:validation:XValidation:rule="self.all(x, isCIDR(x)) && (size(self) == 2 ? cidr(self[0]).ip().family() != cidr(self[1]).ip().family() : true)",message="localCIDRs must contain valid CIDRs and at most one of each family"
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="localCIDRs cannot be changed"
LocalCIDRs []string `json:"localCIDRs"`
Without CEL CIDR library (works on K8s 1.25+)
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=2
// +kubebuilder:validation:XValidation:rule="size(self) <= 1 || (self[0].contains(':') != self[1].contains(':'))",message="localCIDRs must contain at most one of each IP family"
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="localCIDRs cannot be changed"
LocalCIDRs []string `json:"localCIDRs"`
There was a problem hiding this comment.
IMO we can keep the proposed CEL rule, and have this validation done at the controller level as well (if its not done yet).
On older clusters the error will be reported later, on the CRD status.
There was a problem hiding this comment.
To me it feels like it should be a hard decision based on what version does openperouter support. if we support only 1.31+ then all good. if not - use the old ones.
But that's a question maintainer. It could stay an open issue at this time.
There was a problem hiding this comment.
to me , that assumption (openpe requires k8s >= 1.31) is fair.
FWIW, 1.31 has been EOL'ed more than 6 months ago (2025-11-11).
There was a problem hiding this comment.
@fedepaol you ok with enforcing k8s >= 1.31 (we have nice CEL things there)
ormergi
left a comment
There was a problem hiding this comment.
Thanks for improving the API, really nice findings!
| **Problem:** `VNI` and `ASN` lack immutability enforcement (unlike | ||
| `L2GatewayIPs` and `LocalCIDR`). | ||
|
|
||
| **Proposal:** Add CEL immutability rules: |
There was a problem hiding this comment.
+1 for mutable node-selector, this is pretty handy for maintenance and load balancing.
| | L3VNI | VNI, VRF, Age | | ||
| | L3Passthrough | ASN, HostASN, Age | | ||
|
|
||
| ### Quality and Correctness |
There was a problem hiding this comment.
Please consider addressing following gaps:
- CRD markers set defaults
Some CRDs specify default kubebuilder markers, this is not a good practice because it almost impossible to change defaults later on. Having default set programmatically by the controller allow future changes.
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L40 - Flag names format is inconsistent
Some flag names are camelCase, some dash-case, others are not formatted at all (e.g.:nodename)
https://github.com/openperouter/openperouter/blob/main/cmd/hostcontroller/main.go#L109-L135
All flags should follow the same format. FWIK I know the common convention is dash-case. - JSON tags format is inconsistent
Some API types field's JSON tags is inconsistent, some tags are camelCases, other are not formatted at all:
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/neighbor.go#L121
-> minimumTTL
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L36
-> localCIDR
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L22
-> hostASN
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L28
-> hostType
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L40
-> routerIdCIDR or routerIDCIDR
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L62
-> vtepCIDR
There was a problem hiding this comment.
Please consider addressing following gaps:
- CRD markers set defaults
Some CRDs specify default kubebuilder markers, this is not a good practice because it almost impossible to change defaults later on. Having default set programmatically by the controller allow future changes.
Not sure about that, is there a document somewhere ? I understand we can update the defaults we configre on api
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L40
- Flag names format is inconsistent
Some flag names are camelCase, some dash-case, others are not formatted at all (e.g.:nodename)
Make sense adding a section for it
https://github.com/openperouter/openperouter/blob/main/cmd/hostcontroller/main.go#L109-L135
All flags should follow the same format. FWIK I know the common convention is dash-case.
- JSON tags format is inconsistent
Some API types field's JSON tags is inconsistent, some tags are camelCases, other are not formatted at all:
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/neighbor.go#L121
-> minimumTTL
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L36
-> localCIDR
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L22
-> hostASN
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/hostsession.go#L28
-> hostType
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L40
-> routerIdCIDR or routerIDCIDR
https://github.com/openperouter/openperouter/blob/main/api/v1alpha1/underlay_types.go#L62
-> vtepCIDR
There was a problem hiding this comment.
Not sure about that, is there a document somewhere ? I understand we can update the defaults we configre on api
Having the controller to set defaults enable implementing backward compatibility strategies in case of breaking changes, this is almost impossible when done via CRD defaulting.
Sure there are places that fit CRD defaults, I would avoid them when possible.
https://github.com/kubernetes/community/blob/main/contributors/devel/sig-architecture/api-conventions.md#defaulting
There was a problem hiding this comment.
@ormergi I don't understand, the convention is exactly about doing the opposite: being explicit at defaults on the API and skip implicit (at code) defaults.
The concern with CRD defaults (+kubebuilder:default=) is that once an object is created the default value is persisted in etcd, so you can't distinguish "user set this" from "was defaulted". If you later change the default, existing objects keep the old value and new objects get the new one.
With controller-side defaults the field stays nil in etcd, so you can change the default for all objects at once. But then the user does not see the effective value on the stored object, which is what the convention tries to avoid.
Since we are still in v1alpha1 and doing breaking API changes, CRD defaults are fine here — users are expected to recreate their resources anyway.
If needed we can use a conversion webhook
maiqueb
left a comment
There was a problem hiding this comment.
Thanks for this PR.
It will be a very nice improvement over the current codebase.
69481c5 to
bead0b8
Compare
| | `L2VNISpec.VXLanPort` | `vxlanport` | `vxlanPort` | | ||
| | `L3VNISpec.VXLanPort` | `vxlanport` | `vxlanPort` | | ||
| | `L3PassthroughSpec.HostSession` | `hostsession` | `hostSession` | | ||
| | `UnderlaySpec.RouterIDCIDR` | `routeridcidr` | `routerIDCIDR` | |
There was a problem hiding this comment.
I am not extremely fond of routerIDCIDR ... Those all caps blurry the acronyms a bit (IMHO).
Maybe routerIDGenerator ? It happens to be a CIDR, but that's a bit an implementation detail, isn't it ?
I could be wrong though.
|
Will parrot something @andreaskaris said when I spoke to him on the srv6 context: maybe we should also take the opportunity of renaming the I think with that, we'd make it crystal clear (in the API) that in EVPN the control plane protocol is always BGP. i.e. make the |
Signed-off-by: Enrique Llorente <ellorent@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com>
|
Hey folks. I yet have to read through this design proposal, but can be make this API version |
Wonderful suggestion. @qinqon did we ever speak about beta vs alpha ? Do you think we could be ready for a beta API after implementing this enhancement ? |
|
I'd also "suggest" renaming (I really haven't thought this through though, so it's just a rough idea): TL;DR: we need API fields to setup an IPv4 and IPv6 address on the loopback, OR to point to an interface that holds such info, regardless of the tunnel setup. The IPv4 and IPv6 address can then be shared as a tunnel source between protocols. EVPNConfig.VTEPCIDR / VTEPInterface to TunnelConfig.LoopbackCIDR / TunnelConfig.TunnelInterface, or something like this. The LoopbackCIDR needs 2 IP address families, IPv4 (for EVPN currently) , and IPv6 (for SRv6 and for EVPN IPv6 support). The rename is because EVPNConfig really doesn't hold anything exclusively relevant to EVPN, but Tunnel information which is shared by SRv6 or EVPN setups. Or perhaps even do something like: |
I think we are talking about v1alpha1 to v1alpha2 so still at alpha |
@andreaskaris so this means this enhancment should depend on yours right ? so I have to aggregate srv6 changes into this one ? Since we don't need to support upgrades maybe it's not needed ? but maybe what we need to know is if we want to implement first srv6 and then the api enhancement or not. |
I know we are alpha. My question is - in your opinion - if implementing this enhancement can get us to beta. |
I vote for decoupling them. Which is possible if you work on v1alpha2 and @andreaskaris works on v1alpha1. I do think we can learn from the srv6 requirement, and use those API requirements as inputs as well though. |
Make sense to me, still those it means that the implementation of the API should depend on implementation of SRv6 ? |
Nope. At most, let's get the proper structs in place so srv6 can land without having to re-design anything. Or, that's my intention. |
+1
+1 |
|
|
||
| #### Routing Domain / VRF Semantics on L2VNI | ||
|
|
||
| **Problem:** `L2VNISpec.VRF` defaults to `metadata.name` when unset, conflating |
There was a problem hiding this comment.
this is a leftover we should just drop.
| // +optional | ||
| // +kubebuilder:validation:MaxItems=2 | ||
| // +kubebuilder:validation:XValidation:rule="self == oldSelf",message="gatewayIPs cannot be changed" | ||
| GatewayIPs []string `json:"gatewayIPs,omitempty"` |
There was a problem hiding this comment.
even if used only when routing domain is set, imo this doesn't have much to do with routing domain. Validation is enough
|
|
||
| **Problem:** Several API types and comments use non-inclusive language: | ||
|
|
||
| - `HostMaster` / `hostmaster` uses "master" naming and is not |
There was a problem hiding this comment.
this is kernel terminology, I would not remove it.
| // The VRF configuration (name, VNI, route targets) is owned entirely | ||
| // by the referenced L3VNI -- the L2VNI does not duplicate it. | ||
| // +required | ||
| L3VNI string `json:"l3vni,omitempty"` |
There was a problem hiding this comment.
we must allow additional routing domains of different types (upcoming srv6 for example)
| #### Dual-Stack Friendly API Shape | ||
|
|
||
| **Problem:** Split `IPv4`/`IPv6` fields lock out future dual-stack support as a | ||
| breaking change. Both Kubernetes (e.g. `ClusterIPs`, `PodCIDRs`) and OpenShift |
There was a problem hiding this comment.
we shouldn't mention openshift here imo
| // one IPv6). The first element is the primary address family. | ||
| // +kubebuilder:validation:MinItems=1 | ||
| // +kubebuilder:validation:MaxItems=2 | ||
| // +kubebuilder:validation:XValidation:rule="self.all(x, isCIDR(x)) && (size(self) == 2 ? cidr(self[0]).ip().family() != cidr(self[1]).ip().family() : true)",message="localCIDRs must contain valid CIDRs and at most one of each family" |
|
|
||
| **Affected fields:** | ||
|
|
||
| **`LocalCIDRConfig` -- breaking change.** Replace the separate `IPv4`/`IPv6` |
There was a problem hiding this comment.
tbh I don't understand how the current config prevents dual stack unless a new ip family is introduced
| } | ||
| ``` | ||
|
|
||
| **`VTEPCIDR` -- no change.** Single-family in practice; can add `VTEPCIDRs` |
There was a problem hiding this comment.
I would actually do it enforcing ipv4 for now
|
|
||
| **Problem:** `Neighbor.Password` stores the BGP password as plaintext in etcd. | ||
|
|
||
| **Proposal:** Remove both `Password` and `PasswordSecret`. Replace with a |
There was a problem hiding this comment.
how is the change of passwordsecret related to the problem?
I think password is the conventional key for passwords
There was a problem hiding this comment.
.. but I don't have strong opinions either
| **Problem:** `VNI` and `ASN` lack immutability enforcement (unlike | ||
| `L2GatewayIPs` and `LocalCIDR`). | ||
|
|
||
| **Proposal:** Add CEL immutability rules: |
There was a problem hiding this comment.
I don't care about backward compatibility for now, but I would not be super strict in immutability.
Users must understand how things work, and we should allow changing things where it's possible unless there's a technical constraint to that.
So, I'd like to be able to amend the neighbor ip or the asn without having to delete and recreate, it's even more controller friendly.
|
|
||
| #### Replace `nics []string` with `interfaces` | ||
|
|
||
| **Problem:** Underlay connectivity is split between `UnderlaySpec.Nics []string` |
There was a problem hiding this comment.
Over the weekend, I thought about the EVPN.VtepInterface, and I think it is a redundant configuration, too. The same as nics, it moves an interface into the OpenPERouter namespace. There's just an additional semantic: take the IP addresses from the VtepInterface and use them as a tunnel source.
I think the VtepInterface is redundant - instead, the interface should be part of Interfaces, and given that you already have a type there, that could be used as type "tunnelsource" or something ... or there's a completely different means to select the tunnel source, which I'd actually prefer
| // EBGPMultiHop: the peer is multiple hops away (RFC 4271). | ||
| // +optional | ||
| // +listType=set | ||
| Features []NeighborFeature `json:"features,omitempty"` |
There was a problem hiding this comment.
I'm not a big fan of this. It introduces another layer for no gain, and it deviates from FRR / Cisco syntax when it comes to neighbor configuration. We should make the admin's life easier and put the toggles where they expect them to occur, which is right in the neighbor struct.
I do think that be should have this layout:
type BGP struct {
Neighbor {
everything neighbor related
}
globalBGPsettingA ..
globalBGPsettingB..
}
In short, it should reflect the layout of a BGP router section in FRR
Is this a BUG FIX or a FEATURE ?:
/kind documentation
What this PR does / why we need it:
Add enhancement documentation to improve the API
Release note: