-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat(coredns): use txt-owner-id to strictly separated external-dns instances #5921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Jan Jansen <jan.jansen@gdata.de>
|
Hi @farodin91. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@ivankatliarchuk I moved to behavior inside the provider only. It would possible to built this with a etcd proxy, that would require much more work with only a little gain. Going for advanced multi cluster solution is a different step. |
|
What is missing with this PR. It does not describe well enough the problem, either a configuration that we could execute locally and test it. I'd assume coredns may require a specific configuration as well. Records managed by external-dns, not by coredns. For external dns we already have an owner, and adding flag As well as flag We don't have any sort of specific roadmap or product vision around multi-cluster .External-dns currently don't handle multi cluster, due to complexity, but not strictly against that. This sounds more like has to be resolved with cluster mesh or coredns multicluster plugin. Due to nature, as external-dns writes data to external etcd, is not as difficult to add multi cluster support. For other reviewers, the PR adds a field to records in etcd. Something like Kubernetes manifests ---
apiVersion: v1
kind: Service
metadata:
name: a
annotations:
external-dns.alpha.kubernetes.io/hostname: a.example.org
external-dns.alpha.kubernetes.io/coredns-group: "g1"
cluster-name: "cluster1"
namespace: default
spec:
type: LoadBalancer
ports:
- port: 80
name: http
targetPort: 80
selector:
app: test-app
---
apiVersion: v1
kind: Service
metadata:
name: a-cluster2
annotations:
external-dns.alpha.kubernetes.io/hostname: a.example.org
external-dns.alpha.kubernetes.io/coredns-group: "g1"
cluster-name: "cluster2"
namespace: default
spec:
type: LoadBalancer
ports:
- port: 80
name: http
targetPort: 80
selector:
app: test-appbefore, the second service will not get created, as there is a clash - same host, same values. We could swap the group from with the change |
…stances Signed-off-by: Jan Jansen <jan.jansen@gdata.de>
|
@ivankatliarchuk I added another Commit use txt-owner-id instead. Do you think this way is better? |
|
/ok-to-test Have you tried in your environment? Could you share similar results for this PR #5085 (comment). Need to make sure it works before we merge, I'll try to smoke test right after as well. |
|
@ivankatliarchuk I've tried the setup in our staging env with managed by variant. As we had issues without such a filter. The test should be fixed with my next push. |
Pull Request Test Coverage Report for Build 19037565486Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
|
@ivankatliarchuk tests are fixed. |
provider/coredns/coredns.go
Outdated
| for _, n := range r.Kvs { | ||
| svc := new(Service) | ||
| if err := json.Unmarshal(n.Value, svc); err != nil { | ||
| return false, fmt.Errorf("%s: %w", n.Key, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt.Errorf("failed to unmarshal %q: %w", n.Key, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unified and changed.
provider/coredns/coredns.go
Outdated
| case r == nil: | ||
| return true, nil | ||
| case len(r.Kvs) > 1: | ||
| return false, fmt.Errorf("found multiple keys with the same key this service") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please try to rephrase, I don't really understand the error message. I can guess but you can make this better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rephrased
|
/lgtm |
Signed-off-by: Jan Jansen <jan.jansen@gdata.de>
|
New changes are detected. LGTM label has been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to follow changes.
We may need a flow diagram of something else. Hard to follow code, especially with group addition in one of previous PRs.
What else is not clear
- external-dns first does Records(....) where we filter out endpoints not owned by this instance
- Question: why would we need to validate ownership in SaveService(..) and DeleteService(...), this looks like a redundant call to me
Another question, what the behaviour is like
- we have records without owner
- we adding owner tag, will it recreate/update or what is about to happen
provider/coredns/coredns.go
Outdated
| return err | ||
| } | ||
|
|
||
| func (c etcdClient) isOwnedBy(ctx context.Context, key string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to review that
| svc, err := c.unmarshalService(n) | ||
| if err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | |
| } | |
| if c.strictlyOwned && svc.OwnedBy != c.ownerID { | |
| continue | |
| } |
You could just do Early ownership check: skip records not owned by this instance when strictlyOwned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so most likely lines
if c.strictlyOwned {
b.OwnedBy = svc.OwnedBy
}and
if c.strictlyOwned && b.OwnedBy != c.ownerID {
continue
}no longer required
provider/coredns/coredns.go
Outdated
| if ownedBy, err := c.isOwnedBy(ctx, service.Key); err != nil { | ||
| return err | ||
| } else if !ownedBy { | ||
| return fmt.Errorf("key %q is not owned by this service", service.Key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we trowing an error? This should be just ignore and continue with next?
provider/coredns/coredns.go
Outdated
| return true, nil | ||
| } | ||
|
|
||
| r, err := c.client.Get(ctx, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| r, err := c.client.Get(ctx, key) | |
| r, err := c.client.Get(ctx, key, etcdcv3.WithLimit(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? If you are not using prefixkeys, it should always return a single entry.
provider/coredns/coredns.go
Outdated
| } | ||
|
|
||
| r, err := c.client.Get(ctx, key) | ||
| switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look like go code. Let's not have switch, especially when there is error handling
if err != nil {
return false, fmt.Errorf("etcd get %q: %w", key, err)
}
if r == nil || len(r.Kvs) == 0 {
// Key missing -> treat as owned (safe to create)
return true, nil
}
provider/coredns/coredns.go
Outdated
| return true, nil | ||
| } | ||
| } | ||
| return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return false, nil | |
| svc, err := c.unmarshalService(r.Kvs[0]) | |
| if err != nil { | |
| return false, fmt.Errorf("failed to unmarshal value for key %q: %w", key, err) | |
| } | |
| return svc.OwnedBy == c.ownerID, nil |
As we passing full key, not sure how is possible to return 2 results
provider/coredns/coredns.go
Outdated
| if owned, err := c.isOwnedBy(ctx, key); err != nil { | ||
| return err | ||
| } else if !owned { | ||
| return fmt.Errorf("key %q is not owned by this service", key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, why we throwing an error? Is this not just a skip?
provider/coredns/coredns.go
Outdated
| ctx, cancel := context.WithTimeout(ctx, etcdTimeout) | ||
| defer cancel() | ||
|
|
||
| if ownedBy, err := c.isOwnedBy(ctx, service.Key); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we even need this check? Is there is an endpoint to create, we could create it and attach an owner. If there is same key but different owner, why we would even care?
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I have added a flow graph to the documentation.
I reduced the number of redundant calls; however, some cases are still broken as the DeleteService function is a prefix deletion. It could delete records from a different provider. I'll fix the DeleteService now. I didn't want to ignore the fact that one provider was trying to override another, so I sent an error message. Currently, this error can only be sent when storing if the random string generator creates the same string multiple times. If we relax the behaviour of taking over records without an owner, it could be a good idea to prevent overrides, as multiple providers could try to take over a single record.
I have added a part to the documentation. Following your comment, I have also removed the behaviour of taking over records without an owner.
|
What does it do ?
It adds a field to the service value in etcd to keep track of which service is provided by which coredns.
Motivation
Replacement of #5860. This keeps the behavior inside of the coredns provider.
More