Add ability to configure affinity and tolerations for scans and hooks #806
Signed-off-by: Max Maass <max.maass@iteratec.com>
Signed-off-by: GitHub Actions <securecodebox@iteratec.com>
Signed-off-by: Max Maass <max.maass@iteratec.com>
|
Tolerations work, and are now also pulled into the parser job from the scanner job. I think this is useful for the following reasons:
Anyway, this is my pitch for why it should be done like this. Tolerations work (I have tested them). Affinity is somehow borked right now; it seems like it is not copied over to the job template for some reason, and I don't know why. I would appreciate it if someone with more experience with Go and the k8s APIs (@J12934?) could take a look at the diff and check if I am doing something obviously wrong.

How to test

Here's a quick overview of how I tested this feature.

Affinity (does not work)

My testing setup is: First, give a node a label that we can select on:

kubectl label nodes kind-control-plane disktype=ssd

Then, using the operator and CRDs from this branch, schedule the following job:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "nmap-example"
spec:
  scanType: "nmap"
  parameters:
    - "scanme.nmap.org"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd

Then, check the YAML of the created scan job using k9s or kubectl and observe that the affinity is not reflected in the template. I am looking for support in figuring out why this is the case.

Definition in the CRD: secureCodeBox/operator/apis/execution/v1/scan_types.go, lines 81 to 84 in 76e221b
Copying it in the reconciler: secureCodeBox/operator/controllers/execution/scans/scan_reconciler.go, lines 353 to 360 in 76e221b

Tolerations (do work)

Tolerations are reflected in the YAML and respected by the cluster when using them. To test, first give your node a taint:

kubectl taint node kind-control-plane taint=test:NoSchedule

Now, no jobs will be scheduled on it by default (make sure this does not interfere with your operator by running it outside the cluster). Then, schedule the following job:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "nmap-test"
spec:
  scanType: "nmap"
  parameters:
    - "scanme.nmap.org"

Observe that it will never be scheduled (the pod will hang in "pending"). Delete the scan and then change the scan definition as follows:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "nmap-test"
spec:
  scanType: "nmap"
  parameters:
    - "scanme.nmap.org"
  tolerations:
    - key: "taint"
      value: "test"
      effect: "NoSchedule"

Schedule it and observe that it will get scheduled, including the parser (thanks to the above-mentioned change of copying the tolerations from the scan). They will also be reflected in the YAML version of the job if you inspect it using k9s or kubectl. To get rid of the taint after you are done testing, run:

kubectl taint node kind-control-plane taint=test:NoSchedule-
|
|
I'm still not sure the current handling is intuitive. My understanding at the moment is that it is configured for each job as follows:
Is that accurate? Shouldn't the hook job also inherit the Scan's Affinity & Tolerations? Why can the Affinity & Tolerations be configured via the ScanCompletionHook Custom Resource but not via the ParseDefinition CR? |
|
At the moment, your table is accurate. The rationale for not including affinity and tolerations as part of the ParseDefinition follows from the behavior of inheriting them from the scan job: if we inherit them from the scan job (which is the more specific source of information about what is desired for a specific scan), they would pretty much always replace whatever is defined in the ParseDefinition, and it may lead to unintuitive behavior if we merge instead of replacing (affinity cannot easily be merged; tolerations can, but merging would potentially lead to strange behavior as well).

Is there a use case where you would want your scans to run with a specific affinity / toleration, and your parsers to run with a different one? I'm usually in favor of not implementing a feature simply for feature parity with a different function if I don't see a real use case for it, but I do see how it would make the implementation more complex and, in the worst case, error-prone. (Hence, parsers currently also do not support initContainers, and neither do ScanCompletionHooks.)

Inheriting affinity and tolerations from the scan job to the hook seems like an interesting idea. I would have to dive into the code and see how we can do this, but I definitely see how this would be useful. Do you have any idea why the affinity is not copied over properly from the scan to the job in the reconcilers, i.e., am I doing something obviously wrong in the snippets I posted above? |
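The scan-to-parser inheritance discussed here can be sketched roughly as follows. This is a simplified illustration, not the actual reconciler code: the types are hypothetical stand-ins for corev1.Toleration and the job's pod template, and the function name is made up for this example.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct {
	Key, Value, Effect string
}

// PodSpec is a simplified stand-in for the parse job's pod template spec.
type PodSpec struct {
	Tolerations []Toleration
}

// copyTolerations carries the scan's tolerations over to the parse job's
// pod template wholesale, so the parser is schedulable on the same
// tainted nodes as the scanner.
func copyTolerations(scan []Toleration, parseJob *PodSpec) {
	parseJob.Tolerations = scan
}

func main() {
	scanTolerations := []Toleration{{Key: "taint", Value: "test", Effect: "NoSchedule"}}
	var parseJob PodSpec
	copyTolerations(scanTolerations, &parseJob)
	fmt.Println(len(parseJob.Tolerations)) // prints 1
}
```

The point of copying rather than merging is that the scan is the more specific source of truth, so its scheduling constraints win outright.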
|
What would feel intuitive to me would be the following:
Basically, it would only be configurable via the Scan CR. This keeps it consistent across all of the job types.
🤔 Mh, these look good to me. Can you make sure that you are only running the changed operator against the cluster and don't have a second operator connected to it which doesn't have the new feature? |
|
I think your proposed table looks very sensible. I'll have to check how to have the hooks inherit the settings from the scans, but assuming this is possible, it seems like a very reasonable way to configure this. In that case, I would remove affinity and tolerations from the ScanCompletionHook CRD and have it only grab its config from the scan jobs.
I am certain that this was the case, and everything worked fine for the tolerations, consistently over something like 5-10 test runs, while it never worked for affinity in 5-10 test runs, so the error should be in the operator, not the deployment. Maybe we can take a look at it together whenever it is convenient for you. |
|
So, it turns out that the issue was that the tolerations are shown in |
Signed-off-by: Max Maass <max.maass@iteratec.com>
Signed-off-by: Max Maass <max.maass@iteratec.com>
Signed-off-by: Max Maass <max.maass@iteratec.com>
Default to true because this is the most sensible behavior and consistent with how we do it for parsers and hooks. Signed-off-by: Max Maass <max.maass@iteratec.com>
|
I have added it. Concerning changes to the values: affinity cannot be merged; it can only be replaced in cascading hooks, as it is a deeply nested dictionary, which the current merger logic cannot handle. Tolerations, being a list, are merged using the standard list rules. |
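One plausible reading of these replace-vs-merge semantics (affinity replaced wholesale, tolerations concatenated as a list) can be sketched like this. The types and function name are simplified illustrations, not the real corev1 types or operator code:

```go
package main

import "fmt"

// Affinity stands in for the deeply nested corev1.Affinity structure;
// a single string field is enough to show the replace semantics.
type Affinity struct{ NodeAffinity string }

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct{ Key, Value, Effect string }

// mergeScheduling applies the rules described above: an affinity set on
// the cascading rule replaces the parent's affinity entirely (nested
// dictionaries are not merged), while tolerations are concatenated like
// any other list.
func mergeScheduling(parentAff, ruleAff *Affinity, parentTol, ruleTol []Toleration) (*Affinity, []Toleration) {
	aff := parentAff
	if ruleAff != nil {
		aff = ruleAff // replace, don't merge
	}
	return aff, append(append([]Toleration{}, parentTol...), ruleTol...)
}

func main() {
	aff, tol := mergeScheduling(
		&Affinity{NodeAffinity: "disktype=ssd"}, nil,
		[]Toleration{{Key: "key1"}}, []Toleration{{Key: "key2"}},
	)
	fmt.Println(aff.NodeAffinity, len(tol)) // prints: disktype=ssd 2
}
```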
|
This is ready for review on the functional side; I still have to update a few pieces of documentation.

How to test

Preconditions:
First, set up a cascading scan rule (it's not a very useful rule, it's just there to show that inheritance on cascading scans works):

apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "nmap-hostscan"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: light
spec:
  matches:
    anyOf:
      - category: "Host"
        osi_layer: "NETWORK"
  scanSpec:
    scanType: "nmap"
    parameters:
      - "-Pn"
      - "{{location}}"

Then, trigger an nmap scan:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "nmap-localhost"
spec:
  scanType: "nmap"
  parameters:
    - "127.0.0.1"
  # Demo affinity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  # Demo tolerations
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
  cascades:
    matchLabels:
      securecodebox.io/intensive: light

This will not run on your machine unless you have already set up the "disktype=ssd" label for your kubernetes node (see above). Schedule the scan above, wait a bit, and then list the created jobs. You can now go through all of these jobs and check that the tolerations and affinity are present:

$ kubectl get job scan-nmap-localhost-95vmf -o yaml | yq e .spec.template.spec.affinity -
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: disktype
            operator: In
            values:
              - ssd
$ kubectl get job scan-nmap-localhost-95vmf -o yaml | yq e .spec.template.spec.tolerations -
- effect: NoSchedule
  key: key1
  operator: Equal
  value: value1

Repeat for all the jobs to check them. Further things to check for:
Edit: Reading through this again, I guess the answer is quite obviously "yes, we want to support it using the YAML files", and I'm just trying to avoid it because it's a lot of annoying work to implement and test it 😬. So, I guess I'll take care of this this afternoon. |
This commit makes the operator respect affinity and tolerations that are specified in ScanSpec and ParseDefinitions. The default values are used if the scan does not override them, and discarded (not merged) otherwise. Default values are not inherited like explicitly specified values are, and you need to make sure to specify affinity and tolerations for both scanner and parser (they are not copied over, either). Hooks will get the same treatment in the next commit. Signed-off-by: Max Maass <max.maass@iteratec.com>
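The override-or-default rule in this commit message can be sketched as follows. The types and helper name are hypothetical simplifications (the real operator works with corev1.Toleration), but the decision logic matches the described behavior:

```go
package main

import "fmt"

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct{ Key, Value, Effect string }

// effectiveTolerations returns the scan's tolerations when the scan sets
// them, and the ParseDefinition's defaults otherwise. The two are never
// merged: any explicit value on the scan discards the defaults entirely.
func effectiveTolerations(fromScan, defaults []Toleration) []Toleration {
	if fromScan != nil {
		return fromScan
	}
	return defaults
}

func main() {
	defaults := []Toleration{{Key: "dedicated", Value: "parsers", Effect: "NoSchedule"}}
	// Scan sets nothing: defaults apply.
	fmt.Println(len(effectiveTolerations(nil, defaults))) // prints 1
	// Scan sets an explicit (even empty) list: defaults are discarded.
	fmt.Println(len(effectiveTolerations([]Toleration{}, defaults))) // prints 0
}
```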
|
I just pushed support for loading the affinity and tolerations from helm values. At the moment, I have not yet touched all the values.yaml files as well as the ScanSpec, HookSpec, etc. files, to keep the review easier. If you want to test it, change your nmap ScanType to this:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: ScanType
metadata:
  name: "nmap{{ .Values.scanner.nameAppend | default ""}}"
spec:
  extractResults:
    type: nmap-xml
    location: "/home/securecodebox/nmap-results.xml"
  jobTemplate:
    spec:
      {{- if .Values.scanner.ttlSecondsAfterFinished }}
      ttlSecondsAfterFinished: {{ .Values.scanner.ttlSecondsAfterFinished }}
      {{- end }}
      backoffLimit: {{ .Values.scanner.backoffLimit }}
      {{- if .Values.scanner.activeDeadlineSeconds }}
      activeDeadlineSeconds: {{ .Values.scanner.activeDeadlineSeconds }}
      {{- end }}
      template:
        spec:
          restartPolicy: OnFailure
          affinity:
            {{- toYaml .Values.scanner.affinity | nindent 12 }}
          tolerations:
            {{- toYaml .Values.scanner.tolerations | nindent 12 }}
          containers:
            - name: nmap
              image: "{{ .Values.scanner.image.repository }}:{{ .Values.scanner.image.tag | default .Chart.AppVersion }}"
              imagePullPolicy: {{ .Values.scanner.image.pullPolicy }}
              command:
                - "nmap"
                - "-oX"
                - "/home/securecodebox/nmap-results.xml"
              resources:
                {{- toYaml .Values.scanner.resources | nindent 16 }}
              securityContext:
                {{- toYaml .Values.scanner.securityContext | nindent 16 }}
              env:
                {{- toYaml .Values.scanner.env | nindent 16 }}
              volumeMounts:
                {{- toYaml .Values.scanner.extraVolumeMounts | nindent 16 }}
            {{- if .Values.scanner.extraContainers }}
            {{- toYaml .Values.scanner.extraContainers | nindent 12 }}
            {{- end }}
          volumes:
            {{- toYaml .Values.scanner.extraVolumes | nindent 12 }}

And the nmap parse definition to this:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: ParseDefinition
metadata:
  name: "nmap-xml"
spec:
  image: "{{ .Values.parser.image.repository }}:{{ .Values.parser.image.tag | default .Chart.Version }}"
  imagePullPolicy: {{ .Values.parser.image.pullPolicy }}
  ttlSecondsAfterFinished: {{ .Values.parser.ttlSecondsAfterFinished }}
  env:
    {{- toYaml .Values.parser.env | nindent 4 }}
  affinity:
    {{- toYaml .Values.parser.affinity | nindent 4 }}
  tolerations:
    {{- toYaml .Values.parser.tolerations | nindent 4 }}

You can then set affinity and tolerations for scanner and parser by writing a file like this and passing it to the helm install command (e.g. via --values):

scanner:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
parser:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"

Of course, this does not work if you install nmap from the official sources; you are going to have to do a local install from source for that. If memory serves, these are the required commands:

cd parser-sdk/nodejs/
make docker-build
cd ../../scanners/nmap
make docker-build
make docker-export
make kind-import
rm *.tar

Then run the helm upgrade command like this, assuming your values are set in a file called values2.yaml:

helm upgrade --install nmap ./ --wait --values values2.yaml \
  --set="parser.image.repository=docker.io/securecodebox/parser-nmap" \
  --set="parser.image.tag=sha-$(git rev-parse --short HEAD)" \
  --set="scanner.image.repository=docker.io/securecodebox/scanner-nmap" \
  --set="scanner.image.tag=sha-$(git rev-parse --short HEAD)"

To test the same for the hooks, change your cascading-scan-hook.yaml to this:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: ScanCompletionHook
metadata:
  name: {{ include "cascading-scans.fullname" . }}
  labels:
    {{- include "cascading-scans.labels" . | nindent 4 }}
    securecodebox.io/internal: "true"
    {{- with .Values.hook.labels }}
    {{ toYaml . }}
    {{- end }}
spec:
  priority: {{ .Values.hook.priority }}
  type: ReadOnly
  image: "{{ .Values.hook.image.repository }}:{{ .Values.hook.image.tag | default .Chart.Version }}"
  imagePullSecrets:
    - name: "securecodebox"
  ttlSecondsAfterFinished: {{ .Values.hook.ttlSecondsAfterFinished }}
  serviceAccountName: cascading-scans
  affinity:
    {{- toYaml .Values.hook.affinity | nindent 4 }}
  tolerations:
    {{- toYaml .Values.hook.tolerations | nindent 4 }}

And use the following format for the values.yaml:

hook:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"

You'll also have to install the hook manually, but the instructions are almost the same as above - you just need to build the hook-sdk instead of the parser-sdk. Gotchas:
@J12934 requested that I keep the changes to the YAML files separate to make reviewing easier. So, once this PR is merged, I will create a new PR that adds the changes to all the YAML files, and also takes care of #820, which requires changes to the same files. |
Signed-off-by: Max Maass <max.maass@iteratec.com>
|
As requested by @Weltraumschaf, I have added an ADR that describes the architecture considerations of this change, and how the behavior in this PR differs from the standard behavior in other places. I have also verified that the inheritAffinity / inheritTolerations flags work as intended. So, I now consider this ready for review. |
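For reference, a values fragment toggling the inheritance flags mentioned here might look like the following. The placement under the hook's values is an assumption on my part; check the ADR and the generated CRDs for the authoritative field locations (both flags default to true, per the commit above):

```yaml
hook:
  # Assumed placement, illustrative only: disable inheriting the scan's
  # scheduling constraints for this hook.
  inheritAffinity: false
  inheritTolerations: false
```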
|
I created a companion PR with documentation updates: secureCodeBox/documentation#199. I am currently unsure whether more documentation is needed; since this is a fairly niche feature, it probably does not warrant its own howto entry. Since the updates to the values.yaml etc. are going to be done in a separate PR, I will also keep the required updates of the "how to build a scanner / hook" sections in a separate documentation PR. |
|
Hi, I have a problem with nmap |
|
@porcinnealabatoir |
Description
This PR resolves #803 by adding affinity and tolerations to the ScanSpec and ScanCompletionHookSpec. Supersedes #804 to avoid having an ugly history.
Currently in draft state. Tasks:
Add the new fields to the ScanSpec and ScanCompletionHookSpec CRDs and re-generate them