Skip to content

ROX-25797: Central Scanner v4 communication#12448

Merged
Maddosaurus merged 19 commits intomasterfrom
mm/rox-25797-enrich
Oct 9, 2024
Merged

ROX-25797: Central Scanner v4 communication#12448
Maddosaurus merged 19 commits intomasterfrom
mm/rox-25797-enrich

Conversation

@Maddosaurus
Copy link
Copy Markdown
Contributor

@Maddosaurus Maddosaurus commented Aug 19, 2024

Description

This PR adds part of the Node Indexing on Scanner v4 feature.
The pipeline is not fully implemented, as a conversion function is missing. This will be introduced in a follow up PR (see TODO(ROX-26089) comments in the code and #12798).

The most important changes summarized:

  • nodes/enricher.go now also creates Scanner v4 NodeScanners
  • nodes/enricher_impl.go's EnrichNodeWithInventory was enhanced with a third parameter, IndexReports

The goal is to ensure that an incoming IndexReport that was generated by a Node will be matched by Scanner v4's Matcher.
The resulting VulnerabilityReport will be converted into a storage.NodeScan and upserted to storage.Node in the aforementioned follow-up PR.

User-facing documentation

  • CHANGELOG is updated OR update is not needed
  • documentation PR is created and is linked above OR is not needed

Testing and quality

  • the change is production ready: the change is GA or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Create an OS4 Cluster on Infra making sure to set Ensure an SSD StorageClass is the default StorageClass for the cluster to true
  • Deploy this image with ROX_SCANNER_V4=true
  • Wait for Matcher to finish initial sync (check logs of both matchers)
  • Set ROX_NODE_INDEX_ENABLED=true for the Collector DaemonSet
  • Recommendation: Lower Node Scan intervals by setting on the Collector DaemonSet
    • ROX_NODE_SCANNING_INTERVAL=45s and ROX_NODE_SCANNING_MAX_INITIAL_WAIT=10s
  • Set debug loglevel for Central
  • Observe the following log messages on Central:
    • received node index report with %d packages from %d content sets for node %s (from pipeline.go)
    • Received Vulnerability Report with %d vulnerabilities: %v (from scannerv4.go)

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Aug 19, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Aug 20, 2024

Images are ready for the commit at 0626984.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.6.x-679-g0626984156.

@Maddosaurus Maddosaurus force-pushed the mm/rox-24117-nindexv4-acs branch 2 times, most recently from 63de2e2 to ab62d13 Compare August 22, 2024 14:09
Base automatically changed from mm/rox-24117-nindexv4-acs to master August 26, 2024 12:15
An error occurred while trying to automatically change base from mm/rox-24117-nindexv4-acs to master August 26, 2024 12:15
@Maddosaurus Maddosaurus mentioned this pull request Sep 4, 2024
9 tasks
@Maddosaurus Maddosaurus force-pushed the mm/rox-25797-enrich branch 2 times, most recently from c689a03 to b30f011 Compare September 26, 2024 10:32
@Maddosaurus Maddosaurus requested a review from a team September 27, 2024 09:42
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 26.77596% with 134 lines in your changes missing coverage. Please review.

Project coverage is 48.15%. Comparing base (dd945a5) to head (0626984).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
pkg/scanners/scannerv4/scannerv4.go 0.00% 73 Missing ⚠️
compliance/collection/compliance/compliance.go 0.00% 31 Missing ⚠️
pkg/nodes/enricher/enricher.go 0.00% 10 Missing ⚠️
pkg/nodes/enricher/enricher_impl.go 75.00% 3 Missing and 4 partials ⚠️
central/enrichment/integration.go 73.91% 6 Missing ⚠️
pkg/scanners/clairify/clairify.go 0.00% 6 Missing ⚠️
central/enrichment/singleton.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12448      +/-   ##
==========================================
- Coverage   48.20%   48.15%   -0.06%     
==========================================
  Files        2442     2446       +4     
  Lines      175662   176067     +405     
==========================================
+ Hits        84671    84778     +107     
- Misses      84170    84462     +292     
- Partials     6821     6827       +6     
Flag Coverage Δ
go-unit-tests 48.15% <26.77%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Maddosaurus Maddosaurus marked this pull request as ready for review September 27, 2024 11:01
@Maddosaurus Maddosaurus requested review from a team as code owners September 27, 2024 11:01
@vikin91
Copy link
Copy Markdown
Contributor

vikin91 commented Sep 27, 2024

making sure to set Ensure an SSD StorageClass is the default StorageClass for the cluster to true

Why is this important?

Copy link
Copy Markdown
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good with no major blockers. I still mark it ❌ as there are many tiny issues that must be looked at before we can merge this.

@Maddosaurus
Copy link
Copy Markdown
Contributor Author

making sure to set Ensure an SSD StorageClass is the default StorageClass for the cluster to true

Why is this important?

This is important because the Matcher intialization with VEX data is currently not well optimized.
We have observed timeouts when not running on SSD-backed storage.
This leads to the Matcher never becoming ready, which leads to no VulnerabilityReports being generated.

@vikin91
Copy link
Copy Markdown
Contributor

vikin91 commented Sep 30, 2024

I think you might have forgotten to push few things. I unresolved some comments as the code does not contain the fixes for them. Ping me again when you check what has happened

Copy link
Copy Markdown
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better! I left few comments, but they are not blocking.
The two open things are:

  • renaming of interfaces
  • making sure the metrics for v4 are collected and the scanner version is indicated in the metrics.

I am fine if those two topics are addressed in a followup.

Please make the CI happy and test that in a deployed cluster. After that I can give it a final look and mark it ✅

Copy link
Copy Markdown
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and can be merged if CI is happy. Good job!

Maddosaurus and others added 17 commits October 7, 2024 09:48
# Conflicts:
#	pkg/scannerv4/mappers/mappers.go
#	scanner/enricher/fixedby/fixedby.go
#	scanner/enricher/nvd/nvd.go
#	scanner/enricher/nvd/nvd_test.go
#	scanner/updater/manual/manual.go
Move to latest master claircore
Tidy dependencies
Switch to uber gomock
Fix import cycle, add key to sensor deduper
Make Scanner v4 known to NewWithCreator
Set up default Scannerv4 to be Node Scanner
Expand GetNodeInventoryScan to handle v4 IndexReports
# Conflicts:
#	generated/storage/node_integration_vtproto.pb.go
- Remove duplication between log messages and error wraps
- Document static layer SHA
- Remove TODO and keep shared max concurrent config for Node Index and Image Scans
Improve conditional readability

Co-authored-by: Piotr Rygielski <114479+vikin91@users.noreply.github.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Oct 9, 2024

@Maddosaurus: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-sensor-integration-tests 0626984 link false /test gke-sensor-integration-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Maddosaurus Maddosaurus merged commit f45adfb into master Oct 9, 2024
@Maddosaurus Maddosaurus deleted the mm/rox-25797-enrich branch October 9, 2024 11:40
cr := report.CloneVT()

// Read the node from the database, if not found we fail.
node, found, err := p.nodeDatastore.GetNode(ctx, event.GetId())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the node's ID is the same as the event ID? Is that allowed? Is it possible to set the Index Report's "HashID" to the node's ID instead?

}

// Send the Node and Index Report to Scanner for enrichment
err = p.enricher.EnrichNodeWithInventory(node, nil, cr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for a followup: I'm thinking a separate function. We seem to replace Inventory with Index, so I'm thinking we should have a separate Index function, too. If underneath it stays the same, that's ok, as at least the interface would look a bit cleaner


func NewNodeMatchSemaphoreWithValue(val int64) NodeMatchSemaphore {
return &nodeMatchSemaphoreImpl{
NewSemaphoreWithValue(val),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: explicitly set ScanSemaphore: NewSemaphoreWithValue(val), so it's clear what this is doing

// The Scanner endpoint requires a digest for each image layer before analyzing it - TODO(ROX-25614)
// As the Node contents are treated as one big image layer, they also need a bogus digest.
// This digest is taken from the test of the digest library we're using (go-containerregistry).
const mockDigest = "registry/repository@sha256:deadb33fdeadb33fdeadb33fdeadb33fdeadb33fdeadb33fdeadb33fdeadb33f"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unfortunate this is required, but I get why you did it. If not handled differently in the future, can we at least update this to make it clear this is for nodes?

Comment on lines +226 to +229
nodeDigest, err := name.NewDigest(mockDigest)
if err != nil {
log.Errorf("Failed to parse digest from node %q: %v", node.GetName(), err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is hardcoded, this can be put into an init() function or perhaps we ignore the error and just do this in a var block. No need to waste cycles recreating this every time this is called

var (
_ scannerTypes.Scanner = (*clairify)(nil)
_ scannerTypes.ImageVulnerabilityGetter = (*clairify)(nil)
_ scannerTypes.NodeScanner = (*clairify)(nil)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding this 😃. I'm a fan of this, as it makes it clear (and will give us compilation errors once something deviates)

@@ -26,13 +28,20 @@ type CVESuppressor interface {

// New returns a new NodeEnricher for the given Prometheus metrics subsystem and the Clair node scanner creator.
func New(cves CVESuppressor, subsystem pkgMetrics.Subsystem) NodeEnricher {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this is all combined?

}

if integration.GetType() == scannerTypes.ScannerV4 {
log.Debugf("Scanner v4 is not an orchestrator Scanner, exiting")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scanner V4

// Currently, only StackRox Scanner and Scanner v4 are supported node integrations.
// Assumes integration.GetCategories() includes storage.ImageIntegrationCategory_NODE_SCANNER.
func ImageIntegrationToNodeIntegration(integration *storage.ImageIntegration) (*storage.NodeIntegration, error) {
i := &storage.NodeIntegration{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should do this after the switch. No need to allocate space for this when we may just exit early in the default

IntegrationConfig: &storage.NodeIntegration_Clairify{
}

switch integration.GetType() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably (hopefully) unlikely, but I guess it's possible GetType and the type of the IntegrationConfig may not match. I'm thinking we can revert back to the check before (check for the config's type rather than this string type). We can do something like:

switch integration.GetIntegrationConfig().(type) {
case *v4.ImageIntegration_Clairify:
case *v4.ImageIntegration_ScannerV4:
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants