ROX-34929: Dedupe Scanner V4 Packages and Vulns#20996
Conversation
|
Skipping CI for Draft Pull Request. |
|
Worried about impact? Review this PR in Change Stack to explore blast radius before you approve or request changes. 📝 WalkthroughWalkthroughThis PR adds a new ChangesScanner V4 Deduplication
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/scanners/scannerv4/convert.go`:
- Around line 64-71: The loop that builds referencedSourceIDs when
features.ScannerV4Dedupe.Enabled() currently collects source IDs from every
package kind and must be restricted to only binary packages to avoid dropping
valid source+ancestry components; update the population logic so it only adds
srcID when pkg.GetKind() indicates a binary (check the enum/value used for
binary packages in this codebase) before calling
referencedSourceIDs.Add(pkg.GetSource().GetId()), replicate the same change for
the other analogous block (the block mentioned at lines 75-83), and add a
regression test that exercises a report containing only a source and an ancestry
package (no binary) to assert both components are retained.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: f90ad52b-bbfe-43eb-ab17-c3ab7fc086cf
📒 Files selected for processing (4)
pkg/features/list.gopkg/scanners/scannerv4/convert.gopkg/scanners/scannerv4/convert_test.gopkg/scanners/scannerv4/vm_convert.go
🚀 Build Images ReadyImages are ready for commit 15d1419. To use with deploy scripts: export MAIN_IMAGE_TAG=4.12.x-116-g15d1419620 |
|
/test all |
vikin91
left a comment
There was a problem hiding this comment.
Despite not being in the Scanner-core, I gave it a selective look. My human eyes didn't catch much, but I threw it at AI and there are two things that you may want to check:
- Question:
does this layer intentionally rely on PackageVulnerabilities already being advisory-deduped upstream when ROX_SCANNER_V4_RED_HAT_CVES=false?
I noticed vulnerabilities() now merges by ccVuln.GetName(). In isolation that would collapse two entries like:
- RHSA-2024:10775 -> ...CVE-2024-24789...
- RHSA-2024:10775 -> ...CVE-2024-24790...
into one result.
I think toProtoV4PackageVulnerabilitiesMap() / dedupeAdvisories() currently makes that unreachable, so this may be fine. Just wanted to confirm that this function is deliberately depending on that upstream invariant.
- Potential issue:
I think mergeFixFields() may regress remediation data in a real way, not just reshuffle metadata. Could we add a regression test for the case where a newer advisory has no FixedInVersion, but an older duplicate does?
Something like:
t.Run("newer advisory does not erase existing fix version", func(t *testing.T) {
vulnMap := map[string]*v4.VulnerabilityReport_Vulnerability{
"a": {
Id: "a", Name: "CVE-2024-1",
Advisory: &v4.VulnerabilityReport_Advisory{Name: "RHSA-2024:100"},
FixedInVersion: "1.0.0",
},
"b": {
Id: "b", Name: "CVE-2024-1",
Advisory: &v4.VulnerabilityReport_Advisory{Name: "RHSA-2024:200"},
},
}
got := vulnerabilities(vulnMap, []string{"a", "b"}, "", "")
require.Len(t, got, 1)
// Prefer the newer advisory metadata, but do not lose known fixability.
assert.Equal(t, "RHSA-2024:200", got[0].GetAdvisory().GetName())
assert.Equal(t, "1.0.0", got[0].GetFixedBy())
})This would help answer whether losing fixability here is really intended.
That's correct, by the time the the information gets to Central there should only be a single RHSA, and those RHSAs do not map to a single CVE, they generally represent many CVEs - the behavior in dedupeAdvisories addresses that. Regardless - ROX_SCANNER_V4_RED_HAT_CVES was to be removed when the dedicated advisory field was added (ROX-26672) but never was. I wouldn't put too much focus on that path, but either way it should behave OK.
That is the opposite of the existing test case: t.Run("advisory wins with no fix version", func(t *testing.T) {
vulnMap := map[string]*v4.VulnerabilityReport_Vulnerability{
"a": {
Id: "a", Name: "CVE-2024-1",
Advisory: &v4.VulnerabilityReport_Advisory{Name: "RHSA-2024:100"},
FixedInVersion: "1.0.0",
},
"b": {
Id: "b", Name: "CVE-2024-1",
Advisory: &v4.VulnerabilityReport_Advisory{Name: "RHSA-2024:200"},
},
}
got := vulnerabilities(vulnMap, []string{"a", "b"}, "", "")
require.Len(t, got, 1)
// "b" wins on advisory — fix version is empty from "b".
assert.Equal(t, "RHSA-2024:200", got[0].GetAdvisory().GetName())
assert.Equal(t, "", got[0].GetFixedBy())
})This scenario shouldn't occur - an RHSA describes a 'fix', any versions prior to the version mentioned are considered vulnerable, an RHSA wouldn't be published for a package it doesn't fix. But lets say that can happen - if we surface to the user that this vulnerability is fixed in 1.0.0, but a later RHSA says the package is no longer fixed (not possible), that user will still be affected by the vuln in that case. I could argue that 'no fix' is the more accurate conclusion (which is why I created that test / expected outcome) If we feel that we should carry the fix version forward in this case, I'm OK changing it - either way should not occur in practice. |
|
Thank you for answering! The answers make sense! |
vikin91
left a comment
There was a problem hiding this comment.
Approving as in my comment (suggesting that second pair of eyes give that a look, as I may lack some assumptions).
jvdm
left a comment
There was a problem hiding this comment.
Would it be simpler to buildEmbeddedVulnerability for the dedup candidate first, then have the merge functions compare two *storage.EmbeddedVulnerability values? The merge functions re-implement parts of buildEmbeddedVulnerability() (advisory(), vulnDataSource(), link()...) and nothing guarantees these two paths stay in sync. There might be bugs there.
|
@dcaravel: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
Adds logic to dedupe packages and vulnerabilities in Scanner V4 results.
The deduping logic is disabled by default because it will be backported, to enable set
ROX_SCANNER_V4_DEDUPE: true.Deduping scenarios addressed
kind:source,binary,ancestry)Placement
Originally the logic was added to
mappers.go, was later moved toconvert.goas it was more readable and allowed for a 'rolling' approach when modifying the vuln details. Inmappers.go, as vulns were combined new 'synthetic' vulns had to be created and added to thevulnerabilitiesmap to not break other packages that may not reference the same set of duplicates (unnecessary complexity).Logic
Comparing by
severityorCVSSwas not enough, OSV/Red Hat have priority over NVD and in various scenarios the 'lower severity' needed to be top level.To address known duplicate scenarios and keep data actionable, fields were grouped as 'fix' related and 'scoring' related (loose), the grouped fields are kept together such that the 'fix' info may come from duplicate A and the scoring from duplicate B.
Deduping examples from real images
Scenario 1: Duplicate RHSAs — same CVE, different advisories
Image:
registry.redhat.io/compliance/openshift-compliance-openscap-rhel8@sha256:b64926c3cd92381b31f31f28ca01f05c52bc0fbbc581941ba6f0da5b3b142d36Package:
expat 2.5.0-3.el9_5.1(rpm)Scoring: equal — both have identical CVSS, severity, and summary.
Fix: B wins — newer advisory, newer fixed-by version, later fix timestamp.
0:2.5.0-3.el9_5.30:2.5.0-5.el9_6Scenario 2: Different perspectives — different CVSS scores
Image:
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:29e3563d5fc11a54ffbf7b28694f1cc3975a72e24035fca4afa44d7937e932c7Package:
github.com/docker/docker v27.5.1+incompatible(Go binary)Scoring: A wins — has 2 CVSS metric sources vs 1.
Fix: equal — neither has an advisory or fix version.
AV:N/AC:H/...AV:N/AC:L/...Scenario 3: Different perspectives — one has fix version
Same image as Scenario 2
Package:
github.com/docker/docker v27.5.1+incompatible(Go binary)Scoring: A wins — higher CVSS (8.8 vs 7.8), 2 metric sources vs 1.
Fix: A wins — has fix version
29.3.1; B has none.29.3.1Scenario 4: Different perspectives — one has no scoring at all
Same image as Scenario 2
Package:
github.com/go-jose/go-jose/v3 v3.0.4(Go binary)Scoring: A wins — has CVSS, severity, and metric source; B has none.
Fix: equal — both have
3.0.5as the fix version.3.0.53.0.5Scenario 5: Different perspectives — different CVSS severity
Same image as Scenario 2
Package:
setuptools 53.0.0(Python)Scoring: A wins — higher CVSS (7.5 vs 5.9), 2 metric sources vs 1.
Fix: equal — both have
65.5.1as the fix version.AV:N/AC:L/...AV:N/AC:H/...65.5.165.5.1User-facing documentation
Testing and quality
Automated testing
How I validated my change
unit tests + CI + manually
Using a custom dupe detector and two different infra clusters, one with the fix the other without, analyzed each active image for duplicates.
The cluster without the fix: