Skip to content

ROX-30570: add repository-to-CPE mapping API to indexer#18703

Closed
BradLugo wants to merge 1 commit intodc/sbom-ingest-central-apifrom
blugo/ROX-30570-GetRepositoryToCPEMapping
Closed

ROX-30570: add repository-to-CPE mapping API to indexer#18703
BradLugo wants to merge 1 commit intodc/sbom-ingest-central-apifrom
blugo/ROX-30570-GetRepositoryToCPEMapping

Conversation

@BradLugo
Copy link
Copy Markdown
Contributor

@BradLugo BradLugo commented Jan 27, 2026

Description

Adds a new RPC, GetRepositoryToCPEMapping, to the Indexer service that returns the Red Hat repository-to-CPE mapping used for RHEL package vulnerability matching. This mapping will be needed by the matcher's ScanSBOM API to enrich RHEL packages with CPE information during SBOM vulnerability scanning.

Alternatives

  • Individual repository-to-cpe lookup: Frankly, this would be my long-term solution if we weren't planning on moving toward other indexing improvements (i.e., "slim index reports"), which is really my preferred solution here. This is a valid option, and we could prove whether it's better than sending the entire mapping over with performance testing, but the current changes seemed easier to implement at the time. We can revisit this during the Tech Preview implementation.
  • repository-to-cpe mapping in Matcher: We'd need to have the same repository-to-cpe related configurations in the Matcher, which requires deployment configuration changes. Given that this is meant to be a somewhat temporary solution, I didn't see the benefit in that effort.
  • SBOM parsing in the Indexer: Some of the SBOMs are already too large to send over the wire as-is. Sending that data between the Indexer and Matcher seemed like too much. The repository-to-cpe data is much smaller.

Stacked on #18484.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

No automated testing. Will follow up in ROX-27690.

How I validated my change

  1. Deploy stackrox
  2. Optionally set the ROX_SBOM_SCANNING feature flag:
    ❯ kubectl set env deploy/scanner-v4-indexer ROX_SBOM_SCANNING=true
    
  3. Port forward Indexer
    ❯ kubectl port-forward svc/scanner-v4-indexer 8443
    
  4. Hit the endpoint
    ❯ grpcurl -insecure \
        -import-path proto \
        -proto internalapi/scanner/v4/indexer_service.proto \
        localhost:8443 scanner.v4.Indexer/GetRepositoryToCPEMapping | head
    {
      "mapping": {
        "3scale-amp-2-for-rhel-8-ppc64le-debug-rpms": {
          "cpes": [
            "cpe:/a:redhat:3scale:2.13::el8",
            "cpe:/a:redhat:3scale:2.14::el8",
            "cpe:/a:redhat:3scale:2.15::el8",
            "cpe:/a:redhat:3scale:2.16::el8",
            "cpe:/a:redhat:3scale_amp:2.11::el8",
            "cpe:/a:redhat:3scale_amp:2.12::el8",
    

@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Jan 27, 2026

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new httputil.Updater and repository-to-CPE mapping code use slog while the rest of the scanner codebase appears to standardize on zlog; consider switching to zlog for consistency and to preserve existing logging context behavior.
  • The updater interval is hardcoded as a package-level variable (24h) in scanner/internal/httputil; consider making this configurable (e.g., via constructor parameter or config) so you can tune refresh frequency without code changes and better support different environments.
  • In RepositoryToCPEUpdater.NewUpdater you set a default URL when the passed URL is empty, but indexer.NewIndexer never calls NewUpdater with an empty URL (it returns early instead), so this fallback is currently dead code; either move the defaulting logic up into NewIndexer or remove it here to avoid confusion.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new httputil.Updater and repository-to-CPE mapping code use slog while the rest of the scanner codebase appears to standardize on zlog; consider switching to zlog for consistency and to preserve existing logging context behavior.
- The updater interval is hardcoded as a package-level variable (24h) in scanner/internal/httputil; consider making this configurable (e.g., via constructor parameter or config) so you can tune refresh frequency without code changes and better support different environments.
- In RepositoryToCPEUpdater.NewUpdater you set a default URL when the passed URL is empty, but indexer.NewIndexer never calls NewUpdater with an empty URL (it returns early instead), so this fallback is currently dead code; either move the defaulting logic up into NewIndexer or remove it here to avoid confusion.

## Individual Comments

### Comment 1
<location> `scanner/internal/httputil/updater.go:22` </location>
<code_context>
+
+// Updater returns a value that's periodically updated.
+// Largely based on https://github.com/quay/claircore/blob/v1.5.48/rhel/internal/common/updater.go.
+type Updater struct {
+	url          string
+	typ          reflect.Type
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring Updater to use generics (or a factory function) instead of reflection and any to make the type handling clearer and safer.

The main complexity comes from the type-erasure via `any` + `reflect.Type` + `reflect.New`. You can keep the behavior (periodic HTTP JSON fetching with rate limiting and atomic swaps) while removing reflection and `any` by making `Updater` generic.

This reduces cognitive load and keeps it reusable if more types appear later.

### Suggested refactor using generics

```go
type Updater[T any] struct {
    url          string
    value        atomic.Pointer[T]
    reqRate      *rate.Limiter
    mu           sync.RWMutex // protects lastModified
    lastModified string
}

func NewUpdater[T any](url string, init *T) *Updater[T] {
    u := &Updater[T]{
        url:     url,
        reqRate: rate.NewLimiter(interval, 1),
    }
    u.value.Store(init)
    return u
}

// Get returns a pointer to the current copy of the value.
func (u *Updater[T]) Get(ctx context.Context, c *http.Client) (*T, error) {
    var err error
    if u.url != "" && u.reqRate.Allow() {
        slog.DebugContext(ctx, "got unlucky, updating mapping file")
        if err = u.Fetch(ctx, c); err != nil {
            slog.ErrorContext(ctx, "error updating mapping file", "reason", err)
        }
    }
    return u.value.Load(), err
}

func (u *Updater[T]) Fetch(ctx context.Context, c *http.Client) error {
    log := slog.With("url", u.url)
    log.DebugContext(ctx, "attempting fetch of mapping file")

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.url, nil)
    if err != nil {
        return err
    }

    u.mu.RLock()
    if u.lastModified != "" {
        req.Header.Set("if-modified-since", u.lastModified)
    }
    u.mu.RUnlock()

    resp, err := c.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    switch resp.StatusCode {
    case http.StatusOK:
    case http.StatusNotModified:
        log.DebugContext(ctx, "response not modified; no update necessary", "since", u.lastModified)
        return nil
    default:
        return fmt.Errorf("received status code %d querying mapping url", resp.StatusCode)
    }

    var v T
    if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
        return fmt.Errorf("failed to decode mapping file: %w", err)
    }

    u.mu.Lock()
    u.lastModified = resp.Header.Get("last-modified")
    u.mu.Unlock()

    u.value.Store(&v)
    log.DebugContext(ctx, "atomic update of local mapping file complete")
    return nil
}
```

Usage for your mapping type becomes type-safe and clearer:

```go
// Assuming type MappingFile struct { ... }
updater := NewUpdater[MappingFile](url, &MappingFile{})
m, err := updater.Get(ctx, httpClient)
// m is *MappingFile, no type assertions/reflect needed
```

This preserves:
- periodic fetch behavior
- `If-Modified-Since` caching
- `rate.Limiter`
- atomic swaps of the current value

while removing:
- `reflect.Type`
- `reflect.New`
- `any` + type-erasure

If generics are not an option in this codebase, a second-best simplification is to replace `reflect.Type` / `reflect.New` with a factory function:

```go
type Updater struct {
    url          string
    newValue     func() any
    value        atomic.Value
    // ...
}

func NewUpdater(url string, newValue func() any, init any) *Updater {
    u := &Updater{
        url:      url,
        newValue: newValue,
        reqRate:  rate.NewLimiter(interval, 1),
    }
    u.value.Store(init)
    return u
}

// in Fetch:
v := u.newValue()
if err := json.NewDecoder(resp.Body).Decode(v); err != nil { ... }
u.value.Store(v)
```

This keeps the abstraction but avoids `reflect.Type` and `reflect.New`, making it easier to follow.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


// Updater returns a value that's periodically updated.
// Largely based on https://github.com/quay/claircore/blob/v1.5.48/rhel/internal/common/updater.go.
type Updater struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider refactoring Updater to use generics (or a factory function) instead of reflection and any to make the type handling clearer and safer.

The main complexity comes from the type-erasure via any + reflect.Type + reflect.New. You can keep the behavior (periodic HTTP JSON fetching with rate limiting and atomic swaps) while removing reflection and any by making Updater generic.

This reduces cognitive load and keeps it reusable if more types appear later.

Suggested refactor using generics

type Updater[T any] struct {
    url          string
    value        atomic.Pointer[T]
    reqRate      *rate.Limiter
    mu           sync.RWMutex // protects lastModified
    lastModified string
}

func NewUpdater[T any](url string, init *T) *Updater[T] {
    u := &Updater[T]{
        url:     url,
        reqRate: rate.NewLimiter(interval, 1),
    }
    u.value.Store(init)
    return u
}

// Get returns a pointer to the current copy of the value.
func (u *Updater[T]) Get(ctx context.Context, c *http.Client) (*T, error) {
    var err error
    if u.url != "" && u.reqRate.Allow() {
        slog.DebugContext(ctx, "got unlucky, updating mapping file")
        if err = u.Fetch(ctx, c); err != nil {
            slog.ErrorContext(ctx, "error updating mapping file", "reason", err)
        }
    }
    return u.value.Load(), err
}

func (u *Updater[T]) Fetch(ctx context.Context, c *http.Client) error {
    log := slog.With("url", u.url)
    log.DebugContext(ctx, "attempting fetch of mapping file")

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.url, nil)
    if err != nil {
        return err
    }

    u.mu.RLock()
    if u.lastModified != "" {
        req.Header.Set("if-modified-since", u.lastModified)
    }
    u.mu.RUnlock()

    resp, err := c.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    switch resp.StatusCode {
    case http.StatusOK:
    case http.StatusNotModified:
        log.DebugContext(ctx, "response not modified; no update necessary", "since", u.lastModified)
        return nil
    default:
        return fmt.Errorf("received status code %d querying mapping url", resp.StatusCode)
    }

    var v T
    if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
        return fmt.Errorf("failed to decode mapping file: %w", err)
    }

    u.mu.Lock()
    u.lastModified = resp.Header.Get("last-modified")
    u.mu.Unlock()

    u.value.Store(&v)
    log.DebugContext(ctx, "atomic update of local mapping file complete")
    return nil
}

Usage for your mapping type becomes type-safe and clearer:

// Assuming type MappingFile struct { ... }
updater := NewUpdater[MappingFile](url, &MappingFile{})
m, err := updater.Get(ctx, httpClient)
// m is *MappingFile, no type assertions/reflect needed

This preserves:

  • periodic fetch behavior
  • If-Modified-Since caching
  • rate.Limiter
  • atomic swaps of the current value

while removing:

  • reflect.Type
  • reflect.New
  • any + type-erasure

If generics are not an option in this codebase, a second-best simplification is to replace reflect.Type / reflect.New with a factory function:

type Updater struct {
    url          string
    newValue     func() any
    value        atomic.Value
    // ...
}

func NewUpdater(url string, newValue func() any, init any) *Updater {
    u := &Updater{
        url:      url,
        newValue: newValue,
        reqRate:  rate.NewLimiter(interval, 1),
    }
    u.value.Store(init)
    return u
}

// in Fetch:
v := u.newValue()
if err := json.NewDecoder(resp.Body).Decode(v); err != nil { ... }
u.value.Store(v)

This keeps the abstraction but avoids reflect.Type and reflect.New, making it easier to follow.

@BradLugo BradLugo force-pushed the blugo/ROX-30570-GetRepositoryToCPEMapping branch from ef9ea26 to b202bc4 Compare January 27, 2026 19:10
Adds a new RPC, GetRepositoryToCPEMapping, to the Indexer service that
returns the Red Hat repository-to-CPE mapping used for RHEL package
vulnerability matching. This mapping will be needed by the matcher's
ScanSBOM API to enrich RHEL packages with CPE information during SBOM
vulnerability scanning.
@BradLugo BradLugo force-pushed the blugo/ROX-30570-GetRepositoryToCPEMapping branch from b202bc4 to ae6392a Compare January 27, 2026 19:14
@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Jan 27, 2026

Images are ready for the commit at ae6392a.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-880-gae6392a37a.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 0% with 159 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.16%. Comparing base (01e65a6) to head (ae6392a).

Files with missing lines Patch % Lines
scanner/internal/httputil/updater.go 0.00% 54 Missing ⚠️
scanner/indexer/repositorytocpeupdater.go 0.00% 47 Missing ⚠️
pkg/scannerv4/client/client.go 0.00% 21 Missing ⚠️
scanner/indexer/indexer.go 0.00% 18 Missing ⚠️
scanner/services/indexer.go 0.00% 15 Missing ⚠️
scanner/indexer/remote.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@                      Coverage Diff                       @@
##           dc/sbom-ingest-central-api   #18703      +/-   ##
==============================================================
- Coverage                       49.20%   49.16%   -0.04%     
==============================================================
  Files                            2659     2661       +2     
  Lines                          199858   200017     +159     
==============================================================
- Hits                            98344    98343       -1     
- Misses                          94082    94241     +159     
- Partials                         7432     7433       +1     
Flag Coverage Δ
go-unit-tests 49.16% <0.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@BradLugo
Copy link
Copy Markdown
Contributor Author

Closing in favor of #18705 (changing branch name to correct issue number)

@BradLugo BradLugo closed this Jan 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants