feat: scoped and global custom metric trackers#18302
Conversation
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at 95b304d. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18302 +/- ##
=======================================
Coverage 49.39% 49.40%
=======================================
Files 2745 2747 +2
Lines 207225 207296 +71
=======================================
+ Hits 102368 102419 +51
- Misses 97265 97281 +16
- Partials 7592 7596 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new tests in
Test_scopecompare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality. - In
trackerRunner.ServeHTTP, you construct a newpromhttp.HandlerForon every request; you could improve efficiency by creating a reusable handler (or at least a reusableprometheus.Gathererswrapper) and only varying the underlying registries, as those are already long-lived.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new tests in `Test_scope` compare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality.
- In `trackerRunner.ServeHTTP`, you construct a new `promhttp.HandlerFor` on every request; you could improve efficiency by creating a reusable handler (or at least a reusable `prometheus.Gatherers` wrapper) and only varying the underlying registries, as those are already long-lived.
## Individual Comments
### Comment 1
<location> `central/metrics/custom/tracker/tracker_base.go:115` </location>
<code_context>
+func makeTrackerBase[F Finding](metricPrefix, description string, scoped bool,
+ getters LazyLabelGetters[F], generator FindingGenerator[F],
+) *TrackerBase[F] {
+ registryFactory := globalRegistryFactory
+ if scoped {
+ registryFactory = metrics.GetCustomRegistry
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the scoped flag and magic global ID with an explicit scope ID provider strategy injected via constructors to clarify tracker behavior and simplify Gather.
You can keep the new functionality while making the semantics more explicit and removing the boolean + magic ID coupling by turning “scope” into an explicit strategy on the tracker.
### 1. Replace `scoped` + `globalScopeID` with an explicit scope ID provider
Add a function field that defines how to get the scope ID, instead of storing a boolean and using `""` as a magic value:
```go
type TrackerBase[F Finding] struct {
metricPrefix string
description string
getters LazyLabelGetters[F]
generator FindingGenerator[F]
config *Configuration
metricsConfigMux sync.RWMutex
gatherers sync.Map
cleanupWG sync.WaitGroup
registryFactory func(userID string) (metrics.CustomRegistry, error)
scopeIDProvider func(ctx context.Context) (string, bool) // <— new
}
```
Then `Gather` becomes simpler and self-describing:
```go
func (tracker *TrackerBase[Finding]) Gather(ctx context.Context) {
id, ok := tracker.scopeIDProvider(ctx)
if !ok {
return
}
cfg := tracker.getConfiguration()
if cfg == nil {
return
}
gatherer := tracker.getGatherer(id, cfg)
if gatherer == nil {
return
}
defer tracker.cleanupInactiveGatherers()
defer gatherer.running.Store(false)
if cfg.period == 0 || time.Since(gatherer.lastGather) < cfg.period {
return
}
// ...
}
```
This removes `scoped` and the need for `globalScopeID` in `Gather`, and makes the “auth required or not” decision explicit via the `scopeIDProvider`.
### 2. Pass behavior directly from constructors instead of via `scoped` flag
You can still share the common constructor logic, but without a boolean that needs to be re-interpreted later:
```go
func MakeTrackerBase[F Finding](metricPrefix, description string,
getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
return makeTrackerBase(
metricPrefix,
description,
metrics.GetCustomRegistry,
func(ctx context.Context) (string, bool) {
userID, err := authn.IdentityFromContext(ctx)
if err != nil {
return "", false
}
return userID.UID(), true
},
getters,
generator,
)
}
func MakeGlobalTrackerBase[F Finding](metricPrefix, description string,
getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
return makeTrackerBase(
metricPrefix,
description,
globalRegistryFactory,
func(ctx context.Context) (string, bool) {
return "global", true // or keep "" if you prefer, but it's now encapsulated
},
getters,
generator,
)
}
func makeTrackerBase[F Finding](
metricPrefix, description string,
registryFactory func(string) (metrics.CustomRegistry, error),
scopeIDProvider func(context.Context) (string, bool),
getters LazyLabelGetters[F],
generator FindingGenerator[F],
) *TrackerBase[F] {
return &TrackerBase[F]{
metricPrefix: metricPrefix,
description: description,
getters: getters,
generator: generator,
registryFactory: registryFactory,
scopeIDProvider: scopeIDProvider,
}
}
```
This keeps:
- Both public constructors (`MakeTrackerBase` and `MakeGlobalTrackerBase`)
- The global vs. scoped registry behavior
- The global vs. user-scoped `Gather` behavior
But removes:
- The `scoped` field
- The `globalScopeID` sentinel from control flow
- The boolean → factory selection → later reinterpretation chain
The resulting code makes the variant behavior explicit and localized to construction, while `Gather` just uses the injected strategies.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Comparing strings is more visual, and I didn't see any variations in Prometheus output in the tests.
A cache could indeed be introduced to cache handlers per user ID. But that looks like an overkill. |
58a6dae to
e6ac4e1
Compare
e6ac4e1 to
8da234c
Compare
Description
Not all custom metrics need to be scoped (create a separate registry for each scrape user ID). This PR adds support for global trackers and makes a couple of existing trackers as such.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
CI. Manual.
Current dependencies on/for this PR: