Add composite indexes for alert decision filters#4468
Conversation
|
@jimstrang: There are no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.
DetailsI am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository. |
|
@jimstrang: There are no area labels on this PR. You can add as many areas as you see fit.
DetailsI am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository. |
|
/kind fix |
|
Small update: I’m moving this to draft for now. The indexes still look useful, and they do improve the access path for So I don’t want to present this as the confirmed fix for #4464 yet. It may still be part of the fix, or a worthwhile performance improvement on its own, but I’d like to verify the concurrent case before asking for a full review. |
Overview
This PR addresses the
/v1/alertsslowdowns anddatabase is lockedreports in #4464 by adding a few composite indexes on thedecisionstable. The reports started showing up after the 1.7.8 decision-stream chunking changes, but I do not think the chunked stream is the slow query itself. My read is that it made the SQLite/LAPI workload more likely to overlap with already-expensive alert-list queries, which is where the lock symptoms surface.The slow path I was able to reproduce directly is an alert-list query that combines decision filters such as
decision_type,origin, andhas_active_decision. Today those filters are translated into separateEXISTSchecks againstdecisions. With only the existingalert_decisionsindex, SQLite can find the decisions attached to an alert, but it still has to walk all of them to checktype,origin, oruntil.That gets painful when an alert has a large number of linked decisions. In the database I used for validation, one alert was linked to roughly 15,000 decisions (CAPI community blocklist), and the original query shape could take tens of seconds.
What Changed
This adds composite indexes for the columns used by the alert decision filters:
(alert_decisions, type)(alert_decisions, origin)(alert_decisions, until)The schema change is in
pkg/database/ent/schema/decision.go, with the generated migration schema updated inpkg/database/ent/migrate/schema.go. Existing installations should get the indexes through the normal Ent auto-migration path on startup.Why This Approach
There is another possible fix that keeps everything in code: combine the decision-related alert filters into one
HasDecisionsWith(...)call so Ent emits a singleEXISTSsubquery instead of several. That also performs well.I did not take that route here because it can change the meaning of
/v1/alertsfilters.The current query matches alerts that have at least one linked decision for each filter, even if those filters match different decision rows. The combined-predicate version would require the same decision row to satisfy all filters at once. That may be the intended behavior, but it is a semantic decision rather than just a performance fix, and I did not see existing tests or documentation that clearly pin it either way.
If maintainers can clarify the intended filter semantics here, I think that code-only predicate change would still be worth exploring. In other words, should multiple decision filters on
/v1/alertsmean "find alerts where one linked decision matches all of these conditions" or "find alerts where the linked decisions collectively satisfy these conditions"?The index-only change keeps the current behavior and just gives the database a better access path.
Validation
I tested this against a SQLite database from an affected setup with:
For the original
/v1/alertsdecision-filter query shape:The row count stayed the same before and after the indexes, so this is a performance-only change.
After the change, SQLite uses covering indexes for the correlated decision lookups, for example:
On the same snapshot, creating the three indexes took about 125ms total for ~30k decisions. Larger databases will take longer, but this should be a one-time migration cost.
Refs #4464