fix: Cap memory consumption during merge #4893
Conversation
Derive a per-merge doc budget from `maintenance_work_mem` (using a hardcoded `MERGE_BYTES_PER_DOC = 1024` estimate) and enforce it in two places:

- `LayeredMergePolicy` splits a layer's candidates whenever the running doc count would exceed the budget, and skips segments that exceed the budget on their own rather than emitting degenerate single-segment candidates.
- `WorkerBuildState::try_merge` trims its chunk size to fit the budget before draining, guaranteeing at least 2 segments per merge so we keep making progress.

Adds three new unit tests covering the split, oversized-segment, and loose-budget cases.
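The two enforcement points above can be sketched roughly as follows. All names and the splitting details are assumptions reconstructed from the description, not the PR's actual code:

```rust
// Hypothetical sketch of the doc-budget derivation and candidate splitting
// described in the PR; names and details are assumptions, not pg_search code.

/// Rough per-doc memory estimate used to derive the budget.
const MERGE_BYTES_PER_DOC: usize = 1024;

/// Docs a single merge is allowed to touch, derived from maintenance_work_mem.
fn merge_doc_budget(maintenance_work_mem_bytes: usize) -> usize {
    maintenance_work_mem_bytes / MERGE_BYTES_PER_DOC
}

/// Split a layer's segments (given as doc counts) into merge candidates that
/// each fit the budget. Segments over the budget on their own are skipped
/// rather than emitted as degenerate single-segment candidates.
fn split_candidates(segment_doc_counts: &[usize], budget: usize) -> Vec<Vec<usize>> {
    let mut candidates: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut running = 0usize;
    for &docs in segment_doc_counts {
        if docs > budget {
            continue; // oversized on its own: not merged at this layer
        }
        if running + docs > budget && !current.is_empty() {
            candidates.push(std::mem::take(&mut current));
            running = 0;
        }
        current.push(docs);
        running += docs;
    }
    if !current.is_empty() {
        candidates.push(current);
    }
    // a merge needs at least two inputs to be worth doing
    candidates.retain(|c| c.len() >= 2);
    candidates
}
```

With a 1000-doc budget, segments of `[400, 400, 400, 2000, 300]` split into `[[400, 400], [400, 300]]`, and the 2000-doc segment is skipped.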
…merge

Move the per-merge doc-budget enforcement out of `try_merge` and into `adjusted_target_segment_count`: floor the segment count by `ceil(reltuples / merge_doc_budget())` so the per-worker chunking math naturally produces merges that fit in `maintenance_work_mem`. This is cleaner than the runtime cap: there's one place that knows about the budget instead of two, no "what if bounded chunk_size < 2" edge case, and the final segment count matches what the planner says upfront.

The `LayeredMergePolicy` doc-budget enforcement stays as-is; it's the safety net for INSERT/VACUUM/background merges that don't go through `build_parallel`.
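A minimal sketch of that floor, assuming the function and variable names from the description:

```rust
// Hypothetical sketch of the segment-count floor described above; the
// signature and names are assumptions based on the commit message.

/// Bump the user's target segment count so that reltuples / n_segments
/// stays under the per-merge doc budget.
fn adjusted_target_segment_count(
    target_segment_count: usize,
    reltuples: f64,
    doc_budget: usize,
) -> usize {
    let min_segments_for_budget = (reltuples / doc_budget as f64).ceil() as usize;
    target_segment_count.max(min_segments_for_budget)
}
```

For example, with `reltuples = 1_000_000` and a 65,536-doc budget, a user target of 4 is bumped to 16, while a small table leaves the target untouched.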
Codecov Report

❌ Your patch check has failed because the patch coverage (38.46%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main    #4893      +/-   ##
==========================================
- Coverage   77.26%   77.20%   -0.06%
==========================================
  Files         194      194
  Lines       52121    52188      +67
==========================================
+ Hits        40269    40294      +25
- Misses      11852    11894      +42
mdashti left a comment:
LGTM if the comments are resolved as suggested.
```rust
/// value is intentionally conservative: tantivy's merge keeps per-field readers
/// and posting buffers in memory, and 1 KB/doc is a rough floor that empirically
/// keeps peak RSS near `maintenance_work_mem` on text-heavy workloads.
const MERGE_BYTES_PER_DOC: usize = 1024;
```
Should this be a GUC? 1 KB/doc is a reasonable floor for text-heavy schemas, but fat-JSON or mixed-fast-field schemas consume way more, and numeric-only ones way less. If the estimate is wrong in prod, we'd want a knob to tune without redeploying; `paradedb.merge_bytes_per_doc` would fit right next to the other GUCs in this file.
```rust
// memory cap. The next layer (or a future merge cycle) can deal with it.
if segment_doc_count > self.doc_budget {
    continue;
}
```
Quick sanity check on "the next layer (or a future merge cycle) can deal with it": `doc_budget` is constant across layers, so if a segment has more docs than the budget here, it'll have more docs than the budget at every layer. We're not deferring it, we're stranding it. That's fine as an OOM cap, but it does mean an oversized segment that accumulates dead docs can't be compacted via merge anymore; vacuum becomes the only path to reclaim those docs. Worth either softening the comment or carving out an exception for single-segment compactions of oversized segments with deletes.
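The carve-out suggested here could look something like the following. This is purely illustrative; the type, field semantics, and decision rule are all hypothetical:

```rust
// Hypothetical sketch of the reviewer's suggested exception: an over-budget
// segment with deletes may still be compacted alone if its live docs fit.

#[derive(Debug, PartialEq)]
enum MergeAction {
    Candidate,    // small enough to join a multi-segment merge
    CompactAlone, // over budget, but dropping deletes brings it under
    Skip,         // genuinely too big; leave it for vacuum
}

struct SegmentInfo {
    doc_count: usize,    // total docs, including deleted ones
    deleted_docs: usize, // docs marked deleted but not yet reclaimed
}

fn classify(seg: &SegmentInfo, doc_budget: usize) -> MergeAction {
    if seg.doc_count <= doc_budget {
        return MergeAction::Candidate;
    }
    let live = seg.doc_count - seg.deleted_docs;
    if seg.deleted_docs > 0 && live <= doc_budget {
        // compacting alone drops the deletes and the result fits the budget
        return MergeAction::CompactAlone;
    }
    MergeAction::Skip
}
```

Under this rule a 2000-doc segment with 1500 deletes would still be compactable against a 1000-doc budget, instead of being stranded.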
```rust
let min_segments_for_budget = (reltuples / doc_budget as f64).ceil() as usize;
if min_segments_for_budget > target_segment_count {
    pgrx::debug1!(
        "bumping target_segment_count from {target_segment_count} to {min_segments_for_budget} to keep per-merge docs under merge_doc_budget ({doc_budget}); reltuples: {reltuples}"
```
The user explicitly asked for a `target_segment_count` in the WITH clause and we silently override it. A `debug1!` is invisible to almost everyone running `CREATE INDEX`. I'd promote this to `notice!` so people understand why their index has more segments than they configured; otherwise the next bug report is going to be "why does my target=4 index have 17 segments."
```rust
// one shot. If the user asked for fewer segments than the doc budget allows, the
// per-worker chunking math would eventually try to merge more docs than fit in memory;
// bumping the target up means each chunk naturally stays under [`gucs::merge_doc_budget`].
let doc_budget = gucs::merge_doc_budget();
```
With N parallel workers, each one can run a merge pulling in `merge_doc_budget` docs concurrently, so peak memory across workers is roughly N × mwm. `adjust_maintenance_work_mem` already splits the writer budget by `nlaunched`, but this floor uses the full mwm. Is that intentional? My naive read is that the build-time floor should use mwm / `nlaunched` so parallel-worker peak memory stays near mwm, but I might be missing something about how tantivy schedules merge memory across workers.
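The arithmetic behind this concern, as a sketch (the function name and the division-by-workers policy are the reviewer's suggestion, not existing code):

```rust
// Hypothetical per-worker budget: divide maintenance_work_mem across the
// launched workers before converting bytes to docs, so N concurrent merges
// together stay near one maintenance_work_mem rather than N of them.
fn per_worker_doc_budget(mwm_bytes: usize, nlaunched: usize, bytes_per_doc: usize) -> usize {
    (mwm_bytes / nlaunched.max(1)) / bytes_per_doc
}
```

With `mwm = 64 MiB`, 4 workers, and 1 KB/doc, each worker gets a 16,384-doc budget instead of the full 65,536, bounding aggregate peak memory at roughly one mwm.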
```rust
// per-worker chunking math would eventually try to merge more docs than fit in memory;
// bumping the target up means each chunk naturally stays under [`gucs::merge_doc_budget`].
let doc_budget = gucs::merge_doc_budget();
let min_segments_for_budget = (reltuples / doc_budget as f64).ceil() as usize;
```
`reltuples` is whatever `pg_class` has, and it's frequently stale or zero right after a bulk load without `ANALYZE`. The fallback path estimates from page count, which is also rough. If `reltuples` comes in much higher than reality, `min_segments_for_budget` blows up and we end up with way more segments than needed. Worth a sanity cap (maybe 10× the user's `target_segment_count`) so a bad statistics estimate doesn't shred the segment topology.
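The suggested cap is a one-liner; the 10× multiplier here is the reviewer's proposal rather than a measured value:

```rust
// Hypothetical sanity cap: never let a stale reltuples estimate push the
// budget-driven segment floor past 10x the user's requested target.
fn capped_min_segments(min_segments_for_budget: usize, target_segment_count: usize) -> usize {
    min_segments_for_budget.min(target_segment_count.saturating_mul(10))
}
```

So a target of 4 with a blown-up estimate of 1000 segments would be clamped to 40, while a plausible estimate of 17 passes through unchanged.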
```rust
        assert_eq!(candidates[0].0.len(), 2);
        assert_eq!(largest_layer_size, 1000);
    }
}
```
A regression test that builds an index over a largish table with a small `maintenance_work_mem` and asserts on the bumped segment count would both catch regressions and document the user-visible behavior change.
Ticket(s) Closed
What
Twice in prod now we've seen situations where very few segments get created because the target segment count defaults to the CPU count. Once this led to a very skewed index (where one segment contained most of the matches for a particular query), and another time it led to an OOM because merging down to a few huge segments consumes more memory.
The solution is to try to cap merges to `maintenance_work_mem`. It's not a perfect heuristic, but during `CREATE INDEX` we can set the target segment count such that merges don't exceed the memory budget, and during normal merges we can skip merging segments that exceed the budget.
Why
How
Tests