[dbsp] Spill to storage on insert less aggressively.#5914

Merged
ryzhyk merged 1 commit into main from no-write-on-insert on Mar 25, 2026

Conversation


@ryzhyk ryzhyk commented Mar 24, 2026

Until now, when adding a batch to a spine, the foreground thread would write it to storage if the batch exceeded min_storage_bytes. The goal was to prevent OOM failures when operators produce very large batches.

Since introducing the mechanism, we've implemented measures that prevent the pipeline from producing large batches: (1) most operators split their outputs into 10K-record batches; (2) inputs are chunked into 10K-record batches per connector per worker.

Nevertheless, there are still situations where this mechanism is activated:

  • Some operators, e.g., skewed or broadcast exchanges, can produce batches with many records.
  • Wide records can exceed the 10 MiB limit even if the batch contains <=10K records.

The problem is that when this mechanism is activated, it can negatively impact performance even when the pipeline is not under memory pressure and there is no real need to write to storage aggressively.

We now have a better solution thanks to the new memory backpressure mechanism: with this commit, we only write large batches to storage when memory pressure is moderate or higher.

Another option I considered was yet another user-configurable threshold for batches pushed to storage on insert, but I wanted to avoid introducing new control knobs.
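As a rough sketch of the policy described above (the `MemoryPressure` variants, the function and field names, and the 10 MiB default are assumptions for illustration, not the actual dbsp API):

```rust
// Hypothetical stand-in for dbsp's memory-pressure levels; variant names
// and ordering are assumptions based on the PR description.
#[derive(Debug, PartialEq, PartialOrd)]
enum MemoryPressure {
    None,
    Moderate,
    High,
}

/// Batch-size threshold (in bytes) above which the foreground thread writes
/// a batch to storage on insert, following the policy described above.
fn spill_threshold(pressure: MemoryPressure, min_storage_bytes: Option<usize>) -> usize {
    if pressure >= MemoryPressure::High {
        // High pressure: spill every batch to storage.
        0
    } else if pressure >= MemoryPressure::Moderate {
        // Moderate pressure: spill only large batches; the background merger
        // takes care of the rest.
        min_storage_bytes.unwrap_or(10 * 1024 * 1024)
    } else {
        // No pressure: never spill on insert; leave writing to the merger.
        usize::MAX
    }
}
```

With this shape, lowering the threshold to 0 under high pressure forces everything to storage, while `usize::MAX` effectively disables foreground spilling when memory is plentiful.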

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
@ryzhyk ryzhyk requested a review from blp March 24, 2026 21:45
@ryzhyk ryzhyk added the DBSP core (Related to the core DBSP library), performance, and storage (Persistence for internal state in DBSP operators) labels Mar 24, 2026
Comment on lines 1112 to +1125
```diff
 if inner.memory_pressure() >= MemoryPressure::High {
     Some(0)
 } else if inner.memory_pressure() >= MemoryPressure::Moderate {
     // Moderate pressure: spill large batches to storage in the foreground;
     // the merger will take care of the rest.
     Some(
         storage
             .options
             .min_storage_bytes
             .unwrap_or(10 * 1024 * 1024),
     )
 } else {
-    Some(storage.options.min_storage_bytes.unwrap_or({
-        // This reduces the files stored on disk to a reasonable number.
-        10 * 1024 * 1024
-    }))
+    // When there is no memory pressure, we leave it to the merger to write
+    // the batches to storage eventually.
+    Some(usize::MAX)
 }
```
Member


This might better be written as a match.

(This is petty and feel free to ignore it.)
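For reference, the suggested `match` form could look roughly like this (a sketch using a mock `MemoryPressure` enum; the real variant names and the surrounding bindings from the diff are assumptions):

```rust
// Mock enum standing in for dbsp's MemoryPressure; variant names are assumed.
enum MemoryPressure {
    Low,
    Moderate,
    High,
}

// The same threshold selection as in the diff above, written as a match,
// per the reviewer's suggestion.
fn spill_threshold(pressure: MemoryPressure, min_storage_bytes: Option<usize>) -> Option<usize> {
    match pressure {
        // High pressure: spill every batch to storage.
        MemoryPressure::High => Some(0),
        // Moderate pressure: spill only large batches; the merger does the rest.
        MemoryPressure::Moderate => Some(min_storage_bytes.unwrap_or(10 * 1024 * 1024)),
        // No pressure: leave writing to the merger.
        _ => Some(usize::MAX),
    }
}
```

A `match` makes the three-way policy exhaustive and symmetric, which is arguably easier to audit than a chain of `>=` comparisons.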

Contributor Author


Good idea!

@ryzhyk ryzhyk added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit 741e654 Mar 25, 2026
37 checks passed
@ryzhyk ryzhyk deleted the no-write-on-insert branch March 25, 2026 16:36
