[dbsp] Spill to storage on insert less aggressively.#5914
Merged
Conversation
Until now, when adding a batch to a spine, the foreground thread would write it to storage if the batch exceeded `min_storage_bytes`. The goal was to prevent OOM failures when operators produce very large batches.

Since introducing the mechanism, we've implemented measures that prevent the pipeline from producing large batches: (1) most operators split their outputs into 10K-record batches; (2) inputs are chunked at 10K records per connector per worker. Nevertheless, there are still situations where this mechanism is activated:

- Some operators, e.g., skewed or broadcast exchanges, can produce batches with many records.
- Wide records can exceed the 10MiB limit even if the batch is <=10K records.

The problem is that when this mechanism is activated, it can have a negative impact on performance even if the pipeline is not under any memory pressure and there is no real need to write to storage aggressively.

We now have a better solution thanks to the new memory backpressure mechanism: with this commit, we only write large batches to storage if memory pressure is moderate or higher.

One other option I considered was yet another user-configurable threshold for batches pushed to storage on insert, but I wanted to avoid introducing new control knobs.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
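The output-splitting measure mentioned above can be sketched roughly as follows. This is a minimal, self-contained illustration, not dbsp's actual implementation; `MAX_BATCH_RECORDS` and `split_into_batches` are hypothetical names standing in for the real chunking logic.

```rust
// Hypothetical sketch: an operator's output is split into fixed-size
// chunks (10K records) before being pushed downstream, so that no
// individual batch grows unboundedly.
const MAX_BATCH_RECORDS: usize = 10_000;

fn split_into_batches<T: Clone>(records: &[T]) -> Vec<Vec<T>> {
    records
        .chunks(MAX_BATCH_RECORDS)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    // 25K records become three batches: 10K + 10K + 5K.
    let records: Vec<u32> = (0..25_000).collect();
    let batches = split_into_batches(&records);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].len(), 10_000);
    assert_eq!(batches[2].len(), 5_000);
}
```

As the description notes, chunking keeps the common case small, but skewed or broadcast exchanges and wide records can still produce batches that exceed the spill threshold, which is what the rest of this PR addresses.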
mihaibudiu
approved these changes
Mar 24, 2026
blp
reviewed
Mar 24, 2026
Comment on lines 1112 to 1125
```diff
 if inner.memory_pressure() >= MemoryPressure::High {
     Some(0)
 } else if inner.memory_pressure() >= MemoryPressure::Moderate {
     // Moderate pressure: spill large batches to storage in the foreground; the merger will take care of the rest.
     Some(
         storage
             .options
             .min_storage_bytes
             .unwrap_or(10 * 1024 * 1024),
     )
 } else {
-    Some(storage.options.min_storage_bytes.unwrap_or({
-        // This reduces the files stored on disk to a reasonable number.
-        10 * 1024 * 1024
-    }))
+    // When there is no memory pressure, we leave it to the merger to write the batches to storage
+    // eventually.
+    Some(usize::MAX)
 }
```
Member
This might better be written as a match.
(This is petty and feel free to ignore it.)
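For reference, one way the `if`/`else if` chain could read as a `match`, as the review suggests. This is a sketch, not the committed code: the `MemoryPressure` enum variants and the `spill_threshold` helper here are stand-ins (the real type lives in the dbsp crate and may have different variants), and the thresholds are taken from the diff above.

```rust
// Hypothetical stand-in for dbsp's MemoryPressure; derived ordering
// follows declaration order, so None < Moderate < High.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum MemoryPressure {
    None,
    Moderate,
    High,
}

const DEFAULT_MIN_STORAGE_BYTES: usize = 10 * 1024 * 1024;

// The spill-threshold policy from the diff, rewritten as a match with guards.
fn spill_threshold(pressure: MemoryPressure, min_storage_bytes: Option<usize>) -> Option<usize> {
    match pressure {
        // High pressure: spill every batch to storage immediately.
        p if p >= MemoryPressure::High => Some(0),
        // Moderate pressure: spill only large batches; the merger handles the rest.
        p if p >= MemoryPressure::Moderate => {
            Some(min_storage_bytes.unwrap_or(DEFAULT_MIN_STORAGE_BYTES))
        }
        // No pressure: let the background merger write batches eventually.
        _ => Some(usize::MAX),
    }
}

fn main() {
    assert_eq!(spill_threshold(MemoryPressure::High, None), Some(0));
    assert_eq!(
        spill_threshold(MemoryPressure::Moderate, None),
        Some(10 * 1024 * 1024)
    );
    assert_eq!(spill_threshold(MemoryPressure::None, None), Some(usize::MAX));
}
```

Match guards keep the same `>=` semantics as the original chain, so the code stays correct if variants above `High` are added later; matching variants directly would instead require updating every arm.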
blp
approved these changes
Mar 24, 2026