[dbsp] Spill to storage on insert less aggressively.#5914
Merged
Conversation
Until now, when adding a batch to a spine, the foreground thread would write it to storage if the batch exceeded `min_storage_bytes`. The goal was to prevent OOM failures when operators produce very large batches.

Since introducing the mechanism, we've implemented measures that prevent the pipeline from producing large batches: (1) most operators split their outputs into 10K-record batches; (2) inputs are chunked at 10K records per connector per worker. Nevertheless, there are still situations where this mechanism is activated:

- Some operators, e.g., skewed or broadcast exchanges, can produce batches with many records.
- Wide records can exceed the 10MiB limit even if the batch is <=10K records.

The problem is that when this mechanism is activated, it can have a negative impact on performance even if the pipeline is not under any memory pressure and there is no real need to write to storage aggressively.

We now have a better solution thanks to the new memory backpressure mechanism: with this commit, we only write large batches to storage if memory pressure is moderate or higher.

One other option I considered was yet another user-configurable threshold for batches pushed to storage on insert, but I wanted to avoid introducing new control knobs.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
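The output-splitting measure mentioned above can be sketched roughly as follows. This is a minimal, self-contained illustration, not dbsp's actual implementation; `MAX_BATCH_RECORDS` and `split_into_batches` are hypothetical names standing in for the real chunking logic.

```rust
// Hypothetical sketch: an operator's output is split into fixed-size
// chunks (10K records) before being pushed downstream, so that no
// individual batch grows unboundedly.
const MAX_BATCH_RECORDS: usize = 10_000;

fn split_into_batches<T: Clone>(records: &[T]) -> Vec<Vec<T>> {
    records
        .chunks(MAX_BATCH_RECORDS)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    // 25K records become three batches: 10K + 10K + 5K.
    let records: Vec<u32> = (0..25_000).collect();
    let batches = split_into_batches(&records);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].len(), 10_000);
    assert_eq!(batches[2].len(), 5_000);
}
```

As the description notes, chunking keeps the common case small, but skewed or broadcast exchanges and wide records can still produce batches that exceed the spill threshold, which is what the rest of this PR addresses.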
mihaibudiu
approved these changes
Mar 24, 2026
blp
reviewed
Mar 24, 2026
Comment on lines 1112 to 1125
```diff
 if inner.memory_pressure() >= MemoryPressure::High {
     Some(0)
 } else if inner.memory_pressure() >= MemoryPressure::Moderate {
     // Moderate pressure: spill large batches to storage in the foreground; the merger will take care of the rest.
     Some(
         storage
             .options
             .min_storage_bytes
             .unwrap_or(10 * 1024 * 1024),
     )
 } else {
-    Some(storage.options.min_storage_bytes.unwrap_or({
-        // This reduces the files stored on disk to a reasonable number.
-        10 * 1024 * 1024
-    }))
+    // When there is no memory pressure, we leave it to the merger to write the batches to storage
+    // eventually.
+    Some(usize::MAX)
 }
```
Member
This might better be written as a match.
(This is petty and feel free to ignore it.)
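For reference, one way the `if`/`else if` chain could read as a `match`, as the review suggests. This is a sketch, not the committed code: the `MemoryPressure` enum variants and the `spill_threshold` helper here are stand-ins (the real type lives in the dbsp crate and may have different variants), and the thresholds are taken from the diff above.

```rust
// Hypothetical stand-in for dbsp's MemoryPressure; derived ordering
// follows declaration order, so None < Moderate < High.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum MemoryPressure {
    None,
    Moderate,
    High,
}

const DEFAULT_MIN_STORAGE_BYTES: usize = 10 * 1024 * 1024;

// The spill-threshold policy from the diff, rewritten as a match with guards.
fn spill_threshold(pressure: MemoryPressure, min_storage_bytes: Option<usize>) -> Option<usize> {
    match pressure {
        // High pressure: spill every batch to storage immediately.
        p if p >= MemoryPressure::High => Some(0),
        // Moderate pressure: spill only large batches; the merger handles the rest.
        p if p >= MemoryPressure::Moderate => {
            Some(min_storage_bytes.unwrap_or(DEFAULT_MIN_STORAGE_BYTES))
        }
        // No pressure: let the background merger write batches eventually.
        _ => Some(usize::MAX),
    }
}

fn main() {
    assert_eq!(spill_threshold(MemoryPressure::High, None), Some(0));
    assert_eq!(
        spill_threshold(MemoryPressure::Moderate, None),
        Some(10 * 1024 * 1024)
    );
    assert_eq!(spill_threshold(MemoryPressure::None, None), Some(usize::MAX));
}
```

Match guards keep the same `>=` semantics as the original chain, so the code stays correct if variants above `High` are added later; matching variants directly would instead require updating every arm.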
blp
approved these changes
Mar 24, 2026