Merged
23 changes: 1 addition & 22 deletions crates/dbsp/src/circuit/checkpointer.rs
@@ -70,15 +70,7 @@ impl Checkpointer {
     }

This changes the startup behavior: previously, if checkpoint_list was empty (no checkpoint file, or one that could not be read), storage files were preserved and only measured. Now gc_startup() runs unconditionally and will delete everything it doesn't recognize as belonging to a known checkpoint.

The fix is correct in intent, but it needs a test for the specific case this addresses: pipeline force-stopped before first checkpoint -> restart -> orphaned files are cleaned up. Without it there's no regression guard if gc_startup()'s handling of an empty checkpoint list ever changes.
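
A rough sketch of the shape such a regression test could take. It does not use the real Checkpointer or storage backend: MockStorage, MockCheckpointer, storage_with, and the file names are hypothetical stand-ins, and the mock gc_startup() only models the behavior described above (delete anything no known checkpoint references, return the bytes still in use). A real test would exercise the actual backend instead.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for a storage backend: path -> size in bytes.
struct MockStorage {
    files: BTreeMap<String, u64>,
}

/// Hypothetical stand-in for the checkpointer under test.
struct MockCheckpointer {
    storage: MockStorage,
    /// Paths owned by known checkpoints; empty if no checkpoint was ever written.
    checkpoint_files: BTreeSet<String>,
}

impl MockCheckpointer {
    /// Assumed semantics: remove every file that no known checkpoint
    /// references, then return the number of bytes still in use.
    fn gc_startup(&mut self) -> u64 {
        let keep = &self.checkpoint_files;
        self.storage.files.retain(|path, _| keep.contains(path));
        self.storage.files.values().sum()
    }
}

/// Build a mock storage from (path, size) pairs.
fn storage_with(files: &[(&str, u64)]) -> MockStorage {
    MockStorage {
        files: files.iter().map(|(p, s)| (p.to_string(), *s)).collect(),
    }
}

#[test]
fn orphans_removed_when_no_checkpoint_exists() {
    // Pipeline was force-stopped before its first checkpoint:
    // files exist on disk, but the checkpoint list is empty.
    let mut cp = MockCheckpointer {
        storage: storage_with(&[("orphan-a", 100), ("orphan-b", 50)]),
        checkpoint_files: BTreeSet::new(),
    };
    assert_eq!(cp.gc_startup(), 0);
    assert!(cp.storage.files.is_empty());
}

#[test]
fn checkpoint_files_survive_gc() {
    // Files owned by a known checkpoint must be kept and counted.
    let mut known = BTreeSet::new();
    known.insert("ckpt/data".to_string());
    let mut cp = MockCheckpointer {
        storage: storage_with(&[("ckpt/data", 70), ("orphan", 30)]),
        checkpoint_files: known,
    };
    assert_eq!(cp.gc_startup(), 70);
    assert!(cp.storage.files.contains_key("ckpt/data"));
    assert!(!cp.storage.files.contains_key("orphan"));
}
```

The second case is there so the guard cuts both ways: it would also catch a change that makes GC on restart too aggressive.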


     fn init_storage(&self) -> Result<(), Error> {
-        let usage = if !self.checkpoint_list.is_empty() {
-            self.gc_startup()?
-        } else {
-            // There's no checkpoint file, or we couldn't read it. Don't run GC,
-            // to ensure that we don't accidentally remove everything.
-            //
-            // We still know the amount of storage in use.
-            self.measure_storage_use()?
-        };
+        let usage = self.gc_startup()?;

         // We measured the amount of storage in use. Give it to the backend as
         // the initial value.
@@ -87,19 +79,6 @@ impl Checkpointer {
         Ok(())
     }

-    fn measure_storage_use(&self) -> Result<u64, Error> {
-        let mut usage = 0;
-        StorageError::ignore_notfound(self.backend.list(
-            &StoragePath::default(),
-            &mut |_path, file_type| {
-                if let StorageFileType::File { size } = file_type {
-                    usage += size;
-                }
-            },
-        ))?;
-        Ok(usage)
-    }
-
     pub(super) fn measure_checkpoint_storage_use(&self, uuid: uuid::Uuid) -> Result<u64, Error> {
         let mut usage = 0;
         StorageError::ignore_notfound(self.backend.list(