Skip to content

[dbsp] Transaction support for asof joins.#5372

Merged
ryzhyk merged 1 commit intomainfrom
asof-splitter
Jan 7, 2026
Merged

[dbsp] Transaction support for asof joins.#5372
ryzhyk merged 1 commit intomainfrom
asof-splitter

Conversation

@ryzhyk
Copy link
Copy Markdown
Contributor

@ryzhyk ryzhyk commented Jan 5, 2026

Refactor the asof join operator to split outputs over multiple steps, so it doesn't produce very large output batches.

This will go in after #5370

@ryzhyk ryzhyk requested a review from mihaibudiu January 5, 2026 08:59
@ryzhyk ryzhyk added the DBSP core Related to the core DBSP library label Jan 5, 2026
delta2: Cow<'_, Option<Spine<I2>>>,
delayed_trace2: Cow<'_, T2>,
) -> Z {
) -> impl futures::Stream<Item = (Z, bool, Option<Position>)> + 'static {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you seem to have used an alias of AsyncStream elsewhere.
It would be nice to have that defined somewhere globally and used uniformly

Refactor the asof join operator to split outputs over multiple steps,
so it doesn't produce very large output batches.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
@ryzhyk ryzhyk marked this pull request as ready for review January 6, 2026 21:57
Copilot AI review requested due to automatic review settings January 6, 2026 21:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the asof join operator to split outputs over multiple steps using a streaming approach, preventing very large output batches and supporting transactions. The key change is converting the operator from a synchronous QuaternaryOperator to a StreamingQuaternaryOperator that yields results in chunks.

Key changes:

  • Converted asof join from synchronous to streaming operator that yields chunked results
  • Changed internal state fields (flush, delta1, delta2) to use RefCell for interior mutability
  • Updated tests to handle chunked output using concat() and added assertions comparing against reference implementations

Comment on lines +608 to +618
let delayed_trace1 = if *self.flush.borrow() {
Some(delayed_trace1.as_ref().ro_snapshot())
} else {
None
};

let delayed_trace2 = if *self.flush.borrow() {
Some(delayed_trace2.ro_snapshot())
} else {
None
};
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flush flag is checked twice via borrow() at lines 608 and 614. Consider storing the result once to avoid redundant borrow operations.

Copilot uses AI. Check for mistakes.
Comment on lines +631 to +632
let delta1 = self.delta1.take().unwrap();
let delta2 = self.delta2.take().unwrap();
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using take() directly on RefCell fields requires calling methods on the borrowed value. This should be self.delta1.borrow_mut().take() - verify this compiles correctly.

Suggested change
let delta1 = self.delta1.take().unwrap();
let delta2 = self.delta2.take().unwrap();
let delta1 = self.delta1.borrow_mut().take().unwrap();
let delta2 = self.delta2.borrow_mut().take().unwrap();

Copilot uses AI. Check for mistakes.
if output_tuples.len() >= chunk_size {
let result = Z::dyn_from_tuples(&self.factories.output_factories, (), &mut output_tuples);
self.output_batch_stats.borrow_mut().add_batch(result.len());
yield (result, false, delta1_cursor.position());
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line uses delta1_cursor.position() but is inside the loop processing delta2 only (after delta1 is exhausted). Should use delta2_cursor.position() instead.

Suggested change
yield (result, false, delta1_cursor.position());
yield (result, false, delta2_cursor.position());

Copilot uses AI. Check for mistakes.
@ryzhyk ryzhyk added this pull request to the merge queue Jan 6, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 6, 2026
@ryzhyk ryzhyk added this pull request to the merge queue Jan 6, 2026
Merged via the queue into main with commit 1c6fed3 Jan 7, 2026
5 of 7 checks passed
@ryzhyk ryzhyk deleted the asof-splitter branch January 7, 2026 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DBSP core Related to the core DBSP library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants