Skip to content

[adapters] Add metrics and stats for transaction sizes and durations.#5800

Merged
blp merged 1 commit intomainfrom
txn-stats
Mar 12, 2026
Merged

[adapters] Add metrics and stats for transaction sizes and durations.#5800
blp merged 1 commit intomainfrom
txn-stats

Conversation

@blp
Copy link
Copy Markdown
Member

@blp blp commented Mar 11, 2026

Tested manually.

@blp blp requested a review from ryzhyk March 11, 2026 19:16
@blp blp self-assigned this Mar 11, 2026
@blp blp added performance connectors Issues related to the adapters/connectors crate rust Pull requests that update Rust code metrics Metrics about feldera pipelines labels Mar 11, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean refactor: moving the transaction sender into TransactionInfo is the right ownership model, and the AdvanceTransaction enum makes advance_transaction_state() easier to follow. The new transaction_msecs and transaction_records fields on the API and the two histograms (run time + commit time) are genuinely useful for debugging slow commits. Docs updated.

});
self.circuit
.start_transaction()
.expect("should have been able to start transaction");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these now panics?
They didn't seem to be previously.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are now panics. I reviewed the code that they call and it seemed to me that any failure is a bug in the caller.

pub fn tid(&self) -> Option<TransactionId> {
match self {
TransactionState::None => None,
TransactionState::Started { tid, .. } | TransactionState::Committing { tid, .. } => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you can do this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! It is occasionally a useful feature of matches.

@@ -4834,8 +4918,8 @@ impl TransactionInitiators {
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come no one complained about these two being in the wrong order?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a preferred order.

I didn't intentionally reorder these; it was a mistake as a result of intermediate changes that I ended up reverting before submission.

transaction_state: TransactionState,

/// For sending updates to the coordination status to the coordinator.
sender: tokio::sync::watch::Sender<TransactionCoordination>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also add the sender in the stats struct in other places?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this question.

This isn't a stats struct; it's part of what we use to track transactions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the name I assumed it's for stats

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should change the name. Which name is the one that causes the misconception?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Info". But I guess it's not "Stats"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that TransactionInfo isn't the best name. I think @ryzhyk invented it. Maybe TransactionManager or TransactionTracker or even just Transaction.

}
(TransactionState::None | TransactionState::Started { .. }, true) => Some(true),
(TransactionState::Committing { .. }, true) => {
error!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected to be unreachable?
What will happen if this is reached? Will the pipeline ever recover from this state?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is unreachable code. This code is moved from elsewhere in the same file, where it used error!. I think that we could change it to unreachable!() (but @ryzhyk might know better).

@blp blp added this pull request to the merge queue Mar 11, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 11, 2026
Copy link
Copy Markdown
Contributor

@ryzhyk ryzhyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you got rid of the duplicate fields in the global metrics.

commit_progress.remaining
);
metrics.histogram(
"transaction_run_seconds",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name may be too easy to misinterpret (although the description explains what it means). Maybe we should rename it to something more verbose (but possibly awkward): transaction_pre_commit_time_seconds

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of calling this the ingest phase? transaction_ingest_time_seconds

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works too!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}

pub fn start(&self) -> Option<Instant> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to start_time? Otherwise the name suggests that the method changes transaction state.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

///
/// - [TransactionStatus::CommitInProgress]: Time that this transaction has
/// been committing.
pub transaction_msecs: Option<u64>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new fields will need to be added to the Python SDK.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@blp blp force-pushed the txn-stats branch 5 times, most recently from a156f66 to f55e05a Compare March 12, 2026 18:13
Tested manually.

Signed-off-by: Ben Pfaff <blp@feldera.com>
@blp blp enabled auto-merge March 12, 2026 19:39
@blp blp added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit c55e36e Mar 12, 2026
1 check passed
@blp blp deleted the txn-stats branch March 12, 2026 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

connectors Issues related to the adapters/connectors crate metrics Metrics about feldera pipelines performance rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants