[adapters] Wake up Kafka receiver thread when backpressure is relieved.#5934

Merged
blp merged 1 commit into main from kafka-backpressure on Mar 27, 2026

Conversation


@blp blp commented Mar 26, 2026

The Kafka input connector uses threads to read data from librdkafka queues for the main connector thread to poll. Particularly when "synchronize_partitions" is turned on, queues could accumulate an arbitrary amount of data for the main thread to read, so we implement a backpressure mechanism that prevents any given queue from getting too big, with an arbitrary 1 MB per-queue limit.

Each of our receiver threads, in RecvThread::run, polls a number of queues in turn. If any work can be done in any of them, it loops, but if all of them either have an empty queue or are subject to backpressure, we park for up to one second to avoid wasting CPU. If we're parking for empty queues, a callback should wake the thread up if any of them become nonempty (see the call to set_nonempty_callback), so in that case the 1-second park timeout is "just in case".
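The park-and-wake pattern described above can be demonstrated with the standard library's park/unpark primitives (which is what the names suggest the code uses; the actual Feldera implementation may differ). This minimal sketch shows the receiver parking for up to one second and a wake-up call ending the park early; `park_until_woken` is an illustrative name, not a function from the PR:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Sketch of the parking behavior: the receiver thread parks for up to one
// second when idle, and a wake-up call ends the park early.
fn park_until_woken() -> Duration {
    let receiver = thread::spawn(|| {
        let start = Instant::now();
        // The one-second timeout is the "just in case" fallback.
        thread::park_timeout(Duration::from_secs(1));
        start.elapsed()
    });
    // Stand-in for the nonempty callback: wake the receiver. An `unpark`
    // that arrives before the `park_timeout` is remembered by the parking
    // token, so there is no lost-wakeup race here.
    receiver.thread().unpark();
    receiver.join().unwrap()
}

fn main() {
    let waited = park_until_woken();
    // The park ended well before the one-second timeout expired.
    assert!(waited < Duration::from_secs(1));
}
```

The stored-token semantics of `unpark` are what make this pattern safe: a callback that fires between the "no work to do" check and the `park_timeout` call still wakes the thread.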

However, until now there has been nothing similar for parking when all of the queues are subject to backpressure. This commit fixes that: it gives each PartitionReceiver the ability to wake up its RecvThread, and makes the receivers do so whenever they dequeue data such that the queue should no longer be subject to backpressure.
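The shape of the fix can be sketched as follows. This is illustrative only: the field names, the atomic byte counter, and the exact wake-up condition are assumptions, not the actual Feldera types. The key idea is that each partition receiver keeps a handle to the thread that polls it, so a dequeue that takes the queue back under the limit can wake that thread immediately instead of waiting out the park timeout:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::Thread;

// Arbitrary 1 MB per-queue limit, per the description above.
const MAX_QUEUE_BYTES: usize = 1024 * 1024;

// Hypothetical stand-in for the real PartitionReceiver: tracks queued
// bytes and holds a handle to the thread that polls this partition.
struct PartitionReceiver {
    queued_bytes: AtomicUsize,
    recv_thread: Thread,
}

impl PartitionReceiver {
    fn dequeue(&self, n_bytes: usize) {
        let before = self.queued_bytes.fetch_sub(n_bytes, Ordering::AcqRel);
        let after = before - n_bytes;
        // Only wake the receiver thread on the transition from "subject
        // to backpressure" to "not subject to backpressure", so a busy
        // thread is not unparked redundantly on every dequeue.
        if before >= MAX_QUEUE_BYTES && after < MAX_QUEUE_BYTES {
            self.recv_thread.unpark();
        }
    }
}

fn main() {
    let pr = PartitionReceiver {
        queued_bytes: AtomicUsize::new(MAX_QUEUE_BYTES + 512),
        recv_thread: std::thread::current(),
    };
    // Dequeue enough to drop below the limit; this triggers the unpark.
    pr.dequeue(1024);
    assert!(pr.queued_bytes.load(Ordering::Acquire) < MAX_QUEUE_BYTES);
}
```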

This made a simple test case with "synchronize_partitions" enabled much faster for me: previously it ran several times slower than with the feature disabled, and now it runs only about 5% slower, based on eyeballing the performance in the web console.

Describe Manual Test Plan

I manually tested the performance.

Signed-off-by: Ben Pfaff <blp@feldera.com>
@blp blp requested a review from ryzhyk March 26, 2026 22:30
@blp blp self-assigned this Mar 26, 2026
@blp blp added bug Something isn't working performance connectors Issues related to the adapters/connectors crate rust Pull requests that update Rust code user-reported Reported by a user or customer labels Mar 26, 2026

/// Returns true if a partition queue that holds `n_bytes` should pause for
/// backpressure.
fn needs_backpressure(n_bytes: usize) -> bool {
    // Arbitrary 1 MB per-queue limit (see the description above).
    n_bytes >= 1024 * 1024
}
Contributor

will this become some kind of configuration parameter?

@blp (Member, Author)

I don't see value in that yet.


@mythical-fred mythical-fred left a comment


LGTM

@blp blp added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit 824bb15 Mar 27, 2026
1 check passed
@blp blp deleted the kafka-backpressure branch March 27, 2026 03:55
