[fix] Fix an issue where zero queue consumers are unable to receive messages after topic unloading #473
Motivation
If a consumer with a receiver queue size of 0 is waiting for messages using receive or receiveAsync, and the topic is unloaded or the broker is restarted, the consumer will no longer be able to receive messages. This is because receive and receiveAsync have different bugs. Both bugs are highly reproducible.
receive
When the receive method of a zero queue consumer is executed, fetchSingleMessageFromBroker is called internally. In fetchSingleMessageFromBroker, the connection of the received message is compared with the connection when the flow permit was sent, and if they do not match, the message is discarded.
pulsar-client-cpp/lib/ConsumerImpl.cc
Lines 918 to 942 in 115d64a
If the topic is unloaded or the broker is restarted between sending the flow permit and receiving the message, the connections at the two times will no longer match, so the message will be discarded and no further flow permits will be sent, causing message delivery to stop.
receiveAsync
If the consumer is reopened by unloading the topic or restarting the broker, flow permits that were sent before the reopening must be sent again. However, if a consumer with a receiver queue size of 0 is waiting for messages with receiveAsync, it does not seem to send flow permits even if the consumer is reopened.
pulsar-client-cpp/lib/ConsumerImpl.cc
Lines 311 to 332 in 115d64a
Modifications
receive
In fetchSingleMessageFromBroker, I think the connection of the received message should be compared with the current connection, not the connection at the time the flow permit was sent. In fact, the Java client makes such a comparison.
https://github.com/apache/pulsar/blob/8a40b30cf47a91ec02d931e6371d02409ba5951e/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L121
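The difference between the two comparisons can be illustrated with a small standalone model. This is only a sketch under assumptions: Connection, ZeroQueueModel, and messageSurvivesReconnect are hypothetical names made up for illustration and are not the actual classes or functions in ConsumerImpl.cc.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a client connection (not the real ClientConnection).
struct Connection {
    int id;
};
using ConnectionPtr = std::shared_ptr<Connection>;

// Hypothetical, simplified model of the connection state of a zero queue consumer.
class ZeroQueueModel {
   public:
    explicit ZeroQueueModel(ConnectionPtr cnx) : current_(std::move(cnx)) {}

    // Remember which connection the flow permit was sent on.
    void sendFlowPermit() { cnxAtFlowPermit_ = current_; }

    // Topic unload or broker restart: the consumer is reopened on a new connection.
    void reconnect(ConnectionPtr cnx) { current_ = std::move(cnx); }

    // Comparison described in the Motivation: message connection vs. flow-permit connection.
    bool acceptComparingWithFlowConnection(const ConnectionPtr& messageCnx) const {
        return messageCnx == cnxAtFlowPermit_;
    }

    // Comparison proposed here: message connection vs. current connection.
    bool acceptComparingWithCurrentConnection(const ConnectionPtr& messageCnx) const {
        return messageCnx == current_;
    }

   private:
    ConnectionPtr current_;
    ConnectionPtr cnxAtFlowPermit_;
};

// Runs the scenario "flow permit sent, then reconnect, then message arrives on the
// new connection" and reports whether the message would be kept.
bool messageSurvivesReconnect(bool compareWithCurrentConnection) {
    auto before = std::make_shared<Connection>(Connection{1});
    auto after = std::make_shared<Connection>(Connection{2});
    assert(before != after);  // two distinct connections

    ZeroQueueModel consumer(before);
    consumer.sendFlowPermit();
    consumer.reconnect(after);

    return compareWithCurrentConnection
               ? consumer.acceptComparingWithCurrentConnection(after)
               : consumer.acceptComparingWithFlowConnection(after);
}
```

In this model, messageSurvivesReconnect(false) reports that the message is discarded, while messageSurvivesReconnect(true) reports that it is kept.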
In addition, handleCreateConsumer is executed when the consumer is reopened. If it runs "after setting waitingForZeroQueueSizeMessage to true" and "before sending a flow permit", the flow permit may be sent twice, so we lock mutex_ to perform exclusive control.
receiveAsync
The number of elements in pendingReceives_ tells us both the number of flow permits sent before the reconnection and the number of callbacks waiting for receiveAsync to complete.
pulsar-client-cpp/lib/ConsumerImpl.cc
Lines 976 to 981 in 115d64a
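The bookkeeping described above can be sketched with a small single-threaded model. This is a sketch under assumptions: the class PendingReceiveModel and its methods are hypothetical and only borrow the name pendingReceives_ from the real code. It assumes each receiveAsync call queues one callback and sends one flow permit, and it ignores locking.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical, simplified model; only the name pendingReceives_ comes from the real code.
class PendingReceiveModel {
   public:
    using Callback = std::function<void(const std::string&)>;

    // Assumption: one queued callback corresponds to one flow permit sent to the broker.
    void receiveAsync(Callback callback) {
        pendingReceives_.push(std::move(callback));
        ++permitsOutstanding_;
    }

    // A delivered message completes the oldest waiting callback and uses up one permit.
    bool deliver(const std::string& payload) {
        if (pendingReceives_.empty()) {
            return false;
        }
        Callback callback = std::move(pendingReceives_.front());
        pendingReceives_.pop();
        --permitsOutstanding_;
        callback(payload);
        return true;
    }

    // Topic unload or broker restart: permits sent on the old connection are lost.
    void disconnect() { permitsOutstanding_ = 0; }

    // After a reopen, the number of waiting callbacks is the number of permits to send again.
    std::size_t permitsToResend() const { return pendingReceives_.size(); }

    std::size_t permitsOutstanding() const { return permitsOutstanding_; }

   private:
    std::queue<Callback> pendingReceives_;
    std::size_t permitsOutstanding_ = 0;
};
```

While the connection is up, permitsOutstanding() and permitsToResend() stay equal in this model. After disconnect(), only the queue size still records how many permits were in flight.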
Therefore, when a zero queue consumer is reopened, it should send as many flow permits as there are elements in pendingReceives_.
In this case too, if handleCreateConsumer is executed "after adding an element to pendingReceives_" and "before sending a flow permit", it is possible that more flow permits will be sent than expected. So we need to lock mutex_ in receiveAsync too.
Verifying this change
Documentation
doc-not-needed