[Issue 1128][Producer] Regard ProducerBlockedQuotaExceededException as a retryable exception for reconnecting#1457
Conversation
4de6a9f to
2c6d942
Compare
2c6d942 to
c5c3ffd
Compare
There was a problem hiding this comment.
Pull request overview
This pull request modifies the producer's reconnection behavior when encountering a ProducerBlockedQuotaExceededException. The change aligns the Go client with the Java client by treating this exception as retryable, allowing the producer to continue reconnecting after failing pending messages rather than closing the producer entirely.
Changes:
- Modified
producer_partition.goto remove early return whenProducerBlockedQuotaExceededExceptionis encountered, allowing reconnection to continue - Added test
TestProducerReconnectWhenBacklogQuotaExceedto verify the reconnection behavior - Updated golangci-lint version from v1.61.0 to v1.64.2
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pulsar/producer_partition.go | Removed early return for ProducerBlockedQuotaExceededException to allow reconnection retry |
| pulsar/producer_test.go | Added test and custom backoff policy implementation to verify quota exceeded reconnection behavior |
| Makefile | Updated golangci-lint version to v1.64.2 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
RobertIndie
left a comment
There was a problem hiding this comment.
PTAL comments from copilot. Thanks.
de206e1 to
70d6b0b
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Step 4: Send 512KB messages and monitor statistics | ||
| // Limit to 10 iterations to avoid infinite loop in test | ||
| isReachMaxBackoff := false | ||
| for i := 0; i < 10; i++ { |
There was a problem hiding this comment.
Why do we need 10 iterations? This seems to be consistently reproducible. Perhaps we don’t need the for loop and time.Sleep.
There was a problem hiding this comment.
@RobertIndie
The broker updates the backlog quota based on metrics calculations. Since metrics are collected approximately every 30 seconds, sending a single message larger than 10KB will not immediately trigger backlog quota throttling. Therefore, a for loop is needed to send multiple messages repeatedly, waiting for the backlog to reach the quota limit.
…tion to continue to reconnect
70d6b0b to
1c5c317
Compare
Fixes #1134
Master Issue: #1128
Motivation
Currently, the Go client handles connections in two ways:
grabCnx()method will not trigger retry logic. If an error occurs, it will be directly returned to the caller.runEventsLoop()method will detect the disconnection and call thereconnectToBroker()method to re-establish the connection. During reconnection, the method identifies whether the current exception is a retryable error. If it is a retryable error, theinternal.Retry()inside thereconnectToBroker()method will retry indefinitely based on the backoff strategy. If it is a non-retryable error, such as the currenterrMsgProducerBlockedQuotaExceededException, the partition producer will be closed, causing any following messages sent to that partition to result in aSendTimeoutError.Since the
backlogQuotaExceedExceptioncan be resolved by modifying the TTL or increasing consumption speed, it is likely to recover after some retry attempts. I believe the exception handling forerrMsgProducerBlockedQuotaExceededExceptionin thereconnectToBroker()method should align as closely as possible with the Java implementation. Java’s current retry logic for connections is as follows:handleSendError()is triggered to close the connection.connectionClosedmethod is called, which in turn invokes thegrabCnx()method, and subsequentlygrabCnx()calls theconnectionOpened()method to reconnect:connectionOpened()method, the client will execute different logic based on the current exception. For theProducerBlockedQuotaExceededExceptionexception, the Java client does not treat it as anisUnrecoverableError. Instead, it first cleans up the pending messages and then attempts to reconnect.Modifications
errMsgProducerBlockedQuotaExceededExceptionis encountered inreconnectToBroker(), onlyfailPendingMessages()should be executed, but the connection should not be closed. Specifically, the code should be modified as follows:Verifying this change
This change added tests and can be verified as follows:
TestProducerReconnectWhenBacklogQuotaExceedDoes this pull request potentially affect one of the following parts:
Documentation