[Bug] Failed to verify checksum #21557
Comments
@zifengmo Could you share the code of the producer builder, and a broker heap dump taken when the issue occurs?
BTW, the messages will be packaged as a batched message if you enabled
It seems to be a Netty bug.
@poorbarcode Is there any workaround if it's a Netty bug?
The code of the producer builder: `private Producer getProducer(String topic) throws PulsarClientException {`
The code of the message builder: Because of this problem, the broker has been rolled back, but the issue cannot be reproduced locally. I will redeploy it later to check.
Thanks. Could you provide the client logs from around the time the issue occurs?
I have the same problem. Versions: broker 3.0.1, client 3.0.1. It often occurs when I use remote replication across multiple clusters, and it prevents messages from being replicated to the remote cluster. At the moment, I can only resolve it temporarily by restarting the Pulsar brokers. The logs look like this:
2023-11-08T08:01:04,101+0000 [pulsar-io-8-8] INFO org.apache.pulsar.client.impl.ProducerImpl - [persistent://it_develop/gndp-enp/event03] [pulsar.repl.idc-pulsar-sit-->aliyun-pulsar-sit] Re-Sending 173 messages to server
@qiaofazhan Could we set up a meeting to talk about how to reproduce this issue? cc: @zifengmo
@qiaofazhan Could you provide more logs from before the issue occurs?
This is the debug log from before and after the issue occurred:
2023-11-09T09:02:44,589+0000 [pulsar-io-4-4] DEBUG org.apache.pulsar.common.protocol.PulsarDecoder - [/172.22.2.22:44262] Received cmd SEND
2023-11-09T09:02:44,594+0000 [pulsar-io-4-4] DEBUG org.apache.pulsar.broker.service.ServerCnx - [/172.22.2.22:44262] connect state change to : [Failed]
The client logs are the same as his.
Sure, when is a suitable time for you? You can send either the meeting invitation code or your social media contact directly.
Tencent Meeting:
I think I found the root cause: The issue fixed by apache/bookkeeper#4140 is different from the issue you hit.
…21684)
Fixes #21557
### Motivation
There is a network packet loss issue after enabling `haProxyProtocolEnabled`, which leads to the errors `Checksum failed on the broker` and `Adjusted frame length exceeds`. You can reproduce the issue with the test `testSlowNetwork`.
### Modifications
Fix the bug.
(cherry picked from commit 6e18874)
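As a rough illustration of the `Checksum failed on the broker` symptom, the sketch below shows how a single lost byte makes a checksum comparison fail. It uses only the JDK (`java.util.zip.CRC32C`, Java 9+) and is not Pulsar's implementation: the class `ChecksumLossDemo`, its helper names, the sample payload, and the choice of CRC32C are all assumptions made for the example. The model assumes the receiver still reads the declared number of bytes, so after a loss the payload is shifted and its tail is filled with whatever byte comes next in the stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

public class ChecksumLossDemo {

    /** Computes the CRC32C of the whole array. */
    static long crc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Hypothetical loss model: the byte at lostIndex never arrives, the rest of the
     * payload shifts left by one, and the last slot is filled with the next byte
     * in the stream. The received array keeps the declared length.
     */
    static byte[] receiveAfterLoss(byte[] payload, int lostIndex, byte nextStreamByte) {
        byte[] received = new byte[payload.length];
        System.arraycopy(payload, 0, received, 0, lostIndex);
        System.arraycopy(payload, lostIndex + 1, received, lostIndex,
                payload.length - lostIndex - 1);
        received[payload.length - 1] = nextStreamByte;
        return received;
    }

    /** Returns true when the received bytes match the checksum computed by the sender. */
    static boolean verify(byte[] received, long expectedChecksum) {
        return crc32c(received) == expectedChecksum;
    }

    public static void main(String[] args) {
        byte[] sent = "123456789".getBytes(StandardCharsets.US_ASCII);
        long checksum = crc32c(sent); // computed on the sending side

        byte[] damaged = receiveAfterLoss(sent, 7, (byte) 0);

        System.out.println("intact payload verifies:  " + verify(sent, checksum));
        System.out.println("damaged payload verifies: " + verify(damaged, checksum));
    }
}
```

The payload itself can be tiny, as in the report; what matters is that the bytes the receiver checks are no longer the bytes the sender hashed.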
Search before asking
Version
brokers : 3.0.1
clients : 2.5.2 and 3.0.1
Minimal reproduce step
After I upgraded the broker to version 3.0.1, the producer started to hit errors when sending messages. The issue doesn't always occur and seems to be related to the size of the messages being sent. It happens intermittently, at intervals ranging from a few minutes to a few hours. I can confirm that the messages are very short, less than 1 KB. When I upgraded the client from 2.5.2 to 3.0.1, the errors did not decrease; they became more frequent.
What did you expect to see?
No errors occur.
What did you see instead?
2023-11-09T08:05:10,961+0000 [pulsar-io-4-3] ERROR org.apache.pulsar.broker.service.Producer - [PersistentTopic{topic=persistent://testa/out/event-partition-19}] [mqe-71-347] Failed to verify checksum
2023-11-09T08:05:10,964+0000 [pulsar-io-4-3] WARN org.apache.pulsar.broker.service.ServerCnx - [/172.0.3.33:53878] Got exception io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 5253120: 101194516 - discarded
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:507)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:493)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:377)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:423)
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:333)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
at org.apache.pulsar.common.protocol.OptionalProxyProtocolDecoder.channelRead(OptionalProxyProtocolDecoder.java:52)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:842)
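To illustrate why missing bytes can produce a frame length like the one in the `TooLongFrameException` above, here is a minimal sketch of length-prefixed framing in plain Java. It does not use Netty or Pulsar code. The class `FrameDesyncDemo`, its helpers, and the sample payloads are hypothetical; the only value taken from the log is the limit 5253120. Once a few bytes go missing, the reader skips the declared payload length and then interprets payload bytes as the next 4-byte length prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameDesyncDemo {

    // Limit copied from the log line above; used here only for comparison.
    static final int MAX_FRAME_SIZE = 5253120;

    /** Builds one frame: a 4-byte big-endian length followed by the payload. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Concatenates two byte arrays into one stream. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Simulates loss: returns the stream without the bytes in [from, from + count). */
    static byte[] dropBytes(byte[] stream, int from, int count) {
        byte[] out = new byte[stream.length - count];
        System.arraycopy(stream, 0, out, 0, from);
        System.arraycopy(stream, from + count, out, from, stream.length - from - count);
        return out;
    }

    /** Reads the 4 bytes at offset as a big-endian length, as a length-field decoder would. */
    static int lengthAt(byte[] stream, int offset) {
        return ByteBuffer.wrap(stream, offset, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] stream = concat(
                frame("hello".getBytes(StandardCharsets.US_ASCII)),
                frame("world!".getBytes(StandardCharsets.US_ASCII)));

        // The reader trusts the first prefix and expects the next one right after it.
        int secondPrefixOffset = 4 + lengthAt(stream, 0);
        System.out.println("intact stream, second length:  " + lengthAt(stream, secondPrefixOffset));

        // Three payload bytes of the first frame are lost in transit.
        byte[] damaged = dropBytes(stream, 5, 3);
        int bogus = lengthAt(damaged, secondPrefixOffset);
        System.out.println("damaged stream, second length: " + bogus);
        System.out.println("exceeds limit: " + (bogus > MAX_FRAME_SIZE));
    }
}
```

In this sketch the payloads are only a few bytes long, yet the misaligned read yields a length above the limit, which is consistent with the report of very short messages triggering the error.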
Anything else?
No response
Are you willing to submit a PR?