Flink CDC 2.1.1: very slow data synchronization for large tables with a varchar primary key #801
Unanswered
a120610114
asked this question in
Q&A
Replies: 2 comments 19 replies
-
@wuchong could you help answer this?
4 replies
-
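With a varchar key, the slowness usually comes from the up-front chunk computation: every chunk has to be split out before reading of any chunk begins. One optimization idea is asynchronous splitting.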
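To illustrate the idea, here is a rough sketch only, not the Flink CDC implementation. A background thread computes chunk boundaries and hands each one to the reader through a bounded queue as soon as it is known, so reading can start before splitting has finished. The in-memory sorted key list stands in for the table, and `split_chunks`, `read_chunk`, and `snapshot_async` are hypothetical names made up for this example.

```python
import queue
import threading


def split_chunks(sorted_keys, chunk_size):
    """Yield (start, end) chunk boundaries; None means unbounded on that side."""
    start = None
    for i in range(chunk_size, len(sorted_keys), chunk_size):
        end = sorted_keys[i]
        yield (start, end)
        start = end
    yield (start, None)


def read_chunk(sorted_keys, start, end):
    """Read the half-open key range [start, end) from the stand-in table."""
    return [
        k for k in sorted_keys
        if (start is None or k >= start) and (end is None or k < end)
    ]


def snapshot_async(sorted_keys, chunk_size):
    """Read chunks while the splitter thread is still producing them."""
    chunks = queue.Queue(maxsize=4)  # bounded, so the splitter cannot run far ahead
    done = object()                  # sentinel marking the end of splitting

    def splitter():
        try:
            for chunk in split_chunks(sorted_keys, chunk_size):
                chunks.put(chunk)
        finally:
            chunks.put(done)         # always release the reader, even on error

    threading.Thread(target=splitter, daemon=True).start()

    rows = []
    while True:
        chunk = chunks.get()
        if chunk is done:
            break
        rows.extend(read_chunk(sorted_keys, *chunk))
    return rows
```

In a real connector the reader side would be several parallel tasks and the splitter would query the database for each boundary. The point here is only that the two stages overlap instead of running one after the other.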
15 replies
-
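While using Flink CDC 2.1.1 for data synchronization, I found that syncing a large table with a varchar primary key (roughly 30 million rows) is very slow and also fails with errors quite often. When I checked the logs, I found entries like this: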
fb72562c-845f-11ea-88a6-b8599fe5d1ea:1-244504311, row=0, event=0} for split MySqlSnapshotSplit{tableId=wxqyh_learnonline.tb_qy_examination_exam_user_ref, splitId='wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031', splitKeyType=[`id` VARCHAR(32) NOT NULL], splitStart=[b6547e5906a64ae6a5b062492f1b4b91], splitEnd=[b66b1de7513043379a327af9086c8f80], highWatermark=null}
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 2 - Snapshotting data
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exporting data from split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031' of table wxqyh_learnonline.tb_qy_examination_exam_user_ref
2022-01-07 00:28:28,429 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - For split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031' of table wxqyh_learnonline.tb_qy_examination_exam_user_ref using select statement: 'SELECT * FROM `wxqyh_learnonline`.`tb_qy_examination_exam_user_ref` WHERE id >= ? AND NOT (id = ?) AND id <= ?'
2022-01-07 00:28:31,217 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Finished exporting 8093 records for split 'wxqyh_learnonline.tb_qy_examination_exam_user_ref:2031', total duration '00:00:02.788'
2022-01-07 00:28:31,222 INFO com.ververica.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 3 - Determining high watermark {ts_sec=0, file=mysql-bin.001188, pos=53630482, gtids=1935d4a6-2e71-11e9-9330-6c92bf5f0aed:450812989-484900678, 4ddb2c3f-6f05-11e9-8a9c-6c0b84d5a828:1-411556268,
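This puzzled me, and the source code does the same thing.
It splits directly on the varchar column and then queries each range. When I ran this statement against the database myself, it was very slow.
Is this behavior normal? What is the design rationale? Is there any way to speed up synchronization of large tables with a varchar primary key?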
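For reference, the predicate in the logged statement (`id >= ? AND NOT (id = ?) AND id <= ?`) amounts to a half-open key range: the split start is included and the split end is excluded. A minimal Python sketch of that check, using the `splitStart` and `splitEnd` values from the log above. The function name `in_split` is made up for illustration, and Python's plain string comparison only approximates MySQL's collation-based ordering (they agree for lowercase hex keys like these).

```python
def in_split(key, split_start, split_end):
    """Mirror of: id >= ? AND NOT (id = ?) AND id <= ?"""
    return key >= split_start and not (key == split_end) and key <= split_end


# Boundaries of split 2031, taken from the log excerpt
SPLIT_START = "b6547e5906a64ae6a5b062492f1b4b91"
SPLIT_END = "b66b1de7513043379a327af9086c8f80"

# The start key belongs to this split, the end key belongs to the next one
start_included = in_split(SPLIT_START, SPLIT_START, SPLIT_END)
end_included = in_split(SPLIT_END, SPLIT_START, SPLIT_END)
```

So each split is a range scan on the primary key between two string boundaries, which is what the log shows finishing in under 3 seconds for 8093 records.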