How to enable transactions in large insert via spark #950
Comments
Hi @chenrun0210, sorry, I'm not familiar with the feature. Is it just about adding new settings for each insert? Can you share more details? The JDBC driver has a fake transaction built in for JDBC compliance, and I should be able to enhance it for atomic insertion as well.
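To make the "fake transaction built in for JDBC compliance" remark concrete, here is a minimal sketch of the standard JDBC transaction calls the driver accepts; per the comment above they satisfy the JDBC API but do not make the insert atomic on the server. The URL, credentials, and table name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FakeTransactionDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- adjust for your environment.
        String url = "jdbc:clickhouse://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "")) {
            // Standard JDBC transaction calls; the driver accepts them for
            // compliance, but they do not give a real server-side transaction.
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "insert into mytable values (?)")) {
                ps.setInt(1, 1);
                ps.addBatch();
                ps.setInt(1, 10);
                ps.addBatch();
                ps.executeBatch();
            }
            conn.commit(); // no real atomicity is provided here
        }
    }
}
```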
I'm sorry, I pasted the wrong link. What I meant to quote is: "The JDBC driver has fake transaction built-in for JDBC compliance and I should be able to enhance for atomic insertion as well." Regarding "has fake transaction built-in for JDBC compliance": does that mean the transaction-related configuration of the ClickHouse JDBC driver doesn't actually work? If it does work, how do I configure it?
I see. Yes, as of now, it's not supported. However, since the JDBC driver supports executing multiple statements in one go, a workaround would be executing the TCL manually, together with the insertion. For example:

```sql
begin transaction;
insert into mytable values (1),(10);
commit;
-- execute rollback when there's an exception
```

Another workaround would be using clickhouse-local to generate parts, copy them to the ClickHouse server, and attach them as needed.
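As a rough illustration of that workaround from the JDBC side, the sketch below sends the TCL and the insert together and issues a rollback on failure. It assumes a ClickHouse server with transaction support enabled and uses a placeholder URL and table; whether the multi-statement call behaves this way depends on the driver version, so treat it as a sketch rather than a confirmed recipe.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ManualTclWorkaround {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/credentials; assumes a ClickHouse server with
        // (experimental) transaction support enabled.
        String url = "jdbc:clickhouse://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "");
             Statement stmt = conn.createStatement()) {
            try {
                // Multiple statements sent in one go, as described in the comment above.
                stmt.execute("begin transaction; insert into mytable values (1),(10); commit;");
            } catch (Exception e) {
                // Execute rollback when there's an exception, per the suggested workaround.
                stmt.execute("rollback");
                throw e;
            }
        }
    }
}
```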
If you can upgrade ClickHouse to 22.7, you don't have to upgrade the JDBC driver but can simply add a connection property.
Problem:
I batch insert into ClickHouse using the Spark API Dataset.write.jdbc(). But when the Spark job fails (including some tasks failing, the whole job failing, or an executor being lost), the data in ClickHouse is not correct, usually more than the original data because of Spark's retry mechanism.
The code looks like this:

```java
Dataset df = toclickhouse.spark.sql(" my query sql ").as(encoder);
df.write()
    .mode("append")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .jdbc(Config.jdbcUrlAB, "clickhouse_table", Config.ckPropertiesAB);
```
When some tasks in the Spark job fail, the data in the ClickHouse table is wrong.
Then I found that ClickHouse has limited support for transactions in MergeTree tables:
ClickHouse/ClickHouse#22086
If a large INSERT ran in a single transaction, my problem would be solved perfectly.
But I didn't find how to use this feature in clickhouse-jdbc, or how to use it with the Spark API Dataset.write.jdbc(). Is it done by configuring the properties in Dataset.write.jdbc(url, table, properties)?
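To make the question concrete, here is a minimal sketch of where connection properties go when calling Dataset.write().jdbc(url, table, properties); the URL, query, and property values are placeholders, and nothing here claims that any particular property actually enables transactions.

```java
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkJdbcPropertiesSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("ck-insert").getOrCreate();
        Dataset<Row> df = spark.sql("select 1 as id"); // placeholder query

        // Connection properties are forwarded to the JDBC driver; whether any of
        // them can turn on server-side transactions depends on the driver version.
        Properties ckProperties = new Properties();
        ckProperties.setProperty("driver", "com.clickhouse.jdbc.ClickHouseDriver");
        ckProperties.setProperty("user", "default");

        String jdbcUrl = "jdbc:clickhouse://localhost:8123/default"; // placeholder URL
        df.write().mode(SaveMode.Append).jdbc(jdbcUrl, "clickhouse_table", ckProperties);
    }
}
```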