File data insert using CSVWithNames is failing in v0.4.1 #1254

Closed
rsim opened this issue Feb 22, 2023 · 6 comments · Fixed by #1291

@rsim

rsim commented Feb 22, 2023

We see the same regression with CSVWithNames in the latest JDBC driver that we reported earlier in #1072.
Here is the same problem description:

We are using ClickHouse JDBC driver to import CSV files that include a header line with column names. Here is the JRuby code that we use https://github.com/rsim/mondrian-olap/blob/master/spec/rake_tasks.rb#L377-L381

conn.jdbc_connection.createStatement.write.
  query("INSERT INTO #{table_name}(#{columns_string})").
  format(Java::com.clickhouse.data.ClickHouseFormat::CSVWithNames).
  data(file_path).execute

It is now failing again with the same error:

Java::JavaUtilConcurrent::CompletionException: com.clickhouse.client.ClickHouseException: Code: 27. DB::ParsingException: Cannot parse input: expected ',' before: 'id,the_date,the_day,the_month,the_year,day_of_month,week_of_year,month_of_year,quarter\n1,2010-01-01 02:00:00,Friday,January,2010,1,0,1,Q1\n2,2010-01-02 02:00:00,':
Row 1:
Column 0,   name: id,            type: Int32,    ERROR: text "id,the_dat" is not like Int32

: While executing ParallelParsingBlockInputFormat: (at row 1)
. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 22.1.4.1)
, server ClickHouseNode [uri=http://localhost:8123/mondrian_test]@802732211
com.clickhouse.client.ClickHouseClientBuilder$Agent.handle(com/clickhouse/client/ClickHouseClientBuilder.java:272)
com.clickhouse.client.ClickHouseClientBuilder$Agent.send(com/clickhouse/client/ClickHouseClientBuilder.java:296)
com.clickhouse.client.ClickHouseClientBuilder$Agent.execute(com/clickhouse/client/ClickHouseClientBuilder.java:349)
com.clickhouse.client.ClickHouseRequest.execute(com/clickhouse/client/ClickHouseRequest.java:2064)
java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)

This is probably the same problem as before: the CSV format (which has no header line) is used instead of CSVWithNames, so the server fails on the first line of the file, which contains the column names.
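To make the suspected cause concrete, here is a minimal plain-Ruby sketch (no ClickHouse involved) of what the parser effectively does with a three-column excerpt of the file. The `parse_rows` helper and the `:CSV` / `:CSVWithNames` symbols are hypothetical stand-ins for the real formats; the point is only that plain CSV treats the header line as data, so the Int32 `id` column receives the text `id`, matching the error above.

```ruby
# Hypothetical model of row parsing for a table whose first column is Int32.
# :CSV treats every line as data; :CSVWithNames treats the first line as column names.
def parse_rows(text, format)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  lines = lines.drop(1) if format == :CSVWithNames # skip the header line
  lines.map do |line|
    id, *rest = line.split(",")
    [Integer(id), *rest] # raises ArgumentError for non-numeric text such as "id"
  end
end

sample = "id,the_date,the_day\n1,2010-01-01 02:00:00,Friday\n"

parse_rows(sample, :CSVWithNames) # => [[1, "2010-01-01 02:00:00", "Friday"]]
# parse_rows(sample, :CSV) raises ArgumentError on the header line
```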

@zhicwu Do you have any idea what could have caused this regression in the latest version 0.4.1, given that it was previously fixed in #1082? Or should I try to bisect and find the commit that broke it again?

@rsim
Author

rsim commented Feb 22, 2023

@zhicwu Using git bisect, I found that the large commit 91e6777 broke this.
I couldn't compile that commit itself, but the commit before it, 0855184, was still good, and the first commit after it that compiled, e51c2c4, was bad.

@zhicwu
Contributor

zhicwu commented Feb 23, 2023

Hi @rsim, really sorry about this, and thank you for coming back to the issue with such detailed input.

This is probably a good example of a client trying too hard to be smart: calling data("~/a.csv") changes the format to CSV (and potentially the compression settings as well). Why? Because the Java client takes the file extension (.csv here) as a hint to set the format to CSV automatically.

Could you move format() after data()? I'll need to make sure that format, compression, etc. are not overridden by this unless they were not set previously.
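The call-order effect described above can be sketched with a small Ruby model. `WriteRequestModel`, its `respect_explicit_format:` option, and the extension-hint table are hypothetical and are NOT the real `com.clickhouse.client` API; the model only assumes what the comment states, namely that `data()` derives a format from a `.csv` extension, and contrasts that with the proposed rule of only filling in a format that was not set explicitly.

```ruby
# Hypothetical, simplified model of a write-request builder. This is NOT the
# real com.clickhouse.client API; it only illustrates why call order matters.
class WriteRequestModel
  # Assumed extension-to-format hint table (only .csv, as described above).
  EXTENSION_HINTS = { ".csv" => :CSV }.freeze

  attr_reader :format_name, :file

  # respect_explicit_format: false models the regressed behavior (the hint always wins);
  # true models the proposed fix (the hint only fills in an unset format).
  def initialize(respect_explicit_format: false)
    @respect_explicit_format = respect_explicit_format
    @format_name = nil
    @explicit = false
  end

  # Explicitly chosen format, e.g. :CSVWithNames.
  def format(name)
    @format_name = name
    @explicit = true
    self
  end

  # Attaches a file and derives a format hint from its extension.
  def data(path)
    @file = path
    hint = EXTENSION_HINTS[File.extname(path).downcase]
    overridable = !(@respect_explicit_format && @explicit)
    @format_name = hint if hint && overridable
    self
  end
end

# Regressed behavior: format() before data() is silently overridden.
WriteRequestModel.new.format(:CSVWithNames).data("~/a.csv").format_name # => :CSV
# Workaround: call format() after data().
WriteRequestModel.new.data("~/a.csv").format(:CSVWithNames).format_name # => :CSVWithNames
# Proposed fix: an explicit format survives regardless of order.
WriteRequestModel.new(respect_explicit_format: true).format(:CSVWithNames).data("~/a.csv").format_name # => :CSVWithNames
```

Applied to the JRuby snippet in the issue, the workaround amounts to calling `data(file_path)` before `format(Java::com.clickhouse.data.ClickHouseFormat::CSVWithNames)` in the chain.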

@rsim
Copy link
Author

rsim commented Feb 23, 2023

@zhicwu Thank you for the workaround. I moved format() after data(), and now it works correctly.

@zhicwu
Contributor

zhicwu commented Feb 23, 2023

Thanks for confirming. I will add this specific case as a regression test.

rsim added a commit to rsim/mondrian-olap that referenced this issue Feb 23, 2023
@zhicwu zhicwu added the bug label Mar 6, 2023
@zhicwu zhicwu added this to the 0.4.2 release milestone Mar 6, 2023
@zhicwu zhicwu linked a pull request Mar 20, 2023 that will close this issue
@zhicwu
Contributor

zhicwu commented Mar 20, 2023

I thought this over, and I think it's better to keep the current behavior. In ClickHouse itself, SELECT ... INTO OUTFILE and INSERT INTO ... FROM INFILE have a similar issue.

I'm open to suggestions, so please feel free to reopen the issue if you have a better idea.

@avj-vaibhav

avj-vaibhav commented Feb 26, 2024

@zhicwu I'm trying to achieve something similar.

If there is a workaround for the INFILE/OUTFILE issue, I would be very interested to hear about it.
Thanks.
