Parquet ingestion issue #909

Closed
den-crane opened this issue Apr 28, 2022 · 2 comments

den-crane commented Apr 28, 2022

clickhouse-client -q "select number A, now() B, 'x' C from numbers(1e6) format Parquet" > test.parquet
clickhouse-client -q "select number A, now() B, 'x' C from numbers(1e6) format CSV" > test.csv

            client.connect(server).write()
                    .query("insert into test ")
                    .format(ClickHouseFormat.Parquet)
                    .data("~/test.parquet")
                    .execute()
                    .get();

SEVERE: Failed to create stream response, closing input stream
Exception in thread "main" com.clickhouse.client.ClickHouseException: Unsupported format: Parquet, server ClickHouseNode(addr=http:localhost/<unresolved>:8123, db=default)@84425729
	at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:113)
	at Main.main(Main.java:54)

The same works with CSV:

            client.connect(server).write()
                    .query("insert into test ")
                    .format(ClickHouseFormat.CSV)
                    .data("~/test.csv")
                    .execute()
                    .get();

Also, why can't I do insert into test format CSV?

            client.connect(server).write()
                    .query("insert into test format CSV")
                    .data("~/test.csv")
                    .execute()
                    .get();

Exception in thread "main" com.clickhouse.client.ClickHouseException: Code: 27. DB::ParsingException: Cannot parse input: expected ',' before: 'FORMAT TabSeparated\n0,"2022-04-28 20:56:27","x"\n1,"2022-04-28 20:56:27","x"\n2,"2022-04-28 20:56:27","x"\n3,"2022-04-28 20:56:27","x"\n4,"2022-04-28 20:56:27","x"\n': 
Row 1:
Column 0,   name: A, type: Int64,    ERROR: text "FORMAT Tab" is not like Int64

: While executing CSVRowInputFormat: (at row 1)
. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 22.3.2.1)
, server ClickHouseNode(addr=http:localhost/<unresolved>:8123, db=default)@-1212239676

Full code:

import java.util.concurrent.ExecutionException;

import com.clickhouse.client.*;

public class Main {

//  clickhouse-client -q "select number A, now() B, 'x' C from numbers(1e6) format Parquet" > test.parquet

    public static void main(String[] args) throws ClickHouseException {
        final String DB_HOST = "localhost";
        final String USER = "default";
        final String PASS = "";


        ClickHouseNode server = ClickHouseNode.builder()
                .host(DB_HOST)
                .port(ClickHouseProtocol.HTTP, 8123)
                .database("default")
                .credentials(ClickHouseCredentials.fromUserAndPassword(USER,PASS))
                .build();

        try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol())) {
            ClickHouseRequest<?> request = client.connect(server);
            request.query("drop table if exists test").execute().get();
            request.query("create table test(A Int64, B DateTime, C String) engine=MergeTree() order by A").execute().get();

            client.connect(server).write()
                    .query("insert into test")
                    .format(ClickHouseFormat.Parquet)
                    .data("~/test.parquet")
                    .execute()
                    .get();

            ClickHouseResponse response = request.query("select count() from test").execute().get();
            for (ClickHouseRecord rec : response.records()) {
                System.out.println(rec.getValue(0).asString());
            }

        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw ClickHouseException.forCancellation(e, server);
        } catch (ExecutionException e) {
            throw ClickHouseException.of(e, server);
        }
    }
}

zhicwu added the bug label Apr 28, 2022

zhicwu commented Apr 28, 2022

Thanks @den-crane. Yes, we should be able to load data into a table as long as the format is supported by ClickHouse.
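
One way to confirm that the server itself accepts a format, independent of the Java client, is to post the file straight to the ClickHouse HTTP interface, which takes the statement in the query parameter and the data in the request body. The following is a minimal sketch using the JDK's java.net.http client; the class and method names (HttpInsertSketch, insertUri, insertFile) are made up for illustration, and it assumes the default user with an empty password on localhost:8123 as in the report above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of clickhouse-java.
final class HttpInsertSketch {

    // Builds the endpoint URL; the INSERT statement travels in the "query" parameter.
    static URI insertUri(String host, int port, String table, String format) {
        String sql = "INSERT INTO " + table + " FORMAT " + format;
        // URLEncoder emits '+' for spaces; switch to %20 for a plain percent-encoded query string.
        String encoded = URLEncoder.encode(sql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create("http://" + host + ":" + port + "/?query=" + encoded);
    }

    // Streams the file as the POST body, so the server reads it as the INSERT data.
    static HttpResponse<String> insertFile(URI uri, Path file)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        URI uri = insertUri("localhost", 8123, "test", "Parquet");
        System.out.println(uri);
        // Only contact the server when a file path is passed on the command line.
        if (args.length == 1) {
            HttpResponse<String> response = insertFile(uri, Paths.get(args[0]));
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Running it with the path to test.parquet as the only argument should report the HTTP status of the insert; a non-200 status with an error body would point at the server rather than the client.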

The Java client does not parse the query, which is one of the reasons it is faster than the JDBC driver. However, insert into test format CSV (with data specified) is a simple case that we can support, unlike a select query, which may have settings after the format clause.
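
To illustrate why the insert case is simple, a trailing format clause can be picked out with a single pattern match and no real SQL parsing. This is only a sketch of the idea, not the code that went into the client: the class name InsertFormatSniffer and the method trailingFormat are hypothetical, and the pattern deliberately rejects anything that follows the format name (such as settings).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of clickhouse-java.
final class InsertFormatSniffer {

    // "insert into <target> format <Name>", with only whitespace or one ';' allowed after the name.
    private static final Pattern TRAILING_FORMAT = Pattern.compile(
            "^\\s*insert\\s+into\\s+.+?\\s+format\\s+([A-Za-z][A-Za-z0-9_]*)\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the format name when the statement ends with a format clause, otherwise empty.
    static Optional<String> trailingFormat(String sql) {
        Matcher m = TRAILING_FORMAT.matcher(sql);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(trailingFormat("insert into test format CSV"));
        System.out.println(trailingFormat("insert into test"));
    }
}
```

A client that found a format this way could send the statement unchanged instead of appending its own default format, which is what produced the "FORMAT TabSeparated" text at the start of the data in the error above.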

zhicwu added this to the 0.3.2-patch9 milestone Apr 30, 2022
zhicwu added a commit that referenced this issue May 10, 2022

zhicwu commented May 11, 2022

Both cases should work now.
