Improving Tuple write performance (open for discussion) #1473

Open
4 of 9 tasks
mdonkers opened this issue Jan 16, 2025 · 1 comment

Comments

@mdonkers
Contributor

mdonkers commented Jan 16, 2025

Observed

  1. A high number of slices is allocated when inserting Tuple-type values

Details

Split off from #1426 to discuss the direction and its usefulness further.


When inserting values of the Tuple type, ClickHouse Go needs each value to be backed by an array or slice in order to process it correctly. Depending on how the data is represented internally before the insert, this can mean allocating a new slice for every tuple. If these tuples are in turn the values of a map, the overhead of those allocations can quickly become significant.

Take for example the following struct:

type ValueWithTypeTuple struct {
	V string
	T int8
}

Currently, to insert these you need something like:

func (avt ValueWithTypeTuple) Value() (driver.Value, error) {
	return []any{avt.V, avt.T}, nil
}
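One rough way to observe the cost of this pattern is `testing.AllocsPerRun` from the standard library. The sketch below is only an illustration: `sink` and `AllocsPerValue` are names invented here, and the exact count depends on the Go version and on escape analysis, so no specific number is claimed.

```go
package main

import (
	"database/sql/driver"
	"testing"
)

// ValueWithTypeTuple mirrors the struct from the issue.
type ValueWithTypeTuple struct {
	V string
	T int8
}

// Value builds a fresh []any on every call, as in the issue.
func (avt ValueWithTypeTuple) Value() (driver.Value, error) {
	return []any{avt.V, avt.T}, nil
}

// sink (hypothetical) keeps the result reachable so the compiler
// cannot keep the slice on the stack or drop it entirely.
var sink driver.Value

// AllocsPerValue (hypothetical helper) reports the average number of
// heap allocations caused by a single Value() call.
func AllocsPerValue(v ValueWithTypeTuple) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink, _ = v.Value()
	})
}
```

Calling `AllocsPerValue(ValueWithTypeTuple{V: "x", T: 1})` gives a per-tuple figure that can be multiplied by the number of tuples in a batch to estimate the overall allocation pressure.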

So I'm wondering whether a dedicated Tuple type makes sense, with specific types for the most common lengths (such as Tuple2, Tuple3 and Tuple4).

The method to insert these would then look something like this:

// Get would implement the proposed column.Tuple2 interface (it does not exist in ClickHouse Go yet).
// It returns two values that can be inserted as a Tuple.
func (avt ValueWithTypeTuple) Get() (any, any) {
	return &avt.V, avt.T
}

This no longer requires allocating a slice; the values are referenced directly instead.
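To make the proposal more concrete, here is a minimal sketch of how a consumer of such an interface might look. Everything in it is an assumption for discussion: `Tuple2` is the proposed interface, not an existing ClickHouse Go API, and `stringInt8Column` is a made-up stand-in for a driver column holding `Tuple(String, Int8)` data column-wise.

```go
package main

import "fmt"

// Tuple2 is the interface proposed in the issue (hypothetical).
type Tuple2 interface {
	Get() (any, any)
}

// ValueWithTypeTuple mirrors the struct from the issue.
type ValueWithTypeTuple struct {
	V string
	T int8
}

// Get returns both elements without building an intermediate slice.
func (avt ValueWithTypeTuple) Get() (any, any) {
	return avt.V, avt.T
}

// stringInt8Column is a hypothetical stand-in for a tuple column that
// stores each element in its own typed slice.
type stringInt8Column struct {
	strs []string
	ints []int8
}

// Append unpacks the two elements and stores them in the typed slices.
func (c *stringInt8Column) Append(t Tuple2) error {
	first, second := t.Get()
	s, ok := first.(string)
	if !ok {
		return fmt.Errorf("element 0: expected string, got %T", first)
	}
	n, ok := second.(int8)
	if !ok {
		return fmt.Errorf("element 1: expected int8, got %T", second)
	}
	c.strs = append(c.strs, s)
	c.ints = append(c.ints, n)
	return nil
}
```

This sketch returns the string by value. Boxing a string in `any` may still allocate, so the approach removes the slice allocation but not necessarily every allocation per tuple.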

I'd like to hear any thoughts or other ideas on whether an implementation makes sense and, if so, what kind.

Environment

  • clickhouse-go version: v2.30.0
  • Interface: ClickHouse API
  • Go version: 1.23.4
  • Operating system: Linux
  • ClickHouse version:
  • Is it a ClickHouse Cloud? No
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
  • Sample data for all these tables, use clickhouse-obfuscator if necessary
@serprex
Member

serprex commented Jan 16, 2025

  1. I've recommended a general interface: type Tuple interface { Get(int) any; Len() int }
  2. Another option would be to support iter.Seq[any]

All of these will generally involve boxing each element in any, and any system that instead relies on tagging struct fields with tuple index values will have to go through reflect. It's a tough spot. You'd have to go the route of Tuple2[type1, type2], and even then Go does not yet fully monomorphize generics, so you're going to have to get interfaces involved at some point in the generic code unless generics leak through a lot more of the codebase.
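A rough sketch of the `Tuple2[type1, type2]` route shows where interfaces come back in. All names here (`Tuple2`, `Tuple2Column`, `Column`, `AppendAny`) are hypothetical and chosen for illustration; they are not taken from ClickHouse Go.

```go
package main

import "fmt"

// Tuple2 is a hypothetical generic pair with statically typed elements.
type Tuple2[A, B any] struct {
	First  A
	Second B
}

// Column is a hypothetical non-generic interface that the rest of a
// driver would use to handle columns of any type.
type Column interface {
	Rows() int
	AppendAny(v any) error
}

// Tuple2Column stores the elements column-wise in typed slices.
type Tuple2Column[A, B any] struct {
	firsts  []A
	seconds []B
}

// Compile-time check that an instantiated column satisfies Column.
var _ Column = (*Tuple2Column[string, int8])(nil)

// Append is the fully typed path: no boxing and no per-tuple slice.
func (c *Tuple2Column[A, B]) Append(t Tuple2[A, B]) {
	c.firsts = append(c.firsts, t.First)
	c.seconds = append(c.seconds, t.Second)
}

// AppendAny is the path taken by non-generic code: the tuple arrives
// boxed in an interface and has to be type-asserted back.
func (c *Tuple2Column[A, B]) AppendAny(v any) error {
	t, ok := v.(Tuple2[A, B])
	if !ok {
		var want Tuple2[A, B]
		return fmt.Errorf("expected %T, got %T", want, v)
	}
	c.Append(t)
	return nil
}

// Rows returns the number of tuples stored so far.
func (c *Tuple2Column[A, B]) Rows() int { return len(c.firsts) }
```

The typed `Append` avoids boxing, but as soon as a caller only holds a `Column`, values pass through `any` again, which is the trade-off described above.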

A proof of concept would help answer a lot about feasibility and impact here.
