Skip to content

feat: Support Decimal type in approx_distinct#23190

Merged
Jefffrey merged 3 commits into
apache:mainfrom
mkleen:decimal_approx_distinct
Jun 27, 2026
Merged

feat: Support Decimal type in approx_distinct#23190
Jefffrey merged 3 commits into
apache:mainfrom
mkleen:decimal_approx_distinct

Conversation

@mkleen

@mkleen mkleen commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

  • Support the SQL Decimal type for approx_distinct

  • The Arrow types Decimal32, Decimal64, Decimal128 and Decimal256 can be directly supported for NumericHLLAccumulator and HllGroupsAccumulator

What changes are included in this PR?

  • Enable NumericHLLAccumulator and HllGroupsAccumulator to support Decimal32, Decimal64, Decimal128 and Decimal256.
  • Tests for the non-grouped and grouped path as part of approx_distinct.rst and aggregate.slt for Decimal128 and Decimal256
  • Benchmarks

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, approx_distinct supports now Decimal but no breaking changes.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 25, 2026
@mkleen mkleen changed the title feat: Support decimal in approx_distinct feat: Support Decimal in approx_distinct Jun 25, 2026
@mkleen mkleen force-pushed the decimal_approx_distinct branch from e15371c to bb405f7 Compare June 25, 2026 16:53
@mkleen mkleen changed the title feat: Support Decimal in approx_distinct feat: Support Decimal type in approx_distinct Jun 25, 2026
@mkleen mkleen changed the title feat: Support Decimal type in approx_distinct feat: Support Decimal type in approx_distinct Jun 25, 2026
@mkleen mkleen marked this pull request as ready for review June 25, 2026 17:24
@mkleen mkleen force-pushed the decimal_approx_distinct branch from bb405f7 to eef9d22 Compare June 25, 2026 21:57

@Jefffrey Jefffrey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add decimal32 & decimal64 as well

@mkleen

mkleen commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

lets add decimal32 & decimal64 as well

AFAIK no SQL Type mapping exists for Decimal32 and Decimal64 see https://datafusion.apache.org/user-guide/sql/data_types.html. Does it make sense to support it then? Following count also just supports Decimal128 and Decimal256.

DataType::Decimal128(_, _) => Box::new(PrimitiveDistinctCountAccumulator::<
Decimal128Type,
>::new(data_type)),
DataType::Decimal256(_, _) => Box::new(PrimitiveDistinctCountAccumulator::<
Decimal256Type,
>::new(data_type)),

@Jefffrey

Copy link
Copy Markdown
Contributor

i dont think we should base support only on sql type mapping; for example this could be called via dataframe api, or if sql has arrow_cast, etc.

count does support it via the generic catchall, but i think its just an oversight that decimal32/decimal64 dont have specialized paths since they were types added later (compared to decimal128/decimal256)

// Use the generic accumulator based on `ScalarValue` for all other types
_ => Box::new(DistinctCountAccumulator {
values: HashSet::default(),
state_data_type: data_type.clone(),
}),

@mkleen

mkleen commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

i dont think we should base support only on sql type mapping; for example this could be called via dataframe api, or if sql has arrow_cast, etc.

count does support it via the generic catchall, but i think its just an oversight that decimal32/decimal64 dont have specialized paths since they were types added later (compared to decimal128/decimal256)

// Use the generic accumulator based on `ScalarValue` for all other types
_ => Box::new(DistinctCountAccumulator {
values: HashSet::default(),
state_data_type: data_type.clone(),
}),

ok, sounds good.

@mkleen mkleen requested a review from Jefffrey June 26, 2026 09:01
@mkleen mkleen force-pushed the decimal_approx_distinct branch from 4898a2d to e2b4b8e Compare June 26, 2026 09:20
@mkleen

mkleen commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@Jefffrey Thank you for the review!

@Jefffrey Jefffrey added this pull request to the merge queue Jun 27, 2026
@Jefffrey

Copy link
Copy Markdown
Contributor

thanks @mkleen

Merged via the queue into apache:main with commit 766f129 Jun 27, 2026
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants