Open
Description
What happens?
Hello, it seems to me that when flushing inlined data to Parquet, record_count in ducklake_table_stats is incremented again for rows that were already counted during INSERT.
This complicates formulas for getting the correct number of inlined rows (inlined_rows = record_count - parquet_rows), because the flushed rows are counted twice.
Reproduction steps:
- Insert 50k rows (inlined due to data_inlining_row_limit)
- Call ducklake_flush_inlined_data()
- record_count shows 100k instead of 50k
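
The arithmetic behind these steps can be sketched in a few lines of Python. This is only a toy model of the counters, not DuckLake code: the helper `inlined_rows` and all variable names are made up for illustration, and it assumes that after the flush all 50k rows are accounted for in Parquet (parquet_rows = 50k).

```python
def inlined_rows(record_count: int, parquet_rows: int) -> int:
    """Formula from the report: rows that are still inlined."""
    return record_count - parquet_rows


inserted = 50_000

# State after the flush. Assumption: every inserted row now lives in Parquet.
parquet_rows_after_flush = inserted

# Expected: the flush only moves rows, so record_count stays at 50k.
expected_record_count = inserted

# Reported: record_count is incremented again by the flushed rows.
reported_record_count = inserted + inserted

expected_inlined = inlined_rows(expected_record_count, parquet_rows_after_flush)
reported_inlined = inlined_rows(reported_record_count, parquet_rows_after_flush)

print("expected inlined rows:", expected_inlined)
print("inlined rows with reported record_count:", reported_inlined)
```

With the doubled record_count, the formula reports 50k rows as still inlined even though nothing is left to flush.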
Minimal reproducible example here: https://github.com/JendaT/ducklake-phantom-rows-repro
To Reproduce
Minimal reproducible example: https://github.com/JendaT/ducklake-phantom-rows-repro. It uses PostgreSQL as the metadata database, since that is what I have been using.
OS:
linux, amd64
DuckDB Version:
1.4.3
DuckLake Version:
0.3
DuckDB Client:
CLI
Hardware:
No response
Full Name:
Jenda T
Affiliation:
Bookbot s.r.o.
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have