ducklake_flush_inlined_data() double-counts rows in record_count #709

@JendaT

Description

What happens?

Hello, it seems to me that when flushing inlined data to Parquet, record_count in ducklake_table_stats is incremented again for rows that were already counted during INSERT.

This complicates formulas for computing the correct number of inlined rows (inlined_rows = record_count - parquet_rows).
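
For context, this is the kind of metadata query the formula maps to. A rough sketch only: the __ducklake_metadata_my_lake alias and main schema assume a DuckDB-file metadata catalog attached as my_lake, and the ducklake_table_stats / ducklake_data_file column names follow the DuckLake catalog schema; adjust accordingly for a PostgreSQL metadata database.

```sql
-- inlined_rows = record_count (from table stats) - rows already written to Parquet files
SELECT
    s.table_id,
    s.record_count
      - coalesce((SELECT sum(f.record_count)
                  FROM __ducklake_metadata_my_lake.main.ducklake_data_file f
                  WHERE f.table_id = s.table_id
                    AND f.end_snapshot IS NULL), 0) AS inlined_rows
FROM __ducklake_metadata_my_lake.main.ducklake_table_stats s;
```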

Reproduction steps:

  1. Insert 50k rows (inlined due to data_inlining_row_limit)
  2. Call ducklake_flush_inlined_data()
  3. record_count shows 100k instead of 50k

Minimal reproducible example here: https://github.com/JendaT/ducklake-phantom-rows-repro
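
The same steps as a SQL sketch (the catalog path, ATTACH options, and metadata catalog alias are illustrative, the flush call is invoked by the name used above, and the linked repro uses PostgreSQL metadata instead of a local DuckDB file):

```sql
ATTACH 'ducklake:repro.ducklake' AS my_lake (DATA_INLINING_ROW_LIMIT 100000);
USE my_lake;

CREATE TABLE t (i INTEGER);

-- Step 1: 50k rows, kept inlined because the inlining row limit is not exceeded.
INSERT INTO t SELECT * FROM range(50000);

-- record_count reads 50000 here, as expected.
SELECT record_count FROM __ducklake_metadata_my_lake.main.ducklake_table_stats;

-- Step 2: flush the inlined rows to Parquet.
CALL ducklake_flush_inlined_data();

-- Step 3: record_count now reads 100000 instead of 50000.
SELECT record_count FROM __ducklake_metadata_my_lake.main.ducklake_table_stats;
```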

To Reproduce

Minimal reproducible example here: https://github.com/JendaT/ducklake-phantom-rows-repro. It uses PostgreSQL as the metadata database, since that is what I have been using.

OS:

Linux, amd64

DuckDB Version:

1.4.3

DuckLake Version:

0.3

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Jenda T

Affiliation:

Bookbot s.r.o.

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
