test(integration-tests): draft / regression repro — 0-column RecordBatch via custom TableProvider#65
Draft
Curricane wants to merge 1 commit into
Draft
Conversation
…(gated)
Add a `ZeroColTable` / `ZeroColExec` fixture in `integration-tests/src/data.rs`
and a gated `tests/zero_col_regression.slt` that drives a
0-column / `row_count > 0` RecordBatch through the dist execution path.
Why this matters
----------------
A custom `TableProvider` that does not normalize `projection = Some(empty)`
into `Some(vec![0])` will, for `COUNT(*)`, emit a RecordBatch with 0
columns but `row_count > 0`. This shape has historically paniced dist in
two places:
- `dist/src/util.rs:113` on the small-table / single-task path when
consolidating partial aggregates.
- `arrow-select coalesce.rs:462` on the repartition path when
`CoalescePartitionsExec` feeds the empty-column buffer.
The `data-fabric` project landed a TableProvider-side fix
(`normalize_projection()`, commit `6e7b30a3`) that converts the empty
projection vector into `Some(vec![0])` before the plan ever reaches
dist, sidestepping both panics. That fix is purely upstream of dist —
the dist side still needs its own regression guard so that any future
dist change that re-introduces a panic on this exact shape is caught
here regardless of upstream TableProvider hygiene.
What this PR adds
-----------------
- `ZeroColTable`: a `TableProvider` that intentionally does NOT
normalize empty projections. Two registered variants:
* `zero_col_small` — 1 partition / 2 rows, drives the small-table
path.
* `zero_col_large` — 4 partitions × 2 rows, drives the
repartition / coalesce path (also exercised through a
`Partitioned` hash join to force `RepartitionExec`).
- `tests/zero_col_regression.slt` — exercises both via `COUNT(*)`,
`SELECT *` sanity, an empty-result-set query, the partitioned join,
and a cross-table scalar subquery.
- A separate `#[tokio::test] #[ignore]` runner
(`sqllogictest_zero_col_regression`) so the default `cargo test` run
does not exercise this and main CI stays green until dist is proven
stable against the shape. Run on demand with
`cargo test --test sqllogictest -- --ignored sqllogictest_zero_col_regression`.
This is intentionally a draft PR / regression-repro: if dist already
handles the shape stably, the test passes and we can flip the gate
later; if dist panics, the failure stack is reproducible here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Status
Draft / regression repro — not asking for merge yet.
This PR adds a reproducer / regression-guard fixture for the 0-column
RecordBatchboundary that paniced data-fabric's dist execution path(data-fabric task #7/#10). The fix in data-fabric was on the
TableProviderside (normalize_projection(), commit6e7b30a3);this PR adds a regression guard on the dist side so that future dist
changes can't silently regress on this shape regardless of upstream
TableProviderhygiene.The new test is gated
#[ignore]so defaultcargo testskips it andmain CI stays green. If dist already handles the 0-col batch shape
stably, we can later drop the gate and treat this as a normal PR; if it
panics, the failure stack is reproducible here as a regression artifact.
What this PR adds
ZeroColTable/ZeroColExecinintegration-tests/src/data.rs.TableProviderthat intentionally does NOT normalizeprojection = Some(empty)toSome(vec![0]).zero_col_small— 1 partition / 2 rows. Drives the small-table /single-task code path (historically:
dist/src/util.rs:113assert_eq!on column counts when consolidating partialaggregates).
zero_col_large— 4 partitions × 2 rows = 8 rows total. Drivesthe repartition / coalesce path (historically:
arrow-select coalesce.rs:462assert_eq!when feeding 0-column batchesthrough
RepartitionExec/CoalescePartitionsExec).integration-tests/tests/zero_col_regression.slt— exercisesboth fixtures via:
COUNT(*)on each table (the lowering that emitsprojection = Some(vec![])).SELECT *sanity check (non-empty projection through the sameprovider).
COUNT(*) WHEREalways-false (empty result through emptyprojection).
COUNT(*)over aPartitioned-mode hash join onzero_col_largeto force a
RepartitionExecabove the 0-col scan stream.adjacent MemTable batches.
A separate gated runner
sqllogictest_zero_col_regressioninintegration-tests/tests/sqllogictest.rs, marked#[ignore]anddocumented with run instructions. The main
sqllogictesttest isunchanged.
Repro instructions
Local validation status — needs a
docker compose-capable environmentCode is verified clean, but the test has not yet been observed
end-to-end from the machine that authored this PR.
What was verified locally:
cargo check -p datafusion-dist-integration-tests --tests— clean.cargo clippy --workspace --tests -- -D warnings— clean.cargo fmt --check— clean.cargo test -p datafusion-dist-integration-tests --test sqllogictest(default suite, without
--ignored) compiles and the unchangedsqllogictesttest still runs.What is blocked here:
The integration-tests harness uses Docker Compose v2 (
docker compose -p ... up/down) insideintegration-tests/src/docker.rsto bring upthe dist server / postgres / etc. The authoring environment is a WSL
distro where:
docker compose(v2 plugin) is not installed.docker-compose(v1) WSL integration is notenabled in this distro.
As a result,
setup_containers()immediately fails at thedown -v --remove-orphansprecondition with:This is an environment gap, not a dist-side panic. It would block
the existing
sqllogictesttest in exactly the same way; it is notspecific to the new fixture.
What needs to happen next
Someone with a
docker compose-capable environment (any of: Linux boxwith the
docker-compose-pluginpackage, macOS with Docker Desktop,WSL with Docker Desktop's WSL integration enabled, or the project's CI
runners) needs to run exactly:
cargo test -p datafusion-dist-integration-tests --test sqllogictest \ -- --ignored sqllogictest_zero_col_regressionand post the outcome on this PR. Based on that outcome we then decide:
#[ignore]in a follow-up commit, mark this PR as anormal regression-guard, ready for review.
open a separate dist-side implementation fix task; this PR stays as
the reproducer artifact for that fix.
Why this can't be done with a
MemTablefixtureDataFusion's built-in
MemoryExec/DataSourceExecpaths alreadynormalize
projection = Some(empty)viawith_row_countinternally —they never emit a true 0-column batch into downstream stages. The
five
count(*)cases that landed in #63 all run againstMemTableandtherefore do not exercise the dist 0-col path. A custom
TableProviderthat mirrors the data-fabric pre-fix shape is the onlyway to repro the boundary inside this repo.
Outcome interpretation
Flip the gate (remove
#[ignore]) in a follow-up commit and treatthis as a normal regression-guard PR.
likely
dist/src/util.rs:113orarrow-select coalesce.rs:462) isthe actionable artifact for a dist-side fix. The draft stays draft
until the dist-side fix lands.
Out of scope
data-fabricnormalize_projection()fix is upstream of distand is not modified or assumed here. Dist's regression guard must be
independent of upstream
TableProviderhygiene.check_insert_batch_fieldis_nullablemismatch(separate thread / separate task) — kept independent per review
guidance.
References
normalize_projection()commit6e7b30a3dist/src/util.rs:113,arrow-select coalesce.rs:462general slt scaffolding this builds on.