Skip to content

[fix](variant) Fix variant flat-leaf root read plan#63086

Merged
eldenmoon merged 1 commit intoapache:masterfrom
eldenmoon:fix-root-doc
May 9, 2026
Merged

[fix](variant) Fix variant flat-leaf root read plan#63086
eldenmoon merged 1 commit intoapache:masterfrom
eldenmoon:fix-root-doc

Conversation

@eldenmoon
Copy link
Copy Markdown
Member

@eldenmoon eldenmoon commented May 8, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Flat-leaf compaction/checksum mode should keep root variant reads on the root-flat path and should not fall back to doc-value root reconstruction. Avoid probing an empty leaf path, keep explicit doc bucket columns on DOC_COMPACT, and update the doc-bucket unit test to assert that root uses VariantRootColumnIterator in flat-leaf mode.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter='VariantColumnWriterReaderTest.test_read_doc_compact_from_doc_value_bucket': 1 test passed
    • ./run-be-ut.sh --run --filter='*Variant*': 172 tests from 16 test suites, 171 passed, 1 skipped
    • build-support/clang-format.sh be/src/storage/segment/variant/variant_column_reader.cpp be/test/storage/segment/variant_column_writer_reader_test.cpp
    • git diff --check
  • Behavior changed: No
  • Does this need documentation: No

introduced by #62848

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Flat-leaf compaction/checksum mode should keep root variant reads on the root-flat path and should not fall back to doc-value root reconstruction. Avoid probing an empty leaf path, keep explicit doc bucket columns on DOC_COMPACT, and update the doc-bucket unit test to assert that root uses VariantRootColumnIterator in flat-leaf mode.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - `./run-be-ut.sh --run --filter='VariantColumnWriterReaderTest.test_read_doc_compact_from_doc_value_bucket'`: 1 test passed
    - `./run-be-ut.sh --run --filter='*Variant*'`: 172 tests from 16 test suites, 171 passed, 1 skipped
    - `build-support/clang-format.sh be/src/storage/segment/variant/variant_column_reader.cpp be/test/storage/segment/variant_column_writer_reader_test.cpp`
    - `git diff --check`
- Behavior changed: No
- Does this need documentation: No
Copilot AI review requested due to automatic review settings May 8, 2026 12:02
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

@eldenmoon
Copy link
Copy Markdown
Member Author

/review

@eldenmoon eldenmoon changed the title [fix](be) Fix variant flat-leaf root read plan [fix](variant) Fix variant flat-leaf root read plan May 8, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the BE Variant flat-leaf (compaction/checksum) read planner so root Variant reads stay on the root-flat path (VariantRootColumnIterator) instead of falling back to doc-value based reconstruction, and ensures explicit doc-value bucket columns keep using the DOC_COMPACT read path.

Changes:

  • Adjust flat-leaf read planning to route root-path reads to ReadKind::ROOT_FLAT (VariantRootColumnIterator) and avoid doc-value hierarchical fallbacks in this mode.
  • Ensure doc-value bucket columns are planned as ReadKind::DOC_COMPACT in flat-leaf compaction.
  • Update the doc-bucket unit test to assert the root iterator type in flat-leaf mode.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
be/src/storage/segment/variant/variant_column_reader.cpp Updates flat-leaf read-plan selection to keep root reads on ROOT_FLAT and avoid doc-value hierarchical planning in this mode.
be/test/storage/segment/variant_column_writer_reader_test.cpp Updates unit test expectations to assert VariantRootColumnIterator is used for root reads in flat-leaf compaction mode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread be/test/storage/segment/variant_column_writer_reader_test.cpp
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The change keeps flat-leaf compaction/checksum root variant reads on ROOT_FLAT while preserving explicit doc-bucket reads through DOC_COMPACT; the updated unit test covers the iterator selection for both root and doc buckets.
  • Scope: The modification is small and focused on flat-leaf read planning plus the corresponding test expectation.
  • Concurrency/lifecycle: No new shared state, lock ordering, or lifecycle ownership changes were introduced. The existing _subcolumns_meta_mutex coverage remains unchanged.
  • Configuration/compatibility: No new configuration, storage format, Thrift, or FE/BE protocol compatibility changes were introduced.
  • Parallel paths: Query-mode hierarchical/doc reconstruction remains unchanged; only compaction/checksum flat-leaf planning is affected.
  • Data correctness: The changed root branch now avoids doc-value root reconstruction in flat-leaf mode, while doc bucket paths are still handled before sparse/default fallback.
  • Tests: Existing PR notes report the targeted and variant BE unit tests passed. I attempted to rerun ./run-be-ut.sh --run --filter='VariantColumnWriterReaderTest.test_read_doc_compact_from_doc_value_bucket', but this runner is missing thirdparty/installed/bin/protoc, so local verification could not complete here.
  • Observability/performance: No new hot-path allocations beyond the existing plan construction; no additional observability appears necessary for this narrow planner fix.
  • User focus: No additional user-provided review focus was present.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29537 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4ef3cc9e43d910c47b2da282f767a0e8bf04386b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17806	3893	3934	3893
q2	q3	10721	887	603	603
q4	4661	462	342	342
q5	7464	1319	1159	1159
q6	203	176	135	135
q7	937	940	757	757
q8	9592	1388	1274	1274
q9	6407	5383	5344	5344
q10	6354	2061	1782	1782
q11	470	260	259	259
q12	692	405	294	294
q13	18216	3363	2766	2766
q14	298	284	259	259
q15	q16	900	862	790	790
q17	1122	973	784	784
q18	6496	5691	5571	5571
q19	1424	1302	1079	1079
q20	512	396	262	262
q21	4932	2289	1885	1885
q22	418	350	299	299
Total cold run time: 99625 ms
Total hot run time: 29537 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4199	4084	4102	4084
q2	q3	4631	4770	4146	4146
q4	2093	2160	1390	1390
q5	4967	4962	5162	4962
q6	188	164	130	130
q7	2013	2169	1685	1685
q8	3423	3164	3116	3116
q9	8510	8463	8450	8450
q10	4478	4513	4223	4223
q11	636	428	402	402
q12	704	743	528	528
q13	3246	3631	2898	2898
q14	292	301	268	268
q15	q16	766	772	735	735
q17	1354	1289	1290	1289
q18	8250	7060	6984	6984
q19	1154	1152	1130	1130
q20	2206	2274	1951	1951
q21	6081	5414	4845	4845
q22	575	527	426	426
Total cold run time: 59766 ms
Total hot run time: 53642 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170753 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4ef3cc9e43d910c47b2da282f767a0e8bf04386b, data reload: false

query5	4380	666	520	520
query6	327	218	205	205
query7	4230	552	311	311
query8	324	235	216	216
query9	8851	4024	4038	4024
query10	510	360	308	308
query11	6133	2373	2224	2224
query12	195	137	129	129
query13	1292	615	455	455
query14	6944	5382	5087	5087
query14_1	4436	4354	4344	4344
query15	247	204	178	178
query16	1011	466	412	412
query17	1384	800	645	645
query18	2802	493	366	366
query19	330	216	182	182
query20	142	135	133	133
query21	216	142	122	122
query22	13579	13997	14475	13997
query23	17254	16495	16161	16161
query23_1	16276	16211	16305	16211
query24	8150	1829	1385	1385
query24_1	1376	1346	1419	1346
query25	562	486	416	416
query26	1290	308	182	182
query27	2669	572	337	337
query28	4302	1921	1969	1921
query29	1001	622	500	500
query30	297	234	196	196
query31	1104	1078	939	939
query32	84	69	69	69
query33	531	337	283	283
query34	1141	1116	650	650
query35	787	797	657	657
query36	1337	1350	1104	1104
query37	151	96	83	83
query38	3174	3130	3036	3036
query39	927	932	883	883
query39_1	883	865	889	865
query40	228	162	140	140
query41	63	65	61	61
query42	112	110	108	108
query43	367	339	291	291
query44	
query45	208	200	196	196
query46	1083	1202	721	721
query47	2290	2261	2144	2144
query48	392	428	292	292
query49	639	531	433	433
query50	710	281	212	212
query51	4269	4271	4231	4231
query52	107	102	93	93
query53	254	278	208	208
query54	318	267	265	265
query55	91	89	83	83
query56	300	337	325	325
query57	1436	1403	1283	1283
query58	312	276	276	276
query59	1526	1605	1372	1372
query60	344	336	325	325
query61	162	160	164	160
query62	673	622	577	577
query63	247	197	208	197
query64	2315	814	684	684
query65	
query66	1668	512	395	395
query67	30109	29954	29831	29831
query68	
query69	464	342	309	309
query70	1026	962	958	958
query71	319	279	269	269
query72	3010	2780	2422	2422
query73	866	807	437	437
query74	5110	4914	4722	4722
query75	2815	2665	2336	2336
query76	2285	1119	767	767
query77	407	424	352	352
query78	12951	12927	12421	12421
query79	1481	950	757	757
query80	1430	590	489	489
query81	496	293	238	238
query82	1323	163	125	125
query83	352	280	259	259
query84	262	144	114	114
query85	1072	538	458	458
query86	442	354	316	316
query87	3397	3401	3232	3232
query88	3575	2677	2654	2654
query89	444	376	338	338
query90	1946	185	181	181
query91	174	168	139	139
query92	78	75	71	71
query93	957	966	563	563
query94	721	337	306	306
query95	668	472	348	348
query96	987	733	335	335
query97	2709	2701	2618	2618
query98	241	233	228	228
query99	1106	1129	983	983
Total cold run time: 256897 ms
Total hot run time: 170753 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (10/10) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.74% (27758/37644)
Line Coverage 57.58% (300391/521660)
Region Coverage 54.66% (249639/456721)
Branch Coverage 56.29% (108179/192195)

Copy link
Copy Markdown
Contributor

@csun5285 csun5285 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

PR approved by anyone and no changes requested.

@eldenmoon
Copy link
Copy Markdown
Member Author

@codex

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 9, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

PR approved by at least one committer and no changes requested.

@eldenmoon eldenmoon merged commit 27913c9 into apache:master May 9, 2026
40 of 41 checks passed
github-actions Bot pushed a commit that referenced this pull request May 9, 2026
Problem Summary: Flat-leaf compaction/checksum mode should keep root
variant reads on the root-flat path and should not fall back to
doc-value root reconstruction. Avoid probing an empty leaf path, keep
explicit doc bucket columns on DOC_COMPACT, and update the doc-bucket
unit test to assert that root uses VariantRootColumnIterator in
flat-leaf mode.
yiguolei pushed a commit that referenced this pull request May 10, 2026
…#63104 (#63098)

Cherry-picked from #63086 #63104

---------

Co-authored-by: lihangyu <lihangyu@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.1-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants