[feature](be) Support expression zonemap pruning#63389
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
12e23d5 to
645db81
Compare
|
/review |
|
/review |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 32343 ms |
TPC-DS: Total hot run time: 171489 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
5e0a42a to
149b65b
Compare
|
/review |
There was a problem hiding this comment.
I found one blocking issue in the new Parquet row-group expression zonemap pruning path. The overall feature direction is focused and has BE/unit/regression coverage, but this path can crash for complex columns before safely falling back.
Critical checkpoints:
- Goal/test: The PR implements expression-based zonemap pruning for internal and Parquet scans and adds targeted tests, but complex Parquet predicate coverage is missing.
- Scope/focus: The changed feature code is mostly localized to scan predicate/zoning evaluation; the issue is a missing guard in one new path.
- Concurrency/lifecycle: No new shared mutable lifecycle or locking issue found in the reviewed paths.
- Config: New mutable configs are present and read dynamically; no restart-only mismatch found.
- Compatibility/protocol: No new FE-BE protocol compatibility issue found.
- Parallel paths: The page-index Parquet path already guards complex physical columns; the row-group path needs the same fallback.
- Tests: Existing tests cover scalar pruning, but not complex Parquet columns with expression zonemap conjuncts.
- Data correctness/stability: The inline issue can lead to out-of-bounds access/crash instead of conservative fallback.
- Observability/performance: Counters were added; no blocking observability issue found.
User focus points: No additional user-provided review focus was specified.
|
/review |
There was a problem hiding this comment.
Automated review summary:
No additional blocking issues found in the current PR state. I also checked the existing inline thread about Parquet row-group expr-zonemap handling for complex parent columns and did not re-raise it.
Critical checkpoint conclusions:
- Goal/test coverage: The PR adds expression-based zonemap pruning for OLAP and Parquet paths, with BE unit tests and regression coverage for comparisons, IN, null predicates, starts_with, page/segment pruning, and the complex-parent Parquet fallback case.
- Scope/focus: The implementation is focused on expression zonemap evaluation and wiring into existing pruning paths.
- Concurrency/lifecycle: No new shared mutable state or lock-order risk found in the reviewed paths. Late runtime filters remain a missed optimization for already-initialized storage pruning, not a correctness issue.
- Configuration: New expr zonemap configs are mutable BE configs; reviewed call sites check them before applying pruning.
- Compatibility/storage correctness: No storage-format or FE-BE protocol compatibility issue found. Pruning remains conservative on unsupported expressions/types.
- Parallel paths: OLAP segment/page and Parquet row-group/page paths were reviewed; page pruning intentionally handles only single-slot conjunct groups, while unsupported/multi-slot cases fall back conservatively.
- Error handling/memory: Status-returning calls in the new paths are checked, and no untracked large allocation or ownership issue stood out.
- Performance/observability: New profile/stat counters cover filtered segments/pages and fallback reasons; no obvious hot-path anti-pattern requiring a blocking comment found.
User focus: No additional user-provided review focus was specified.
|
run buildall |
|
run external |
TPC-H: Total hot run time: 32367 ms |
TPC-DS: Total hot run time: 170946 ms |
|
run external |
| }; | ||
|
|
||
| inline void record_unsupported_zonemap_filter(const ZoneMapEvalContext& ctx) { | ||
| ++ctx.stats.unsupported_expr_count; |
There was a problem hiding this comment.
unsupported_expr_count 这个指导底统计的是什么东西啊?你这不应该是失效的这个zonemap数目吗?为什么搞一个这个名字还要额外搞个函数,拢共就一个加法
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Support expression-based ZoneMap pruning for internal and Parquet scan paths. Supported expressions include comparisons, IN/NOT IN, IS NULL/IS NOT NULL, and starts_with. The implementation adds segment/page/row-group pruning, conservative fallback semantics, and profile counters, with BE unit and SQL regression coverage. CHAR string range statistics are supported through the trimmed logical bounds produced by ZoneMap deserialization, so CHAR predicates can participate in expression ZoneMap pruning.
This also removes duplicated capability checks between `can_evaluate_zonemap_filter` and `evaluate_zonemap_filter`. Capability checks now focus on expression shape and NULL-literal cases, while storage/runtime datatype invariants are asserted with `DORIS_CHECK` in the evaluation path.
### Release note
Support expression-based ZoneMap pruning for scan predicates to skip data using ZoneMap statistics.
### Check List (For Author)
- Test: Unit Test / Regression test / Format check
- DORIS_HOME=<repo> ninja -C be/ut_build_ASAN Exprs Storage Exec Format doris_be_test
- DORIS_HOME=<repo> ninja -C be/ut_build_ASAN Exprs doris_be_test
- ./run-be-ut.sh --run --filter=SegmentFilterHelpersTest.*
- ./run-be-ut.sh --run --filter=ScanNormalizePredicateTest.*:RuntimeFilterConsumerHelperTest.*:VRuntimeFilterWrapperSamplingTest.*:ScannerLateArrivalRFTest.*
- ./run-be-ut.sh --run --filter=ParquetExprTest.test_expr_zonemap_*
- ./run-be-ut.sh --run --filter=ExprZonemapFilterTest.*
- ./run-be-ut.sh --run --filter=ParquetExprTest.test_expr_zonemap_page_filter_keeps_unsupported_results_and_counts_stats
- doris-local-regression.sh all -d inverted_index_p0 -s test_index_range_in_select
- ./run-regression-test.sh --conf <local-regression-conf> --run -d query_p0/expr_zonemap -s test_expr_zonemap_pruning
- ./run-regression-test.sh --conf <local-regression-conf> --run -d query_p1/expr_zonemap -s test_expr_zonemap_pruning_p1
- doris-local-regression --network 10.26.20.3/24 all -d query_p0/expr_zonemap -s test_expr_zonemap_pruning
- doris-local-regression --network 10.26.20.3/24 all -d query_p1/expr_zonemap -s test_expr_zonemap_pruning_p1
- build-support/clang-format.sh
- build-support/check-format.sh
- git diff --check
- Behavior changed: Yes. Supported scan predicates, including CHAR string predicates, can now prune data with expression ZoneMap evaluation.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Refine expression ZoneMap pruning checks by keeping expression shape and NULL-literal gating in can_evaluate_zonemap_filter, while relying on DORIS_CHECK for runtime datatype invariants. Rename the runtime fallback statistic from unsupported expressions to unusable ZoneMap evaluations so the profile reflects cases where an evaluable expression cannot use the current ZoneMap or context, such as missing statistics or unusable range metadata. Add focused unit-test coverage for the updated can/evaluate contract and the renamed statistic.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Format check
- build-support/clang-format.sh
- git diff --check
- ./run-be-ut.sh --run --filter=ExprZonemapFilterTest.ComparisonZonemapHandlesNullAndUnsupportedInputs:ExprZonemapFilterTest.StartsWithZonemapUsesPrefixRange
- ./run-be-ut.sh --run --filter=ExprZonemapFilterTest.ComparisonZonemapHandlesNullAndUnsupportedInputs:ExprZonemapFilterTest.StartsWithZonemapUsesPrefixRange:ExprZonemapFilterTest.CompoundPredicateEvaluatesChildrenForZonemap:ExprZonemapFilterTest.ExprContextZonemapEvaluationShortCircuitsOnNoMatch:ParquetExprTest.test_expr_zonemap_page_filter_prunes_pages_and_intersects_ranges:ParquetExprTest.test_expr_zonemap_page_filter_keeps_unsupported_results_and_counts_stats:ParquetExprTest.test_expr_zonemap_row_group_filter_skips_complex_parent_column:ParquetExprTest.test_expr_zonemap_row_group_filter_skips_type_mismatch_column
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: None Problem Summary: Remove unrelated newline-only changes from runtime_filter_consumer_helper files so the expression ZoneMap merge request only includes relevant code and tests. ### Release note None ### Check List (For Author) - Test: No need to test (newline-only cleanup) - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Expression zonemap evaluation used a `fetch_zone_map` helper only to forward to `ZoneMapEvalContext::zone_map` and update unsupported-evaluation statistics on a missing zonemap. This made the statistics update happen at the fetch site instead of the decision site that turns the missing zonemap into an unsupported filter result. Remove the redundant helper, call `ctx.zone_map` directly, and update the unsupported statistics when the evaluator actually returns `kUnsupported` for the missing zonemap.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- `./run-be-ut.sh --run --filter='ExprZonemapFilterTest.ComparisonZonemapHandlesNullAndUnsupportedInputs:ExprZonemapFilterTest.StartsWithZonemapUsesPrefixRange:ExprZonemapFilterTest.NullZonemapUsesNullFlagsOnly:ExprZonemapFilterTest.CompoundPredicateEvaluatesChildrenForZonemap:ExprZonemapFilterTest.ExprContextZonemapEvaluationShortCircuitsOnNoMatch:ParquetExprTest.test_expr_zonemap_page_filter_keeps_unsupported_results_and_counts_stats'`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Expression zonemap evaluation updated unusable-evaluation statistics inside the helper that fetches a compatible slot type. This was inconsistent with the rest of the evaluator flow where unsupported statistics are updated at the branch that returns `kUnsupported`. Move the missing-slot-type statistics update to the `kUnsupported` return sites for comparison, starts_with, and IN zonemap evaluation, and add unit coverage to ensure the count is incremented exactly once.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- `./run-be-ut.sh --run --filter='ExprZonemapFilterTest.MissingSlotTypeCountsUnsupportedZonemapEvalOnce:ExprZonemapFilterTest.ComparisonZonemapHandlesNullAndUnsupportedInputs:ExprZonemapFilterTest.StartsWithZonemapUsesPrefixRange:ParquetExprTest.test_expr_zonemap_page_filter_keeps_unsupported_results_and_counts_stats'`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: The starts_with zonemap evaluator used a private helper that only wrapped `Field::create_field<TYPE_STRING>`. Remove the redundant helper and construct the string fields directly at the two call sites to keep the evaluator simpler.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- `./run-be-ut.sh --run --filter='ExprZonemapFilterTest.StartsWithZonemapUsesPrefixRange:ExprZonemapFilterTest.CharZonemapUsesTrimmedLogicalBounds:ExprZonemapFilterTest.MissingSlotTypeCountsUnsupportedZonemapEvalOnce'`
- Behavior changed: No
- Does this need documentation: No
- Require exact non-string type match in data_types_compatible - Move NULL literal check before DORIS_CHECK in comparison can_evaluate - Remove unused zonemap materialize sync members from VDirectInPredicate - Add DORIS_CHECK for type compatibility in VInPredicate materialize path Issue Number: None
|
/review |
There was a problem hiding this comment.
I found one blocking issue: the new regression suites verify deterministic SQL results with Groovy assertions and omit generated .out files. That bypasses the standard regression golden-output mechanism for a kernel feature, so the tests should be converted before this is ready.
Critical checkpoint conclusions:
- Goal and coverage: the PR implements expression ZoneMap pruning for OLAP and Parquet and adds BE/unit coverage plus regression suites, but the regression coverage does not yet satisfy Doris regression-test requirements because deterministic SQL results are not captured through generated outputs.
- Scope: the current-head changes are cohesive around expression ZoneMap pruning; I did not find unrelated code changes requiring a separate review comment.
- Concurrency and lifecycle: no new locks or static lifecycle risks were found in the reviewed paths. The existing shallow-clone/rebind race is already covered by an existing inline thread, so I did not duplicate it.
- Config and compatibility: the new FE session variable is forwarded through optional
TQueryOptionsand defaults to enabled when absent; I found no storage-format or rolling-upgrade compatibility blocker. - Parallel paths: OLAP segment/page and Parquet row-group/page paths were reviewed. The already-known concerns in the existing threads were treated as prior context and not re-raised.
- Tests and results: BE and Parquet unit tests exist, but the new regression suites need
qt_sql/order_qtcases and generated.outfiles for deterministic results. - Observability: counters are present for filtered segments/pages and IN/unsupported evaluations; no additional observability blocker found.
- Transactions, persistence, data writes, and security: not applicable to this PR's changed behavior; no security review was requested.
- Performance: unsupported-expression gating is present in the hot page-zonemap paths; I found no additional concrete performance blocker beyond the existing review threads.
- User focus: no additional user-provided review focus was supplied.
| def matchedRows = sql """ | ||
| SELECT COUNT(*) FROM test_expr_zonemap_pruning WHERE starts_with(v, 'a') | ||
| """ | ||
| assertEquals(4096L, matchedRows[0][0] as long) |
There was a problem hiding this comment.
The deterministic query results in this new regression suite are checked with Groovy assertEquals, and the PR does not add a generated .out file under regression-test/data/.... Doris regression tests should record deterministic SQL results with qt_sql/order_qt (or explicit ORDER BY) and commit the generated output; direct assertions should be reserved for checks that cannot be represented in golden output, such as profile counters. Please convert these result checks to qt_... cases and add the generated .out file.
| SELECT '${notNullToken}', COUNT(*) FROM test_expr_zonemap_pruning_p1 | ||
| WHERE nullable_v IS NOT NULL OR starts_with(v, 'never_') | ||
| """ | ||
| assertEquals(2048L, notNullRows[0][1] as long) |
There was a problem hiding this comment.
Same issue in this P1 suite: deterministic SQL result checks use Groovy assertEquals, and the PR does not add a generated .out file. Please use qt_sql/order_qt for the expected result queries and commit the generated output, leaving only the profile-counter checks as Groovy assertions.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Page-level expression ZoneMap pruning uses reader-schema column ordinals to fetch page ZoneMaps. For layouts where storage expression slot ordinals need rebinding to the reader schema, VExprContext::clone() does not deep-copy the expression tree yet, so rebinding can mutate shared slot refs and make later page-level pruning read the wrong column's ZoneMap. This commit reuses the existing storage expression slot/schema layout guard for page-level expression ZoneMap pruning, conservatively skipping the optimization when the mapping is uncertain, and removes an unused include left by earlier cleanup.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Compile check / Format check
- git diff --check
- ninja -C be/ut_build_ASAN src/storage/CMakeFiles/Storage.dir/segment/segment.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment_iterator.cpp.o
- ./run-be-ut.sh --run --filter='ExprZonemapFilterTest.*'
- Behavior changed: No
- Does this need documentation: No
|
/review |
There was a problem hiding this comment.
Automated review completed.
I did not find any additional distinct issue beyond the already-open inline review threads, so I did not add duplicate inline comments. The existing threads about regression result files/assert-based regression checks and type/rebinding safety remain part of the review context.
Part 1.3 checkpoint conclusions:
- Goal/test: The PR implements expression zonemap pruning for OLAP segment/page paths and Parquet row-group/page paths, with conservative fallback for unsupported predicates. BE unit tests and regression suites cover the main comparison/IN/null/starts_with paths; existing review threads already cover the regression-test result-file issue.
- Scope: The change is focused on expression zonemap evaluation, scan integration, counters, the FE session variable, and targeted tests. I did not see unrelated behavior mixed into the PR.
- Concurrency/lifecycle: I did not find new shared mutable cross-thread state. Per-reader/per-segment contexts are built or cloned for scan execution; existing index/page-loading synchronization is reused. No new static initialization or ownership-cycle risk stood out.
- Config/compatibility: The new FE session variable is wired to optional thrift query options, and BE treats an absent field as enabled, which is compatible with mixed versions. No storage format or transaction persistence change is introduced.
- Parallel paths/conditions: OLAP and Parquet row-group/page paths are both handled. Guard conditions for schema rebinding, missing stats, complex columns, type mismatch, null-aware runtime filters, and synthetic dictionary predicates are conservative and documented.
- Testing/results: I did not run the full test suite in this review pass;
git diff --checkis clean. No generated regression.outfiles are present, which is already raised in existing inline comments. - Observability: New filtered-row/group/page and unusable-evaluation counters are exposed in OLAP and Parquet profiles; I did not see noisy new logging.
- Transactions/writes: The PR affects read-time pruning only and does not modify transaction, delete-bitmap, EditLog, or data-write paths.
- FE-BE variable passing: The session variable is added to
SessionVariable.toThrift()and the thrift query option; no missing BE consumer path was found in this pass. - Performance: The implementation avoids evaluating unsupported expressions where possible and keeps point-check thresholds for large IN lists. No additional performance issue distinct from existing review threads was found.
User focus: review_focus.txt contains no additional user-provided focus points; no focus-specific issue was found.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The expression ZoneMap scan profile counters are easy to confuse because some counters describe pruning results while others describe evaluation fallback paths. Add comments near the Olap scan counter members to document the meaning of filtered segment/page, unusable evaluation, IN point-check, and IN range-only counters.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Regression test / Format check
- build-support/clang-format.sh
- git diff --check
- ./run-be-ut.sh --run --filter='ExprZonemapFilterTest.*:ParquetExprTest.test_expr_zonemap_*'
- doris-local-regression --network 10.26.20.3/24 run -d query_p0/expr_zonemap -s test_expr_zonemap_pruning
- doris-local-regression --network 10.26.20.3/24 run -d query_p1/expr_zonemap -s test_expr_zonemap_pruning_p1
- Behavior changed: No
- Does this need documentation: No
|
/review |
There was a problem hiding this comment.
No additional non-duplicate blocking issues found in the current cumulative GitHub PR diff. I reviewed the existing inline threads first and did not re-submit concerns that were already raised there.
Critical checkpoint conclusions:
- Goal/test proof: The PR adds BE unit coverage and regression suites for expression ZoneMap pruning. The regression golden-output concern is already covered by an existing thread, so I did not duplicate it.
- Scope/focus: The actual GitHub PR diff is focused on expression ZoneMap pruning, scan/profile counters, the FE session variable, thrift plumbing, and related tests.
- Correctness/null/type semantics: The evaluator returns
NoMatchonly when min/max or null flags prove that a filter cannot match, and it keeps unsupported cases as may-match. Existing threads already cover the remaining type-compatibility/assertion concerns. - Storage and format paths: OLAP segment/page pruning and Parquet row-group/page pruning were checked for conservative fallback behavior. Existing threads already cover page-index overhead/counter and schema-layout concerns.
- Concurrency/lifecycle: I did not find a new concurrency or object-lifetime issue in the current diff.
- Config/FE-BE compatibility:
enable_expr_zonemap_filteris optional in thrift, defaults to enabled, and is guarded in BE for unset query options. - Performance/observability: New counters are plumbed through OLAP and Parquet profiles; I did not find a new distinct performance blocker beyond the already-known page-index discussion.
- Tests/results: New tests exercise OLAP, Parquet, runtime-filter, and expression semantics. Existing review context already flags the regression-result style issue.
User focus: review_focus.txt contains no additional user-provided focus points.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: FunctionStringStartsWith adds starts_with-specific expression zonemap pruning, but the unit test did not explicitly cover the new function override. This adds coverage for the factory-registered starts_with function, supported and non-prunable literal shapes, prefix lower and upper range pruning, NULL-only zonemaps, and the no-next-prefix max-byte edge.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Format check
- build-support/clang-format.sh
- git diff --check
- ./run-be-ut.sh --run --filter='ExprZonemapFilterTest.FunctionStringStartsWithZonemapUsesPrefixRange:function_string_test.function_starts_with_test'
- ./run-be-ut.sh --run --filter='ExprZonemapFilterTest.*'
- Behavior changed: No
- Does this need documentation: No
|
/review |
There was a problem hiding this comment.
Automated review completed against the net PR diff (43 files) and the existing review threads.
Requesting changes. I did not add new inline comments because the blocking points I still see are already covered by existing threads:
- The new P0/P1 regression suites still check deterministic SQL results with Groovy
assertEqualsand do not add generated.outfiles. Doris regression-test standards require deterministic SQL results to be represented withqt_sql/order_qtplus generated output; keep Groovy assertions for profile/counter checks only. - The comparison zonemap capability path still advertises support and then relies on a datatype-compatibility
DORIS_CHECKin the evaluation path. If an incompatible BE-materialized slot/literal comparison reaches this path, the scanner aborts instead of conservatively returning may-match/unsupported; this remains covered by the existing comparison thread. - The earlier OLAP/Parquet concerns around unsupported page-index reads, complex Parquet columns, and page-level row-group counters appear addressed in the current net diff.
Critical checkpoint conclusions:
- Goal/test: The expr-zonemap pruning goal is clear and has BE unit tests plus regression suites, but the regression outputs are not yet in Doris golden-file form.
- Scope/focus: The net PR is focused on expr-zonemap pruning; the cloud/broker changes visible in a raw base-tip diff are not part of the GitHub PR diff and were not reviewed as PR changes.
- Concurrency/lifecycle: No new thread/lifetime issue found beyond the existing expression clone/rebinding discussions; current OLAP segment/page paths now skip reader-schema mismatch layouts.
- Config/protocol:
enable_expr_zonemap_filteris added as an FE session variable and forwarded throughTQueryOptions; BE defaults to enabled when the field is absent, preserving mixed-version behavior. - Compatibility/storage: No storage-format change found; reviewed OLAP and Parquet paths fall back conservatively for missing or unusable stats.
- Parallel paths: OLAP segment/page and Parquet row-group/page paths are both wired.
- Observability/performance: Counters are added; the existing page-index hot-path concerns appear addressed in the current diff.
- Transactions/data writes/security: Not applicable to the net PR.
- User focus:
review_focus.txtcontains no additional user-provided focus points.
### What problem does this PR solve? Issue Number: None Related PR: apache#63389 Problem Summary: This change keeps expression zonemap pruning helper logic focused and reusable. It centralizes single-slot zonemap candidate detection, scopes HybridSet literal materialization to VDirectInPredicate, removes redundant defensive handling after can_evaluate_zonemap_filter has already validated the expression shape, asserts the segment zonemap invariant at the ColumnReader accessor boundary, and drops an unrelated runtime filter unit-test change from the PR diff. ### Release note None ### Check List (For Author) - Test: Unit Test; Code style - `./build-support/clang-format.sh be/src/exprs/expr_zonemap_filter.cpp be/src/exprs/expr_zonemap_filter.h be/src/exprs/vdirect_in_predicate.h be/src/exprs/vexpr.cpp be/src/exprs/vexpr.h be/src/exprs/vin_predicate.cpp be/src/storage/segment/column_reader.cpp be/src/storage/segment/segment.cpp be/src/storage/segment/segment_iterator.cpp be/test/exec/runtime_filter/runtime_filter_consumer_helper_test.cpp` - `git diff --check` - `git diff --cached --check` - `./run-be-ut.sh --run --filter='ExprZonemapFilterTest.*:ParquetExprTest.test_expr_zonemap_*'` - Behavior changed: No - Does this need documentation: No
|
/review |
There was a problem hiding this comment.
I reviewed the current final diff against the existing inline review threads and did not find additional distinct inline issues to add without duplicating already-known review context.
Requesting changes because the current PR still appears to contain known blocking items from existing threads:
be/src/exprs/function/functions_comparison.h:comparison_zonemap_detail::can_evaluate()still usesDORIS_CHECKfor slot/literal type compatibility. A pushed-down incompatible materialized literal can abort instead of conservatively disabling expr-zonemap evaluation.regression-test/suites/query_p0/expr_zonemap/test_expr_zonemap_pruning.groovyandregression-test/suites/query_p1/expr_zonemap/test_expr_zonemap_pruning_p1.groovy: the new regression suites still use GroovyassertEqualsfor deterministic SQL results and do not add generated.outfiles. Doris regression cases should useqt_sql/order_qtwith committed output for deterministic results.- Please also resolve or explicitly confirm the existing Parquet unsupported-expression page-index thread. The current implementation avoids the column-index read in some cases, but still parses offset indexes before computing
_has_expr_zonemap_page_filter.
Checkpoint conclusions:
- Goal and tests: expression zonemap pruning is implemented for BE expression, OLAP, and Parquet paths with unit/regression coverage, but the regression proof is incomplete while golden outputs are missing.
- Scope: the final GitHub PR file list is focused on expression-zonemap pruning and its tests; I did not find unrelated final-diff files.
- Concurrency and lifecycle: OLAP segment/page expr-zonemap now skips layouts needing rebind, addressing the earlier shared shallow-clone false-prune path; no new locks or threads were introduced.
- Config and protocol: the FE session variable and BE
TQueryOptionsplumbing are present with default-on behavior; I did not find a storage format change. - Parallel paths: OLAP segment/page and Parquet row-group/page pruning paths are covered; ORC only gets lazy-context propagation.
- Conditional checks: existing type-mismatch handling in comparison zonemap evaluation remains blocking because it asserts instead of returning unsupported.
- Test results: I did not run the full Doris build or regression suite during review.
- Observability: profile counters were added for filtered segments/pages, unusable evaluations, and IN-check modes.
- Performance: supported-expression gating is present, but the existing Parquet page-index concern should be resolved or confirmed.
- User focus: no additional user-provided review focus was supplied in
.code-review.Ms1ioT/review_focus.txt.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Support expression-based ZoneMap pruning for internal and Parquet scan paths. Supported expressions include comparisons, IN/NOT IN, IS NULL/IS NOT NULL, and starts_with. The implementation adds segment/page/row-group pruning, conservative fallback semantics, and profile counters, with BE unit and SQL regression coverage. CHAR string range statistics are supported through the trimmed logical bounds produced by ZoneMap deserialization, so CHAR predicates can participate in expression ZoneMap pruning.
This follow-up also removes duplicated capability checks between
can_evaluate_zonemap_filterandevaluate_zonemap_filter. Capability checks now focus on expression shape and NULL-literal cases, while storage/runtime datatype invariants are asserted withDORIS_CHECKin the evaluation path.Release note
Support expression-based ZoneMap pruning for scan predicates to skip data using ZoneMap statistics.
Check List (For Author)