
[Improvement](scan) support push down limit to segment iterator #62222

Open
BiteTheDDDDt wants to merge 17 commits into apache:master from BiteTheDDDDt:dev_0408_4

Conversation

Contributor

@BiteTheDDDDt BiteTheDDDDt commented Apr 8, 2026

This pull request introduces significant improvements to scan operator logic, particularly enhancing the correctness and efficiency of LIMIT and predicate pushdown in OLAP scans. The main changes include a robust mechanism for sharing LIMIT counters among scanners, more precise control over predicate pushdown, and stricter validation for residual predicates. These changes help ensure that queries with LIMIT and TopN semantics return accurate results and avoid incorrect or inefficient execution paths.

Improvements to LIMIT handling and scanner coordination:

  • Introduced a shared atomic counter (_shared_scan_limit) for LIMIT queries, allowing all scanners to collectively respect the global row limit. Scanners now check and update this counter to stop early when the LIMIT is reached, preventing over-scanning and improving efficiency. TopN scans bypass this mechanism as required.

  • Adjusted scanner and scan local state interfaces to support the new shared LIMIT mechanism, including new virtual methods and atomic counter management.
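A minimal sketch of this coordination, assuming the convention documented elsewhere in this PR that -1 means "no limit" and 0 means "exhausted"; the helper names (limit_exhausted, consume_rows) are illustrative, not the PR's actual API:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Illustrative sketch only; not the actual Doris implementation.
// Convention assumed: -1 == no limit (sentinel), 0 == budget exhausted.

// A scanner should stop early only when a real budget exists and is spent.
inline bool limit_exhausted(const std::atomic<int64_t>* shared_limit) {
    if (shared_limit == nullptr) return false;
    // Negative values are the "unlimited" sentinel, so compare against 0 exactly.
    return shared_limit->load(std::memory_order_acquire) == 0;
}

// Subtract `rows` from the budget, clamping at 0 so concurrent scanners
// cannot drive a limited counter into the negative sentinel range.
inline void consume_rows(std::atomic<int64_t>* shared_limit, int64_t rows) {
    if (shared_limit == nullptr || rows <= 0) return;
    int64_t cur = shared_limit->load(std::memory_order_acquire);
    while (cur > 0) {  // no-op when unlimited (negative) or already exhausted
        const int64_t next = std::max<int64_t>(0, cur - rows);
        if (shared_limit->compare_exchange_weak(cur, next,
                                                std::memory_order_acq_rel,
                                                std::memory_order_acquire)) {
            return;
        }
    }
}
```

The CAS loop trades a cheap fetch_sub for a guarantee that the counter never crosses into the sentinel range, which keeps later "exhausted" checks unambiguous.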

Predicate pushdown and validation enhancements:

  • Refactored _should_push_down_common_expr to require the expression as an argument and added logic to ensure only eligible expressions are pushed down, considering storage merge requirements and key columns.

  • Added a validation step (validate_residual_scan_conjuncts) to prevent unsupported residual predicates (like SEARCH or disabled MATCH) and to ensure correctness when using COUNT_ON_INDEX pushdown.

OLAP scan and segment limit logic improvements:

  • Improved the logic for pushing down segment limits and TopN optimizations, ensuring that such optimizations are only enabled when all necessary conditions are met (e.g., no residual predicates, no runtime filters, storage does not require merging). The code now asserts correct states and disables shared LIMITs for TopN scans as appropriate.
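As a rough sketch of the gating described above (the struct and function names here are hypothetical, not the actual Doris internals), the optimization is enabled only when every precondition holds, and the shared LIMIT is additionally disabled for TopN scans:

```cpp
// Hypothetical condition bundle; field names are illustrative.
struct ScanState {
    bool has_residual_predicates = false;
    bool has_runtime_filters = false;
    bool storage_requires_merge = false;
    bool is_topn = false;
};

// Segment-limit pushdown is safe only when no post-scan filtering or
// merging can discard rows after the storage layer stops reading.
inline bool can_push_down_segment_limit(const ScanState& s) {
    return !s.has_residual_predicates && !s.has_runtime_filters &&
           !s.storage_requires_merge;
}

// The shared global budget must stay off for TopN: each scanner has to
// produce its full local top-N candidate set for the final merge.
inline bool can_use_shared_limit(const ScanState& s) {
    return can_push_down_segment_limit(s) && !s.is_topn;
}
```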

Code cleanup and maintenance:

  • Removed unused or obsolete logic related to filter block conjuncts and the old limit quota acquisition method.

Dependency and include updates:

  • Added missing includes for expression types to support new logic.

Benchmark script and results (timings are before -> after this PR):
CREATE DATABASE IF NOT EXISTS lm_bench;
USE lm_bench;

DROP TABLE IF EXISTS lm_wide_fact_10m;
CREATE TABLE lm_wide_fact_10m (
    id BIGINT NOT NULL,
    filter_key INT NOT NULL,
    sort_key BIGINT NOT NULL,
    metric BIGINT NOT NULL,
    payload_a VARCHAR(1024) NOT NULL,
    payload_b VARCHAR(1024) NOT NULL,
    long_payload_a VARCHAR(4096) NOT NULL,
    long_payload_b VARCHAR(4096) NOT NULL
)
ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 32
PROPERTIES (
    "replication_num" = "1"
);

INSERT INTO lm_wide_fact_10m
SELECT
    number AS id,
    CAST(number % 1000000 AS INT) AS filter_key,
    (number * 2654435761) % 1000000007 AS sort_key,
    (number * 1103515245 + 12345) % 2147483647 AS metric,
    REPEAT(CONCAT('payload_a_', CAST(number AS STRING), '_',
        CAST((number * 1315423911) % 1000000007 AS STRING), '_'), 16) AS payload_a,
    REPEAT(CONCAT('payload_b_', CAST(number AS STRING), '_',
        CAST((number * 2654435761) % 1000000009 AS STRING), '_'), 16) AS payload_b,
    REPEAT(CONCAT('long_payload_a_', CAST(number AS STRING), '_',
        CAST((number * 48271) % 2147483647 AS STRING), '_'), 64) AS long_payload_a,
    REPEAT(CONCAT('long_payload_b_', CAST(number AS STRING), '_',
        CAST((number * 69621) % 2147483629 AS STRING), '_'), 64) AS long_payload_b
FROM numbers("number" = "10000000");

ANALYZE TABLE lm_wide_fact_10m WITH SYNC;


-- ============================================================
-- Case 1: TopN next + LIMIT + high-filter-rate pushed-down predicate
-- Expected observation:
--   filter_key is a narrow pushed-down predicate and keeps about 1% rows.
--   TopN works on narrow metric/sort_key, while long strings are final output.
-- ============================================================

SELECT
    id,
    filter_key,
    metric,
    sort_key,
    payload_a,
    payload_b,
    long_payload_a,
    long_payload_b
FROM lm_wide_fact_10m
WHERE filter_key < 10000
ORDER BY metric DESC, sort_key ASC
LIMIT 10;
-- Result: 0.12 sec -> 0.10 sec

-- ============================================================
-- Case 2: Normal scan + LIMIT
-- Expected observation:
--   No ORDER BY and no predicate. This isolates ordinary scan limit behavior
--   while still returning long string columns.
-- ============================================================

SELECT
    id,
    filter_key,
    metric,
    sort_key,
    payload_a,
    payload_b,
    long_payload_a,
    long_payload_b
FROM lm_wide_fact_10m
LIMIT 10;
-- Result: 0.06 sec -> 0.01 sec

-- ============================================================
-- Case 3: Normal scan + LIMIT + high-filter-rate pushed-down predicate
-- Expected observation:
--   No ORDER BY. This isolates scan predicate + limit behavior with long
--   string result columns and a predicate that filters about 99% of rows.
-- ============================================================

SELECT
    id,
    filter_key,
    metric,
    sort_key,
    payload_a,
    payload_b,
    long_payload_a,
    long_payload_b
FROM lm_wide_fact_10m
WHERE filter_key < 10000
LIMIT 10;
-- Result: 0.07 sec -> 0.02 sec

Copilot AI review requested due to automatic review settings April 8, 2026 08:35
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a unified LIMIT pushdown mechanism that lets the storage layer (SegmentIterator) dynamically cap per-batch reads using both a local per-segment limit and a shared cross-scanner remaining-row budget.

Changes:

  • Added a unified StorageReadOptions::ReadLimit (local + shared remaining) and wired it through TabletReader/RowsetReader to SegmentIterator.
  • Updated scan execution to coordinate a shared scan limit across scanners (early-exit + decrement on produced blocks) and removed scheduler-side quota truncation.
  • Added a new regression test suite and a BE unit test for SegmentIterator limit optimization eligibility.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
regression-test/suites/query_p0/limit/test_unified_limit_pushdown.groovy New regression coverage for unified limit pushdown scenarios (inverted index + multi-bucket + offset/order-by).
be/test/storage/segment/segment_iterator_limit_opt_test.cpp New unit test targeting _can_opt_limit_reads() logic.
be/src/storage/tablet/tablet_reader.h / .cpp Propagates shared_scan_limit into storage reader context.
be/src/storage/rowset/rowset_reader_context.h Carries shared_scan_limit down to rowset readers.
be/src/storage/rowset/beta_rowset_reader.cpp Maps topn/general limits + shared remaining into StorageReadOptions.read_limit.
be/src/storage/iterators.h Replaces topn_limit with unified read_limit in StorageReadOptions.
be/src/storage/segment/segment_iterator.h / .cpp Renames/extends limit optimization check and applies unified limit cap/exhaustion behavior.
be/src/storage/iterator/vcollect_iterator.cpp Propagates general limit into rowset reader limit path for SegmentIterator I/O reduction.
be/src/exec/operator/scan_operator.h / .cpp Exposes shared_scan_limit_ptr() to scanners.
be/src/exec/scan/scanner.cpp Early-exits and decrements shared limit to coordinate scanners.
be/src/exec/scan/scanner_scheduler.cpp Removes scheduler-side shared limit quota truncate/discard logic.
be/src/exec/scan/scanner_context.cpp Removes “finish when shared limit exhausted” logic and pending-task short-circuit.
be/src/exec/scan/olap_scanner.cpp Propagates shared scan limit pointer to TabletReader params (no-merge path).


Comment thread be/src/exec/scan/scanner.cpp Outdated
Comment on lines +127 to +134
    // Early exit when the global shared scan limit is exhausted.
    // This avoids unnecessary I/O when other scanners have already
    // collected enough rows for the SQL LIMIT.
    auto* shared_limit = _local_state->shared_scan_limit_ptr();
    if (shared_limit && shared_limit->load(std::memory_order_acquire) <= 0) {
        *eof = true;
        return Status::OK();
    }

Copilot AI Apr 8, 2026


shared_scan_limit uses -1 as the sentinel for “no limit” (see ScanOperatorX::_shared_scan_limit docs), but this early-exit treats any value <= 0 as exhausted. If _shared_scan_limit is -1, scanners will immediately return EOF and queries without a SQL LIMIT can return zero rows. Please change the checks to treat negative values as “unlimited” (do not early-exit) and only stop when the counter is actually 0.

Comment thread be/src/exec/scan/scanner.cpp Outdated
Comment on lines +171 to +175
    // satisfied. The counter may go negative when multiple scanners
    // subtract concurrently; this is harmless because the operator's
    // reached_limit() provides the authoritative truncation.
    if (shared_limit && block->rows() > 0) {
        shared_limit->fetch_sub(block->rows(), std::memory_order_acq_rel);

Copilot AI Apr 8, 2026


Using fetch_sub() here can drive the shared limit counter negative (e.g. concurrent scanners), which collides with the existing “-1 means no limit” sentinel and makes later <= 0 checks ambiguous. Consider switching to a CAS-based decrement that clamps at 0 for limited scans (and skips decrement entirely when the counter is negative/unlimited), or use the existing ScannerContext::acquire_limit_quota() logic to preserve the counter’s semantics.

Suggested change
    // satisfied. The counter may go negative when multiple scanners
    // subtract concurrently; this is harmless because the operator's
    // reached_limit() provides the authoritative truncation.
    if (shared_limit && block->rows() > 0) {
        shared_limit->fetch_sub(block->rows(), std::memory_order_acq_rel);
    // satisfied. Preserve negative values as the existing "no limit"
    // sentinel, and clamp limited scans at 0 so concurrent scanners
    // cannot drive the counter into the sentinel range.
    if (shared_limit && block->rows() > 0) {
        int64_t current = shared_limit->load(std::memory_order_acquire);
        while (current > 0) {
            const int64_t next = std::max<int64_t>(0, current - block->rows());
            if (shared_limit->compare_exchange_weak(current, next,
                                                    std::memory_order_acq_rel,
                                                    std::memory_order_acquire)) {
                break;
            }
        }

Comment thread be/src/exec/scan/scanner.cpp Outdated
Comment on lines +188 to +190
    *eof = *eof || (_limit > 0 && _num_rows_return >= _limit);
    // Also stop when the global shared scan limit is exhausted.
    *eof = *eof || (shared_limit && shared_limit->load(std::memory_order_acquire) <= 0);

Copilot AI Apr 8, 2026


This EOF condition also uses <= 0 on the shared limit. With “-1 means no limit”, this will force EOF for unlimited scans. After fixing the counter semantics, this check should only stop when the shared limit is actually exhausted (typically == 0 for the limited case).

Comment thread be/src/storage/iterators.h Outdated
Comment on lines +145 to +172
struct ReadLimit {
    size_t local_limit = 0;
    std::atomic<int64_t>* global_remaining = nullptr;

    bool active() const { return local_limit > 0 || global_remaining != nullptr; }

    // Cap max_rows by the effective limit.
    uint32_t cap(uint32_t max_rows) const {
        uint32_t limit = max_rows;
        if (local_limit > 0 && local_limit < limit) {
            limit = static_cast<uint32_t>(local_limit);
        }
        if (global_remaining != nullptr) {
            int64_t remaining = global_remaining->load(std::memory_order_acquire);
            if (remaining >= 0 && remaining < static_cast<int64_t>(limit)) {
                limit = static_cast<uint32_t>(remaining);
            }
        }
        return limit;
    }

    // True when the global budget is fully exhausted (0 or negative).
    bool exhausted() const {
        return global_remaining != nullptr &&
               global_remaining->load(std::memory_order_acquire) <= 0;
    }

Copilot AI Apr 8, 2026


StorageReadOptions now exposes std::atomic and std::memory_order_* in this header, but <atomic> isn’t included here. Also, ReadLimit::exhausted() currently treats any value <= 0 as exhausted; that conflicts with the established “-1 means no limit” semantics and can cause readers to stop immediately when the pointer is set but unlimited. Please include <atomic> and make exhausted() consistent with the sentinel/unlimited behavior (and ideally keep the counter from going negative for the limited case).

Comment on lines +2624 to +2626
    } else if (_opts.read_limit.exhausted()) {
        // Even without index optimization eligibility, stop reading when
        // the global limit is fully exhausted — no more rows needed.

Copilot AI Apr 8, 2026


With the new read_limit.exhausted() branch, nrows_read_limit can be forced to 0 even when the scan is “unlimited” if the shared counter uses the -1 sentinel and exhausted() treats <= 0 as exhausted. Please ensure the shared-limit semantics are consistent end-to-end (negative = unlimited), otherwise SegmentIterator may stop reading rows unexpectedly.

Suggested change
    } else if (_opts.read_limit.exhausted()) {
        // Even without index optimization eligibility, stop reading when
        // the global limit is fully exhausted — no more rows needed.
    } else if (_opts.read_limit.local_limit >= 0 && _opts.read_limit.exhausted()) {
        // Even without index optimization eligibility, stop reading when
        // the global limit is fully exhausted — no more rows needed.
        // Negative limits are treated as unlimited and must not force reads to stop.

Comment thread be/src/exec/operator/scan_operator.cpp Outdated

template <typename Derived>
std::atomic<int64_t>* ScanLocalState<Derived>::shared_scan_limit_ptr() {
    return &_parent->cast<typename Derived::Parent>()._shared_scan_limit;

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared_scan_limit_ptr() currently returns a non-null pointer unconditionally. Since _shared_scan_limit is documented as “-1 means no limit”, returning a pointer when the value is negative causes downstream code to misinterpret “no limit” as “limit exhausted” (see Scanner/SegmentIterator checks). Consider returning nullptr when _shared_scan_limit is negative (or otherwise exposing an API that makes the “unlimited” case unambiguous).

Suggested change
    return &_parent->cast<typename Derived::Parent>()._shared_scan_limit;
    auto& shared_scan_limit = _parent->cast<typename Derived::Parent>()._shared_scan_limit;
    return shared_scan_limit.load() < 0 ? nullptr : &shared_scan_limit;

Comment thread be/src/storage/tablet/tablet_reader.h Outdated
Comment on lines +222 to +226
// Pointer to the shared remaining-row budget maintained by ScanOperatorX.
// Decremented by ScannerScheduler after each block is produced.
// Storage layer uses it as a dynamic hint to reduce I/O.
// nullptr means no dynamic limit. -1 means no limit.
std::atomic<int64_t>* shared_scan_limit = nullptr;

Copilot AI Apr 8, 2026


This new field introduces std::atomic<int64_t> in a public header, but the header doesn’t include <atomic>. Also, the comment says the budget is “Decremented by ScannerScheduler”, but the decrement was moved into Scanner::get_block() in this PR; please update the comment so it reflects the current ownership/location of the decrement logic.

Comment on lines +111 to +113
// Dynamic shared limit pointer from ScanOperatorX.
// See ReaderParams.shared_scan_limit for details.
std::atomic<int64_t>* shared_scan_limit = nullptr;

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct now uses std::atomic<int64_t> but the header does not include <atomic>. Please add the include to keep the header self-contained (and avoid relying on transitive includes).

Comment on lines +46 to +54
// Build a SegmentIterator with minimal opts for _can_opt_limit_reads() testing.
// The segment pointer is null — only _opts and internal maps are accessed.
std::unique_ptr<SegmentIterator> make_iter() {
auto iter = std::make_unique<SegmentIterator>(nullptr, _read_schema);
iter->_opts.tablet_schema = _tablet_schema;
iter->_opts.stats = &_stats;
// delete_condition_predicates is default-constructed (empty)
return iter;
}

Copilot AI Apr 8, 2026


This test directly accesses SegmentIterator private members (_opts, _column_predicate_index_exec_status) and calls the private method _can_opt_limit_reads(). As written it won’t compile unless you use the existing test pattern (e.g. #define private public before including the header, or declaring the test as a friend), or refactor to test via public APIs.

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment


I found 1 issue in this PR.

Critical checkpoint conclusions:

  • Goal of the task: The PR aims to push LIMIT into SegmentIterator for lower I/O. The runtime path is implemented end to end, and regression coverage was added, but the new BE unit test currently prevents the test target from compiling, so the task is not complete yet.
  • Scope/minimality: The production change is reasonably focused on scan/storage limit propagation.
  • Concurrency: Applicable. I checked the new shared-limit flow across ScanOperator, Scanner, ScannerContext, and SegmentIterator; the atomic accesses use acquire/release semantics and I did not find a confirmed lock-order or deadlock regression in the reviewed paths.
  • Lifecycle/static initialization: No special lifecycle or static-init issue found in the reviewed change.
  • Configuration: No new configuration items.
  • Compatibility: No incompatible FE/BE protocol or storage-format change identified.
  • Parallel code paths: Applicable. The change updates both the general-limit and order-by/topn-related storage-limit path; I did not find a second missed production path in the reviewed files.
  • Special conditional checks: The new limit checks are commented and understandable.
  • Test coverage: Applicable. Regression coverage and a BE UT were added, but the new UT is not currently buildable, so the added coverage is not runnable as submitted.
  • Observability: No additional observability looked necessary for this optimization.
  • Transaction/persistence/data writes: Not applicable for this patch.
  • FE-BE variable passing: Applicable. The new shared limit pointer is threaded through the reviewed storage read path consistently.
  • Performance: The intended I/O reduction path is clear and localized; I did not confirm an additional performance regression beyond the test/build issue below.
  • Other issues: The BE UT compile failure below should be fixed before merging.

Overall opinion: not ready as-is because the added unit test appears to break the BE test build.

// The segment pointer is null — only _opts and internal maps are accessed.
std::unique_ptr<SegmentIterator> make_iter() {
    auto iter = std::make_unique<SegmentIterator>(nullptr, _read_schema);
    iter->_opts.tablet_schema = _tablet_schema;
Contributor


This new test reaches directly into SegmentIterator internals (_opts, _column_predicate_index_exec_status, and _can_opt_limit_reads()), but segment_iterator.h does not expose them to the test via friend/FRIEND_TEST or any public test hook. As written, this TU should fail to compile in the BE UT target because all of those members are private. Please add an explicit test hook/friend declaration or rewrite the test to exercise the behavior through a public interface.

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment


Found 1 correctness issue.

  1. Goal of current task
    Conclusion: Partially achieved. General LIMIT pushdown is implemented, but the new shared limit is also propagated into the ORDER BY key LIMIT path, where it can change results under parallel scan. Existing tests only cover single-bucket ORDER BY cases and therefore do not validate this path.

  2. Modification size and focus
    Conclusion: The patch is focused, but it couples two semantically different paths: plain LIMIT (where a global remaining-row hint is safe) and per-scanner top-N collection (where it is not).

  3. Concurrency
    Conclusion: The atomic accesses themselves are fine for a best-effort hint, but using the same shared counter as a hard stop for parallel top-N scanners is not concurrency-safe for correctness. A scanner that decrements the counter first can suppress other scanners that still hold better rows for the final top-N merge.

  4. Lifecycle / static initialization
    Conclusion: No special lifecycle or static initialization issue found in the touched code.

  5. Configuration
    Conclusion: No new configuration items were added.

  6. Compatibility
    Conclusion: No storage-format or symbol compatibility issue found.

  7. Parallel code paths
    Conclusion: There are parallel code paths here, and they should not be treated identically. Plain LIMIT pushdown and ORDER BY key LIMIT/top-N pushdown have different correctness constraints; this change should only apply the shared remaining-row budget to the former.

  8. Special conditional checks
    Conclusion: The new _storage_no_merge() gate is not sufficient to make the top-N path safe. The missing condition is whether the path still requires each scanner to independently produce its own local top-N candidate set.

  9. Test coverage
    Conclusion: Coverage is insufficient for the risky path. The new regression file adds multi-bucket tests for plain LIMIT, but the ORDER BY key tests are single-bucket only. No test exercises parallel ORDER BY key LIMIT with multiple scanners/tablets, which is exactly where this regression appears.

  10. Observability
    Conclusion: No additional observability issue required for this review.

  11. Transaction / persistence
    Conclusion: Not applicable; no transaction or persistence logic is changed here.

  12. Data writes / modifications
    Conclusion: Not applicable; the change is read-path only.

  13. FE-BE variable passing
    Conclusion: The new shared_scan_limit pointer is passed through the relevant BE layers consistently for the touched path. The issue is the semantic use of that variable in the top-N path, not missing propagation.

  14. Performance
    Conclusion: The intended I/O reduction is reasonable for plain LIMIT. However, correctness takes priority, and the current top-N integration is unsafe.

  15. Other issues
    Conclusion: No second confirmed bug beyond the incorrect top-N integration.

Overall opinion: the PR should not merge as-is because the shared remaining-row budget is being applied to a path where it can change query results.

Comment thread be/src/exec/scan/olap_scanner.cpp Outdated
    // iterator can tighten its per-batch row budget based on how many rows
    // the whole query still needs, avoiding unnecessary I/O.
    if (olap_scan_local_state->_storage_no_merge()) {
        _tablet_reader_params.shared_scan_limit = _local_state->shared_scan_limit_ptr();
Contributor


Applying shared_scan_limit to the read_orderby_key_limit path breaks the correctness contract of parallel top-N scanning. In this branch, each scanner is supposed to produce its own local top-N candidate set (_limit_per_scanner), and only the upper operator merges those candidates into the final global top-N.

After this change, BetaRowsetReader passes the global remaining-row budget down into SegmentIterator::read_limit, so one scanner can exhaust the shared counter and cause other scanners to stop before they have produced their own local top-N rows. That can return the wrong result under parallel scan.

Concrete case: with parallel scan enabled, imagine tablet A contains keys 100..199 and tablet B contains keys 1..99, and the query is SELECT ... ORDER BY k LIMIT 10. If scanner A runs first and returns 10 rows, the shared counter reaches 0. Scanner B can then stop without returning its local top-10 rows (1..10), so the final merge never sees the real best rows.

This shared budget is safe as an I/O hint for plain unordered LIMIT, but it is not safe for the ORDER BY key LIMIT / per-scanner top-N path. The new regression tests do not cover this because the ORDER BY cases use a single-bucket table only.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.39% (76/89) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.64% (27372/37169)
Line Coverage 57.28% (295377/515685)
Region Coverage 54.55% (246157/451264)
Branch Coverage 56.21% (106722/189859)

@BiteTheDDDDt
Contributor Author

run buildall

2 similar comments
@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 91.94% (114/124) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.58% (27362/37189)
Line Coverage 57.20% (295200/516043)
Region Coverage 54.42% (245871/451782)
Branch Coverage 56.07% (106560/190034)

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment


Found 1 issue.

  1. [high] be/src/exec/scan/olap_scanner.cpp: the refactor only disables _shared_scan_limit when read_orderby_key_limit > 0, but ORDER BY key LIMIT scans that still require merge reads keep the scanner-level shared budget enabled. In that path one scanner can exhaust the global budget before peer scanners emit their local top-N candidates, so the final merge can miss smaller keys from later scanners and return the wrong rows.

Critical checkpoints:

  • Goal / correctness: Not fully satisfied. The change aims to unify the topn/general limit paths, but merge-required top-N can now under-produce rows.
  • Minimality / clarity: Mostly focused, but the shared-limit condition became broader than the top-N correctness boundary.
  • Concurrency: Applicable. The shared atomic coordination itself is race-safe enough here, but it is applied to the wrong top-N path.
  • Lifecycle / static init: No issue found.
  • Configuration: No new config.
  • Compatibility: No storage/protocol compatibility issue found.
  • Parallel code paths: Not all applicable paths were updated consistently; no-merge top-N disables shared limit, merge-required top-N does not.
  • Special conditions: The new comments correctly say top-N must not use a global budget, but the implementation only enforces that for read_orderby_key_limit.
  • Test coverage: New tests cover general limit and no-merge top-N, but miss merge-required ORDER BY key LIMIT.
  • Observability: No additional observability issue found.
  • Transaction / persistence: Not applicable.
  • Data write / atomicity: Not applicable.
  • FE-BE variable passing: Applicable only within the BE stack; propagation looks consistent there.
  • Performance: No separate performance concern beyond the correctness regression.
  • Other issues: None beyond the finding above.

Comment thread be/src/exec/scan/olap_scanner.cpp Outdated
@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 32.95% (29/88) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.01% (20129/37973)
Line Coverage 36.55% (189216/517640)
Region Coverage 32.82% (146895/447640)
Branch Coverage 33.92% (64278/189472)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (88/88) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.52% (27340/37188)
Line Coverage 57.17% (295044/516058)
Region Coverage 54.41% (245827/451773)
Branch Coverage 56.01% (106448/190054)

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

1 similar comment
@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

/review

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24491647856

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@BiteTheDDDDt
Contributor Author

/review

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 30.99% (22/71) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.12% (20184/37997)
Line Coverage 36.69% (190128/518251)
Region Coverage 32.93% (147574/448175)
Branch Coverage 34.07% (64617/189687)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (71/71) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.71% (27425/37206)
Line Coverage 57.34% (296220/516633)
Region Coverage 54.59% (246884/452285)
Branch Coverage 56.20% (106931/190255)

@BiteTheDDDDt
Contributor Author

/review

Contributor

@github-actions (Bot) left a comment

Automated review completed against the current GitHub PR diff and the existing inline review context. I did not add duplicate inline comments for issues already raised, but the existing unresolved threads still include correctness/blocking concerns that should be addressed before merge.

Critical checkpoint conclusions:

  • Goal/test proof: The PR aims to push SegmentIterator filter/LIMIT handling deeper into storage and add a new controlling session variable. Regression and BE unit coverage were added, but existing review threads still identify gaps around SET_VAR alias behavior and order/value-sensitive validation.
  • Scope/focus: The change is focused on scan/storage LIMIT pushdown and related session plumbing, though it touches sensitive shared-limit and top-N paths.
  • Concurrency: Shared scan-limit coordination is involved. Existing threads already cover liveness/overshoot and top-N shared-budget hazards; I found no additional distinct concurrency issue in the current PR diff.
  • Lifecycle/static initialization: No new static initialization or special object lifecycle issue found beyond existing scanner/iterator ownership patterns.
  • Configuration/session variables: A new FE/BE query option is added. Existing threads already cover compatibility and statement-scoped SET_VAR revert behavior for the deprecated alias.
  • Compatibility/storage format: No storage format compatibility issue found. FE/BE query-option compatibility is already covered by existing comments.
  • Parallel code paths: Cloud/non-cloud and top-N/general scan paths were considered; existing threads already cover the distinct MOR-as-DUP and top-N path concerns.
  • Conditional checks: Several safety gates were added for runtime filters, top-N, and storage-no-merge. Existing comments cover the cases where those gates are insufficient or need tests.
  • Test coverage/results: Tests were added, but existing threads already identify count-only/order-insensitive coverage gaps and alias SET_VAR coverage needs.
  • Observability: No additional required observability issue found.
  • Transactions/persistence/data writes: Not applicable; this PR does not change transaction persistence or write visibility semantics.
  • FE/BE variables: New thrift/session variable plumbing exists; existing comments cover old/new variable mapping and compatibility.
  • Performance: The intended optimization reduces storage reads; no additional distinct performance regression was found beyond correctness issues already raised.

User focus: No additional user-provided review focus was supplied.

No new non-duplicate inline comments were submitted in this review; please resolve the already-open blocking threads.

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

if (remaining == 0) {
// Skip submitting more pending scanners once the LIMIT budget is
// exhausted; they would only open and immediately EOF.
if (_shared_scan_limit->load(std::memory_order_acquire) == 0) {
Contributor

This should be `<= 0`, matching the existing check: `*eof = *eof || (_shared_scan_limit && _shared_scan_limit->load(std::memory_order_acquire) <= 0);`

Contributor Author

The _shared_scan_limit here will not be nullptr. When there is no limit, it will be a negative number.
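The convention described in this reply can be made concrete with a small sketch (function name hypothetical): the counter is always allocated, and a negative value is the "query has no LIMIT" sentinel. With that convention the exhaustion test must be `== 0`, because an unlimited query's counter stays negative and `<= 0` would wrongly skip its pending scanners too.

```cpp
#include <atomic>

// Illustrative only. Assumes the convention from the reply above:
// _shared_scan_limit is never nullptr, negative means "no LIMIT",
// and scanners clamp the counter at zero rather than driving it below.
inline bool should_skip_pending_scanners(
        const std::atomic<long>& shared_scan_limit) {
    // == 0 distinguishes "budget exhausted" from the negative
    // "no limit" sentinel; <= 0 would also skip unlimited scans.
    return shared_scan_limit.load(std::memory_order_acquire) == 0;
}
```

Note this also relies on limited queries never pushing the counter below zero; if scanners decremented with an unclamped fetch_sub, the `== 0` test could be skipped over entirely.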

Comment thread on be/src/runtime/runtime_state.h (outdated)
bool enable_common_expr_pushdown() const {
return _query_options.__isset.enable_common_expr_pushdown &&
_query_options.enable_common_expr_pushdown;
bool enable_segment_filter_and_limit_pushdown() const {
Contributor

This variable should only control the LIMIT pushdown, not the conjunct/filter pushdown.
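The split suggested here could look roughly like the following sketch. It mimics the thrift `__isset` pattern visible in the quoted snippet; only `enable_common_expr_pushdown` comes from the source, while the limit-pushdown field name is a hypothetical stand-in for the new option.

```cpp
// Illustrative stand-in for the generated thrift query options struct.
struct QueryOptionsSketch {
    struct {
        bool enable_common_expr_pushdown = false;
        bool enable_segment_limit_pushdown = false;  // hypothetical name
    } __isset;
    bool enable_common_expr_pushdown = false;
    bool enable_segment_limit_pushdown = false;      // hypothetical name
};

// Conjunct/filter pushdown stays on the existing switch...
inline bool filter_pushdown_enabled(const QueryOptionsSketch& o) {
    return o.__isset.enable_common_expr_pushdown &&
           o.enable_common_expr_pushdown;
}

// ...while the new session variable gates only the LIMIT pushdown.
inline bool limit_pushdown_enabled(const QueryOptionsSketch& o) {
    return o.__isset.enable_segment_limit_pushdown &&
           o.enable_segment_limit_pushdown;
}
```

Keeping the two accessors independent means disabling the new variable cannot accidentally turn off common-expression filtering, which is the reviewer's concern.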

@BiteTheDDDDt
Contributor Author

run buildall

return mode == ExprStorageFilterCheckMode::HAS_SEGMENT_EVALUABLE_EXPR ||
!_is_key_column(slot_ref->expr_name());
}
if (expr->is_virtual_slot_ref()) {
Contributor

No need for this check; the virtual slot ref should never be valid here, because an array column cannot be a key column.

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.07% (1844/2362)
Line Coverage 64.76% (33002/50963)
Region Coverage 65.26% (16375/25093)
Branch Coverage 55.83% (8742/15658)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 26.91% (60/223) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.52% (20577/38447)
Line Coverage 37.14% (194281/523160)
Region Coverage 33.38% (151047/452519)
Branch Coverage 34.51% (66116/191601)

@hello-stephen
Contributor

TPC-H: Total hot run time: 29675 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 604cdc2dadcc1b1328e6400c728e31803678dc9a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17687	3938	3859	3859
q2	q3	10652	921	616	616
q4	4666	476	352	352
q5	7465	1337	1146	1146
q6	198	173	139	139
q7	915	949	749	749
q8	9347	1411	1281	1281
q9	5804	5391	5288	5288
q10	6305	2093	1843	1843
q11	479	266	255	255
q12	645	415	292	292
q13	18078	3230	2729	2729
q14	288	283	260	260
q15	q16	896	866	785	785
q17	931	1045	736	736
q18	6484	5642	5643	5642
q19	1165	1284	1136	1136
q20	515	389	260	260
q21	4590	2374	1971	1971
q22	465	420	336	336
Total cold run time: 97575 ms
Total hot run time: 29675 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4714	4635	4815	4635
q2	q3	4706	4881	4193	4193
q4	2132	2215	1408	1408
q5	5173	5008	5263	5008
q6	192	167	135	135
q7	2021	1823	1628	1628
q8	3400	3089	3098	3089
q9	8430	8377	8439	8377
q10	4501	4503	4291	4291
q11	593	427	393	393
q12	691	741	509	509
q13	3259	3636	2845	2845
q14	301	306	281	281
q15	q16	751	944	692	692
q17	1340	1308	1268	1268
q18	7967	7142	7092	7092
q19	1161	1167	1169	1167
q20	2248	2207	1990	1990
q21	6297	5384	4786	4786
q22	545	497	421	421
Total cold run time: 60422 ms
Total hot run time: 54208 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170954 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 604cdc2dadcc1b1328e6400c728e31803678dc9a, data reload: false

query5	4320	675	517	517
query6	345	233	208	208
query7	4245	552	304	304
query8	319	227	219	219
query9	8853	4111	3980	3980
query10	455	371	277	277
query11	5837	2413	2172	2172
query12	181	129	126	126
query13	1252	639	426	426
query14	5982	5326	5042	5042
query14_1	4307	4298	4329	4298
query15	209	202	179	179
query16	995	457	419	419
query17	1145	741	613	613
query18	2571	482	354	354
query19	209	197	152	152
query20	136	134	128	128
query21	211	136	116	116
query22	13562	13646	13284	13284
query23	17294	16374	16663	16374
query23_1	16372	16359	16335	16335
query24	7519	1836	1375	1375
query24_1	1342	1327	1361	1327
query25	555	475	421	421
query26	1316	305	189	189
query27	2750	618	349	349
query28	4389	1948	1962	1948
query29	990	635	502	502
query30	292	224	198	198
query31	1097	1045	939	939
query32	81	68	71	68
query33	543	349	280	280
query34	1163	1132	602	602
query35	745	768	651	651
query36	1322	1356	1174	1174
query37	148	102	85	85
query38	3172	3125	3031	3031
query39	918	931	890	890
query39_1	900	876	851	851
query40	233	162	138	138
query41	62	61	62	61
query42	108	108	105	105
query43	320	315	277	277
query44	
query45	208	199	193	193
query46	1046	1141	712	712
query47	2315	2344	2212	2212
query48	422	403	299	299
query49	654	562	437	437
query50	709	283	217	217
query51	4345	4294	4187	4187
query52	107	106	97	97
query53	271	285	210	210
query54	339	286	265	265
query55	94	91	86	86
query56	348	330	312	312
query57	1412	1406	1368	1368
query58	307	273	272	272
query59	1569	1603	1396	1396
query60	359	344	321	321
query61	206	185	181	181
query62	686	631	564	564
query63	243	202	206	202
query64	2457	882	735	735
query65	
query66	1770	529	418	418
query67	29935	29818	29684	29684
query68	
query69	480	345	303	303
query70	1008	985	1007	985
query71	314	283	269	269
query72	3111	2739	2428	2428
query73	853	748	405	405
query74	5045	4963	4711	4711
query75	2773	2657	2331	2331
query76	2279	1102	754	754
query77	400	416	348	348
query78	12982	13060	12394	12394
query79	1497	938	720	720
query80	682	584	485	485
query81	448	275	240	240
query82	1283	164	127	127
query83	345	268	242	242
query84	263	139	108	108
query85	862	515	456	456
query86	390	321	325	321
query87	3402	3342	3245	3245
query88	3520	2636	2616	2616
query89	428	373	346	346
query90	1960	177	173	173
query91	180	173	139	139
query92	79	75	71	71
query93	1017	953	558	558
query94	535	336	307	307
query95	672	482	358	358
query96	1080	753	360	360
query97	2739	2706	2583	2583
query98	238	228	250	228
query99	1104	1113	980	980
Total cold run time: 253461 ms
Total hot run time: 170954 ms

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.07% (1844/2362)
Line Coverage 64.74% (32992/50963)
Region Coverage 65.25% (16372/25093)
Branch Coverage 55.77% (8732/15658)

@hello-stephen
Contributor

TPC-H: Total hot run time: 29299 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d61b925a47af598997ad58f2620ba28f8d334a1d, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17800	3836	3895	3836
q2	q3	10791	891	608	608
q4	4721	463	344	344
q5	8049	1331	1148	1148
q6	348	169	139	139
q7	951	957	767	767
q8	10902	1415	1292	1292
q9	7232	5385	5378	5378
q10	6338	2059	1795	1795
q11	471	283	256	256
q12	692	419	291	291
q13	18214	3391	2763	2763
q14	303	282	270	270
q15	q16	907	876	785	785
q17	1011	1022	779	779
q18	6431	5656	5445	5445
q19	1187	1253	978	978
q20	502	394	262	262
q21	4539	2265	1858	1858
q22	413	349	305	305
Total cold run time: 101802 ms
Total hot run time: 29299 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4183	4089	4084	4084
q2	q3	4606	4870	4159	4159
q4	2107	2176	1377	1377
q5	4925	4999	5258	4999
q6	189	163	132	132
q7	2439	1848	1607	1607
q8	3487	3215	3127	3127
q9	8423	8478	8312	8312
q10	4478	4504	4295	4295
q11	607	417	404	404
q12	696	737	512	512
q13	3207	3583	2964	2964
q14	309	322	274	274
q15	q16	758	808	720	720
q17	1358	1336	1412	1336
q18	7988	7140	7137	7137
q19	1146	1181	1165	1165
q20	2271	2232	1926	1926
q21	6206	5370	4903	4903
q22	540	510	450	450
Total cold run time: 59923 ms
Total hot run time: 53883 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170743 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d61b925a47af598997ad58f2620ba28f8d334a1d, data reload: false

query5	4316	669	534	534
query6	359	243	210	210
query7	4312	563	317	317
query8	330	244	226	226
query9	8846	4048	4035	4035
query10	476	349	321	321
query11	6027	2441	2193	2193
query12	201	134	131	131
query13	1281	613	440	440
query14	6873	5406	5070	5070
query14_1	4365	4389	4337	4337
query15	221	208	184	184
query16	1025	474	441	441
query17	1396	773	647	647
query18	2765	508	379	379
query19	347	221	174	174
query20	142	134	134	134
query21	220	145	121	121
query22	13548	14087	14449	14087
query23	17438	16654	16193	16193
query23_1	16239	16259	16263	16259
query24	7909	1772	1361	1361
query24_1	1412	1331	1351	1331
query25	568	494	433	433
query26	1282	311	172	172
query27	2662	598	327	327
query28	4282	1959	1924	1924
query29	982	625	512	512
query30	304	238	196	196
query31	1122	1065	930	930
query32	93	72	73	72
query33	529	331	293	293
query34	1141	1165	648	648
query35	783	766	666	666
query36	1365	1389	1167	1167
query37	147	99	90	90
query38	3203	3142	3049	3049
query39	925	911	886	886
query39_1	873	859	901	859
query40	241	155	134	134
query41	63	104	60	60
query42	109	109	108	108
query43	320	321	286	286
query44	
query45	214	203	190	190
query46	1067	1160	730	730
query47	2326	2248	2159	2159
query48	424	420	305	305
query49	630	516	433	433
query50	710	289	213	213
query51	4317	4327	4173	4173
query52	104	103	98	98
query53	251	290	206	206
query54	316	277	249	249
query55	92	88	83	83
query56	301	320	288	288
query57	1411	1386	1298	1298
query58	289	267	272	267
query59	1534	1604	1392	1392
query60	353	340	327	327
query61	161	150	151	150
query62	673	619	559	559
query63	250	202	206	202
query64	2381	843	708	708
query65	
query66	1674	510	386	386
query67	30090	30023	29189	29189
query68	
query69	463	364	310	310
query70	1031	927	988	927
query71	304	274	270	270
query72	2957	2739	2017	2017
query73	828	783	434	434
query74	5078	4883	4725	4725
query75	2801	2653	2349	2349
query76	2317	1143	757	757
query77	407	441	362	362
query78	13030	13041	12427	12427
query79	1495	1034	703	703
query80	1391	579	493	493
query81	520	281	244	244
query82	964	157	120	120
query83	350	270	248	248
query84	266	145	107	107
query85	915	519	439	439
query86	461	349	338	338
query87	3398	3355	3222	3222
query88	3553	2674	2673	2673
query89	432	378	333	333
query90	1888	181	187	181
query91	182	169	142	142
query92	77	72	70	70
query93	1067	950	561	561
query94	687	331	315	315
query95	655	464	346	346
query96	1029	744	336	336
query97	2718	2680	2583	2583
query98	244	231	229	229
query99	1124	1105	974	974
Total cold run time: 256246 ms
Total hot run time: 170743 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 28.71% (60/209) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.49% (20564/38448)
Line Coverage 37.11% (194167/523181)
Region Coverage 33.38% (151059/452550)
Branch Coverage 34.49% (66079/191593)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.17% (173/208) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.85% (27808/37655)
Line Coverage 57.69% (301055/521828)
Region Coverage 54.67% (249821/456958)
Branch Coverage 56.40% (108475/192319)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29721 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ddc0747b35f5fe3bf5498ff16ddda7db781fcda5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17667	3938	4000	3938
q2	q3	10791	907	613	613
q4	4716	468	347	347
q5	8224	1331	1142	1142
q6	349	176	141	141
q7	964	943	766	766
q8	11123	1411	1288	1288
q9	7503	5459	5377	5377
q10	6488	2085	1819	1819
q11	467	267	255	255
q12	691	422	306	306
q13	18159	3273	2748	2748
q14	290	288	261	261
q15	q16	901	881	798	798
q17	1000	1067	743	743
q18	6505	5834	5566	5566
q19	1174	1303	1117	1117
q20	528	403	265	265
q21	4933	2306	1922	1922
q22	424	353	309	309
Total cold run time: 102897 ms
Total hot run time: 29721 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4218	4188	4167	4167
q2	q3	4685	4749	4202	4202
q4	2142	2204	1392	1392
q5	4990	5037	5624	5037
q6	212	174	140	140
q7	2092	1884	1655	1655
q8	3442	3190	3152	3152
q9	8519	8427	8474	8427
q10	4529	4484	4322	4322
q11	612	420	381	381
q12	705	748	537	537
q13	3277	3606	2961	2961
q14	307	306	283	283
q15	q16	787	793	698	698
q17	1390	1484	1333	1333
q18	8089	7174	7141	7141
q19	1211	1184	1200	1184
q20	2318	2223	1951	1951
q21	6205	5472	4917	4917
q22	551	541	438	438
Total cold run time: 60281 ms
Total hot run time: 54318 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 173224 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ddc0747b35f5fe3bf5498ff16ddda7db781fcda5, data reload: false

query5	4316	660	522	522
query6	360	223	208	208
query7	4280	540	319	319
query8	344	262	216	216
query9	8819	4098	4033	4033
query10	452	345	298	298
query11	5827	2455	2244	2244
query12	189	133	131	131
query13	1323	651	435	435
query14	6570	5404	5072	5072
query14_1	4394	4390	4365	4365
query15	220	212	193	193
query16	1057	479	475	475
query17	1157	755	608	608
query18	2724	475	356	356
query19	221	208	162	162
query20	142	138	135	135
query21	220	141	116	116
query22	13673	14760	14361	14361
query23	17422	16545	16191	16191
query23_1	16303	16303	16368	16303
query24	7468	1747	1375	1375
query24_1	1354	1343	1356	1343
query25	549	478	419	419
query26	1308	320	168	168
query27	2695	646	351	351
query28	4316	1977	1954	1954
query29	1021	630	525	525
query30	315	227	196	196
query31	1131	1064	931	931
query32	94	74	77	74
query33	535	344	299	299
query34	1176	1149	642	642
query35	780	787	662	662
query36	1330	1337	1191	1191
query37	147	102	87	87
query38	3209	3164	3065	3065
query39	972	905	895	895
query39_1	884	890	890	890
query40	231	161	134	134
query41	66	60	59	59
query42	112	112	112	112
query43	333	322	288	288
query44	
query45	208	209	190	190
query46	1087	1170	737	737
query47	2354	2297	2190	2190
query48	388	441	272	272
query49	636	529	434	434
query50	717	282	222	222
query51	4304	4319	4290	4290
query52	114	109	93	93
query53	263	284	208	208
query54	320	285	256	256
query55	98	90	84	84
query56	308	312	315	312
query57	1438	1404	1319	1319
query58	314	283	266	266
query59	1557	1614	1402	1402
query60	351	339	331	331
query61	163	157	163	157
query62	678	625	567	567
query63	243	202	203	202
query64	2348	826	673	673
query65	
query66	1709	523	391	391
query67	29418	30034	29757	29757
query68	
query69	472	354	315	315
query70	1036	1025	985	985
query71	323	276	271	271
query72	3110	2941	2711	2711
query73	862	753	445	445
query74	5092	4928	4720	4720
query75	2812	2681	2361	2361
query76	2290	1154	788	788
query77	433	450	369	369
query78	13044	12833	12368	12368
query79	1533	1022	760	760
query80	1287	566	498	498
query81	499	287	243	243
query82	1264	165	130	130
query83	351	273	251	251
query84	265	145	111	111
query85	923	524	438	438
query86	438	372	319	319
query87	3432	3369	3215	3215
query88	3679	2725	2715	2715
query89	442	390	352	352
query90	1853	196	185	185
query91	182	173	150	150
query92	85	78	72	72
query93	1002	995	586	586
query94	636	364	329	329
query95	685	469	375	375
query96	1044	858	348	348
query97	2685	2700	2550	2550
query98	246	237	233	233
query99	1099	1121	964	964
Total cold run time: 254980 ms
Total hot run time: 173224 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.17% (173/208) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.70% (27823/37751)
Line Coverage 57.61% (301164/522799)
Region Coverage 54.86% (251232/457918)
Branch Coverage 56.34% (108510/192608)

6 participants