[fix](multi-catalog) fixes some issues caused by the data_lake_reader refactoring #62306 and legacy issues by hubgeter · Pull Request #62821 · apache/doris

hubgeter · 2026-04-24T10:32:14Z

What problem does this PR solve?

Related PR: #62306

Problem Summary:
This PR fixes several issues introduced by the refactoring in #62306, as well as some pre-existing (legacy) issues:

- For Iceberg/Paimon tables, metadata partition values must be passed for each split. Deriving partition values solely from information in the files is unreliable, especially for tables migrated from Hive.
- The condition cache conflicts with CountReader and Lazy RF; see the comments in be/src/exec/scan/file_scanner.cpp for details.
- PR #62306 ([refactoring](multi-catalog)data_lake_reader_refactoring.) omitted handling of Iceberg name_mapping.
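The first point can be illustrated with a minimal Java sketch. This is not Doris code: `SplitInfo`, `fromPath`, and `fromMetadata` are hypothetical names. The sketch assumes a Hive partition that was registered with a custom location (for example via `ALTER TABLE ... ADD PARTITION ... LOCATION`), so the data file path carries no `key=value` segment. That is one plausible way path-based parsing can fail for a migrated table, while per-split metadata values still resolve correctly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: path-derived vs metadata-supplied partition values for one split. */
public final class PartitionValueResolver {

    /** Hypothetical split description; field names are illustrative only. */
    public record SplitInfo(String filePath, Map<String, String> metadataPartitionValues) {}

    private PartitionValueResolver() {}

    /** Parses Hive-style "key=value" path segments; returns an empty map if there are none. */
    public static Map<String, String> fromPath(String filePath) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : filePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    /** Takes every partition value from the per-split metadata and fails loudly if one is missing. */
    public static Map<String, String> fromMetadata(SplitInfo split, List<String> partitionKeys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : partitionKeys) {
            String value = split.metadataPartitionValues().get(key);
            if (value == null) {
                throw new IllegalStateException("missing metadata partition value for " + key);
            }
            result.put(key, value);
        }
        return result;
    }
}
```

With a custom partition location such as `/data/custom_location/000000_0.parquet`, `fromPath` finds nothing, while `fromMetadata` still returns the value that the table metadata attached to the split.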

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2026-04-24T10:32:20Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

hubgeter · 2026-04-24T10:32:51Z

run buildall

hello-stephen · 2026-04-24T12:29:02Z

BE Regression && UT Coverage Report

Increment line coverage 28.12% (124/441) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.39% (26776/37508)
Line Coverage	53.73% (279564/520283)
Region Coverage	47.08% (214867/456418)
Branch Coverage	50.36% (97288/193169)

hello-stephen · 2026-04-24T12:34:53Z

FE Regression Coverage Report

Increment line coverage 37.04% (70/189) 🎉
Increment coverage report
Complete coverage report

hubgeter · 2026-04-25T13:36:14Z

run buildall

hubgeter · 2026-04-25T13:36:23Z

/review

github-actions · 2026-04-25T15:32:28Z

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24932133823

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

hello-stephen · 2026-04-25T16:05:33Z

FE Regression Coverage Report

Increment line coverage 0.00% (0/91) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-25T16:07:42Z

BE Regression && UT Coverage Report

Increment line coverage 76.29% (428/561) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.71% (27641/37499)
Line Coverage	57.45% (298996/520472)
Region Coverage	54.60% (248566/455285)
Branch Coverage	56.17% (107684/191696)

hubgeter · 2026-04-26T15:04:05Z

run buildall

hello-stephen · 2026-04-26T16:38:20Z

FE UT Coverage Report

Increment line coverage 27.78% (45/162) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-26T17:54:41Z

FE Regression Coverage Report

Increment line coverage 14.72% (34/231) 🎉
Increment coverage report
Complete coverage report

hubgeter · 2026-04-27T09:41:15Z

run buildall

hello-stephen · 2026-04-27T11:06:50Z

FE UT Coverage Report

Increment line coverage 41.38% (72/174) 🎉
Increment coverage report
Complete coverage report

hubgeter · 2026-04-27T12:33:25Z

run buildall

hello-stephen · 2026-04-27T15:23:19Z

BE Regression && UT Coverage Report

Increment line coverage 81.40% (442/543) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.72% (27657/37516)
Line Coverage	57.49% (299336/520697)
Region Coverage	54.59% (248725/455608)
Branch Coverage	56.23% (107845/191798)

hello-stephen · 2026-04-27T15:30:58Z

FE Regression Coverage Report

Increment line coverage 19.10% (34/178) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-27T15:33:39Z

FE UT Coverage Report

Increment line coverage 41.38% (72/174) 🎉
Increment coverage report
Complete coverage report

hubgeter · 2026-04-29T06:42:16Z

run buildall

hubgeter · 2026-04-29T07:21:34Z

run buildall

hubgeter · 2026-04-30T09:49:42Z

run buildall

Gabriel39 · 2026-05-06T08:20:31Z

/review

github-actions

I found two issues that should be fixed before merging. The main correctness concern is that Paimon partition keys are now exposed as path partition keys using the original Paimon key names, while Doris external table columns are normalized to lower case; mixed-case partition columns can therefore stop being treated as partition columns and fail to materialize metadata partition values. There is also a regression-test portability issue from hard-coding a specific EMR HDFS namenode instead of using the configured test environment.

Critical checkpoint conclusions:
- Goal: the PR addresses metadata partition values for Iceberg/Paimon, condition-cache safety, and Iceberg name mapping, with added unit/regression coverage, but the Paimon mixed-case partition path is not fully correct.
- Scope: the changes are generally focused, though the new regression test contains environment-specific configuration.
- Concurrency/lifecycle: no new concurrency or non-obvious lifecycle issue found in the reviewed paths.
- Config/compatibility: no new Doris config or storage-format incompatibility found; FE/BE path metadata remains within existing thrift fields.
- Parallel paths: Iceberg mostly lowercases identity partition keys, but Paimon does not, which causes the separate issue noted inline.
- Tests: coverage was added for lower-case migrated partitions, but not for mixed-case Paimon partition column names, and the new Paimon p2 suite may not run outside the author's environment.
- Observability/performance: no additional blocking issue found beyond the correctness and test-portability concerns.
- User focus: no additional user-provided review focus was supplied.

github-actions · 2026-05-06T08:37:52Z

+            return Collections.emptyList();
+        }
+        return new ArrayList<>(source.getPaimonTable().partitionKeys());
+    }


This returns Paimon's original partition key names, but Doris external Paimon columns are normalized to lower case when the schema is built (PaimonExternalTable.initSchema). For a table with a mixed-case partition key such as Dt, classifyColumn() compares the lower-case slot name (dt) with partitionKeys containing Dt, so the partition column is classified as REGULAR. Later, setPaimonParams() also writes columns_from_path_keys as Dt, which BE compares against the lower-case ColumnDescriptor.name, so the metadata partition value is not filled in. This regresses exactly the metadata-partition path this PR adds, for any migrated Paimon table whose partition column names are not already lower case. Please normalize the returned keys, and the keys stored by PaimonUtil.getPartitionInfoMap, to the same lower-case names used in the Doris schema, and add a test with a mixed-case partition column.
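The fix requested above can be sketched in a few lines of Java. This is a minimal sketch, assuming Doris column names are already lower case: `PartitionKeyNormalizer`, `normalize`, `classify`, and `ColumnCategory` are hypothetical stand-ins, and the real classifyColumn() and PaimonUtil.getPartitionInfoMap are not reproduced here. It shows why an exact-match lookup misses `Dt` and how lower-casing the Paimon keys restores the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: normalize Paimon partition keys to match lower-case Doris column names. */
public final class PartitionKeyNormalizer {

    public enum ColumnCategory { PARTITION, REGULAR }

    private PartitionKeyNormalizer() {}

    /** Lower-cases each key with Locale.ROOT so the result does not depend on the JVM default locale. */
    public static List<String> normalize(List<String> paimonPartitionKeys) {
        List<String> normalized = new ArrayList<>(paimonPartitionKeys.size());
        for (String key : paimonPartitionKeys) {
            normalized.add(key.toLowerCase(Locale.ROOT));
        }
        return normalized;
    }

    /** Simplified stand-in for classifyColumn(): an exact, case-sensitive match against the keys. */
    public static ColumnCategory classify(String dorisColumnName, List<String> partitionKeys) {
        return partitionKeys.contains(dorisColumnName) ? ColumnCategory.PARTITION : ColumnCategory.REGULAR;
    }
}
```

Per the review, the same normalized names would also need to be used for the keys written to columns_from_path_keys, so that the BE-side comparison against ColumnDescriptor.name succeeds.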

github-actions · 2026-05-06T08:37:52Z

+        "type" = "paimon",
+        "paimon.catalog.type" = "hms",
+        "warehouse" = "hdfs://master-1-1.c-a212282673679a24.cn-beijing.emr.aliyuncs.com:9000/user/hive/warehouse/",
+        'hive.version' = '3.1.3',


This hard-codes one EMR cluster's HDFS namenode into the regression test. The same suite already gates on enableExternalEmrTest and reads emrCatalogCommonProp, so in other regression environments this catalog will point at an unreachable host even when the EMR/HMS properties are configured correctly. Please derive the warehouse from the regression config (for example the same external/env property used by the EMR catalog setup) instead of committing an environment-specific hostname.

hubgeter marked this pull request as draft April 24, 2026 10:32

hubgeter force-pushed the fix_refactor_error branch from e4393e5 to ab89427 April 26, 2026 15:03

hubgeter force-pushed the fix_refactor_error branch 2 times, most recently from 8e8c7c1 to c2cf480 April 27, 2026 09:40

hubgeter force-pushed the fix_refactor_error branch 2 times, most recently from a2ca7ab to 39c2537 April 29, 2026 06:42

hubgeter changed the title from "fix refactor error" to "[fix](multi-catalog) fixes some issues caused by the data_lake_reader refactoring #62306 and legacy issues" Apr 29, 2026

hubgeter force-pushed the fix_refactor_error branch from 39c2537 to 9dfaed5 April 29, 2026 07:21

hubgeter marked this pull request as ready for review April 29, 2026 07:21

fix refactor error

67b3ec3

hubgeter force-pushed the fix_refactor_error branch from 9dfaed5 to 67b3ec3 April 30, 2026 09:49

github-actions Bot requested changes May 6, 2026

4 participants
