
fix(ingest/powerbi): strip Athena catalog prefix from ODBC navigation lineage URNs #17729

Open
zlosim wants to merge 1 commit into datahub-project:master from zlosim:fix/powerbi-odbc-navigation-athena-catalog-strip

@zlosim zlosim commented Jun 4, 2026

Summary

PowerBI reports that connect to Athena via Odbc.DataSource(..., [HierarchicalNavigation=true]) and navigate Catalog → Schema → Table produce upstream lineage URNs that keep the Athena catalog as a third name segment, e.g.:

urn:li:dataset:(urn:li:dataPlatform:athena,awsdatacatalog.dimensions.country,PROD)

The standalone Athena connector emits 2-part athena,dimensions.country,PROD, so these navigation-based PowerBI edges dangle to orphan stub datasets and no lineage shows in the UI.
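The mismatch can be illustrated with plain string formatting. This is only a sketch: `dataset_urn` is a hypothetical helper written for this example, not a DataHub API, and it simply reproduces the URN shape quoted above.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a dataset URN string in the shape quoted above (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# URN produced by the PowerBI navigation path: catalog kept as a third segment.
powerbi_urn = dataset_urn("athena", "awsdatacatalog.dimensions.country")

# URN produced by the standalone Athena connector: 2-part name.
athena_urn = dataset_urn("athena", "dimensions.country")

# The strings differ, so the lineage edge points at a dataset that was never ingested.
urns_match = powerbi_urn == athena_urn
```

Because the two strings are compared as opaque identifiers, even a single extra name segment is enough to break the edge.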

Problem

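OdbcLineage has two paths:

  • query_lineage() (SQL path): already normalized Athena lineage (catalog stripping + federated _apply_table_platform_override).
  • expression_lineage() (navigation path): applied neither step.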

So catalog stripping only ran for the SQL path. Reports using navigation (the common case for Athena-via-ODBC) got mismatched URNs.

Solution

  • Extract the Athena post-processing (catalog stripping + federated _apply_table_platform_override) into a shared OdbcLineage._apply_athena_post_processing(lineage, platform_pair, dsn) helper, gated on platform == athena (no-op otherwise).
  • Call it from both query_lineage() and expression_lineage() (the latter now threads dsn so federated overrides work there too). Single source of truth — the two paths can't drift.
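The shape of the change can be sketched roughly as follows. This is a simplified stand-in, not the code in the PR: `Upstream`, the free functions, and the `overrides` dict (qualified name to platform) are assumptions made for illustration, and the real `dsn` parameter and override configuration are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Upstream:
    platform: str
    name: str  # dot-separated qualified name


def apply_athena_post_processing(
    upstreams: List[Upstream],
    platform: str,
    overrides: Optional[Dict[str, str]] = None,  # hypothetical: name -> platform
) -> List[Upstream]:
    """Shared post-processing: a no-op unless the platform is athena."""
    if platform != "athena":
        return upstreams
    overrides = overrides or {}
    result = []
    for up in upstreams:
        parts = up.name.split(".")
        # Strip the catalog only from 3-part names.
        name = ".".join(parts[1:]) if len(parts) == 3 else up.name
        # Stand-in for the federated platform override.
        result.append(Upstream(overrides.get(name, up.platform), name))
    return result


def query_lineage(names, platform, overrides=None):
    """SQL path: names are assumed to be already extracted from the query."""
    upstreams = [Upstream(platform, n) for n in names]
    return apply_athena_post_processing(upstreams, platform, overrides)


def expression_lineage(levels, platform, overrides=None):
    """Navigation path: levels are the Catalog, Schema, Table steps."""
    upstreams = [Upstream(platform, ".".join(levels))]
    return apply_athena_post_processing(upstreams, platform, overrides)
```

With both entry points delegating to one function, a call such as `expression_lineage(["AwsDataCatalog", "dimensions", "country"], "athena")` and the equivalent `query_lineage` call go through the same normalization.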

Known limitation

Only 3-part navigation names are catalog-stripped. A degenerate 2-level navigation (Database + Table, no Schema) keeps the catalog because the single level is ambiguous (catalog vs glue database) and the shared stripper can't safely strip 2-part names (the SQL path uses 2-part for a legitimate database.table). This matches the native Athena connector, which requires a full catalog.schema.table hierarchy. Documented in code comments.
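A minimal sketch of why the rule keys on segment count, assuming names are plain dot-separated strings (`strip_athena_catalog` is a hypothetical name for this example, not the function in the PR):

```python
def strip_athena_catalog(qualified_name: str) -> str:
    """Drop the leading catalog segment from 3-part names only."""
    parts = qualified_name.split(".")
    if len(parts) == 3:
        return ".".join(parts[1:])
    # 2-part names are returned unchanged: by shape alone, catalog.table
    # cannot be told apart from a legitimate database.table.
    return qualified_name


# Full Catalog -> Schema -> Table navigation: catalog is stripped.
full_navigation = strip_athena_catalog("awsdatacatalog.dimensions.country")

# SQL path: a legitimate database.table name must survive untouched.
sql_path = strip_athena_catalog("dimensions.country")

# Degenerate 2-level navigation: same shape as the SQL case, so the catalog stays.
degenerate = strip_athena_catalog("awsdatacatalog.country")
```

The last two inputs have the same structure, which is why the stripper leaves both alone rather than guessing.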

Testing

  • test_odbc_expression_lineage_strips_athena_catalog — navigation AwsDataCatalog → dimensions → country yields athena,dimensions.country,PROD.
  • test_odbc_expression_lineage_integration_catalog_stripping_and_platform_override — navigation + override config → mysql,federated.orders,PROD (exercises both strip and override through expression_lineage).
  • ./gradlew :metadata-ingestion:testSingle -PtestFile=tests/unit/test_powerbi_parser.py → 29 passed.
  • ./gradlew :metadata-ingestion:lintFix → clean.

Checklist

  • PR conforms to the Contributing Guideline (PR title format)
  • Tests added
  • Breaking changes — none
  • Docs — none (behavior fix)

… lineage URNs

PowerBI reports that connect to Athena via Odbc.DataSource with
HierarchicalNavigation surface the Athena catalog (e.g. "AwsDataCatalog") as the
Database navigation level, producing 3-part catalog.database.table qualified
names. The standalone Athena connector omits the catalog and uses
database.table, so these navigation-based upstream URNs never match the Athena
entities and lineage silently dangles.

query_lineage() already normalized Athena lineage (catalog stripping +
federated _apply_table_platform_override). This extracts that into a shared
_apply_athena_post_processing() helper and applies it to the navigation path in
expression_lineage() too, threading the DSN through so federated overrides work
there as well.

Tests: add a navigation catalog-stripping test plus a navigation integration
test asserting catalog stripping + athena->mysql override run together.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the ingestion label (PR or Issue related to the ingestion of metadata) Jun 4, 2026

github-actions Bot commented Jun 4, 2026

Linear: ING-2807

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@github-actions github-actions Bot added the community-contribution label (PR or Issue raised by member(s) of DataHub Community) Jun 4, 2026

codecov Bot commented Jun 4, 2026

Bundle Report

Bundle size has no change ✅

@maggiehays maggiehays added the needs-review label (Label for PRs that need review from a maintainer.) Jun 4, 2026