feat(ownership-filter): aim group-ownership GraphQL filter by LynxPx · Pull Request #17695 · datahub-project/datahub

LynxPx · 2026-06-03T04:35:38Z

Group-ownership filtering of search / autocomplete / lineage / browse via a graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream resolver or service edits).
Unowned assets are visible to all users (owners-EXISTS-negated disjunct).
Domain/platform cards and domain/platform search scoped to the actor's accessible set (DomainPlatformAccessResolver), including home-page recommendation-card filtering.
Keycloak / plain-JWT authenticator registered into DataHub's auth chain (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default.
QueryContext read from both the GraphQL context map and the legacy DataFetchingEnvironment context, so one module works on master and v1.3.x.

…(JDK 21) Ports the complete datahub-ownership-filter module onto master, carrying every constraint built during the v1.3.0.1 work, with the same minimal-footprint, sits-above-OSS design: - Group-ownership filtering of search / autocomplete / lineage / browse via a graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream resolver or service edits). - Unowned assets are visible to all users (owners-EXISTS-negated disjunct). - Domain/platform cards and domain/platform search scoped to the actor's accessible set (DomainPlatformAccessResolver), including home-page recommendation-card filtering. - Keycloak / plain-JWT authenticator registered into DataHub's auth chain (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default. - QueryContext read from both the GraphQL context map and the legacy DataFetchingEnvironment context, so one module works on master and v1.3.x. Upstream footprint is 2 wiring lines (settings.gradle, metadata-service/war/build.gradle). Module toolchain set to JDK 21 to match master; compiles against master APIs (graphql-java 22.3); 34 unit tests pass.

github-actions · 2026-06-03T04:35:47Z

Linear: CAT-2291

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

aikido-pr-checks · 2026-06-03T04:36:10Z

CVE-2026-40976

CVE-2026-40976 in org.springframework.boot:spring-boot - critical severity
In certain circumstances, Spring Boot's default web security is ineffective allowing unauthorized access to all endpoints. For an application to be vulnerable, it must: be a servlet-based web application; have no Spring Security configuration of its own and rely on the default web security filter chain; depend on spring-boot-actuator-autoconfigure; not depend on spring-boot-health. If any of the above does not apply, the application is not vulnerable.

Affected: Spring Boot 4.0.0–4.0.5; upgrade to 4.0.6 or later per vendor advisory.

Details

Remediation Aikido suggests bumping this package to version 4.0.6 to resolve this issue

_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
More info

…set entities DomainPlatformAccessResolver passed entityNames=null to EntitySearchService.aggregateByValue. On DataHub v1.5.x that queries EVERY entity index (getAllEntityIndicesPatterns) while the ownership filter is only transformed for the default specs; on indices without an `owners` field the `owners EXISTS negated` clause then matches every document, so the accessible domain/platform set was computed over all entities. Effect: home-page Domains/Platforms recommendation cards were unfiltered (and on some ES/OpenSearch builds the same query returns empty or throws, blanking the cards). The dataset search path was unaffected because it never aggregates across all indices like this. Scope both aggregations to the concrete asset entity types (intersected with the live registry so unknown names are skipped), keeping the predicate meaningful and consistent with the asset search filter. Adds a regression test asserting the aggregation entityNames are asset-scoped and never null.

codecov · 2026-06-03T16:19:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

… assets only The prior fix scoped the access aggregation to a HARDCODED asset-entity list, which silently dropped any domain whose assets are an entity type not in the list (e.g. custom production entities) — and the broader owns-OR-unowned predicate leaked domains through unowned sub-entities: a schemaField of a DHS-owned dataset inherits domains=DHS but carries no owners, so the EXISTS-negated (unowned) disjunct matched it and every actor saw DHS. Two changes, both removing brittleness: - Scope the aggregation to entity types that carry the `domains` aspect, read LIVE from the registry — dynamic, no hardcoded list, automatically covers custom entities, and excludes the domain/platform entities themselves (whose self-referential fields would otherwise re-add every value). - Use an ownership-ONLY predicate (owners EQUAL [actor, groups]) for this computation, dropping the unowned/EXISTS-negated disjunct so nothing the actor does not own can leak a domain/platform. The accessible set is now exactly "domains/platforms where the actor or one of their groups owns an asset", which is the intended rule. Asset search keeps unowned-visible via its own filter. Verified end-to-end: a CBP-only user resolves to {CBP} on both the home-page Domains/Platforms modules and domain/platform search; a CBP+DHS user -> {CBP,DHS}; admin bypass unaffected. Regression test updated.

…; fail-safe home-page recs JWT/OAuth identity resolution: OAuth/OIDC flows can authenticate under a synthetic actor URN (urn:li:corpuser:__oauth_<issuer>_<subject>) while ownership/group data is keyed on the real user (urn:li:corpuser:alice). Filtering by the synthetic actor matched no owners/groups, so the user saw empty/incorrect results (including empty home-page domain/platform modules). OwnershipInstrumentation now resolves the "effective actor": when the actor URN has the synthetic OAuth prefix, it reads the real user URN from the __datahub_user_urn claim the auth layer attaches, and uses it for group resolution, ownership-filter injection, and domain/platform scoping. This is identity resolution for authorization filtering only -- it does not validate the token (the authenticator does that). Fail-safe recommendations: The home-page domain/platform recommendation filtering is best-effort navigation, not the security boundary (assets are filtered in search, fail-closed). If access computation or filtering throws, return recommendations UNFILTERED rather than blanking the entire home page -- so one failing aggregation can't take down all home-page modules (Platforms included). Adds oauthSyntheticActorUsesResolvedUserUrnWhenAvailable. 36 unit tests pass.

…I clients An external service calling the GraphQL API with a raw Keycloak JWT never goes through interactive OIDC login, so DataHub has no groupMembership aspect for it. The ownership filter derived groups solely from that aspect, so group-owned assets were invisible -- a service that owns assets via its groups saw ~0 results. OwnershipInstrumentation now reads the caller's groups from the JWT `groups` claim and unions them with DataHub-recorded membership, mapping each name to urn:li:corpGroup:<URLEncoder.encode(name)> -- the exact encoding OidcCallbackLogic uses when provisioning groups (matches e.g. urn:li:corpGroup:%2FDHS%2FCBP). Best-effort: a malformed/absent claim falls back to DataHub membership; PAT auth (no groups claim) is unaffected. Reproduced with an unprovisioned Keycloak user (in group CBP, no DataHub membership): visible datasets 2 -> 13 and domains [] -> [CBP], with provisioned users and PAT auth unchanged. Adds jwtGroupsClaimAugmentsOwnershipFilterForUnprovisionedUser. 37 unit tests pass.

…in (default off) Adds OWNERSHIP_FILTER_RECOMMENDATIONS_ENABLED (default false). When disabled -- the default -- listRecommendations is not intercepted at all, so the home-page Domains/Platforms modules render exactly like stock DataHub (unfiltered). Set the flag to true to scope those cards to the actor's accessible domains/platforms. Asset search ownership filtering (the security boundary) is unaffected by this flag and remains always on. Verified end-to-end: with the flag off a CBP-only user sees all domains on the home page (stock) while still seeing only their 13 owned datasets and CBP-scoped domain search. 38 unit tests pass (adds listRecommendationsWrappedOnlyWhenRecommendationFilteringEnabled).

github-actions Bot added devops PR or Issue related to DataHub backend & deployment smoke_test Contains changes related to smoke tests labels Jun 3, 2026

github-actions Bot added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jun 3, 2026

aikido-pr-checks Bot reviewed Jun 3, 2026

View reviewed changes

vercel Bot deployed to Preview June 3, 2026 04:50 View deployment

maggiehays added the needs-review Label for PRs that need review from a maintainer. label Jun 3, 2026

vercel Bot deployed to Preview June 3, 2026 16:07 View deployment

vercel Bot deployed to Preview June 3, 2026 17:07 View deployment

vercel Bot deployed to Preview June 3, 2026 19:29 View deployment

vercel Bot deployed to Preview June 3, 2026 20:03 View deployment

vercel Bot deployed to Preview June 4, 2026 17:15 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ownership-filter): aim group-ownership GraphQL filter #17695

feat(ownership-filter): aim group-ownership GraphQL filter #17695
LynxPx wants to merge 6 commits into
datahub-project:masterfrom
LynxPx:feature/graphql-security-master

LynxPx commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

aikido-pr-checks Bot Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

LynxPx commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

aikido-pr-checks Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 3, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants