Skip to content

feat(ownership-filter): aim group-ownership GraphQL filter #17695

Open
LynxPx wants to merge 6 commits into
datahub-project:masterfrom
LynxPx:feature/graphql-security-master
Open

feat(ownership-filter): aim group-ownership GraphQL filter #17695
LynxPx wants to merge 6 commits into
datahub-project:masterfrom
LynxPx:feature/graphql-security-master

Conversation

@LynxPx
Copy link
Copy Markdown

@LynxPx LynxPx commented Jun 3, 2026

  • Group-ownership filtering of search / autocomplete / lineage / browse via a graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream resolver or service edits).
  • Unowned assets are visible to all users (owners-EXISTS-negated disjunct).
  • Domain/platform cards and domain/platform search scoped to the actor's accessible set (DomainPlatformAccessResolver), including home-page recommendation-card filtering.
  • Keycloak / plain-JWT authenticator registered into DataHub's auth chain (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default.
  • QueryContext read from both the GraphQL context map and the legacy DataFetchingEnvironment context, so one module works on master and v1.3.x.

…(JDK 21)

Ports the complete datahub-ownership-filter module onto master, carrying every
constraint built during the v1.3.0.1 work, with the same minimal-footprint,
sits-above-OSS design:

- Group-ownership filtering of search / autocomplete / lineage / browse via a
  graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream
  resolver or service edits).
- Unowned assets are visible to all users (owners-EXISTS-negated disjunct).
- Domain/platform cards and domain/platform search scoped to the actor's
  accessible set (DomainPlatformAccessResolver), including home-page
  recommendation-card filtering.
- Keycloak / plain-JWT authenticator registered into DataHub's auth chain
  (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless
  KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default.
- QueryContext read from both the GraphQL context map and the legacy
  DataFetchingEnvironment context, so one module works on master and v1.3.x.

Upstream footprint is 2 wiring lines (settings.gradle,
metadata-service/war/build.gradle). Module toolchain set to JDK 21 to match
master; compiles against master APIs (graphql-java 22.3); 34 unit tests pass.
@github-actions github-actions Bot added devops PR or Issue related to DataHub backend & deployment smoke_test Contains changes related to smoke tests labels Jun 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Linear: CAT-2291

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@github-actions github-actions Bot added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jun 3, 2026
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CVE-2026-40976 in org.springframework.boot:spring-boot - critical severity
In certain circumstances, Spring Boot's default web security is ineffective allowing unauthorized access to all endpoints. For an application to be vulnerable, it must: be a servlet-based web application; have no Spring Security configuration of its own and rely on the default web security filter chain; depend on spring-boot-actuator-autoconfigure; not depend on spring-boot-health. If any of the above does not apply, the application is not vulnerable.

Affected: Spring Boot 4.0.0–4.0.5; upgrade to 4.0.6 or later per vendor advisory.

Details

Remediation Aikido suggests bumping this package to version 4.0.6 to resolve this issue

Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info

@maggiehays maggiehays added the needs-review Label for PRs that need review from a maintainer. label Jun 3, 2026
…set entities

DomainPlatformAccessResolver passed entityNames=null to
EntitySearchService.aggregateByValue. On DataHub v1.5.x that queries EVERY entity
index (getAllEntityIndicesPatterns) while the ownership filter is only transformed
for the default specs; on indices without an `owners` field the `owners EXISTS
negated` clause then matches every document, so the accessible domain/platform set
was computed over all entities. Effect: home-page Domains/Platforms recommendation
cards were unfiltered (and on some ES/OpenSearch builds the same query returns
empty or throws, blanking the cards). The dataset search path was unaffected
because it never aggregates across all indices like this.

Scope both aggregations to the concrete asset entity types (intersected with the
live registry so unknown names are skipped), keeping the predicate meaningful and
consistent with the asset search filter. Adds a regression test asserting the
aggregation entityNames are asset-scoped and never null.
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

… assets only

The prior fix scoped the access aggregation to a HARDCODED asset-entity list,
which silently dropped any domain whose assets are an entity type not in the list
(e.g. custom production entities) — and the broader owns-OR-unowned predicate
leaked domains through unowned sub-entities: a schemaField of a DHS-owned dataset
inherits domains=DHS but carries no owners, so the EXISTS-negated (unowned)
disjunct matched it and every actor saw DHS.

Two changes, both removing brittleness:
- Scope the aggregation to entity types that carry the `domains` aspect, read
  LIVE from the registry — dynamic, no hardcoded list, automatically covers
  custom entities, and excludes the domain/platform entities themselves (whose
  self-referential fields would otherwise re-add every value).
- Use an ownership-ONLY predicate (owners EQUAL [actor, groups]) for this
  computation, dropping the unowned/EXISTS-negated disjunct so nothing the actor
  does not own can leak a domain/platform. The accessible set is now exactly
  "domains/platforms where the actor or one of their groups owns an asset", which
  is the intended rule. Asset search keeps unowned-visible via its own filter.

Verified end-to-end: a CBP-only user resolves to {CBP} on both the home-page
Domains/Platforms modules and domain/platform search; a CBP+DHS user -> {CBP,DHS};
admin bypass unaffected. Regression test updated.
…; fail-safe home-page recs

JWT/OAuth identity resolution:
OAuth/OIDC flows can authenticate under a synthetic actor URN
(urn:li:corpuser:__oauth_<issuer>_<subject>) while ownership/group data is keyed
on the real user (urn:li:corpuser:alice). Filtering by the synthetic actor matched
no owners/groups, so the user saw empty/incorrect results (including empty
home-page domain/platform modules). OwnershipInstrumentation now resolves the
"effective actor": when the actor URN has the synthetic OAuth prefix, it reads the
real user URN from the __datahub_user_urn claim the auth layer attaches, and uses
it for group resolution, ownership-filter injection, and domain/platform scoping.
This is identity resolution for authorization filtering only -- it does not
validate the token (the authenticator does that).

Fail-safe recommendations:
The home-page domain/platform recommendation filtering is best-effort navigation,
not the security boundary (assets are filtered in search, fail-closed). If access
computation or filtering throws, return recommendations UNFILTERED rather than
blanking the entire home page -- so one failing aggregation can't take down all
home-page modules (Platforms included).

Adds oauthSyntheticActorUsesResolvedUserUrnWhenAvailable. 36 unit tests pass.
…I clients

An external service calling the GraphQL API with a raw Keycloak JWT never goes
through interactive OIDC login, so DataHub has no groupMembership aspect for it.
The ownership filter derived groups solely from that aspect, so group-owned assets
were invisible -- a service that owns assets via its groups saw ~0 results.

OwnershipInstrumentation now reads the caller's groups from the JWT `groups` claim
and unions them with DataHub-recorded membership, mapping each name to
urn:li:corpGroup:<URLEncoder.encode(name)> -- the exact encoding OidcCallbackLogic
uses when provisioning groups (matches e.g. urn:li:corpGroup:%2FDHS%2FCBP).
Best-effort: a malformed/absent claim falls back to DataHub membership; PAT auth
(no groups claim) is unaffected.

Reproduced with an unprovisioned Keycloak user (in group CBP, no DataHub
membership): visible datasets 2 -> 13 and domains [] -> [CBP], with provisioned
users and PAT auth unchanged. Adds
jwtGroupsClaimAugmentsOwnershipFilterForUnprovisionedUser. 37 unit tests pass.
…in (default off)

Adds OWNERSHIP_FILTER_RECOMMENDATIONS_ENABLED (default false). When disabled -- the
default -- listRecommendations is not intercepted at all, so the home-page
Domains/Platforms modules render exactly like stock DataHub (unfiltered). Set the
flag to true to scope those cards to the actor's accessible domains/platforms.

Asset search ownership filtering (the security boundary) is unaffected by this flag
and remains always on. Verified end-to-end: with the flag off a CBP-only user sees
all domains on the home page (stock) while still seeing only their 13 owned
datasets and CBP-scoped domain search. 38 unit tests pass (adds
listRecommendationsWrappedOnlyWhenRecommendationFilteringEnabled).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution PR or Issue raised by member(s) of DataHub Community devops PR or Issue related to DataHub backend & deployment needs-review Label for PRs that need review from a maintainer. smoke_test Contains changes related to smoke tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants