feat(ownership-filter): aim group-ownership GraphQL filter #17695
feat(ownership-filter): aim group-ownership GraphQL filter #17695LynxPx wants to merge 6 commits into
Conversation
LynxPx
commented
Jun 3, 2026
- Group-ownership filtering of search / autocomplete / lineage / browse via a graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream resolver or service edits).
- Unowned assets are visible to all users (owners-EXISTS-negated disjunct).
- Domain/platform cards and domain/platform search scoped to the actor's accessible set (DomainPlatformAccessResolver), including home-page recommendation-card filtering.
- Keycloak / plain-JWT authenticator registered into DataHub's auth chain (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default.
- QueryContext read from both the GraphQL context map and the legacy DataFetchingEnvironment context, so one module works on master and v1.3.x.
…(JDK 21) Ports the complete datahub-ownership-filter module onto master, carrying every constraint built during the v1.3.0.1 work, with the same minimal-footprint, sits-above-OSS design: - Group-ownership filtering of search / autocomplete / lineage / browse via a graphql-java Instrumentation chained onto the graphQLEngine bean (no upstream resolver or service edits). - Unowned assets are visible to all users (owners-EXISTS-negated disjunct). - Domain/platform cards and domain/platform search scoped to the actor's accessible set (DomainPlatformAccessResolver), including home-page recommendation-card filtering. - Keycloak / plain-JWT authenticator registered into DataHub's auth chain (KeycloakJwtAuthenticator + JwksSigningKeyResolver). Inert unless KEYCLOAK_JWKS_URI is set, so the build ships safely disabled by default. - QueryContext read from both the GraphQL context map and the legacy DataFetchingEnvironment context, so one module works on master and v1.3.x. Upstream footprint is 2 wiring lines (settings.gradle, metadata-service/war/build.gradle). Module toolchain set to JDK 21 to match master; compiles against master APIs (graphql-java 22.3); 34 unit tests pass.
|
Linear: CAT-2291 Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned. |
There was a problem hiding this comment.
CVE-2026-40976 in org.springframework.boot:spring-boot - critical severity
In certain circumstances, Spring Boot's default web security is ineffective allowing unauthorized access to all endpoints. For an application to be vulnerable, it must: be a servlet-based web application; have no Spring Security configuration of its own and rely on the default web security filter chain; depend on spring-boot-actuator-autoconfigure; not depend on spring-boot-health. If any of the above does not apply, the application is not vulnerable.
Affected: Spring Boot 4.0.0–4.0.5; upgrade to 4.0.6 or later per vendor advisory.
Details
Remediation Aikido suggests bumping this package to version 4.0.6 to resolve this issue
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
…set entities DomainPlatformAccessResolver passed entityNames=null to EntitySearchService.aggregateByValue. On DataHub v1.5.x that queries EVERY entity index (getAllEntityIndicesPatterns) while the ownership filter is only transformed for the default specs; on indices without an `owners` field the `owners EXISTS negated` clause then matches every document, so the accessible domain/platform set was computed over all entities. Effect: home-page Domains/Platforms recommendation cards were unfiltered (and on some ES/OpenSearch builds the same query returns empty or throws, blanking the cards). The dataset search path was unaffected because it never aggregates across all indices like this. Scope both aggregations to the concrete asset entity types (intersected with the live registry so unknown names are skipped), keeping the predicate meaningful and consistent with the asset search filter. Adds a regression test asserting the aggregation entityNames are asset-scoped and never null.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
… assets only
The prior fix scoped the access aggregation to a HARDCODED asset-entity list,
which silently dropped any domain whose assets are an entity type not in the list
(e.g. custom production entities) — and the broader owns-OR-unowned predicate
leaked domains through unowned sub-entities: a schemaField of a DHS-owned dataset
inherits domains=DHS but carries no owners, so the EXISTS-negated (unowned)
disjunct matched it and every actor saw DHS.
Two changes, both removing brittleness:
- Scope the aggregation to entity types that carry the `domains` aspect, read
LIVE from the registry — dynamic, no hardcoded list, automatically covers
custom entities, and excludes the domain/platform entities themselves (whose
self-referential fields would otherwise re-add every value).
- Use an ownership-ONLY predicate (owners EQUAL [actor, groups]) for this
computation, dropping the unowned/EXISTS-negated disjunct so nothing the actor
does not own can leak a domain/platform. The accessible set is now exactly
"domains/platforms where the actor or one of their groups owns an asset", which
is the intended rule. Asset search keeps unowned-visible via its own filter.
Verified end-to-end: a CBP-only user resolves to {CBP} on both the home-page
Domains/Platforms modules and domain/platform search; a CBP+DHS user -> {CBP,DHS};
admin bypass unaffected. Regression test updated.
…; fail-safe home-page recs JWT/OAuth identity resolution: OAuth/OIDC flows can authenticate under a synthetic actor URN (urn:li:corpuser:__oauth_<issuer>_<subject>) while ownership/group data is keyed on the real user (urn:li:corpuser:alice). Filtering by the synthetic actor matched no owners/groups, so the user saw empty/incorrect results (including empty home-page domain/platform modules). OwnershipInstrumentation now resolves the "effective actor": when the actor URN has the synthetic OAuth prefix, it reads the real user URN from the __datahub_user_urn claim the auth layer attaches, and uses it for group resolution, ownership-filter injection, and domain/platform scoping. This is identity resolution for authorization filtering only -- it does not validate the token (the authenticator does that). Fail-safe recommendations: The home-page domain/platform recommendation filtering is best-effort navigation, not the security boundary (assets are filtered in search, fail-closed). If access computation or filtering throws, return recommendations UNFILTERED rather than blanking the entire home page -- so one failing aggregation can't take down all home-page modules (Platforms included). Adds oauthSyntheticActorUsesResolvedUserUrnWhenAvailable. 36 unit tests pass.
…I clients An external service calling the GraphQL API with a raw Keycloak JWT never goes through interactive OIDC login, so DataHub has no groupMembership aspect for it. The ownership filter derived groups solely from that aspect, so group-owned assets were invisible -- a service that owns assets via its groups saw ~0 results. OwnershipInstrumentation now reads the caller's groups from the JWT `groups` claim and unions them with DataHub-recorded membership, mapping each name to urn:li:corpGroup:<URLEncoder.encode(name)> -- the exact encoding OidcCallbackLogic uses when provisioning groups (matches e.g. urn:li:corpGroup:%2FDHS%2FCBP). Best-effort: a malformed/absent claim falls back to DataHub membership; PAT auth (no groups claim) is unaffected. Reproduced with an unprovisioned Keycloak user (in group CBP, no DataHub membership): visible datasets 2 -> 13 and domains [] -> [CBP], with provisioned users and PAT auth unchanged. Adds jwtGroupsClaimAugmentsOwnershipFilterForUnprovisionedUser. 37 unit tests pass.
…in (default off) Adds OWNERSHIP_FILTER_RECOMMENDATIONS_ENABLED (default false). When disabled -- the default -- listRecommendations is not intercepted at all, so the home-page Domains/Platforms modules render exactly like stock DataHub (unfiltered). Set the flag to true to scope those cards to the actor's accessible domains/platforms. Asset search ownership filtering (the security boundary) is unaffected by this flag and remains always on. Verified end-to-end: with the flag off a CBP-only user sees all domains on the home page (stock) while still seeing only their 13 owned datasets and CBP-scoped domain search. 38 unit tests pass (adds listRecommendationsWrappedOnlyWhenRecommendationFilteringEnabled).