64 changes: 64 additions & 0 deletions datahub-ownership-filter/INTERFACES.md
@@ -0,0 +1,64 @@
# Coupling Surface — Upgrade Checklist

Read this file on every DataHub version bump. For each row, verify the listed signature
still exists in the new version; if not, update the corresponding wrapper.
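
Part of this check can be automated. The following is a minimal sketch, not part of the module: it looks up each checklist row reflectively and reports rows whose class or public method name no longer resolves. `CouplingCheckSketch` and `Row` are hypothetical names, the rows in `main` use JDK types as stand-ins so the example runs without DataHub on the classpath, and it matches method names only, not full parameter lists.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: verifies that checklist rows still resolve on the classpath.
final class CouplingCheckSketch {

    /** One checklist row: a fully qualified class name and a public method expected on it. */
    static final class Row {
        final String className;
        final String methodName;

        Row(String className, String methodName) {
            this.className = className;
            this.methodName = methodName;
        }
    }

    /** Returns one human-readable entry per row that no longer resolves. */
    static List<String> missing(List<Row> rows) {
        List<String> problems = new ArrayList<>();
        for (Row row : rows) {
            try {
                Class<?> type = Class.forName(row.className);
                boolean found = false;
                // getMethods() covers public methods only, including inherited ones.
                for (Method m : type.getMethods()) {
                    if (m.getName().equals(row.methodName)) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    problems.add(row.className + "#" + row.methodName + " (method not found)");
                }
            } catch (ClassNotFoundException e) {
                problems.add(row.className + " (class not found)");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // JDK stand-ins; a real run would list the DataHub rows from the tables below.
        List<Row> rows = List.of(
                new Row("java.util.List", "size"),
                new Row("java.util.List", "noSuchMethod"));
        missing(rows).forEach(System.out::println);
        // prints: java.util.List#noSuchMethod (method not found)
    }
}
```

Run as a unit test against the new DataHub jars, a check like this would flag renamed or removed types before the wrappers fail at runtime. PDL schemas and GraphQL field names still need the manual comparison.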

## Framework types (third-party — very stable)

| Type | Used in | Notes |
|---|---|---|
| `graphql.execution.instrumentation.Instrumentation` | `OwnershipInstrumentation` | graphql-java framework hook — stable since 14.x |
| `graphql.execution.instrumentation.SimplePerformantInstrumentation` | `OwnershipInstrumentation` | base class |
| `graphql.GraphQL.transform(Consumer<Builder>)` | `OwnershipFilterConfiguration.BeanPostProcessor` | stable public API |
| `graphql.GraphQL.getInstrumentation()` | same | stable public API |
| `graphql.schema.DataFetcher` | wrapping fetchers in instrumentation | core interface |

## DataHub types (project-internal)

| Type | Used in | Coupling kind |
|---|---|---|
| `com.linkedin.datahub.graphql.GraphQLEngine` (Spring bean `graphQLEngine`) | `BeanPostProcessor` target | Bean-name + reflection field name `_graphQL` |
| `com.linkedin.datahub.graphql.QueryContext` | accessed via `DataFetchingEnvironment.getGraphQlContext().get(QueryContext.class)` | API path |
| `com.datahub.authentication.group.GroupService.getGroupsForUser(opCtx, userUrn)` | `CachedGroupResolver` | method signature |
| `com.datahub.plugins.auth.authorization.Authorizer` interface | `OwnershipAuthorizer` | full interface contract |
| `com.linkedin.entity.client.SystemEntityClient.getV2(...)` | `OwnershipAuthorizer.init` | method signature |
| `com.linkedin.common.Ownership` PDL aspect | `OwnershipAuthorizer` | PDL schema |
| `com.linkedin.identity.GroupMembership` PDL aspect | `OwnershipAuthorizer` | PDL schema |
| `com.datahub.plugins.auth.authentication.Authenticator` interface | `KeycloakJwtAuthenticator` | full interface contract |
| `com.datahub.authentication.{Authentication,Actor,ActorType,AuthenticationRequest}` | `KeycloakJwtAuthenticator` | constructors / accessors |
| `com.linkedin.gms.factory.config.ConfigurationProvider` (Spring bean `configurationProvider`) | `keycloakAuthenticatorRegistrar` BeanPostProcessor | Bean-name + `getAuthentication()` |
| `com.datahub.authentication.{AuthenticationConfiguration,AuthenticatorConfiguration}` | registrar BeanPostProcessor | `getAuthenticators()/setAuthenticators()` + `setType/setConfigs` |
| `io.jsonwebtoken:jjwt 0.11.2` (`Jwts.parserBuilder`, `SigningKeyResolver`) | `KeycloakJwtAuthenticator`, `JwksSigningKeyResolver` | library API (0.12.x removed these — pin/verify on upgrade) |

## GraphQL field names (DataHub schema — stable contract)

Allowlist of GraphQL Query field names whose `DataFetcher` we wrap:

| Field name | Input type | Filter slot |
|---|---|---|
| `searchAcrossEntities` | `SearchAcrossEntitiesInput` | `orFilters: [AndFilterInput!]` |
| `scrollAcrossEntities` | `ScrollAcrossEntitiesInput` | `orFilters: [AndFilterInput!]` |
| `searchAcrossLineage` | `SearchAcrossLineageInput` | `orFilters: [AndFilterInput!]` |
| `scrollAcrossLineage` | `ScrollAcrossLineageInput` | `orFilters: [AndFilterInput!]` |
| `autoComplete` | `AutoCompleteInput` | `filters: [FacetFilterInput!]` (verify field name in v1.4) |
| `autoCompleteForMultiple` | `AutoCompleteMultipleInput` | same |
| `browse` | `BrowseInput` | `filters: [FacetFilterInput!]` (verify) |
| `browseV2` | `BrowseV2Input` | `orFilters: [AndFilterInput!]` (verify) |
| `aggregateAcrossEntities` | `AggregateAcrossEntitiesInput` | `orFilters: [AndFilterInput!]` |
| `search` (legacy) | `SearchInput` | `filters: [FacetFilterInput!]` |

## GraphQL input shapes (DataHub schema)

| Type | Fields we use |
|---|---|
| `FacetFilterInput` | `field: String!, values: [String!], condition: FilterOperator, negated: Boolean` |
| `AndFilterInput` | `and: [FacetFilterInput!]` |
| `FilterOperator.EQUAL` | enum value |
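
To illustrate how these shapes combine in the `orFilters` slot, here is a minimal sketch that models the inputs as plain maps and lists. It assumes `orFilters` is read as an OR over `AndFilterInput` clauses, so the ownership condition is appended to every clause; with no incoming filters, a single clause holding only the ownership condition is created. `OwnershipFilterSketch`, its helpers, and the `owners` field name are hypothetical and not taken from the module.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: GraphQL inputs modelled as maps, not DataHub's generated types.
final class OwnershipFilterSketch {

    // Assumption: the search index exposes owners under this field name.
    static final String OWNER_FIELD = "owners";

    /** Builds a FacetFilterInput-shaped map with condition EQUAL and negated false. */
    static Map<String, Object> facet(String field, List<String> values) {
        Map<String, Object> facet = new LinkedHashMap<>();
        facet.put("field", field);
        facet.put("values", List.copyOf(values));
        facet.put("condition", "EQUAL");
        facet.put("negated", false);
        return facet;
    }

    /** Builds an AndFilterInput-shaped map. */
    static Map<String, Object> andClause(List<Map<String, Object>> facets) {
        Map<String, Object> clause = new LinkedHashMap<>();
        clause.put("and", facets);
        return clause;
    }

    /** Returns a new orFilters list in which every clause also requires ownership. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> withOwnership(
            List<Map<String, Object>> orFilters, List<String> actorUrns) {
        List<Map<String, Object>> result = new ArrayList<>();
        if (orFilters == null || orFilters.isEmpty()) {
            result.add(andClause(List.of(facet(OWNER_FIELD, actorUrns))));
            return result;
        }
        for (Map<String, Object> clause : orFilters) {
            // Copy the existing facets so the caller's input is left untouched.
            List<Map<String, Object>> facets = new ArrayList<>();
            Object existing = clause.get("and");
            if (existing != null) {
                facets.addAll((List<Map<String, Object>>) existing);
            }
            facets.add(facet(OWNER_FIELD, actorUrns));
            result.add(andClause(facets));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> original =
                List.of(andClause(List.of(facet("platform", List.of("urn:li:dataPlatform:snowflake")))));
        System.out.println(withOwnership(original, List.of("urn:li:corpuser:alice")));
    }
}
```

Appending to every clause rather than adding a separate clause matters under the OR reading: a standalone ownership clause would widen the result set instead of narrowing it.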

## Verification command

```bash
# On every upgrade, run this and compare the output to the rows marked "(verify)" above:
grep -h -A 10 "input AutoCompleteInput\|input BrowseInput\|input BrowseV2Input" \
  datahub-graphql-core/src/main/resources/*.graphql
```
115 changes: 115 additions & 0 deletions datahub-ownership-filter/NOTES-reflection-target.md
@@ -0,0 +1,115 @@
# Reflection Target Notes: GraphQLEngine

Verified against commit `647398f588` (DataHub OSS, targeting v1.4.0).
Source file: `datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GraphQLEngine.java`

---

## Step 1: GraphQLEngine field inspection

### 1. Field name holding the `graphql.GraphQL` instance

```java
private final GraphQL _graphQL;
```

The field is named **`_graphQL`** (line 49).

### 2. Declared type

`graphql.GraphQL` — imported directly as `graphql.GraphQL` (line 12). No wrapper or subclass.

### 3. Public getter — IMPORTANT for Task 6/7

**There IS a public getter:**

```java
public GraphQL getGraphQL() {
  return _graphQL;
}
```

(lines 147–149)

**Consequence for later tasks:** Reflection on `_graphQL` is unnecessary. Use `engine.getGraphQL()`
instead. Task 6 (and any task that planned to use reflection to read the field) should call the
public getter directly. Reflection is still needed only if we need to *replace* the `GraphQL`
instance held by the engine (i.e., write back a transformed instance) — `GraphQLEngine` provides
no setter for `_graphQL`, so field-mutation via reflection would still be required for an
inject-and-replace approach.

### 4. Constructor argument names

The private constructor signature (lines 54–59):

```java
private GraphQLEngine(
    @Nonnull final List<String> schemas,
    @Nonnull final RuntimeWiring runtimeWiring,
    @Nonnull final Map<String, Function<QueryContext, DataLoader<?, ?>>> dataLoaderSuppliers,
    @Nonnull GraphQLConfiguration graphQLConfiguration,
    MetricUtils metricUtils)
```

Parameters in order:
1. `schemas` — `List<String>` of SDL schema strings
2. `runtimeWiring` — `graphql.schema.idl.RuntimeWiring`
3. `dataLoaderSuppliers` — `Map<String, Function<QueryContext, DataLoader<?, ?>>>`
4. `graphQLConfiguration` — `com.linkedin.metadata.config.GraphQLConfiguration`
5. `metricUtils` — `com.linkedin.metadata.utils.metrics.MetricUtils` (nullable — constructor is
called with `null` in tests)

The constructor is `private`; construction goes through `GraphQLEngine.Builder`. The builder
exposes: `addSchema()`, `addDataLoader()`, `addDataLoaders()`, `configureRuntimeWiring()`,
`setGraphQLConfiguration()`, `setMetricUtils()`, then `build()`.

---

## Step 2: graphql-java version and API surface

### Pinned version

```
com.graphql-java:graphql-java:22.3
```

Declared in `build.gradle` under `externalDependency.graphqlJava`.

### `GraphQL.transform()` and `GraphQL.getInstrumentation()`

Both methods have been stable public API since graphql-java **14.x** (circa 2019).

- **`GraphQL.transform(Consumer<Builder> builderConsumer)`** — copies the current `GraphQL`
instance into a new `Builder`, applies the consumer, and returns a new `GraphQL`. Available in
22.x; the standard way to swap/augment instrumentation without reconstructing from scratch.
- **`GraphQL.getInstrumentation()`** — returns the `Instrumentation` registered on the instance.
In 22.x this is the `ChainedInstrumentation` wrapping all instrumentations. Available and public
in 22.x.

At version 22.3 neither method is deprecated. Both are confirmed present in the graphql-java 22.x
changelog and source (https://github.com/graphql-java/graphql-java).

---

## Summary for ownership-filter architecture

| Question | Answer |
|---|---|
| Field name | `_graphQL` |
| Field type | `graphql.GraphQL` |
| Public getter? | **Yes** — `getGraphQL()` (no reflection needed to *read*) |
| Setter? | No — reflection required to *replace* `_graphQL` with a transformed instance |
| graphql-java version | `22.3` |
| `transform()` available? | Yes (stable since 14.x, present in 22.3) |
| `getInstrumentation()` available? | Yes (stable since 14.x, present in 22.3) |

### Recommended inject strategy (for Task 6)

1. Read the existing `GraphQL` via `engine.getGraphQL()` (no reflection needed).
2. Call `existingGraphQL.transform(b -> b.instrumentation(newChainedInstrumentation))` to produce
   a new `GraphQL` with the ownership filter instrumentation prepended to the chain.
3. Write the new `GraphQL` back into `engine._graphQL` via reflection. The field is `private final`,
   so `Field.setAccessible(true)` followed by `Field.set(...)` is required. A `VarHandle` is not an
   alternative here: `VarHandle` write access modes are unsupported for `final` fields.
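
The three steps can be sketched with stand-in types so the example runs on a plain JDK. `FakeGraphQL` and `FakeEngine` are hypothetical placeholders for `graphql.GraphQL` and `GraphQLEngine`, and the instrumentation chain is reduced to a comma-separated string. Only the field name `_graphQL` and the getter name are taken from the notes above; the real `transform` takes a builder consumer, not a string operator.

```java
import java.lang.reflect.Field;
import java.util.function.UnaryOperator;

/** Stand-in for graphql.GraphQL: immutable, with a transform-style copy method. */
final class FakeGraphQL {
    private final String instrumentation;

    FakeGraphQL(String instrumentation) {
        this.instrumentation = instrumentation;
    }

    String getInstrumentation() {
        return instrumentation;
    }

    FakeGraphQL transform(UnaryOperator<String> change) {
        return new FakeGraphQL(change.apply(instrumentation));
    }
}

/** Stand-in for GraphQLEngine: public getter, no setter, private final field. */
final class FakeEngine {
    private final FakeGraphQL _graphQL;

    FakeEngine(FakeGraphQL graphQL) {
        this._graphQL = graphQL;
    }

    public FakeGraphQL getGraphQL() {
        return _graphQL;
    }
}

final class ReflectiveSwapSketch {

    static void prependInstrumentation(FakeEngine engine, String name) {
        // Step 1: read through the public getter.
        FakeGraphQL existing = engine.getGraphQL();
        // Step 2: derive a new instance with the extra instrumentation first in the chain.
        FakeGraphQL replaced = existing.transform(chain -> name + "," + chain);
        // Step 3: write back through reflection; the field is private final and has no setter.
        try {
            Field field = FakeEngine.class.getDeclaredField("_graphQL");
            field.setAccessible(true);
            field.set(engine, replaced);
        } catch (ReflectiveOperationException e) {
            // Fail loudly: a renamed field means the coupling checklist needs an update.
            throw new IllegalStateException("Could not replace _graphQL", e);
        }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine(new FakeGraphQL("tracing"));
        prependInstrumentation(engine, "ownership");
        System.out.println(engine.getGraphQL().getInstrumentation());
        // prints: ownership,tracing
    }
}
```

`Field.set` on a non-static `final` instance field succeeds once `setAccessible(true)` has succeeded, as long as the declaring class is not a record or hidden class. Wrapping the reflective failure in an unchecked exception lets a field rename surface at startup rather than as a silently unfiltered query.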

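Alternatively, wrap `GraphQLEngine.execute()` at the call site (e.g., in the servlet layer) and run
each request against a transformed copy obtained from `engine.getGraphQL().transform(...)`, which
avoids field mutation entirely. Note that graphql-java's `ExecutionInput` has no instrumentation
slot of its own, so the instrumentation has to be attached to the `GraphQL` instance, not to the
per-request input.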