From 7b9096dc744345eba2912a562ffb634a54325a5d Mon Sep 17 00:00:00 2001
From: Puneet Agarwal
Date: Wed, 13 May 2026 16:59:19 +0530
Subject: [PATCH 01/20] feat(ingestion): per-connector CLI version matrix + resolution stamp + test-connection path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds per-connector CLI version resolution so a Snowflake fix can ship
without forcing a bump on every other connector. A JSON matrix hosted at
INGESTION_VERSION_MATRIX_URL maps server release x connector -> version,
with optional per-deployment cohort allowlists for canary rollouts.

Matrix schema:

  {
    "1.3.1.4": {
      "snowflake": {
        "_default": "1.3.1.4",
        "cohorts": [
          { "version": "1.3.1.5",
            "deployments": ["deployment-slug-1", "deployment-slug-2"] }
        ]
      }
    }
  }

`deployments` entries are matched against `ingestion.deploymentId`, which
is sourced from the existing DATAHUB_EXECUTOR_CUSTOMER_ID env var that the
Acryl Cloud Helm chart already injects from the K8s namespace. The leading
underscore on `_default` marks it as a sentinel key so it can't collide
with a connector name.

Resolution priority (top wins):
  1. Per-source explicit `config.version` (unchanged)
  2. matrix[serverVersion][connectorType].cohorts - first cohort whose
     `deployments` list contains this deployment's id
  3. matrix[serverVersion][connectorType]._default
  4. defaultCliVersion from application.yaml (unchanged fallback)

Storage is pluggable. Matrix consumption (IngestionVersionMatrixService)
is decoupled from where the matrix lives via a MatrixSource interface:
  - HttpUrlMatrixSource - periodic HTTP GET (the URL-backed default)
  - NoOpMatrixSource - empty matrix, wired when no URL is configured so
    the consumer never needs null checks

HttpUrlMatrixSource optionally sends an `Authorization` request header
when INGESTION_VERSION_MATRIX_AUTH_TOKEN is set, so the matrix can live
behind authentication (e.g. a private GitHub repo's raw URL).
Format is verbatim - "token ghp_xxx" for a GitHub PAT, "Bearer ey..." for
an OIDC token. Property name ends with "Token" so PropertiesCollector's
keyword-based redaction catches it in the system-info endpoint. Unset by
default; public URLs work unchanged.

Future backends (GMS aspect on a globalSettings entity,
AppConfig/Consul/etcd, signed S3) just implement MatrixSource - the
resolver and the resolution stamp don't change.

Three execution entry points are covered:
  - CreateIngestionExecutionRequestResolver (manual triggers)
  - IngestionScheduler.ExecutionRequestRunnable (scheduled triggers)
  - CreateTestConnectionRequestResolver (test-connection from UI)

Each one stamps a structured CliVersionProvenance record on the resulting
ExecutionRequestInput aspect. The stamp captures provenance only - which
tier of the chain produced the version, and the GMS server version at
write time. The CLI version string itself remains in `args.version` (the
wire-format field consumed by the executor) - unchanged contract, no
duplication:

  args: { "version": "1.3.1.5", ... }
  cliVersionProvenance: { source: MATRIX_COHORT, serverVersion: "1.3.1.4" }

`serverVersion` is populated on every tier (PER_SOURCE, MATRIX_*,
WORKSPACE_DEFAULT) so cross-version analytics on execution requests work
without a separate per-aspect GMS version stamp.

Test connections also now honor defaultCliVersion when no explicit version
is supplied - previously they silently omitted version, which caused the
executor to use whatever bundled CLI it shipped with rather than the
configured default.

Feature is off by default - when INGESTION_VERSION_MATRIX_URL is unset the
factory wires a NoOpMatrixSource and every resolveVersion() returns empty,
so existing behavior is preserved exactly. In single-tenant deployments
without a deploymentId set, cohort matching never fires and only the
per-connector `_default` from the matrix applies.
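For illustration, the resolution priority and cohort matching described in
this message can be sketched roughly as below. This is a minimal model
under stated assumptions, not the code in this patch: the class
ResolutionSketch, its nested types, the in-memory map, and the
MATRIX_DEFAULT tier name are hypothetical (only PER_SOURCE, MATRIX_COHORT
and WORKSPACE_DEFAULT are spelled out here), and blank explicit versions
are treated as unset.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the resolution chain; not the classes added by this patch. */
final class ResolutionSketch {

  /** A canary cohort: a pinned version plus the deployment ids allowed to receive it. */
  static final class Cohort {
    final String version;
    final List<String> deployments;

    Cohort(String version, List<String> deployments) {
      this.version = version;
      this.deployments = deployments;
    }
  }

  /** Per-connector matrix entry: the "_default" version plus ordered cohorts. */
  static final class ConnectorEntry {
    final String defaultVersion;
    final List<Cohort> cohorts;

    ConnectorEntry(String defaultVersion, List<Cohort> cohorts) {
      this.defaultVersion = defaultVersion;
      this.cohorts = cohorts;
    }
  }

  /** Resolved version plus the name of the tier that produced it. */
  static final class Result {
    final String version;
    final String source;

    Result(String version, String source) {
      this.version = version;
      this.source = source;
    }
  }

  static Result resolve(
      String explicitVersion,
      Map<String, Map<String, ConnectorEntry>> matrix,
      String serverVersion,
      String connectorType,
      String deploymentId,
      String defaultCliVersion) {
    // Tier 1: explicit version wins; null, empty and whitespace-only count as unset.
    if (explicitVersion != null && !explicitVersion.isBlank()) {
      return new Result(explicitVersion, "PER_SOURCE");
    }
    // Null guards keep immutable maps and lists from throwing on null lookups.
    Map<String, ConnectorEntry> byConnector =
        serverVersion == null ? null : matrix.get(serverVersion);
    ConnectorEntry entry =
        (byConnector == null || connectorType == null) ? null : byConnector.get(connectorType);
    if (entry != null) {
      // Tier 2: first cohort whose allowlist contains this deployment.
      if (deploymentId != null) {
        for (Cohort cohort : entry.cohorts) {
          if (cohort.deployments.contains(deploymentId)) {
            return new Result(cohort.version, "MATRIX_COHORT");
          }
        }
      }
      // Tier 3: the connector's "_default" (tier name is an assumption).
      if (entry.defaultVersion != null) {
        return new Result(entry.defaultVersion, "MATRIX_DEFAULT");
      }
    }
    // Tier 4: workspace-wide fallback.
    return new Result(defaultCliVersion, "WORKSPACE_DEFAULT");
  }

  /** Mirrors the schema example from this message. */
  static Map<String, Map<String, ConnectorEntry>> sampleMatrix() {
    Cohort canary = new Cohort("1.3.1.5", List.of("deployment-slug-1", "deployment-slug-2"));
    return Map.of("1.3.1.4", Map.of("snowflake", new ConnectorEntry("1.3.1.4", List.of(canary))));
  }

  public static void main(String[] args) {
    Result r = resolve("  ", sampleMatrix(), "1.3.1.4", "snowflake", "deployment-slug-1", "1.3.1.0");
    System.out.println(r.version + " via " + r.source); // prints "1.3.1.5 via MATRIX_COHORT"
  }
}
```

With the sample matrix, a deployment outside the cohort gets the
connector's `_default`, and a connector missing from the matrix falls
through to the workspace default.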
Also: treat empty version strings as unset (bootstrap YAML can render ""
for the version field, and an empty value forwarded to the executor
silently falls back to the bundled CLI rather than the configured default).

Config:
  INGESTION_VERSION_MATRIX_URL              (default: empty / disabled)
  INGESTION_VERSION_MATRIX_REFRESH_SECONDS  (default: 600)
  INGESTION_VERSION_MATRIX_AUTH_TOKEN       (default: empty / no auth)
  DATAHUB_EXECUTOR_CUSTOMER_ID              (existing; Acryl Cloud Helm chart
                                             injects this from the K8s
                                             namespace; sourced into
                                             `ingestion.deploymentId`)
---
 .../datahub/graphql/GmsGraphQLEngine.java | 11 +-
 .../datahub/graphql/GmsGraphQLEngineArgs.java | 2 +
 ...eateIngestionExecutionRequestResolver.java | 41 +-
 .../CreateTestConnectionRequestResolver.java | 97 +++-
 ...IngestionExecutionRequestResolverTest.java | 166 ++++++
 ...eateTestConnectionRequestResolverTest.java | 348 ++++++++++---
 .../ingestion/IngestionScheduler.java | 28 +-
 .../ingestion/IngestionSchedulerTest.java | 9 +-
 .../PropertiesCollectorConfigurationTest.java | 10 +-
 .../execution/CliVersionProvenance.pdl | 39 ++
 .../execution/ExecutionRequestInput.pdl | 14 +-
 metadata-service/configuration/build.gradle | 5 +
 .../configuration/gradle.lockfile | 27 +-
 .../config/IngestionConfiguration.java | 39 ++
 .../ingestion/CliVersionResolutionHelper.java | 130 +++++
 .../linkedin/metadata/ingestion/Cohort.java | 32 ++
 .../metadata/ingestion/ConnectorEntry.java | 34 ++
 .../ingestion/HttpUrlMatrixSource.java | 340 ++++++++++++
 .../IngestionVersionMatrixService.java | 141 +++++
 .../linkedin/metadata/ingestion/Matrix.java | 42 ++
 .../metadata/ingestion/MatrixSource.java | 49 ++
 .../metadata/ingestion/NoOpMatrixSource.java | 22 +
 .../src/main/resources/application.yaml | 16 +
 .../CliVersionResolutionHelperTest.java | 136 +++++
 .../HttpUrlMatrixSourceValidationTest.java | 177 +++++++
 .../IngestionVersionMatrixServiceTest.java | 484 ++++++++++++++++++
 .../factory/graphql/GraphQLEngineFactory.java | 6 +
 .../ingestion/IngestionSchedulerFactory.java | 8 +-
 .../IngestionVersionMatrixServiceFactory.java | 79 +++
 .../graphql/GraphQLEngineFactoryTest.java | 6 +
 ...estionVersionMatrixServiceFactoryTest.java | 108 ++++
 31 files changed, 2538 insertions(+), 108 deletions(-)
 create mode 100644 metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Cohort.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ConnectorEntry.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java
 create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java
 create mode 100644 metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java
 create mode 100644 metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java
 create mode 100644 metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java
 create mode 100644 metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java
 create mode 100644 metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java

diff
--git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java index 21cd370229cc..fcdf7ee65b0a 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java @@ -347,6 +347,7 @@ import com.linkedin.metadata.entity.versioning.EntityVersioningService; import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.SiblingGraphService; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.query.filter.SortCriterion; import com.linkedin.metadata.query.filter.SortOrder; @@ -453,6 +454,7 @@ public class GmsGraphQLEngine { private final FeatureFlags featureFlags; private final IngestionConfiguration ingestionConfiguration; + private final IngestionVersionMatrixService ingestionVersionMatrixService; private final AuthenticationConfiguration authenticationConfiguration; private final AuthorizationConfiguration authorizationConfiguration; private final VisualConfiguration visualConfiguration; @@ -596,6 +598,7 @@ public GmsGraphQLEngine(final GmsGraphQLEngineArgs args) { this.businessAttributeService = args.businessAttributeService; this.ingestionConfiguration = Objects.requireNonNull(args.ingestionConfiguration); + this.ingestionVersionMatrixService = args.ingestionVersionMatrixService; this.authenticationConfiguration = Objects.requireNonNull(args.authenticationConfiguration); this.authorizationConfiguration = Objects.requireNonNull(args.authorizationConfiguration); this.visualConfiguration = args.visualConfiguration; @@ -1368,14 +1371,18 @@ private void configureMutationResolvers(final RuntimeWiring.Builder builder) { .dataFetcher( "createIngestionExecutionRequest", new 
CreateIngestionExecutionRequestResolver( - this.entityClient, this.ingestionConfiguration)) + this.entityClient, + this.ingestionConfiguration, + this.ingestionVersionMatrixService)) .dataFetcher( "cancelIngestionExecutionRequest", new CancelIngestionExecutionRequestResolver(this.entityClient)) .dataFetcher( "createTestConnectionRequest", new CreateTestConnectionRequestResolver( - this.entityClient, this.ingestionConfiguration)) + this.entityClient, + this.ingestionConfiguration, + this.ingestionVersionMatrixService)) .dataFetcher( "upsertCustomAssertion", new UpsertCustomAssertionResolver(assertionService)) .dataFetcher( diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java index ba6a702fa71b..f4b0adb533ed 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java @@ -21,6 +21,7 @@ import com.linkedin.metadata.entity.versioning.EntityVersioningService; import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.SiblingGraphService; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.search.SemanticSearchService; @@ -64,6 +65,7 @@ public class GmsGraphQLEngineArgs { SecretService secretService; NativeUserService nativeUserService; IngestionConfiguration ingestionConfiguration; + IngestionVersionMatrixService ingestionVersionMatrixService; AuthenticationConfiguration authenticationConfiguration; AuthorizationConfiguration authorizationConfiguration; GitVersion gitVersion; diff --git 
a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 213112514a01..3c166bd9a53e 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -23,6 +23,8 @@ import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.ingestion.DataHubIngestionSourceInfo; import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -48,11 +50,26 @@ public class CreateIngestionExecutionRequestResolver private final EntityClient _entityClient; private final IngestionConfiguration _ingestionConfiguration; + private final IngestionVersionMatrixService _versionMatrixService; + /** Two-arg constructor — no per-connector version matrix is consulted. */ public CreateIngestionExecutionRequestResolver( final EntityClient entityClient, final IngestionConfiguration ingestionConfiguration) { + this(entityClient, ingestionConfiguration, null); + } + + /** + * Three-arg constructor for deployments that want matrix-aware version resolution. When {@code + * versionMatrixService} is non-null, the per-connector version matrix is consulted before falling + * back to {@code defaultCliVersion}. 
+ */ + public CreateIngestionExecutionRequestResolver( + final EntityClient entityClient, + final IngestionConfiguration ingestionConfiguration, + final IngestionVersionMatrixService versionMatrixService) { _entityClient = entityClient; _ingestionConfiguration = ingestionConfiguration; + _versionMatrixService = versionMatrixService; } @Override @@ -122,11 +139,25 @@ public CompletableFuture get(final DataFetchingEnvironment environment) recipe = injectRunId(recipe, executionRequestUrn.toString()); recipe = IngestionUtils.injectPipelineName(recipe, ingestionSourceUrn.toString()); arguments.put(RECIPE_ARG_NAME, recipe); - arguments.put( - VERSION_ARG_NAME, - IngestionUtils.resolveIngestionCliVersion( - ingestionSourceInfo.getConfig().getVersion(), - _ingestionConfiguration.getDefaultCliVersion())); + // Per-source version may be null, empty, or whitespace-only (bootstrap YAML + // templating can render any of these); the helper normalizes all three to "unset" + // and falls through to the matrix / workspace default. See #17471 for the + // whitespace-only edge case. + final String explicitVersion = + ingestionSourceInfo.getConfig().hasVersion() + ? ingestionSourceInfo.getConfig().getVersion() + : null; + final CliVersionResolutionHelper.Result resolution = + CliVersionResolutionHelper.resolve( + explicitVersion, + ingestionSourceInfo.getType(), + _versionMatrixService, + _ingestionConfiguration.getDefaultCliVersion(), + _versionMatrixService != null + ? _versionMatrixService.getServerVersion() + : null); + arguments.put(VERSION_ARG_NAME, resolution.getVersion()); + execInput.setCliVersionProvenance(resolution.getStamp()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? 
"true" : "false"; diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index a22d4dd3da0f..b3770f0682c8 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -16,6 +16,8 @@ import com.linkedin.execution.ExecutionRequestInput; import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -26,8 +28,30 @@ import java.util.Map; import java.util.UUID; import java.util.concurrent.CompletableFuture; +import lombok.extern.slf4j.Slf4j; +import org.json.JSONException; +import org.json.JSONObject; -/** Creates an on-demand ingestion execution request. */ +/** + * Creates an on-demand "test connection" ingestion execution request. + * + *

<p>Version resolution priority (top wins):
+ *
+ * <ol>
+ *   <li>{@code input.version} - explicit per-request override (existing behavior)
+ *   <li>{@code matrix[serverVersion][source.type].cohorts} - first cohort whose {@code
+ *       deployments} list contains this deployment's id, from {@link
+ *       IngestionVersionMatrixService} when enabled
+ *   <li>{@code matrix[serverVersion][source.type]._default}
+ *   <li>{@link IngestionConfiguration#getDefaultCliVersion()} - workspace-wide fallback
+ * </ol>
+ *
+ * <p>
Prior to this change the test-connection path silently omitted {@code version} when the input + * did not provide one, causing the executor to fall back to whatever bundled default it shipped + * with — different from the path that real (non-test) executions take. The {@code + * defaultCliVersion} fallback below closes that gap; the matrix lookup brings test connections onto + * the same per-connector-pin behavior real executions get. + */ +@Slf4j public class CreateTestConnectionRequestResolver implements DataFetcher> { private static final String TEST_CONNECTION_TASK_NAME = "TEST_CONNECTION"; @@ -35,14 +59,31 @@ public class CreateTestConnectionRequestResolver implements DataFetcher get(final DataFetchingEnvironment environment) RECIPE_ARG_NAME, IngestionUtils.injectPipelineName( input.getRecipe(), executionRequestUrn.toString())); - // Mirror the manual-ingestion path (CreateIngestionExecutionRequestResolver) which - // routes the same call through IngestionUtils.resolveIngestionCliVersion. Without - // this, a test-connection request with no input.version (or a blank one) silently - // omits args.version, causing the executor to fall back to its bundled CLI version - // rather than the configured defaultCliVersion. That divergence makes test - // connections run on a different CLI than the actual ingestion will use — hiding - // compatibility issues that surface in production. - arguments.put( - VERSION_ARG_NAME, - IngestionUtils.resolveIngestionCliVersion( - input.getVersion(), _ingestionConfiguration.getDefaultCliVersion())); + // input.getVersion() may be null, empty, or whitespace-only (UI forms can submit any + // of these); the helper normalizes all three to "unset" and falls through to the + // matrix / workspace default. See #17471 for the whitespace-only edge case. 
+ final CliVersionResolutionHelper.Result resolution = + CliVersionResolutionHelper.resolve( + input.getVersion(), + extractSourceType(input.getRecipe()), + _versionMatrixService, + _ingestionConfiguration.getDefaultCliVersion(), + _versionMatrixService != null + ? _versionMatrixService.getServerVersion() + : null); + if (resolution.getVersion() != null && !resolution.getVersion().isEmpty()) { + arguments.put(VERSION_ARG_NAME, resolution.getVersion()); + } execInput.setArgs(new StringMap(arguments)); + execInput.setCliVersionProvenance(resolution.getStamp()); final MetadataChangeProposal proposal = buildMetadataChangeProposalWithKey( @@ -110,4 +156,31 @@ public CompletableFuture get(final DataFetchingEnvironment environment) this.getClass().getSimpleName(), "get"); } + + /** + * Best-effort extraction of {@code source.type} from a recipe JSON document. Returns {@code null} + * for any malformed input — the resolver falls back to {@code defaultCliVersion} in that case + * rather than failing the request, since a malformed recipe will surface a clearer error + * downstream when the executor parses it. + */ + static String extractSourceType(final String recipeJson) { + if (recipeJson == null || recipeJson.isEmpty()) { + return null; + } + try { + JSONObject recipe = new JSONObject(recipeJson); + if (!recipe.has(SOURCE_FIELD)) { + return null; + } + JSONObject source = recipe.getJSONObject(SOURCE_FIELD); + if (!source.has(TYPE_FIELD)) { + return null; + } + String type = source.getString(TYPE_FIELD); + return (type != null && !type.isEmpty()) ? 
type : null; + } catch (JSONException e) { + log.debug("Could not extract source.type from recipe for version-matrix lookup", e); + return null; + } + } } diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java index 1c9116107ab7..eee9c81fbcf6 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java @@ -17,13 +17,17 @@ import com.linkedin.execution.ExecutionRequestInput; import com.linkedin.ingestion.DataHubIngestionSourceConfig; import com.linkedin.ingestion.DataHubIngestionSourceInfo; +import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import com.linkedin.r2.RemoteInvocationException; import graphql.schema.DataFetchingEnvironment; +import java.nio.charset.StandardCharsets; import java.util.HashSet; +import org.json.JSONObject; import org.mockito.ArgumentCaptor; import org.mockito.Mockito; import org.testng.annotations.Test; @@ -92,6 +96,168 @@ public void testGetUnauthorized() throws Exception { Mockito.verify(mockClient, Mockito.times(0)).ingestProposal(any(), Mockito.any(), anyBoolean()); } + // --------------------------------------------------------------------------- + // Version matrix tests — use a source with NO per-source version so the + // resolver falls through to the matrix / default-cli-version 
path. + // --------------------------------------------------------------------------- + + /** + * When the version matrix has an entry for the connector type, that version should win over the + * global defaultCliVersion. + */ + @Test + public void testVersionMatrixConnectorSpecificVersionUsed() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + mockBatchGetV2(mockClient, sourceWithoutVersion("snowflake")); + + IngestionConfiguration config = new IngestionConfiguration(); + config.setDefaultCliVersion("default-global"); + + // Matrix maps "1.3.1.4" → { "snowflake": "matrix-snowflake-version" } + IngestionVersionMatrixService matrixService = + matrixServiceForConnector("snowflake", "matrix-snowflake-version", "1.3.1.4"); + + CreateIngestionExecutionRequestResolver resolver = + new CreateIngestionExecutionRequestResolver(mockClient, config, matrixService); + + String resolvedVersion = executeAndCaptureVersion(mockClient, resolver); + assertEquals(resolvedVersion, "matrix-snowflake-version"); + } + + /** + * When the matrix has no entry for the connector under the current server version, the resolver + * falls back to the global {@code defaultCliVersion}. (Replaces the old {@code _default} + * server-level fallback the previous schema offered — the new schema requires explicit + * per-connector entries, and unknown connectors fall through to the workspace default.) + */ + @Test + public void testVersionMatrixConnectorNotPresent_fallsBackToDefaultCliVersion() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + mockBatchGetV2(mockClient, sourceWithoutVersion("mysql")); + + IngestionConfiguration config = new IngestionConfiguration(); + config.setDefaultCliVersion("default-global"); + + // Matrix has snowflake only; mysql is absent. 
+ IngestionVersionMatrixService matrixService = + matrixServiceForConnector("snowflake", "matrix-snowflake-version", "1.3.1.4"); + + CreateIngestionExecutionRequestResolver resolver = + new CreateIngestionExecutionRequestResolver(mockClient, config, matrixService); + + String resolvedVersion = executeAndCaptureVersion(mockClient, resolver); + assertEquals(resolvedVersion, "default-global"); + } + + /** When the matrix is disabled (null URL), the global {@code defaultCliVersion} is used. */ + @Test + public void testVersionMatrixMissFallsBackToDefaultCliVersion() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + mockBatchGetV2(mockClient, sourceWithoutVersion("mysql")); + + IngestionConfiguration config = new IngestionConfiguration(); + config.setDefaultCliVersion("default-global"); + + // Matrix service backed by a NoOp source → always returns empty + IngestionVersionMatrixService matrixService = + new IngestionVersionMatrixService( + new com.linkedin.metadata.ingestion.NoOpMatrixSource(), "1.3.1.4", null); + + CreateIngestionExecutionRequestResolver resolver = + new CreateIngestionExecutionRequestResolver(mockClient, config, matrixService); + + String resolvedVersion = executeAndCaptureVersion(mockClient, resolver); + assertEquals(resolvedVersion, "default-global"); + } + + // --------------------------------------------------------------------------- + // Helpers + // --------------------------------------------------------------------------- + + private static DataHubIngestionSourceInfo sourceWithoutVersion(String connectorType) { + DataHubIngestionSourceInfo info = new DataHubIngestionSourceInfo(); + info.setName("Test Source"); + info.setType(connectorType); + info.setSchedule( + new DataHubIngestionSourceSchedule().setTimezone("UTC").setInterval("* * * * *")); + // Deliberately omit .setVersion() so version resolution falls through to the matrix + info.setConfig(new 
DataHubIngestionSourceConfig().setRecipe("{}").setExecutorId("default")); + return info; + } + + private static void mockBatchGetV2(EntityClient mockClient, DataHubIngestionSourceInfo info) + throws Exception { + Mockito.when( + mockClient.batchGetV2( + any(), + Mockito.eq(Constants.INGESTION_SOURCE_ENTITY_NAME), + Mockito.eq(new HashSet<>(ImmutableSet.of(TEST_INGESTION_SOURCE_URN))), + Mockito.eq(ImmutableSet.of(Constants.INGESTION_INFO_ASPECT_NAME)))) + .thenReturn( + ImmutableMap.of( + TEST_INGESTION_SOURCE_URN, + new EntityResponse() + .setEntityName(Constants.INGESTION_SOURCE_ENTITY_NAME) + .setUrn(TEST_INGESTION_SOURCE_URN) + .setAspects( + new EnvelopedAspectMap( + ImmutableMap.of( + Constants.INGESTION_INFO_ASPECT_NAME, + new EnvelopedAspect().setValue(new Aspect(info.data()))))))); + } + + /** + * Returns a matrix service pre-loaded with a single entry under the new nested schema: + * + *

<pre>{@code
+   * { "<serverVersion>": { "<connector>": { "_default": "<version>" } } }
+   * }</pre>
+   *
+   * <p>
deploymentId is left null since these tests don't exercise cohort matching. + */ + private static IngestionVersionMatrixService matrixServiceForConnector( + String connector, String version, String serverVersion) throws Exception { + String json = + String.format("{\"%s\":{\"%s\":{\"_default\":\"%s\"}}}", serverVersion, connector, version); + + java.nio.file.Path tmp = java.nio.file.Files.createTempFile("matrix", ".json"); + java.nio.file.Files.write(tmp, json.getBytes()); + tmp.toFile().deleteOnExit(); + + com.linkedin.metadata.ingestion.HttpUrlMatrixSource httpSource = + new com.linkedin.metadata.ingestion.HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, serverVersion, null); + + // Wait for the initial background fetch to complete + for (int i = 0; i < 20; i++) { + if (svc.resolveVersion(connector).isPresent()) { + break; + } + Thread.sleep(100); + } + return svc; + } + + /** Runs the resolver and returns the {@code version} value from the captured execution args. 
*/ + private static String executeAndCaptureVersion( + EntityClient mockClient, CreateIngestionExecutionRequestResolver resolver) throws Exception { + QueryContext mockContext = getMockAllowContext(); + DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); + Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(TEST_INPUT); + Mockito.when(mockEnv.getContext()).thenReturn(mockContext); + + resolver.get(mockEnv).get(); + + ArgumentCaptor captor = + ArgumentCaptor.forClass(MetadataChangeProposal.class); + Mockito.verify(mockClient, Mockito.atLeastOnce()) + .ingestProposal(any(), captor.capture(), anyBoolean()); + + String aspectJson = captor.getValue().getAspect().getValue().asString(StandardCharsets.UTF_8); + return new JSONObject(aspectJson).getJSONObject("args").getString("version"); + } + @Test public void testGetEntityClientException() throws Exception { // Create resolver diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java index 627a7cbf549f..de77b97277f3 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java @@ -5,139 +5,337 @@ import static org.mockito.ArgumentMatchers.anyBoolean; import static org.testng.Assert.*; +import com.linkedin.data.template.StringMap; import com.linkedin.datahub.graphql.QueryContext; import com.linkedin.datahub.graphql.generated.CreateTestConnectionRequestInput; import com.linkedin.entity.client.EntityClient; +import com.linkedin.execution.ExecutionRequestInput; +import com.linkedin.metadata.Constants; import 
com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import graphql.schema.DataFetchingEnvironment; -import java.nio.charset.StandardCharsets; -import org.json.JSONObject; +import java.util.Optional; import org.mockito.ArgumentCaptor; import org.mockito.Mockito; import org.testng.annotations.Test; public class CreateTestConnectionRequestResolverTest { - private static final String DEFAULT_CLI_VERSION = "default-cli-version"; - private static final CreateTestConnectionRequestInput TEST_INPUT = - new CreateTestConnectionRequestInput("{}", "0.8.44"); + private static final String DEFAULT_VERSION = "0.14.0"; + private static final String EXPLICIT_VERSION = "0.8.44"; + private static final String MATRIX_SNOWFLAKE_VERSION = "0.13.0.1"; + private static final String SNOWFLAKE_RECIPE = + "{\"source\":{\"type\":\"snowflake\",\"config\":{\"account_id\":\"abc123\"}}}"; + private static final String RECIPE_WITHOUT_TYPE = + "{\"source\":{\"config\":{\"account_id\":\"abc123\"}}}"; + private static final String MALFORMED_RECIPE = "{not valid json"; + + private static final CreateTestConnectionRequestInput TEST_INPUT_WITH_VERSION = + new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, EXPLICIT_VERSION); + private static final CreateTestConnectionRequestInput TEST_INPUT_NO_VERSION = + new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, null); @Test - public void testGetSuccess() throws Exception { - // Create resolver + public void testExplicitInputVersionWins() throws Exception { EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + 
Mockito.when(matrix.resolveVersionWithSource("snowflake")) + .thenReturn( + Optional.of( + new IngestionVersionMatrixService.MatrixResolution( + MATRIX_SNOWFLAKE_VERSION, + IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfig()); + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); - // Execute resolver - QueryContext mockContext = getMockAllowContext(); - DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); - Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(TEST_INPUT); - Mockito.when(mockEnv.getContext()).thenReturn(mockContext); + runAndVerifyVersion(resolver, mockClient, TEST_INPUT_WITH_VERSION, EXPLICIT_VERSION); - resolver.get(mockEnv).get(); + // Even though the matrix has a value for snowflake, the explicit input.version should win. + // The resolver may short-circuit before consulting the matrix at all — that's an optimization, + // not a contract, so we only assert the resolved version here. 
+ } + + @Test + public void testMatrixConnectorVersionUsedWhenInputVersionMissing() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrix.resolveVersionWithSource("snowflake")) + .thenReturn( + Optional.of( + new IngestionVersionMatrixService.MatrixResolution( + MATRIX_SNOWFLAKE_VERSION, + IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); + + runAndVerifyVersion(resolver, mockClient, TEST_INPUT_NO_VERSION, MATRIX_SNOWFLAKE_VERSION); + } + + @Test + public void testFallsBackToDefaultCliVersionWhenNoVersionAndNoMatrix() throws Exception { + // Closes the prior gap where test connections silently omitted version when input.version was + // null, instead of falling back to defaultCliVersion like real executions do. + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + + runAndVerifyVersion(resolver, mockClient, TEST_INPUT_NO_VERSION, DEFAULT_VERSION); + } + + @Test + public void testEmptyVersionFallsBackToDefault() throws Exception { + // Bootstrap YAML templating can render input.version as an empty string; the helper normalizes + // that to "unset" so we still fall through to defaultCliVersion. See #17471. 
+ EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + + runAndVerifyVersion( + resolver, + mockClient, + new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, ""), + DEFAULT_VERSION); + } + + @Test + public void testWhitespaceVersionFallsBackToDefault() throws Exception { + // Same as the empty-string case but for whitespace-only — also normalized to "unset". + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); - Mockito.verify(mockClient, Mockito.times(1)) - .ingestProposal(any(), Mockito.any(MetadataChangeProposal.class), Mockito.eq(false)); + runAndVerifyVersion( + resolver, + mockClient, + new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, " "), + DEFAULT_VERSION); + } + + @Test + public void testFallsBackToDefaultWhenMatrixHasNoEntryForConnector() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrix.resolveVersionWithSource("snowflake")).thenReturn(Optional.empty()); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); + + runAndVerifyVersion(resolver, mockClient, TEST_INPUT_NO_VERSION, DEFAULT_VERSION); + } + + @Test + public void 
testFallsBackToDefaultWhenRecipeHasNoSourceType() throws Exception { + // Valid JSON, but source.type is missing — we cannot identify the connector for a matrix + // lookup, so we must fall through to defaultCliVersion rather than crash or pick a wrong pin. + // (Truly malformed JSON is rejected earlier by IngestionUtils.injectPipelineName, so we don't + // exercise that path here.) + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); + + CreateTestConnectionRequestInput input = + new CreateTestConnectionRequestInput(RECIPE_WITHOUT_TYPE, null); + runAndVerifyVersion(resolver, mockClient, input, DEFAULT_VERSION); + + // We never attempt a matrix lookup because we cannot identify the connector type. 
+ Mockito.verify(matrix, Mockito.never()).resolveVersionWithSource(Mockito.anyString()); } @Test public void testGetUnauthorized() throws Exception { - // Create resolver EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfig()); + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); - // Execute resolver DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); QueryContext mockContext = getMockDenyContext(); - Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(TEST_INPUT); + Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(TEST_INPUT_WITH_VERSION); Mockito.when(mockEnv.getContext()).thenReturn(mockContext); assertThrows(RuntimeException.class, () -> resolver.get(mockEnv).join()); Mockito.verify(mockClient, Mockito.times(0)).ingestProposal(any(), Mockito.any(), anyBoolean()); } - // --------------------------------------------------------------------------- - // Version-resolution regression tests - // - // Each test maps to one row of the PR description's behavior table: - // - // | input.version | Before this PR | After this PR | - // | ------------- | ------------------------------------ | ----------------- | - // | "0.8.44" | args.version = "0.8.44" | unchanged | - // | null | args.version omitted; executor falls | args.version = | - // | | back to bundled CLI | defaultCliVersion | - // | "" | args.version omitted | defaultCliVersion | - // | " " | args.version omitted | defaultCliVersion | - // - // Each test captures the persisted ExecutionRequestInput aspect and asserts - // on args.version, locking in the contract that all four input shapes route - // through IngestionUtils.resolveIngestionCliVersion (the helper 
introduced - // by #17471 for the manual-ingestion path). - // --------------------------------------------------------------------------- - @Test - public void testExplicitVersionPreserved() throws Exception { - String capturedVersion = runResolverAndCaptureVersion("0.8.44"); - assertEquals(capturedVersion, "0.8.44"); + public void testExtractSourceType() { + assertEquals( + CreateTestConnectionRequestResolver.extractSourceType(SNOWFLAKE_RECIPE), "snowflake"); + assertNull(CreateTestConnectionRequestResolver.extractSourceType(RECIPE_WITHOUT_TYPE)); + assertNull(CreateTestConnectionRequestResolver.extractSourceType(MALFORMED_RECIPE)); + assertNull(CreateTestConnectionRequestResolver.extractSourceType("")); + assertNull(CreateTestConnectionRequestResolver.extractSourceType(null)); } + /** + * Forensic stamp: the resolution record on the ExecutionRequestInput must reflect which + * resolution path actually fired (cohort vs connector default vs workspace default), with + * the matching GMS server version; the CLI version string itself stays in args.version. This is + * the structured audit trail downstream tooling queries. 
+ */ @Test - public void testNullVersionFallsBackToDefault() throws Exception { - String capturedVersion = runResolverAndCaptureVersion(null); - assertEquals(capturedVersion, DEFAULT_CLI_VERSION); + public void testStampsResolutionMetadata_cohortMatch() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrix.resolveVersionWithSource("snowflake")) + .thenReturn( + Optional.of( + new IngestionVersionMatrixService.MatrixResolution( + MATRIX_SNOWFLAKE_VERSION, + IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + Mockito.when(matrix.getServerVersion()).thenReturn("1.3.1.4"); + + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); + + ExecutionRequestInput captured = + runAndCaptureResolution(resolver, mockClient, TEST_INPUT_NO_VERSION); + + assertEquals(captured.getArgs().get("version"), MATRIX_SNOWFLAKE_VERSION); + com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); + assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.MATRIX_COHORT); + assertEquals(stamp.getServerVersion(), "1.3.1.4"); } @Test - public void testEmptyVersionFallsBackToDefault() throws Exception { - String capturedVersion = runResolverAndCaptureVersion(""); - assertEquals(capturedVersion, DEFAULT_CLI_VERSION); + public void testStampsResolutionMetadata_perSourceOverride() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + // No matrix configured — the explicit version must still produce a PER_SOURCE stamp. 
+ CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + + ExecutionRequestInput captured = + runAndCaptureResolution(resolver, mockClient, TEST_INPUT_WITH_VERSION); + + assertEquals(captured.getArgs().get("version"), EXPLICIT_VERSION); + com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); + assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.PER_SOURCE); + // No matrix service wired → no serverVersion to stamp. + assertFalse(stamp.hasServerVersion()); } @Test - public void testWhitespaceVersionFallsBackToDefault() throws Exception { - // Bootstrap YAML templating can render a whitespace-only value; the helper - // normalizes this to "unset" so we fall through to defaultCliVersion rather - // than forward a blank version that defeats the fallback at the executor. - String capturedVersion = runResolverAndCaptureVersion(" "); - assertEquals(capturedVersion, DEFAULT_CLI_VERSION); - } + public void testStampsResolutionMetadata_workspaceDefault() throws Exception { + EntityClient mockClient = Mockito.mock(EntityClient.class); + IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); + ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); + + IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrix.resolveVersionWithSource("snowflake")).thenReturn(Optional.empty()); + Mockito.when(matrix.getServerVersion()).thenReturn("1.3.1.4"); - // --------------------------------------------------------------------------- - // Helpers - // --------------------------------------------------------------------------- + CreateTestConnectionRequestResolver resolver = + new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); - private static IngestionConfiguration ingestionConfig() { - IngestionConfiguration config = new IngestionConfiguration(); - 
config.setDefaultCliVersion(DEFAULT_CLI_VERSION); - return config; + ExecutionRequestInput captured = + runAndCaptureResolution(resolver, mockClient, TEST_INPUT_NO_VERSION); + + assertEquals(captured.getArgs().get("version"), DEFAULT_VERSION); + com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); + assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.WORKSPACE_DEFAULT); + // serverVersion is stamped even on WORKSPACE_DEFAULT when the matrix service is wired. + assertEquals(stamp.getServerVersion(), "1.3.1.4"); } /** - * Triggers the resolver with the given input.version, captures the persisted - * MetadataChangeProposal, and returns the {@code args.version} value (or null if absent). + * Captures the proposal and returns the full {@link ExecutionRequestInput}, so tests can assert + * on both {@code args.version} (where the CLI version string lives) and {@code + * cliVersionProvenance} (where the provenance stamp lives). */ - private static String runResolverAndCaptureVersion(String inputVersion) throws Exception { - EntityClient mockClient = Mockito.mock(EntityClient.class); - CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfig()); + private static ExecutionRequestInput runAndCaptureResolution( + CreateTestConnectionRequestResolver resolver, + EntityClient mockClient, + CreateTestConnectionRequestInput input) + throws Exception { + QueryContext mockContext = getMockAllowContext(); + DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); + Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(input); + Mockito.when(mockEnv.getContext()).thenReturn(mockContext); + + resolver.get(mockEnv).get(); + + ArgumentCaptor proposalCaptor = + ArgumentCaptor.forClass(MetadataChangeProposal.class); + Mockito.verify(mockClient).ingestProposal(any(), proposalCaptor.capture(), Mockito.eq(false)); + MetadataChangeProposal proposal 
= proposalCaptor.getValue(); + ExecutionRequestInput recovered = + GenericRecordUtils.deserializeAspect( + proposal.getAspect().getValue(), + proposal.getAspect().getContentType(), + ExecutionRequestInput.class); + assertTrue( + recovered.hasCliVersionProvenance(), + "Expected cliVersionProvenance to be stamped on the ExecutionRequestInput"); + return recovered; + } + + /** + * Executes the resolver against an allow-context and asserts that the version argument on the + * resulting {@link ExecutionRequestInput} matches {@code expectedVersion}. Captures the {@link + * MetadataChangeProposal} written to the entity client and rehydrates the {@code + * ExecutionRequestInput} aspect to inspect the resolved args. + */ + private static void runAndVerifyVersion( + CreateTestConnectionRequestResolver resolver, + EntityClient mockClient, + CreateTestConnectionRequestInput input, + String expectedVersion) + throws Exception { QueryContext mockContext = getMockAllowContext(); DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); - Mockito.when(mockEnv.getArgument(Mockito.eq("input"))) - .thenReturn(new CreateTestConnectionRequestInput("{}", inputVersion)); + Mockito.when(mockEnv.getArgument(Mockito.eq("input"))).thenReturn(input); Mockito.when(mockEnv.getContext()).thenReturn(mockContext); resolver.get(mockEnv).get(); - ArgumentCaptor captor = + ArgumentCaptor proposalCaptor = ArgumentCaptor.forClass(MetadataChangeProposal.class); - Mockito.verify(mockClient).ingestProposal(any(), captor.capture(), anyBoolean()); + Mockito.verify(mockClient).ingestProposal(any(), proposalCaptor.capture(), Mockito.eq(false)); + + MetadataChangeProposal proposal = proposalCaptor.getValue(); + assertEquals(proposal.getEntityType(), Constants.EXECUTION_REQUEST_ENTITY_NAME); + assertEquals(proposal.getAspectName(), Constants.EXECUTION_REQUEST_INPUT_ASPECT_NAME); - String aspectJson = captor.getValue().getAspect().getValue().asString(StandardCharsets.UTF_8); - JSONObject 
args = new JSONObject(aspectJson).getJSONObject("args"); - return args.has("version") ? args.getString("version") : null; + ExecutionRequestInput recovered = + GenericRecordUtils.deserializeAspect( + proposal.getAspect().getValue(), + proposal.getAspect().getContentType(), + ExecutionRequestInput.class); + StringMap args = recovered.getArgs(); + assertNotNull(args, "Expected args to be populated on the ExecutionRequestInput"); + assertEquals(args.get("version"), expectedVersion); } } diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index 19b8a37a2358..d7a88ffbbc07 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -17,6 +17,8 @@ import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.query.ListResult; import com.linkedin.metadata.utils.GenericRecordUtils; @@ -87,6 +89,7 @@ public class IngestionScheduler { private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1); private final IngestionConfiguration ingestionConfiguration; + private final IngestionVersionMatrixService versionMatrixService; private final int batchGetDelayIntervalSeconds; private final int batchGetRefreshIntervalSeconds; @@ -177,6 +180,7 @@ public void scheduleNextIngestionSourceExecution( systemOpContext, entityClient, ingestionConfiguration, + versionMatrixService, ingestionSourceUrn, newInfo, () -> 
nextIngestionSourceExecutionCache.remove(ingestionSourceUrn), @@ -338,6 +342,7 @@ static class ExecutionRequestRunnable implements Runnable { private final OperationContext systemOpContext; private final EntityClient entityClient; private final IngestionConfiguration ingestionConfiguration; + private final IngestionVersionMatrixService versionMatrixService; // Information about the ingestion source being executed private final Urn ingestionSourceUrn; @@ -354,6 +359,7 @@ public ExecutionRequestRunnable( @Nonnull final OperationContext systemOpContext, @Nonnull final EntityClient entityClient, @Nonnull final IngestionConfiguration ingestionConfiguration, + @Nonnull final IngestionVersionMatrixService versionMatrixService, @Nonnull final Urn ingestionSourceUrn, @Nonnull final DataHubIngestionSourceInfo ingestionSourceInfo, @Nonnull final Runnable deleteNextIngestionSourceExecution, @@ -363,6 +369,7 @@ public ExecutionRequestRunnable( this.systemOpContext = systemOpContext; this.entityClient = Objects.requireNonNull(entityClient); this.ingestionConfiguration = Objects.requireNonNull(ingestionConfiguration); + this.versionMatrixService = Objects.requireNonNull(versionMatrixService); this.ingestionSourceUrn = Objects.requireNonNull(ingestionSourceUrn); this.ingestionSourceInfo = Objects.requireNonNull(ingestionSourceInfo); this.deleteNextIngestionSourceExecution = @@ -409,11 +416,22 @@ public void run() { IngestionUtils.injectPipelineName( ingestionSourceInfo.getConfig().getRecipe(), ingestionSourceUrn.toString()); arguments.put(RECIPE_ARGUMENT_NAME, recipe); - arguments.put( - VERSION_ARGUMENT_NAME, - IngestionUtils.resolveIngestionCliVersion( - ingestionSourceInfo.getConfig().getVersion(), - ingestionConfiguration.getDefaultCliVersion())); + // Per-source version may be null, empty, or whitespace-only (bootstrap YAML templating + // can render any of these); the helper normalizes all three to "unset" and falls through + // to the matrix / workspace default. 
See #17471 for the whitespace-only edge case. + final String explicitVersion = + ingestionSourceInfo.getConfig().hasVersion() + ? ingestionSourceInfo.getConfig().getVersion() + : null; + final CliVersionResolutionHelper.Result resolution = + CliVersionResolutionHelper.resolve( + explicitVersion, + ingestionSourceInfo.getType(), + versionMatrixService, + ingestionConfiguration.getDefaultCliVersion(), + versionMatrixService != null ? versionMatrixService.getServerVersion() : null); + arguments.put(VERSION_ARGUMENT_NAME, resolution.getVersion()); + input.setCliVersionProvenance(resolution.getStamp()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? "true" : "false"; diff --git a/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java b/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java index 36a245028500..7d721d68c56c 100644 --- a/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java +++ b/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java @@ -17,6 +17,7 @@ import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.query.ListResult; import io.datahubproject.metadata.context.OperationContext; import java.util.Collections; @@ -120,7 +121,13 @@ public void setupTest() throws Exception { ingestionScheduler = new IngestionScheduler( - Mockito.mock(OperationContext.class), mockClient, ingestionConfiguration, 1, 1200); + Mockito.mock(OperationContext.class), + mockClient, + ingestionConfiguration, + new IngestionVersionMatrixService( + new com.linkedin.metadata.ingestion.NoOpMatrixSource(), "test", null), + 1, + 1200); 
ingestionScheduler.init(); Thread.sleep(2000); // Sleep so the runnable can execute. (not ideal) } diff --git a/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java b/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java index 1b39c8fa8d8b..67d5fcc8f84d 100644 --- a/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java +++ b/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java @@ -84,7 +84,12 @@ public PropertiesCollector propertiesCollector(Environment environment) { "postgres.pgQueue.pool.password", "postgres.pgCron.admin.password", "postgres.pgCron.iam.awsSecretAccessKey", - "postgres.pgCron.iam.awsSessionToken"); + "postgres.pgCron.iam.awsSessionToken", + // Auth token for fetching the per-connector CLI version matrix from a private host + // (e.g. "token ghp_xxx" for a private GitHub repo, "Bearer ey..." for OIDC). Property + // name intentionally ends with "Token" so PropertiesCollector's keyword-based redaction + // catches it without needing a new keyword in SENSITIVE_PATTERNS. + "ingestion.versionMatrixAuthToken"); /** * Template patterns for sensitive properties that contain dynamic parts. 
Use [*] for numeric @@ -752,8 +757,11 @@ public PropertiesCollector propertiesCollector(Environment environment) { "incidents.hook.maxIncidentHistory", "ingestion.batchRefreshCount", "ingestion.defaultCliVersion", + "ingestion.deploymentId", "ingestion.enabled", "ingestion.maxSerializedStringLength", + "ingestion.versionMatrixRefreshSeconds", + "ingestion.versionMatrixUrl", "ingestionMetrics.enabled", "ingestionScheduler.consumerGroupSuffix", "ingestionScheduler.enabled", diff --git a/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl new file mode 100644 index 000000000000..310b8718abe1 --- /dev/null +++ b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl @@ -0,0 +1,39 @@ +namespace com.linkedin.execution + +/** + * Structured provenance record for the CLI version chosen for an ingestion execution. + * + * Stamped on each ingestion or test-connection ExecutionRequestInput. Captures only metadata + * about the resolution (which tier fired + which GMS performed it) — the resolved CLI version + * itself lives in `args.version` on the same aspect (the wire-format field consumed by the + * executor). Splitting these avoids storing the version string twice on the aspect; this record + * exists so post-hoc forensics can answer "which tier produced the version, and which GMS wrote + * this?" from a single SQL query without log archaeology. + * + * The resolution chain is, in priority order: + * 1. Per-source `config.version` explicit override (PER_SOURCE) + * 2. Cohort whose `deployments` list contains this deployment's id (MATRIX_COHORT) + * 3. Connector's `_default` from the matrix (MATRIX_CONNECTOR_DEFAULT) + * 4. `defaultCliVersion` from application.yaml (WORKSPACE_DEFAULT) + */ +record CliVersionProvenance { + /** + * Which level of the resolution priority hit. 
+ */ + source: enum CliVersionSource { + /** Step 1: explicit per-source `config.version` override. */ + PER_SOURCE + /** Step 2: matched a cohort whose deployments list contains this deployment's id. */ + MATRIX_COHORT + /** Step 3: fell through to the connector's _default in the matrix. */ + MATRIX_CONNECTOR_DEFAULT + /** Step 4: fell through to defaultCliVersion from application.yaml. */ + WORKSPACE_DEFAULT + } + + /** + * GMS server version that performed the resolution. Populated on every tier when a matrix + * service is wired. Equals `GitVersion.getVersion()` on the pod that wrote this aspect. + */ + serverVersion: optional string +} diff --git a/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl b/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl index 5560795d1153..6ab475b44bc6 100644 --- a/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl +++ b/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl @@ -7,7 +7,8 @@ import com.linkedin.common.Urn * TODO: Determine who is responsible for emitting execution request success or failure. Executor? */ @Aspect = { - "name": "dataHubExecutionRequestInput" + "name": "dataHubExecutionRequestInput", + "schemaVersion": 2 } record ExecutionRequestInput { /** @@ -52,4 +53,15 @@ record ExecutionRequestInput { "fieldType": "URN" } actorUrn: optional Urn + + /** + * Provenance metadata for the CLI version chosen for this execution: which tier of the + * resolution chain produced the version (per-source override / matrix cohort / matrix connector + * default / workspace default) and which GMS performed the resolution. Stamped at request time + * so post-hoc forensics does not require iterating the generic args map. The resolved CLI + * version string itself lives in `args.version` on this same aspect; this record deliberately + * does not duplicate it. 
Optional for backward compatibility; older execution requests will + * not have this set. + */ + cliVersionProvenance: optional CliVersionProvenance } \ No newline at end of file diff --git a/metadata-service/configuration/build.gradle b/metadata-service/configuration/build.gradle index aed4d5b8528d..04ee8f43b49c 100644 --- a/metadata-service/configuration/build.gradle +++ b/metadata-service/configuration/build.gradle @@ -10,6 +10,11 @@ dependencies { implementation externalDependency.jacksonDataBind implementation externalDependency.fabric8KubernetesClient + // Needed by CliVersionResolutionHelper to build the typed CliVersionProvenance PDL record + // stamped on ExecutionRequestInput. The `dataTemplate` configuration exposes the generated + // Java classes; metadata-models has no dependency on this module, so no cycle. + implementation project(path: ':metadata-models', configuration: 'dataTemplate') + implementation externalDependency.slf4jApi // Newer Spring libraries require JDK17 classes, allow for JDK11 diff --git a/metadata-service/configuration/gradle.lockfile b/metadata-service/configuration/gradle.lockfile index 9aa1d3d5f20d..1698c2d99f09 100644 --- a/metadata-service/configuration/gradle.lockfile +++ b/metadata-service/configuration/gradle.lockfile @@ -1,6 +1,7 @@ # This is a Gradle generated file for dependency locking. # Manual edits can break the build and are not advised. # This file is expected to be part of source control. 
+antlr:antlr:2.7.7=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.beust:jcommander:1.82=testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.core:jackson-annotations:2.21=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.core:jackson-core:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath @@ -10,11 +11,16 @@ com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.3=compileClasspath,r com.fasterxml.jackson:jackson-bom:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.google.errorprone:error_prone_annotations:2.47.0=spotless865458226 com.google.googlejavaformat:google-java-format:1.18.1=spotless865458226 -com.google.guava:failureaccess:1.0.3=spotless865458226 -com.google.guava:guava:33.6.0-jre=spotless865458226 -com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava=spotless865458226 -com.google.j2objc:j2objc-annotations:3.1=spotless865458226 +com.google.guava:failureaccess:1.0.3=runtimeClasspath,spotless865458226,testRuntimeClasspath +com.google.guava:guava:33.6.0-jre=runtimeClasspath,spotless865458226,testRuntimeClasspath +com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava=runtimeClasspath,spotless865458226,testRuntimeClasspath +com.google.j2objc:j2objc-annotations:3.1=runtimeClasspath,spotless865458226,testRuntimeClasspath +com.ibm.icu:icu4j:69.1=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.jayway.jsonpath:json-path:2.10.0=testCompileClasspath,testRuntimeClasspath +com.linkedin.pegasus:data:29.74.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +com.linkedin.pegasus:entity-stream:29.74.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +com.linkedin.pegasus:li-protobuf:29.74.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath 
+com.linkedin.pegasus:pegasus-common:29.74.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath commons-logging:commons-logging:1.3.5=testCompileClasspath,testRuntimeClasspath io.fabric8:kubernetes-client-api:7.4.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath io.fabric8:kubernetes-client:7.4.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath @@ -65,19 +71,29 @@ io.vertx:vertx-web-common:4.5.27=runtimeClasspath,testRuntimeClasspath jakarta.activation:jakarta.activation-api:2.1.4=testCompileClasspath,testRuntimeClasspath jakarta.annotation:jakarta.annotation-api:3.0.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath jakarta.xml.bind:jakarta.xml.bind-api:4.0.4=testCompileClasspath,testRuntimeClasspath +javax.annotation:javax.annotation-api:1.3.1=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath net.bytebuddy:byte-buddy-agent:1.17.7=testCompileClasspath,testRuntimeClasspath net.bytebuddy:byte-buddy:1.18.3=testCompileClasspath,testRuntimeClasspath net.minidev:accessors-smart:2.6.0=testCompileClasspath,testRuntimeClasspath net.minidev:json-smart:2.6.0=testCompileClasspath,testRuntimeClasspath +org.abego.treelayout:org.abego.treelayout.core:1.0.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +org.antlr:ST4:4.3.1=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +org.antlr:antlr-runtime:3.5.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +org.antlr:antlr4-runtime:4.9.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +org.antlr:antlr4:4.9.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +org.apache.commons:commons-lang3:3.18.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath 
+org.apache.commons:commons-text:1.10.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath org.apiguardian:apiguardian-api:1.1.2=testCompileClasspath org.assertj:assertj-core:3.27.7=testCompileClasspath,testRuntimeClasspath org.awaitility:awaitility:4.3.0=testCompileClasspath,testRuntimeClasspath +org.checkerframework:checker-qual:2.6.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath org.hamcrest:hamcrest:3.0=testCompileClasspath,testRuntimeClasspath org.jacoco:org.jacoco.agent:0.8.12=jacocoAgent,jacocoAnt org.jacoco:org.jacoco.ant:0.8.12=jacocoAnt org.jacoco:org.jacoco.core:0.8.12=jacocoAnt org.jacoco:org.jacoco.report:0.8.12=jacocoAnt -org.jspecify:jspecify:1.0.0=spotless865458226,testCompileClasspath,testRuntimeClasspath +org.javassist:javassist:3.26.0-GA=runtimeClasspath,testRuntimeClasspath +org.jspecify:jspecify:1.0.0=runtimeClasspath,spotless865458226,testCompileClasspath,testRuntimeClasspath org.junit.jupiter:junit-jupiter-api:5.12.2=testCompileClasspath,testRuntimeClasspath org.junit.jupiter:junit-jupiter-engine:5.12.2=testRuntimeClasspath org.junit.jupiter:junit-jupiter-params:5.12.2=testCompileClasspath,testRuntimeClasspath @@ -94,6 +110,7 @@ org.ow2.asm:asm-tree:9.7=jacocoAnt org.ow2.asm:asm:9.7=jacocoAnt org.ow2.asm:asm:9.7.1=testCompileClasspath,testRuntimeClasspath org.projectlombok:lombok:1.18.42=annotationProcessor,compileClasspath +org.reflections:reflections:0.9.12=runtimeClasspath,testRuntimeClasspath org.skyscreamer:jsonassert:1.5.3=testCompileClasspath,testRuntimeClasspath org.slf4j:slf4j-api:2.0.17=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath org.snakeyaml:snakeyaml-engine:2.10=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java 
b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java index 1c88afe391dd..dae517b91eba 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java @@ -12,4 +12,43 @@ public class IngestionConfiguration { private String defaultCliVersion; private Integer batchRefreshCount; + + /** + * Optional URL to a publicly accessible JSON file containing a per-connector version matrix keyed + * by server release version. When set, the server fetches and caches this matrix and uses it to + * resolve the CLI version per connector type. When empty, the existing defaultCliVersion is used + * for all connectors. + */ + private String versionMatrixUrl; + + /** + * How often (in seconds) to re-fetch the version matrix from versionMatrixUrl. Defaults to 600 + * (10 minutes). + */ + private int versionMatrixRefreshSeconds; + + /** + * Optional value sent verbatim as the {@code Authorization} HTTP header when fetching the version + * matrix. Required when the matrix URL is hosted behind authentication (e.g. a private GitHub + * repo's {@code raw.githubusercontent.com} URL). + * + *

Format is whatever the host expects: + * + *

    + *
  • {@code "token ghp_xxx"} for a GitHub PAT + *
  • {@code "Bearer ey..."} for an OIDC token + *
+ * + *

When empty or unset, no {@code Authorization} header is sent (public-URL semantics). + */ + private String versionMatrixAuthToken; + + /** + * Identifier for this deployment, matched against {@code deployments} entries in the version + * matrix to select cohort versions. Sourced from {@code DATAHUB_EXECUTOR_CUSTOMER_ID} (injected + * by the Acryl Cloud Helm chart from the K8s namespace). Empty in single-tenant / OSS + * deployments, in which case cohort matching never fires and only the per-connector {@code + * _default} from the matrix applies. + */ + private String deploymentId; } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java new file mode 100644 index 000000000000..f7741a7201dd --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java @@ -0,0 +1,130 @@ +package com.linkedin.metadata.ingestion; + +import com.linkedin.execution.CliVersionProvenance; +import com.linkedin.execution.CliVersionSource; +import java.util.Optional; +import javax.annotation.Nullable; + +/** + * Centralizes the CLI-version resolution logic shared by the three execution-request creation paths + * (manual trigger, scheduled trigger, test connection). + * + *

Produces a {@link Result} carrying two distinct pieces: + * + *

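    + *
  • the plain CLI version string, written to {@code args.version} + *
  • the structured {@link CliVersionProvenance} stamp, recording the source tier and the GMS server version + *
+ * + *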

The two pieces intentionally don't duplicate each other: the version string lives only in + * {@code args.version}, and the stamp captures only the provenance fields (source tier, GMS server + * version). Post-hoc forensics queries both via JSON paths on {@code metadata_aspect_v2}. + * + *

Resolution priority (top wins): + * + *

    + *
  1. Per-source explicit version on {@code DataHubIngestionSourceConfig.version} + *
  2. Matrix cohort match — first cohort whose {@code deployments} list contains this + * deployment's id + *
  3. Matrix connector default — the connector's {@code _default} entry + *
  4. Workspace default — {@code defaultCliVersion} from application.yaml + *
+ */ +public final class CliVersionResolutionHelper { + + private CliVersionResolutionHelper() {} + + /** + * Resolve a CLI version for an ingestion or test-connection request. + * + * @param explicitVersion the per-source version from {@code config.version}, or {@code null} / + * empty if unset + * @param connectorType the source-type string from the recipe (e.g. {@code "snowflake"}), or + * {@code null} if not derivable (e.g. malformed test-connection recipe) + * @param matrixService the version-matrix service; pass {@code null} for OSS callers that do not + * consult a matrix (e.g. unit-test setups) + * @param defaultCliVersion the workspace-wide fallback from {@code IngestionConfiguration} + * @param serverVersion the GMS server version (typically {@code GitVersion.getVersion()}). + * Stamped on every returned record regardless of which tier hit; pass {@code null} only in + * tests that don't care about provenance. + * @return a {@link Result} carrying the resolved version string + the structured stamp. Never + * {@code null}; the {@code version} field is never {@code null} either, and is the empty + * string only when {@code defaultCliVersion} itself is null/empty (an OSS misconfiguration). + */ + public static Result resolve( + @Nullable String explicitVersion, + @Nullable String connectorType, + @Nullable IngestionVersionMatrixService matrixService, + @Nullable String defaultCliVersion, + @Nullable String serverVersion) { + + // Normalize the per-source version: bootstrap YAML templating can render null, empty, or + // whitespace-only strings, and all three should mean "unset" so we fall through to the + // matrix / workspace default. Matches the contract of + // IngestionUtils.resolveIngestionCliVersion(...) introduced in #17471. + final String normalizedExplicit = + explicitVersion != null && !explicitVersion.trim().isEmpty() + ?
explicitVersion.trim() + : null; + + if (normalizedExplicit != null) { + return new Result( + normalizedExplicit, stampWithSource(CliVersionSource.PER_SOURCE, serverVersion)); + } + + if (matrixService != null && connectorType != null && !connectorType.isEmpty()) { + Optional<IngestionVersionMatrixService.MatrixResolution> matrixResult = + matrixService.resolveVersionWithSource(connectorType); + if (matrixResult.isPresent()) { + IngestionVersionMatrixService.MatrixResolution r = matrixResult.get(); + CliVersionSource pdlSource = + r.getSource() == IngestionVersionMatrixService.MatrixSourceLevel.COHORT + ? CliVersionSource.MATRIX_COHORT + : CliVersionSource.MATRIX_CONNECTOR_DEFAULT; + return new Result(r.getResolved(), stampWithSource(pdlSource, serverVersion)); + } + } + + // Default fallback. Even if `defaultCliVersion` is itself null/empty, we still emit a + // resolution stamp so forensic queries see a deterministic answer rather than a missing field. + return new Result( + defaultCliVersion == null ? "" : defaultCliVersion, + stampWithSource(CliVersionSource.WORKSPACE_DEFAULT, serverVersion)); + } + + private static CliVersionProvenance stampWithSource( + CliVersionSource source, @Nullable String serverVersion) { + CliVersionProvenance out = new CliVersionProvenance().setSource(source); + if (serverVersion != null && !serverVersion.isEmpty()) { + out.setServerVersion(serverVersion); + } + return out; + } + + /** + * Wraps the two outputs of {@link #resolve(String, String, IngestionVersionMatrixService, String, + * String)}: the plain CLI version string (for {@code args.version}) and the structured + * provenance stamp (for the {@code cliVersionProvenance} aspect field). + */ + public static final class Result { + private final String version; + private final CliVersionProvenance stamp; + + public Result(String version, CliVersionProvenance stamp) { + this.version = version; + this.stamp = stamp; + } + + /** The plain CLI version string to put in {@code args.version}.
*/ + public String getVersion() { + return version; + } + + /** The structured stamp to put on the {@code cliVersionProvenance} aspect field. */ + public CliVersionProvenance getStamp() { + return stamp; + } + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Cohort.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Cohort.java new file mode 100644 index 000000000000..4c46fcbaa2f3 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Cohort.java @@ -0,0 +1,32 @@ +package com.linkedin.metadata.ingestion; + +import java.util.Collections; +import java.util.Set; + +/** + * A canary cohort within a {@link ConnectorEntry}: a CLI version that should be served to a + * specific allowlist of deployments instead of the connector's default. + * + *

Used for staged rollouts - pin a fix to a few deployments first, validate, then widen the + * allowlist. + */ +public final class Cohort { + + private final String version; + private final Set<String> deployments; + + public Cohort(String version, Set<String> deployments) { + this.version = version; + this.deployments = + deployments == null ? Collections.emptySet() : Collections.unmodifiableSet(deployments); + } + + public String getVersion() { + return version; + } + + /** Deployment identifiers that should receive this cohort's version. */ + public Set<String> getDeployments() { + return deployments; + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ConnectorEntry.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ConnectorEntry.java new file mode 100644 index 000000000000..edc448d82d8f --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ConnectorEntry.java @@ -0,0 +1,34 @@ +package com.linkedin.metadata.ingestion; + +import java.util.Collections; +import java.util.List; + +/** + * A single {@code (serverVersion, connectorType)} entry in the matrix: a connector-level default + * version plus an ordered list of canary cohorts. + * + *

Resolution against this entry walks the cohorts in order and returns the first one whose + * {@code deployments} list contains the deployment's id. If no cohort matches, the {@code + * defaultVersion} is used. + */ +public final class ConnectorEntry { + + private final String defaultVersion; + private final List<Cohort> cohorts; + + public ConnectorEntry(String defaultVersion, List<Cohort> cohorts) { + this.defaultVersion = defaultVersion; + this.cohorts = + cohorts == null ? Collections.emptyList() : Collections.unmodifiableList(cohorts); + } + + /** The {@code _default} version applied when no cohort allowlist matches. May be {@code null}. */ + public String getDefaultVersion() { + return defaultVersion; + } + + /** Cohorts in declaration order. First match wins. Never {@code null}; may be empty. */ + public List<Cohort> getCohorts() { + return cohorts; + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java new file mode 100644 index 000000000000..38dd6cc13723 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java @@ -0,0 +1,340 @@ +package com.linkedin.metadata.ingestion; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import java.io.InputStream; +import java.net.URL; +import java.net.URLConnection; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicLong; +import java.util.concurrent.atomic.AtomicReference; +import java.util.regex.Pattern; +import javax.annotation.Nullable; +import
lombok.extern.slf4j.Slf4j; + +/** + * {@link MatrixSource} backed by an HTTP URL serving the matrix JSON, either publicly readable or + * protected by an {@code Authorization} header. Suitable for any deployment that wants to fetch the + * matrix from a CDN, object store (S3, GCS), or a GitHub raw URL without rebuilding or redeploying + * GMS to change connector versions. + * + *

The remote JSON must follow this schema: + * + *

{@code
+ * {
+ *   "1.3.1.4": {
+ *     "snowflake": {
+ *       "_default": "1.3.1.4",
+ *       "cohorts": [
+ *         { "version": "1.3.1.5", "deployments": ["deployment-1", "deployment-2"] }
+ *       ]
+ *     }
+ *   }
+ * }
+ * }
+ * + *

The cache is refreshed in the background on a configurable interval. On fetch failure the last + * successfully loaded matrix is retained so in-flight executions never see a flapping view. + * + *

Reads from {@link #getMatrix()} are lock-free (single volatile read on {@link + * AtomicReference#get()}); the background refresh swaps the cache atomically after a successful + * fetch. + * + *

For private hosts (e.g. a private GitHub repo's {@code raw.githubusercontent.com} URL) an + * optional {@code authHeader} value is sent verbatim as the {@code Authorization} request header. + * Format is whatever the host expects - e.g. {@code "token ghp_xxx"} for a GitHub PAT, {@code + * "Bearer ey..."} for an OIDC token. When {@code authHeader} is {@code null} or empty no auth + * header is sent (the public-URL path is unchanged). + */ +@Slf4j +public class HttpUrlMatrixSource implements MatrixSource { + + private static final String DEFAULT_FIELD = "_default"; + private static final String COHORTS_FIELD = "cohorts"; + private static final String VERSION_FIELD = "version"; + private static final String DEPLOYMENTS_FIELD = "deployments"; + private static final int FETCH_TIMEOUT_MS = 10_000; + + /** + * Basic version-string shape - alphanumeric, underscore, dot, plus, exclamation, hyphen. Catches + * obvious typos (whitespace, embedded JSON, HTML, etc.) without trying to validate PEP 440 fully. + * pip will catch any version that doesn't actually exist on PyPI; this check exists so operators + * fat-fingering the matrix file get a clean WARN at fetch time rather than a cryptic pip error + * minutes later on every execution. + */ + private static final Pattern VALID_VERSION_PATTERN = Pattern.compile("^[\\w.+!-]+$"); + + private final String url; + @Nullable private final String authHeader; + private final AtomicReference<Matrix> cached; + private final AtomicLong lastFetchedAtMillis; + private final ObjectMapper objectMapper; + + /** Convenience constructor for unauthenticated (public) URLs.
*/ + public HttpUrlMatrixSource(String url, int refreshIntervalSeconds) { + this(url, refreshIntervalSeconds, null); + } + + public HttpUrlMatrixSource(String url, int refreshIntervalSeconds, @Nullable String authHeader) { + this.url = url; + this.authHeader = authHeader; + this.cached = new AtomicReference<>(Matrix.EMPTY); + this.lastFetchedAtMillis = new AtomicLong(0L); + this.objectMapper = new ObjectMapper(); + + ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); + // Fetch immediately on startup (delay=0), then repeat on the configured interval. + executor.scheduleAtFixedRate(this::refresh, 0, refreshIntervalSeconds, TimeUnit.SECONDS); + } + + @Override + public Matrix getMatrix() { + return cached.get(); + } + + @Override + public long getLastFetchedAtMillis() { + return lastFetchedAtMillis.get(); + } + + /** Package-private so tests can force a refresh without waiting for the scheduled tick. */ + void refresh() { + try { + URL connectionUrl = new URL(url); + URLConnection conn = connectionUrl.openConnection(); + conn.setConnectTimeout(FETCH_TIMEOUT_MS); + conn.setReadTimeout(FETCH_TIMEOUT_MS); + conn.setRequestProperty("User-Agent", "DataHub-GMS"); + if (authHeader != null && !authHeader.isEmpty()) { + conn.setRequestProperty("Authorization", authHeader); + } + + try (InputStream is = conn.getInputStream()) { + JsonNode root = objectMapper.readTree(is); + Matrix parsed; + try { + parsed = parseMatrix(root); + } catch (IllegalArgumentException schemaError) { + // File-level schema violation (e.g. root not a JSON object). Refuse to swap the cache + // — operator gets a fix-this-now WARN while in-flight resolutions continue to use the + // last-known-good matrix. Per-entry violations are handled inside parseMatrix without + // throwing; this branch is only for whole-file rejection. + log.warn( + "Refusing to swap matrix cache from {}: {}. 
Retaining last known matrix.", + url, + schemaError.getMessage()); + return; + } + cached.set(parsed); + // Stamp the successful-fetch timestamp after the cache swap so readers never see a fresh + // timestamp paired with a stale matrix. + lastFetchedAtMillis.set(System.currentTimeMillis()); + log.info( + "Successfully refreshed ingestion version matrix from {}; {} server version entries loaded", + url, + parsed.size()); + } + } catch (Exception e) { + log.warn( + "Failed to refresh ingestion version matrix from {}. Retaining last known matrix.", + url, + e); + } + } + + /** + * Parses the nested schema into a {@link Matrix} with two layers of validation: + * + *

    + *
  • File-level (fail closed): if the root isn't a JSON object, throws {@link + * IllegalArgumentException}. The caller refuses to swap the cache and the last-known-good + * matrix continues to serve resolutions. Prevents a malformed file from blanking everyone. + *
  • Entry-level (fail open + log): individual malformed connector entries / cohorts / + * version strings are skipped with structured WARN logs that name the offending path + * ({@code server='X' connector='Y' cohort[N]}). Good entries around the bad one are kept, + * so a single typo doesn't take the whole matrix down. + *
+ * + *

Package-private so tests can drive it directly with arbitrary {@link JsonNode} input. + */ + static Matrix parseMatrix(JsonNode root) { + if (root == null || !root.isObject()) { + throw new IllegalArgumentException( + "matrix root must be a JSON object, got: " + + (root == null ? "null" : root.getNodeType().toString().toLowerCase())); + } + + Map<String, Map<String, ConnectorEntry>> entries = new HashMap<>(); + root.fields() + .forEachRemaining( + serverEntry -> { + String serverVersion = serverEntry.getKey(); + JsonNode serverValue = serverEntry.getValue(); + if (!serverValue.isObject()) { + log.warn( + "Skipping malformed matrix entry: server='{}' value is not an object (got {})", + serverVersion, + serverValue.getNodeType().toString().toLowerCase()); + return; + } + Map<String, ConnectorEntry> connectors = new HashMap<>(); + serverValue + .fields() + .forEachRemaining( + connectorEntry -> { + String connector = connectorEntry.getKey(); + ConnectorEntry parsed = + parseConnectorEntry( + serverVersion, connector, connectorEntry.getValue()); + if (parsed != null) { + connectors.put(connector, parsed); + } + }); + entries.put(serverVersion, Collections.unmodifiableMap(connectors)); + }); + return new Matrix(entries); + } + + /** + * Parses a single connector entry. Returns {@code null} if the entry's structure is wholly + * malformed (not a JSON object) - caller drops that connector. Bad sub-fields ({@code _default}, + * {@code cohorts}) are logged and either ignored or skipped while keeping the rest of the entry.
+ */ + @Nullable + private static ConnectorEntry parseConnectorEntry( + String serverVersion, String connector, JsonNode connectorNode) { + if (!connectorNode.isObject()) { + log.warn( + "Skipping malformed matrix entry: server='{}' connector='{}' value is not an object (got {})", + serverVersion, + connector, + connectorNode.getNodeType().toString().toLowerCase()); + return null; + } + + JsonNode defaultNode = connectorNode.path(DEFAULT_FIELD); + String defaultVersion = null; + if (!defaultNode.isMissingNode() && !defaultNode.isNull()) { + if (!defaultNode.isTextual()) { + log.warn( + "Ignoring non-textual '_default' value: server='{}' connector='{}' (got {})", + serverVersion, + connector, + defaultNode.getNodeType().toString().toLowerCase()); + } else { + String candidate = defaultNode.asText(); + if (isValidVersion(candidate)) { + defaultVersion = candidate; + } else { + log.warn( + "Ignoring invalid '_default' version: server='{}' connector='{}' version='{}'. " + + "Cohort matches still apply; this connector will fall through to " + + "WORKSPACE_DEFAULT when no cohort matches.", + serverVersion, + connector, + candidate); + } + } + } + + List<Cohort> cohorts = new ArrayList<>(); + JsonNode cohortsNode = connectorNode.path(COHORTS_FIELD); + if (cohortsNode.isArray()) { + int index = 0; + for (JsonNode cohortNode : cohortsNode) { + Cohort cohort = parseCohort(serverVersion, connector, index, cohortNode); + if (cohort != null) { + cohorts.add(cohort); + } + index++; + } + } else if (!cohortsNode.isMissingNode() && !cohortsNode.isNull()) { + log.warn( + "Skipping malformed 'cohorts' field: server='{}' connector='{}' cohorts is not an array (got {})", + serverVersion, + connector, + cohortsNode.getNodeType().toString().toLowerCase()); + } + return new ConnectorEntry(defaultVersion, cohorts); + } + + /** + * Parses a single cohort entry. Returns {@code null} if the cohort is unusable (missing or + * invalid {@code version}).
Other malformed sub-fields are logged but don't drop the cohort. + */ + @Nullable + private static Cohort parseCohort( + String serverVersion, String connector, int cohortIndex, JsonNode cohortNode) { + if (!cohortNode.isObject()) { + log.warn( + "Skipping malformed cohort: server='{}' connector='{}' cohort[{}] is not an object", + serverVersion, + connector, + cohortIndex); + return null; + } + + JsonNode versionNode = cohortNode.path(VERSION_FIELD); + if (versionNode.isMissingNode() || versionNode.isNull() || !versionNode.isTextual()) { + log.warn( + "Skipping malformed cohort: server='{}' connector='{}' cohort[{}] missing or non-textual 'version'", + serverVersion, + connector, + cohortIndex); + return null; + } + String version = versionNode.asText(); + if (!isValidVersion(version)) { + log.warn( + "Skipping malformed cohort: server='{}' connector='{}' cohort[{}] has invalid version '{}'", + serverVersion, + connector, + cohortIndex, + version); + return null; + } + + Set<String> deployments = new HashSet<>(); + JsonNode deploymentsNode = cohortNode.path(DEPLOYMENTS_FIELD); + if (deploymentsNode.isArray()) { + for (JsonNode d : deploymentsNode) { + if (d.isTextual()) { + deployments.add(d.asText()); + } else { + log.warn( + "Ignoring non-string deployment entry: server='{}' connector='{}' cohort[{}] (got {})", + serverVersion, + connector, + cohortIndex, + d.getNodeType().toString().toLowerCase()); + } + } + } else if (!deploymentsNode.isMissingNode() && !deploymentsNode.isNull()) { + log.warn( + "Ignoring malformed 'deployments' field: server='{}' connector='{}' cohort[{}] is not an array (got {})", + serverVersion, + connector, + cohortIndex, + deploymentsNode.getNodeType().toString().toLowerCase()); + } + return new Cohort(version, deployments); + } + + /** + * Returns {@code true} if {@code s} is a non-empty string matching {@link + * #VALID_VERSION_PATTERN}.
Permissive — allows weird-but-valid PyPI versions like {@code + * "1.5.0.6rc1"} or {@code "1!0.0.0.dev0"} — but rejects whitespace, embedded markup, and other + * shapes that obviously aren't versions. + */ + private static boolean isValidVersion(String s) { + return s != null && !s.isEmpty() && VALID_VERSION_PATTERN.matcher(s).matches(); + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java new file mode 100644 index 000000000000..3bd90c139883 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java @@ -0,0 +1,141 @@ +package com.linkedin.metadata.ingestion; + +import java.util.Map; +import java.util.Optional; + +/** + * Resolves a CLI version for a given connector type, walking a {@link Matrix} returned by a + * pluggable {@link MatrixSource}. + * + *

This class owns the resolution policy only — cohort ordering, allowlist matching, + * connector-default fallback, and forensic metadata stamping. Where the matrix data comes from + * (HTTP, a GMS metadata aspect, a config server, an in-memory test fixture, …) is the {@link + * MatrixSource}'s problem. + * + *

Cohort-based rollouts are aimed at multi-tenant deployments. Single-tenant installations leave + * the deployment identifier unset, which makes cohort matching a no-op and falls through to the + * connector's {@code _default}. When no {@code MatrixSource} is configured at all, the {@link + * NoOpMatrixSource} wired by the factory ensures every {@link #resolveVersionWithSource(String)} + * returns {@link Optional#empty()}, preserving the existing {@code defaultCliVersion} behavior + * bit-identically. + * + *

Resolution priority when picking a CLI version for an execution: + * + *

    + *
  1. Connector-specific version stored in DataHubIngestionSourceConfig (per-source override) — + * handled by callers, not this service. + *
  2. The first cohort under {@code matrix[serverVersion][connectorType].cohorts} whose {@code + * deployments} list contains this deployment's id. + *
  3. {@code matrix[serverVersion][connectorType]._default}. + *
  4. {@link Optional#empty()} — callers fall back to {@code defaultCliVersion} from + * application.yaml. + *
+ * + *

Cohorts are evaluated in array order; the first deployments-list hit wins. An empty or missing + * {@code deployments} list never matches. + */ +public class IngestionVersionMatrixService { + + private final MatrixSource source; + private final String serverVersion; + private final String deploymentId; + + public IngestionVersionMatrixService( + MatrixSource source, String serverVersion, String deploymentId) { + this.source = source; + this.serverVersion = serverVersion; + this.deploymentId = deploymentId; + } + + /** Server version key this resolver matches against. */ + public String getServerVersion() { + return serverVersion; + } + + /** + * Returns the CLI version to use for the given connector type, consulting the underlying matrix + * source. Returns {@link Optional#empty()} when: + * + *

    + *
  • The source returned an empty matrix (no data yet, or {@link NoOpMatrixSource}) + *
  • The current server version has no entry in the matrix + *
  • The connector has no entry under the current server version, or its entry has no matching cohort and no valid {@code _default} + *
+ * + *

Callers should fall back to {@code defaultCliVersion} when this returns empty. + * + *

To learn which level of the matrix matched (cohort vs connector default), use {@link + * #resolveVersionWithSource(String)} instead. + */ + public Optional<String> resolveVersion(String connectorType) { + return resolveVersionWithSource(connectorType).map(MatrixResolution::getResolved); + } + + /** + * Resolves the CLI version for the given connector type and returns structured detail about which + * level of the matrix matched (cohort vs connector default). Returns {@link Optional#empty()} + * under the same conditions as {@link #resolveVersion(String)}. + * + *

This is the preferred API for callers that need to stamp the resolution provenance on the + * resulting execution request (for post-hoc forensics). + */ + public Optional<MatrixResolution> resolveVersionWithSource(String connectorType) { + Matrix matrix = source.getMatrix(); + Map<String, ConnectorEntry> serverEntry = matrix.getEntriesForServer(serverVersion); + if (serverEntry == null) { + return Optional.empty(); + } + ConnectorEntry connectorEntry = serverEntry.get(connectorType); + if (connectorEntry == null) { + return Optional.empty(); + } + + // Walk cohorts in array order - first cohort whose `deployments` list contains this + // deployment's slug wins. An unset / empty deploymentId can never match a deployment entry, + // which is the intended "fall through to _default" behavior for deployments that haven't + // wired the env var. + if (deploymentId != null && !deploymentId.isEmpty()) { + for (Cohort cohort : connectorEntry.getCohorts()) { + if (cohort.getDeployments().contains(deploymentId)) { + return Optional.of(new MatrixResolution(cohort.getVersion(), MatrixSourceLevel.COHORT)); + } + } + } + + String defaultVersion = connectorEntry.getDefaultVersion(); + if (defaultVersion == null) { + return Optional.empty(); + } + return Optional.of(new MatrixResolution(defaultVersion, MatrixSourceLevel.CONNECTOR_DEFAULT)); + } + + /** Which level of the matrix produced a resolution. */ + public enum MatrixSourceLevel { + /** Matched a cohort whose deployments list contained this deployment's id. */ + COHORT, + /** Fell through to the connector's _default. */ + CONNECTOR_DEFAULT + } + + /** + * Structured result of a matrix resolution - just version and tier; serverVersion is stamped at + * the helper layer.
+ */ + public static final class MatrixResolution { + private final String resolved; + private final MatrixSourceLevel source; + + public MatrixResolution(String resolved, MatrixSourceLevel source) { + this.resolved = resolved; + this.source = source; + } + + public String getResolved() { + return resolved; + } + + public MatrixSourceLevel getSource() { + return source; + } + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java new file mode 100644 index 000000000000..da602fc7ebbb --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java @@ -0,0 +1,42 @@ +package com.linkedin.metadata.ingestion; + +import java.util.Collections; +import java.util.Map; + +/** + * In-memory snapshot of the per-connector CLI version matrix. + * + *

The matrix is keyed by server release version, then by connector type, with each entry + * carrying a {@code _default} version and an optional ordered list of canary cohorts. + * + *

This is a pure POJO produced by {@link MatrixSource} implementations and consumed by {@link + * IngestionVersionMatrixService} - the storage layer (HTTP, GMS aspect, config server, …) is + * decoupled from the resolution layer that walks the matrix and applies precedence rules. + */ +public final class Matrix { + + /** Empty matrix used when no source is configured or fetch has not yet succeeded. */ + public static final Matrix EMPTY = new Matrix(Collections.emptyMap()); + + private final Map<String, Map<String, ConnectorEntry>> entriesByServerVersion; + + public Matrix(Map<String, Map<String, ConnectorEntry>> entriesByServerVersion) { + this.entriesByServerVersion = + entriesByServerVersion == null + ? Collections.emptyMap() + : Collections.unmodifiableMap(entriesByServerVersion); + } + + /** + * Look up the per-connector matrix entries for a given server release. Returns {@code null} if the + * server version has no entry - callers fall back to the workspace default. + */ + public Map<String, ConnectorEntry> getEntriesForServer(String serverVersion) { + return entriesByServerVersion.get(serverVersion); + } + + /** Number of server-version keys in the matrix. Used for diagnostic logging. */ + public int size() { + return entriesByServerVersion.size(); + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java new file mode 100644 index 000000000000..361a242ad450 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java @@ -0,0 +1,49 @@ +package com.linkedin.metadata.ingestion; + +/** + * Storage abstraction for the per-connector CLI version matrix. + * + *

<p>Decouples how the matrix is fetched/stored from how it is consumed. {@link + * IngestionVersionMatrixService} (the consumer) only knows that "something" returns a {@link + * Matrix}; implementations of this interface decide where the data comes from and how/when it + * refreshes. + * + * <p>Current implementations: + * + * <ul> + * <li>{@link HttpUrlMatrixSource} - periodic GET of a JSON document from a remote URL (S3, CDN, + * GitHub raw, …). + * <li>{@link NoOpMatrixSource} - always returns an empty matrix. Used as the default when no + * source is configured, so the resolution service never needs null checks. + * </ul> + * + * <p>Future implementations could include: + * + * <ul> + * <li>{@code GmsAspectMatrixSource} - reads the matrix from a metadata aspect on a {@code + * globalSettings} entity inside DataHub itself. Lets workspace admins edit the matrix through + * the UI/GraphQL the same way they edit any other setting. + * <li>{@code ConfigServerMatrixSource} - generic config-server backend (AWS AppConfig, Consul, + * etcd, …). + * </ul> + * + * <p>
Implementations are responsible for their own caching, refresh cadence, and failure handling. + * The consumer assumes {@link #getMatrix()} is cheap to call on the hot path. + */ +public interface MatrixSource { + + /** + * Returns the latest available matrix snapshot. Never {@code null}; implementations should return + * {@link Matrix#EMPTY} if they have no data yet (e.g. initial fetch hasn't completed) or if the + * source is intentionally a no-op. + */ + Matrix getMatrix(); + + /** + * Returns the epoch-millis timestamp of when the currently-cached matrix was last successfully + * populated, or {@code 0} if no successful fetch has happened. Used by the resolution service to + * stamp forensic metadata on the execution request (so post-hoc triage can correlate the resolved + * version with matrix freshness). + */ + long getLastFetchedAtMillis(); +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java new file mode 100644 index 000000000000..95fa6e7405a0 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java @@ -0,0 +1,22 @@ +package com.linkedin.metadata.ingestion; + +/** + * {@link MatrixSource} that always returns an empty matrix. Used when no matrix backend is + * configured (the OSS default — {@code INGESTION_VERSION_MATRIX_URL} is unset). + * + *

<p>Always wiring a {@code NoOpMatrixSource} instead of handing the consumer service a null source means + * {@link IngestionVersionMatrixService} can rely on a non-null source without null checks on the + * hot path, and unit tests that don't care about matrix data don't have to construct a real source. + */ +public final class NoOpMatrixSource implements MatrixSource { + + @Override + public Matrix getMatrix() { + return Matrix.EMPTY; + } + + @Override + public long getLastFetchedAtMillis() { + return 0L; + } +} diff --git a/metadata-service/configuration/src/main/resources/application.yaml b/metadata-service/configuration/src/main/resources/application.yaml index 6c1832a8166d..f46ee1556f1c 100644 --- a/metadata-service/configuration/src/main/resources/application.yaml +++ b/metadata-service/configuration/src/main/resources/application.yaml @@ -104,6 +104,22 @@ ingestion: defaultCliVersion: "${UI_INGESTION_DEFAULT_CLI_VERSION:@cliVersion@}" maxSerializedStringLength: "${INGESTION_MAX_SERIALIZED_STRING_LENGTH:16000000}" # Indicates the maximum allowed JSON String length Jackson will handle, impacts the maximum size of ingested aspects batchRefreshCount: ${INGESTION_BATCH_REFRESH_COUNT:100} # The number of entities to refresh in a single batch when refreshing entities after ingestion + # Optional: URL to a JSON file containing a per-connector CLI version matrix keyed by server + # release (public, or private when versionMatrixAuthToken is set). When set, overrides defaultCliVersion on a per-connector basis. + # Update this file externally (e.g. S3/CDN) without redeploying to change connector versions. + versionMatrixUrl: "${INGESTION_VERSION_MATRIX_URL:}" + versionMatrixRefreshSeconds: ${INGESTION_VERSION_MATRIX_REFRESH_SECONDS:600} + # Optional. Sent verbatim as the `Authorization` header when fetching versionMatrixUrl. Required + # when the matrix is hosted behind authentication (e.g. a private GitHub repo). Format examples: + # token ghp_xxxxxxxxxxxxxxxx (GitHub PAT) + # Bearer eyJ... (OAuth / OIDC bearer) + # Leave unset for public URLs. Property name ends with "Token" so it is auto-redacted in system-info. + versionMatrixAuthToken: "${INGESTION_VERSION_MATRIX_AUTH_TOKEN:}" + # Identifier for this deployment, used for matching against `deployments` allowlists in the + # version matrix. Sourced from DATAHUB_EXECUTOR_CUSTOMER_ID, which the Acryl Cloud Helm chart + # injects from the K8s namespace. Empty in single-tenant / OSS deployments; in that case cohort matching + # never fires and only the per-connector `_default` from the matrix applies. + deploymentId: "${DATAHUB_EXECUTOR_CUSTOMER_ID:}" scheduler: refreshIntervalSeconds: ${INGESTION_SOURCE_REFRESH_INTERVAL_SECONDS:43200} # The interval at which the ingestion source scheduler will check for new or updated ingestion sources diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java new file mode 100644 index 000000000000..25431e1d4997 --- /dev/null +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java @@ -0,0 +1,136 @@ +package com.linkedin.metadata.ingestion; + +import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertNotNull; + +import com.linkedin.execution.CliVersionSource; +import java.util.Optional; +import org.mockito.Mockito; +import org.testng.annotations.Test; + +/** + * Focused unit tests for {@link CliVersionResolutionHelper}. + * + * <p>

Covers the precedence ladder (per-source > matrix cohort > matrix connector default > + * workspace default) and the per-source normalization contract (null, empty, and whitespace-only + * strings all fall through to the next tier). The whitespace case matches the contract of {@code + * IngestionUtils.resolveIngestionCliVersion(...)} from #17471 — bootstrap YAML templating can + * render any of the three, and forwarding them to the executor silently picks the bundled CLI + * rather than the configured default. + */ +public class CliVersionResolutionHelperTest { + + private static final String DEFAULT_CLI = "0.14.0"; + private static final String SERVER_VERSION = "1.3.1.4"; + + @Test + public void testPerSourceVersionWins() { + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve( + "0.13.5", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), "0.13.5"); + assertEquals(result.getStamp().getSource(), CliVersionSource.PER_SOURCE); + assertEquals(result.getStamp().getServerVersion(), SERVER_VERSION); + } + + @Test + public void testPerSourceWhitespaceIsTrimmed() { + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve( + " 0.13.5 ", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), "0.13.5"); + assertEquals(result.getStamp().getSource(), CliVersionSource.PER_SOURCE); + } + + @Test + public void testPerSourceNullFallsThroughToDefault() { + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve(null, null, null, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), DEFAULT_CLI); + assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + } + + @Test + public void testPerSourceEmptyFallsThroughToDefault() { + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve("", null, null, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), DEFAULT_CLI); + 
assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + } + + @Test + public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { + // Documents the contract from #17471: a bootstrap YAML field that renders as a blank string + // must be treated as "unset" so we hit the workspace default rather than passing the blank + // through to the executor. + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), DEFAULT_CLI); + assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + } + + @Test + public void testMatrixConnectorDefaultWinsOverWorkspaceDefault() { + IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrixService.resolveVersionWithSource("snowflake")) + .thenReturn( + Optional.of( + new IngestionVersionMatrixService.MatrixResolution( + "0.13.5", IngestionVersionMatrixService.MatrixSourceLevel.CONNECTOR_DEFAULT))); + + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve( + null, "snowflake", matrixService, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), "0.13.5"); + assertEquals(result.getStamp().getSource(), CliVersionSource.MATRIX_CONNECTOR_DEFAULT); + } + + @Test + public void testMatrixCohortWinsOverConnectorDefault() { + IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + Mockito.when(matrixService.resolveVersionWithSource("snowflake")) + .thenReturn( + Optional.of( + new IngestionVersionMatrixService.MatrixResolution( + "0.13.6", IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve( + null, "snowflake", matrixService, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), "0.13.6"); + assertEquals(result.getStamp().getSource(), 
CliVersionSource.MATRIX_COHORT); + } + + @Test + public void testNullConnectorTypeSkipsMatrix() { + // A malformed test-connection recipe produces a null connector type; we must skip the matrix + // and fall through to the workspace default rather than throwing. + IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve(null, null, matrixService, DEFAULT_CLI, SERVER_VERSION); + + assertEquals(result.getVersion(), DEFAULT_CLI); + assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + Mockito.verifyNoInteractions(matrixService); + } + + @Test + public void testNullDefaultStillReturnsStamp() { + // OSS misconfiguration (defaultCliVersion not set) — we still emit a deterministic stamp so + // forensic queries see a definite answer rather than a missing field. + CliVersionResolutionHelper.Result result = + CliVersionResolutionHelper.resolve(null, null, null, null, SERVER_VERSION); + + assertEquals(result.getVersion(), ""); + assertNotNull(result.getStamp()); + assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + } +} diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java new file mode 100644 index 000000000000..d82d03da4cb5 --- /dev/null +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java @@ -0,0 +1,177 @@ +package com.linkedin.metadata.ingestion; + +import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertFalse; +import static org.testng.Assert.assertNotNull; +import static org.testng.Assert.assertNull; +import static org.testng.Assert.assertThrows; +import static org.testng.Assert.assertTrue; + +import 
com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.testng.annotations.Test; + +/** + * Direct unit tests for the JSON-schema validation rules in {@link + * HttpUrlMatrixSource#parseMatrix}. + * + *

<p>Two layers of validation are tested: + * + * <ul> + * <li>File-level (fail closed): {@link IllegalArgumentException} for root structure + * violations; the caller refuses to swap the cache. + * <li>Entry-level (fail open + log): bad sub-entries are skipped; good entries around them + * are kept. We don't assert on the log lines themselves (brittle), only on the resulting + * {@link Matrix} shape. + * </ul>
+ */ +public class HttpUrlMatrixSourceValidationTest { + + private static final ObjectMapper MAPPER = new ObjectMapper(); + + // --------------------------------------------------------------------------- + // File-level (fail closed) + // --------------------------------------------------------------------------- + + @Test + public void rootNotObjectThrowsAndCallerRetainsCache() throws Exception { + // A JSON array at the root is the realistic operator-error case (e.g. they + // exported a list of versions instead of the keyed object). We refuse to swap. + // The thrown IllegalArgumentException is the signal HttpUrlMatrixSource.refresh() + // uses to retain the last-known-good cache. + JsonNode root = MAPPER.readTree("[ {\"snowflake\": {} } ]"); + assertThrows(IllegalArgumentException.class, () -> HttpUrlMatrixSource.parseMatrix(root)); + } + + @Test + public void rootNullThrows() { + assertThrows(IllegalArgumentException.class, () -> HttpUrlMatrixSource.parseMatrix(null)); + } + + // --------------------------------------------------------------------------- + // Entry-level (fail open + log) — good entries survive bad neighbors + // --------------------------------------------------------------------------- + + @Test + public void invalidDefaultVersionIgnoredButCohortsKept() throws Exception { + // The "_default" is unusable (has a space), but cohorts are well-formed and + // should still drive cohort matches. The connector falls through to + // WORKSPACE_DEFAULT when no cohort matches. 
+ JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {\"snowflake\": {" + + "\"_default\": \"not a version\"," + + "\"cohorts\": [{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}]" + + "}}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + assertNotNull(snowflake); + assertNull( + snowflake.getDefaultVersion(), "invalid _default should be dropped, not stored verbatim"); + assertEquals(snowflake.getCohorts().size(), 1, "cohort should still be present"); + assertEquals(snowflake.getCohorts().get(0).getVersion(), "1.5.0.6"); + } + + @Test + public void cohortMissingVersionIsSkippedOthersKept() throws Exception { + // First cohort has no version field — skipped. Second cohort is well-formed — kept. + JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + + "{\"deployments\": [\"acme\"]}," + + "{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}" + + "]}}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + assertEquals(snowflake.getCohorts().size(), 1, "first cohort (no version) should be dropped"); + assertEquals(snowflake.getCohorts().get(0).getVersion(), "1.5.0.6"); + } + + @Test + public void cohortWithGarbageVersionIsSkipped() throws Exception { + // Operator pasted a string with HTML — pattern rejects it. + JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + + "{\"version\": \"\", \"deployments\": [\"acme\"]}" + + "]}}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + assertTrue( + snowflake.getCohorts().isEmpty(), "cohort with invalid version pattern should be dropped"); + } + + @Test + public void permissiveVersionPatternAcceptsRealPyPiVersions() throws Exception { + // Make sure we don't over-reject. 
Each is a real shape that appears on PyPI for acryl-datahub + // or similar. Includes rc, post, dev, and PEP 440 epoch prefix. + String[] realVersions = { + "1.5.0.19", "1.5.0.6rc1", "1.5.0.13.post1", "1!0.0.0.dev0", "0.14.0.6rc3" + }; + StringBuilder cohorts = new StringBuilder(); + for (int i = 0; i < realVersions.length; i++) { + if (i > 0) cohorts.append(","); + cohorts + .append("{\"version\": \"") + .append(realVersions[i]) + .append("\", \"deployments\": [\"d") + .append(i) + .append("\"]}"); + } + JsonNode root = + MAPPER.readTree("{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + cohorts + "]}}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + assertEquals( + m.getEntriesForServer("1.5.0").get("snowflake").getCohorts().size(), + realVersions.length, + "all real PyPI-style versions should pass the permissive pattern"); + } + + @Test + public void connectorValueNotObjectIsSkippedOthersKept() throws Exception { + // "snowflake" got assigned an array by mistake — drop it. "bigquery" is fine — keep it. + JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {" + + "\"snowflake\": [\"oops\", \"this is wrong\"]," + + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + + "}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + assertFalse( + m.getEntriesForServer("1.5.0").containsKey("snowflake"), + "malformed connector entry should be dropped"); + assertNotNull( + m.getEntriesForServer("1.5.0").get("bigquery"), + "well-formed sibling connector should survive"); + assertEquals(m.getEntriesForServer("1.5.0").get("bigquery").getDefaultVersion(), "1.4.0.3"); + } + + @Test + public void wellFormedMatrixParsesUnchanged() throws Exception { + // Regression test — the happy-path schema we documented in the class Javadoc still parses. + // Locks in the contract so future tightening of validation doesn't accidentally reject valid + // input. 
+ JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {" + + "\"snowflake\": {" + + " \"_default\": \"1.5.0.5\"," + + " \"cohorts\": [{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\", \"beta\"]}]" + + "}," + + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + + "}}"); + Matrix m = HttpUrlMatrixSource.parseMatrix(root); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + assertEquals(snowflake.getDefaultVersion(), "1.5.0.5"); + assertEquals(snowflake.getCohorts().size(), 1); + Cohort cohort = snowflake.getCohorts().get(0); + assertEquals(cohort.getVersion(), "1.5.0.6"); + assertEquals(cohort.getDeployments().size(), 2); + assertTrue(cohort.getDeployments().contains("acme")); + assertTrue(cohort.getDeployments().contains("beta")); + + ConnectorEntry bigquery = m.getEntriesForServer("1.5.0").get("bigquery"); + assertEquals(bigquery.getDefaultVersion(), "1.4.0.3"); + assertTrue(bigquery.getCohorts().isEmpty()); + } +} diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java new file mode 100644 index 000000000000..c8adbe32867b --- /dev/null +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java @@ -0,0 +1,484 @@ +package com.linkedin.metadata.ingestion; + +import static org.testng.Assert.*; + +import com.sun.net.httpserver.HttpServer; +import java.io.IOException; +import java.io.OutputStream; +import java.net.InetSocketAddress; +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.Optional; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicReference; +import org.testng.annotations.Test; + +public class IngestionVersionMatrixServiceTest { + + private static final String 
SERVER_VERSION = "1.3.1.4"; + + // JSON matching the Notion-documented schema. Server version "1.3.1.4" has snowflake (with two + // cohorts) and bigquery (_default only); server version "2.0.0" is intentionally absent so we + // can exercise the "unknown server" path. + private static final String MATRIX_JSON = + "{\n" + + " \"1.3.1.4\": {\n" + + " \"snowflake\": {\n" + + " \"_default\": \"1.3.1.4\",\n" + + " \"cohorts\": [\n" + + " {\n" + + " \"version\": \"1.3.1.5\",\n" + + " \"deployments\": [\"deployment-b1\", \"deployment-b2\", \"deployment-b3\"]\n" + + " },\n" + + " {\n" + + " \"version\": \"1.3.1.6\",\n" + + " \"deployments\": [\"deployment-d1\", \"deployment-d2\"]\n" + + " }\n" + + " ]\n" + + " },\n" + + " \"bigquery\": {\n" + + " \"_default\": \"0.14.2\"\n" + + " }\n" + + " }\n" + + "}"; + + /** + * Returns a service backed by {@link HttpUrlMatrixSource} pointed at a temp-file URL containing + * {@link #MATRIX_JSON}. Polls briefly so the asynchronous initial fetch has a chance to populate + * the cache before the assertions run. + */ + private IngestionVersionMatrixService serviceWithMatrix(String serverVersion, String deploymentId) + throws IOException { + Path tmp = Files.createTempFile("version-matrix", ".json"); + Files.write(tmp, MATRIX_JSON.getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, serverVersion, deploymentId); + + // Wait briefly for the initial fetch to complete (delay=0 in the scheduled executor). 
+ for (int i = 0; i < 20; i++) { + if (svc.resolveVersion("bigquery").isPresent()) { + break; + } + try { + Thread.sleep(100); + } catch (InterruptedException ignored) { + } + } + return svc; + } + + // ------------------------------------------------------------------------- + // Feature disabled (NoOpMatrixSource: what the factory binds when URL is unset) + // ------------------------------------------------------------------------- + + @Test + public void testDisabled_noOpSource() { + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(new NoOpMatrixSource(), SERVER_VERSION, "deployment-b1"); + assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); + } + + // ------------------------------------------------------------------------- + // Connector default (no cohort match) + // ------------------------------------------------------------------------- + + @Test + public void testCustomerNotInAnyCohort_usesConnectorDefault() throws Exception { + IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-unknown"); + assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); + } + + @Test + public void testConnectorWithoutCohorts_returnsDefault() throws Exception { + // bigquery is in the matrix with only a `_default` and no `cohorts` array.
+ IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); + assertEquals(svc.resolveVersion("bigquery"), Optional.of("0.14.2")); + } + + // ------------------------------------------------------------------------- + // Cohort match + // ------------------------------------------------------------------------- + + @Test + public void testCustomerInFirstCohort_getsCohortVersion() throws Exception { + IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b2"); + assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.5")); + } + + @Test + public void testCustomerInSecondCohort_getsCohortVersion() throws Exception { + IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-d1"); + assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.6")); + } + + // ------------------------------------------------------------------------- + // Missing deployment identity + // ------------------------------------------------------------------------- + + @Test + public void testNullCustomerId_neverMatchesCohort() throws Exception { + // A deployment that didn't wire DATAHUB_EXECUTOR_CUSTOMER_ID should fall through to the + // connector default rather than throw or behave unpredictably. 
+ IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, null); + assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); + } + + @Test + public void testEmptyCustomerId_neverMatchesCohort() throws Exception { + IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, ""); + assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); + } + + // ------------------------------------------------------------------------- + // Misses + // ------------------------------------------------------------------------- + + @Test + public void testUnknownConnector_returnsEmpty() throws Exception { + // Connectors not present in the matrix should return empty so the caller falls back to the + // workspace-wide defaultCliVersion. This is intentionally different from the old `_default` + // server-level fallback the prior schema had. + IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); + assertEquals(svc.resolveVersion("redshift"), Optional.empty()); + } + + @Test + public void testUnknownServerVersion_returnsEmpty() throws Exception { + IngestionVersionMatrixService svc = serviceWithMatrix("2.0.0", "deployment-b1"); + assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); + } + + // ------------------------------------------------------------------------- + // Fetch failure / malformed JSON + // ------------------------------------------------------------------------- + + @Test + public void testUnreachableUrlReturnsEmpty() { + // A URL that will always fail — the source should log a warning and the service returns empty + // (no prior matrix to fall back to). 
+ HttpUrlMatrixSource httpSource = + new HttpUrlMatrixSource("http://localhost:19999/does-not-exist", 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + + try { + Thread.sleep(200); + } catch (InterruptedException ignored) { + } + + assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); + } + + @Test + public void testMalformedJsonReturnsEmpty() throws Exception { + Path tmp = Files.createTempFile("bad-matrix", ".json"); + Files.write(tmp, "not valid json".getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + + try { + Thread.sleep(300); + } catch (InterruptedException ignored) { + } + + assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); + } + + // ------------------------------------------------------------------------- + // Cohort entries with missing fields + // ------------------------------------------------------------------------- + + @Test + public void testCohortWithoutVersion_isSkipped() throws Exception { + // A cohort missing a version should be skipped — we should not silently use the deployment's + // first deployments-list hit to mean "_default". Falling back to the connector _default is the + // safe behavior. 
+ String json = + "{\n" + + " \"1.3.1.4\": {\n" + + " \"snowflake\": {\n" + + " \"_default\": \"1.3.1.4\",\n" + + " \"cohorts\": [\n" + + " { \"deployments\": [\"deployment-b1\"] },\n" + + " { \"version\": \"1.3.1.6\", \"deployments\": [\"deployment-d1\"] }\n" + + " ]\n" + + " }\n" + + " }\n" + + "}"; + + Path tmp = Files.createTempFile("cohort-no-version", ".json"); + Files.write(tmp, json.getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + + for (int i = 0; i < 20; i++) { + Optional<String> v = svc.resolveVersion("snowflake"); + if (v.isPresent()) { + // deployment-b1 is in the malformed cohort's deployments, but that cohort lacks a + // `version`. Should fall through to the connector _default rather than match the bad + // cohort. + assertEquals(v, Optional.of("1.3.1.4")); + return; + } + Thread.sleep(50); + } + fail("Matrix never loaded"); + } + + @Test + public void testCohortWithMissingDeployments_neverMatches() throws Exception { + // No deployments field at all → no deployment should ever match this cohort, even one that + // "looks empty". Use the _default.
+ String json = + "{\n" + + " \"1.3.1.4\": {\n" + + " \"snowflake\": {\n" + + " \"_default\": \"1.3.1.4\",\n" + + " \"cohorts\": [\n" + + " { \"version\": \"1.3.1.5\" }\n" + + " ]\n" + + " }\n" + + " }\n" + + "}"; + + Path tmp = Files.createTempFile("cohort-no-deployments", ".json"); + Files.write(tmp, json.getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + + for (int i = 0; i < 20; i++) { + Optional<String> v = svc.resolveVersion("snowflake"); + if (v.isPresent()) { + assertEquals(v, Optional.of("1.3.1.4")); + return; + } + Thread.sleep(50); + } + fail("Matrix never loaded"); + } + + // ------------------------------------------------------------------------- + // HttpUrlMatrixSource HTTP-level behavior (auth header, fetch-failure cache retention) + // These exercise the network code path; the other tests use file:// URIs which can't surface + // request-header or HTTP-status behavior.
+ // ------------------------------------------------------------------------- + + @Test + public void testAuthHeader_sentVerbatimWhenConfigured() throws Exception { + AtomicReference<String> captured = new AtomicReference<>(); + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + captured.set(exchange.getRequestHeaders().getFirst("Authorization")); + byte[] body = MATRIX_JSON.getBytes(); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + try { + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600, "token ghp_test_xyz"); + waitForFirstFetch(source); + assertEquals( + captured.get(), + "token ghp_test_xyz", + "Authorization header should be sent verbatim when authHeader is configured"); + } finally { + server.stop(0); + } + } + + @Test + public void testAuthHeader_omittedWhenUnset() throws Exception { + AtomicBoolean sawAuthHeader = new AtomicBoolean(false); + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + sawAuthHeader.set(exchange.getRequestHeaders().containsKey("Authorization")); + byte[] body = MATRIX_JSON.getBytes(); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + try { + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + // 2-arg constructor: no authHeader.
+ HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600); + waitForFirstFetch(source); + assertFalse( + sawAuthHeader.get(), + "Authorization header should NOT be sent when authHeader is null (public-URL semantics)"); + } finally { + server.stop(0); + } + } + + @Test + public void testFetchFailureAfterSuccess_retainsCachedMatrix() throws Exception { + AtomicInteger callCount = new AtomicInteger(); + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + int n = callCount.incrementAndGet(); + if (n == 1) { + byte[] body = MATRIX_JSON.getBytes(); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + } else { + // 500 on subsequent refreshes. The source should keep its previously cached value. + exchange.sendResponseHeaders(500, -1); + exchange.close(); + } + }); + server.start(); + try { + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600); + waitForFirstFetch(source); + + Matrix beforeFailure = source.getMatrix(); + long firstFetchTimestamp = source.getLastFetchedAtMillis(); + assertTrue(firstFetchTimestamp > 0, "Initial fetch should have populated the cache"); + + // Force a second refresh — the server now returns 500. + source.refresh(); + + assertSame( + source.getMatrix(), + beforeFailure, + "Matrix cache should retain the previously fetched value when a refresh fails"); + assertEquals( + source.getLastFetchedAtMillis(), + firstFetchTimestamp, + "lastFetchedAtMillis should not advance on a failed refresh"); + } finally { + server.stop(0); + } + } + + /** Polls until the source's initial scheduled fetch has populated the cache, or fails fast. 
*/ + private static void waitForFirstFetch(HttpUrlMatrixSource source) throws InterruptedException { + for (int i = 0; i < 30; i++) { + if (source.getLastFetchedAtMillis() > 0) { + return; + } + Thread.sleep(50); + } + fail("Initial matrix fetch did not complete within 1.5s"); + } + + @Test + public void testAuthHeader_omittedWhenEmptyString() throws Exception { + AtomicBoolean sawAuthHeader = new AtomicBoolean(false); + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + sawAuthHeader.set(exchange.getRequestHeaders().containsKey("Authorization")); + byte[] body = MATRIX_JSON.getBytes(); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + try { + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + // Explicit empty string — distinct branch from null in HttpUrlMatrixSource.refresh(). + HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600, ""); + waitForFirstFetch(source); + assertFalse( + sawAuthHeader.get(), + "Empty-string authHeader should be treated like null — no Authorization header sent"); + } finally { + server.stop(0); + } + } + + // ------------------------------------------------------------------------- + // POJO direct construction — covers the null-input defensive branches that the + // HTTP parser never exercises (it always passes non-null empty collections). 
+ // ------------------------------------------------------------------------- + + @Test + public void testNoOpMatrixSource_returnsEmptyAndZeroTimestamp() { + NoOpMatrixSource source = new NoOpMatrixSource(); + assertSame( + source.getMatrix(), Matrix.EMPTY, "NoOpMatrixSource should always return Matrix.EMPTY"); + assertEquals( + source.getLastFetchedAtMillis(), + 0L, + "NoOpMatrixSource should report 0 for last-fetched timestamp"); + } + + @Test + public void testCohort_nullDeploymentsBecomesEmpty() { + Cohort cohort = new Cohort("1.5.0.20", null); + assertEquals(cohort.getVersion(), "1.5.0.20"); + assertTrue( + cohort.getDeployments().isEmpty(), + "null deployments should default to an empty set (defensive null-handling)"); + } + + @Test + public void testConnectorEntry_nullCohortsBecomesEmpty() { + ConnectorEntry entry = new ConnectorEntry("1.5.0.14", null); + assertEquals(entry.getDefaultVersion(), "1.5.0.14"); + assertTrue( + entry.getCohorts().isEmpty(), + "null cohorts should default to an empty list (defensive null-handling)"); + } + + @Test + public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws Exception { + // Connector exists in the matrix but has no `_default` field AND we don't match any cohort. + // Caller should fall through to workspace-wide defaultCliVersion. 
+ String json = + "{\n" + + " \"1.3.1.4\": {\n" + + " \"snowflake\": {\n" + + " \"cohorts\": [\n" + + " { \"version\": \"1.3.1.5\", \"deployments\": [\"deployment-x\"] }\n" + + " ]\n" + + " }\n" + + " }\n" + + "}"; + Path tmp = Files.createTempFile("no-default", ".json"); + Files.write(tmp, json.getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); + IngestionVersionMatrixService svc = + new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-unknown"); + + for (int i = 0; i < 20; i++) { + if (httpSource.getLastFetchedAtMillis() > 0) { + break; + } + Thread.sleep(50); + } + assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); + } +} diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java index 7cdfd3c4f65a..8584fd8ea568 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java @@ -31,6 +31,7 @@ import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.GraphService; import com.linkedin.metadata.graph.SiblingGraphService; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.search.SemanticSearchService; @@ -137,6 +138,10 @@ public class GraphQLEngineFactory { @Qualifier("gitVersion") private GitVersion gitVersion; + @Autowired + @Qualifier("ingestionVersionMatrixService") + private IngestionVersionMatrixService versionMatrixService; + @Autowired @Qualifier("timelineService") private TimelineService timelineService; @@ -267,6 +272,7 @@ protected 
GraphQLEngine graphQLEngine( args.setSecretService(secretService); args.setNativeUserService(nativeUserService); args.setIngestionConfiguration(configProvider.getIngestion()); + args.setIngestionVersionMatrixService(versionMatrixService); args.setAuthenticationConfiguration(configProvider.getAuthentication()); args.setAuthorizationConfiguration(configProvider.getAuthorization()); args.setGitVersion(gitVersion); diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java index 116ed69a9d13..b9921b4f0ced 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java @@ -4,6 +4,7 @@ import com.linkedin.entity.client.SystemEntityClient; import com.linkedin.gms.factory.auth.SystemAuthenticationFactory; import com.linkedin.gms.factory.config.ConfigurationProvider; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import io.datahubproject.metadata.context.OperationContext; import javax.annotation.Nonnull; import org.springframework.beans.factory.annotation.Autowired; @@ -13,13 +14,17 @@ import org.springframework.context.annotation.Import; import org.springframework.context.annotation.Scope; -@Import({SystemAuthenticationFactory.class}) +@Import({SystemAuthenticationFactory.class, IngestionVersionMatrixServiceFactory.class}) public class IngestionSchedulerFactory { @Autowired @Qualifier("configurationProvider") private ConfigurationProvider configProvider; + @Autowired + @Qualifier("ingestionVersionMatrixService") + private IngestionVersionMatrixService versionMatrixService; + @Value("${ingestion.scheduler.delayIntervalSeconds:45}") // Boot up ingestion source cache after // waiting 45 seconds for startup. 
private Integer delayIntervalSeconds; @@ -38,6 +43,7 @@ protected IngestionScheduler getInstance( systemOpContext, entityClient, configProvider.getIngestion(), + versionMatrixService, delayIntervalSeconds, refreshIntervalSeconds); } diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java new file mode 100644 index 000000000000..c2e191038a38 --- /dev/null +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java @@ -0,0 +1,79 @@ +package com.linkedin.gms.factory.ingestion; + +import com.linkedin.gms.factory.config.ConfigurationProvider; +import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.HttpUrlMatrixSource; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.MatrixSource; +import com.linkedin.metadata.ingestion.NoOpMatrixSource; +import com.linkedin.metadata.version.GitVersion; +import javax.annotation.Nonnull; +import org.springframework.beans.factory.annotation.Autowired; +import org.springframework.beans.factory.annotation.Qualifier; +import org.springframework.context.annotation.Bean; +import org.springframework.context.annotation.Configuration; +import org.springframework.context.annotation.Scope; + +/** + * Wires up the per-connector CLI version matrix. + * + *

<p>The wiring is split into two beans so that storage and consumption are decoupled: + * + * <ul> + * <li>{@code matrixSource} — implements {@link MatrixSource}; chosen based on configuration. + * Today the only "live" implementation is {@link HttpUrlMatrixSource}, picked when {@code + * ingestion.versionMatrixUrl} is set. Otherwise a {@link NoOpMatrixSource} is bound. Future + * implementations (GMS-aspect-backed, config-server-backed, …) can plug in here without any + * change to the consumer service. + * <li>{@code ingestionVersionMatrixService} — consumes whichever {@link MatrixSource} is bound + * and applies the resolution policy (cohort → connector default → workspace default). + * </ul> + * + * <p>
The deployment identifier is sourced from {@code ingestion.deploymentId}, which is bound to + * the {@code DATAHUB_EXECUTOR_CUSTOMER_ID} env var injected by the Acryl Cloud Helm chart from the + * K8s namespace. In single-tenant / OSS deployments it is typically unset, so the deployment id is + * empty and cohort matching never fires — only the connector-level {@code _default} from the matrix + * applies. + */ +@Configuration +public class IngestionVersionMatrixServiceFactory { + + @Autowired + @Qualifier("configurationProvider") + private ConfigurationProvider configProvider; + + @Autowired + @Qualifier("gitVersion") + private GitVersion gitVersion; + + /** + * Picks the storage backend for the matrix. Today this is either HTTP-fetch or no-op; the + * decision is driven by whether a URL is configured. New backends should be added here behind an + * explicit config flag rather than by replacing the existing decision. + */ + @Bean(name = "matrixSource") + @Scope("singleton") + @Nonnull + protected MatrixSource matrixSource() { + IngestionConfiguration ingestionConfig = configProvider.getIngestion(); + String url = ingestionConfig.getVersionMatrixUrl(); + if (url == null || url.isEmpty()) { + return new NoOpMatrixSource(); + } + return new HttpUrlMatrixSource( + url, + ingestionConfig.getVersionMatrixRefreshSeconds(), + ingestionConfig.getVersionMatrixAuthToken()); + } + + @Bean(name = "ingestionVersionMatrixService") + @Scope("singleton") + @Nonnull + protected IngestionVersionMatrixService getInstance( + @Qualifier("matrixSource") final MatrixSource matrixSource) { + IngestionConfiguration ingestionConfig = configProvider.getIngestion(); + String serverVersion = (String) gitVersion.toConfig().get("version"); + return new IngestionVersionMatrixService( + matrixSource, serverVersion, ingestionConfig.getDeploymentId()); + } +} diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java 
b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java index 166c56b2d8b3..03d4ff2fcaed 100644 --- a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java @@ -24,6 +24,7 @@ import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.GraphService; import com.linkedin.metadata.graph.SiblingGraphService; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.recommendation.candidatesource.RecentlySearchedSource; @@ -164,6 +165,10 @@ public void setup() { @Qualifier("gitVersion") private GitVersion gitVersion; + @MockitoBean + @Qualifier("ingestionVersionMatrixService") + private IngestionVersionMatrixService versionMatrixService; + @MockitoBean @Qualifier("timelineService") private TimelineService timelineService; @@ -369,6 +374,7 @@ public void testGraphQLEngineWithAnalyticsEnabled() { setField(factoryWithAnalytics, "entityRegistry", entityRegistry); setField(factoryWithAnalytics, "configProvider", configurationProvider); setField(factoryWithAnalytics, "gitVersion", gitVersion); + setField(factoryWithAnalytics, "versionMatrixService", versionMatrixService); setField(factoryWithAnalytics, "timelineService", timelineService); setField(factoryWithAnalytics, "nativeUserService", nativeUserService); setField(factoryWithAnalytics, "groupService", groupService); diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java new file mode 100644 index 000000000000..ea93f1513faa --- 
/dev/null +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java @@ -0,0 +1,108 @@ +package com.linkedin.gms.factory.ingestion; + +import static org.mockito.Mockito.*; +import static org.testng.Assert.*; + +import com.linkedin.gms.factory.config.ConfigurationProvider; +import com.linkedin.metadata.config.IngestionConfiguration; +import com.linkedin.metadata.ingestion.HttpUrlMatrixSource; +import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.MatrixSource; +import com.linkedin.metadata.ingestion.NoOpMatrixSource; +import com.linkedin.metadata.version.GitVersion; +import java.lang.reflect.Field; +import java.util.Map; +import org.testng.annotations.BeforeMethod; +import org.testng.annotations.Test; + +/** + * Direct unit tests for {@link IngestionVersionMatrixServiceFactory}. Exercises the branch that + * picks {@link NoOpMatrixSource} vs {@link HttpUrlMatrixSource} based on whether {@code + * versionMatrixUrl} is configured — the rest of the codebase only ever exercises the no-op path + * (test contexts don't set the env var). + */ +public class IngestionVersionMatrixServiceFactoryTest { + + private IngestionVersionMatrixServiceFactory factory; + private ConfigurationProvider configProvider; + private IngestionConfiguration ingestionConfig; + private GitVersion gitVersion; + + @BeforeMethod + public void setUp() { + factory = new IngestionVersionMatrixServiceFactory(); + configProvider = mock(ConfigurationProvider.class); + ingestionConfig = new IngestionConfiguration(); + gitVersion = mock(GitVersion.class); + + when(configProvider.getIngestion()).thenReturn(ingestionConfig); + // GitVersion.toConfig() supplies the server version. A placeholder value is stubbed here: + // matrixSource() never reads it; only getInstance() does. 
+ when(gitVersion.toConfig()).thenReturn(Map.of("version", "test-server-1.0")); + + setField(factory, "configProvider", configProvider); + setField(factory, "gitVersion", gitVersion); + } + + @Test + public void testMatrixSource_whenUrlIsNull_wiresNoOp() { + ingestionConfig.setVersionMatrixUrl(null); + + MatrixSource source = factory.matrixSource(); + + assertTrue( + source instanceof NoOpMatrixSource, + "Unset versionMatrixUrl should wire NoOpMatrixSource (OSS-safe default)"); + } + + @Test + public void testMatrixSource_whenUrlIsEmpty_wiresNoOp() { + ingestionConfig.setVersionMatrixUrl(""); + + MatrixSource source = factory.matrixSource(); + + assertTrue( + source instanceof NoOpMatrixSource, + "Empty-string versionMatrixUrl should be treated like unset → NoOpMatrixSource"); + } + + @Test + public void testMatrixSource_whenUrlIsSet_wiresHttpUrlSource() { + // file:// URI is fine — the factory only inspects the string, not whether it's reachable. + ingestionConfig.setVersionMatrixUrl("file:///tmp/nonexistent-matrix.json"); + ingestionConfig.setVersionMatrixRefreshSeconds(3600); + ingestionConfig.setVersionMatrixAuthToken(null); + + MatrixSource source = factory.matrixSource(); + + assertTrue( + source instanceof HttpUrlMatrixSource, + "Configured versionMatrixUrl should wire HttpUrlMatrixSource"); + } + + @Test + public void testGetInstance_buildsServiceWithServerVersionFromGitVersion() { + ingestionConfig.setVersionMatrixUrl(null); + ingestionConfig.setDeploymentId("test-deployment"); + when(gitVersion.toConfig()).thenReturn(Map.of("version", "1.5.0")); + + IngestionVersionMatrixService service = factory.getInstance(new NoOpMatrixSource()); + + assertNotNull(service); + assertEquals( + service.getServerVersion(), + "1.5.0", + "Service should be constructed with the GitVersion's reported version"); + } + + /** Reflection helper — the factory's autowired fields are private, like every Spring bean. 
*/ + private static void setField(Object target, String name, Object value) { + try { + Field f = target.getClass().getDeclaredField(name); + f.setAccessible(true); + f.set(target, value); + } catch (Exception e) { + throw new RuntimeException("Failed to set field " + name, e); + } + } +} From 3f26512ce0b390156c48efde3a5f8de443858c29 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Tue, 19 May 2026 12:09:48 +0530 Subject: [PATCH 02/20] docs(ingestion): note CliVersionProvenance is GMS-side intent MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The aspect stamps what GMS resolved, not what the executor actually installed. Calls out the three cases where these diverge — extra_pip transitive dep, no-acryl opt-out, bundled image short-circuit — so forensics queries don't conflate intent with install outcome. Co-Authored-By: Claude Opus 4.7 --- .../pegasus/com/linkedin/execution/CliVersionProvenance.pdl | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl index 310b8718abe1..6f3a595079e0 100644 --- a/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl +++ b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl @@ -10,6 +10,12 @@ namespace com.linkedin.execution * exists so post-hoc forensics can answer "which tier produced the version, and which GMS wrote * this?" from a single SQL query without log archaeology. * + * NOTE: this stamps the version GMS *resolved*, not the version the executor actually + * installed. The executor may run a different effective version when (a) the recipe's + * `extra_pip` requirements transitively pull in `acryl-datahub`, (b) the customer opts out + * of installing `acryl-datahub` (e.g. `version="no-acryl-datahub"`), or (c) a bundled image + * short-circuits the install step. 
Treat this aspect as GMS-side intent, not proof-of-install. + * * The resolution chain is, in priority order: * 1. Per-source `config.version` explicit override (PER_SOURCE) * 2. Cohort whose `deployments` list contains this deployment's id (MATRIX_COHORT) From 26da88e12a2e57514daa601aac4bcde34db36f5b Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Wed, 20 May 2026 10:55:10 +0530 Subject: [PATCH 03/20] fix(deps): restore transitive lockfile entries dropped during rebase MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The rebase onto master (Jackson 2.21.1 → 2.21.3) inadvertently dropped caffeine, jackson-dataformat-smile, jsr305, and errorprone 2.3.3 from the metadata-service:configuration lockfile, plus narrowed the errorprone 2.47.0 scope. CI surfaced these as 'Resolved … which is not part of the dependency lock state' failures. Co-Authored-By: Claude Opus 4.7 --- metadata-service/configuration/gradle.lockfile | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/metadata-service/configuration/gradle.lockfile b/metadata-service/configuration/gradle.lockfile index 1698c2d99f09..6b5e29cb7e85 100644 --- a/metadata-service/configuration/gradle.lockfile +++ b/metadata-service/configuration/gradle.lockfile @@ -6,10 +6,14 @@ com.beust:jcommander:1.82=testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.core:jackson-annotations:2.21=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.core:jackson-core:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.core:jackson-databind:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +com.fasterxml.jackson.dataformat:jackson-dataformat-smile:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath 
com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath com.fasterxml.jackson:jackson-bom:2.21.3=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath -com.google.errorprone:error_prone_annotations:2.47.0=spotless865458226 +com.github.ben-manes.caffeine:caffeine:2.7.0=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +com.google.code.findbugs:jsr305:3.0.2=compileClasspath,runtimeClasspath,testCompileClasspath,testRuntimeClasspath +com.google.errorprone:error_prone_annotations:2.3.3=compileClasspath,testCompileClasspath +com.google.errorprone:error_prone_annotations:2.47.0=runtimeClasspath,spotless865458226,testRuntimeClasspath com.google.googlejavaformat:google-java-format:1.18.1=spotless865458226 com.google.guava:failureaccess:1.0.3=runtimeClasspath,spotless865458226,testRuntimeClasspath com.google.guava:guava:33.6.0-jre=runtimeClasspath,spotless865458226,testRuntimeClasspath From bc0762bab4c18e73e5ba1b1bc04232c4d56e759b Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 13:47:43 +0530 Subject: [PATCH 04/20] refactor(ingestion): rename CLI version matrix types per review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses naming concerns raised in PR review: - `Matrix` was too generic — clearly an ingestion CLI version matrix - `CliVersionProvenance` used a suffix uncommon in this codebase - `CliVersionSource` enum values `PER_SOURCE` / `WORKSPACE_DEFAULT` read awkwardly Class renames (metadata-service/configuration/.../ingestion/): Matrix -> IngestionCliVersionMatrix MatrixSource -> IngestionCliVersionMatrixSource NoOpMatrixSource -> NoOpIngestionCliVersionMatrixSource HttpUrlMatrixSource -> 
HttpUrlIngestionCliVersionMatrixSource IngestionVersionMatrixService -> IngestionCliVersionMatrixService Factory + bean renames: IngestionVersionMatrixServiceFactory -> IngestionCliVersionMatrixServiceFactory bean "matrixSource" -> "ingestionCliVersionMatrixSource" bean "ingestionVersionMatrixService" -> "ingestionCliVersionMatrixService" PDL changes (metadata-models/.../execution/): CliVersionProvenance -> CliVersionAudit (record + .pdl file) ExecutionRequestInput field: cliVersionProvenance -> cliVersionAudit CliVersionSource enum values: PER_SOURCE -> SOURCE_CONFIG_OVERRIDE WORKSPACE_DEFAULT -> APPLICATION_DEFAULT (MATRIX_COHORT, MATRIX_CONNECTOR_DEFAULT unchanged) All three call sites updated to use renamed types + setter (setCliVersionAudit): CreateIngestionExecutionRequestResolver, CreateTestConnectionRequestResolver, IngestionScheduler. Tests + factories updated accordingly. End-to-end validated against a running GMS: all 4 resolution tiers + test-connection + scheduled trigger emit the new `cliVersionAudit` field with new enum values. Co-Authored-By: Claude Opus 4.7 --- .../datahub/graphql/GmsGraphQLEngine.java | 10 +- .../datahub/graphql/GmsGraphQLEngineArgs.java | 4 +- ...eateIngestionExecutionRequestResolver.java | 10 +- .../CreateTestConnectionRequestResolver.java | 14 +-- ...IngestionExecutionRequestResolverTest.java | 27 ++-- ...eateTestConnectionRequestResolverTest.java | 51 ++++---- .../ingestion/IngestionScheduler.java | 12 +- .../ingestion/IngestionSchedulerTest.java | 8 +- ...sionProvenance.pdl => CliVersionAudit.pdl} | 14 +-- .../execution/ExecutionRequestInput.pdl | 16 +-- .../ingestion/CliVersionResolutionHelper.java | 49 ++++---- ...tpUrlIngestionCliVersionMatrixSource.java} | 30 ++--- ...ix.java => IngestionCliVersionMatrix.java} | 19 +-- ... 
=> IngestionCliVersionMatrixService.java} | 27 ++-- .../IngestionCliVersionMatrixSource.java | 49 ++++++++ .../metadata/ingestion/MatrixSource.java | 49 -------- .../NoOpIngestionCliVersionMatrixSource.java | 23 ++++ .../metadata/ingestion/NoOpMatrixSource.java | 22 ---- .../CliVersionResolutionHelperTest.java | 50 ++++---- ...CliVersionMatrixSourceValidationTest.java} | 31 +++-- ...IngestionCliVersionMatrixServiceTest.java} | 116 ++++++++++-------- .../factory/graphql/GraphQLEngineFactory.java | 8 +- ...estionCliVersionMatrixServiceFactory.java} | 45 +++---- .../ingestion/IngestionSchedulerFactory.java | 8 +- .../graphql/GraphQLEngineFactoryTest.java | 6 +- ...onCliVersionMatrixServiceFactoryTest.java} | 47 +++---- 26 files changed, 395 insertions(+), 350 deletions(-) rename metadata-models/src/main/pegasus/com/linkedin/execution/{CliVersionProvenance.pdl => CliVersionAudit.pdl} (82%) rename metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/{HttpUrlMatrixSource.java => HttpUrlIngestionCliVersionMatrixSource.java} (91%) rename metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/{Matrix.java => IngestionCliVersionMatrix.java} (58%) rename metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/{IngestionVersionMatrixService.java => IngestionCliVersionMatrixService.java} (84%) create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixSource.java delete mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpIngestionCliVersionMatrixSource.java delete mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java rename metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/{HttpUrlMatrixSourceValidationTest.java => 
HttpUrlIngestionCliVersionMatrixSourceValidationTest.java} (86%) rename metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/{IngestionVersionMatrixServiceTest.java => IngestionCliVersionMatrixServiceTest.java} (76%) rename metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/{IngestionVersionMatrixServiceFactory.java => IngestionCliVersionMatrixServiceFactory.java} (57%) rename metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/{IngestionVersionMatrixServiceFactoryTest.java => IngestionCliVersionMatrixServiceFactoryTest.java} (60%) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java index fcdf7ee65b0a..6ea321524b71 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java @@ -347,7 +347,7 @@ import com.linkedin.metadata.entity.versioning.EntityVersioningService; import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.SiblingGraphService; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.query.filter.SortCriterion; import com.linkedin.metadata.query.filter.SortOrder; @@ -454,7 +454,7 @@ public class GmsGraphQLEngine { private final FeatureFlags featureFlags; private final IngestionConfiguration ingestionConfiguration; - private final IngestionVersionMatrixService ingestionVersionMatrixService; + private final IngestionCliVersionMatrixService ingestionCliVersionMatrixService; private final AuthenticationConfiguration authenticationConfiguration; private final AuthorizationConfiguration authorizationConfiguration; private 
final VisualConfiguration visualConfiguration; @@ -598,7 +598,7 @@ public GmsGraphQLEngine(final GmsGraphQLEngineArgs args) { this.businessAttributeService = args.businessAttributeService; this.ingestionConfiguration = Objects.requireNonNull(args.ingestionConfiguration); - this.ingestionVersionMatrixService = args.ingestionVersionMatrixService; + this.ingestionCliVersionMatrixService = args.ingestionCliVersionMatrixService; this.authenticationConfiguration = Objects.requireNonNull(args.authenticationConfiguration); this.authorizationConfiguration = Objects.requireNonNull(args.authorizationConfiguration); this.visualConfiguration = args.visualConfiguration; @@ -1373,7 +1373,7 @@ private void configureMutationResolvers(final RuntimeWiring.Builder builder) { new CreateIngestionExecutionRequestResolver( this.entityClient, this.ingestionConfiguration, - this.ingestionVersionMatrixService)) + this.ingestionCliVersionMatrixService)) .dataFetcher( "cancelIngestionExecutionRequest", new CancelIngestionExecutionRequestResolver(this.entityClient)) @@ -1382,7 +1382,7 @@ private void configureMutationResolvers(final RuntimeWiring.Builder builder) { new CreateTestConnectionRequestResolver( this.entityClient, this.ingestionConfiguration, - this.ingestionVersionMatrixService)) + this.ingestionCliVersionMatrixService)) .dataFetcher( "upsertCustomAssertion", new UpsertCustomAssertionResolver(assertionService)) .dataFetcher( diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java index f4b0adb533ed..8f74fd5a5797 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngineArgs.java @@ -21,7 +21,7 @@ import com.linkedin.metadata.entity.versioning.EntityVersioningService; import com.linkedin.metadata.graph.GraphClient; 
import com.linkedin.metadata.graph.SiblingGraphService; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.search.SemanticSearchService; @@ -65,7 +65,7 @@ public class GmsGraphQLEngineArgs { SecretService secretService; NativeUserService nativeUserService; IngestionConfiguration ingestionConfiguration; - IngestionVersionMatrixService ingestionVersionMatrixService; + IngestionCliVersionMatrixService ingestionCliVersionMatrixService; AuthenticationConfiguration authenticationConfiguration; AuthorizationConfiguration authorizationConfiguration; GitVersion gitVersion; diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 3c166bd9a53e..23e8d46d75a8 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -24,7 +24,7 @@ import com.linkedin.ingestion.DataHubIngestionSourceInfo; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -50,7 +50,7 @@ public class 
CreateIngestionExecutionRequestResolver private final EntityClient _entityClient; private final IngestionConfiguration _ingestionConfiguration; - private final IngestionVersionMatrixService _versionMatrixService; + private final IngestionCliVersionMatrixService _versionMatrixService; /** Two-arg constructor — no per-connector version matrix is consulted. */ public CreateIngestionExecutionRequestResolver( @@ -66,7 +66,7 @@ public CreateIngestionExecutionRequestResolver( public CreateIngestionExecutionRequestResolver( final EntityClient entityClient, final IngestionConfiguration ingestionConfiguration, - final IngestionVersionMatrixService versionMatrixService) { + final IngestionCliVersionMatrixService versionMatrixService) { _entityClient = entityClient; _ingestionConfiguration = ingestionConfiguration; _versionMatrixService = versionMatrixService; @@ -141,7 +141,7 @@ public CompletableFuture get(final DataFetchingEnvironment environment) arguments.put(RECIPE_ARG_NAME, recipe); // Per-source version may be null, empty, or whitespace-only (bootstrap YAML // templating can render any of these); the helper normalizes all three to "unset" - // and falls through to the matrix / workspace default. See #17471 for the + // and falls through to the matrix / application default. See #17471 for the // whitespace-only edge case. final String explicitVersion = ingestionSourceInfo.getConfig().hasVersion() @@ -157,7 +157,7 @@ public CompletableFuture get(final DataFetchingEnvironment environment) ? _versionMatrixService.getServerVersion() : null); arguments.put(VERSION_ARG_NAME, resolution.getVersion()); - execInput.setCliVersionProvenance(resolution.getStamp()); + execInput.setCliVersionAudit(resolution.getStamp()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? 
"true" : "false"; diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index b3770f0682c8..48217bae11a7 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -17,7 +17,7 @@ import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -40,9 +40,9 @@ *

  *   1. {@code input.version} — explicit per-request override (existing behavior)
  *   2. {@code matrix[serverVersion][source.type]} — connector-specific version pin from {@link
- *      IngestionVersionMatrixService} when enabled
+ *      IngestionCliVersionMatrixService} when enabled
  *   3. {@code matrix[serverVersion][source.type]._default}
- *   4. {@link IngestionConfiguration#getDefaultCliVersion()} — workspace-wide fallback
+ *   4. {@link IngestionConfiguration#getDefaultCliVersion()} — application-wide fallback
  *
* *

Prior to this change the test-connection path silently omitted {@code version} when the input @@ -64,7 +64,7 @@ public class CreateTestConnectionRequestResolver implements DataFetcher get(final DataFetchingEnvironment environment) input.getRecipe(), executionRequestUrn.toString())); // input.getVersion() may be null, empty, or whitespace-only (UI forms can submit any // of these); the helper normalizes all three to "unset" and falls through to the - // matrix / workspace default. See #17471 for the whitespace-only edge case. + // matrix / application default. See #17471 for the whitespace-only edge case. final CliVersionResolutionHelper.Result resolution = CliVersionResolutionHelper.resolve( input.getVersion(), @@ -137,7 +137,7 @@ public CompletableFuture get(final DataFetchingEnvironment environment) arguments.put(VERSION_ARG_NAME, resolution.getVersion()); } execInput.setArgs(new StringMap(arguments)); - execInput.setCliVersionProvenance(resolution.getStamp()); + execInput.setCliVersionAudit(resolution.getStamp()); final MetadataChangeProposal proposal = buildMetadataChangeProposalWithKey( diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java index eee9c81fbcf6..2dec7669aeb5 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java @@ -20,7 +20,7 @@ import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import 
com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import com.linkedin.r2.RemoteInvocationException; @@ -114,7 +114,7 @@ public void testVersionMatrixConnectorSpecificVersionUsed() throws Exception { config.setDefaultCliVersion("default-global"); // Matrix maps "1.3.1.4" → { "snowflake": "matrix-snowflake-version" } - IngestionVersionMatrixService matrixService = + IngestionCliVersionMatrixService matrixService = matrixServiceForConnector("snowflake", "matrix-snowflake-version", "1.3.1.4"); CreateIngestionExecutionRequestResolver resolver = @@ -128,7 +128,7 @@ public void testVersionMatrixConnectorSpecificVersionUsed() throws Exception { * When the matrix has no entry for the connector under the current server version, the resolver * falls back to the global {@code defaultCliVersion}. (Replaces the old {@code _default} * server-level fallback the previous schema offered — the new schema requires explicit - * per-connector entries, and unknown connectors fall through to the workspace default.) + * per-connector entries, and unknown connectors fall through to the application default.) */ @Test public void testVersionMatrixConnectorNotPresent_fallsBackToDefaultCliVersion() throws Exception { @@ -139,7 +139,7 @@ public void testVersionMatrixConnectorNotPresent_fallsBackToDefaultCliVersion() config.setDefaultCliVersion("default-global"); // Matrix has snowflake only; mysql is absent. 
- IngestionVersionMatrixService matrixService = + IngestionCliVersionMatrixService matrixService = matrixServiceForConnector("snowflake", "matrix-snowflake-version", "1.3.1.4"); CreateIngestionExecutionRequestResolver resolver = @@ -159,9 +159,11 @@ public void testVersionMatrixMissFallsBackToDefaultCliVersion() throws Exception config.setDefaultCliVersion("default-global"); // Matrix service backed by a NoOp source → always returns empty - IngestionVersionMatrixService matrixService = - new IngestionVersionMatrixService( - new com.linkedin.metadata.ingestion.NoOpMatrixSource(), "1.3.1.4", null); + IngestionCliVersionMatrixService matrixService = + new IngestionCliVersionMatrixService( + new com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource(), + "1.3.1.4", + null); CreateIngestionExecutionRequestResolver resolver = new CreateIngestionExecutionRequestResolver(mockClient, config, matrixService); @@ -215,7 +217,7 @@ private static void mockBatchGetV2(EntityClient mockClient, DataHubIngestionSour * *

deploymentId is left null since these tests don't exercise cohort matching. */ - private static IngestionVersionMatrixService matrixServiceForConnector( + private static IngestionCliVersionMatrixService matrixServiceForConnector( String connector, String version, String serverVersion) throws Exception { String json = String.format("{\"%s\":{\"%s\":{\"_default\":\"%s\"}}}", serverVersion, connector, version); @@ -224,10 +226,11 @@ private static IngestionVersionMatrixService matrixServiceForConnector( java.nio.file.Files.write(tmp, json.getBytes()); tmp.toFile().deleteOnExit(); - com.linkedin.metadata.ingestion.HttpUrlMatrixSource httpSource = - new com.linkedin.metadata.ingestion.HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, serverVersion, null); + com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource httpSource = + new com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource( + tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, serverVersion, null); // Wait for the initial background fetch to complete for (int i = 0; i < 20; i++) { diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java index de77b97277f3..5860d7ea2e45 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java @@ -12,7 +12,7 @@ import com.linkedin.execution.ExecutionRequestInput; import com.linkedin.metadata.Constants; import 
com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import graphql.schema.DataFetchingEnvironment; @@ -43,13 +43,13 @@ public void testExplicitInputVersionWins() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrix.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( - new IngestionVersionMatrixService.MatrixResolution( + new IngestionCliVersionMatrixService.MatrixResolution( MATRIX_SNOWFLAKE_VERSION, - IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); CreateTestConnectionRequestResolver resolver = new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); @@ -67,13 +67,13 @@ public void testMatrixConnectorVersionUsedWhenInputVersionMissing() throws Excep IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrix.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( - new IngestionVersionMatrixService.MatrixResolution( + new IngestionCliVersionMatrixService.MatrixResolution( MATRIX_SNOWFLAKE_VERSION, - IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + 
IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); CreateTestConnectionRequestResolver resolver = new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); @@ -136,7 +136,7 @@ public void testFallsBackToDefaultWhenMatrixHasNoEntryForConnector() throws Exce IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrix.resolveVersionWithSource("snowflake")).thenReturn(Optional.empty()); CreateTestConnectionRequestResolver resolver = @@ -155,7 +155,7 @@ public void testFallsBackToDefaultWhenRecipeHasNoSourceType() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); CreateTestConnectionRequestResolver resolver = new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration, matrix); @@ -206,13 +206,13 @@ public void testStampsResolutionMetadata_cohortMatch() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrix.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( - new IngestionVersionMatrixService.MatrixResolution( + new IngestionCliVersionMatrixService.MatrixResolution( MATRIX_SNOWFLAKE_VERSION, - 
IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); Mockito.when(matrix.getServerVersion()).thenReturn("1.3.1.4"); CreateTestConnectionRequestResolver resolver = @@ -222,7 +222,7 @@ public void testStampsResolutionMetadata_cohortMatch() throws Exception { runAndCaptureResolution(resolver, mockClient, TEST_INPUT_NO_VERSION); assertEquals(captured.getArgs().get("version"), MATRIX_SNOWFLAKE_VERSION); - com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); + com.linkedin.execution.CliVersionAudit stamp = captured.getCliVersionAudit(); assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.MATRIX_COHORT); assertEquals(stamp.getServerVersion(), "1.3.1.4"); } @@ -232,7 +232,8 @@ public void testStampsResolutionMetadata_perSourceOverride() throws Exception { EntityClient mockClient = Mockito.mock(EntityClient.class); IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - // No matrix configured — the explicit version must still produce a PER_SOURCE stamp. + // No matrix configured — the explicit version must still produce a SOURCE_CONFIG_OVERRIDE + // stamp. 
CreateTestConnectionRequestResolver resolver = new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); @@ -240,19 +241,19 @@ public void testStampsResolutionMetadata_perSourceOverride() throws Exception { runAndCaptureResolution(resolver, mockClient, TEST_INPUT_WITH_VERSION); assertEquals(captured.getArgs().get("version"), EXPLICIT_VERSION); - com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); - assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.PER_SOURCE); + com.linkedin.execution.CliVersionAudit stamp = captured.getCliVersionAudit(); + assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.SOURCE_CONFIG_OVERRIDE); // No matrix service wired → no serverVersion to stamp. assertFalse(stamp.hasServerVersion()); } @Test - public void testStampsResolutionMetadata_workspaceDefault() throws Exception { + public void testStampsResolutionMetadata_applicationDefault() throws Exception { EntityClient mockClient = Mockito.mock(EntityClient.class); IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - IngestionVersionMatrixService matrix = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrix = Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrix.resolveVersionWithSource("snowflake")).thenReturn(Optional.empty()); Mockito.when(matrix.getServerVersion()).thenReturn("1.3.1.4"); @@ -263,16 +264,16 @@ public void testStampsResolutionMetadata_workspaceDefault() throws Exception { runAndCaptureResolution(resolver, mockClient, TEST_INPUT_NO_VERSION); assertEquals(captured.getArgs().get("version"), DEFAULT_VERSION); - com.linkedin.execution.CliVersionProvenance stamp = captured.getCliVersionProvenance(); - assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.WORKSPACE_DEFAULT); - // serverVersion is stamped even on 
WORKSPACE_DEFAULT when the matrix service is wired. + com.linkedin.execution.CliVersionAudit stamp = captured.getCliVersionAudit(); + assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.APPLICATION_DEFAULT); + // serverVersion is stamped even on APPLICATION_DEFAULT when the matrix service is wired. assertEquals(stamp.getServerVersion(), "1.3.1.4"); } /** * Captures the proposal and returns the full {@link ExecutionRequestInput}, so tests can assert - * on both {@code args.version} (where the CLI version string lives) and {@code - * cliVersionProvenance} (where the provenance stamp lives). + * on both {@code args.version} (where the CLI version string lives) and {@code cliVersionAudit} + * (where the audit stamp lives). */ private static ExecutionRequestInput runAndCaptureResolution( CreateTestConnectionRequestResolver resolver, @@ -297,8 +298,8 @@ private static ExecutionRequestInput runAndCaptureResolution( proposal.getAspect().getContentType(), ExecutionRequestInput.class); assertTrue( - recovered.hasCliVersionProvenance(), - "Expected cliVersionProvenance to be stamped on the ExecutionRequestInput"); + recovered.hasCliVersionAudit(), + "Expected cliVersionAudit to be stamped on the ExecutionRequestInput"); return recovered; } diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index d7a88ffbbc07..b3526c76be73 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -18,7 +18,7 @@ import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import 
com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.query.ListResult; import com.linkedin.metadata.utils.GenericRecordUtils; @@ -89,7 +89,7 @@ public class IngestionScheduler { private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1); private final IngestionConfiguration ingestionConfiguration; - private final IngestionVersionMatrixService versionMatrixService; + private final IngestionCliVersionMatrixService versionMatrixService; private final int batchGetDelayIntervalSeconds; private final int batchGetRefreshIntervalSeconds; @@ -342,7 +342,7 @@ static class ExecutionRequestRunnable implements Runnable { private final OperationContext systemOpContext; private final EntityClient entityClient; private final IngestionConfiguration ingestionConfiguration; - private final IngestionVersionMatrixService versionMatrixService; + private final IngestionCliVersionMatrixService versionMatrixService; // Information about the ingestion source being executed private final Urn ingestionSourceUrn; @@ -359,7 +359,7 @@ public ExecutionRequestRunnable( @Nonnull final OperationContext systemOpContext, @Nonnull final EntityClient entityClient, @Nonnull final IngestionConfiguration ingestionConfiguration, - @Nonnull final IngestionVersionMatrixService versionMatrixService, + @Nonnull final IngestionCliVersionMatrixService versionMatrixService, @Nonnull final Urn ingestionSourceUrn, @Nonnull final DataHubIngestionSourceInfo ingestionSourceInfo, @Nonnull final Runnable deleteNextIngestionSourceExecution, @@ -418,7 +418,7 @@ public void run() { arguments.put(RECIPE_ARGUMENT_NAME, recipe); // Per-source version may be null, empty, or whitespace-only (bootstrap YAML templating // can render any of these); the helper normalizes all three to "unset" and falls through - // to the matrix / workspace default. See #17471 for the whitespace-only edge case. 
+ // to the matrix / application default. See #17471 for the whitespace-only edge case. final String explicitVersion = ingestionSourceInfo.getConfig().hasVersion() ? ingestionSourceInfo.getConfig().getVersion() @@ -431,7 +431,7 @@ public void run() { ingestionConfiguration.getDefaultCliVersion(), versionMatrixService != null ? versionMatrixService.getServerVersion() : null); arguments.put(VERSION_ARGUMENT_NAME, resolution.getVersion()); - input.setCliVersionProvenance(resolution.getStamp()); + input.setCliVersionAudit(resolution.getStamp()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? "true" : "false"; diff --git a/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java b/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java index 7d721d68c56c..775cfa97c760 100644 --- a/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java +++ b/ingestion-scheduler/src/test/java/com/datahub/metadata/ingestion/IngestionSchedulerTest.java @@ -17,7 +17,7 @@ import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.query.ListResult; import io.datahubproject.metadata.context.OperationContext; import java.util.Collections; @@ -124,8 +124,10 @@ public void setupTest() throws Exception { Mockito.mock(OperationContext.class), mockClient, ingestionConfiguration, - new IngestionVersionMatrixService( - new com.linkedin.metadata.ingestion.NoOpMatrixSource(), "test", null), + new IngestionCliVersionMatrixService( + new com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource(), + "test", + null), 1, 
1200); ingestionScheduler.init(); diff --git a/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionAudit.pdl similarity index 82% rename from metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl rename to metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionAudit.pdl index 6f3a595079e0..e32ddae1c696 100644 --- a/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionProvenance.pdl +++ b/metadata-models/src/main/pegasus/com/linkedin/execution/CliVersionAudit.pdl @@ -1,7 +1,7 @@ namespace com.linkedin.execution /** - * Structured provenance record for the CLI version chosen for an ingestion execution. + * Audit record for the CLI version chosen for an ingestion execution. * * Stamped on each ingestion or test-connection ExecutionRequestInput. Captures only metadata * about the resolution (which tier fired + which GMS performed it) — the resolved CLI version @@ -17,24 +17,24 @@ namespace com.linkedin.execution * short-circuits the install step. Treat this aspect as GMS-side intent, not proof-of-install. * * The resolution chain is, in priority order: - * 1. Per-source `config.version` explicit override (PER_SOURCE) + * 1. Per-source `config.version` explicit override (SOURCE_CONFIG_OVERRIDE) * 2. Cohort whose `deployments` list contains this deployment's id (MATRIX_COHORT) * 3. Connector's `_default` from the matrix (MATRIX_CONNECTOR_DEFAULT) - * 4. `defaultCliVersion` from application.yaml (WORKSPACE_DEFAULT) + * 4. `defaultCliVersion` from application.yaml (APPLICATION_DEFAULT) */ -record CliVersionProvenance { +record CliVersionAudit { /** * Which level of the resolution priority hit. */ source: enum CliVersionSource { - /** Step 1 — explicit cli_version on the source's config. */ - PER_SOURCE + /** Step 1 — explicit cli_version on the ingestion source's recipe config. 
*/ + SOURCE_CONFIG_OVERRIDE /** Step 2 — matched a cohort whose deployments list contains this deployment's id. */ MATRIX_COHORT /** Step 3 — fell through to the connector's _default in the matrix. */ MATRIX_CONNECTOR_DEFAULT /** Step 4 — fell through to defaultCliVersion from application.yaml. */ - WORKSPACE_DEFAULT + APPLICATION_DEFAULT } /** diff --git a/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl b/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl index 6ab475b44bc6..f09a28660f6a 100644 --- a/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl +++ b/metadata-models/src/main/pegasus/com/linkedin/execution/ExecutionRequestInput.pdl @@ -55,13 +55,13 @@ record ExecutionRequestInput { actorUrn: optional Urn /** - * Provenance metadata for the CLI version chosen for this execution — which tier of the - * resolution chain produced the version (per-source override / matrix cohort / matrix connector - * default / workspace default) and which GMS performed the resolution. Stamped at request time - * so post-hoc forensics does not require iterating the generic args map. The resolved CLI - * version string itself lives in `args.version` on this same aspect; this record deliberately - * does not duplicate it. Optional for backward compatibility — older execution requests will - * not have this set. + * Audit metadata for the CLI version chosen for this execution — which tier of the + * resolution chain produced the version (source config override / matrix cohort / matrix + * connector default / application default) and which GMS performed the resolution. Stamped at + * request time so post-hoc forensics does not require iterating the generic args map. The + * resolved CLI version string itself lives in `args.version` on this same aspect; this record + * deliberately does not duplicate it. 
Optional for backward compatibility — older execution + * requests will not have this set. */ - cliVersionProvenance: optional CliVersionProvenance + cliVersionAudit: optional CliVersionAudit } \ No newline at end of file diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java index f7741a7201dd..5fb3b41edc2b 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java @@ -1,6 +1,6 @@ package com.linkedin.metadata.ingestion; -import com.linkedin.execution.CliVersionProvenance; +import com.linkedin.execution.CliVersionAudit; import com.linkedin.execution.CliVersionSource; import java.util.Optional; import javax.annotation.Nullable; @@ -14,22 +14,22 @@ *

  *   • {@code version}: the plain CLI version string the executor will install (written to {@code
  *     args.version} on the ExecutionRequestInput aspect).
- *   • {@code stamp}: a structured {@link CliVersionProvenance} record describing HOW the version
- *     was chosen — written to the {@code cliVersionProvenance} field on the same aspect.
+ *   • {@code stamp}: a structured {@link CliVersionAudit} record describing HOW the version was
+ *     chosen — written to the {@code cliVersionAudit} field on the same aspect.
  *
  *
  * The two pieces intentionally don't duplicate each other: the version string lives only in
- * {@code args.version}, and the stamp captures only the provenance fields (source tier, GMS server
+ * {@code args.version}, and the stamp captures only the audit fields (source tier, GMS server
  * version). Post-hoc forensics queries both via JSON paths on {@code metadata_aspect_v2}.
  *
  * Resolution priority (top wins):
  *
  *
- *   1. Per-source explicit version on {@code DataHubIngestionSourceConfig.version}
+ *   1. Source-config explicit override on {@code DataHubIngestionSourceConfig.version}
  *   2. Matrix cohort match — first cohort whose {@code deployments} list contains this
  *      deployment's id
  *   3. Matrix connector default — the connector's {@code _default} entry
- *   4. Workspace default — {@code defaultCliVersion} from application.yaml
+ *   4. Application default — {@code defaultCliVersion} from application.yaml
  *
*/ public final class CliVersionResolutionHelper { @@ -45,10 +45,10 @@ private CliVersionResolutionHelper() {} * {@code null} if not derivable (e.g. malformed test-connection recipe) * @param matrixService the version-matrix service; pass {@code null} for OSS callers that do not * consult a matrix (e.g. unit-test setups) - * @param defaultCliVersion the workspace-wide fallback from {@code IngestionConfiguration} + * @param defaultCliVersion the application-wide fallback from {@code IngestionConfiguration} * @param serverVersion the GMS server version (typically {@code GitVersion.getVersion()}). * Stamped on every returned record regardless of which tier hit; pass {@code null} only in - * tests that don't care about provenance. + * tests that don't care about audit data. * @return a {@link Result} carrying the resolved version string + the structured stamp. Never * {@code null}; the {@code version} field is guaranteed non-null except when {@code * defaultCliVersion} itself is null/empty (an OSS misconfiguration). @@ -56,13 +56,13 @@ private CliVersionResolutionHelper() {} public static Result resolve( @Nullable String explicitVersion, @Nullable String connectorType, - @Nullable IngestionVersionMatrixService matrixService, + @Nullable IngestionCliVersionMatrixService matrixService, @Nullable String defaultCliVersion, @Nullable String serverVersion) { // Normalize the per-source version: bootstrap YAML templating can render null, empty, or // whitespace-only strings, and all three should mean "unset" so we fall through to the - // matrix / workspace default. Matches the contract of + // matrix / application default. Matches the contract of // IngestionUtils.resolveIngestionCliVersion(...) introduced in #17471. 
final String normalizedExplicit = explicitVersion != null && !explicitVersion.trim().isEmpty() @@ -71,16 +71,17 @@ public static Result resolve( if (normalizedExplicit != null) { return new Result( - normalizedExplicit, stampWithSource(CliVersionSource.PER_SOURCE, serverVersion)); + normalizedExplicit, + stampWithSource(CliVersionSource.SOURCE_CONFIG_OVERRIDE, serverVersion)); } if (matrixService != null && connectorType != null && !connectorType.isEmpty()) { - Optional matrixResult = + Optional matrixResult = matrixService.resolveVersionWithSource(connectorType); if (matrixResult.isPresent()) { - IngestionVersionMatrixService.MatrixResolution r = matrixResult.get(); + IngestionCliVersionMatrixService.MatrixResolution r = matrixResult.get(); CliVersionSource pdlSource = - r.getSource() == IngestionVersionMatrixService.MatrixSourceLevel.COHORT + r.getSource() == IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT ? CliVersionSource.MATRIX_COHORT : CliVersionSource.MATRIX_CONNECTOR_DEFAULT; return new Result(r.getResolved(), stampWithSource(pdlSource, serverVersion)); @@ -91,12 +92,12 @@ public static Result resolve( // resolution stamp so forensic queries see a deterministic answer rather than a missing field. return new Result( defaultCliVersion == null ? 
"" : defaultCliVersion, - stampWithSource(CliVersionSource.WORKSPACE_DEFAULT, serverVersion)); + stampWithSource(CliVersionSource.APPLICATION_DEFAULT, serverVersion)); } - private static CliVersionProvenance stampWithSource( + private static CliVersionAudit stampWithSource( CliVersionSource source, @Nullable String serverVersion) { - CliVersionProvenance out = new CliVersionProvenance().setSource(source); + CliVersionAudit out = new CliVersionAudit().setSource(source); if (serverVersion != null && !serverVersion.isEmpty()) { out.setServerVersion(serverVersion); } @@ -104,15 +105,15 @@ private static CliVersionProvenance stampWithSource( } /** - * Wraps the two outputs of {@link #resolve(String, String, IngestionVersionMatrixService, String, - * String)} — the plain CLI version string (for {@code args.version}) and the structured - * provenance stamp (for the {@code cliVersionProvenance} aspect field). + * Wraps the two outputs of {@link #resolve(String, String, IngestionCliVersionMatrixService, + * String, String)} — the plain CLI version string (for {@code args.version}) and the structured + * audit stamp (for the {@code cliVersionAudit} aspect field). */ public static final class Result { private final String version; - private final CliVersionProvenance stamp; + private final CliVersionAudit stamp; - public Result(String version, CliVersionProvenance stamp) { + public Result(String version, CliVersionAudit stamp) { this.version = version; this.stamp = stamp; } @@ -122,8 +123,8 @@ public String getVersion() { return version; } - /** The structured stamp to put on the {@code cliVersionProvenance} aspect field. */ - public CliVersionProvenance getStamp() { + /** The structured stamp to put on the {@code cliVersionAudit} aspect field. 
*/ + public CliVersionAudit getStamp() { return stamp; } } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java similarity index 91% rename from metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java rename to metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index 38dd6cc13723..6add93047af2 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -22,9 +22,9 @@ import lombok.extern.slf4j.Slf4j; /** - * {@link MatrixSource} backed by a publicly-readable HTTP URL serving the matrix JSON. Suitable for - * any deployment that wants to fetch the matrix from a CDN, object store (S3, GCS), or a GitHub raw - * URL without rebuilding or redeploying GMS to change connector versions. + * {@link IngestionCliVersionMatrixSource} backed by a publicly-readable HTTP URL serving the matrix + * JSON. Suitable for any deployment that wants to fetch the matrix from a CDN, object store (S3, + * GCS), or a GitHub raw URL without rebuilding or redeploying GMS to change connector versions. * *

The remote JSON must follow this schema: * @@ -55,7 +55,7 @@ * header is sent (the public-URL path is unchanged). */ @Slf4j -public class HttpUrlMatrixSource implements MatrixSource { +public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersionMatrixSource { private static final String DEFAULT_FIELD = "_default"; private static final String COHORTS_FIELD = "cohorts"; @@ -74,19 +74,20 @@ public class HttpUrlMatrixSource implements MatrixSource { private final String url; @Nullable private final String authHeader; - private final AtomicReference<Matrix> cached; + private final AtomicReference<IngestionCliVersionMatrix> cached; private final AtomicLong lastFetchedAtMillis; private final ObjectMapper objectMapper; /** Convenience constructor for unauthenticated (public) URLs. */ - public HttpUrlMatrixSource(String url, int refreshIntervalSeconds) { + public HttpUrlIngestionCliVersionMatrixSource(String url, int refreshIntervalSeconds) { this(url, refreshIntervalSeconds, null); } - public HttpUrlMatrixSource(String url, int refreshIntervalSeconds, @Nullable String authHeader) { + public HttpUrlIngestionCliVersionMatrixSource( + String url, int refreshIntervalSeconds, @Nullable String authHeader) { this.url = url; this.authHeader = authHeader; - this.cached = new AtomicReference<>(Matrix.EMPTY); + this.cached = new AtomicReference<>(IngestionCliVersionMatrix.EMPTY); this.lastFetchedAtMillis = new AtomicLong(0L); this.objectMapper = new ObjectMapper(); @@ -96,7 +97,7 @@ public HttpUrlMatrixSource(String url, int refreshIntervalSeconds, @Nullable Str } @Override - public Matrix getMatrix() { + public IngestionCliVersionMatrix getMatrix() { return cached.get(); } @@ -119,7 +120,7 @@ void refresh() { try (InputStream is = conn.getInputStream()) { JsonNode root = objectMapper.readTree(is); - Matrix parsed; + IngestionCliVersionMatrix parsed; try { parsed = parseMatrix(root); } catch (IllegalArgumentException schemaError) { @@ -151,7 +152,8 @@ void refresh() { } /** - * Parses the nested 
schema into a {@link Matrix} with two layers of validation: + * Parses the nested schema into an {@link IngestionCliVersionMatrix} with two layers of + * validation: * *

    *
  • File-level (fail closed): if the root isn't a JSON object, throws {@link @@ -165,7 +167,7 @@ void refresh() { * *

    Package-private so tests can drive it directly with arbitrary {@link JsonNode} input. */ - static Matrix parseMatrix(JsonNode root) { + static IngestionCliVersionMatrix parseMatrix(JsonNode root) { if (root == null || !root.isObject()) { throw new IllegalArgumentException( "matrix root must be a JSON object, got: " @@ -200,7 +202,7 @@ static Matrix parseMatrix(JsonNode root) { }); entries.put(serverVersion, Collections.unmodifiableMap(connectors)); }); - return new Matrix(entries); + return new IngestionCliVersionMatrix(entries); } /** @@ -237,7 +239,7 @@ private static ConnectorEntry parseConnectorEntry( log.warn( "Ignoring invalid '_default' version: server='{}' connector='{}' version='{}'. " + "Cohort matches still apply; this connector will fall through to " - + "WORKSPACE_DEFAULT when no cohort matches.", + + "APPLICATION_DEFAULT when no cohort matches.", serverVersion, connector, candidate); diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java similarity index 58% rename from metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java rename to metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java index da602fc7ebbb..91d25012bc35 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/Matrix.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java @@ -4,23 +4,26 @@ import java.util.Map; /** - * In-memory snapshot of the per-connector CLI version matrix. + * In-memory snapshot of the per-connector ingestion CLI version matrix. * *

    The matrix is keyed by server release version, then by connector type, with each entry * carrying a {@code _default} version and an optional ordered list of canary cohorts. * - *

    This is a pure POJO produced by {@link MatrixSource} implementations and consumed by {@link - * IngestionVersionMatrixService} — the storage layer (HTTP, GMS aspect, config server, …) is - * decoupled from the resolution layer that walks the matrix and applies precedence rules. + *

    This is a pure POJO produced by {@link IngestionCliVersionMatrixSource} implementations and + * consumed by {@link IngestionCliVersionMatrixService} — the storage layer (HTTP, GMS aspect, + * config server, …) is decoupled from the resolution layer that walks the matrix and applies + * precedence rules. */ -public final class Matrix { +public final class IngestionCliVersionMatrix { /** Empty matrix used when no source is configured or fetch has not yet succeeded. */ - public static final Matrix EMPTY = new Matrix(Collections.emptyMap()); + public static final IngestionCliVersionMatrix EMPTY = + new IngestionCliVersionMatrix(Collections.emptyMap()); private final Map<String, Map<String, ConnectorEntry>> entriesByServerVersion; - public Matrix(Map<String, Map<String, ConnectorEntry>> entriesByServerVersion) { + public IngestionCliVersionMatrix( + Map<String, Map<String, ConnectorEntry>> entriesByServerVersion) { this.entriesByServerVersion = entriesByServerVersion == null ? Collections.emptyMap() @@ -29,7 +32,7 @@ public Matrix(Map<String, Map<String, ConnectorEntry>> entriesByServerVersion) { /** * Lookup the per-connector matrix entries for a given server release. Returns {@code null} if the - * server version has no entry — callers fall back to the workspace default. + * server version has no entry — callers fall back to the application default. 
*/ public Map<String, ConnectorEntry> getEntriesForServer(String serverVersion) { return entriesByServerVersion.get(serverVersion); diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java similarity index 84% rename from metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java rename to metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java index 3bd90c139883..012859a17c37 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixService.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java @@ -4,20 +4,20 @@ import java.util.Optional; /** - * Resolves a CLI version for a given connector type, walking a {@link Matrix} returned by a - * pluggable {@link MatrixSource}. + * Resolves a CLI version for a given connector type, walking an {@link IngestionCliVersionMatrix} + * returned by a pluggable {@link IngestionCliVersionMatrixSource}. * *

    This class owns the resolution policy only — cohort ordering, allowlist matching, * connector-default fallback, and forensic metadata stamping. Where the matrix data comes from * (HTTP, a GMS metadata aspect, a config server, an in-memory test fixture, …) is the {@link - * MatrixSource}'s problem. + * IngestionCliVersionMatrixSource}'s problem. * *

    Cohort-based rollouts are aimed at multi-tenant deployments. Single-tenant installations leave * the deployment identifier unset, which makes cohort matching a no-op and falls through to the - * connector's {@code _default}. When no {@code MatrixSource} is configured at all, the {@link - * NoOpMatrixSource} wired by the factory ensures every {@link #resolveVersionWithSource(String)} - * returns {@link Optional#empty()}, preserving the existing {@code defaultCliVersion} behavior - * bit-identically. + * connector's {@code _default}. When no {@code IngestionCliVersionMatrixSource} is configured at + * all, the {@link NoOpIngestionCliVersionMatrixSource} wired by the factory ensures every {@link + * #resolveVersionWithSource(String)} returns {@link Optional#empty()}, preserving the existing + * {@code defaultCliVersion} behavior bit-identically. * *

    Resolution priority when picking a CLI version for an execution: * @@ -34,14 +34,14 @@ *

    Cohorts are evaluated in array order; the first deployments-list hit wins. An empty or missing * {@code deployments} list never matches. */ -public class IngestionVersionMatrixService { +public class IngestionCliVersionMatrixService { - private final MatrixSource source; + private final IngestionCliVersionMatrixSource source; private final String serverVersion; private final String deploymentId; - public IngestionVersionMatrixService( - MatrixSource source, String serverVersion, String deploymentId) { + public IngestionCliVersionMatrixService( + IngestionCliVersionMatrixSource source, String serverVersion, String deploymentId) { this.source = source; this.serverVersion = serverVersion; this.deploymentId = deploymentId; @@ -57,7 +57,8 @@ public String getServerVersion() { * source. Returns {@link Optional#empty()} when: * *

      - *
    • The source returned an empty matrix (no data yet, or {@link NoOpMatrixSource}) + *
    • The source returned an empty matrix (no data yet, or {@link + * NoOpIngestionCliVersionMatrixSource}) *
    • The current server version has no entry in the matrix *
    • The connector has no entry under the current server version *
    @@ -80,7 +81,7 @@ public Optional<String> resolveVersion(String connectorType) { * resulting execution request (for post-hoc forensics). */ public Optional<MatrixResolution> resolveVersionWithSource(String connectorType) { - Matrix matrix = source.getMatrix(); + IngestionCliVersionMatrix matrix = source.getMatrix(); Map<String, ConnectorEntry> serverEntry = matrix.getEntriesForServer(serverVersion); if (serverEntry == null) { return Optional.empty(); diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixSource.java new file mode 100644 index 000000000000..fb11aa55649c --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixSource.java @@ -0,0 +1,49 @@ +package com.linkedin.metadata.ingestion; + +/** + * Storage abstraction for the per-connector ingestion CLI version matrix. + * + *

    Decouples how the matrix is fetched/stored from how it is consumed. {@link + * IngestionCliVersionMatrixService} (the consumer) only knows that "something" returns an {@link + * IngestionCliVersionMatrix}; implementations of this interface decide where the data comes from + * and how/when it refreshes. + * + *

    Current implementations: + * + *

      + *
    • {@link HttpUrlIngestionCliVersionMatrixSource} — periodic GET of a JSON document from a + * remote URL (S3, CDN, GitHub raw, …). + *
    • {@link NoOpIngestionCliVersionMatrixSource} — always returns an empty matrix. Used as the + * default when no source is configured, so the resolution service never needs null checks. + *
    + * + *

    Future implementations could include: + * + *

      + *
    • {@code GmsAspectIngestionCliVersionMatrixSource} — reads the matrix from a metadata aspect + * on a {@code globalSettings} entity inside DataHub itself. Lets workspace admins edit the + * matrix through the UI/GraphQL the same way they edit any other setting. + *
    • {@code ConfigServerIngestionCliVersionMatrixSource} — generic config-server backend (AWS + * AppConfig, Consul, etcd, …). + *
    + * + *

    Implementations are responsible for their own caching, refresh cadence, and failure handling. + * The consumer assumes {@link #getMatrix()} is cheap to call on the hot path. + */ +public interface IngestionCliVersionMatrixSource { + + /** + * Returns the latest available matrix snapshot. Never {@code null}; implementations should return + * {@link IngestionCliVersionMatrix#EMPTY} if they have no data yet (e.g. initial fetch hasn't + * completed) or if the source is intentionally a no-op. + */ + IngestionCliVersionMatrix getMatrix(); + + /** + * Returns the epoch-millis timestamp of when the currently-cached matrix was last successfully + * populated, or {@code 0} if no successful fetch has happened. Used by the resolution service to + * stamp forensic metadata on the execution request (so post-hoc triage can correlate the resolved + * version with matrix freshness). + */ + long getLastFetchedAtMillis(); +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java deleted file mode 100644 index 361a242ad450..000000000000 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/MatrixSource.java +++ /dev/null @@ -1,49 +0,0 @@ -package com.linkedin.metadata.ingestion; - -/** - * Storage abstraction for the per-connector CLI version matrix. - * - *

    Decouples how the matrix is fetched/stored from how it is consumed. {@link - * IngestionVersionMatrixService} (the consumer) only knows that "something" returns a {@link - * Matrix}; implementations of this interface decide where the data comes from and how/when it - * refreshes. - * - *

    Current implementations: - * - *

      - *
    • {@link HttpUrlMatrixSource} — periodic GET of a JSON document from a remote URL (S3, CDN, - * GitHub raw, …). - *
    • {@link NoOpMatrixSource} — always returns an empty matrix. Used as the default when no - * source is configured, so the resolution service never needs null checks. - *
    - * - *

    Future implementations could include: - * - *

      - *
    • {@code GmsAspectMatrixSource} — reads the matrix from a metadata aspect on a {@code - * globalSettings} entity inside DataHub itself. Lets workspace admins edit the matrix through - * the UI/GraphQL the same way they edit any other setting. - *
    • {@code ConfigServerMatrixSource} — generic config-server backend (AWS AppConfig, Consul, - * etcd, …). - *
    - * - *

    Implementations are responsible for their own caching, refresh cadence, and failure handling. - * The consumer assumes {@link #getMatrix()} is cheap to call on the hot path. - */ -public interface MatrixSource { - - /** - * Returns the latest available matrix snapshot. Never {@code null}; implementations should return - * {@link Matrix#EMPTY} if they have no data yet (e.g. initial fetch hasn't completed) or if the - * source is intentionally a no-op. - */ - Matrix getMatrix(); - - /** - * Returns the epoch-millis timestamp of when the currently-cached matrix was last successfully - * populated, or {@code 0} if no successful fetch has happened. Used by the resolution service to - * stamp forensic metadata on the execution request (so post-hoc triage can correlate the resolved - * version with matrix freshness). - */ - long getLastFetchedAtMillis(); -} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpIngestionCliVersionMatrixSource.java new file mode 100644 index 000000000000..2986d4a375cd --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpIngestionCliVersionMatrixSource.java @@ -0,0 +1,23 @@ +package com.linkedin.metadata.ingestion; + +/** + * {@link IngestionCliVersionMatrixSource} that always returns an empty matrix. Used when no matrix + * backend is configured (the OSS default — {@code INGESTION_VERSION_MATRIX_URL} is unset). + * + *

    Always wiring a {@code NoOpIngestionCliVersionMatrixSource} instead of leaving the consumer + * service null means {@link IngestionCliVersionMatrixService} can rely on a non-null source without + * null checks on the hot path, and unit tests that don't care about matrix data don't have to + * construct a real source. + */ +public final class NoOpIngestionCliVersionMatrixSource implements IngestionCliVersionMatrixSource { + + @Override + public IngestionCliVersionMatrix getMatrix() { + return IngestionCliVersionMatrix.EMPTY; + } + + @Override + public long getLastFetchedAtMillis() { + return 0L; + } +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java deleted file mode 100644 index 95fa6e7405a0..000000000000 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/NoOpMatrixSource.java +++ /dev/null @@ -1,22 +0,0 @@ -package com.linkedin.metadata.ingestion; - -/** - * {@link MatrixSource} that always returns an empty matrix. Used when no matrix backend is - * configured (the OSS default — {@code INGESTION_VERSION_MATRIX_URL} is unset). - * - *

    Always wiring a {@code NoOpMatrixSource} instead of leaving the consumer service null means - * {@link IngestionVersionMatrixService} can rely on a non-null source without null checks on the - * hot path, and unit tests that don't care about matrix data don't have to construct a real source. - */ -public final class NoOpMatrixSource implements MatrixSource { - - @Override - public Matrix getMatrix() { - return Matrix.EMPTY; - } - - @Override - public long getLastFetchedAtMillis() { - return 0L; - } -} diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java index 25431e1d4997..988b726ff3e8 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java @@ -11,12 +11,12 @@ /** * Focused unit tests for {@link CliVersionResolutionHelper}. * - *

    Covers the precedence ladder (per-source > matrix cohort > matrix connector default > - * workspace default) and the per-source normalization contract (null, empty, and whitespace-only - * strings all fall through to the next tier). The whitespace case matches the contract of {@code - * IngestionUtils.resolveIngestionCliVersion(...)} from #17471 — bootstrap YAML templating can - * render any of the three, and forwarding them to the executor silently picks the bundled CLI - * rather than the configured default. + *

    Covers the precedence ladder (source config override > matrix cohort > matrix connector + * default > application default) and the per-source normalization contract (null, empty, and + * whitespace-only strings all fall through to the next tier). The whitespace case matches the + * contract of {@code IngestionUtils.resolveIngestionCliVersion(...)} from #17471 — bootstrap YAML + * templating can render any of the three, and forwarding them to the executor silently picks the + * bundled CLI rather than the configured default. */ public class CliVersionResolutionHelperTest { @@ -30,7 +30,7 @@ public void testPerSourceVersionWins() { "0.13.5", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.5"); - assertEquals(result.getStamp().getSource(), CliVersionSource.PER_SOURCE); + assertEquals(result.getStamp().getSource(), CliVersionSource.SOURCE_CONFIG_OVERRIDE); assertEquals(result.getStamp().getServerVersion(), SERVER_VERSION); } @@ -41,7 +41,7 @@ public void testPerSourceWhitespaceIsTrimmed() { " 0.13.5 ", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.5"); - assertEquals(result.getStamp().getSource(), CliVersionSource.PER_SOURCE); + assertEquals(result.getStamp().getSource(), CliVersionSource.SOURCE_CONFIG_OVERRIDE); } @Test @@ -50,7 +50,7 @@ public void testPerSourceNullFallsThroughToDefault() { CliVersionResolutionHelper.resolve(null, null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); - assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); } @Test @@ -59,29 +59,31 @@ public void testPerSourceEmptyFallsThroughToDefault() { CliVersionResolutionHelper.resolve("", null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); - assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + 
assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); } @Test public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { // Documents the contract from #17471: a bootstrap YAML field that renders as a blank string - // must be treated as "unset" so we hit the workspace default rather than passing the blank + // must be treated as "unset" so we hit the application default rather than passing the blank // through to the executor. CliVersionResolutionHelper.Result result = CliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); - assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); } @Test - public void testMatrixConnectorDefaultWinsOverWorkspaceDefault() { - IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + public void testMatrixConnectorDefaultWinsOverApplicationDefault() { + IngestionCliVersionMatrixService matrixService = + Mockito.mock(IngestionCliVersionMatrixService.class); Mockito.when(matrixService.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( - new IngestionVersionMatrixService.MatrixResolution( - "0.13.5", IngestionVersionMatrixService.MatrixSourceLevel.CONNECTOR_DEFAULT))); + new IngestionCliVersionMatrixService.MatrixResolution( + "0.13.5", + IngestionCliVersionMatrixService.MatrixSourceLevel.CONNECTOR_DEFAULT))); CliVersionResolutionHelper.Result result = CliVersionResolutionHelper.resolve( @@ -93,12 +95,13 @@ public void testMatrixConnectorDefaultWinsOverWorkspaceDefault() { @Test public void testMatrixCohortWinsOverConnectorDefault() { - IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + IngestionCliVersionMatrixService matrixService = + Mockito.mock(IngestionCliVersionMatrixService.class); 
Mockito.when(matrixService.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( - new IngestionVersionMatrixService.MatrixResolution( - "0.13.6", IngestionVersionMatrixService.MatrixSourceLevel.COHORT))); + new IngestionCliVersionMatrixService.MatrixResolution( + "0.13.6", IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); CliVersionResolutionHelper.Result result = CliVersionResolutionHelper.resolve( @@ -111,14 +114,15 @@ public void testMatrixCohortWinsOverConnectorDefault() { @Test public void testNullConnectorTypeSkipsMatrix() { // A malformed test-connection recipe produces a null connector type; we must skip the matrix - // and fall through to the workspace default rather than throwing. - IngestionVersionMatrixService matrixService = Mockito.mock(IngestionVersionMatrixService.class); + // and fall through to the application default rather than throwing. + IngestionCliVersionMatrixService matrixService = + Mockito.mock(IngestionCliVersionMatrixService.class); CliVersionResolutionHelper.Result result = CliVersionResolutionHelper.resolve(null, null, matrixService, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); - assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); Mockito.verifyNoInteractions(matrixService); } @@ -131,6 +135,6 @@ public void testNullDefaultStillReturnsStamp() { assertEquals(result.getVersion(), ""); assertNotNull(result.getStamp()); - assertEquals(result.getStamp().getSource(), CliVersionSource.WORKSPACE_DEFAULT); + assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); } } diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java similarity index 86% 
rename from metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java rename to metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java index d82d03da4cb5..6dec07e05ba0 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlMatrixSourceValidationTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java @@ -13,7 +13,7 @@ /** * Direct unit tests for the JSON-schema validation rules in {@link - * HttpUrlMatrixSource#parseMatrix}. + * HttpUrlIngestionCliVersionMatrixSource#parseMatrix}. * *

    Two layers of validation are tested: * @@ -22,10 +22,10 @@ * violations — caller refuses to swap the cache. *

  • Entry-level (fail open + log): bad sub-entries are skipped; good entries around them * are kept. We don't assert on the log lines themselves (brittle) — only on the resulting - * {@link Matrix} shape. + * {@link IngestionCliVersionMatrix} shape. *
*/ -public class HttpUrlMatrixSourceValidationTest { +public class HttpUrlIngestionCliVersionMatrixSourceValidationTest { private static final ObjectMapper MAPPER = new ObjectMapper(); @@ -37,15 +37,20 @@ public class HttpUrlMatrixSourceValidationTest { public void rootNotObjectThrowsAndCallerRetainsCache() throws Exception { // A JSON array at the root is the realistic operator-error case (e.g. they // exported a list of versions instead of the keyed object). We refuse to swap. - // The thrown IllegalArgumentException is the signal HttpUrlMatrixSource.refresh() + // The thrown IllegalArgumentException is the signal + // HttpUrlIngestionCliVersionMatrixSource.refresh() // uses to retain the last-known-good cache. JsonNode root = MAPPER.readTree("[ {\"snowflake\": {} } ]"); - assertThrows(IllegalArgumentException.class, () -> HttpUrlMatrixSource.parseMatrix(root)); + assertThrows( + IllegalArgumentException.class, + () -> HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root)); } @Test public void rootNullThrows() { - assertThrows(IllegalArgumentException.class, () -> HttpUrlMatrixSource.parseMatrix(null)); + assertThrows( + IllegalArgumentException.class, + () -> HttpUrlIngestionCliVersionMatrixSource.parseMatrix(null)); } // --------------------------------------------------------------------------- @@ -56,14 +61,14 @@ public void rootNullThrows() { public void invalidDefaultVersionIgnoredButCohortsKept() throws Exception { // The "_default" is unusable (has a space), but cohorts are well-formed and // should still drive cohort matches. The connector falls through to - // WORKSPACE_DEFAULT when no cohort matches. + // APPLICATION_DEFAULT when no cohort matches. 
JsonNode root = MAPPER.readTree( "{\"1.5.0\": {\"snowflake\": {" + "\"_default\": \"not a version\"," + "\"cohorts\": [{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}]" + "}}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); assertNotNull(snowflake); assertNull( @@ -81,7 +86,7 @@ public void cohortMissingVersionIsSkippedOthersKept() throws Exception { + "{\"deployments\": [\"acme\"]}," + "{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}" + "]}}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); assertEquals(snowflake.getCohorts().size(), 1, "first cohort (no version) should be dropped"); assertEquals(snowflake.getCohorts().get(0).getVersion(), "1.5.0.6"); @@ -95,7 +100,7 @@ public void cohortWithGarbageVersionIsSkipped() throws Exception { "{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + "{\"version\": \"\", \"deployments\": [\"acme\"]}" + "]}}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); assertTrue( snowflake.getCohorts().isEmpty(), "cohort with invalid version pattern should be dropped"); @@ -120,7 +125,7 @@ public void permissiveVersionPatternAcceptsRealPyPiVersions() throws Exception { } JsonNode root = MAPPER.readTree("{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + cohorts + "]}}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); assertEquals( m.getEntriesForServer("1.5.0").get("snowflake").getCohorts().size(), realVersions.length, @@ -136,7 
+141,7 @@ public void connectorValueNotObjectIsSkippedOthersKept() throws Exception { + "\"snowflake\": [\"oops\", \"this is wrong\"]," + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + "}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); assertFalse( m.getEntriesForServer("1.5.0").containsKey("snowflake"), "malformed connector entry should be dropped"); @@ -160,7 +165,7 @@ public void wellFormedMatrixParsesUnchanged() throws Exception { + "}," + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + "}}"); - Matrix m = HttpUrlMatrixSource.parseMatrix(root); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); assertEquals(snowflake.getDefaultVersion(), "1.5.0.5"); assertEquals(snowflake.getCohorts().size(), 1); diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java similarity index 76% rename from metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java rename to metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java index c8adbe32867b..0966cfe49863 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionVersionMatrixServiceTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java @@ -14,7 +14,7 @@ import java.util.concurrent.atomic.AtomicReference; import org.testng.annotations.Test; -public class IngestionVersionMatrixServiceTest { +public class IngestionCliVersionMatrixServiceTest { private static final String SERVER_VERSION = "1.3.1.4"; 
@@ -44,19 +44,20 @@ public class IngestionVersionMatrixServiceTest { + "}"; /** - * Returns a service backed by {@link HttpUrlMatrixSource} pointed at a temp-file URL containing - * {@link #MATRIX_JSON}. Polls briefly so the asynchronous initial fetch has a chance to populate - * the cache before the assertions run. + * Returns a service backed by {@link HttpUrlIngestionCliVersionMatrixSource} pointed at a + * temp-file URL containing {@link #MATRIX_JSON}. Polls briefly so the asynchronous initial fetch + * has a chance to populate the cache before the assertions run. */ - private IngestionVersionMatrixService serviceWithMatrix(String serverVersion, String deploymentId) - throws IOException { + private IngestionCliVersionMatrixService serviceWithMatrix( + String serverVersion, String deploymentId) throws IOException { Path tmp = Files.createTempFile("version-matrix", ".json"); Files.write(tmp, MATRIX_JSON.getBytes()); tmp.toFile().deleteOnExit(); - HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, serverVersion, deploymentId); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, serverVersion, deploymentId); // Wait briefly for the initial fetch to complete (delay=0 in the scheduled executor). 
for (int i = 0; i < 20; i++) { @@ -72,13 +73,15 @@ private IngestionVersionMatrixService serviceWithMatrix(String serverVersion, St } // ------------------------------------------------------------------------- - // Feature disabled (NoOpMatrixSource — what the factory binds when URL is unset) + // Feature disabled (NoOpIngestionCliVersionMatrixSource — what the factory binds when URL is + // unset) // ------------------------------------------------------------------------- @Test public void testDisabled_noOpSource() { - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(new NoOpMatrixSource(), SERVER_VERSION, "deployment-b1"); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService( + new NoOpIngestionCliVersionMatrixSource(), SERVER_VERSION, "deployment-b1"); assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); } @@ -88,14 +91,14 @@ public void testDisabled_noOpSource() { @Test public void testCustomerNotInAnyCohort_usesConnectorDefault() throws Exception { - IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-unknown"); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-unknown"); assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); } @Test public void testConnectorWithoutCohorts_returnsDefault() throws Exception { // bigquery is in the matrix with only a `default` and no `cohorts` array. 
- IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); assertEquals(svc.resolveVersion("bigquery"), Optional.of("0.14.2")); } @@ -105,13 +108,13 @@ public void testConnectorWithoutCohorts_returnsDefault() throws Exception { @Test public void testCustomerInFirstCohort_getsCohortVersion() throws Exception { - IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b2"); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b2"); assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.5")); } @Test public void testCustomerInSecondCohort_getsCohortVersion() throws Exception { - IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-d1"); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-d1"); assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.6")); } @@ -123,13 +126,13 @@ public void testCustomerInSecondCohort_getsCohortVersion() throws Exception { public void testNullCustomerId_neverMatchesCohort() throws Exception { // A deployment that didn't wire DATAHUB_EXECUTOR_CUSTOMER_ID should fall through to the // connector default rather than throw or behave unpredictably. 
- IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, null); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, null); assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); } @Test public void testEmptyCustomerId_neverMatchesCohort() throws Exception { - IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, ""); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, ""); assertEquals(svc.resolveVersion("snowflake"), Optional.of("1.3.1.4")); } @@ -140,15 +143,15 @@ public void testEmptyCustomerId_neverMatchesCohort() throws Exception { @Test public void testUnknownConnector_returnsEmpty() throws Exception { // Connectors not present in the matrix should return empty so the caller falls back to the - // workspace-wide defaultCliVersion. This is intentionally different from the old `_default` + // application-wide defaultCliVersion. This is intentionally different from the old `_default` // server-level fallback the prior schema had. - IngestionVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); + IngestionCliVersionMatrixService svc = serviceWithMatrix(SERVER_VERSION, "deployment-b1"); assertEquals(svc.resolveVersion("redshift"), Optional.empty()); } @Test public void testUnknownServerVersion_returnsEmpty() throws Exception { - IngestionVersionMatrixService svc = serviceWithMatrix("2.0.0", "deployment-b1"); + IngestionCliVersionMatrixService svc = serviceWithMatrix("2.0.0", "deployment-b1"); assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); } @@ -160,10 +163,10 @@ public void testUnknownServerVersion_returnsEmpty() throws Exception { public void testUnreachableUrlReturnsEmpty() { // A URL that will always fail — the source should log a warning and the service returns empty // (no prior matrix to fall back to). 
- HttpUrlMatrixSource httpSource = - new HttpUrlMatrixSource("http://localhost:19999/does-not-exist", 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource("http://localhost:19999/does-not-exist", 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); try { Thread.sleep(200); @@ -179,9 +182,10 @@ public void testMalformedJsonReturnsEmpty() throws Exception { Files.write(tmp, "not valid json".getBytes()); tmp.toFile().deleteOnExit(); - HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); try { Thread.sleep(300); @@ -217,9 +221,10 @@ public void testCohortWithoutVersion_isSkipped() throws Exception { Files.write(tmp, json.getBytes()); tmp.toFile().deleteOnExit(); - HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); for (int i = 0; i < 20; i++) { Optional v = svc.resolveVersion("snowflake"); @@ -255,9 +260,10 @@ public void testCohortWithMissingDeployments_neverMatches() throws Exception { Files.write(tmp, 
json.getBytes()); tmp.toFile().deleteOnExit(); - HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); for (int i = 0; i < 20; i++) { Optional v = svc.resolveVersion("snowflake"); @@ -271,7 +277,8 @@ public void testCohortWithMissingDeployments_neverMatches() throws Exception { } // ------------------------------------------------------------------------- - // HttpUrlMatrixSource HTTP-level behavior (auth header, fetch-failure cache retention) + // HttpUrlIngestionCliVersionMatrixSource HTTP-level behavior (auth header, fetch-failure cache + // retention) // These exercise the network code path; the other tests use file:// URIs which can't surface // request-header or HTTP-status behavior. // ------------------------------------------------------------------------- @@ -293,7 +300,8 @@ public void testAuthHeader_sentVerbatimWhenConfigured() throws Exception { server.start(); try { String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; - HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600, "token ghp_test_xyz"); + HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(url, 3600, "token ghp_test_xyz"); waitForFirstFetch(source); assertEquals( captured.get(), @@ -322,7 +330,8 @@ public void testAuthHeader_omittedWhenUnset() throws Exception { try { String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; // 2-arg constructor — no authHeader. 
- HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600); + HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(url, 3600); waitForFirstFetch(source); assertFalse( sawAuthHeader.get(), @@ -355,10 +364,11 @@ public void testFetchFailureAfterSuccess_retainsCachedMatrix() throws Exception server.start(); try { String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; - HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600); + HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(url, 3600); waitForFirstFetch(source); - Matrix beforeFailure = source.getMatrix(); + IngestionCliVersionMatrix beforeFailure = source.getMatrix(); long firstFetchTimestamp = source.getLastFetchedAtMillis(); assertTrue(firstFetchTimestamp > 0, "Initial fetch should have populated the cache"); @@ -379,7 +389,8 @@ public void testFetchFailureAfterSuccess_retainsCachedMatrix() throws Exception } /** Polls until the source's initial scheduled fetch has populated the cache, or fails fast. */ - private static void waitForFirstFetch(HttpUrlMatrixSource source) throws InterruptedException { + private static void waitForFirstFetch(HttpUrlIngestionCliVersionMatrixSource source) + throws InterruptedException { for (int i = 0; i < 30; i++) { if (source.getLastFetchedAtMillis() > 0) { return; @@ -406,8 +417,10 @@ public void testAuthHeader_omittedWhenEmptyString() throws Exception { server.start(); try { String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; - // Explicit empty string — distinct branch from null in HttpUrlMatrixSource.refresh(). - HttpUrlMatrixSource source = new HttpUrlMatrixSource(url, 3600, ""); + // Explicit empty string — distinct branch from null in + // HttpUrlIngestionCliVersionMatrixSource.refresh(). 
+ HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(url, 3600, ""); waitForFirstFetch(source); assertFalse( sawAuthHeader.get(), @@ -423,14 +436,16 @@ public void testAuthHeader_omittedWhenEmptyString() throws Exception { // ------------------------------------------------------------------------- @Test - public void testNoOpMatrixSource_returnsEmptyAndZeroTimestamp() { - NoOpMatrixSource source = new NoOpMatrixSource(); + public void testNoOpIngestionCliVersionMatrixSource_returnsEmptyAndZeroTimestamp() { + NoOpIngestionCliVersionMatrixSource source = new NoOpIngestionCliVersionMatrixSource(); assertSame( - source.getMatrix(), Matrix.EMPTY, "NoOpMatrixSource should always return Matrix.EMPTY"); + source.getMatrix(), + IngestionCliVersionMatrix.EMPTY, + "NoOpIngestionCliVersionMatrixSource should always return IngestionCliVersionMatrix.EMPTY"); assertEquals( source.getLastFetchedAtMillis(), 0L, - "NoOpMatrixSource should report 0 for last-fetched timestamp"); + "NoOpIngestionCliVersionMatrixSource should report 0 for last-fetched timestamp"); } @Test @@ -454,7 +469,7 @@ public void testConnectorEntry_nullCohortsBecomesEmpty() { @Test public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws Exception { // Connector exists in the matrix but has no `_default` field AND we don't match any cohort. - // Caller should fall through to workspace-wide defaultCliVersion. + // Caller should fall through to application-wide defaultCliVersion. 
String json = "{\n" + " \"1.3.1.4\": {\n" @@ -469,9 +484,10 @@ public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws E Files.write(tmp, json.getBytes()); tmp.toFile().deleteOnExit(); - HttpUrlMatrixSource httpSource = new HttpUrlMatrixSource(tmp.toUri().toString(), 3600); - IngestionVersionMatrixService svc = - new IngestionVersionMatrixService(httpSource, SERVER_VERSION, "deployment-unknown"); + HttpUrlIngestionCliVersionMatrixSource httpSource = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + IngestionCliVersionMatrixService svc = + new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-unknown"); for (int i = 0; i < 20; i++) { if (httpSource.getLastFetchedAtMillis() > 0) { diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java index 8584fd8ea568..5849489f0da5 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactory.java @@ -31,7 +31,7 @@ import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.GraphService; import com.linkedin.metadata.graph.SiblingGraphService; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.search.SemanticSearchService; @@ -139,8 +139,8 @@ public class GraphQLEngineFactory { private GitVersion gitVersion; @Autowired - @Qualifier("ingestionVersionMatrixService") - private IngestionVersionMatrixService versionMatrixService; + @Qualifier("ingestionCliVersionMatrixService") + private 
IngestionCliVersionMatrixService versionMatrixService; @Autowired @Qualifier("timelineService") @@ -272,7 +272,7 @@ protected GraphQLEngine graphQLEngine( args.setSecretService(secretService); args.setNativeUserService(nativeUserService); args.setIngestionConfiguration(configProvider.getIngestion()); - args.setIngestionVersionMatrixService(versionMatrixService); + args.setIngestionCliVersionMatrixService(versionMatrixService); args.setAuthenticationConfiguration(configProvider.getAuthentication()); args.setAuthorizationConfiguration(configProvider.getAuthorization()); args.setGitVersion(gitVersion); diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java similarity index 57% rename from metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java rename to metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java index c2e191038a38..ad9fc982fdfd 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java @@ -2,10 +2,10 @@ import com.linkedin.gms.factory.config.ConfigurationProvider; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.HttpUrlMatrixSource; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; -import com.linkedin.metadata.ingestion.MatrixSource; -import com.linkedin.metadata.ingestion.NoOpMatrixSource; +import com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import 
com.linkedin.metadata.ingestion.IngestionCliVersionMatrixSource; +import com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource; import com.linkedin.metadata.version.GitVersion; import javax.annotation.Nonnull; import org.springframework.beans.factory.annotation.Autowired; @@ -15,18 +15,20 @@ import org.springframework.context.annotation.Scope; /** - * Wires up the per-connector CLI version matrix. + * Wires up the per-connector ingestion CLI version matrix. * *

<p>The wiring is split into two beans so that storage and consumption are decoupled: * * <ul>
- * <li>{@code matrixSource} — implements {@link MatrixSource}; chosen based on configuration. - * Today the only "live" implementation is {@link HttpUrlMatrixSource}, picked when {@code - * ingestion.versionMatrixUrl} is set. Otherwise a {@link NoOpMatrixSource} is bound. Future - * implementations (GMS-aspect-backed, config-server-backed, …) can plug in here without any - * change to the consumer service.
- * <li>{@code ingestionVersionMatrixService} — consumes whichever {@link MatrixSource} is bound - * and applies the resolution policy (cohort → connector default → workspace default).
+ * <li>{@code ingestionCliVersionMatrixSource} — implements {@link + * IngestionCliVersionMatrixSource}; chosen based on configuration. Today the only "live" + * implementation is {@link HttpUrlIngestionCliVersionMatrixSource}, picked when {@code + * ingestion.versionMatrixUrl} is set. Otherwise a {@link NoOpIngestionCliVersionMatrixSource} + * is bound. Future implementations (GMS-aspect-backed, config-server-backed, …) can plug in + * here without any change to the consumer service.
+ * <li>{@code ingestionCliVersionMatrixService} — consumes whichever {@link + * IngestionCliVersionMatrixSource} is bound and applies the resolution policy (cohort → + * connector default → application default).
* </ul> * * <p>
The deployment identifier is sourced from {@code ingestion.deploymentId}, which is bound to @@ -36,7 +38,7 @@ * applies. */ @Configuration -public class IngestionVersionMatrixServiceFactory { +public class IngestionCliVersionMatrixServiceFactory { @Autowired @Qualifier("configurationProvider") @@ -51,29 +53,30 @@ public class IngestionVersionMatrixServiceFactory { * decision is driven by whether a URL is configured. New backends should be added here behind an * explicit config flag rather than by replacing the existing decision. */ - @Bean(name = "matrixSource") + @Bean(name = "ingestionCliVersionMatrixSource") @Scope("singleton") @Nonnull - protected MatrixSource matrixSource() { + protected IngestionCliVersionMatrixSource ingestionCliVersionMatrixSource() { IngestionConfiguration ingestionConfig = configProvider.getIngestion(); String url = ingestionConfig.getVersionMatrixUrl(); if (url == null || url.isEmpty()) { - return new NoOpMatrixSource(); + return new NoOpIngestionCliVersionMatrixSource(); } - return new HttpUrlMatrixSource( + return new HttpUrlIngestionCliVersionMatrixSource( url, ingestionConfig.getVersionMatrixRefreshSeconds(), ingestionConfig.getVersionMatrixAuthToken()); } - @Bean(name = "ingestionVersionMatrixService") + @Bean(name = "ingestionCliVersionMatrixService") @Scope("singleton") @Nonnull - protected IngestionVersionMatrixService getInstance( - @Qualifier("matrixSource") final MatrixSource matrixSource) { + protected IngestionCliVersionMatrixService getInstance( + @Qualifier("ingestionCliVersionMatrixSource") + final IngestionCliVersionMatrixSource matrixSource) { IngestionConfiguration ingestionConfig = configProvider.getIngestion(); String serverVersion = (String) gitVersion.toConfig().get("version"); - return new IngestionVersionMatrixService( + return new IngestionCliVersionMatrixService( matrixSource, serverVersion, ingestionConfig.getDeploymentId()); } } diff --git 
a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java index b9921b4f0ced..cd8a5b02ae65 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionSchedulerFactory.java @@ -4,7 +4,7 @@ import com.linkedin.entity.client.SystemEntityClient; import com.linkedin.gms.factory.auth.SystemAuthenticationFactory; import com.linkedin.gms.factory.config.ConfigurationProvider; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import io.datahubproject.metadata.context.OperationContext; import javax.annotation.Nonnull; import org.springframework.beans.factory.annotation.Autowired; @@ -14,7 +14,7 @@ import org.springframework.context.annotation.Import; import org.springframework.context.annotation.Scope; -@Import({SystemAuthenticationFactory.class, IngestionVersionMatrixServiceFactory.class}) +@Import({SystemAuthenticationFactory.class, IngestionCliVersionMatrixServiceFactory.class}) public class IngestionSchedulerFactory { @Autowired @@ -22,8 +22,8 @@ public class IngestionSchedulerFactory { private ConfigurationProvider configProvider; @Autowired - @Qualifier("ingestionVersionMatrixService") - private IngestionVersionMatrixService versionMatrixService; + @Qualifier("ingestionCliVersionMatrixService") + private IngestionCliVersionMatrixService versionMatrixService; @Value("${ingestion.scheduler.delayIntervalSeconds:45}") // Boot up ingestion source cache after // waiting 45 seconds for startup. 
diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java index 03d4ff2fcaed..f4588a179a4d 100644 --- a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/graphql/GraphQLEngineFactoryTest.java @@ -24,7 +24,7 @@ import com.linkedin.metadata.graph.GraphClient; import com.linkedin.metadata.graph.GraphService; import com.linkedin.metadata.graph.SiblingGraphService; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.models.registry.EntityRegistry; import com.linkedin.metadata.recommendation.RecommendationsService; import com.linkedin.metadata.recommendation.candidatesource.RecentlySearchedSource; @@ -166,8 +166,8 @@ public void setup() { private GitVersion gitVersion; @MockitoBean - @Qualifier("ingestionVersionMatrixService") - private IngestionVersionMatrixService versionMatrixService; + @Qualifier("ingestionCliVersionMatrixService") + private IngestionCliVersionMatrixService versionMatrixService; @MockitoBean @Qualifier("timelineService") diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java similarity index 60% rename from metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java rename to metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java index ea93f1513faa..6f600f94b585 100644 --- 
a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionVersionMatrixServiceFactoryTest.java +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java @@ -5,10 +5,10 @@ import com.linkedin.gms.factory.config.ConfigurationProvider; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.HttpUrlMatrixSource; -import com.linkedin.metadata.ingestion.IngestionVersionMatrixService; -import com.linkedin.metadata.ingestion.MatrixSource; -import com.linkedin.metadata.ingestion.NoOpMatrixSource; +import com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixSource; +import com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource; import com.linkedin.metadata.version.GitVersion; import java.lang.reflect.Field; import java.util.Map; @@ -16,28 +16,30 @@ import org.testng.annotations.Test; /** - * Direct unit tests for {@link IngestionVersionMatrixServiceFactory}. Exercises the branch that - * picks {@link NoOpMatrixSource} vs {@link HttpUrlMatrixSource} based on whether {@code - * versionMatrixUrl} is configured — the rest of the codebase only ever exercises the no-op path - * (test contexts don't set the env var). + * Direct unit tests for {@link IngestionCliVersionMatrixServiceFactory}. Exercises the branch that + * picks {@link NoOpIngestionCliVersionMatrixSource} vs {@link + * HttpUrlIngestionCliVersionMatrixSource} based on whether {@code versionMatrixUrl} is configured — + * the rest of the codebase only ever exercises the no-op path (test contexts don't set the env + * var). 
*/ -public class IngestionVersionMatrixServiceFactoryTest { +public class IngestionCliVersionMatrixServiceFactoryTest { - private IngestionVersionMatrixServiceFactory factory; + private IngestionCliVersionMatrixServiceFactory factory; private ConfigurationProvider configProvider; private IngestionConfiguration ingestionConfig; private GitVersion gitVersion; @BeforeMethod public void setUp() { - factory = new IngestionVersionMatrixServiceFactory(); + factory = new IngestionCliVersionMatrixServiceFactory(); configProvider = mock(ConfigurationProvider.class); ingestionConfig = new IngestionConfiguration(); gitVersion = mock(GitVersion.class); when(configProvider.getIngestion()).thenReturn(ingestionConfig); // GitVersion.toConfig() is called for the server-version key. Returning an empty config is - // fine for the matrixSource() bean; only getInstance() reads the server version. + // fine for the ingestionCliVersionMatrixSource() bean; only getInstance() reads the server + // version. when(gitVersion.toConfig()).thenReturn(Map.of("version", "test-server-1.0")); setField(factory, "configProvider", configProvider); @@ -48,22 +50,22 @@ public void setUp() { public void testMatrixSource_whenUrlIsNull_wiresNoOp() { ingestionConfig.setVersionMatrixUrl(null); - MatrixSource source = factory.matrixSource(); + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( - source instanceof NoOpMatrixSource, - "Unset versionMatrixUrl should wire NoOpMatrixSource (OSS-safe default)"); + source instanceof NoOpIngestionCliVersionMatrixSource, + "Unset versionMatrixUrl should wire NoOpIngestionCliVersionMatrixSource (OSS-safe default)"); } @Test public void testMatrixSource_whenUrlIsEmpty_wiresNoOp() { ingestionConfig.setVersionMatrixUrl(""); - MatrixSource source = factory.matrixSource(); + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( - source instanceof NoOpMatrixSource, - "Empty-string 
versionMatrixUrl should be treated like unset → NoOpMatrixSource"); + source instanceof NoOpIngestionCliVersionMatrixSource, + "Empty-string versionMatrixUrl should be treated like unset → NoOpIngestionCliVersionMatrixSource"); } @Test @@ -73,11 +75,11 @@ public void testMatrixSource_whenUrlIsSet_wiresHttpUrlSource() { ingestionConfig.setVersionMatrixRefreshSeconds(3600); ingestionConfig.setVersionMatrixAuthToken(null); - MatrixSource source = factory.matrixSource(); + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( - source instanceof HttpUrlMatrixSource, - "Configured versionMatrixUrl should wire HttpUrlMatrixSource"); + source instanceof HttpUrlIngestionCliVersionMatrixSource, + "Configured versionMatrixUrl should wire HttpUrlIngestionCliVersionMatrixSource"); } @Test @@ -86,7 +88,8 @@ public void testGetInstance_buildsServiceWithServerVersionFromGitVersion() { ingestionConfig.setDeploymentId("test-deployment"); when(gitVersion.toConfig()).thenReturn(Map.of("version", "1.5.0")); - IngestionVersionMatrixService service = factory.getInstance(new NoOpMatrixSource()); + IngestionCliVersionMatrixService service = + factory.getInstance(new NoOpIngestionCliVersionMatrixSource()); assertNotNull(service); assertEquals( From 5d9bd0a5a32aa05be56e6af6d309cbcfba572088 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 14:14:47 +0530 Subject: [PATCH 05/20] refactor(ingestion): wrap inner matrix map in a named ServerEntry POJO MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses PR review feedback that `Map>` read as opaque at lookup sites — the meaning of either string key was only inferable from variable names and Javadoc. 
Wrapping the inner map in a named POJO makes call sites self-documenting:

  Before:
    Map<String, ConnectorEntry> serverEntry = matrix.getEntriesForServer(v);
    ConnectorEntry entry = serverEntry.get(connectorType);

  After:
    ServerEntry serverEntry = matrix.getEntriesForServer(v);
    ConnectorEntry entry = serverEntry.getConnectorEntry(connectorType);

`ServerEntry` also gives a single place to add server-level behavior (e.g. `getConnectorTypes()` for diagnostics) without changing every signature that used to traffic in `Map<String, ConnectorEntry>`.

Scope:
- New file: ServerEntry.java
- IngestionCliVersionMatrix: inner Map value type → ServerEntry
- IngestionCliVersionMatrixService: lookup uses getConnectorEntry(...)
- HttpUrlIngestionCliVersionMatrixSource.parseMatrix: builds ServerEntry instances (its constructor takes over the unmodifiableMap wrapping)
- HttpUrlIngestionCliVersionMatrixSourceValidationTest: call-site updates; one containsKey assertion replaced with assertNull(getConnectorEntry)

Declarative Jackson deserialization (the reviewer's secondary suggestion) was considered and deferred: the current manual tree-API parser keeps fail-closed file-level validation alongside fail-open entry-level skipping with structured path-aware WARN logs, behavior that's awkward to preserve with @JsonDeserialize.

All 90 configuration unit tests still pass; end-to-end re-verified against a running GMS (3 demo scripts: 4-tier ladder, full happy path, phase-2 edge cases including scheduled trigger and executor observation).
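
For readers skimming the refactor, the wrapper idea can be sketched in isolation roughly as follows. This is a minimal illustration under stated assumptions, not the code in this patch: the `ServerEntrySketch` class, the one-field `ConnectorEntry` stand-in, and the `lookupDefault` and `sampleMatrix` helpers are hypothetical names invented for the example. Only `ServerEntry.getConnectorEntry(...)` mirrors a name from the change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ServerEntrySketch {

  /** Hypothetical stand-in for the real ConnectorEntry; only a default version is modeled. */
  static final class ConnectorEntry {
    private final String defaultVersion;

    ConnectorEntry(String defaultVersion) {
      this.defaultVersion = defaultVersion;
    }

    String getDefaultVersion() {
      return defaultVersion;
    }
  }

  /** Named wrapper around the connector-type -> ConnectorEntry map for one server release. */
  static final class ServerEntry {
    private final Map<String, ConnectorEntry> connectorEntries;

    ServerEntry(Map<String, ConnectorEntry> connectorEntries) {
      // A null input becomes an empty map; otherwise expose a read-only view.
      this.connectorEntries =
          connectorEntries == null
              ? Collections.<String, ConnectorEntry>emptyMap()
              : Collections.unmodifiableMap(connectorEntries);
    }

    /** Returns null when the connector has no entry under this server version. */
    ConnectorEntry getConnectorEntry(String connectorType) {
      return connectorEntries.get(connectorType);
    }
  }

  /** Hypothetical helper: default version for a connector, or null if either lookup misses. */
  static String lookupDefault(
      Map<String, ServerEntry> matrix, String serverVersion, String connectorType) {
    ServerEntry serverEntry = matrix.get(serverVersion);
    if (serverEntry == null) {
      return null;
    }
    ConnectorEntry entry = serverEntry.getConnectorEntry(connectorType);
    return entry == null ? null : entry.getDefaultVersion();
  }

  /** Hypothetical sample data: one server release with a single snowflake entry. */
  static Map<String, ServerEntry> sampleMatrix() {
    Map<String, ConnectorEntry> connectors = new HashMap<>();
    connectors.put("snowflake", new ConnectorEntry("1.5.0.5"));
    Map<String, ServerEntry> matrix = new HashMap<>();
    matrix.put("1.5.0", new ServerEntry(connectors));
    return matrix;
  }

  public static void main(String[] args) {
    Map<String, ServerEntry> matrix = sampleMatrix();
    System.out.println(lookupDefault(matrix, "1.5.0", "snowflake")); // prints 1.5.0.5
    System.out.println(lookupDefault(matrix, "1.5.0", "bigquery")); // prints null
  }
}
```

The point of the sketch is the call-site shape: the outer lookup yields a named type, so the second lookup reads as "get the connector entry" rather than an anonymous `Map.get`.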
Co-Authored-By: Claude Opus 4.7 --- ...ttpUrlIngestionCliVersionMatrixSource.java | 5 +- .../ingestion/IngestionCliVersionMatrix.java | 17 +++---- .../IngestionCliVersionMatrixService.java | 5 +- .../metadata/ingestion/ServerEntry.java | 47 +++++++++++++++++++ ...nCliVersionMatrixSourceValidationTest.java | 23 ++++----- 5 files changed, 72 insertions(+), 25 deletions(-) create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index 6add93047af2..79216f679ef5 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -6,7 +6,6 @@ import java.net.URL; import java.net.URLConnection; import java.util.ArrayList; -import java.util.Collections; import java.util.HashMap; import java.util.HashSet; import java.util.List; @@ -174,7 +173,7 @@ static IngestionCliVersionMatrix parseMatrix(JsonNode root) { + (root == null ? 
"null" : root.getNodeType().toString().toLowerCase())); } - Map<String, Map<String, ConnectorEntry>> entries = new HashMap<>(); + Map<String, ServerEntry> entries = new HashMap<>(); root.fields() .forEachRemaining( serverEntry -> { @@ -200,7 +199,7 @@ static IngestionCliVersionMatrix parseMatrix(JsonNode root) { connectors.put(connector, parsed); } }); - entries.put(serverVersion, Collections.unmodifiableMap(connectors)); + entries.put(serverVersion, new ServerEntry(connectors)); }); return new IngestionCliVersionMatrix(entries); } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java index 91d25012bc35..1679f403a354 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java @@ -6,8 +6,10 @@ /** * In-memory snapshot of the per-connector ingestion CLI version matrix. * - *

The matrix is keyed by server release version, then by connector type, with each entry - * carrying a {@code _default} version and an optional ordered list of canary cohorts. + *

The matrix is keyed by server release version → {@link ServerEntry}, which in turn maps + * connector type to a {@link ConnectorEntry} carrying a {@code _default} version and an optional + * ordered list of canary cohorts. Wrapping the inner map in {@link ServerEntry} keeps lookup sites + * self-documenting and gives us a single place to add server-level behavior later. * *

This is a pure POJO produced by {@link IngestionCliVersionMatrixSource} implementations and * consumed by {@link IngestionCliVersionMatrixService} — the storage layer (HTTP, GMS aspect, @@ -20,10 +22,9 @@ public final class IngestionCliVersionMatrix { public static final IngestionCliVersionMatrix EMPTY = new IngestionCliVersionMatrix(Collections.emptyMap()); - private final Map<String, Map<String, ConnectorEntry>> entriesByServerVersion; + private final Map<String, ServerEntry> entriesByServerVersion; - public IngestionCliVersionMatrix( - Map<String, Map<String, ConnectorEntry>> entriesByServerVersion) { + public IngestionCliVersionMatrix(Map<String, ServerEntry> entriesByServerVersion) { this.entriesByServerVersion = entriesByServerVersion == null ? Collections.emptyMap() @@ -31,10 +32,10 @@ public IngestionCliVersionMatrix( } /** - * Lookup the per-connector matrix entries for a given server release. Returns {@code null} if the - * server version has no entry — callers fall back to the application default. + * Lookup the per-connector entries for a given server release. Returns {@code null} if the server + * version has no entry — callers fall back to the application default. 
*/ - public Map<String, ConnectorEntry> getEntriesForServer(String serverVersion) { + public ServerEntry getEntriesForServer(String serverVersion) { return entriesByServerVersion.get(serverVersion); } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java index 012859a17c37..f089f9f0ddd5 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java @@ -1,6 +1,5 @@ package com.linkedin.metadata.ingestion; -import java.util.Map; import java.util.Optional; /** @@ -82,11 +81,11 @@ public Optional resolveVersion(String connectorType) { */ public Optional resolveVersionWithSource(String connectorType) { IngestionCliVersionMatrix matrix = source.getMatrix(); - Map<String, ConnectorEntry> serverEntry = matrix.getEntriesForServer(serverVersion); + ServerEntry serverEntry = matrix.getEntriesForServer(serverVersion); if (serverEntry == null) { return Optional.empty(); } - ConnectorEntry connectorEntry = serverEntry.get(connectorType); + ConnectorEntry connectorEntry = serverEntry.getConnectorEntry(connectorType); if (connectorEntry == null) { return Optional.empty(); } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java new file mode 100644 index 000000000000..4f15d1b33a2e --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java @@ -0,0 +1,47 @@ +package com.linkedin.metadata.ingestion; + +import java.util.Collections; +import java.util.Map; +import java.util.Set; + +/** + * The per-connector entries for a single GMS server release in the ingestion
CLI version matrix. + * + *

Maps connector type (e.g. {@code "snowflake"}, {@code "bigquery"}) to that connector's {@link + * ConnectorEntry}. Wrapping the inner map in a named POJO instead of exposing {@code Map<String, + * ConnectorEntry>} keeps lookup sites self-documenting: {@code serverEntry.getConnectorEntry(type)} + * vs the previous {@code serverEntry.get(type)} where the meaning of either string was opaque. + */ +public final class ServerEntry { + + /** Empty entry returned to callers that ask about an unknown server version. */ + public static final ServerEntry EMPTY = new ServerEntry(Collections.emptyMap()); + + private final Map<String, ConnectorEntry> connectorEntries; + + public ServerEntry(Map<String, ConnectorEntry> connectorEntries) { + this.connectorEntries = + connectorEntries == null + ? Collections.emptyMap() + : Collections.unmodifiableMap(connectorEntries); + } + + /** + * Lookup the matrix entry for a connector type (e.g. {@code "snowflake"}). Returns {@code null} + * when the connector has no entry under this server version — callers fall through to the next + * tier of the resolution ladder. + */ + public ConnectorEntry getConnectorEntry(String connectorType) { + return connectorEntries.get(connectorType); + } + + /** Connector types present under this server version. Useful for diagnostic logging. */ + public Set<String> getConnectorTypes() { + return connectorEntries.keySet(); + } + + /** Number of connector entries for this server version. 
*/ + public int size() { + return connectorEntries.size(); + } +} diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java index 6dec07e05ba0..c957ed1a05e5 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java @@ -1,7 +1,6 @@ package com.linkedin.metadata.ingestion; import static org.testng.Assert.assertEquals; -import static org.testng.Assert.assertFalse; import static org.testng.Assert.assertNotNull; import static org.testng.Assert.assertNull; import static org.testng.Assert.assertThrows; @@ -69,7 +68,7 @@ public void invalidDefaultVersionIgnoredButCohortsKept() throws Exception { + "\"cohorts\": [{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}]" + "}}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); - ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"); assertNotNull(snowflake); assertNull( snowflake.getDefaultVersion(), "invalid _default should be dropped, not stored verbatim"); @@ -87,7 +86,7 @@ public void cohortMissingVersionIsSkippedOthersKept() throws Exception { + "{\"version\": \"1.5.0.6\", \"deployments\": [\"acme\"]}" + "]}}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); - ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"); assertEquals(snowflake.getCohorts().size(), 1, "first 
cohort (no version) should be dropped"); assertEquals(snowflake.getCohorts().get(0).getVersion(), "1.5.0.6"); } @@ -101,7 +100,7 @@ public void cohortWithGarbageVersionIsSkipped() throws Exception { + "{\"version\": \"\", \"deployments\": [\"acme\"]}" + "]}}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); - ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"); assertTrue( snowflake.getCohorts().isEmpty(), "cohort with invalid version pattern should be dropped"); } @@ -127,7 +126,7 @@ public void permissiveVersionPatternAcceptsRealPyPiVersions() throws Exception { MAPPER.readTree("{\"1.5.0\": {\"snowflake\": {\"cohorts\": [" + cohorts + "]}}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); assertEquals( - m.getEntriesForServer("1.5.0").get("snowflake").getCohorts().size(), + m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake").getCohorts().size(), realVersions.length, "all real PyPI-style versions should pass the permissive pattern"); } @@ -142,13 +141,15 @@ public void connectorValueNotObjectIsSkippedOthersKept() throws Exception { + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + "}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); - assertFalse( - m.getEntriesForServer("1.5.0").containsKey("snowflake"), + assertNull( + m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"), "malformed connector entry should be dropped"); assertNotNull( - m.getEntriesForServer("1.5.0").get("bigquery"), + m.getEntriesForServer("1.5.0").getConnectorEntry("bigquery"), "well-formed sibling connector should survive"); - assertEquals(m.getEntriesForServer("1.5.0").get("bigquery").getDefaultVersion(), "1.4.0.3"); + assertEquals( + m.getEntriesForServer("1.5.0").getConnectorEntry("bigquery").getDefaultVersion(), + "1.4.0.3"); } @Test @@ 
-166,7 +167,7 @@ public void wellFormedMatrixParsesUnchanged() throws Exception { + "\"bigquery\": {\"_default\": \"1.4.0.3\"}" + "}}"); IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); - ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").get("snowflake"); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"); assertEquals(snowflake.getDefaultVersion(), "1.5.0.5"); assertEquals(snowflake.getCohorts().size(), 1); Cohort cohort = snowflake.getCohorts().get(0); @@ -175,7 +176,7 @@ public void wellFormedMatrixParsesUnchanged() throws Exception { assertTrue(cohort.getDeployments().contains("acme")); assertTrue(cohort.getDeployments().contains("beta")); - ConnectorEntry bigquery = m.getEntriesForServer("1.5.0").get("bigquery"); + ConnectorEntry bigquery = m.getEntriesForServer("1.5.0").getConnectorEntry("bigquery"); assertEquals(bigquery.getDefaultVersion(), "1.4.0.3"); assertTrue(bigquery.getCohorts().isEmpty()); } From a70e4d8bafa2ecc5c54e0d918f7eadaa3f440b0d Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 14:32:44 +0530 Subject: [PATCH 06/20] refactor(ingestion): nest matrix config under cliVersionMatrix with a source discriminator MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses PR review feedback that the HTTP-specific matrix config was hardcoded directly under `ingestion:` and the backend choice was implicit ("HTTP if URL set, NoOp otherwise"). Adding a second backend would have required piling more flat properties on `IngestionConfiguration` plus arguing about precedence between them. 
New structure in application.yaml:

  ingestion:
    cliVersionMatrix:
      source: "${INGESTION_VERSION_MATRIX_SOURCE:}"   # "http" | "none" | unset
      http:
        url: "${INGESTION_VERSION_MATRIX_URL:}"
        refreshSeconds: ${INGESTION_VERSION_MATRIX_REFRESH_SECONDS:600}
        authToken: "${INGESTION_VERSION_MATRIX_AUTH_TOKEN:}"

Adding a future backend (GMS aspect, AppConfig, …) becomes purely additive: a new nested block under `cliVersionMatrix` and a new case in the factory's source switch, with no changes to HTTP's config or precedence logic.

Backward-compat: when `source` is unset, the factory infers from `http.url` presence. Pre-discriminator deployments that only set the INGESTION_VERSION_MATRIX_URL env var keep working unchanged.

Explicit `source: none` is a kill-switch, useful for ops to disable the feature without unsetting URL env vars. The discriminator is case-insensitive.

Changes:
- New POJO: HttpMatrixSourceConfiguration (url, refreshSeconds, authToken)
- New POJO: CliVersionMatrixConfiguration (source + http)
- IngestionConfiguration: 3 flat versionMatrix* fields -> cliVersionMatrix
- application.yaml: restructured (env var names preserved)
- IngestionCliVersionMatrixServiceFactory: reads nested config + applies discriminator with URL-presence inference fallback
- IngestionCliVersionMatrixServiceFactoryTest: rewrote to cover all paths (unset/http/none, URL-presence inference, case-insensitivity); 7 tests up from 4
- PropertiesCollectorConfigurationTest: property paths updated for the redaction allowlist + visible-properties list. authToken still auto-redacted because the leaf name ends with "Token".

End-to-end verified against a running GMS using existing env vars (no operator change required): matrix fetched + cached, all 4 resolution tiers + test-connection + scheduled trigger green; system-info reports the new nested keys with authToken redacted.
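
The discriminator rules described in this message (explicit `source` wins, an unset value is inferred from URL presence, matching is case-insensitive) can be sketched as a standalone function. This is an illustrative approximation rather than the factory code itself: the `MatrixSourceDiscriminatorSketch` class name is hypothetical, the helper here takes the raw URL string instead of a precomputed boolean, and it lowercases with `Locale.ROOT`, which is an assumption of this sketch.

```java
import java.util.Locale;

public class MatrixSourceDiscriminatorSketch {

  static final String SOURCE_HTTP = "http";
  static final String SOURCE_NONE = "none";

  /**
   * Resolve the active backend name. A non-blank configured source is returned trimmed and
   * lowercased; otherwise the result is inferred from whether an HTTP URL is present.
   */
  static String resolveSource(String configuredSource, String httpUrl) {
    if (configuredSource != null && !configuredSource.trim().isEmpty()) {
      // Assumption of this sketch: Locale.ROOT keeps lowercasing locale-independent.
      return configuredSource.trim().toLowerCase(Locale.ROOT);
    }
    boolean httpUrlPresent = httpUrl != null && !httpUrl.isEmpty();
    return httpUrlPresent ? SOURCE_HTTP : SOURCE_NONE;
  }

  public static void main(String[] args) {
    String url = "https://example.invalid/matrix.json"; // placeholder URL for illustration
    System.out.println(resolveSource(null, url)); // prints http (inferred from URL)
    System.out.println(resolveSource("", "")); // prints none (nothing configured)
    System.out.println(resolveSource("NONE", url)); // prints none (explicit kill-switch)
  }
}
```

Note that under these rules an explicit `source: none` overrides a configured URL, which is what makes it usable as a kill-switch.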
Co-Authored-By: Claude Opus 4.7 --- .../PropertiesCollectorConfigurationTest.java | 7 +- .../config/CliVersionMatrixConfiguration.java | 35 ++++++++ .../config/HttpMatrixSourceConfiguration.java | 43 +++++++++ .../config/IngestionConfiguration.java | 31 +------ .../src/main/resources/application.yaml | 29 ++++--- ...gestionCliVersionMatrixServiceFactory.java | 54 ++++++++---- ...ionCliVersionMatrixServiceFactoryTest.java | 87 +++++++++++++++---- 7 files changed, 212 insertions(+), 74 deletions(-) create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java diff --git a/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java b/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java index 67d5fcc8f84d..30933fde5ed8 100644 --- a/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java +++ b/metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java @@ -89,7 +89,7 @@ public PropertiesCollector propertiesCollector(Environment environment) { // (e.g. "token ghp_xxx" for a private GitHub repo, "Bearer ey..." for OIDC). Property // name intentionally ends with "Token" so PropertiesCollector's keyword-based redaction // catches it without needing a new keyword in SENSITIVE_PATTERNS. - "ingestion.versionMatrixAuthToken"); + "ingestion.cliVersionMatrix.http.authToken"); /** * Template patterns for sensitive properties that contain dynamic parts. 
Use [*] for numeric @@ -760,8 +760,9 @@ public PropertiesCollector propertiesCollector(Environment environment) { "ingestion.deploymentId", "ingestion.enabled", "ingestion.maxSerializedStringLength", - "ingestion.versionMatrixRefreshSeconds", - "ingestion.versionMatrixUrl", + "ingestion.cliVersionMatrix.http.refreshSeconds", + "ingestion.cliVersionMatrix.http.url", + "ingestion.cliVersionMatrix.source", "ingestionMetrics.enabled", "ingestionScheduler.consumerGroupSuffix", "ingestionScheduler.enabled", diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java new file mode 100644 index 000000000000..958efcc714d6 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java @@ -0,0 +1,35 @@ +package com.linkedin.metadata.config; + +import lombok.Data; + +/** + * Per-connector ingestion CLI version matrix configuration. Bound under {@code + * ingestion.cliVersionMatrix} in application.yaml. + * + *

The nested structure replaces an earlier set of flat {@code versionMatrix*} keys on the parent + * {@link IngestionConfiguration}. Keeping all matrix configuration under one block makes adding + * future backends (GMS aspect, AppConfig, etcd, …) a localized change — each new backend gets its + * own nested block keyed off {@link #source}. + */ +@Data +public class CliVersionMatrixConfiguration { + + /** + * Source backend discriminator. Recognised values: + * + *

    + *
  • {@code "http"} — fetch via HTTP using {@link #http} + *
  • {@code "none"} — explicit disable (no matrix consulted) + *
  • unset / empty — backward-compatible auto-inference: if {@code http.url} is set, treat as + * {@code "http"}; otherwise treat as {@code "none"}. Lets existing deployments that + * pre-date this discriminator continue working without an env change. + *
+ */ + private String source; + + /** + * Configuration for the HTTP-fetched matrix backend. Always present so the factory can read + * {@code http.url} for the backward-compat auto-inference path even when {@code source} is unset. + */ + private HttpMatrixSourceConfiguration http; +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java new file mode 100644 index 000000000000..367c120df394 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java @@ -0,0 +1,43 @@ +package com.linkedin.metadata.config; + +import lombok.Data; + +/** + * HTTP-fetched ingestion CLI version matrix configuration. Bound under {@code + * ingestion.cliVersionMatrix.http} in application.yaml. + * + *

Lives in its own POJO so adding alternative matrix backends in the future (GMS aspect, + * AppConfig, etcd, …) does not require piling more flat properties under {@code ingestion:} — each + * backend gets its own nested block keyed off the {@code source} discriminator on the parent. + */ +@Data +public class HttpMatrixSourceConfiguration { + + /** + * URL to a JSON document containing the per-connector version matrix keyed by server release. + * When empty, no HTTP fetch happens (the factory binds a no-op matrix source). + */ + private String url; + + /** + * How often (in seconds) to re-fetch the matrix. Defaults to 600 (10 minutes) via {@code + * application.yaml}. + */ + private int refreshSeconds; + + /** + * Optional value sent verbatim as the {@code Authorization} HTTP header when fetching the matrix. + * Required when the URL is hosted behind authentication (e.g. a private GitHub gist). + * + *

Format examples: + * + *

    + *
  • GitHub PAT: {@code "token ghp_xxxxxxxxxxxxxxxx"} + *
  • OAuth / OIDC bearer: {@code "Bearer eyJ..."} + *
+ * + *

The property name ends with "Token" so it is auto-redacted by the system-info properties + * collector (see {@code PropertiesCollectorConfigurationTest}). + */ + private String authToken; +} diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java index dae517b91eba..19ad4134a12c 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/IngestionConfiguration.java @@ -14,34 +14,11 @@ public class IngestionConfiguration { private Integer batchRefreshCount; /** - * Optional URL to a publicly accessible JSON file containing a per-connector version matrix keyed - * by server release version. When set, the server fetches and caches this matrix and uses it to - * resolve the CLI version per connector type. When empty, the existing defaultCliVersion is used - * for all connectors. + * Per-connector CLI version matrix configuration. Nested so additional matrix backends (GMS + * aspect, AppConfig, …) can be added under their own keys without piling more flat properties on + * this class. See {@link CliVersionMatrixConfiguration}. */ - private String versionMatrixUrl; - - /** - * How often (in seconds) to re-fetch the version matrix from versionMatrixUrl. Defaults to 600 - * (10 minutes). - */ - private int versionMatrixRefreshSeconds; - - /** - * Optional value sent verbatim as the {@code Authorization} HTTP header when fetching the version - * matrix. Required when the matrix URL is hosted behind authentication (e.g. a private GitHub - * repo's {@code raw.githubusercontent.com} URL). - * - *

Format is whatever the host expects: - * - *

    - *
  • GitHub PAT: {@code "token ghp_xxxxxxxxxxxxxxxx"} - *
  • OAuth / OIDC bearer: {@code "Bearer eyJ..."} - *
- * - *

When empty or unset, no {@code Authorization} header is sent (public-URL semantics). - */ - private String versionMatrixAuthToken; + private CliVersionMatrixConfiguration cliVersionMatrix; /** * Identifier for this deployment, matched against {@code deployments} entries in the version diff --git a/metadata-service/configuration/src/main/resources/application.yaml b/metadata-service/configuration/src/main/resources/application.yaml index f46ee1556f1c..2a99c1487763 100644 --- a/metadata-service/configuration/src/main/resources/application.yaml +++ b/metadata-service/configuration/src/main/resources/application.yaml @@ -104,17 +104,24 @@ ingestion: defaultCliVersion: "${UI_INGESTION_DEFAULT_CLI_VERSION:@cliVersion@}" maxSerializedStringLength: "${INGESTION_MAX_SERIALIZED_STRING_LENGTH:16000000}" # Indicates the maximum allowed JSON String length Jackson will handle, impacts the maximum size of ingested aspects batchRefreshCount: ${INGESTION_BATCH_REFRESH_COUNT:100} # The number of entities to refresh in a single batch when refreshing entities after ingestion - # Optional: URL to a publicly accessible JSON file containing a per-connector CLI version matrix - # keyed by server release. When set, overrides defaultCliVersion on a per-connector basis. - # Update this file externally (e.g. S3/CDN) without redeploying to change connector versions. - versionMatrixUrl: "${INGESTION_VERSION_MATRIX_URL:}" - versionMatrixRefreshSeconds: ${INGESTION_VERSION_MATRIX_REFRESH_SECONDS:600} - # Optional. Sent verbatim as the `Authorization` header when fetching versionMatrixUrl. Required - # when the matrix is hosted behind authentication (e.g. a private GitHub repo). Format examples: - # token ghp_xxxxxxxxxxxxxxxx (GitHub PAT) - # Bearer eyJ... (OAuth / OIDC bearer) - # Leave unset for public URLs. Property name ends with "Token" so it is auto-redacted in system-info. - versionMatrixAuthToken: "${INGESTION_VERSION_MATRIX_AUTH_TOKEN:}" + # Per-connector CLI version matrix. 
When configured, overrides defaultCliVersion on a per-connector + # basis. Update the matrix file externally (S3/CDN/gist) without redeploying GMS to change + # connector versions. Set source=none (or leave url empty) to disable. + cliVersionMatrix: + # Source backend: "http" or "none". When unset, inferred from http.url presence — leaves + # existing deployments that only set INGESTION_VERSION_MATRIX_URL working unchanged. + source: "${INGESTION_VERSION_MATRIX_SOURCE:}" + http: + url: "${INGESTION_VERSION_MATRIX_URL:}" + refreshSeconds: ${INGESTION_VERSION_MATRIX_REFRESH_SECONDS:600} + # Optional. Sent verbatim as the `Authorization` header when fetching the matrix URL. + # Required when the URL is hosted behind authentication (e.g. a private GitHub gist). + # Format examples: + # token ghp_xxxxxxxxxxxxxxxx (GitHub PAT) + # Bearer eyJ... (OAuth / OIDC bearer) + # Leave unset for public URLs. Property name ends with "Token" so it is auto-redacted in + # system-info. + authToken: "${INGESTION_VERSION_MATRIX_AUTH_TOKEN:}" # Identifier for this deployment, used for matching against `deployments` allowlists in the # version matrix. Sourced from DATAHUB_EXECUTOR_CUSTOMER_ID, which the Acryl Cloud Helm chart # injects from the K8s namespace. 
Empty in single-tenant / OSS deployments — cohort matching diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java index ad9fc982fdfd..2acc0b2b38ae 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java @@ -1,6 +1,8 @@ package com.linkedin.gms.factory.ingestion; import com.linkedin.gms.factory.config.ConfigurationProvider; +import com.linkedin.metadata.config.CliVersionMatrixConfiguration; +import com.linkedin.metadata.config.HttpMatrixSourceConfiguration; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; @@ -21,11 +23,13 @@ * *

    *
  • {@code ingestionCliVersionMatrixSource} — implements {@link - * IngestionCliVersionMatrixSource}; chosen based on configuration. Today the only "live" - * implementation is {@link HttpUrlIngestionCliVersionMatrixSource}, picked when {@code - * ingestion.versionMatrixUrl} is set. Otherwise a {@link NoOpIngestionCliVersionMatrixSource} - * is bound. Future implementations (GMS-aspect-backed, config-server-backed, …) can plug in - * here without any change to the consumer service. + * IngestionCliVersionMatrixSource}. The implementation is selected from {@code + * ingestion.cliVersionMatrix.source}: {@code "http"} → {@link + * HttpUrlIngestionCliVersionMatrixSource}, {@code "none"} → {@link + * NoOpIngestionCliVersionMatrixSource}. When the discriminator is unset, the choice is + * inferred from {@code ingestion.cliVersionMatrix.http.url} so existing deployments keep + * working without an env change. Future implementations (GMS-aspect-backed, + * config-server-backed, …) get a new discriminator value here. *
  • {@code ingestionCliVersionMatrixService} — consumes whichever {@link * IngestionCliVersionMatrixSource} is bound and applies the resolution policy (cohort → * connector default → application default). @@ -40,6 +44,9 @@ @Configuration public class IngestionCliVersionMatrixServiceFactory { + private static final String SOURCE_HTTP = "http"; + private static final String SOURCE_NONE = "none"; + @Autowired @Qualifier("configurationProvider") private ConfigurationProvider configProvider; @@ -49,23 +56,40 @@ public class IngestionCliVersionMatrixServiceFactory { private GitVersion gitVersion; /** - * Picks the storage backend for the matrix. Today this is either HTTP-fetch or no-op; the - * decision is driven by whether a URL is configured. New backends should be added here behind an - * explicit config flag rather than by replacing the existing decision. + * Picks the storage backend for the matrix. Reads {@code ingestion.cliVersionMatrix.source} as + * the primary signal; falls back to inferring from {@code http.url} presence when the + * discriminator is unset (backward-compat for deployments that pre-date this discriminator). 
*/ @Bean(name = "ingestionCliVersionMatrixSource") @Scope("singleton") @Nonnull protected IngestionCliVersionMatrixSource ingestionCliVersionMatrixSource() { - IngestionConfiguration ingestionConfig = configProvider.getIngestion(); - String url = ingestionConfig.getVersionMatrixUrl(); - if (url == null || url.isEmpty()) { + CliVersionMatrixConfiguration matrixConfig = + configProvider.getIngestion().getCliVersionMatrix(); + if (matrixConfig == null) { return new NoOpIngestionCliVersionMatrixSource(); } - return new HttpUrlIngestionCliVersionMatrixSource( - url, - ingestionConfig.getVersionMatrixRefreshSeconds(), - ingestionConfig.getVersionMatrixAuthToken()); + + HttpMatrixSourceConfiguration httpConfig = matrixConfig.getHttp(); + boolean httpUrlPresent = + httpConfig != null && httpConfig.getUrl() != null && !httpConfig.getUrl().isEmpty(); + + if (resolveSource(matrixConfig.getSource(), httpUrlPresent).equals(SOURCE_HTTP)) { + return new HttpUrlIngestionCliVersionMatrixSource( + httpConfig.getUrl(), httpConfig.getRefreshSeconds(), httpConfig.getAuthToken()); + } + return new NoOpIngestionCliVersionMatrixSource(); + } + + /** + * Resolve the active source backend. Explicit {@code source} wins; an unset value is inferred + * from URL presence so deployments that pre-date this discriminator continue to work. + */ + private static String resolveSource(String configuredSource, boolean httpUrlPresent) { + if (configuredSource != null && !configuredSource.trim().isEmpty()) { + return configuredSource.trim().toLowerCase(); + } + return httpUrlPresent ? 
SOURCE_HTTP : SOURCE_NONE; } @Bean(name = "ingestionCliVersionMatrixService") diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java index 6f600f94b585..3bcfd6927f6c 100644 --- a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java @@ -4,6 +4,8 @@ import static org.testng.Assert.*; import com.linkedin.gms.factory.config.ConfigurationProvider; +import com.linkedin.metadata.config.CliVersionMatrixConfiguration; +import com.linkedin.metadata.config.HttpMatrixSourceConfiguration; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; @@ -16,11 +18,10 @@ import org.testng.annotations.Test; /** - * Direct unit tests for {@link IngestionCliVersionMatrixServiceFactory}. Exercises the branch that - * picks {@link NoOpIngestionCliVersionMatrixSource} vs {@link - * HttpUrlIngestionCliVersionMatrixSource} based on whether {@code versionMatrixUrl} is configured — - * the rest of the codebase only ever exercises the no-op path (test contexts don't set the env - * var). + * Direct unit tests for {@link IngestionCliVersionMatrixServiceFactory}. Covers the {@code + * ingestion.cliVersionMatrix.source} discriminator (explicit {@code "http"} / {@code "none"}) and + * the backward-compat auto-inference path (unset discriminator infers from {@code http.url} + * presence so deployments that pre-date this discriminator keep working). 
*/ public class IngestionCliVersionMatrixServiceFactoryTest { @@ -34,6 +35,8 @@ public void setUp() { factory = new IngestionCliVersionMatrixServiceFactory(); configProvider = mock(ConfigurationProvider.class); ingestionConfig = new IngestionConfiguration(); + ingestionConfig.setCliVersionMatrix(new CliVersionMatrixConfiguration()); + ingestionConfig.getCliVersionMatrix().setHttp(new HttpMatrixSourceConfiguration()); gitVersion = mock(GitVersion.class); when(configProvider.getIngestion()).thenReturn(ingestionConfig); @@ -46,45 +49,93 @@ public void setUp() { setField(factory, "gitVersion", gitVersion); } + // --------------------------------------------------------------------------- + // Backward-compat auto-inference (source unset) + // --------------------------------------------------------------------------- + @Test - public void testMatrixSource_whenUrlIsNull_wiresNoOp() { - ingestionConfig.setVersionMatrixUrl(null); + public void testMatrixSource_whenSourceUnsetAndUrlNull_wiresNoOp() { + // Default state from setUp: source unset, http.url null. Should infer "none". + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); + + assertTrue( + source instanceof NoOpIngestionCliVersionMatrixSource, + "Unset source + unset url should infer 'none' (OSS-safe default)"); + } + + @Test + public void testMatrixSource_whenSourceUnsetAndUrlEmpty_wiresNoOp() { + ingestionConfig.getCliVersionMatrix().getHttp().setUrl(""); IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( source instanceof NoOpIngestionCliVersionMatrixSource, - "Unset versionMatrixUrl should wire NoOpIngestionCliVersionMatrixSource (OSS-safe default)"); + "Unset source + empty url should infer 'none'"); + } + + @Test + public void testMatrixSource_whenSourceUnsetButUrlPresent_infersHttp() { + // Backward-compat: pre-discriminator deployments only set the URL. Should still wire HTTP. 
+ ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); + ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); + + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); + + assertTrue( + source instanceof HttpUrlIngestionCliVersionMatrixSource, + "URL-only configuration must keep wiring HTTP for pre-discriminator deployments"); } + // --------------------------------------------------------------------------- + // Explicit discriminator + // --------------------------------------------------------------------------- + @Test - public void testMatrixSource_whenUrlIsEmpty_wiresNoOp() { - ingestionConfig.setVersionMatrixUrl(""); + public void testMatrixSource_explicitHttp_wiresHttpUrlSource() { + ingestionConfig.getCliVersionMatrix().setSource("http"); + ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); + ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); + + IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); + + assertTrue( + source instanceof HttpUrlIngestionCliVersionMatrixSource, + "Explicit source='http' with url present should wire HttpUrlIngestionCliVersionMatrixSource"); + } + + @Test + public void testMatrixSource_explicitNone_overridesUrlPresence() { + // Explicit "none" must win even when a URL is configured. 
+ ingestionConfig.getCliVersionMatrix().setSource("none"); + ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( source instanceof NoOpIngestionCliVersionMatrixSource, - "Empty-string versionMatrixUrl should be treated like unset → NoOpIngestionCliVersionMatrixSource"); + "Explicit source='none' must override URL presence (kill-switch semantics)"); } @Test - public void testMatrixSource_whenUrlIsSet_wiresHttpUrlSource() { - // file:// URI is fine — the factory only inspects the string, not whether it's reachable. - ingestionConfig.setVersionMatrixUrl("file:///tmp/nonexistent-matrix.json"); - ingestionConfig.setVersionMatrixRefreshSeconds(3600); - ingestionConfig.setVersionMatrixAuthToken(null); + public void testMatrixSource_sourceIsCaseInsensitive() { + ingestionConfig.getCliVersionMatrix().setSource("HTTP"); + ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); + ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( source instanceof HttpUrlIngestionCliVersionMatrixSource, - "Configured versionMatrixUrl should wire HttpUrlIngestionCliVersionMatrixSource"); + "Source discriminator should be case-insensitive (operators may set HTTP or http)"); } + // --------------------------------------------------------------------------- + // Service construction + // --------------------------------------------------------------------------- + @Test public void testGetInstance_buildsServiceWithServerVersionFromGitVersion() { - ingestionConfig.setVersionMatrixUrl(null); ingestionConfig.setDeploymentId("test-deployment"); when(gitVersion.toConfig()).thenReturn(Map.of("version", "1.5.0")); From 5aa149a2d795aadd079e8d4dbd27a13e16598d0d Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 
14:47:51 +0530 Subject: [PATCH 07/20] refactor(ingestion): shut down matrix refresh scheduler on Spring teardown MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses PR review feedback that HttpUrlIngestionCliVersionMatrixSource created a ScheduledExecutorService in its constructor without storing the handle, leaving no way to stop the background refresh thread on context teardown. The default thread factory creates non-daemon threads, so a leaked executor keeps the JVM from exiting cleanly. In steady-state production this is invisible (one executor for the JVM lifetime), but it bites in: - Spring DevTools dev-loop: every restart leaks one refresh thread - Integration tests with multiple Spring contexts: thread accumulation, confusing log noise from dead-context refreshes - K8s pod shutdown: non-daemon thread blocks clean JVM exit, forcing SIGKILL after terminationGracePeriodSeconds rather than clean drain Fix: - Promote `executor` to a field so the bean can manage its lifecycle - @PreDestroy public void shutdown() — graceful drain pattern matching the project convention (see KafkaTraceReaderFactory.shutdown()): executor.shutdown() -> awaitTermination(5s) -> executor.shutdownNow() on timeout or interruption -> propagate Thread.currentThread().interrupt() on InterruptedException - Uses jakarta.annotation.PreDestroy (matches existing usage in SystemInfoService, KafkaTraceReaderFactory, etc.) Test added: testShutdown_stopsBackgroundRefresh — spins up an in-process HTTP server, configures a 1-second refresh, waits for the first fetch, calls shutdown(), sleeps 2.5s, asserts the request counter did not move. Proves the scheduled executor stopped firing. 
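For reviewers who want the drain pattern in isolation, here is a minimal sketch. `RefreshLoopSketch` and its counter are hypothetical stand-ins, not the class touched by this patch, and the `@PreDestroy` annotation is left off so the snippet compiles without Jakarta on the classpath.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a bean that owns a background refresh loop. */
final class RefreshLoopSketch {
  private static final int SHUTDOWN_TIMEOUT_SECONDS = 5;

  private final AtomicInteger refreshCount = new AtomicInteger();

  // Held as a field so shutdown() can reach it later.
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();

  RefreshLoopSketch(long intervalMillis) {
    // The "refresh" here only bumps a counter; a real source would fetch and parse.
    executor.scheduleAtFixedRate(
        refreshCount::incrementAndGet, 0, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** In a Spring bean this method would carry @PreDestroy. */
  void shutdown() {
    executor.shutdown(); // stop scheduling; periodic tasks are cancelled by default
    try {
      if (!executor.awaitTermination(SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS)) {
        executor.shutdownNow(); // a run in progress overran the timeout: interrupt it
      }
    } catch (InterruptedException e) {
      executor.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  int count() {
    return refreshCount.get();
  }

  boolean isTerminated() {
    return executor.isTerminated();
  }
}
```

Once `shutdown()` returns normally the executor is terminated, so the counter cannot advance again. That is the same property the new test checks through the HTTP request counter.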
Co-Authored-By: Claude Opus 4.7 --- ...ttpUrlIngestionCliVersionMatrixSource.java | 34 ++++++++++++++- .../IngestionCliVersionMatrixServiceTest.java | 43 +++++++++++++++++++ 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index 79216f679ef5..9b30d0c7736a 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -2,6 +2,7 @@ import com.fasterxml.jackson.databind.JsonNode; import com.fasterxml.jackson.databind.ObjectMapper; +import jakarta.annotation.PreDestroy; import java.io.InputStream; import java.net.URL; import java.net.URLConnection; @@ -71,12 +72,21 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi */ private static final Pattern VALID_VERSION_PATTERN = Pattern.compile("^[\\w.+!-]+$"); + /** Seconds to wait for the refresh thread to drain on graceful shutdown. */ + private static final int SHUTDOWN_TIMEOUT_SECONDS = 5; + private final String url; @Nullable private final String authHeader; private final AtomicReference cached; private final AtomicLong lastFetchedAtMillis; private final ObjectMapper objectMapper; + /** + * The background refresh scheduler. Held as a field so {@link #shutdown()} can stop it on Spring + * context teardown — without this, the thread would leak across context restarts in dev / tests. + */ + private final ScheduledExecutorService executor; + /** Convenience constructor for unauthenticated (public) URLs. 
*/ public HttpUrlIngestionCliVersionMatrixSource(String url, int refreshIntervalSeconds) { this(url, refreshIntervalSeconds, null); @@ -90,9 +100,29 @@ public HttpUrlIngestionCliVersionMatrixSource( this.lastFetchedAtMillis = new AtomicLong(0L); this.objectMapper = new ObjectMapper(); - ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); + this.executor = Executors.newSingleThreadScheduledExecutor(); // Fetch immediately on startup (delay=0), then repeat on the configured interval. - executor.scheduleAtFixedRate(this::refresh, 0, refreshIntervalSeconds, TimeUnit.SECONDS); + this.executor.scheduleAtFixedRate(this::refresh, 0, refreshIntervalSeconds, TimeUnit.SECONDS); + } + + /** + * Gracefully stop the background refresh on Spring context teardown. Without this hook the + * scheduled-executor thread keeps the JVM alive and leaks across context restarts (matters in dev + * and in test contexts that re-create the bean). + */ + @PreDestroy + public void shutdown() { + executor.shutdown(); + try { + if (!executor.awaitTermination(SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS)) { + // Refresh in progress took longer than the timeout — interrupt it. If a fetch was mid-IO + // the connection will close, and parseMatrix is purely in-memory so it can't dangle. 
+ executor.shutdownNow(); + } + } catch (InterruptedException e) { + executor.shutdownNow(); + Thread.currentThread().interrupt(); + } } @Override diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java index 0966cfe49863..de8b8c7cfdc0 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java @@ -497,4 +497,47 @@ public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws E } assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); } + + // ------------------------------------------------------------------------- + // Shutdown lifecycle — Spring @PreDestroy must terminate the refresh thread. + // Without this, a redeployment / context restart would leak the + // single-thread scheduled executor and keep the JVM from exiting cleanly. + // ------------------------------------------------------------------------- + + @Test + public void testShutdown_stopsBackgroundRefresh() throws Exception { + AtomicInteger callCount = new AtomicInteger(); + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + callCount.incrementAndGet(); + byte[] body = MATRIX_JSON.getBytes(); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + try { + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + // 1-second refresh so we can see the cadence without slowing the test suite. 
+ HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(url, 1); + waitForFirstFetch(source); + + source.shutdown(); + int callsAtShutdown = callCount.get(); + + // Wait well past two refresh intervals; callCount must not move because shutdown stopped + // the scheduled executor. + Thread.sleep(2500); + assertEquals( + callCount.get(), + callsAtShutdown, + "Refresh thread must stop firing after shutdown() — no new HTTP calls expected"); + } finally { + server.stop(0); + } + } } From def1f4d37b33d96a3236f3698be65e2d8321e8d5 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 14:59:51 +0530 Subject: [PATCH 08/20] refactor(ingestion): name the matrix refresh thread and make it a daemon MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses PR review feedback that the matrix refresh thread was carrying the default Executors.defaultThreadFactory() name `pool-N-thread-1`, which tells an operator triaging a hung pod nothing about what the thread does. Plumbing matches the project convention in UpdateIndicesUpgradeStrategy: construct the executor with a ThreadFactory that names the thread and flips the daemon bit: Executors.newSingleThreadScheduledExecutor(r -> { Thread t = new Thread(r, "ingestion-cli-version-matrix-refresh"); t.setDaemon(true); return t; }); Daemon flag is belt-and-suspenders with the @PreDestroy shutdown added in the previous commit — if the hook somehow never fires (forced kill, container-runtime quirks), the JVM can still exit cleanly because the refresh thread is no longer counted toward the "block JVM exit" set. @PreDestroy remains the primary shutdown path. Test added: testRefreshThreadIsNamedAndDaemon — after first fetch, scans Thread.getAllStackTraces() for the named thread, asserts it exists and isDaemon() is true. Cleans up via shutdown() so the thread doesn't linger into the next test. 
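The name and daemon flag can also be checked from inside the worker rather than by scanning all live threads. The sketch below is illustrative only: `NamedDaemonThreads` and `describeWorker` are hypothetical helpers, not part of this patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

/** Hypothetical helper showing the thread-factory plumbing in isolation. */
final class NamedDaemonThreads {
  private NamedDaemonThreads() {}

  /** Builds a factory whose threads carry a fixed name and the daemon bit. */
  static ThreadFactory factory(String threadName) {
    return runnable -> {
      Thread t = new Thread(runnable, threadName);
      t.setDaemon(true); // does not block JVM exit if shutdown is never called
      return t;
    };
  }

  /** Runs one probe task on such a thread and reports what the task observed. */
  static String describeWorker(String threadName) {
    ExecutorService executor = Executors.newSingleThreadExecutor(factory(threadName));
    Callable<String> probe =
        () -> Thread.currentThread().getName() + " daemon=" + Thread.currentThread().isDaemon();
    try {
      return executor.submit(probe).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while probing worker thread", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("probe task failed", e);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Probing from inside the task avoids depending on `Thread.getAllStackTraces()` ordering, at the cost of needing access to the executor, which the test in this commit deliberately does not have.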
Co-Authored-By: Claude Opus 4.7 --- ...ttpUrlIngestionCliVersionMatrixSource.java | 17 +++++++- .../IngestionCliVersionMatrixServiceTest.java | 42 +++++++++++++++++++ 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index 9b30d0c7736a..fef69dd8fe83 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -75,6 +75,13 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi /** Seconds to wait for the refresh thread to drain on graceful shutdown. */ private static final int SHUTDOWN_TIMEOUT_SECONDS = 5; + /** + * Thread name for the background refresh worker. Named so it stands out in thread dumps (the + * default {@code Executors.defaultThreadFactory()} would produce {@code pool-N-thread-1}, which + * gives an operator triaging a hung pod no idea what the thread does). + */ + private static final String REFRESH_THREAD_NAME = "ingestion-cli-version-matrix-refresh"; + private final String url; @Nullable private final String authHeader; private final AtomicReference cached; @@ -100,7 +107,15 @@ public HttpUrlIngestionCliVersionMatrixSource( this.lastFetchedAtMillis = new AtomicLong(0L); this.objectMapper = new ObjectMapper(); - this.executor = Executors.newSingleThreadScheduledExecutor(); + this.executor = + Executors.newSingleThreadScheduledExecutor( + r -> { + Thread t = new Thread(r, REFRESH_THREAD_NAME); + // Daemon so the JVM can still exit cleanly if @PreDestroy somehow doesn't fire + // (kill -9, container-runtime quirks). PreDestroy remains the primary shutdown path. 
+ t.setDaemon(true); + return t; + }); // Fetch immediately on startup (delay=0), then repeat on the configured interval. this.executor.scheduleAtFixedRate(this::refresh, 0, refreshIntervalSeconds, TimeUnit.SECONDS); } diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java index de8b8c7cfdc0..d149b196c847 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java @@ -498,6 +498,48 @@ public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws E assertEquals(svc.resolveVersion("snowflake"), Optional.empty()); } + // ------------------------------------------------------------------------- + // Thread-factory plumbing — name + daemon flag. Generic `pool-N-thread-1` + // names are useless in production thread dumps; daemon is belt-and-suspenders + // with the @PreDestroy shutdown hook in case the hook never fires. + // ------------------------------------------------------------------------- + + @Test + public void testRefreshThreadIsNamedAndDaemon() throws Exception { + // file:// URI is fine for this test — we just need the source to spin up its scheduler; + // the fetch itself is not what we're inspecting. + Path tmp = Files.createTempFile("thread-name", ".json"); + Files.write(tmp, MATRIX_JSON.getBytes()); + tmp.toFile().deleteOnExit(); + + HttpUrlIngestionCliVersionMatrixSource source = + new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + try { + waitForFirstFetch(source); + + // Scan live threads for ours. 
Using Thread.getAllStackTraces() means we don't need to reach + // into the executor's internals — we just verify the thread exists with the expected name + // and is daemon, which is what an operator reading a thread dump would care about. + Thread refreshThread = + Thread.getAllStackTraces().keySet().stream() + .filter(t -> "ingestion-cli-version-matrix-refresh".equals(t.getName())) + .findFirst() + .orElse(null); + + assertNotNull( + refreshThread, + "Refresh thread should carry a descriptive name (not the default pool-N-thread-1) so it " + + "is identifiable in thread dumps"); + assertTrue( + refreshThread.isDaemon(), + "Refresh thread should be a daemon thread so the JVM can exit cleanly even if " + + "@PreDestroy is never called (e.g. forced kill, container-runtime edge cases)"); + } finally { + // Clean shutdown so the daemon thread doesn't linger into the next test. + source.shutdown(); + } + } + // ------------------------------------------------------------------------- // Shutdown lifecycle — Spring @PreDestroy must terminate the refresh thread. // Without this, a redeployment / context restart would leak the From 81f7ce745cda326f1875271b3f31e3b568bd3fb4 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 15:46:31 +0530 Subject: [PATCH 09/20] refactor(ingestion): structured CLI version resolution logging at call sites Addresses PR review feedback that CliVersionResolutionHelper should stay logging-free, with structured debug/warn logs added at the resolver layer where the resolution is actually consumed. Three OSS call sites now emit identically-shaped log lines: Resolved ingestion CLI version (manual trigger): tier=... version=... Resolved ingestion CLI version (test-connection): tier=... version=... Resolved ingestion CLI version (scheduled trigger): tier=... version=... Each line carries tier (cliVersionAudit.source), version, serverVersion, connector type, and an identifier (ingestion source URN or execution request URN). 
DEBUG is the normal level; a WARN fires only when the resolved version is empty -- meaning every tier including defaultCliVersion fell through and the executor will silently use its bundled CLI. Implementation: rather than duplicate the helper across three callers, introduced CliVersionResolutionLogger in metadata-service/configuration (same module as CliVersionResolutionHelper). Callers pass their own @Slf4j-generated logger so log entries appear under the calling class, preserving operators' existing class-name-based log filters. Constants expose the trigger labels (TRIGGER_MANUAL, TRIGGER_TEST_CONNECTION, TRIGGER_SCHEDULED) and identifier keys (IDENTIFIER_INGESTION_SOURCE, IDENTIFIER_EXECUTION_REQUEST) so the call sites read declaratively. Changes: - New: CliVersionResolutionLogger - CreateIngestionExecutionRequestResolver: @Slf4j + one log call - CreateTestConnectionRequestResolver: extracted connectorType local + one log call (avoids calling extractSourceType twice) - IngestionScheduler: one log call End-to-end verified against a running GMS: all three trigger labels appear in the GMS log under their respective resolver classes with the correct tier/version/connector/identifier values for every scenario from the demo scripts (4-tier ladder, test-connection, scheduled cron). 
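To make the shared line shape concrete, here is a dependency-free sketch of the message that the three call sites emit. It is an approximation under assumptions: `ResolutionLogLineSketch` is a hypothetical class, it uses `String.format` instead of SLF4J placeholders, and the URN in the usage note is made up for illustration.

```java
/** Hypothetical, SLF4J-free rendering of the resolution log line. */
final class ResolutionLogLineSketch {
  private ResolutionLogLineSketch() {}

  /** Renders the DEBUG-shaped line; null values are printed as empty strings. */
  static String format(
      String trigger,
      String tier,
      String version,
      String serverVersion,
      String connector,
      String identifierKey,
      String identifierValue) {
    return String.format(
        "Resolved ingestion CLI version (%s): tier=%s version=%s serverVersion=%s connector=%s %s=%s",
        trigger,
        tier,
        orEmpty(version),
        orEmpty(serverVersion),
        orEmpty(connector),
        identifierKey,
        identifierValue);
  }

  /** True when every tier fell through, the case that warrants a WARN. */
  static boolean shouldWarn(String version) {
    return version == null || version.isEmpty();
  }

  private static String orEmpty(String value) {
    return value != null ? value : "";
  }
}
```

Calling `format("manual trigger", "MATRIX_COHORT", "1.3.1.5", "1.3.1.4", "snowflake", "ingestionSource", "urn:li:dataHubIngestionSource:demo")` yields one line that a single grep on "Resolved ingestion CLI version" will match regardless of which trigger produced it.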
Co-Authored-By: Claude Opus 4.7 --- ...eateIngestionExecutionRequestResolver.java | 10 +++ .../CreateTestConnectionRequestResolver.java | 11 ++- .../ingestion/IngestionScheduler.java | 8 ++ .../ingestion/CliVersionResolutionLogger.java | 86 +++++++++++++++++++ 4 files changed, 114 insertions(+), 1 deletion(-) create mode 100644 metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 23e8d46d75a8..83936bde262e 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -24,6 +24,7 @@ import com.linkedin.ingestion.DataHubIngestionSourceInfo; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; @@ -35,10 +36,12 @@ import java.util.Map; import java.util.UUID; import java.util.concurrent.CompletableFuture; +import lombok.extern.slf4j.Slf4j; import org.json.JSONException; import org.json.JSONObject; /** Creates an on-demand ingestion execution request. 
*/ +@Slf4j public class CreateIngestionExecutionRequestResolver implements DataFetcher> { @@ -158,6 +161,13 @@ public CompletableFuture get(final DataFetchingEnvironment environment) : null); arguments.put(VERSION_ARG_NAME, resolution.getVersion()); execInput.setCliVersionAudit(resolution.getStamp()); + CliVersionResolutionLogger.log( + log, + CliVersionResolutionLogger.TRIGGER_MANUAL, + resolution, + ingestionSourceInfo.getType(), + CliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, + ingestionSourceUrn.toString()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? "true" : "false"; diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index 48217bae11a7..8111207cb946 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -17,6 +17,7 @@ import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; @@ -124,10 +125,11 @@ public CompletableFuture get(final DataFetchingEnvironment environment) // input.getVersion() may be null, empty, or whitespace-only (UI forms can submit any // of these); the helper normalizes all three to "unset" and falls through to the // matrix 
/ application default. See #17471 for the whitespace-only edge case. + final String connectorType = extractSourceType(input.getRecipe()); final CliVersionResolutionHelper.Result resolution = CliVersionResolutionHelper.resolve( input.getVersion(), - extractSourceType(input.getRecipe()), + connectorType, _versionMatrixService, _ingestionConfiguration.getDefaultCliVersion(), _versionMatrixService != null @@ -138,6 +140,13 @@ public CompletableFuture get(final DataFetchingEnvironment environment) } execInput.setArgs(new StringMap(arguments)); execInput.setCliVersionAudit(resolution.getStamp()); + CliVersionResolutionLogger.log( + log, + CliVersionResolutionLogger.TRIGGER_TEST_CONNECTION, + resolution, + connectorType, + CliVersionResolutionLogger.IDENTIFIER_EXECUTION_REQUEST, + executionRequestUrn.toString()); final MetadataChangeProposal proposal = buildMetadataChangeProposalWithKey( diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index b3526c76be73..6c3ac80b1f48 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -18,6 +18,7 @@ import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.query.ListResult; @@ -432,6 +433,13 @@ public void run() { versionMatrixService != null ? 
versionMatrixService.getServerVersion() : null); arguments.put(VERSION_ARGUMENT_NAME, resolution.getVersion()); input.setCliVersionAudit(resolution.getStamp()); + CliVersionResolutionLogger.log( + log, + CliVersionResolutionLogger.TRIGGER_SCHEDULED, + resolution, + ingestionSourceInfo.getType(), + CliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, + ingestionSourceUrn.toString()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { debugMode = ingestionSourceInfo.getConfig().isDebugMode() ? "true" : "false"; diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java new file mode 100644 index 000000000000..9504056a10d6 --- /dev/null +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java @@ -0,0 +1,86 @@ +package com.linkedin.metadata.ingestion; + +import javax.annotation.Nullable; +import org.slf4j.Logger; + +/** + * Shared structured logging for CLI version resolution at the three call sites ({@code + * CreateIngestionExecutionRequestResolver}, {@code CreateTestConnectionRequestResolver}, {@code + * IngestionScheduler}). Lives separately from {@link CliVersionResolutionHelper} on purpose — the + * helper stays logging-free per review guidance, and emitting the line at the resolver layer keeps + * each log entry tagged with the caller's class. + * + *
<p>Why structured logging matters here: the resolution ladder spans four tiers and three + * different trigger paths. Without a parallel-shaped log line at each call site, "which CLI did + * this run get and why?" requires log archaeology. With it, {@code grep "Resolved ingestion CLI + * version"} across all three resolvers' logs shows every resolution in the same shape. + */ +public final class CliVersionResolutionLogger { + + /** Trigger label for the on-demand GraphQL execution request path. */ + public static final String TRIGGER_MANUAL = "manual trigger"; + + /** Trigger label for the test-connection GraphQL path. */ + public static final String TRIGGER_TEST_CONNECTION = "test-connection"; + + /** Trigger label for the cron-fired scheduled execution path. */ + public static final String TRIGGER_SCHEDULED = "scheduled trigger"; + + /** Identifier key for a persistent ingestion source URN. */ + public static final String IDENTIFIER_INGESTION_SOURCE = "ingestionSource"; + + /** Identifier key for an execution-request URN (used when no ingestion source exists). */ + public static final String IDENTIFIER_EXECUTION_REQUEST = "executionRequest"; + + private CliVersionResolutionLogger() {} + + /** + * Emit a DEBUG line capturing which tier of the resolution ladder produced the version, and a + * WARN line if the resolved version is empty — empty means every tier (including {@code + * defaultCliVersion}) fell through and the executor will silently use its bundled CLI. + * + * @param log SLF4J logger to use. Pass the caller's {@code @Slf4j}-generated {@code log} so log + * lines appear under the caller's class. + * @param trigger short label distinguishing the call site (see {@link #TRIGGER_MANUAL}, etc.) 
+ * @param resolution result from {@link CliVersionResolutionHelper#resolve(String, String, + * IngestionCliVersionMatrixService, String, String)} + * @param connectorType source type from the recipe, may be {@code null} when the recipe lacks a + * parseable {@code source.type} + * @param identifierKey self-describing key for {@code identifierValue} — typically {@link + * #IDENTIFIER_INGESTION_SOURCE} or {@link #IDENTIFIER_EXECUTION_REQUEST} + * @param identifierValue the URN of the ingestion source or execution request being logged + */ + public static void log( + final Logger log, + final String trigger, + final CliVersionResolutionHelper.Result resolution, + @Nullable final String connectorType, + final String identifierKey, + final String identifierValue) { + final String serverVersion = + resolution.getStamp().hasServerVersion() + ? resolution.getStamp().getServerVersion() + : ""; + final String connector = connectorType != null ? connectorType : ""; + + log.debug( + "Resolved ingestion CLI version ({}): tier={} version={} serverVersion={} connector={} {}={}", + trigger, + resolution.getStamp().getSource(), + resolution.getVersion(), + serverVersion, + connector, + identifierKey, + identifierValue); + + if (resolution.getVersion() == null || resolution.getVersion().isEmpty()) { + log.warn( + "Resolved CLI version is empty for {} {}={} (connector={}); the executor will fall back " + + "to its bundled CLI. Set ingestion.defaultCliVersion or configure the matrix.", + trigger, + identifierKey, + identifierValue, + connector); + } + } +} From 4fffad598fba3e3ded7fd57c3d1c24ccf0603a7c Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 28 May 2026 15:57:17 +0530 Subject: [PATCH 10/20] docs(ingestion): inline PR refs + document version regex shape Addresses two review suggestions on PR #17436: (#9) Linear / PR ID references in code go stale and force readers off-site to recover the context. 
Replaced six `#17471` references across the matrix helper, two GraphQL resolvers, the scheduler, and two test files with self-contained explanations of WHY each whitespace/empty normalization exists -- namely that bootstrap YAML rendering produces `version: " "` (3 spaces) when an ingestion source has no version pin, and forwarding that blank to the executor would silently use its bundled CLI instead of defaultCliVersion. (#10) Expanded the version-regex doc comment in HttpUrlIngestionCliVersionMatrixSource to state explicitly that the pattern is NOT a PEP 440 validator and to list five concrete accepted shapes (standard release, rc, post, dev with epoch, local version identifier) covering everything DataHub publishes to PyPI today. Operators who fat-finger the matrix file get a clean WARN at fetch time rather than a cryptic pip error minutes later; pip remains the source of truth for whether a version actually exists. All doc-comment changes, no logic touched. Tests confirm no behavior change: metadata-service:configuration:test, the two resolver test suites, and ingestion-scheduler:test all green. 
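For reviewers who want to poke at the cleanliness check described in (#10), here is a minimal standalone sketch. It reuses the exact pattern string from the diff; the class name `VersionShapeSketch`, the method `looksLikeVersion`, and the sample inputs are hypothetical and exist only for illustration. How the real source class applies the pattern (for example `matches()` versus `find()`) is not shown in this patch, so treat the null check and the `matches()` call as assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: class and method names are hypothetical, not part of the patch. */
public final class VersionShapeSketch {

  // Same pattern string as VALID_VERSION_PATTERN in HttpUrlIngestionCliVersionMatrixSource.
  private static final Pattern VALID_VERSION_PATTERN = Pattern.compile("^[\\w.+!-]+$");

  private VersionShapeSketch() {}

  /** True when the candidate is non-null and consists only of the allowed characters. */
  public static boolean looksLikeVersion(String candidate) {
    // Assumption: a null entry is treated the same as a malformed one.
    return candidate != null && VALID_VERSION_PATTERN.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    List<String> samples =
        List.of(
            "1.5.0.5", // standard release
            "1.5.0.6rc1", // release candidate
            "1.5.0.13.post1", // post-release
            "1!0.0.0.dev0", // epoch + dev tag
            "acryl-1.6.0+acryl.20251031", // local version identifier
            "   ", // whitespace only
            "{\"version\":\"1.5.0.5\"}", // embedded JSON
            "<html>", // HTML fragment
            "https://example.com/1.5.0.5"); // URL
    for (String sample : samples) {
      String verdict = looksLikeVersion(sample) ? "accept" : "reject";
      System.out.println(verdict + " [" + sample + "]");
    }
  }
}
```

The check is deliberately shallow: a string such as `not.a.real.version` passes here and is left for pip to reject at install time, which matches the "pip remains the source of truth" stance above.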
Co-Authored-By: Claude Opus 4.7 --- ...eateIngestionExecutionRequestResolver.java | 10 ++++---- .../CreateTestConnectionRequestResolver.java | 6 +++-- ...eateTestConnectionRequestResolverTest.java | 6 +++-- .../ingestion/IngestionScheduler.java | 9 ++++--- .../ingestion/CliVersionResolutionHelper.java | 8 ++++--- ...ttpUrlIngestionCliVersionMatrixSource.java | 24 +++++++++++++++---- .../CliVersionResolutionHelperTest.java | 15 ++++++------ 7 files changed, 53 insertions(+), 25 deletions(-) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 83936bde262e..827f6c3aeff4 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -142,10 +142,12 @@ public CompletableFuture get(final DataFetchingEnvironment environment) recipe = injectRunId(recipe, executionRequestUrn.toString()); recipe = IngestionUtils.injectPipelineName(recipe, ingestionSourceUrn.toString()); arguments.put(RECIPE_ARG_NAME, recipe); - // Per-source version may be null, empty, or whitespace-only (bootstrap YAML - // templating can render any of these); the helper normalizes all three to "unset" - // and falls through to the matrix / application default. See #17471 for the - // whitespace-only edge case. + // Per-source version may be null, empty, or whitespace-only — bootstrap YAML + // templating renders `version: "{{ config.version }}"` as 3 spaces when the source + // has no pin, and Mustache treats missing keys as empty rather than failing. 
The + // helper normalizes all three to "unset" so resolution falls through to the matrix + // / application default; without that, the blank would forward to the executor and + // silently pin to its bundled CLI. final String explicitVersion = ingestionSourceInfo.getConfig().hasVersion() ? ingestionSourceInfo.getConfig().getVersion() diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index 8111207cb946..798c9b5140d0 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -123,8 +123,10 @@ public CompletableFuture get(final DataFetchingEnvironment environment) IngestionUtils.injectPipelineName( input.getRecipe(), executionRequestUrn.toString())); // input.getVersion() may be null, empty, or whitespace-only (UI forms can submit any - // of these); the helper normalizes all three to "unset" and falls through to the - // matrix / application default. See #17471 for the whitespace-only edge case. + // of these — an unfilled "version" field commonly renders as a 3-space string). The + // helper normalizes all three to "unset" so resolution falls through to the matrix / + // application default; without that normalization the blank would forward verbatim to + // the executor and silently pin to its bundled CLI. 
final String connectorType = extractSourceType(input.getRecipe()); final CliVersionResolutionHelper.Result resolution = CliVersionResolutionHelper.resolve( diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java index 5860d7ea2e45..fd1936a1753d 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java @@ -97,8 +97,10 @@ public void testFallsBackToDefaultCliVersionWhenNoVersionAndNoMatrix() throws Ex @Test public void testEmptyVersionFallsBackToDefault() throws Exception { - // Bootstrap YAML templating can render input.version as an empty string; the helper normalizes - // that to "unset" so we still fall through to defaultCliVersion. See #17471. + // Bootstrap YAML templating can render input.version as an empty string (or whitespace-only) + // when the source has no version pin; the helper normalizes that to "unset" so we still fall + // through to defaultCliVersion. Without this, the blank would forward to the executor and + // silently pin to its bundled CLI. 
EntityClient mockClient = Mockito.mock(EntityClient.class); IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index 6c3ac80b1f48..0acbd6d0d3e8 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -417,9 +417,12 @@ public void run() { IngestionUtils.injectPipelineName( ingestionSourceInfo.getConfig().getRecipe(), ingestionSourceUrn.toString()); arguments.put(RECIPE_ARGUMENT_NAME, recipe); - // Per-source version may be null, empty, or whitespace-only (bootstrap YAML templating - // can render any of these); the helper normalizes all three to "unset" and falls through - // to the matrix / application default. See #17471 for the whitespace-only edge case. + // Per-source version may be null, empty, or whitespace-only — bootstrap YAML templating + // renders `version: "{{ config.version }}"` as 3 spaces when the source has no pin, and + // Mustache treats missing keys as empty rather than failing. The helper normalizes all + // three to "unset" so resolution falls through to the matrix / application default; + // without that, the blank would forward to the executor and silently pin to its bundled + // CLI. final String explicitVersion = ingestionSourceInfo.getConfig().hasVersion() ? 
ingestionSourceInfo.getConfig().getVersion() diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java index 5fb3b41edc2b..34a66fcf4121 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java @@ -61,9 +61,11 @@ public static Result resolve( @Nullable String serverVersion) { // Normalize the per-source version: bootstrap YAML templating can render null, empty, or - // whitespace-only strings, and all three should mean "unset" so we fall through to the - // matrix / application default. Matches the contract of - // IngestionUtils.resolveIngestionCliVersion(...) introduced in #17471. + // whitespace-only strings (a source created with no version pin renders as + // `version: " "` after Mustache substitution), and all three must mean "unset" so we fall + // through to the matrix / application default. Without this trim, a blank template would + // forward verbatim to the executor and silently pin to whatever CLI it ships with — + // defeating both defaultCliVersion and the matrix. final String normalizedExplicit = explicitVersion != null && !explicitVersion.trim().isEmpty() ? 
explicitVersion.trim() diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index fef69dd8fe83..e1a4eed4138d 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -64,11 +64,27 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi private static final int FETCH_TIMEOUT_MS = 10_000; /** - * Basic version-string shape — alphanumeric, underscore, dot, plus, exclamation, hyphen. Catches - * obvious typos (whitespace, embedded JSON, HTML, etc.) without trying to validate PEP 440 fully. - * pip will catch any version that doesn't actually exist on PyPI; this check exists so operators - * fat-fingering the matrix file get a clean WARN at fetch time rather than a cryptic pip error + * Permissive cleanliness check for version strings — NOT a PEP 440 validator. The regex accepts + * any non-empty string composed of alphanumerics, underscore, dot, plus, exclamation, and hyphen. + * We deliberately do not parse semantic version structure (release segments, pre / post / dev + * tags, local version identifier) because pip is the source of truth: any string that passes here + * but isn't a real PyPI release surfaces a clear "No matching distribution" error from pip at + * install time. This check exists so operator fat-fingering of the matrix file (pasting HTML, + * whitespace, JSON) produces a structured WARN at fetch time rather than a cryptic pip error * minutes later on every execution. + * + *

<p>Accepts every shape DataHub publishes to PyPI today:
+   *
+   * <ul>
+   *   <li>{@code 1.5.0.5}, {@code 1.4.0.3} — standard release
+   *   <li>{@code 1.5.0.6rc1} — release candidate
+   *   <li>{@code 1.5.0.13.post1} — post-release
+   *   <li>{@code 1!0.0.0.dev0} — PEP 440 epoch + dev tag
+   *   <li>{@code acryl-1.6.0+acryl.20251031} — internal build with local version identifier
+   * </ul>
+   *
+   * <p>
    Rejects: whitespace, embedded JSON like {@code {"version":"…"}}, HTML fragments, URLs, + * anything containing characters outside the allowed set. */ private static final Pattern VALID_VERSION_PATTERN = Pattern.compile("^[\\w.+!-]+$"); diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java index 988b726ff3e8..8343f57e94b4 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java @@ -13,10 +13,10 @@ * *

    Covers the precedence ladder (source config override > matrix cohort > matrix connector * default > application default) and the per-source normalization contract (null, empty, and - * whitespace-only strings all fall through to the next tier). The whitespace case matches the - * contract of {@code IngestionUtils.resolveIngestionCliVersion(...)} from #17471 — bootstrap YAML - * templating can render any of the three, and forwarding them to the executor silently picks the - * bundled CLI rather than the configured default. + * whitespace-only strings all fall through to the next tier). The whitespace case matters because + * bootstrap YAML templating renders {@code version: "{{ config.version }}"} as three spaces when + * the source has no version pin — forwarding that verbatim to the executor would silently pin to + * the bundled CLI rather than the configured default. */ public class CliVersionResolutionHelperTest { @@ -64,9 +64,10 @@ public void testPerSourceEmptyFallsThroughToDefault() { @Test public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { - // Documents the contract from #17471: a bootstrap YAML field that renders as a blank string - // must be treated as "unset" so we hit the application default rather than passing the blank - // through to the executor. + // A bootstrap YAML field rendered through Mustache as `version: " "` (3 spaces, what we get + // when the source has no version pin) must be treated as "unset" — otherwise we'd forward the + // blank string to the executor, which would silently use its bundled CLI rather than the + // configured application default. 
CliVersionResolutionHelper.Result result = CliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); From 5467528eec771801f8b84213e7743c88aabe99fb Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Fri, 29 May 2026 11:41:03 +0530 Subject: [PATCH 11/20] refactor(ingestion): finish naming, simplify source discriminator, present-tense Javadoc Addresses three more items from the latest review pass on PR #17436: (a) Rename the two remaining types we forgot in the first naming sweep: CliVersionResolutionHelper -> IngestionCliVersionResolutionHelper CliVersionResolutionLogger -> IngestionCliVersionResolutionLogger Everything in the matrix surface area now carries the `Ingestion` prefix consistently. (c) Strip development-narrative phrasing from Javadoc. The reviewer called out ServerEntry's "vs the previous" framing; the same shape appeared in several other files I added. Rewrote each as a present-tense description of the current design, dropping "Lives separately ... on purpose", "Held as a field so ...", "replaces an earlier set of flat keys", "without an env change", and similar phrases that read as a changelog rather than docs. (d) Simplify the cliVersionMatrix.source discriminator. The original "URL- presence auto-inference for backward compat" branch was complexity for zero benefit -- there are no pre-discriminator deployments. application.yaml now defaults source to "http"; the factory does: if source == "none" -> NoOp (explicit kill-switch) else if http.url empty -> NoOp (OSS default) else -> HttpUrlIngestionCliVersionMatrixSource This is the same observable behavior with one fewer branch and no "compatibility" framing. The dropped resolveSource(...) helper, the URL-presence inference, and the related Javadoc all came out. Tests (all 77 across the 8 affected suites still green): - IngestionCliVersionMatrixServiceFactoryTest rewritten around the new contract: "URL controls HTTP vs NoOp when source != none" + "source=none is a kill-switch". 
noneIsCaseInsensitive added in place of the previous httpIsCaseInsensitive (only `none` short-circuits now, anything else falls through to URL-presence). - Inner-class refs (IngestionCliVersionResolutionHelper.Result) and callers (CreateIngestionExecutionRequestResolver, CreateTestConnectionRequestResolver, IngestionScheduler) updated by rename. End-to-end verified against a running GMS: all 4 resolution tiers + test- connection + scheduled trigger + executor consumption green; system-info now shows ingestion.cliVersionMatrix.source = "http" (the new default). Co-Authored-By: Claude Opus 4.7 --- ...eateIngestionExecutionRequestResolver.java | 14 ++--- .../CreateTestConnectionRequestResolver.java | 14 ++--- .../ingestion/IngestionScheduler.java | 14 ++--- .../config/CliVersionMatrixConfiguration.java | 20 +++---- .../config/HttpMatrixSourceConfiguration.java | 8 +-- ...ttpUrlIngestionCliVersionMatrixSource.java | 11 ++-- .../ingestion/IngestionCliVersionMatrix.java | 11 ++-- .../IngestionCliVersionMatrixService.java | 7 +-- ... IngestionCliVersionResolutionHelper.java} | 15 +++-- ... 
IngestionCliVersionResolutionLogger.java} | 22 ++++--- .../metadata/ingestion/ServerEntry.java | 5 +- .../src/main/resources/application.yaml | 6 +- ...estionCliVersionResolutionHelperTest.java} | 41 ++++++------- ...gestionCliVersionMatrixServiceFactory.java | 43 +++++--------- ...ionCliVersionMatrixServiceFactoryTest.java | 59 ++++++++----------- 15 files changed, 128 insertions(+), 162 deletions(-) rename metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/{CliVersionResolutionHelper.java => IngestionCliVersionResolutionHelper.java} (91%) rename metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/{CliVersionResolutionLogger.java => IngestionCliVersionResolutionLogger.java} (77%) rename metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/{CliVersionResolutionHelperTest.java => IngestionCliVersionResolutionHelperTest.java} (78%) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 827f6c3aeff4..4e60d80a9666 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -23,9 +23,9 @@ import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.ingestion.DataHubIngestionSourceInfo; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import 
com.linkedin.metadata.ingestion.IngestionCliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionCliVersionResolutionLogger; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -152,8 +152,8 @@ public CompletableFuture get(final DataFetchingEnvironment environment) ingestionSourceInfo.getConfig().hasVersion() ? ingestionSourceInfo.getConfig().getVersion() : null; - final CliVersionResolutionHelper.Result resolution = - CliVersionResolutionHelper.resolve( + final IngestionCliVersionResolutionHelper.Result resolution = + IngestionCliVersionResolutionHelper.resolve( explicitVersion, ingestionSourceInfo.getType(), _versionMatrixService, @@ -163,12 +163,12 @@ public CompletableFuture get(final DataFetchingEnvironment environment) : null); arguments.put(VERSION_ARG_NAME, resolution.getVersion()); execInput.setCliVersionAudit(resolution.getStamp()); - CliVersionResolutionLogger.log( + IngestionCliVersionResolutionLogger.log( log, - CliVersionResolutionLogger.TRIGGER_MANUAL, + IngestionCliVersionResolutionLogger.TRIGGER_MANUAL, resolution, ingestionSourceInfo.getType(), - CliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, + IngestionCliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, ingestionSourceUrn.toString()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index 798c9b5140d0..d03da7ad4ca5 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ 
b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -16,9 +16,9 @@ import com.linkedin.execution.ExecutionRequestInput; import com.linkedin.execution.ExecutionRequestSource; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionCliVersionResolutionLogger; import com.linkedin.metadata.key.ExecutionRequestKey; import com.linkedin.metadata.utils.EntityKeyUtils; import com.linkedin.metadata.utils.IngestionUtils; @@ -128,8 +128,8 @@ public CompletableFuture get(final DataFetchingEnvironment environment) // application default; without that normalization the blank would forward verbatim to // the executor and silently pin to its bundled CLI. 
final String connectorType = extractSourceType(input.getRecipe()); - final CliVersionResolutionHelper.Result resolution = - CliVersionResolutionHelper.resolve( + final IngestionCliVersionResolutionHelper.Result resolution = + IngestionCliVersionResolutionHelper.resolve( input.getVersion(), connectorType, _versionMatrixService, @@ -142,12 +142,12 @@ public CompletableFuture get(final DataFetchingEnvironment environment) } execInput.setArgs(new StringMap(arguments)); execInput.setCliVersionAudit(resolution.getStamp()); - CliVersionResolutionLogger.log( + IngestionCliVersionResolutionLogger.log( log, - CliVersionResolutionLogger.TRIGGER_TEST_CONNECTION, + IngestionCliVersionResolutionLogger.TRIGGER_TEST_CONNECTION, resolution, connectorType, - CliVersionResolutionLogger.IDENTIFIER_EXECUTION_REQUEST, + IngestionCliVersionResolutionLogger.IDENTIFIER_EXECUTION_REQUEST, executionRequestUrn.toString()); final MetadataChangeProposal proposal = diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index 0acbd6d0d3e8..8ce49ce17c60 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -17,9 +17,9 @@ import com.linkedin.ingestion.DataHubIngestionSourceSchedule; import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; -import com.linkedin.metadata.ingestion.CliVersionResolutionHelper; -import com.linkedin.metadata.ingestion.CliVersionResolutionLogger; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import com.linkedin.metadata.ingestion.IngestionCliVersionResolutionHelper; +import com.linkedin.metadata.ingestion.IngestionCliVersionResolutionLogger; import com.linkedin.metadata.key.ExecutionRequestKey; import 
com.linkedin.metadata.query.ListResult; import com.linkedin.metadata.utils.GenericRecordUtils; @@ -427,8 +427,8 @@ public void run() { ingestionSourceInfo.getConfig().hasVersion() ? ingestionSourceInfo.getConfig().getVersion() : null; - final CliVersionResolutionHelper.Result resolution = - CliVersionResolutionHelper.resolve( + final IngestionCliVersionResolutionHelper.Result resolution = + IngestionCliVersionResolutionHelper.resolve( explicitVersion, ingestionSourceInfo.getType(), versionMatrixService, @@ -436,12 +436,12 @@ public void run() { versionMatrixService != null ? versionMatrixService.getServerVersion() : null); arguments.put(VERSION_ARGUMENT_NAME, resolution.getVersion()); input.setCliVersionAudit(resolution.getStamp()); - CliVersionResolutionLogger.log( + IngestionCliVersionResolutionLogger.log( log, - CliVersionResolutionLogger.TRIGGER_SCHEDULED, + IngestionCliVersionResolutionLogger.TRIGGER_SCHEDULED, resolution, ingestionSourceInfo.getType(), - CliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, + IngestionCliVersionResolutionLogger.IDENTIFIER_INGESTION_SOURCE, ingestionSourceUrn.toString()); String debugMode = "false"; if (ingestionSourceInfo.getConfig().hasDebugMode()) { diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java index 958efcc714d6..c4378a8b5ceb 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/CliVersionMatrixConfiguration.java @@ -6,10 +6,9 @@ * Per-connector ingestion CLI version matrix configuration. Bound under {@code * ingestion.cliVersionMatrix} in application.yaml. * - *

    The nested structure replaces an earlier set of flat {@code versionMatrix*} keys on the parent - * {@link IngestionConfiguration}. Keeping all matrix configuration under one block makes adding - * future backends (GMS aspect, AppConfig, etcd, …) a localized change — each new backend gets its - * own nested block keyed off {@link #source}. + *

    Each matrix backend gets its own nested configuration block keyed off {@link #source}, so + * adding new backends (GMS aspect, AppConfig, etcd, …) does not pile flat properties under {@code + * ingestion:}. */ @Data public class CliVersionMatrixConfiguration { @@ -18,18 +17,13 @@ public class CliVersionMatrixConfiguration { * Source backend discriminator. Recognised values: * *

<ul>
-   *   <li>{@code "http"} — fetch via HTTP using {@link #http}
-   *   <li>{@code "none"} — explicit disable (no matrix consulted)
-   *   <li>unset / empty — backward-compatible auto-inference: if {@code http.url} is set, treat as
-   *       {@code "http"}; otherwise treat as {@code "none"}. Lets existing deployments that
-   *       pre-date this discriminator continue working without an env change.
+   *   <li>{@code "http"} (default) — fetch the matrix over HTTP using {@link #http}. When the URL
+   *       in that block is empty, the factory wires a no-op source.
+   *   <li>{@code "none"} — explicit kill-switch that wins even when an {@code http.url} is set.
    * </ul>
    */ private String source; - /** - * Configuration for the HTTP-fetched matrix backend. Always present so the factory can read - * {@code http.url} for the backward-compat auto-inference path even when {@code source} is unset. - */ + /** Configuration for the HTTP-fetched matrix backend. */ private HttpMatrixSourceConfiguration http; } diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java index 367c120df394..8174dfcc1a97 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/config/HttpMatrixSourceConfiguration.java @@ -4,11 +4,9 @@ /** * HTTP-fetched ingestion CLI version matrix configuration. Bound under {@code - * ingestion.cliVersionMatrix.http} in application.yaml. - * - *

    Lives in its own POJO so adding alternative matrix backends in the future (GMS aspect, - * AppConfig, etcd, …) does not require piling more flat properties under {@code ingestion:} — each - * backend gets its own nested block keyed off the {@code source} discriminator on the parent. + * ingestion.cliVersionMatrix.http} in application.yaml. Each backend defined under {@code + * cliVersionMatrix} gets its own nested block keyed off the {@code source} discriminator on the + * parent. */ @Data public class HttpMatrixSourceConfiguration { diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index e1a4eed4138d..284a9959a663 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -104,10 +104,7 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi private final AtomicLong lastFetchedAtMillis; private final ObjectMapper objectMapper; - /** - * The background refresh scheduler. Held as a field so {@link #shutdown()} can stop it on Spring - * context teardown — without this, the thread would leak across context restarts in dev / tests. - */ + /** Background refresh scheduler, stopped by {@link #shutdown()} on Spring context teardown. */ private final ScheduledExecutorService executor; /** Convenience constructor for unauthenticated (public) URLs. */ @@ -137,9 +134,9 @@ public HttpUrlIngestionCliVersionMatrixSource( } /** - * Gracefully stop the background refresh on Spring context teardown. 
Without this hook the - * scheduled-executor thread keeps the JVM alive and leaks across context restarts (matters in dev - * and in test contexts that re-create the bean). + * Gracefully stop the background refresh on Spring context teardown. Spring invokes this hook + * during bean destruction so the scheduled-executor thread does not leak across context restarts + * (relevant in dev hot-reload and integration-test contexts that re-create the bean). */ @PreDestroy public void shutdown() { diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java index 1679f403a354..7b14e40bdff1 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrix.java @@ -8,13 +8,12 @@ * *

    The matrix is keyed by server release version → {@link ServerEntry}, which in turn maps * connector type to a {@link ConnectorEntry} carrying a {@code _default} version and an optional - * ordered list of canary cohorts. Wrapping the inner map in {@link ServerEntry} keeps lookup sites - * self-documenting and gives us a single place to add server-level behavior later. + * ordered list of canary cohorts. * - *

    This is a pure POJO produced by {@link IngestionCliVersionMatrixSource} implementations and - * consumed by {@link IngestionCliVersionMatrixService} — the storage layer (HTTP, GMS aspect, - * config server, …) is decoupled from the resolution layer that walks the matrix and applies - * precedence rules. + *

    A pure POJO: {@link IngestionCliVersionMatrixSource} implementations produce it and {@link + * IngestionCliVersionMatrixService} consumes it. The storage layer (HTTP, GMS aspect, config + * server, …) is decoupled from the resolution layer that walks the matrix and applies precedence + * rules. */ public final class IngestionCliVersionMatrix { diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java index f089f9f0ddd5..97bc1ff24db5 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixService.java @@ -13,10 +13,9 @@ * *

    Cohort-based rollouts are aimed at multi-tenant deployments. Single-tenant installations leave * the deployment identifier unset, which makes cohort matching a no-op and falls through to the - * connector's {@code _default}. When no {@code IngestionCliVersionMatrixSource} is configured at - * all, the {@link NoOpIngestionCliVersionMatrixSource} wired by the factory ensures every {@link - * #resolveVersionWithSource(String)} returns {@link Optional#empty()}, preserving the existing - * {@code defaultCliVersion} behavior bit-identically. + * connector's {@code _default}. When no matrix backend is configured, the factory wires a {@link + * NoOpIngestionCliVersionMatrixSource} so {@link #resolveVersionWithSource(String)} always returns + * {@link Optional#empty()} and callers use {@code defaultCliVersion}. * *

    Resolution priority when picking a CLI version for an execution: * diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java similarity index 91% rename from metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java rename to metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java index 34a66fcf4121..0d480177518c 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelper.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java @@ -32,9 +32,9 @@ *

  • Application default — {@code defaultCliVersion} from application.yaml * */ -public final class CliVersionResolutionHelper { +public final class IngestionCliVersionResolutionHelper { - private CliVersionResolutionHelper() {} + private IngestionCliVersionResolutionHelper() {} /** * Resolve a CLI version for an ingestion or test-connection request. @@ -60,12 +60,11 @@ public static Result resolve( @Nullable String defaultCliVersion, @Nullable String serverVersion) { - // Normalize the per-source version: bootstrap YAML templating can render null, empty, or - // whitespace-only strings (a source created with no version pin renders as - // `version: " "` after Mustache substitution), and all three must mean "unset" so we fall - // through to the matrix / application default. Without this trim, a blank template would - // forward verbatim to the executor and silently pin to whatever CLI it ships with — - // defeating both defaultCliVersion and the matrix. + // Normalize the per-source version: bootstrap YAML templating can render `version: "{{ + // config.version }}"` as null, empty, or three spaces when the source has no version pin, + // and all three must collapse to "unset" so resolution falls through to the matrix / + // application default. A blank string forwarded to the executor would silently pin to its + // bundled CLI rather than the configured default. final String normalizedExplicit = explicitVersion != null && !explicitVersion.trim().isEmpty() ? 
explicitVersion.trim() diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java similarity index 77% rename from metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java rename to metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java index 9504056a10d6..0938a3ccd5c2 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/CliVersionResolutionLogger.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java @@ -6,16 +6,14 @@ /** * Shared structured logging for CLI version resolution at the three call sites ({@code * CreateIngestionExecutionRequestResolver}, {@code CreateTestConnectionRequestResolver}, {@code - * IngestionScheduler}). Lives separately from {@link CliVersionResolutionHelper} on purpose — the - * helper stays logging-free per review guidance, and emitting the line at the resolver layer keeps - * each log entry tagged with the caller's class. + * IngestionScheduler}). Callers pass their own SLF4J logger so each entry is tagged with the + * caller's class — operators' existing class-name log filters keep working. * - *

    Why structured logging matters here: the resolution ladder spans four tiers and three - * different trigger paths. Without a parallel-shaped log line at each call site, "which CLI did - * this run get and why?" requires log archaeology. With it, {@code grep "Resolved ingestion CLI - * version"} across all three resolvers' logs shows every resolution in the same shape. + *

    The resolution ladder spans four tiers and three trigger paths. Producing a parallel-shaped + * line at every call site lets {@code grep "Resolved ingestion CLI version"} surface every + * resolution in the same shape, with the trigger label distinguishing the source. */ -public final class CliVersionResolutionLogger { +public final class IngestionCliVersionResolutionLogger { /** Trigger label for the on-demand GraphQL execution request path. */ public static final String TRIGGER_MANUAL = "manual trigger"; @@ -32,7 +30,7 @@ public final class CliVersionResolutionLogger { /** Identifier key for an execution-request URN (used when no ingestion source exists). */ public static final String IDENTIFIER_EXECUTION_REQUEST = "executionRequest"; - private CliVersionResolutionLogger() {} + private IngestionCliVersionResolutionLogger() {} /** * Emit a DEBUG line capturing which tier of the resolution ladder produced the version, and a @@ -42,8 +40,8 @@ private CliVersionResolutionLogger() {} * @param log SLF4J logger to use. Pass the caller's {@code @Slf4j}-generated {@code log} so log * lines appear under the caller's class. * @param trigger short label distinguishing the call site (see {@link #TRIGGER_MANUAL}, etc.) 
- * @param resolution result from {@link CliVersionResolutionHelper#resolve(String, String, - * IngestionCliVersionMatrixService, String, String)} + * @param resolution result from {@link IngestionCliVersionResolutionHelper#resolve(String, + * String, IngestionCliVersionMatrixService, String, String)} * @param connectorType source type from the recipe, may be {@code null} when the recipe lacks a * parseable {@code source.type} * @param identifierKey self-describing key for {@code identifierValue} — typically {@link @@ -53,7 +51,7 @@ private CliVersionResolutionLogger() {} public static void log( final Logger log, final String trigger, - final CliVersionResolutionHelper.Result resolution, + final IngestionCliVersionResolutionHelper.Result resolution, @Nullable final String connectorType, final String identifierKey, final String identifierValue) { diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java index 4f15d1b33a2e..9a55468805d4 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/ServerEntry.java @@ -8,9 +8,8 @@ * The per-connector entries for a single GMS server release in the ingestion CLI version matrix. * *

    Maps connector type (e.g. {@code "snowflake"}, {@code "bigquery"}) to that connector's {@link - * ConnectorEntry}. Wrapping the inner map in a named POJO instead of exposing {@code Map} keeps lookup sites self-documenting: {@code serverEntry.getConnectorEntry(type)} - * vs the previous {@code serverEntry.get(type)} where the meaning of either string was opaque. + * ConnectorEntry}. The named lookup method {@link #getConnectorEntry(String)} makes the meaning of + * the string key obvious at every call site. */ public final class ServerEntry { diff --git a/metadata-service/configuration/src/main/resources/application.yaml b/metadata-service/configuration/src/main/resources/application.yaml index 2a99c1487763..1e1e1d66368b 100644 --- a/metadata-service/configuration/src/main/resources/application.yaml +++ b/metadata-service/configuration/src/main/resources/application.yaml @@ -108,9 +108,9 @@ ingestion: # basis. Update the matrix file externally (S3/CDN/gist) without redeploying GMS to change # connector versions. Set source=none (or leave url empty) to disable. cliVersionMatrix: - # Source backend: "http" or "none". When unset, inferred from http.url presence — leaves - # existing deployments that only set INGESTION_VERSION_MATRIX_URL working unchanged. - source: "${INGESTION_VERSION_MATRIX_SOURCE:}" + # Source backend: "http" (default) or "none". Setting "none" is an explicit kill-switch even + # when an http.url is configured. With "http" but no url, the factory wires a no-op source. 
+ source: "${INGESTION_VERSION_MATRIX_SOURCE:http}" http: url: "${INGESTION_VERSION_MATRIX_URL:}" refreshSeconds: ${INGESTION_VERSION_MATRIX_REFRESH_SECONDS:600} diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java similarity index 78% rename from metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java rename to metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java index 8343f57e94b4..8f8d0cf4c3d1 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/CliVersionResolutionHelperTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java @@ -9,7 +9,7 @@ import org.testng.annotations.Test; /** - * Focused unit tests for {@link CliVersionResolutionHelper}. + * Focused unit tests for {@link IngestionCliVersionResolutionHelper}. * *

    Covers the precedence ladder (source config override > matrix cohort > matrix connector * default > application default) and the per-source normalization contract (null, empty, and @@ -18,15 +18,15 @@ * the source has no version pin — forwarding that verbatim to the executor would silently pin to * the bundled CLI rather than the configured default. */ -public class CliVersionResolutionHelperTest { +public class IngestionCliVersionResolutionHelperTest { private static final String DEFAULT_CLI = "0.14.0"; private static final String SERVER_VERSION = "1.3.1.4"; @Test public void testPerSourceVersionWins() { - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve( + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve( "0.13.5", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.5"); @@ -36,8 +36,8 @@ public void testPerSourceVersionWins() { @Test public void testPerSourceWhitespaceIsTrimmed() { - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve( + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve( " 0.13.5 ", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.5"); @@ -46,8 +46,8 @@ public void testPerSourceWhitespaceIsTrimmed() { @Test public void testPerSourceNullFallsThroughToDefault() { - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve(null, null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve(null, null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -55,8 +55,8 @@ public void testPerSourceNullFallsThroughToDefault() { @Test public void testPerSourceEmptyFallsThroughToDefault() { - 
CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve("", null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve("", null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -68,8 +68,8 @@ public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { // when the source has no version pin) must be treated as "unset" — otherwise we'd forward the // blank string to the executor, which would silently use its bundled CLI rather than the // configured application default. - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -86,8 +86,8 @@ public void testMatrixConnectorDefaultWinsOverApplicationDefault() { "0.13.5", IngestionCliVersionMatrixService.MatrixSourceLevel.CONNECTOR_DEFAULT))); - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve( + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve( null, "snowflake", matrixService, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.5"); @@ -104,8 +104,8 @@ public void testMatrixCohortWinsOverConnectorDefault() { new IngestionCliVersionMatrixService.MatrixResolution( "0.13.6", IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve( + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve( null, "snowflake", matrixService, 
DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), "0.13.6"); @@ -119,8 +119,9 @@ public void testNullConnectorTypeSkipsMatrix() { IngestionCliVersionMatrixService matrixService = Mockito.mock(IngestionCliVersionMatrixService.class); - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve(null, null, matrixService, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve( + null, null, matrixService, DEFAULT_CLI, SERVER_VERSION); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -131,8 +132,8 @@ public void testNullConnectorTypeSkipsMatrix() { public void testNullDefaultStillReturnsStamp() { // OSS misconfiguration (defaultCliVersion not set) — we still emit a deterministic stamp so // forensic queries see a definite answer rather than a missing field. - CliVersionResolutionHelper.Result result = - CliVersionResolutionHelper.resolve(null, null, null, null, SERVER_VERSION); + IngestionCliVersionResolutionHelper.Result result = + IngestionCliVersionResolutionHelper.resolve(null, null, null, null, SERVER_VERSION); assertEquals(result.getVersion(), ""); assertNotNull(result.getStamp()); diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java index 2acc0b2b38ae..818a80adb092 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java @@ -24,12 +24,10 @@ *

      *
    • {@code ingestionCliVersionMatrixSource} — implements {@link * IngestionCliVersionMatrixSource}. The implementation is selected from {@code - * ingestion.cliVersionMatrix.source}: {@code "http"} → {@link - * HttpUrlIngestionCliVersionMatrixSource}, {@code "none"} → {@link - * NoOpIngestionCliVersionMatrixSource}. When the discriminator is unset, the choice is - * inferred from {@code ingestion.cliVersionMatrix.http.url} so existing deployments keep - * working without an env change. Future implementations (GMS-aspect-backed, - * config-server-backed, …) get a new discriminator value here. + * ingestion.cliVersionMatrix.source}: {@code "none"} short-circuits to {@link + * NoOpIngestionCliVersionMatrixSource}; otherwise an HTTP source is wired when {@code + * http.url} is set and a no-op source is wired when the URL is empty. Future backends (GMS + * aspect, config server, …) add their own discriminator value handled here. *
    • {@code ingestionCliVersionMatrixService} — consumes whichever {@link * IngestionCliVersionMatrixSource} is bound and applies the resolution policy (cohort → * connector default → application default). @@ -44,7 +42,6 @@ @Configuration public class IngestionCliVersionMatrixServiceFactory { - private static final String SOURCE_HTTP = "http"; private static final String SOURCE_NONE = "none"; @Autowired @@ -56,9 +53,9 @@ public class IngestionCliVersionMatrixServiceFactory { private GitVersion gitVersion; /** - * Picks the storage backend for the matrix. Reads {@code ingestion.cliVersionMatrix.source} as - * the primary signal; falls back to inferring from {@code http.url} presence when the - * discriminator is unset (backward-compat for deployments that pre-date this discriminator). + * Picks the storage backend for the matrix from {@code ingestion.cliVersionMatrix.source}. + * Explicit {@code "none"} is a kill-switch that always wins. Otherwise the HTTP source is wired + * when {@code http.url} is non-empty, and a no-op source when the URL is empty. 
*/ @Bean(name = "ingestionCliVersionMatrixSource") @Scope("singleton") @@ -69,27 +66,19 @@ protected IngestionCliVersionMatrixSource ingestionCliVersionMatrixSource() { if (matrixConfig == null) { return new NoOpIngestionCliVersionMatrixSource(); } - + if (SOURCE_NONE.equalsIgnoreCase(trim(matrixConfig.getSource()))) { + return new NoOpIngestionCliVersionMatrixSource(); + } HttpMatrixSourceConfiguration httpConfig = matrixConfig.getHttp(); - boolean httpUrlPresent = - httpConfig != null && httpConfig.getUrl() != null && !httpConfig.getUrl().isEmpty(); - - if (resolveSource(matrixConfig.getSource(), httpUrlPresent).equals(SOURCE_HTTP)) { - return new HttpUrlIngestionCliVersionMatrixSource( - httpConfig.getUrl(), httpConfig.getRefreshSeconds(), httpConfig.getAuthToken()); + if (httpConfig == null || httpConfig.getUrl() == null || httpConfig.getUrl().isEmpty()) { + return new NoOpIngestionCliVersionMatrixSource(); } - return new NoOpIngestionCliVersionMatrixSource(); + return new HttpUrlIngestionCliVersionMatrixSource( + httpConfig.getUrl(), httpConfig.getRefreshSeconds(), httpConfig.getAuthToken()); } - /** - * Resolve the active source backend. Explicit {@code source} wins; an unset value is inferred - * from URL presence so deployments that pre-date this discriminator continue to work. - */ - private static String resolveSource(String configuredSource, boolean httpUrlPresent) { - if (configuredSource != null && !configuredSource.trim().isEmpty()) { - return configuredSource.trim().toLowerCase(); - } - return httpUrlPresent ? SOURCE_HTTP : SOURCE_NONE; + private static String trim(String s) { + return s == null ? 
null : s.trim(); } @Bean(name = "ingestionCliVersionMatrixService") diff --git a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java index 3bcfd6927f6c..1ff62834d629 100644 --- a/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java +++ b/metadata-service/factories/src/test/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactoryTest.java @@ -18,10 +18,9 @@ import org.testng.annotations.Test; /** - * Direct unit tests for {@link IngestionCliVersionMatrixServiceFactory}. Covers the {@code - * ingestion.cliVersionMatrix.source} discriminator (explicit {@code "http"} / {@code "none"}) and - * the backward-compat auto-inference path (unset discriminator infers from {@code http.url} - * presence so deployments that pre-date this discriminator keep working). + * Direct unit tests for {@link IngestionCliVersionMatrixServiceFactory}. Covers the source + * selection contract: explicit {@code source: "none"} is a kill-switch; otherwise the HTTP source + * is wired when {@code http.url} is set and a no-op source is wired when the URL is empty. */ public class IngestionCliVersionMatrixServiceFactoryTest { @@ -40,9 +39,8 @@ public void setUp() { gitVersion = mock(GitVersion.class); when(configProvider.getIngestion()).thenReturn(ingestionConfig); - // GitVersion.toConfig() is called for the server-version key. Returning an empty config is - // fine for the ingestionCliVersionMatrixSource() bean; only getInstance() reads the server - // version. + // GitVersion.toConfig() is read by the service-construction bean; an empty fixture is fine + // for the source-selection tests which only exercise ingestionCliVersionMatrixSource(). 
when(gitVersion.toConfig()).thenReturn(Map.of("version", "test-server-1.0")); setField(factory, "configProvider", configProvider); @@ -50,33 +48,32 @@ public void setUp() { } // --------------------------------------------------------------------------- - // Backward-compat auto-inference (source unset) + // URL controls HTTP vs NoOp when source is not "none" // --------------------------------------------------------------------------- @Test - public void testMatrixSource_whenSourceUnsetAndUrlNull_wiresNoOp() { - // Default state from setUp: source unset, http.url null. Should infer "none". + public void testMatrixSource_whenUrlIsNull_wiresNoOp() { + // Default state from setUp: url null. Factory wires a no-op source. IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( source instanceof NoOpIngestionCliVersionMatrixSource, - "Unset source + unset url should infer 'none' (OSS-safe default)"); + "An unset URL must wire NoOpIngestionCliVersionMatrixSource"); } @Test - public void testMatrixSource_whenSourceUnsetAndUrlEmpty_wiresNoOp() { + public void testMatrixSource_whenUrlIsEmpty_wiresNoOp() { ingestionConfig.getCliVersionMatrix().getHttp().setUrl(""); IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( source instanceof NoOpIngestionCliVersionMatrixSource, - "Unset source + empty url should infer 'none'"); + "An empty URL is treated the same as unset — NoOp"); } @Test - public void testMatrixSource_whenSourceUnsetButUrlPresent_infersHttp() { - // Backward-compat: pre-discriminator deployments only set the URL. Should still wire HTTP. 
+ public void testMatrixSource_whenUrlIsSet_wiresHttpUrlSource() { ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); @@ -84,15 +81,11 @@ public void testMatrixSource_whenSourceUnsetButUrlPresent_infersHttp() { assertTrue( source instanceof HttpUrlIngestionCliVersionMatrixSource, - "URL-only configuration must keep wiring HTTP for pre-discriminator deployments"); + "A non-empty URL wires HttpUrlIngestionCliVersionMatrixSource"); } - // --------------------------------------------------------------------------- - // Explicit discriminator - // --------------------------------------------------------------------------- - @Test - public void testMatrixSource_explicitHttp_wiresHttpUrlSource() { + public void testMatrixSource_whenSourceIsExplicitHttp_wiresHttpUrlSource() { ingestionConfig.getCliVersionMatrix().setSource("http"); ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); @@ -101,12 +94,15 @@ public void testMatrixSource_explicitHttp_wiresHttpUrlSource() { assertTrue( source instanceof HttpUrlIngestionCliVersionMatrixSource, - "Explicit source='http' with url present should wire HttpUrlIngestionCliVersionMatrixSource"); + "Explicit source='http' with a URL wires HttpUrlIngestionCliVersionMatrixSource"); } + // --------------------------------------------------------------------------- + // Explicit source="none" is a kill-switch + // --------------------------------------------------------------------------- + @Test - public void testMatrixSource_explicitNone_overridesUrlPresence() { - // Explicit "none" must win even when a URL is configured. 
+ public void testMatrixSource_whenSourceIsNone_overridesUrlPresence() { ingestionConfig.getCliVersionMatrix().setSource("none"); ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); @@ -114,20 +110,19 @@ public void testMatrixSource_explicitNone_overridesUrlPresence() { assertTrue( source instanceof NoOpIngestionCliVersionMatrixSource, - "Explicit source='none' must override URL presence (kill-switch semantics)"); + "source='none' is a kill-switch that wins over a configured URL"); } @Test - public void testMatrixSource_sourceIsCaseInsensitive() { - ingestionConfig.getCliVersionMatrix().setSource("HTTP"); + public void testMatrixSource_noneIsCaseInsensitive() { + ingestionConfig.getCliVersionMatrix().setSource("NONE"); ingestionConfig.getCliVersionMatrix().getHttp().setUrl("file:///tmp/nonexistent.json"); - ingestionConfig.getCliVersionMatrix().getHttp().setRefreshSeconds(3600); IngestionCliVersionMatrixSource source = factory.ingestionCliVersionMatrixSource(); assertTrue( - source instanceof HttpUrlIngestionCliVersionMatrixSource, - "Source discriminator should be case-insensitive (operators may set HTTP or http)"); + source instanceof NoOpIngestionCliVersionMatrixSource, + "Operators may set NONE or none — both must short-circuit to NoOp"); } // --------------------------------------------------------------------------- @@ -144,9 +139,7 @@ public void testGetInstance_buildsServiceWithServerVersionFromGitVersion() { assertNotNull(service); assertEquals( - service.getServerVersion(), - "1.5.0", - "Service should be constructed with the GitVersion's reported version"); + service.getServerVersion(), "1.5.0", "Service uses the version reported by GitVersion"); } /** Reflection helper — the factory's autowired fields are private, like every Spring bean. 
*/
From 8e12e5eaab5f35a9a5eded2b0b2dc9d5dee9e8bc Mon Sep 17 00:00:00 2001
From: Puneet Agarwal
Date: Fri, 29 May 2026 11:56:02 +0530
Subject: [PATCH 12/20] refactor(ingestion): use Jackson for extractSourceType (matches matrix fetcher)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Partial response to PR review #6 (two JSON libraries in the matrix surface
area).

extractSourceType was the only new code my PR added that used org.json. The
matrix fetcher (HttpUrlIngestionCliVersionMatrixSource) uses Jackson; having
two libraries do the same parse-a-small-JSON job in adjacent files added
maintenance overhead without any runtime benefit.

Switch extractSourceType to Jackson so all new code added by this PR uses a
single library. The body shrinks slightly because Jackson's chained .path()
returns a MissingNode when a segment is absent, so the explicit has() checks
for `source` and `type` collapse into a single isTextual() check at the
leaf.

Out of scope: CreateIngestionExecutionRequestResolver.injectRunId still uses
org.json. That method is from 2022 (commit 57b7ade1f0c, John Joyce) and
predates this PR. Its conversion is unrelated tech debt and is not bundled
here, to keep this PR's review diff focused on what this PR actually adds.

Behaviour unchanged: malformed JSON, missing source, missing source.type,
non-textual source.type, and empty-string source.type all still produce the
same null + DEBUG-log outcome. 12 tests in
CreateTestConnectionRequestResolverTest cover these paths and stay green.

Co-Authored-By: Claude Opus 4.7 --- .../CreateTestConnectionRequestResolver.java | 24 +++++++++---------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index d03da7ad4ca5..015f24bd8daa 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -4,6 +4,9 @@ import static com.linkedin.datahub.graphql.resolvers.mutate.MutationUtils.*; import static com.linkedin.metadata.Constants.*; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; import com.linkedin.common.urn.Urn; import com.linkedin.common.urn.UrnUtils; import com.linkedin.data.template.StringMap; @@ -30,8 +33,6 @@ import java.util.UUID; import java.util.concurrent.CompletableFuture; import lombok.extern.slf4j.Slf4j; -import org.json.JSONException; -import org.json.JSONObject; /** * Creates an on-demand "test connection" ingestion execution request. @@ -63,6 +64,9 @@ public class CreateTestConnectionRequestResolver implements DataFetcher Date: Fri, 29 May 2026 12:16:13 +0530 Subject: [PATCH 13/20] refactor(ingestion): fetch matrix with java.net.http.HttpClient (HTTP/HTTPS only) Addresses PR review #7 (SSRF / URLConnection accepts any protocol). The old implementation used java.net.URLConnection, which transparently handles file://, jar://, ftp://, and other schemes. If a future code path lets a non-admin influence the matrix URL (e.g. 
a GmsAspect-backed source), the URL would be reachable as an SSRF primitive
against the GMS pod's filesystem. Not exploitable today (the URL is
operator-controlled via env var), but fixing it now means the future backend
is born safe.

Switched HttpUrlIngestionCliVersionMatrixSource.refresh() to
java.net.http.HttpClient (Java 11+). HttpClient supports HTTP and HTTPS
only -- any other scheme is rejected with IllegalArgumentException when the
HttpRequest is built, before anything is sent, which the existing
catch (Exception) branch turns into the same "Failed to refresh ...
Retaining last known matrix" WARN that other fetch failures produce. The
connect and per-request timeouts (10s, unchanged) carry over.

Added an explicit non-2xx response check because URLConnection's
getInputStream() used to throw IOException on HTTP error responses, whereas
HttpClient returns them without throwing. Also added an InterruptedException
handler that re-asserts the interrupt flag via
Thread.currentThread().interrupt() per project convention.

Test infrastructure migrated from temp-file + file:// URIs to embedded
com.sun.net.httpserver.HttpServer fixtures, since HttpClient rejects
file://:

- IngestionCliVersionMatrixServiceTest: new startMatrixServer(json) helper
  + @AfterMethod tearDown. Six sites that previously called
  Files.createTempFile / tmp.toUri() now go through the helper. Dropped the
  Files/Path imports.
- CreateIngestionExecutionRequestResolverTest.matrixServiceForConnector:
  same migration with a JVM shutdown hook to stop the per-call server.

Behaviour preserved end-to-end: 75 unit tests across 7 suites pass; the
three demo scripts (4-tier ladder, full happy path, phase 2 edge cases) run
green against a live GMS fetching from an HTTPS gist. Internal
matrix-content parsing, auth-header forwarding, retain-last-known-good on
failure, and the @PreDestroy shutdown are all unchanged.

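For reviewers, the scheme restriction and the explicit status check described
above can be sketched roughly as follows. This is a minimal illustration, not
the code in this patch: the class name MatrixFetchSketch and the methods
fetch() and isSuccess() are hypothetical, and the real refresh() additionally
parses the matrix, forwards the auth header, and logs a WARN on failure.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

/** Hypothetical sketch only; names here are assumptions, not the PR's classes. */
public final class MatrixFetchSketch {

  // 10s mirrors the timeout mentioned in the commit message.
  private static final Duration TIMEOUT = Duration.ofSeconds(10);

  private MatrixFetchSketch() {}

  /** True for 2xx status codes; anything else counts as a failed fetch. */
  public static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }

  /** Returns the body on a 2xx response, or empty so the caller keeps its last known matrix. */
  public static Optional<String> fetch(HttpClient client, String url) {
    try {
      // HttpRequest.newBuilder(URI) throws IllegalArgumentException for any
      // scheme other than http/https (for example file://), before any I/O.
      HttpRequest request =
          HttpRequest.newBuilder(URI.create(url)).timeout(TIMEOUT).GET().build();
      HttpResponse<String> response =
          client.send(request, HttpResponse.BodyHandlers.ofString());
      // HttpClient does not throw on 4xx/5xx, so the status is checked explicitly.
      return isSuccess(response.statusCode())
          ? Optional.of(response.body())
          : Optional.empty();
    } catch (InterruptedException e) {
      // Re-assert the interrupt flag so the owning executor can observe it.
      Thread.currentThread().interrupt();
      return Optional.empty();
    } catch (Exception e) {
      // IOException, IllegalArgumentException (bad scheme or malformed URI), ...
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    HttpClient client = HttpClient.newBuilder().connectTimeout(TIMEOUT).build();
    // A file:// URL never reaches the network; the request builder rejects it.
    System.out.println(fetch(client, "file:///tmp/matrix.json").isPresent()); // prints false
  }
}
```

Returning Optional.empty() on every failure path is one way to model the
"retain last known matrix" behaviour: the caller replaces its cached matrix
only when a value is present.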
Co-Authored-By: Claude Opus 4.7 --- ...IngestionExecutionRequestResolverTest.java | 20 +++- ...ttpUrlIngestionCliVersionMatrixSource.java | 50 +++++++--- .../IngestionCliVersionMatrixServiceTest.java | 96 +++++++++++-------- 3 files changed, 109 insertions(+), 57 deletions(-) diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java index 2dec7669aeb5..3eb54ddfa22a 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java @@ -222,13 +222,23 @@ private static IngestionCliVersionMatrixService matrixServiceForConnector( String json = String.format("{\"%s\":{\"%s\":{\"_default\":\"%s\"}}}", serverVersion, connector, version); - java.nio.file.Path tmp = java.nio.file.Files.createTempFile("matrix", ".json"); - java.nio.file.Files.write(tmp, json.getBytes()); - tmp.toFile().deleteOnExit(); + com.sun.net.httpserver.HttpServer server = + com.sun.net.httpserver.HttpServer.create(new java.net.InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + byte[] body = json.getBytes(java.nio.charset.StandardCharsets.UTF_8); + exchange.sendResponseHeaders(200, body.length); + try (java.io.OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + Runtime.getRuntime().addShutdownHook(new Thread(() -> server.stop(0))); + String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource httpSource = - new 
com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource( - tmp.toUri().toString(), 3600); + new com.linkedin.metadata.ingestion.HttpUrlIngestionCliVersionMatrixSource(url, 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, serverVersion, null); diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index 284a9959a663..a4234956b094 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -4,8 +4,11 @@ import com.fasterxml.jackson.databind.ObjectMapper; import jakarta.annotation.PreDestroy; import java.io.InputStream; -import java.net.URL; -import java.net.URLConnection; +import java.net.URI; +import java.net.http.HttpClient; +import java.net.http.HttpRequest; +import java.net.http.HttpResponse; +import java.time.Duration; import java.util.ArrayList; import java.util.HashMap; import java.util.HashSet; @@ -22,9 +25,11 @@ import lombok.extern.slf4j.Slf4j; /** - * {@link IngestionCliVersionMatrixSource} backed by a publicly-readable HTTP URL serving the matrix - * JSON. Suitable for any deployment that wants to fetch the matrix from a CDN, object store (S3, - * GCS), or a GitHub raw URL without rebuilding or redeploying GMS to change connector versions. + * {@link IngestionCliVersionMatrixSource} backed by a publicly-readable HTTP / HTTPS URL serving + * the matrix JSON. Suitable for any deployment that wants to fetch the matrix from a CDN, object + * store (S3, GCS), or a GitHub raw URL without rebuilding or redeploying GMS to change connector + * versions. 
Uses {@link java.net.http.HttpClient}, which is HTTP/HTTPS-only by design — non-HTTP + * schemes ({@code file://}, {@code jar://}, {@code ftp://}, …) are rejected at request-send time. * *

      The remote JSON must follow this schema: * @@ -103,6 +108,7 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi private final AtomicReference cached; private final AtomicLong lastFetchedAtMillis; private final ObjectMapper objectMapper; + private final HttpClient httpClient; /** Background refresh scheduler, stopped by {@link #shutdown()} on Spring context teardown. */ private final ScheduledExecutorService executor; @@ -119,6 +125,8 @@ public HttpUrlIngestionCliVersionMatrixSource( this.cached = new AtomicReference<>(IngestionCliVersionMatrix.EMPTY); this.lastFetchedAtMillis = new AtomicLong(0L); this.objectMapper = new ObjectMapper(); + this.httpClient = + HttpClient.newBuilder().connectTimeout(Duration.ofMillis(FETCH_TIMEOUT_MS)).build(); this.executor = Executors.newSingleThreadScheduledExecutor( @@ -166,16 +174,29 @@ public long getLastFetchedAtMillis() { /** Package-private so tests can force a refresh without waiting for the scheduled tick. */ void refresh() { try { - URL connectionUrl = new URL(url); - URLConnection conn = connectionUrl.openConnection(); - conn.setConnectTimeout(FETCH_TIMEOUT_MS); - conn.setReadTimeout(FETCH_TIMEOUT_MS); - conn.setRequestProperty("User-Agent", "DataHub-GMS"); + HttpRequest.Builder reqBuilder = + HttpRequest.newBuilder(URI.create(url)) + .timeout(Duration.ofMillis(FETCH_TIMEOUT_MS)) + .header("User-Agent", "DataHub-GMS") + .GET(); if (authHeader != null && !authHeader.isEmpty()) { - conn.setRequestProperty("Authorization", authHeader); + reqBuilder.header("Authorization", authHeader); } + // HttpClient enforces HTTP/HTTPS at send-time — non-HTTP schemes (file://, jar://, ftp://, + // …) throw IllegalArgumentException, which the outer catch turns into the same retain-cache + // WARN we already emit for other fetch failures. 
+ HttpResponse response = + httpClient.send(reqBuilder.build(), HttpResponse.BodyHandlers.ofInputStream()); - try (InputStream is = conn.getInputStream()) { + if (response.statusCode() / 100 != 2) { + log.warn( + "Non-2xx response fetching ingestion version matrix from {}: HTTP {}. Retaining last known matrix.", + url, + response.statusCode()); + return; + } + + try (InputStream is = response.body()) { JsonNode root = objectMapper.readTree(is); IngestionCliVersionMatrix parsed; try { @@ -200,6 +221,11 @@ void refresh() { url, parsed.size()); } + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + log.warn( + "Interrupted while refreshing ingestion version matrix from {}. Retaining last known matrix.", + url); } catch (Exception e) { log.warn( "Failed to refresh ingestion version matrix from {}. Retaining last known matrix.", diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java index d149b196c847..e0740471ed72 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionMatrixServiceTest.java @@ -6,12 +6,14 @@ import java.io.IOException; import java.io.OutputStream; import java.net.InetSocketAddress; -import java.nio.file.Files; -import java.nio.file.Path; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.List; import java.util.Optional; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicReference; +import org.testng.annotations.AfterMethod; import org.testng.annotations.Test; public class IngestionCliVersionMatrixServiceTest { @@ -43,25 +45,61 @@ 
public class IngestionCliVersionMatrixServiceTest { + " }\n" + "}"; + /** Servers started by {@link #startMatrixServer(String)}; stopped after each test method. */ + private final List serversToStop = new ArrayList<>(); + + @AfterMethod + public void stopServers() { + for (HttpServer s : serversToStop) { + try { + s.stop(0); + } catch (Exception ignored) { + } + } + serversToStop.clear(); + } + + /** + * Spins up a one-shot embedded {@link HttpServer} on a random localhost port that serves {@code + * matrixJson} at {@code /matrix}. The server is tracked and stopped by {@link #stopServers()} + * after each test. Returns the URL to point a matrix source at. + */ + private String startMatrixServer(String matrixJson) throws IOException { + HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); + server.createContext( + "/matrix", + exchange -> { + byte[] body = matrixJson.getBytes(StandardCharsets.UTF_8); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + }); + server.start(); + serversToStop.add(server); + return "http://127.0.0.1:" + server.getAddress().getPort() + "/matrix"; + } + /** - * Returns a service backed by {@link HttpUrlIngestionCliVersionMatrixSource} pointed at a - * temp-file URL containing {@link #MATRIX_JSON}. Polls briefly so the asynchronous initial fetch - * has a chance to populate the cache before the assertions run. + * Returns a service backed by {@link HttpUrlIngestionCliVersionMatrixSource} pointed at an + * embedded HTTP server serving {@link #MATRIX_JSON}. Polls briefly so the asynchronous initial + * fetch has a chance to populate the cache before the assertions run. 
*/ private IngestionCliVersionMatrixService serviceWithMatrix( String serverVersion, String deploymentId) throws IOException { - Path tmp = Files.createTempFile("version-matrix", ".json"); - Files.write(tmp, MATRIX_JSON.getBytes()); - tmp.toFile().deleteOnExit(); + return serviceWithMatrix(MATRIX_JSON, serverVersion, deploymentId); + } + private IngestionCliVersionMatrixService serviceWithMatrix( + String matrixJson, String serverVersion, String deploymentId) throws IOException { HttpUrlIngestionCliVersionMatrixSource httpSource = - new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer(matrixJson), 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, serverVersion, deploymentId); // Wait briefly for the initial fetch to complete (delay=0 in the scheduled executor). for (int i = 0; i < 20; i++) { - if (svc.resolveVersion("bigquery").isPresent()) { + if (httpSource.getLastFetchedAtMillis() > 0) { break; } try { @@ -178,12 +216,8 @@ public void testUnreachableUrlReturnsEmpty() { @Test public void testMalformedJsonReturnsEmpty() throws Exception { - Path tmp = Files.createTempFile("bad-matrix", ".json"); - Files.write(tmp, "not valid json".getBytes()); - tmp.toFile().deleteOnExit(); - HttpUrlIngestionCliVersionMatrixSource httpSource = - new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer("not valid json"), 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); @@ -217,12 +251,8 @@ public void testCohortWithoutVersion_isSkipped() throws Exception { + " }\n" + "}"; - Path tmp = Files.createTempFile("cohort-no-version", ".json"); - Files.write(tmp, json.getBytes()); - tmp.toFile().deleteOnExit(); - HttpUrlIngestionCliVersionMatrixSource httpSource = - new 
HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer(json), 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); @@ -256,12 +286,8 @@ public void testCohortWithMissingDeployments_neverMatches() throws Exception { + " }\n" + "}"; - Path tmp = Files.createTempFile("cohort-no-deployments", ".json"); - Files.write(tmp, json.getBytes()); - tmp.toFile().deleteOnExit(); - HttpUrlIngestionCliVersionMatrixSource httpSource = - new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer(json), 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-b1"); @@ -278,9 +304,8 @@ public void testCohortWithMissingDeployments_neverMatches() throws Exception { // ------------------------------------------------------------------------- // HttpUrlIngestionCliVersionMatrixSource HTTP-level behavior (auth header, fetch-failure cache - // retention) - // These exercise the network code path; the other tests use file:// URIs which can't surface - // request-header or HTTP-status behavior. + // retention). These tests inspect request headers and response status codes that the simpler + // serviceWithMatrix() helper does not expose. 
// ------------------------------------------------------------------------- @Test @@ -480,12 +505,8 @@ public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws E + " }\n" + " }\n" + "}"; - Path tmp = Files.createTempFile("no-default", ".json"); - Files.write(tmp, json.getBytes()); - tmp.toFile().deleteOnExit(); - HttpUrlIngestionCliVersionMatrixSource httpSource = - new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer(json), 3600); IngestionCliVersionMatrixService svc = new IngestionCliVersionMatrixService(httpSource, SERVER_VERSION, "deployment-unknown"); @@ -506,14 +527,9 @@ public void testConnectorWithoutDefault_andNoCohortMatch_returnsEmpty() throws E @Test public void testRefreshThreadIsNamedAndDaemon() throws Exception { - // file:// URI is fine for this test — we just need the source to spin up its scheduler; - // the fetch itself is not what we're inspecting. - Path tmp = Files.createTempFile("thread-name", ".json"); - Files.write(tmp, MATRIX_JSON.getBytes()); - tmp.toFile().deleteOnExit(); - + // Any HTTP endpoint suffices — we're inspecting the refresh thread, not the fetched data. HttpUrlIngestionCliVersionMatrixSource source = - new HttpUrlIngestionCliVersionMatrixSource(tmp.toUri().toString(), 3600); + new HttpUrlIngestionCliVersionMatrixSource(startMatrixServer(MATRIX_JSON), 3600); try { waitForFirstFetch(source); From eb57b999d89fc8d328304c6eee561c43960da2ab Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 15:51:57 +0530 Subject: [PATCH 14/20] refactor(ingestion): treat CLI version matrix service as a required bean MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ingestionCliVersionMatrixService is always wired as a Spring bean — a NoOpIngestionCliVersionMatrixSource-backed instance when no matrix backend is configured — so it is never null in production. 
Stop modelling it as optional: - Remove the telescoping 2-arg constructors from CreateTestConnectionRequest- Resolver and CreateIngestionExecutionRequestResolver; only the 3-arg form was ever used in GmsGraphQLEngine. Both now Objects.requireNonNull the service. - Drop the dead `service != null ? getServerVersion() : null` checks in the resolvers and IngestionScheduler. - IngestionCliVersionResolutionHelper.resolve now takes a @Nonnull service and reads serverVersion from it directly, instead of callers passing both the service and a value pulled from it. - GmsGraphQLEngine requireNonNulls the field to match its @Autowired wiring. Tests updated to construct resolvers with a NoOp-backed service for the matrix-disabled case. No behavior change: OSS still falls through to defaultCliVersion via the empty NoOp matrix. Addresses platform review feedback on PR #17436. Co-Authored-By: Claude Opus 4.8 --- .../datahub/graphql/GmsGraphQLEngine.java | 3 +- ...eateIngestionExecutionRequestResolver.java | 20 ++------ .../CreateTestConnectionRequestResolver.java | 20 ++------ ...IngestionExecutionRequestResolverTest.java | 23 ++++++++-- ...eateTestConnectionRequestResolverTest.java | 32 +++++++++---- .../ingestion/IngestionScheduler.java | 3 +- .../IngestionCliVersionResolutionHelper.java | 26 ++++++----- .../IngestionCliVersionResolutionLogger.java | 2 +- ...gestionCliVersionResolutionHelperTest.java | 46 +++++++++++-------- 9 files changed, 97 insertions(+), 78 deletions(-) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java index 6ea321524b71..e9eec3cc1811 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java @@ -598,7 +598,8 @@ public GmsGraphQLEngine(final GmsGraphQLEngineArgs args) { 
this.businessAttributeService = args.businessAttributeService; this.ingestionConfiguration = Objects.requireNonNull(args.ingestionConfiguration); - this.ingestionCliVersionMatrixService = args.ingestionCliVersionMatrixService; + this.ingestionCliVersionMatrixService = + Objects.requireNonNull(args.ingestionCliVersionMatrixService); this.authenticationConfiguration = Objects.requireNonNull(args.authenticationConfiguration); this.authorizationConfiguration = Objects.requireNonNull(args.authorizationConfiguration); this.visualConfiguration = args.visualConfiguration; diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index 4e60d80a9666..a8cf6d1c9f62 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -34,6 +34,7 @@ import graphql.schema.DataFetchingEnvironment; import java.util.HashMap; import java.util.Map; +import java.util.Objects; import java.util.UUID; import java.util.concurrent.CompletableFuture; import lombok.extern.slf4j.Slf4j; @@ -55,24 +56,14 @@ public class CreateIngestionExecutionRequestResolver private final IngestionConfiguration _ingestionConfiguration; private final IngestionCliVersionMatrixService _versionMatrixService; - /** Two-arg constructor — no per-connector version matrix is consulted. */ - public CreateIngestionExecutionRequestResolver( - final EntityClient entityClient, final IngestionConfiguration ingestionConfiguration) { - this(entityClient, ingestionConfiguration, null); - } - - /** - * Three-arg constructor for deployments that want matrix-aware version resolution. 
When {@code - * versionMatrixService} is non-null, the per-connector version matrix is consulted before falling - * back to {@code defaultCliVersion}. - */ public CreateIngestionExecutionRequestResolver( final EntityClient entityClient, final IngestionConfiguration ingestionConfiguration, final IngestionCliVersionMatrixService versionMatrixService) { _entityClient = entityClient; _ingestionConfiguration = ingestionConfiguration; - _versionMatrixService = versionMatrixService; + // Always a wired Spring bean (NoOp-backed when no matrix backend is configured), never null. + _versionMatrixService = Objects.requireNonNull(versionMatrixService); } @Override @@ -157,10 +148,7 @@ public CompletableFuture get(final DataFetchingEnvironment environment) explicitVersion, ingestionSourceInfo.getType(), _versionMatrixService, - _ingestionConfiguration.getDefaultCliVersion(), - _versionMatrixService != null - ? _versionMatrixService.getServerVersion() - : null); + _ingestionConfiguration.getDefaultCliVersion()); arguments.put(VERSION_ARG_NAME, resolution.getVersion()); execInput.setCliVersionAudit(resolution.getStamp()); IngestionCliVersionResolutionLogger.log( diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index 015f24bd8daa..e745c23f330e 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -30,6 +30,7 @@ import graphql.schema.DataFetchingEnvironment; import java.util.HashMap; import java.util.Map; +import java.util.Objects; import java.util.UUID; import java.util.concurrent.CompletableFuture; import 
lombok.extern.slf4j.Slf4j; @@ -71,24 +72,14 @@ public class CreateTestConnectionRequestResolver implements DataFetcher get(final DataFetchingEnvironment environment) input.getVersion(), connectorType, _versionMatrixService, - _ingestionConfiguration.getDefaultCliVersion(), - _versionMatrixService != null - ? _versionMatrixService.getServerVersion() - : null); + _ingestionConfiguration.getDefaultCliVersion()); if (resolution.getVersion() != null && !resolution.getVersion().isEmpty()) { arguments.put(VERSION_ARG_NAME, resolution.getVersion()); } diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java index 3eb54ddfa22a..822ec62223a0 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolverTest.java @@ -21,6 +21,7 @@ import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource; import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import com.linkedin.r2.RemoteInvocationException; @@ -37,6 +38,16 @@ public class CreateIngestionExecutionRequestResolverTest { private static final CreateIngestionExecutionRequestInput TEST_INPUT = new CreateIngestionExecutionRequestInput(TEST_INGESTION_SOURCE_URN.toString()); + /** + * A matrix service backed by a {@link NoOpIngestionCliVersionMatrixSource} — what production + * wires when no matrix backend is configured. 
Always returns an empty matrix, so resolution falls + * through to {@code defaultCliVersion}. + */ + private static IngestionCliVersionMatrixService disabledMatrixService() { + return new IngestionCliVersionMatrixService( + new NoOpIngestionCliVersionMatrixSource(), null, null); + } + @Test public void testGetSuccess() throws Exception { // Create resolver @@ -62,7 +73,8 @@ public void testGetSuccess() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion("default"); CreateIngestionExecutionRequestResolver resolver = - new CreateIngestionExecutionRequestResolver(mockClient, ingestionConfiguration); + new CreateIngestionExecutionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); // Execute resolver QueryContext mockContext = getMockAllowContext(); @@ -84,7 +96,8 @@ public void testGetUnauthorized() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion("default"); CreateIngestionExecutionRequestResolver resolver = - new CreateIngestionExecutionRequestResolver(mockClient, ingestionConfiguration); + new CreateIngestionExecutionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); // Execute resolver DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); @@ -281,7 +294,8 @@ public void testGetEntityClientException() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion("default"); CreateIngestionExecutionRequestResolver resolver = - new CreateIngestionExecutionRequestResolver(mockClient, ingestionConfiguration); + new CreateIngestionExecutionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); // Execute resolver DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); @@ -360,7 +374,8 
@@ private String executeAndCaptureVersion(DataHubIngestionSourceInfo sourceInfo) t IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_CLI_VERSION); CreateIngestionExecutionRequestResolver resolver = - new CreateIngestionExecutionRequestResolver(mockClient, ingestionConfiguration); + new CreateIngestionExecutionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); QueryContext mockContext = getMockAllowContext(); DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java index fd1936a1753d..3514e4eaefb3 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java @@ -13,6 +13,7 @@ import com.linkedin.metadata.Constants; import com.linkedin.metadata.config.IngestionConfiguration; import com.linkedin.metadata.ingestion.IngestionCliVersionMatrixService; +import com.linkedin.metadata.ingestion.NoOpIngestionCliVersionMatrixSource; import com.linkedin.metadata.utils.GenericRecordUtils; import com.linkedin.mxe.MetadataChangeProposal; import graphql.schema.DataFetchingEnvironment; @@ -37,6 +38,16 @@ public class CreateTestConnectionRequestResolverTest { private static final CreateTestConnectionRequestInput TEST_INPUT_NO_VERSION = new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, null); + /** + * A matrix service backed by a {@link NoOpIngestionCliVersionMatrixSource} — what production + * wires when no matrix backend is configured. 
Always returns an empty matrix (so resolution falls + * through to {@code defaultCliVersion}) and reports no server version. + */ + private static IngestionCliVersionMatrixService disabledMatrixService() { + return new IngestionCliVersionMatrixService( + new NoOpIngestionCliVersionMatrixSource(), null, null); + } + @Test public void testExplicitInputVersionWins() throws Exception { EntityClient mockClient = Mockito.mock(EntityClient.class); @@ -90,7 +101,8 @@ public void testFallsBackToDefaultCliVersionWhenNoVersionAndNoMatrix() throws Ex ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + new CreateTestConnectionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); runAndVerifyVersion(resolver, mockClient, TEST_INPUT_NO_VERSION, DEFAULT_VERSION); } @@ -106,7 +118,8 @@ public void testEmptyVersionFallsBackToDefault() throws Exception { ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + new CreateTestConnectionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); runAndVerifyVersion( resolver, @@ -123,7 +136,8 @@ public void testWhitespaceVersionFallsBackToDefault() throws Exception { ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + new CreateTestConnectionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); runAndVerifyVersion( resolver, @@ -175,7 +189,8 @@ public void testGetUnauthorized() throws Exception { IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); 
CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + new CreateTestConnectionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); DataFetchingEnvironment mockEnv = Mockito.mock(DataFetchingEnvironment.class); QueryContext mockContext = getMockDenyContext(); @@ -234,10 +249,11 @@ public void testStampsResolutionMetadata_perSourceOverride() throws Exception { EntityClient mockClient = Mockito.mock(EntityClient.class); IngestionConfiguration ingestionConfiguration = new IngestionConfiguration(); ingestionConfiguration.setDefaultCliVersion(DEFAULT_VERSION); - // No matrix configured — the explicit version must still produce a SOURCE_CONFIG_OVERRIDE - // stamp. + // Matrix backend disabled (NoOp source, no server version) — the explicit version must still + // produce a SOURCE_CONFIG_OVERRIDE stamp. CreateTestConnectionRequestResolver resolver = - new CreateTestConnectionRequestResolver(mockClient, ingestionConfiguration); + new CreateTestConnectionRequestResolver( + mockClient, ingestionConfiguration, disabledMatrixService()); ExecutionRequestInput captured = runAndCaptureResolution(resolver, mockClient, TEST_INPUT_WITH_VERSION); @@ -245,7 +261,7 @@ public void testStampsResolutionMetadata_perSourceOverride() throws Exception { assertEquals(captured.getArgs().get("version"), EXPLICIT_VERSION); com.linkedin.execution.CliVersionAudit stamp = captured.getCliVersionAudit(); assertEquals(stamp.getSource(), com.linkedin.execution.CliVersionSource.SOURCE_CONFIG_OVERRIDE); - // No matrix service wired → no serverVersion to stamp. + // The disabled matrix service reports no server version, so none is stamped. 
assertFalse(stamp.hasServerVersion()); } diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index 8ce49ce17c60..b1fbca8290a0 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -432,8 +432,7 @@ public void run() { explicitVersion, ingestionSourceInfo.getType(), versionMatrixService, - ingestionConfiguration.getDefaultCliVersion(), - versionMatrixService != null ? versionMatrixService.getServerVersion() : null); + ingestionConfiguration.getDefaultCliVersion()); arguments.put(VERSION_ARGUMENT_NAME, resolution.getVersion()); input.setCliVersionAudit(resolution.getStamp()); IngestionCliVersionResolutionLogger.log( diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java index 0d480177518c..831074211c9c 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelper.java @@ -3,6 +3,7 @@ import com.linkedin.execution.CliVersionAudit; import com.linkedin.execution.CliVersionSource; import java.util.Optional; +import javax.annotation.Nonnull; import javax.annotation.Nullable; /** @@ -43,12 +44,11 @@ private IngestionCliVersionResolutionHelper() {} * empty if unset * @param connectorType the source-type string from the recipe (e.g. {@code "snowflake"}), or * {@code null} if not derivable (e.g. 
malformed test-connection recipe) - * @param matrixService the version-matrix service; pass {@code null} for OSS callers that do not - * consult a matrix (e.g. unit-test setups) + * @param matrixService the version-matrix service (always wired as a Spring bean — a {@link + * NoOpIngestionCliVersionMatrixSource}-backed instance when no matrix backend is configured). + * The GMS server version stamped on the audit record is read from {@link + * IngestionCliVersionMatrixService#getServerVersion()}. * @param defaultCliVersion the application-wide fallback from {@code IngestionConfiguration} - * @param serverVersion the GMS server version (typically {@code GitVersion.getVersion()}). - * Stamped on every returned record regardless of which tier hit; pass {@code null} only in - * tests that don't care about audit data. * @return a {@link Result} carrying the resolved version string + the structured stamp. Never * {@code null}; the {@code version} field is guaranteed non-null except when {@code * defaultCliVersion} itself is null/empty (an OSS misconfiguration). @@ -56,9 +56,13 @@ private IngestionCliVersionResolutionHelper() {} public static Result resolve( @Nullable String explicitVersion, @Nullable String connectorType, - @Nullable IngestionCliVersionMatrixService matrixService, - @Nullable String defaultCliVersion, - @Nullable String serverVersion) { + @Nonnull IngestionCliVersionMatrixService matrixService, + @Nullable String defaultCliVersion) { + + // The GMS server version is read from the matrix service (its single source of truth) instead + // of being threaded in by every caller — keeps the audit stamp consistent across all three + // execution paths and avoids passing both the service and a value pulled from it. 
+ final String serverVersion = matrixService.getServerVersion(); // Normalize the per-source version: bootstrap YAML templating can render `version: "{{ // config.version }}"` as null, empty, or three spaces when the source has no version pin, @@ -76,7 +80,7 @@ public static Result resolve( stampWithSource(CliVersionSource.SOURCE_CONFIG_OVERRIDE, serverVersion)); } - if (matrixService != null && connectorType != null && !connectorType.isEmpty()) { + if (connectorType != null && !connectorType.isEmpty()) { Optional matrixResult = matrixService.resolveVersionWithSource(connectorType); if (matrixResult.isPresent()) { @@ -107,8 +111,8 @@ private static CliVersionAudit stampWithSource( /** * Wraps the two outputs of {@link #resolve(String, String, IngestionCliVersionMatrixService, - * String, String)} — the plain CLI version string (for {@code args.version}) and the structured - * audit stamp (for the {@code cliVersionAudit} aspect field). + * String)} — the plain CLI version string (for {@code args.version}) and the structured audit + * stamp (for the {@code cliVersionAudit} aspect field). */ public static final class Result { private final String version; diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java index 0938a3ccd5c2..6542b52ac1fd 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionLogger.java @@ -41,7 +41,7 @@ private IngestionCliVersionResolutionLogger() {} * lines appear under the caller's class. * @param trigger short label distinguishing the call site (see {@link #TRIGGER_MANUAL}, etc.) 
* @param resolution result from {@link IngestionCliVersionResolutionHelper#resolve(String, - * String, IngestionCliVersionMatrixService, String, String)} + * String, IngestionCliVersionMatrixService, String)} * @param connectorType source type from the recipe, may be {@code null} when the recipe lacks a * parseable {@code source.type} * @param identifierKey self-describing key for {@code identifierValue} — typically {@link diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java index 8f8d0cf4c3d1..0839ad22663f 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/IngestionCliVersionResolutionHelperTest.java @@ -1,5 +1,6 @@ package com.linkedin.metadata.ingestion; +import static org.mockito.ArgumentMatchers.any; import static org.testng.Assert.assertEquals; import static org.testng.Assert.assertNotNull; @@ -17,17 +18,28 @@ * bootstrap YAML templating renders {@code version: "{{ config.version }}"} as three spaces when * the source has no version pin — forwarding that verbatim to the executor would silently pin to * the bundled CLI rather than the configured default. + * + *
<p>
      The matrix service is always a wired Spring bean in production, so these tests pass a mock + * (rather than {@code null}); the GMS server version stamped on the audit record is read from + * {@link IngestionCliVersionMatrixService#getServerVersion()}. */ public class IngestionCliVersionResolutionHelperTest { private static final String DEFAULT_CLI = "0.14.0"; private static final String SERVER_VERSION = "1.3.1.4"; + /** A matrix service mock that reports {@link #SERVER_VERSION} and resolves nothing by default. */ + private static IngestionCliVersionMatrixService matrixService() { + IngestionCliVersionMatrixService svc = Mockito.mock(IngestionCliVersionMatrixService.class); + Mockito.when(svc.getServerVersion()).thenReturn(SERVER_VERSION); + return svc; + } + @Test public void testPerSourceVersionWins() { IngestionCliVersionResolutionHelper.Result result = IngestionCliVersionResolutionHelper.resolve( - "0.13.5", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); + "0.13.5", "snowflake", matrixService(), DEFAULT_CLI); assertEquals(result.getVersion(), "0.13.5"); assertEquals(result.getStamp().getSource(), CliVersionSource.SOURCE_CONFIG_OVERRIDE); @@ -38,7 +50,7 @@ public void testPerSourceVersionWins() { public void testPerSourceWhitespaceIsTrimmed() { IngestionCliVersionResolutionHelper.Result result = IngestionCliVersionResolutionHelper.resolve( - " 0.13.5 ", "snowflake", null, DEFAULT_CLI, SERVER_VERSION); + " 0.13.5 ", "snowflake", matrixService(), DEFAULT_CLI); assertEquals(result.getVersion(), "0.13.5"); assertEquals(result.getStamp().getSource(), CliVersionSource.SOURCE_CONFIG_OVERRIDE); @@ -47,7 +59,7 @@ public void testPerSourceWhitespaceIsTrimmed() { @Test public void testPerSourceNullFallsThroughToDefault() { IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve(null, null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve(null, null, matrixService(), DEFAULT_CLI); 
assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -56,7 +68,7 @@ public void testPerSourceNullFallsThroughToDefault() { @Test public void testPerSourceEmptyFallsThroughToDefault() { IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve("", null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve("", null, matrixService(), DEFAULT_CLI); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -69,7 +81,7 @@ public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { // blank string to the executor, which would silently use its bundled CLI rather than the // configured application default. IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve(" ", null, null, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve(" ", null, matrixService(), DEFAULT_CLI); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); @@ -77,8 +89,7 @@ public void testPerSourceWhitespaceOnlyFallsThroughToDefault() { @Test public void testMatrixConnectorDefaultWinsOverApplicationDefault() { - IngestionCliVersionMatrixService matrixService = - Mockito.mock(IngestionCliVersionMatrixService.class); + IngestionCliVersionMatrixService matrixService = matrixService(); Mockito.when(matrixService.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( @@ -87,8 +98,7 @@ public void testMatrixConnectorDefaultWinsOverApplicationDefault() { IngestionCliVersionMatrixService.MatrixSourceLevel.CONNECTOR_DEFAULT))); IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve( - null, "snowflake", matrixService, DEFAULT_CLI, SERVER_VERSION); + 
IngestionCliVersionResolutionHelper.resolve(null, "snowflake", matrixService, DEFAULT_CLI); assertEquals(result.getVersion(), "0.13.5"); assertEquals(result.getStamp().getSource(), CliVersionSource.MATRIX_CONNECTOR_DEFAULT); @@ -96,8 +106,7 @@ public void testMatrixConnectorDefaultWinsOverApplicationDefault() { @Test public void testMatrixCohortWinsOverConnectorDefault() { - IngestionCliVersionMatrixService matrixService = - Mockito.mock(IngestionCliVersionMatrixService.class); + IngestionCliVersionMatrixService matrixService = matrixService(); Mockito.when(matrixService.resolveVersionWithSource("snowflake")) .thenReturn( Optional.of( @@ -105,8 +114,7 @@ public void testMatrixCohortWinsOverConnectorDefault() { "0.13.6", IngestionCliVersionMatrixService.MatrixSourceLevel.COHORT))); IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve( - null, "snowflake", matrixService, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve(null, "snowflake", matrixService, DEFAULT_CLI); assertEquals(result.getVersion(), "0.13.6"); assertEquals(result.getStamp().getSource(), CliVersionSource.MATRIX_COHORT); @@ -116,16 +124,16 @@ public void testMatrixCohortWinsOverConnectorDefault() { public void testNullConnectorTypeSkipsMatrix() { // A malformed test-connection recipe produces a null connector type; we must skip the matrix // and fall through to the application default rather than throwing. 
- IngestionCliVersionMatrixService matrixService = - Mockito.mock(IngestionCliVersionMatrixService.class); + IngestionCliVersionMatrixService matrixService = matrixService(); IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve( - null, null, matrixService, DEFAULT_CLI, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve(null, null, matrixService, DEFAULT_CLI); assertEquals(result.getVersion(), DEFAULT_CLI); assertEquals(result.getStamp().getSource(), CliVersionSource.APPLICATION_DEFAULT); - Mockito.verifyNoInteractions(matrixService); + // The matrix is never consulted when the connector type is unknown (getServerVersion is still + // read for the audit stamp, so we assert specifically on the resolution call). + Mockito.verify(matrixService, Mockito.never()).resolveVersionWithSource(any()); } @Test @@ -133,7 +141,7 @@ public void testNullDefaultStillReturnsStamp() { // OSS misconfiguration (defaultCliVersion not set) — we still emit a deterministic stamp so // forensic queries see a definite answer rather than a missing field. IngestionCliVersionResolutionHelper.Result result = - IngestionCliVersionResolutionHelper.resolve(null, null, null, null, SERVER_VERSION); + IngestionCliVersionResolutionHelper.resolve(null, null, matrixService(), null); assertEquals(result.getVersion(), ""); assertNotNull(result.getStamp()); From 1c029e3717fff3b92448cc990b095ea4b22bf80e Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 16:35:42 +0530 Subject: [PATCH 15/20] refactor(ingestion): move extractSourceType to IngestionUtils, reuse shared ObjectMapper CreateTestConnectionRequestResolver parsed the recipe's source.type with a private static `new ObjectMapper()` and a resolver-local helper method. Move that logic to IngestionUtils.extractSourceType(ObjectMapper, String) and pass the OperationContext's shared mapper instead of allocating a new one. 
- IngestionUtils.extractSourceType: best-effort source.type extraction, ObjectMapper supplied by the caller. - Resolver calls IngestionUtils.extractSourceType( context.getOperationContext().getObjectMapper(), recipe); drops the static MAPPER, the local method, and the Jackson imports. - Test moved to IngestionUtilsTest; IngestTestUtils.getMockAllowContext now stubs getObjectMapper() so resolver tests exercise the op-context mapper. Addresses review feedback on PR #17436. Co-Authored-By: Claude Opus 4.8 --- .../CreateTestConnectionRequestResolver.java | 33 ++---------------- .../resolvers/ingest/IngestTestUtils.java | 2 ++ ...eateTestConnectionRequestResolverTest.java | 11 ------ .../metadata/utils/IngestionUtils.java | 34 +++++++++++++++++++ .../metadata/utils/IngestionUtilsTest.java | 16 +++++++++ 5 files changed, 55 insertions(+), 41 deletions(-) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java index e745c23f330e..2f5000b7a80c 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolver.java @@ -4,9 +4,6 @@ import static com.linkedin.datahub.graphql.resolvers.mutate.MutationUtils.*; import static com.linkedin.metadata.Constants.*; -import com.fasterxml.jackson.core.JsonProcessingException; -import com.fasterxml.jackson.databind.JsonNode; -import com.fasterxml.jackson.databind.ObjectMapper; import com.linkedin.common.urn.Urn; import com.linkedin.common.urn.UrnUtils; import com.linkedin.data.template.StringMap; @@ -62,11 +59,6 @@ public class CreateTestConnectionRequestResolver implements DataFetcher get(final 
DataFetchingEnvironment environment) // helper normalizes all three to "unset" so resolution falls through to the matrix / // application default; without that normalization the blank would forward verbatim to // the executor and silently pin to its bundled CLI. - final String connectorType = extractSourceType(input.getRecipe()); + final String connectorType = + IngestionUtils.extractSourceType( + context.getOperationContext().getObjectMapper(), input.getRecipe()); final IngestionCliVersionResolutionHelper.Result resolution = IngestionCliVersionResolutionHelper.resolve( input.getVersion(), @@ -159,25 +153,4 @@ public CompletableFuture get(final DataFetchingEnvironment environment) this.getClass().getSimpleName(), "get"); } - - /** - * Best-effort extraction of {@code source.type} from a recipe JSON document. Returns {@code null} - * for any malformed input — the resolver falls back to {@code defaultCliVersion} in that case - * rather than failing the request, since a malformed recipe will surface a clearer error - * downstream when the executor parses it. - */ - static String extractSourceType(final String recipeJson) { - if (recipeJson == null || recipeJson.isEmpty()) { - return null; - } - try { - // path() returns a missing node when a segment is absent, so the chained lookup handles - // missing source / missing type uniformly without explicit has() checks. - JsonNode type = MAPPER.readTree(recipeJson).path(SOURCE_FIELD).path(TYPE_FIELD); - return type.isTextual() && !type.asText().isEmpty() ? 
type.asText() : null; - } catch (JsonProcessingException e) { - log.debug("Could not extract source.type from recipe for version-matrix lookup", e); - return null; - } - } } diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/IngestTestUtils.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/IngestTestUtils.java index bc8162de35c9..76a31728b964 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/IngestTestUtils.java +++ b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/IngestTestUtils.java @@ -8,6 +8,7 @@ import com.datahub.authorization.AuthorizationResult; import com.datahub.authorization.EntitySpec; +import com.fasterxml.jackson.databind.ObjectMapper; import com.linkedin.common.urn.Urn; import com.linkedin.datahub.graphql.QueryContext; import com.linkedin.datahub.graphql.generated.Secret; @@ -35,6 +36,7 @@ public static QueryContext getMockAllowContext() { when(mockContext.getOperationContext()).thenReturn(mock(OperationContext.class)); when(mockContext.getOperationContext().authorize(any(), nullable(EntitySpec.class), any())) .thenReturn(new AuthorizationResult(null, AuthorizationResult.Type.ALLOW, "")); + when(mockContext.getOperationContext().getObjectMapper()).thenReturn(new ObjectMapper()); return mockContext; } diff --git a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java index 3514e4eaefb3..03ac6cf10b1c 100644 --- a/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java +++ 
b/datahub-graphql-core/src/test/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateTestConnectionRequestResolverTest.java @@ -31,7 +31,6 @@ public class CreateTestConnectionRequestResolverTest { "{\"source\":{\"type\":\"snowflake\",\"config\":{\"account_id\":\"abc123\"}}}"; private static final String RECIPE_WITHOUT_TYPE = "{\"source\":{\"config\":{\"account_id\":\"abc123\"}}}"; - private static final String MALFORMED_RECIPE = "{not valid json"; private static final CreateTestConnectionRequestInput TEST_INPUT_WITH_VERSION = new CreateTestConnectionRequestInput(SNOWFLAKE_RECIPE, EXPLICIT_VERSION); @@ -201,16 +200,6 @@ public void testGetUnauthorized() throws Exception { Mockito.verify(mockClient, Mockito.times(0)).ingestProposal(any(), Mockito.any(), anyBoolean()); } - @Test - public void testExtractSourceType() { - assertEquals( - CreateTestConnectionRequestResolver.extractSourceType(SNOWFLAKE_RECIPE), "snowflake"); - assertNull(CreateTestConnectionRequestResolver.extractSourceType(RECIPE_WITHOUT_TYPE)); - assertNull(CreateTestConnectionRequestResolver.extractSourceType(MALFORMED_RECIPE)); - assertNull(CreateTestConnectionRequestResolver.extractSourceType("")); - assertNull(CreateTestConnectionRequestResolver.extractSourceType(null)); - } - /** * Forensic stamp: the resolution record on the ExecutionRequestInput must reflect which * resolution path actually fired (cohort vs connector default vs workspace default), with diff --git a/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java b/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java index 6cb00e2666a3..fc1a9eeb8551 100644 --- a/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java +++ b/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java @@ -1,13 +1,20 @@ package com.linkedin.metadata.utils; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.JsonNode; 
+import com.fasterxml.jackson.databind.ObjectMapper; import javax.annotation.Nonnull; import javax.annotation.Nullable; +import lombok.extern.slf4j.Slf4j; import org.json.JSONException; import org.json.JSONObject; +@Slf4j public class IngestionUtils { private static final String PIPELINE_NAME = "pipeline_name"; + private static final String SOURCE_FIELD = "source"; + private static final String TYPE_FIELD = "type"; private IngestionUtils() {} @@ -25,6 +32,33 @@ public static String resolveIngestionCliVersion( return defaultCliVersion; } + /** + * Best-effort extraction of {@code source.type} from a recipe JSON document. Returns {@code null} + * for any malformed input — callers fall back to a default rather than failing the request, since + * a malformed recipe surfaces a clearer error downstream when the executor parses it. + * + * @param mapper the {@link ObjectMapper} to parse with — pass a shared instance (e.g. the + * OperationContext's) rather than allocating a new one per call + * @param recipeJson the recipe JSON string, may be {@code null} or empty + * @return the {@code source.type} value, or {@code null} if absent / not derivable + */ + @Nullable + public static String extractSourceType( + @Nonnull final ObjectMapper mapper, @Nullable final String recipeJson) { + if (recipeJson == null || recipeJson.isEmpty()) { + return null; + } + try { + // path() returns a missing node when a segment is absent, so the chained lookup handles + // missing source / missing type uniformly without explicit has() checks. + JsonNode type = mapper.readTree(recipeJson).path(SOURCE_FIELD).path(TYPE_FIELD); + return type.isTextual() && !type.asText().isEmpty() ? type.asText() : null; + } catch (JsonProcessingException e) { + log.debug("Could not extract source.type from recipe for version-matrix lookup", e); + return null; + } + } + /** * Injects a pipeline_name into a recipe if there isn't a pipeline_name already there. 
The * pipeline_name will be the urn of the ingestion source. diff --git a/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java b/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java index c037371b717a..c029fe4ad3d5 100644 --- a/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java +++ b/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java @@ -1,12 +1,28 @@ package com.linkedin.metadata.utils; import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertNull; +import com.fasterxml.jackson.databind.ObjectMapper; import org.testng.annotations.Test; public class IngestionUtilsTest { private final String ingestionSourceUrn = "urn:li:ingestionSource:12345"; + private final ObjectMapper mapper = new ObjectMapper(); + + @Test + public void extractSourceTypeFromRecipe() { + assertEquals( + IngestionUtils.extractSourceType( + mapper, "{\"source\":{\"type\":\"snowflake\",\"config\":{\"account_id\":\"abc\"}}}"), + "snowflake"); + // source present but no type, malformed JSON, empty, and null all return null + assertNull(IngestionUtils.extractSourceType(mapper, "{\"source\":{\"config\":{}}}")); + assertNull(IngestionUtils.extractSourceType(mapper, "{not valid json")); + assertNull(IngestionUtils.extractSourceType(mapper, "")); + assertNull(IngestionUtils.extractSourceType(mapper, null)); + } @Test public void injectPipelineNameWhenThere() { From 5b011a82245010e7c19c34fd463b863d23b4b25c Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 16:42:25 +0530 Subject: [PATCH 16/20] refactor(ingestion): drop redundant hasVersion() guard on per-source CLI version MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DataHubIngestionSourceConfig.version is an optional PDL field, so the generated getVersion() already returns null when unset. The `hasVersion() ? 
getVersion() : null` ternary in the manual-execution resolver and the scheduler was redundant — pass getConfig().getVersion() straight to the resolution helper, which already normalizes null / empty / whitespace-only to "unset". No behavior change; matches how the test-connection resolver passes its raw input version. Addresses review feedback on PR #17436. Co-Authored-By: Claude Opus 4.8 --- .../CreateIngestionExecutionRequestResolver.java | 16 +++++----------- .../metadata/ingestion/IngestionScheduler.java | 16 +++++----------- 2 files changed, 10 insertions(+), 22 deletions(-) diff --git a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java index a8cf6d1c9f62..2f6a62fdf5ac 100644 --- a/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java +++ b/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/resolvers/ingest/execution/CreateIngestionExecutionRequestResolver.java @@ -133,19 +133,13 @@ public CompletableFuture get(final DataFetchingEnvironment environment) recipe = injectRunId(recipe, executionRequestUrn.toString()); recipe = IngestionUtils.injectPipelineName(recipe, ingestionSourceUrn.toString()); arguments.put(RECIPE_ARG_NAME, recipe); - // Per-source version may be null, empty, or whitespace-only — bootstrap YAML - // templating renders `version: "{{ config.version }}"` as 3 spaces when the source - // has no pin, and Mustache treats missing keys as empty rather than failing. The - // helper normalizes all three to "unset" so resolution falls through to the matrix - // / application default; without that, the blank would forward to the executor and - // silently pin to its bundled CLI. 
- final String explicitVersion = - ingestionSourceInfo.getConfig().hasVersion() - ? ingestionSourceInfo.getConfig().getVersion() - : null; + // getVersion() returns null for an unset optional field, so no hasVersion() guard is + // needed. The helper normalizes null / empty / whitespace-only versions (bootstrap + // YAML can render `version: "{{ config.version }}"` as 3 spaces) to "unset", falling + // through to the matrix / application default instead of pinning the bundled CLI. final IngestionCliVersionResolutionHelper.Result resolution = IngestionCliVersionResolutionHelper.resolve( - explicitVersion, + ingestionSourceInfo.getConfig().getVersion(), ingestionSourceInfo.getType(), _versionMatrixService, _ingestionConfiguration.getDefaultCliVersion()); diff --git a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java index b1fbca8290a0..51467a41ef7d 100644 --- a/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java +++ b/ingestion-scheduler/src/main/java/com/datahub/metadata/ingestion/IngestionScheduler.java @@ -417,19 +417,13 @@ public void run() { IngestionUtils.injectPipelineName( ingestionSourceInfo.getConfig().getRecipe(), ingestionSourceUrn.toString()); arguments.put(RECIPE_ARGUMENT_NAME, recipe); - // Per-source version may be null, empty, or whitespace-only — bootstrap YAML templating - // renders `version: "{{ config.version }}"` as 3 spaces when the source has no pin, and - // Mustache treats missing keys as empty rather than failing. The helper normalizes all - // three to "unset" so resolution falls through to the matrix / application default; - // without that, the blank would forward to the executor and silently pin to its bundled - // CLI. - final String explicitVersion = - ingestionSourceInfo.getConfig().hasVersion() - ? 
ingestionSourceInfo.getConfig().getVersion() - : null; + // getVersion() returns null for an unset optional field, so no hasVersion() guard is + // needed. The helper normalizes null / empty / whitespace-only (bootstrap YAML can render + // `version: "{{ config.version }}"` as 3 spaces) to "unset" so resolution falls through to + // the matrix / application default rather than pinning the executor's bundled CLI. final IngestionCliVersionResolutionHelper.Result resolution = IngestionCliVersionResolutionHelper.resolve( - explicitVersion, + ingestionSourceInfo.getConfig().getVersion(), ingestionSourceInfo.getType(), versionMatrixService, ingestionConfiguration.getDefaultCliVersion()); From 623c9388f2c8d5de08f709249055facf5f26d4b6 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 16:51:35 +0530 Subject: [PATCH 17/20] refactor(ingestion): drop redundant @Scope("singleton") on matrix beans Singleton is the default Spring bean scope, so the explicit @Scope("singleton") on both ingestionCliVersionMatrixSource and ingestionCliVersionMatrixService beans (and the Scope import) added nothing. Remove for consistency with the rest of the codebase. Addresses review feedback on PR #17436. 
Co-Authored-By: Claude Opus 4.8 --- .../ingestion/IngestionCliVersionMatrixServiceFactory.java | 3 --- 1 file changed, 3 deletions(-) diff --git a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java index 818a80adb092..c47478c3c17c 100644 --- a/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java +++ b/metadata-service/factories/src/main/java/com/linkedin/gms/factory/ingestion/IngestionCliVersionMatrixServiceFactory.java @@ -14,7 +14,6 @@ import org.springframework.beans.factory.annotation.Qualifier; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; -import org.springframework.context.annotation.Scope; /** * Wires up the per-connector ingestion CLI version matrix. @@ -58,7 +57,6 @@ public class IngestionCliVersionMatrixServiceFactory { * when {@code http.url} is non-empty, and a no-op source when the URL is empty. 
*/ @Bean(name = "ingestionCliVersionMatrixSource") - @Scope("singleton") @Nonnull protected IngestionCliVersionMatrixSource ingestionCliVersionMatrixSource() { CliVersionMatrixConfiguration matrixConfig = @@ -82,7 +80,6 @@ private static String trim(String s) { } @Bean(name = "ingestionCliVersionMatrixService") - @Scope("singleton") @Nonnull protected IngestionCliVersionMatrixService getInstance( @Qualifier("ingestionCliVersionMatrixSource") From 1844ef2b21b00cb49c2bc0f772c3403bb6e836dc Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 16:52:25 +0530 Subject: [PATCH 18/20] docs(ingestion): document non-PyPI version sentinels accepted by matrix validation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The permissive version pattern (^[\w.+!-]+$) already accepts the special non-PyPI sentinels the ingestion executor recognizes (e.g. "bundled", "no-acryl-datahub") — they are alphanumeric/hyphenated so they pass. Make that intent explicit in the Javadoc (noting the list is non-exhaustive by design) and add a regression test (permissiveVersionPatternAcceptsSpecialSentinels) locking in that these sentinels load rather than being rejected as invalid. Addresses review feedback on PR #17436. 
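As a rough illustration of the validation described above (not the code from this patch), the sketch below checks a few candidate strings against the pattern quoted in the message. The class name, the constant name, and the `isAcceptable` helper are hypothetical; only the regular expression itself is taken from the commit message.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: class, constant, and method names are illustrative only.
class VersionPatternSketch {
  // The permissive pattern quoted in the commit message: word chars plus . + ! -
  private static final Pattern PERMISSIVE_VERSION = Pattern.compile("^[\\w.+!-]+$");

  // Returns true when the candidate consists only of allowed characters.
  static boolean isAcceptable(String candidate) {
    return candidate != null && PERMISSIVE_VERSION.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    String[] candidates = {
      "1.3.1.4",                      // plain release version
      "acryl-1.6.0+acryl.20251031",   // build with a local version identifier
      "bundled",                      // non-PyPI sentinel
      "no-acryl-datahub",             // non-PyPI sentinel
      " 1.3.1.4",                     // leading whitespace
      "{\"version\":\"1.0\"}",        // embedded JSON
      "https://example.com/1.0"       // URL
    };
    for (String candidate : candidates) {
      System.out.println("[" + candidate + "] acceptable=" + isAcceptable(candidate));
    }
  }
}
```

The sentinels pass only because they happen to be alphanumeric with hyphens; the pattern checks characters, not version semantics, which is presumably why the message calls the list non-exhaustive by design.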
Co-Authored-By: Claude Opus 4.8 --- ...ttpUrlIngestionCliVersionMatrixSource.java | 11 ++++++++++ ...nCliVersionMatrixSourceValidationTest.java | 20 +++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index a4234956b094..fc0f7992abc9 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -88,6 +88,17 @@ public class HttpUrlIngestionCliVersionMatrixSource implements IngestionCliVersi *

  <li>{@code acryl-1.6.0+acryl.20251031} — internal build with local version identifier
 * </ul>
 *
+ * <p>Also accepts the special non-PyPI version sentinels the ingestion executor recognizes —
+ * these are not PEP 440 / PyPI versions and must not be rejected. This list is not
+ * exhaustive; the permissive character class is intentional so future sentinels keep working
+ * without a code change here. Known examples:
+ *
+ * <ul>
+ *   <li>{@code bundled} — run the CLI bundled in the executor image, skipping the pip install
+ *   <li>{@code no-acryl-datahub} — opt out of installing acryl-datahub (the recipe pulls it in
+ *       transitively)
+ * </ul>
+ *
 * <p>
    Rejects: whitespace, embedded JSON like {@code {"version":"…"}}, HTML fragments, URLs, * anything containing characters outside the allowed set. */ diff --git a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java index c957ed1a05e5..599615da99c2 100644 --- a/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java +++ b/metadata-service/configuration/src/test/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSourceValidationTest.java @@ -131,6 +131,26 @@ public void permissiveVersionPatternAcceptsRealPyPiVersions() throws Exception { "all real PyPI-style versions should pass the permissive pattern"); } + @Test + public void permissiveVersionPatternAcceptsSpecialSentinels() throws Exception { + // Ingestion supports special non-PyPI version sentinels (e.g. "bundled", "no-acryl-datahub"). + // These are not PEP 440 versions and must not be rejected by the cleanliness check. 
+ JsonNode root = + MAPPER.readTree( + "{\"1.5.0\": {\"snowflake\": {" + + "\"_default\": \"bundled\"," + + "\"cohorts\": [{\"version\": \"no-acryl-datahub\", \"deployments\": [\"acme\"]}]" + + "}}}"); + IngestionCliVersionMatrix m = HttpUrlIngestionCliVersionMatrixSource.parseMatrix(root); + ConnectorEntry snowflake = m.getEntriesForServer("1.5.0").getConnectorEntry("snowflake"); + assertEquals(snowflake.getDefaultVersion(), "bundled", "'bundled' sentinel must be accepted"); + assertEquals(snowflake.getCohorts().size(), 1); + assertEquals( + snowflake.getCohorts().get(0).getVersion(), + "no-acryl-datahub", + "'no-acryl-datahub' sentinel must be accepted"); + } + @Test public void connectorValueNotObjectIsSkippedOthersKept() throws Exception { // "snowflake" got assigned an array by mistake — drop it. "bigquery" is fine — keep it. From 96f7920a9473034eff78aae0c5f77db719d330f1 Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Sun, 31 May 2026 17:24:57 +0530 Subject: [PATCH 19/20] refactor(ingestion): remove superseded resolveIngestionCliVersion util MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit resolveIngestionCliVersion only did blank->default fallback with no matrix lookup. Now that all three call sites (manual, scheduled, test-connection) route through IngestionCliVersionResolutionHelper.resolve — which folds that normalization into the matrix-aware resolution — the old util has no remaining callers. Remove it (and its tests) so there is a single CLI-version resolution path and no method that silently bypasses the matrix. Addresses review feedback on PR #17436. 
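For readers who want the single resolution path in one place, here is a minimal sketch of how the blank-to-default normalization could fold into the matrix-aware chain described in the first patch of this series. It is not the actual IngestionCliVersionResolutionHelper: the class name, the method signature, the `Resolved` holder, and the `MATRIX_DEFAULT` constant are assumptions made for illustration (the series itself only names PER_SOURCE, MATRIX_COHORT, and WORKSPACE_DEFAULT explicitly).

```java
import java.util.Optional;

// Illustrative only; names and signature are hypothetical, not the helper in this PR.
final class CliVersionResolutionSketch {
  // MATRIX_DEFAULT is an assumed name for the "_default" tier.
  enum Source { PER_SOURCE, MATRIX_COHORT, MATRIX_DEFAULT, WORKSPACE_DEFAULT }

  static final class Resolved {
    final String version;
    final Source source;

    Resolved(String version, Source source) {
      this.version = version;
      this.source = source;
    }
  }

  static Resolved resolve(
      String configuredVersion, // per-source config.version, may be null or blank
      Optional<String> cohortVersion, // first matching cohort for this deployment, if any
      Optional<String> matrixDefault, // the connector's _default from the matrix, if any
      String defaultCliVersion) { // server-wide fallback
    // Tier 1: explicit per-source version; blank or null values are skipped,
    // which is the normalization the removed util used to perform.
    String trimmed = configuredVersion == null ? "" : configuredVersion.trim();
    if (!trimmed.isEmpty()) {
      return new Resolved(trimmed, Source.PER_SOURCE);
    }
    // Tier 2: cohort match from the matrix.
    if (cohortVersion.isPresent()) {
      return new Resolved(cohortVersion.get(), Source.MATRIX_COHORT);
    }
    // Tier 3: per-connector default from the matrix.
    if (matrixDefault.isPresent()) {
      return new Resolved(matrixDefault.get(), Source.MATRIX_DEFAULT);
    }
    // Tier 4: configured server default.
    return new Resolved(defaultCliVersion, Source.WORKSPACE_DEFAULT);
  }
}
```

With both matrix inputs empty, this reduces to the old blank-to-default behavior, which is why a separate util that skips the matrix is no longer needed.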
Co-Authored-By: Claude Opus 4.8 --- .../linkedin/metadata/utils/IngestionUtils.java | 14 -------------- .../metadata/utils/IngestionUtilsTest.java | 14 -------------- 2 files changed, 28 deletions(-) diff --git a/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java b/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java index fc1a9eeb8551..2ea6d8546366 100644 --- a/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java +++ b/metadata-utils/src/main/java/com/linkedin/metadata/utils/IngestionUtils.java @@ -18,20 +18,6 @@ public class IngestionUtils { private IngestionUtils() {} - /** - * Returns the CLI version to pass into an ingestion execution when the ingestion source config - * may carry an optional version (including blank strings from templated bootstrap YAML). Blank or - * null configured values fall back to the server default. - */ - @Nonnull - public static String resolveIngestionCliVersion( - @Nullable String configuredVersion, @Nonnull String defaultCliVersion) { - if (configuredVersion != null && !configuredVersion.trim().isEmpty()) { - return configuredVersion.trim(); - } - return defaultCliVersion; - } - /** * Best-effort extraction of {@code source.type} from a recipe JSON document. 
Returns {@code null} * for any malformed input — callers fall back to a default rather than failing the request, since diff --git a/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java b/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java index c029fe4ad3d5..4c89848a854f 100644 --- a/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java +++ b/metadata-utils/src/test/java/com/linkedin/metadata/utils/IngestionUtilsTest.java @@ -32,20 +32,6 @@ public void injectPipelineNameWhenThere() { assertEquals(recipe, IngestionUtils.injectPipelineName(recipe, ingestionSourceUrn)); } - @Test - public void resolveIngestionCliVersionUsesDefaultForNullBlankOrEmpty() { - String def = "0.12.3"; - assertEquals(def, IngestionUtils.resolveIngestionCliVersion(null, def)); - assertEquals(def, IngestionUtils.resolveIngestionCliVersion("", def)); - assertEquals(def, IngestionUtils.resolveIngestionCliVersion(" ", def)); - } - - @Test - public void resolveIngestionCliVersionUsesConfiguredWhenPresent() { - assertEquals("1.2.3", IngestionUtils.resolveIngestionCliVersion("1.2.3", "0.12.3")); - assertEquals("1.2.3", IngestionUtils.resolveIngestionCliVersion(" 1.2.3 ", "0.12.3")); - } - @Test public void injectPipelineNameWhenNotThere() { String recipe = From 456e90058f51624b72450a637374ffceac2e9abf Mon Sep 17 00:00:00 2001 From: Puneet Agarwal Date: Thu, 4 Jun 2026 18:32:36 +0530 Subject: [PATCH 20/20] feat(ingestion-matrix): send Accept: application/vnd.github.raw on matrix fetch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Lets the matrix URL point at the GitHub contents API (api.github.com/repos///contents/?ref=), the only authenticated way to read a file from a private/internal GitHub repo — raw.githubusercontent.com does not honor the Authorization header for those. 
With this Accept the contents API returns the raw file body instead of the base64-wrapped JSON envelope (which would otherwise parse to an empty matrix). Plain file hosts (raw URLs, gists, S3, CDNs) ignore the unknown Accept and still return the file, so it's safe for public URLs too — backward compatible with the existing gist setup. Co-Authored-By: Claude Opus 4.8 --- .../ingestion/HttpUrlIngestionCliVersionMatrixSource.java | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java index fc0f7992abc9..690a726de013 100644 --- a/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java +++ b/metadata-service/configuration/src/main/java/com/linkedin/metadata/ingestion/HttpUrlIngestionCliVersionMatrixSource.java @@ -189,6 +189,14 @@ void refresh() { HttpRequest.newBuilder(URI.create(url)) .timeout(Duration.ofMillis(FETCH_TIMEOUT_MS)) .header("User-Agent", "DataHub-GMS") + // Lets the URL point at the GitHub "contents" API + // (https://api.github.com/repos///contents/?ref=) — the only + // authenticated way to read a file from a private/internal GitHub repo, since + // raw.githubusercontent.com does not honor the Authorization header for those. With + // this Accept the contents API returns the raw file body instead of base64 JSON. + // Plain file hosts (raw URLs, gists, S3, CDNs) ignore an unknown Accept and still + // return the file, so sending it unconditionally is safe for public URLs too. + .header("Accept", "application/vnd.github.raw") .GET(); if (authHeader != null && !authHeader.isEmpty()) { reqBuilder.header("Authorization", authHeader);