Connect-V2, SDK acceptance-test framework, and the 2.0/v2 migration #943
Merged
6b8dafb to 5c5a659
Comment on lines +604 to +607
    ers := resp.Payload.Data.EdgeRouters
    for _, er := range ers {
        self.sanitizeEdgeRouterUrls(er)
    }
Member
Should probably nil check resp.Payload/Data
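A minimal sketch of the guard this comment asks for. The types below are hypothetical stand-ins for the generated client response; only the field names `Payload`, `Data`, and `EdgeRouters` come from the snippet under review, and the helper name `edgeRoutersOf` is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the generated response types.
type edgeRouter struct{ Name string }
type listData struct{ EdgeRouters []*edgeRouter }
type listEnvelope struct{ Data *listData }
type listResponse struct{ Payload *listEnvelope }

// edgeRoutersOf returns the edge-router list, or an error when any level of
// the response is nil, so callers never dereference a missing Payload or Data.
func edgeRoutersOf(resp *listResponse) ([]*edgeRouter, error) {
	if resp == nil || resp.Payload == nil || resp.Payload.Data == nil {
		return nil, errors.New("edge-router list response has no payload data")
	}
	return resp.Payload.Data.EdgeRouters, nil
}

func main() {
	// An empty response yields an error instead of a nil-pointer panic.
	_, err := edgeRoutersOf(&listResponse{})
	fmt.Println(err)
}
```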
    @@ -1 +1 @@
    1.8
Member
I am confused now. 1.9 here but 2.0 in the changelog?
Member
Author
fixed, it's all v2.0.0 now
andrewpmartinez previously requested changes, Jun 15, 2026
andrewpmartinez (Member) left a comment:
The 2.0 changelog, vs the 1.9 version file, vs the breaking changes need to be resolved.
Implements the Connect-V2 sessionless SDK dial protocol — dials are
authorized at the router via RDM instead of requiring a controller-issued
service session token.
- Adds `CtrlClient.GetServiceEdgeRouters` wrapping the existing
`GET /edge/client/v1/services/{id}/edge-routers` endpoint for
sessionless edge-router discovery.
- Adds a per-service ER cache on `ContextImpl.serviceEdgeRouters`,
refreshed on the `sessionRefreshTimer` cadence and warming router
connections the same way the session cache does. Invalidates on
service removal and on dial failure.
- Refactors `dialSession` to skip service-session creation on the V2
path; V1 fallback now creates the service session lazily, only when
the V1 wire is actually taken.
- Forces `UseXgressToSdkHeader=true` on Connect-V2 dials (the go-SDK
V2 path is always `edgeConnXgress`); the router continues to support
both flow-control modes for other SDKs.
- Adds `DialOptions.ForceConnectV1` escape hatch to route through the
V1 path even when the router advertises V2.
- Removes `route_circuit` / `pending_dials`. The xgress `CircuitStart`
handshake already serializes data flow, so the race window the
module was guarding against does not exist. Reads the circuit ID
from `state_connected`'s `CircuitIdHeader` instead.
- Registers a buffering sink with the mux before each dial request goes
out (all three dial paths), then atomically swaps in the built conn
via the new `ConnMux.Replace`, replaying anything buffered in order.
The hosting side sends its e2e crypto header the moment it accepts,
and that data could arrive before the dialing goroutine resumed from
`SendForReply` and registered the conn — the mux dropped it, killing
the read side with "failed to receive crypto header bytes".
- Uses the sessionless ER-list endpoint on controllers 1.0.0 and newer
(`0.0.0` dev builds included), gated by
`CtrlClient.supportsServiceEdgeRouterList`. Older controllers don't
expose the endpoint on the client API; for those the SDK falls back
to creating a dial session and using its attached edge routers, with
the session cache as the source of truth. While the controller
version is unknown, dials take the V1 path and the version load is
retried on the next capability check. If the sessionless endpoint
errors despite the version saying it should exist, the dial falls
back to the V1 session path at runtime and drops the cached ER list.
Documents the supported controller set (the 1.6.x and 2.0.0 LTS
releases) in CHANGELOG and bumps the version from 1.7 to 2.0.
- Creates fallback sessions on the dial path via
`createSessionWithBackoff`, so foreground dials get ctx
cancellation, retry with backoff, re-auth on 401 and
service-recreation handling; the background refresh loop uses a
plain cached lookup.
- Restructures `createSessionWithBackoff` to run the retry before
returning the session, removing a dead pre-retry cache call and a
return statement that relied on operand evaluation order.
- Adds `EventDial` / `AddDialListener`: one event per dial attempt,
successful or not, emitted from the dial path where the V1/V2 decision
is made — so the negotiated protocol, target router, forced-V1 flag,
timing, circuit id, and failure cause are observable without tracking
any state on connections.
- Removes `GetRouterId()` from `edge.Conn` and `edge.MsgChannel`;
documents the removal as a breaking change in CHANGELOG.
- Bounds-checks `splitMultipart` length prefixes; returns descriptive
errors instead of panicking on truncated input.
- Fixes `handlePayloadWithNoSink` to send the constructed ack instead
of the original message.
- Removes the `conn.Close()` calls from `setupXgressFlowControl`'s
header-validation error paths so the conn no longer NPEs before
`xg` / `writeAdapter` are populated.
For openziti/ziti#3884.
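The buffering-sink item above can be pictured with a small sketch. This is not the SDK's `ConnMux`; the types `mux`, `bufferingSink`, and `recordingConn` are hypothetical and messages are plain strings. It only illustrates the idea of registering a placeholder before the dial request goes out, then swapping in the real conn under the same lock that guards dispatch, so early data is replayed in order rather than dropped.

```go
package main

import (
	"fmt"
	"sync"
)

type sink interface{ Accept(msg string) }

// bufferingSink holds messages that arrive before the real conn exists.
type bufferingSink struct{ pending []string }

func (b *bufferingSink) Accept(msg string) { b.pending = append(b.pending, msg) }

// recordingConn stands in for the fully built connection.
type recordingConn struct{ got []string }

func (c *recordingConn) Accept(msg string) { c.got = append(c.got, msg) }

type mux struct {
	mu    sync.Mutex
	sinks map[uint32]sink
}

func newMux() *mux { return &mux{sinks: map[uint32]sink{}} }

func (m *mux) Add(id uint32, s sink) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.sinks[id] = s
}

// Dispatch delivers under the lock, so it cannot interleave with Replace.
func (m *mux) Dispatch(id uint32, msg string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.sinks[id]
	if ok {
		s.Accept(msg)
	}
	return ok
}

// Replace swaps in next and replays anything buffered, in arrival order,
// while still holding the lock.
func (m *mux) Replace(id uint32, next sink) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if b, ok := m.sinks[id].(*bufferingSink); ok {
		for _, msg := range b.pending {
			next.Accept(msg)
		}
	}
	m.sinks[id] = next
}

func main() {
	m := newMux()
	m.Add(1, &bufferingSink{})      // registered before the dial request is sent
	m.Dispatch(1, "crypto-header")  // arrives before the conn is built
	conn := &recordingConn{}
	m.Replace(1, conn)              // replay, then take over
	m.Dispatch(1, "payload")
	fmt.Println(conn.got)
}
```

Holding one lock across both dispatch and replacement is the simplest way to get the ordering guarantee; a real mux may use a finer-grained scheme.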
- adds acceptance-tests.md, the design for correctness-testing the SDK against multiple OpenZiti versions (LTS lines, latest release, branches/commits)
- adds the acceptance/ module scaffold with its own go.mod and a replace directive targeting the local SDK
- adds versions.yaml (label -> release pointers, source repo) with a strict-decode loader
- classifies ZITI_ACCEPTANCE_VERSION selectors into labels, release versions, and git refs
- resolves labels and release versions to concrete tags via the GitHub releases API, excluding drafts and prereleases and handling vM.m.x minor wildcards
- uses golang.org/x/mod/semver for version comparison
- tests resolution against unsorted, prerelease, draft, and paginated release fixtures
- ignores local review-tooling files (mercurius, .mcp.json)

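To illustrate the `vM.m.x` minor-wildcard handling, here is a standard-library-only sketch. The commit itself uses golang.org/x/mod/semver for comparison; this sketch swaps in a plain integer parse of the patch component instead, and the function name `resolveMinorWildcard` and the tag list are made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMinorWildcard picks the highest patch release matching a selector
// such as "v2.0.x". Tags carrying a prerelease suffix (for example
// "v2.0.11-rc1") fail the integer parse and are skipped.
func resolveMinorWildcard(selector string, tags []string) (string, bool) {
	prefix, ok := strings.CutSuffix(selector, "x")
	if !ok {
		return "", false // not a wildcard selector
	}
	best, bestPatch := "", -1
	for _, tag := range tags {
		rest, found := strings.CutPrefix(tag, prefix)
		if !found {
			continue
		}
		patch, err := strconv.Atoi(rest)
		if err != nil || patch < 0 {
			continue
		}
		if patch > bestPatch {
			best, bestPatch = tag, patch
		}
	}
	return best, best != ""
}

func main() {
	// Deliberately unsorted, with a prerelease and other minor lines mixed in.
	tags := []string{"v2.0.1", "v2.0.10", "v2.0.11-rc1", "v2.1.0", "v1.6.17"}
	fmt.Println(resolveMinorWildcard("v2.0.x", tags))
}
```

Comparing the patch numerically matters: a string comparison would rank `v2.0.10` below `v2.0.9`.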
- extends the GitHub client with release-by-tag asset lookup and download, resolving asset URLs from the API rather than constructing filenames
- extracts the ziti binary from release tar.gz archives at any path depth
- caches binaries keyed by immutable id (tag or SHA) with atomic install; ZITI_ACCEPTANCE_CACHE overrides the location
- selects platform assets with arch aliases (amd64/x86_64, arm64/aarch64), reporting zip-only matches distinctly
- tests download/extract/cache behavior against a counting fake release server
- adds an opt-in live test against the real GitHub API, gated on ZITI_ACCEPTANCE_LIVE=1

- adds an open item: promote internal/acquire to a sibling nested module (acquire/vX.Y.Z tag line) once the harness and source-build phases prove its API, so ziti/zititest can replace stageziti's GH-release fetch core with it
- records the hard rule that acquire imports no SDK packages, keeping the extraction mechanical

- adds the ziticli exec wrapper: every invocation runs with an isolated ZITI_CONFIG_DIR and a scrubbed ZITI_* environment, so harness logins never collide with the developer's own CLI state
- launches `ziti edge quickstart --no-router` as a long-lived controller-only child process with dynamic port allocation and captured logs
- gates readiness on the bootstrap contract: HTTPS 200 plus admin login plus a harmless admin operation, reporting an admin-gate failure as a directed bootstrap contract violation naming the version, with a controller log tail
- adds the harness package: StartShared/Start, Cli, Version with AtLeast (source builds satisfy every minimum), RequireMinVersion
- tags the bring-up test with the acceptance build tag so the default suite stays network-free; verified live against latest (v2.0.0), active-lts (v2.0.x wildcard), and maint-lts (v1.6.17)

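One way to picture the scrubbed environment is a filter over the parent's variables. This is a sketch under assumptions, not the ziticli wrapper's code: `scrubbedEnv` is a hypothetical helper, and the only names taken from the commit are the `ZITI_` prefix and `ZITI_CONFIG_DIR`.

```go
package main

import (
	"fmt"
	"strings"
)

// scrubbedEnv drops every ZITI_* variable from base and then pins
// ZITI_CONFIG_DIR to configDir, so a child process sees only the
// isolated configuration directory.
func scrubbedEnv(base []string, configDir string) []string {
	out := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if strings.HasPrefix(kv, "ZITI_") {
			continue
		}
		out = append(out, kv)
	}
	return append(out, "ZITI_CONFIG_DIR="+configDir)
}

func main() {
	base := []string{"PATH=/usr/bin", "ZITI_CONFIG_DIR=/home/dev/.config/ziti", "ZITI_HOME=/opt/ziti"}
	fmt.Println(scrubbedEnv(base, "/tmp/harness-1"))
}
```

In practice the result would be assigned to the `Env` field of an `exec.Cmd`, with `os.Environ()` as the base.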
- adds an open item to source LTS labels from the ziti repo's lts-versions.json once openziti/ziti#3962 merges, keeping our own resolution logic and versions.yaml for source/repo and overrides
- adds CreateIdentity: creates and enrolls identities via the versioned CLI per the setup contract, with per-test unique names and best-effort cleanup per the isolation contract
- adds NewSdkContext: builds an authenticated ziti.Context from a CLI-enrolled identity using the SDK in this tree via the module replace directive
- adds the first SDK smoke test: authenticate, list services, and current-identity round trip; verified live against latest (v2.0.0) and maint-lts (v1.6.17)

- adds cmd/matrix: runs the tagged acceptance suite once per version selector with a per-version pass/fail summary and non-zero exit on any failure
- defaults the selector list to the versions.yaml labels plus latest, so the matrix has a single source of truth; arguments select a subset and anything after -- passes through to go test (e.g. -run for one test across all versions)
- forces -count=1 since go test's cache cannot see controller-side state
- supports -fail-fast to stop at the first failing version
- exports acquire.FindVersionsFile and reuses it from the harness

- documents quick start, version selection, the matrix runner, the binary cache, opt-in live tests, module layout, and platform notes
- links the acceptance module from the top-level README package list

- adds AddRouter: creates, enrolls, and runs an edge router as its own child process via the version-stable CLI sequence, with config generation driven by ZITI_* env vars verified identical on 1.6 and main
- gates router readiness on TCP listen plus controller-reported online status, and supports Stop/Start for failover tests without re-enrollment
- adds service and policy helpers (CreateService, GrantDial/GrantBind, GrantRouterAccess, GrantServiceRouterAccess) that target entities by name per the isolation contract
- adds Test_DialHostEcho: SDK hosts and dials through the router with bounded first-dial retry, exercising half-close and EOF propagation in both directions
- verified live against latest (v2.0.0) and maint-lts (v1.6.17); full-matrix runs surfaced an intermittent SDK data-plane race, tracked separately

- skips the API entirely for selectors that pin a concrete tag (explicit versions and non-wildcard label values) when the binary is already cached; the cache entry is proof the release exists, so warm pinned runs make zero API calls
- adds acquire.ZitiMemoized, used by the harness: one resolution per selector per process, so a suite whose tests each start a harness stays off the rate limit and on a consistent version; failures are not cached
- renames acquire.Acquire to acquire.Ziti so call sites read without stutter (acquire.Ziti, acquire.ZitiMemoized)
- directs rate-limit failures: a 403 rate-limit response now names GITHUB_TOKEN as the fix
- documents the rate-limit behavior in the README
- tests the shortcut and memo against an all-erroring source (proving zero API calls) and the 403 hint against a fake server

- adds acceptance/tests with TestMain bringing up one shared environment per package via StartShared, per the design's Layer 5 model; per-test cost drops from a controller boot each to under a few seconds
- StartShared now materializes the default topology (controller plus the edge1 router); AddRouter and Router.Start become thin testing.TB wrappers over error-returning internals so TestMain has a TB-free bring-up path
- P0 #1: the dial/host echo smoke gains service-discovery content assertions (each identity sees the service with exactly its granted permissions, lookup by name agrees) alongside half-close/EOF and the dial-event protocol check
- P0 #1b: adds the SDK enrollment round-trip test, the one place enroll.Enroll is the system under test; adds CreateUnenrolledIdentity to support it
- P0 #3: adds the auth-modes tests: OIDC and forced-legacy contexts each complete an echo round trip with the session type asserted via the new NewLegacySdkContext and ApiSessionType helpers, and ext-JWT works as a primary credential via a fully headless flow (locally generated signer registered through the versioned CLI, locally minted JWT, JwtCredentials login, identity-match and GetExternalSigners assertions)
- extracts shared test helpers (echo server, echo round trip, dial retry); migrates the smoke tests from the harness package, which keeps only the per-version bring-up canary
- verified live against latest (v2.0.0) and maint-lts (v1.6.17); both lines negotiate OIDC by default and force legacy correctly

- resolves branch/tag refs to a full commit SHA via git ls-remote before any cache interaction, then shallow-fetches exactly that SHA so a moving branch cannot change what gets built
- builds the ziti binary from the checkout and installs it into the cache keyed by SHA
- adds co-development mode (ZITI_ACCEPTANCE_BUILD_WITH_LOCAL_SDK=true): replaces the ref's pinned sdk-golang with the local SDK tree, for ziti branches developed in lockstep with SDK branches; cache keys then include the local SDK commit, and a dirty SDK tree bypasses the cache so iteration never serves a stale binary
- pure-build compile failures carry a directed hint naming the co-development env var
- carries SourceBuilt through ResolvedID and Version (short-SHA display; source builds satisfy every version minimum)
- verified live: built openziti/ziti@connect-v2 against the local SDK and ran the bring-up canary against it

- adds Test_DialProtocolNegotiation: the SDK must take ConnectV2 exactly when the router advertises the capability and the session is OIDC, else legacy V1, asserted against the observed dial event (never inferred from a version), with the ForceConnectV1 escape hatch checked too
- adds ZITI_ACCEPTANCE_REQUIRE_V2: required mode fails if the environment can't exercise ConnectV2, so the dedicated CI job can't go green by adaptively passing on V1
- adds harness support: RouterSupportsConnectV2 (from the new SupportsConnectV2 field in router inspect), RequireV2, dialWithOptionsRetry, expectedDialProtocol
- the echo host now advertises SDK-hosted xgress on bind, so dials to it run SDK xgress on both ends where the router supports it (older routers fall back to a legacy terminator); the smoke test's protocol expectation is now capability-driven
- adds ZITI_ACCEPTANCE_DEBUG for SDK and router debug logging, and bounded read deadlines so a data-plane stall fails in seconds with a directed message rather than a test timeout
- replaces design-doc shorthand in test comments with descriptions of the actual behavior
- verified against latest and maint-lts (full suite, legacy terminator fallback) and source-built connect-v2 (required-V2 mode)

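The negotiation rule stated in the first item reduces to a small truth table. The sketch below encodes it as a pure function; `expectedProtocol`, its parameters, and the string constants are hypothetical and are not the harness's `expectedDialProtocol` helper, whose signature is not shown in this PR text.

```go
package main

import "fmt"

type dialProtocol string

const (
	protoV1 dialProtocol = "ConnectV1"
	protoV2 dialProtocol = "ConnectV2"
)

// expectedProtocol returns ConnectV2 only when the router advertises the
// capability, the API session is OIDC, and the caller has not forced V1.
// Every other combination is expected to take the legacy V1 path.
func expectedProtocol(routerSupportsV2, sessionIsOIDC, forceV1 bool) dialProtocol {
	if routerSupportsV2 && sessionIsOIDC && !forceV1 {
		return protoV2
	}
	return protoV1
}

func main() {
	fmt.Println(expectedProtocol(true, true, false))  // capable router, OIDC session
	fmt.Println(expectedProtocol(true, false, false)) // legacy session
	fmt.Println(expectedProtocol(true, true, true))   // escape hatch set
}
```

A test would compare this expectation against the protocol reported in the observed dial event, rather than inferring it from a version string.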
- splits startEchoServerFC out of startEchoServer with an explicit sdkXgress parameter, so tests can host a legacy (non-xgress) terminator the router bridges to
- keeps startEchoServer defaulting to SDK-hosted xgress, the path the suite primarily exercises

…ixes #952

Moving half-close into xgress replaced the edge FIN flag with the native xgress EOF flag. Peers that don't honor native EOF (an older router bridging to a legacy edge host, or an older SDK) never see the half-close, so a host that reads to EOF starves and the circuit stalls.
- restores the legacy signal in edgeConnXgress.CloseWrite: when the peer has not negotiated native EOF, half-close rides as a payload header the terminating router maps back onto edge.FlagsHeader, so the host sees an ordinary edge FIN; the native EOF flag is still used when the peer supports it
- exports Xgress.PeerSupportsEOF and splits CloseSendBufferWhenEmpty out of CloseRxTimeout so the edge layer can end its send half without emitting the native EOF that would tear the whole circuit down
- adds Test_HalfClose_XgressClientToLegacyHost, an acceptance regression for an xgress client half-closing to a router-bridged legacy host

- adds .github/workflows/acceptance.yml, running the SDK acceptance suite on pull requests and pushes to main
- runs a compatibility matrix over the active-lts, maint-lts, latest, and main selectors in adaptive mode, guaranteeing V1 compatibility across the supported lines; the label-to-version mapping stays in the versions.yaml manifest
- adds a dedicated connect-v2 job that builds the connect-v2 ziti branch against the SDK under test and runs in required mode, so V2 + OIDC coverage can't silently downgrade to a V1 pass
- caches acquired ziti binaries by their immutable id and serializes package test binaries so they share one acquisition
- records the V2 job's co-development build rationale in acceptance-tests.md

- changes the module path to github.com/openziti/sdk-golang/v2 and rewrites all import paths accordingly, per Go semantic import versioning
- repoints the acceptance and example modules' SDK require/replace and the acceptance co-development build to the /v2 path
- regenerates edge_client.pb.go for the updated go_package option
- bumps the version file to 2.0
- records the /v2 path change as a breaking change and lists the connect-v2, acceptance-framework (#951), and half-close (#952) issues in the changelog

The SDK moved to channel/v5 (the 2.0 line) and shares the sdk-golang/xgress package with ziti, so the co-development build cannot compile ziti connect-v2, which is still on channel/v4, against the SDK under test. ziti's channel/v5 migration is in progress but not yet landed or merged with connect-v2.
- disables the connect-v2 V2-coverage job with `if: false`, documenting how and when to re-enable it

I've added the AT framework as well, so starting fresh
andrewpmartinez approved these changes, Jun 22, 2026
Also in this PR (the 2.0 release)
This PR is now the 2.0 release. On top of Connect-V2 it lands the SDK acceptance-test framework, the xgress half-close back-compat fix, and the `/v2` module migration.
- SDK acceptance-test framework: see `acceptance-tests.md`. Fixes #951 (Add an SDK acceptance-test framework).
- `/v2` module migration: the module path is now `github.com/openziti/sdk-golang/v2`; consumers update their imports to the `/v2` form. See CHANGELOG.

Note: the dedicated Connect-V2 CI job builds the ziti `connect-v2` branch against this SDK, so it needs that branch to also import `sdk-golang/v2`.

Fixes #936
Fixes #951
Fixes #952