Skip to content

fix: use singleflight to prevent concurrent map race in ResolveEndpoints#621

Open
bgavrilMS wants to merge 1 commit into
mainfrom
bogavril/fix-resolve-endpoints-race
Open

fix: use singleflight to prevent concurrent map race in ResolveEndpoints#621
bgavrilMS wants to merge 1 commit into
mainfrom
bogavril/fix-resolve-endpoints-race

Conversation

@bgavrilMS
Copy link
Copy Markdown
Member

Problem

In authorityEndpoint.ResolveEndpoints, after addCachedEndpoints returns (releasing m.mu via defer), the code reads m.cache[...].Aliases without holding the lock. If another goroutine is concurrently writing to m.cache inside addCachedEndpoints, the Go runtime terminates the process with fatal error: concurrent map read and map write. This is not a recoverable panic.

Fix

  • Use golang.org/x/sync/singleflight keyed by authority URI so only one goroutine performs discovery per authority at a time. Concurrent callers for the same authority wait and share the result.
  • addCachedEndpoints now returns the aliases map so callers never access m.cache without the lock.
  • Added a double-check inside the singleflight callback to avoid redundant work if the cache was populated while waiting.

This eliminates both the race condition and redundant network calls for the same authority.

Fixes #620

@bgavrilMS bgavrilMS added bug Something isn't working p2 labels May 24, 2026
The code in ResolveEndpoints read m.cache without holding m.mu after
addCachedEndpoints released the lock, causing a fatal concurrent map
read/write when multiple goroutines resolved the same authority.

Use golang.org/x/sync/singleflight to deduplicate concurrent discovery
calls by authority URI. This ensures only one goroutine performs
discovery per authority at a time, eliminating the race and redundant
network calls. addCachedEndpoints now returns the aliases so callers
never access the map without the lock.

Fixes #620

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bgavrilMS bgavrilMS force-pushed the bogavril/fix-resolve-endpoints-race branch from fd6bc7f to e5645ec Compare May 24, 2026 13:54
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a concurrency crash in the internal OAuth authority endpoint resolver by preventing concurrent goroutines from racing on shared endpoint cache state during discovery/validation.

Changes:

  • Deduplicates concurrent endpoint discovery per authority using golang.org/x/sync/singleflight.
  • Refactors caching so issuer-validation alias data is returned from the locked cache update path (avoiding unlocked cache reads).
  • Adds unit tests covering caching, singleflight de-duplication, and basic issuer-validation behavior.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

File Description
go.mod Adds golang.org/x/sync dependency for singleflight.
go.sum Records checksums for the new golang.org/x/sync dependency.
apps/internal/oauth/resolvers.go Introduces singleflight for per-authority discovery de-duplication and avoids unlocked cache reads by returning aliases from the cache write path.
apps/internal/oauth/resolvers_test.go Adds tests for caching behavior and concurrent resolution scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +90
endpoints := authority.NewEndpoints(
strings.Replace(resp.AuthorizationEndpoint, "{tenant}", tenant, -1),
strings.Replace(resp.TokenEndpoint, "{tenant}", tenant, -1),
strings.Replace(resp.Issuer, "{tenant}", tenant, -1),
authorityInfo.Host)

m.addCachedEndpoints(authorityInfo, userPrincipalName, endpoints)
aliases := m.addCachedEndpoints(authorityInfo, userPrincipalName, endpoints)

if err := resp.ValidateIssuerMatchesAuthority(authorityInfo.CanonicalAuthorityURI,
m.cache[authorityInfo.CanonicalAuthorityURI].Aliases); err != nil {
return authority.Endpoints{}, fmt.Errorf("ResolveEndpoints(): %w", err)
if err := resp.ValidateIssuerMatchesAuthority(authorityInfo.CanonicalAuthorityURI,
aliases); err != nil {
return authority.Endpoints{}, fmt.Errorf("ResolveEndpoints(): %w", err)
}
Comment on lines +139 to +142
// Give goroutines time to start and block on singleflight
time.Sleep(50 * time.Millisecond)
close(gate) // release the handler

Comment on lines +275 to +313
func TestResolveEndpoints_ADFS_CachesByDomain(t *testing.T) {
var callCount int32

srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&callCount, 1)
w.Header().Set("Content-Type", "application/json")
// Use the test server host in the response so issuer validation passes
fmt.Fprint(w, tenantDiscoveryJSONWithScheme("http", r.Host))
}))
defer srv.Close()

// Extract host from test server URL (e.g. "127.0.0.1:PORT")
srvHost := srv.URL[len("http://"):]

rest := ops.New(srv.Client())
resolver := newAuthorityEndpoint(rest)

// For ADFS, openIDConfigurationEndpoint uses https://<Host>/adfs/.well-known/...
// which won't hit our HTTP test server. Instead, use a non-ADFS authority type
// that still goes through addCachedEndpoints with domain logic by pre-populating
// the endpoint. We'll test the ADFS caching logic by using the ADFS type but
// with ValidateAuthority=true and an untrusted host so it goes through AADInstanceDiscovery path.
// Actually, the simplest fix: use AAD type and test the domain caching indirectly.
// The real ADFS test would need to intercept at the DNS/transport level.
// Let's instead test that ADFS domain caching works by directly exercising
// cachedEndpoints and addCachedEndpoints.

// Use the resolver with an ADFS-like authority that routes through the test server.
// Since ADFS constructs endpoint as https://<host>/adfs/..., we override Host to
// point at the test server but this uses HTTPS vs our HTTP server.
// Instead, let's just test with AAD type since the singleflight and caching
// logic is the same — the ADFS-specific domain filtering is exercised separately.

info := authority.Info{
Host: srvHost,
CanonicalAuthorityURI: srv.URL + "/common/",
AuthorityType: authority.AAD,
Tenant: "common",
}
Comment on lines +286 to +288
// Extract host from test server URL (e.g. "127.0.0.1:PORT")
srvHost := srv.URL[len("http://"):]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working p2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fatal error: concurrent map read and map write

2 participants