Skip to content

fix: race condition in stateMap access during concurrent replication#263

Open
remo-lab wants to merge 1 commit intocontainer-registry:mainfrom
remo-lab:fix/state-map-race-condition
Open

fix: race condition in stateMap access during concurrent replication#263
remo-lab wants to merge 1 commit intocontainer-registry:mainfrom
remo-lab:fix/state-map-race-condition

Conversation

@remo-lab
Copy link

@remo-lab remo-lab commented Jan 29, 2026

Summary

Fixes a critical race condition in FetchAndReplicateStateProcess.Execute() where concurrent goroutines access and mutate f.stateMap without proper synchronization.

Problem

  1. updateStateMap() had no synchronization — directly replaced f.stateMap without holding any lock
  2. Local mutex was scoped per-call — goroutines from different scheduler cycles never synchronized
  3. Unsynchronized reads in goroutines — read f.stateMap[index] without lock during slice mutation

Impact: Panic (index out of bounds) or silent data corruption when Ground Control changes group assignments.

Solution

  1. Protected updateStateMap() with f.mu
  2. Goroutines work on a snapshot taken under lock
  3. URL-based updates instead of index-based
  4. Removed local per-call mutex

Files Changed

  • internal/state/state_process.go

Summary by cubic

Fixes a race condition in concurrent replication by synchronizing stateMap access and processing a locked snapshot in goroutines. Prevents panics and data corruption when group assignments change.

  • Bug Fixes
    • Guarded updateStateMap and all stateMap mutations with f.mu.
    • Goroutines work off a snapshot taken under lock; channel sizes match snapshot length.
    • Updates are matched by URL, not index, to handle reorders safely.
    • Removed the per-call mutex and centralized synchronization at the struct level.

Written for commit 309ccbd. Summary will update on new commits.

Summary by CodeRabbit

  • Refactor

    • Switched to snapshot-driven processing of tracked state to reduce concurrency conflicts and ensure consistent iteration.
    • Consolidated per-entry work into a single, repeatable processing flow for clearer behavior and more reliable logging.
    • Centralized state updates and strengthened synchronization to prevent race conditions.
  • Bug Fixes

    • Improved error/result handling and buffering to ensure all entry results are captured and processed.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

Reworks concurrency in internal/state/state_process.go: creates a mutex-protected snapshot of stateMap for goroutine reads, launches per-entry processors over the snapshot, and centralizes per-entry logic into processStateEntry with mutex-guarded URL-based updates and standardized error results.

Changes

Cohort / File(s) Summary
State process refactor
internal/state/state_process.go
Introduce snapshot copy of stateMap under the struct mutex; launch goroutines over the snapshot; add processStateEntry, makeErrorResult, and updateStateMapEntry; change updates to locate-by-URL and update under mutex; remove per-goroutine local mutexes; buffer stateFetcherResults to snapshot length.

Sequence Diagram

sequenceDiagram
    participant Main as Main Process
    participant Mutex as Struct Mutex
    participant Snapshot as State Snapshot
    participant Fetcher as State Fetcher Goroutines
    participant StateMap as Live stateMap

    Main->>Mutex: lock
    Mutex->>StateMap: read entries & copy
    Mutex->>Snapshot: create snapshot
    Main->>Mutex: unlock

    Main->>Fetcher: launch goroutines with snapshot entries
    Fetcher->>Snapshot: read entry (read-only)
    Fetcher->>Fetcher: processStateEntry (fetch, process, decide delete/replicate)
    Fetcher->>Main: send StateFetcherResult

    Main->>Mutex: lock
    Mutex->>StateMap: locate entry by URL and update via updateStateMapEntry
    Main->>Mutex: unlock
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested reviewers

  • bupd
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: race condition in stateMap access during concurrent replication' directly and accurately describes the main objective of the PR—fixing a race condition in the stateMap during concurrent replication, which is the core change addressed by the implementation of snapshot-based synchronization and mutex-guarded updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link

codacy-production bot commented Jan 29, 2026

Codacy's Analysis Summary

0 new issue (≤ 0 issue)
0 new security issue
5 complexity
0 duplications

Review Pull Request in Codacy →

AI Reviewer available: add the codacy-review label to get contextual insights without leaving GitHub.

@remo-lab
Copy link
Author

remo-lab commented Jan 29, 2026

Hey @bupd @Vad1mo ! This fixes a race condition that could cause crashes or data corruption during state sync. Would appreciate a review when you get a chance, thanks!

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

@remo-lab remo-lab force-pushed the fix/state-map-race-condition branch from c5f46a8 to 4719082 Compare January 29, 2026 22:59
Signed-off-by: remo-lab <remopanda7@gmail.com>
@remo-lab remo-lab force-pushed the fix/state-map-race-condition branch from 4719082 to 309ccbd Compare January 29, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant