fix(framework): Prevent local SuperLink SQLite bootstrap races #6797
Merged
danieljanes merged 9 commits into `main` on Mar 23, 2026
Pull request overview
This PR addresses a real concurrency hazard in the local SuperLink: when two gRPC worker threads concurrently trigger first-time initialization of SQLite-backed state, both can enter the Alembic schema bootstrap at the same time and race.
Changes:
- Add locking to `ObjectStoreFactory.store()` to serialize first-time `SqlObjectStore` initialization.
- Add locking to `LinkStateFactory.state()` to serialize first-time `SqlLinkState` initialization.
- Add concurrency regression tests ensuring SQL-backed initialization happens exactly once under concurrent access.
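The locking change can be illustrated with a minimal sketch. `LazySqlFactory`, `get()`, and the `build` callable are hypothetical names, not Flower's `ObjectStoreFactory` or `LinkStateFactory` code; the sketch only assumes that each factory owns one `threading.Lock` and holds it across both the "already built?" check and the build itself, which is what serializing first-time initialization requires.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySqlFactory(Generic[T]):
    """Hypothetical factory that builds a shared SQL-backed object at most once."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build  # assumed to run the (slow) schema bootstrap
        self._instance: Optional[T] = None
        self._lock = threading.Lock()  # one lock per factory instance

    def get(self) -> T:
        # The lock covers the "is it built?" check AND the build, so a second
        # thread cannot start its own bootstrap while the first is still
        # running; it waits, then reuses the finished instance.
        with self._lock:
            if self._instance is None:
                self._instance = self._build()
            return self._instance
```

Holding the lock on every call, rather than adding a double-checked fast path, keeps the sketch simple; an uncontended `threading.Lock` acquisition is cheap next to a database call.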
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `framework/py/flwr/supercore/object_store/object_store_factory.py` | Adds a per-factory lock to make first-time SQL ObjectStore initialization thread-safe. |
| `framework/py/flwr/supercore/object_store/object_store_factory_test.py` | Adds a concurrency test asserting only one SQL store initialization occurs under contention. |
| `framework/py/flwr/server/superlink/linkstate/linkstate_factory.py` | Adds a per-factory lock to make first-time SQL LinkState initialization thread-safe. |
| `framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py` | Adds a concurrency test asserting only one SQL state initialization occurs under contention. |
Comments suppressed due to low confidence (2)
`framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py:42`

`threading.Barrier(9)` is tied to creating exactly 8 worker threads plus the main thread. Consider using a `num_threads` variable and `Barrier(num_threads + 1)` to keep the test from becoming brittle or hanging if the thread count changes.

```python
barrier = threading.Barrier(9)
init_calls = 0
```
`framework/py/flwr/supercore/object_store/object_store_factory_test.py:48`

`threading.Barrier(9)` is coupled to the current thread count (8 worker threads + main). Consider deriving the barrier party count from a `num_threads` constant (e.g., `Barrier(num_threads + 1)`) so the test won't hang if the thread count is adjusted later.

```python
barrier = threading.Barrier(9)
init_calls = 0
```
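Following the review suggestion, a concurrency test can derive the barrier size from a single thread-count constant. The sketch below is hypothetical: `CountingFactory`, `store()`, and `run_concurrent_store_calls()` stand in for the real factories and tests, and `time.sleep` simulates a slow bootstrap to widen the race window. The main thread is the extra barrier party, so the workers are released together only once it arrives too.

```python
from __future__ import annotations

import threading
import time

NUM_THREADS = 8


class CountingFactory:
    """Hypothetical stand-in for a lock-guarded SQL factory."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store: object | None = None
        self.init_calls = 0

    def _init_store(self) -> object:
        self.init_calls += 1
        time.sleep(0.01)  # simulate a slow schema bootstrap
        return object()

    def store(self) -> object:
        with self._lock:
            if self._store is None:
                self._store = self._init_store()
            return self._store


def run_concurrent_store_calls(num_threads: int = NUM_THREADS) -> tuple[int, int]:
    """Return (init_calls, number of distinct store objects seen by workers)."""
    factory = CountingFactory()
    # Party count derived from num_threads: every worker plus the main thread.
    barrier = threading.Barrier(num_threads + 1)
    results: list[object] = []

    def worker() -> None:
        barrier.wait()  # start all store() calls at (nearly) the same moment
        results.append(factory.store())  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    barrier.wait(timeout=5)  # main thread is the last party; timeout avoids a hang
    for thread in threads:
        thread.join(timeout=5)
    return factory.init_calls, len({id(store) for store in results})
```

Changing `NUM_THREADS`, or passing a different `num_threads`, keeps the barrier size consistent automatically, which addresses the brittleness both comments point out.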
`framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py` (outdated, resolved)
`framework/py/flwr/supercore/object_store/object_store_factory_test.py` (outdated, resolved)
panh99 commented on Mar 19, 2026

`framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py` (outdated, resolved)
msheller reviewed on Mar 23, 2026

danieljanes approved these changes on Mar 23, 2026
Summary
Bug report
Possible root cause
Local SuperLink starts a SQLite-backed simulation SuperLink and then immediately spawns `flower-superexec`. `flower-superexec` polls `ListAppsToLaunch` while the CLI concurrently sends `StartRun`. Both request paths lazily initialized the same SQL-backed state without synchronization, so two gRPC worker threads could both enter Alembic bootstrap on a fresh SQLite DB. The losing initializer then failed with `sqlite3.OperationalError: table node already exists`.
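The failure mode above can be reproduced outside Flower with a small, hypothetical sketch that uses only the standard-library `sqlite3` module in place of Alembic. Each thread opens its own connection to a fresh database file, checks whether a `node` table exists, and then, after a `threading.Barrier` forces both checks to finish first, tries to create it. The table name matches the reported error; the schema and the `bootstrap_without_lock` / `reproduce_race` helpers are illustrative assumptions.

```python
import os
import sqlite3
import tempfile
import threading
from typing import List, Tuple


def bootstrap_without_lock(
    db_path: str, barrier: threading.Barrier, successes: List[str], errors: List[str]
) -> None:
    """Hypothetical lazy initializer with an unsynchronized check-then-create."""
    conn = sqlite3.connect(db_path, timeout=5.0)  # one connection per thread
    try:
        # fetchall() drains the cursor so no read lock lingers on the file.
        exists = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'node'"
        ).fetchall()
        # Both threads pass the "already initialized?" check before either
        # creates the schema: the unlucky interleaving from the bug report.
        barrier.wait()
        if not exists:
            conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY)")
            successes.append(threading.current_thread().name)
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))
    finally:
        conn.close()


def reproduce_race() -> Tuple[List[str], List[str]]:
    successes: List[str] = []
    errors: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "state.db")  # fresh SQLite DB
        barrier = threading.Barrier(2)
        threads = [
            threading.Thread(
                target=bootstrap_without_lock,
                args=(db_path, barrier, successes, errors),
            )
            for _ in range(2)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return successes, errors


if __name__ == "__main__":
    ok, failed = reproduce_race()
    print(len(ok), failed)
```

With this interleaving, one thread is expected to create the table while the other fails with `table node already exists`; guarding the check-and-create with a shared lock, as the PR does in the factories, removes the race.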