fix(framework): Prevent local SuperLink SQLite bootstrap races#6797

Merged
danieljanes merged 9 commits into main from codex/fix-local-superlink-sqlite-race on Mar 23, 2026

@panh99 panh99 commented Mar 19, 2026

Summary

  • serialize first-time SQL-backed ObjectStore initialization
  • serialize first-time SQL-backed LinkState initialization
  • add concurrency regression tests for both factories
  • force DB initialization before starting gRPC servers
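
The last bullet can be illustrated with a minimal sketch. It is not the code from this PR: `LazyBackend`, `start`, and `handle` are hypothetical names, and a `ThreadPoolExecutor` stands in for the gRPC worker pool. The idea shown is only that touching each lazily initialized backend once on the calling thread, before any worker can run, leaves no first-use initialization for request handlers to race on.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyBackend:
    """Hypothetical lazily initialized backend (stand-in for SQL-backed state)."""

    def __init__(self, log):
        self._log = log
        self._ready = False

    def get(self):
        if not self._ready:
            self._log.append("init")  # stand-in for the schema bootstrap
            self._ready = True
        return self


def start(backends, handle, requests, max_workers=4):
    """Initialize every backend eagerly, then hand requests to worker threads."""
    for backend in backends:
        backend.get()  # runs on the calling thread, before any worker exists
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle, requests))


log = []
backend = LazyBackend(log)


def handle(request):
    backend.get()  # already initialized, so this is a plain read
    log.append(f"request {request}")
    return request


out = start([backend], handle, range(3))
```

Eager initialization complements the locks rather than replacing them: the lock still protects any code path that reaches a factory without going through the startup sequence.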

Bug report

╭─ Error ──────────────────────────────────────────────────────────────────────╮
│ Exception calling application: (sqlite3.OperationalError) table node already │
│ exists                                                                       │
│ [SQL:                                                                        │
│ CREATE TABLE node (                                                          │
│         node_id INTEGER,                                                     │
│         owner_aid VARCHAR,                                                   │
│         owner_name VARCHAR,                                                  │
│         status VARCHAR,                                                      │
│         registered_at VARCHAR,                                               │
│         last_activated_at VARCHAR,                                           │
│         last_deactivated_at VARCHAR,                                         │
│         unregistered_at VARCHAR,                                             │
│         online_until TIMESTAMP,                                              │
│         heartbeat_interval FLOAT,                                            │
│         public_key BLOB,                                                     │
│         UNIQUE (node_id),                                                    │
│         UNIQUE (public_key)                                                  │
│ )                                                                            │
│                                                                              │
│ ]                                                                            │
│ (Background on this error at: https://sqlalche.me/e/20/e3q8)                 │
╰──────────────────────────────────────────────────────────────────────────────╯

Possible root cause

Local SuperLink starts a SQLite-backed simulation SuperLink and then immediately spawns flower-superexec. flower-superexec polls ListAppsToLaunch while the CLI concurrently sends StartRun. Both request paths lazily initialized the same SQL-backed state without synchronization, so two gRPC worker threads could both enter Alembic bootstrap on a fresh SQLite DB. The losing initializer then failed with sqlite3.OperationalError: table node already exists.
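
The check-then-act pattern described above can be reproduced in isolation. The following is a sketch under stated assumptions, not the Flower code path: `UnsafeLazyFactory` is a hypothetical class, a plain `CREATE TABLE` stands in for the Alembic bootstrap, and a `threading.Barrier` is added purely to hold both callers inside the race window so the failure shows up on every run.

```python
import os
import sqlite3
import tempfile
import threading


class UnsafeLazyFactory:
    """Hypothetical factory that bootstraps a SQLite schema on first use, unguarded."""

    def __init__(self, db_path, gate):
        self.db_path = db_path
        self.initialized = False
        self.gate = gate  # demo-only hook: keeps callers inside the race window

    def state(self):
        if not self.initialized:  # check ...
            self.gate.wait()  # both callers have now passed the check
            conn = sqlite3.connect(self.db_path)
            try:
                # ... then act: stand-in for the real schema bootstrap
                conn.execute("CREATE TABLE node (node_id INTEGER, UNIQUE (node_id))")
                conn.commit()
            finally:
                conn.close()
            self.initialized = True
        return self.db_path


def run_race():
    """Run two concurrent first-time callers and return the errors they hit."""
    errors = []
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "state.db")
        factory = UnsafeLazyFactory(db_path, threading.Barrier(2))

        def worker():
            try:
                factory.state()
            except sqlite3.Error as exc:
                errors.append(exc)

        threads = [threading.Thread(target=worker) for _ in range(2)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return errors


errors = run_race()
```

One caller creates the table and the other fails with a `sqlite3.OperationalError`, since the table can only be created once. The exact message of the losing caller may depend on SQLite's locking behavior, so the sketch does not rely on it.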

Copilot AI review requested due to automatic review settings March 19, 2026 21:33

Copilot AI left a comment

Pull request overview

This PR addresses a real concurrency hazard in local SuperLink when two gRPC worker threads concurrently trigger first-time initialization of SQLite-backed state, leading to Alembic bootstrap races.

Changes:

  • Add locking to ObjectStoreFactory.store() to serialize first-time SqlObjectStore initialization.
  • Add locking to LinkStateFactory.state() to serialize first-time SqlLinkState initialization.
  • Add concurrency regression tests ensuring SQL-backed initialization happens exactly once under concurrent access.
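
The locking change summarized in these bullets follows a common pattern, sketched below. This is an illustration only: `LockedFactory` and `build` are hypothetical names, and the actual `ObjectStoreFactory.store()` and `LinkStateFactory.state()` implementations in the diff may differ in detail.

```python
import threading


class LockedFactory:
    """Hypothetical factory that serializes first-time initialization."""

    def __init__(self, build):
        self._build = build  # callable that performs the expensive bootstrap
        self._instance = None
        self._lock = threading.Lock()  # one lock per factory instance

    def get(self):
        # The check and the initialization happen under the same lock, so a
        # second caller blocks until the first has finished and then reuses
        # the cached instance instead of bootstrapping again.
        with self._lock:
            if self._instance is None:
                self._instance = self._build()
            return self._instance


build_calls = []


def build():
    build_calls.append(1)  # record each bootstrap for inspection
    return object()


factory = LockedFactory(build)
first = factory.get()
second = factory.get()
```

Taking the lock on every call keeps the sketch simple. A double-checked variant would skip the lock once the instance exists, at the cost of more subtle reasoning about visibility.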

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| framework/py/flwr/supercore/object_store/object_store_factory.py | Adds a per-factory lock to make first-time SQL ObjectStore initialization thread-safe. |
| framework/py/flwr/supercore/object_store/object_store_factory_test.py | Adds a concurrency test asserting only one SQL store initialization occurs under contention. |
| framework/py/flwr/server/superlink/linkstate/linkstate_factory.py | Adds a per-factory lock to make first-time SQL LinkState initialization thread-safe. |
| framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py | Adds a concurrency test asserting only one SQL state initialization occurs under contention. |

Comments suppressed due to low confidence (2)

framework/py/flwr/server/superlink/linkstate/linkstate_factory_test.py:42

  • threading.Barrier(9) is tied to creating exactly 8 worker threads plus the main thread. Consider using a num_threads variable and Barrier(num_threads + 1) to keep the test from becoming brittle or hanging if the thread count changes.
        barrier = threading.Barrier(9)
        init_calls = 0

framework/py/flwr/supercore/object_store/object_store_factory_test.py:48

  • threading.Barrier(9) is coupled to the current thread count (8 worker threads + main). Consider deriving the barrier party count from a num_threads constant (e.g., Barrier(num_threads + 1)) so the test won’t hang if the thread count is adjusted later.
        barrier = threading.Barrier(9)
        init_calls = 0
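
The suggestion in both suppressed comments could look roughly like the sketch below. The helper names (`run_contended`, `make_counted_getter`) are hypothetical and the tests in this PR are structured differently. The point is only that the barrier party count is derived from one constant rather than a literal `9`.

```python
import threading

NUM_THREADS = 8  # single source of truth for the worker count


def run_contended(get_state, num_threads=NUM_THREADS):
    """Release all workers at once and collect what get_state() returned."""
    # Parties = workers + the main thread, so changing num_threads cannot
    # leave the barrier waiting for a thread that never arrives.
    barrier = threading.Barrier(num_threads + 1)
    results = []
    results_lock = threading.Lock()

    def worker():
        barrier.wait()
        value = get_state()
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    barrier.wait()  # main thread is the final party and releases the workers
    for thread in threads:
        thread.join()
    return results


def make_counted_getter():
    """Build a lock-guarded lazy getter plus a list recording each init."""
    init_calls = []
    holder = {}
    lock = threading.Lock()

    def get_state():
        with lock:
            if "state" not in holder:
                init_calls.append(1)
                holder["state"] = object()
            return holder["state"]

    return get_state, init_calls


get_state, init_calls = make_counted_getter()
results = run_contended(get_state)
```

With this shape, raising or lowering `NUM_THREADS` adjusts the thread list and the barrier together.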

@github-actions github-actions bot added the Maintainer label ("Used to determine what PRs (mainly) come from Flower maintainers.") Mar 19, 2026
@danieljanes danieljanes enabled auto-merge (squash) March 23, 2026 18:51
@danieljanes danieljanes merged commit 71fdaa6 into main Mar 23, 2026
70 checks passed
@danieljanes danieljanes deleted the codex/fix-local-superlink-sqlite-race branch March 23, 2026 19:02
@panh99 panh99 restored the codex/fix-local-superlink-sqlite-race branch March 23, 2026 19:02