PSMDB-1846 More improvements for shutdown handling during FCBIS#1697
PSMDB-1846 More improvements for shutdown handling during FCBIS#1697
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adjusts the file-copy-based initial sync flow when switching the storage location, aiming to make storage switching more robust during initial sync and shutdown races.
Changes:
- Allow untimestamped writes (after abandoning any active snapshot) before aborting index builds during storage location switching.
- Wrap
_switchStorageLocation()in a function-leveltry/catchto convertDBExceptions intoStatusand add debug logging on failure. - Move/add shutdown checks to run after the storage switch in the download/dummy switch callbacks to better handle shutdown initiation during the unlocked window.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // modifies _mdb_catalog.wt which requires untimestamped writes to be allowed. | ||
| // abandonSnapshot() first to ensure no active WT transaction (required by the invariant | ||
| // in allowAllUntimestampedWrites). | ||
| shard_role_details::getRecoveryUnit(opCtx)->abandonSnapshot(); | ||
| shard_role_details::getRecoveryUnit(opCtx)->allowAllUntimestampedWrites(); |
There was a problem hiding this comment.
_switchStorageLocation() now enables allowAllUntimestampedWrites() for the index-build abort path, but it doesn’t also disable replicated writes. In this same file, the other allowAllUntimestampedWrites() usage is immediately followed by UnreplicatedWritesBlock to ensure these local catalog writes don’t attempt to generate oplog entries / validation (see _truncateOplogAndDropReplicatedDatabases). Consider adding an UnreplicatedWritesBlock (or equivalent) here around the index-build abort/closeCatalog sequence so the behavior is consistent and safe during initial sync.
| const boost::optional<startup_recovery::StartupRecoveryMode> recoveryMode) try { | ||
| LOGV2_DEBUG(128469, 1, "Switching storage location", "newLocation"_attr = newLocation); | ||
| invariant(shard_role_details::getLocker(opCtx)->isW()); |
There was a problem hiding this comment.
The PR description still contains the default placeholder text (“Anything in this description will be included… Replace or delete this text before merging… Add links to testing…”). Please update the PR description (and include testing links) before merging so the commit message and review context are accurate.
This PR adjusts the file-copy-based initial sync flow when switching the storage location, aiming to make storage switching more robust during initial sync and shutdown races.
Changes:
_switchStorageLocation()in a function-leveltry/catchto convertDBExceptions intoStatusand add debug logging on failure.