feat: Graceful Shutdown with State Persistence #258

Merged
bupd merged 11 commits into container-registry:main from intojhanurag:feat/graceful-shutdown-persistance
Feb 10, 2026

Conversation

@intojhanurag (Contributor) commented Jan 29, 2026

Description

I fixed the issue as described in the issue description.

Closes #229

Summary by CodeRabbit

  • New Features

    • Added a shutdown-timeout option (defaults to 30s) to configure graceful shutdown duration.
  • Improvements

    • Shutdown now performs a bounded graceful sequence, waiting for in-progress work and forcing exit on timeout.
    • Background tasks and schedulers now respect cancellation and are waited on during shutdown.
    • Shutdown includes richer observable logging for clearer diagnostics.

@coderabbitai bot (Contributor) commented Jan 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a CLI/env shutdown-timeout and implements a bounded graceful shutdown: main creates a timeout context, runs satellite shutdown via Satellite.Stop(ctx), schedulers wait for tracked goroutines to finish, and replicator loops short-circuit on context cancellation; run now calls gracefulShutdown(...).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Main CLI / Orchestration: `cmd/main.go` | Adds --shutdown-timeout / SHUTDOWN_TIMEOUT, threads shutdownTimeout into run(...), and introduces gracefulShutdown(...) to coordinate a bounded shutdown using a timeout context. |
| Satellite Shutdown Logic: `internal/satellite/satellite.go` | Changes Stop() to Stop(ctx context.Context), adds context-aware shutdown logging, per-scheduler stop calls with error logging, and overall shutdown result reporting. |
| Scheduler goroutine coordination: `internal/scheduler/scheduler.go` | Adds wg sync.WaitGroup to Scheduler; Run/launchProcess increment/decrement the waitgroup; Stop(ctx) waits for tracked goroutines to finish and returns the ctx error if cancelled or timed out. |
| Replicator context checks: `internal/state/replicator.go` | Adds context cancellation checks at the start of per-item loops in Replicate and DeleteReplicationEntity, logging and returning early if the context is done. |
| API / Signatures: `cmd/main.go`, `internal/...` | Updates the run(...) signature to accept shutdownTimeout string; introduces the unexported helper gracefulShutdown(ctx, log, s, wg, shutdownTimeout string) error. |

Sequence Diagram(s)

sequenceDiagram
    participant OS as OS
    participant Main as cmd/main.go
    participant Satellite as Satellite
    participant Scheduler as Scheduler(s)
    participant Replicator as Replicator

    OS->>Main: termination signal
    Main->>Main: parse --shutdown-timeout / create timeout ctx
    Main->>Satellite: Satellite.Stop(ctx)
    Satellite->>Satellite: log scheduler_count
    Satellite->>Scheduler: Stop(ctx) for each scheduler
    Scheduler->>Scheduler: stop scheduling new tasks
    Scheduler->>Scheduler: wait for wg (running goroutines)
    Replicator->>Replicator: per-item loops check ctx.Done()
    alt within timeout
        Scheduler->>Satellite: stopped
        Satellite->>Main: return success
    else timeout
        Scheduler->>Satellite: returns ctx error
        Satellite->>Main: return error -> exit non-zero
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement

Suggested reviewers

  • Vad1mo
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description is incomplete: it references issue #229 but lacks details about what was changed, why, or how the changes address the issue. | Expand the description to explain the problem being solved, key changes made (graceful shutdown logic, context handling, waitgroup coordination), and how they address issue #229. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: graceful shutdown implementation with state persistence support across multiple components. |

@codacy-production bot commented Jan 29, 2026

Codacy's Analysis Summary

2 new issues (≤ 0 issue)
0 new security issue
41 complexity
5 duplications

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch from c1dfbb4 to 0c340f8 on January 29, 2026 13:36

@cubic-dev-ai bot (Contributor) left a comment

No issues found across 3 files

@intojhanurag (Contributor, Author) commented Jan 29, 2026

Hey @bupd, how can we stop this spammy PR review bot, Codacy? 😭 I don't think this bot is from Harbor.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/main.go`:
- Around line 191-211: The shutdown path currently waits only on wg.Wait()
(errgroup) but not on scheduler goroutines started in satellite.Run; make
Scheduler provide explicit synchronization and block Stop until all its internal
goroutines and spawned process goroutines complete. Add a WaitGroup (or a done
channel) to the Scheduler struct, increment it in Scheduler.Run and around
launchProcess() invocations, and decrement when those goroutines finish; change
Scheduler.Stop to signal cancellation and then wait on that internal WaitGroup
(or close/wait on the done channel) before returning. Finally, update main
shutdown logic to call Scheduler.Stop() (or Scheduler.Wait()) and only consider
shutdownDone after the scheduler's wait has returned so that wg.Wait,
Scheduler.Stop/Wait, and shutdownCtx are all coordinated.
🧹 Nitpick comments (1)
cmd/main.go (1)

99-215: Consider extracting graceful shutdown into a helper to satisfy funlen.

run now exceeds the configured function-length threshold. A small helper for shutdown logic should keep the linter happy and simplify run.

♻️ Proposed refactor (extract graceful shutdown)
 func run(jsonLogging bool, token, groundControlURL string, shutdownTimeout string) error {
@@
-	// Wait until context is cancelled
-	<-ctx.Done()
-
-	// Graceful shutdown with timeout
-	shutdownDuration, err := time.ParseDuration(shutdownTimeout)
-	if err != nil {
-		log.Warn().Err(err).Str("shutdownTimeout", shutdownTimeout).
-			Msg("Invalid shutdown timeout, defaulting to 30s")
-		shutdownDuration = 30 * time.Second
-	}
-
-	log.Info().Dur("timeout", shutdownDuration).
-		Msg("Received shutdown signal, initiating graceful shutdown")
-
-	// Create a shutdown context with timeout
-	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), shutdownDuration)
-	defer shutdownCancel()
-
-	// Stop schedulers to prevent new tasks from being accepted
-	log.Info().Msg("Stopping schedulers to prevent new replication tasks")
-	s.Stop(ctx)
-
-	// Wait for in-progress tasks with timeout
-	log.Info().Msg("Waiting for in-progress replication tasks to complete")
-	shutdownDone := make(chan struct{})
-	go func() {
-		err := wg.Wait()
-		if err != nil {
-			log.Error().Err(err).Msg("Error waiting for goroutines during shutdown")
-		}
-		close(shutdownDone)
-	}()
-
-	select {
-	case <-shutdownDone:
-		log.Info().Msg("Graceful shutdown completed successfully")
-	case <-shutdownCtx.Done():
-		log.Warn().Msg("Shutdown timeout exceeded, forcing exit")
-		return fmt.Errorf("graceful shutdown timeout exceeded")
-	}
-
-	return nil
+	return gracefulShutdown(ctx, log, s, wg, shutdownTimeout)
 }
+
+func gracefulShutdown(ctx context.Context, log *zerolog.Logger, s *satellite.Satellite, wg *errgroup.Group, shutdownTimeout string) error {
+	// Wait until context is cancelled
+	<-ctx.Done()
+
+	shutdownDuration, err := time.ParseDuration(shutdownTimeout)
+	if err != nil {
+		log.Warn().Err(err).Str("shutdownTimeout", shutdownTimeout).
+			Msg("Invalid shutdown timeout, defaulting to 30s")
+		shutdownDuration = 30 * time.Second
+	}
+
+	log.Info().Dur("timeout", shutdownDuration).
+		Msg("Received shutdown signal, initiating graceful shutdown")
+
+	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), shutdownDuration)
+	defer shutdownCancel()
+
+	log.Info().Msg("Stopping schedulers to prevent new replication tasks")
+	s.Stop(ctx)
+
+	log.Info().Msg("Waiting for in-progress replication tasks to complete")
+	shutdownDone := make(chan struct{})
+	go func() {
+		if err := wg.Wait(); err != nil {
+			log.Error().Err(err).Msg("Error waiting for goroutines during shutdown")
+		}
+		close(shutdownDone)
+	}()
+
+	select {
+	case <-shutdownDone:
+		log.Info().Msg("Graceful shutdown completed successfully")
+	case <-shutdownCtx.Done():
+		log.Warn().Msg("Shutdown timeout exceeded, forcing exit")
+		return fmt.Errorf("graceful shutdown timeout exceeded")
+	}
+
+	return nil
+}

As per coding guidelines: **/*.go: Use strict golangci-lint with limits on function length (100 lines, 50 statements).

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch from 6c28cd4 to 840a5cf on January 29, 2026 16:55

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/main.go`:
- Around line 194-196: The code calls s.Stop(ctx) after waiting on <-ctx.Done(),
so ctx is already cancelled; change the call to use the shutdown context created
for teardown (shutdownCtx) instead: locate the scheduler stop call s.Stop(ctx)
and replace it with s.Stop(shutdownCtx) so Stop() receives the live shutdownCtx
(not the cancelled ctx) for deadline/timeout-aware shutdown operations.

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch 2 times, most recently from bae21a3 to 028f788, on January 29, 2026 17:22

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/main.go`:
- Around line 194-215: s.Stop(shutdownCtx) can block forever and is not guarded
by the shutdown timeout; run Stop in its own goroutine (or change Scheduler.Stop
to accept/respect a context) and include its completion in the same select that
races wg.Wait() against shutdownCtx. Concretely: start a goroutine that calls
s.Stop(shutdownCtx) and signals a new channel (e.g., stopDone), keep the
existing goroutine that waits on wg.Wait() signaling shutdownDone, then select
over both stopDone and shutdownDone vs shutdownCtx to determine graceful vs
timed-out shutdown and log/return appropriately; alternatively modify
Scheduler.Stop(ctx context.Context) to return promptly when ctx is done and then
the existing logic can keep calling s.Stop(shutdownCtx).
🧹 Nitpick comments (1)
internal/scheduler/scheduler.go (1)

127-131: Stop() blocks indefinitely without timeout or cancellation.

The method waits for all goroutines via wg.Wait() but has no timeout mechanism or context awareness. If Run() hasn't exited (e.g., context not cancelled), Stop() will block forever. Consider accepting a context parameter to allow timeout-based waiting, or document that the caller must cancel the scheduler's context before calling Stop().

♻️ Proposed refactor to add context-aware waiting
-// Stop signals the scheduler to stop and waits for all goroutines to complete
-func (s *Scheduler) Stop() {
-	// Wait for the scheduler's main Run goroutine to complete
-	s.wg.Wait()
+// Stop signals the scheduler to stop and waits for all goroutines to complete.
+// The caller must cancel the context passed to Run() before calling Stop(),
+// otherwise this will block indefinitely.
+func (s *Scheduler) Stop(ctx context.Context) error {
+	done := make(chan struct{})
+	go func() {
+		s.wg.Wait()
+		close(done)
+	}()
+
+	select {
+	case <-done:
+		return nil
+	case <-ctx.Done():
+		return ctx.Err()
+	}
 }

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch from 6dd63fe to 95126a6 on January 29, 2026 17:45

@intojhanurag (Contributor, Author) commented

Hey @bupd, I applied most of CodeRabbit's suggestions. Could you take a look when you have a moment? :)

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch from 979b0a3 to 84b46d0 on January 29, 2026 18:20

@bupd (Member) left a comment

resolve conflicts

@intojhanurag force-pushed the feat/graceful-shutdown-persistance branch from 84b46d0 to 3a38c4e on February 8, 2026 10:55

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/satellite/satellite.go`:
- Around line 110-115: The current logging is misleading: change the warn log
inside the loop that calls sched.Stop(ctx) to a generic failure message (e.g.,
"Scheduler stop failed") and include the actual error and scheduler name using
sched.Name() and err; do not claim it "timed out" unless you check the error
type. Also track whether any schedulers returned an error (e.g., a bool anyErr
or an error slice) while iterating over sched.Stop(ctx), and replace the
unconditional final log.Info().Msg("All schedulers stopped") with conditional
logging—if anyErr log a warning/info like "Some schedulers failed to stop"
(include count or details), otherwise log "All schedulers stopped". Ensure you
reference sched.Stop, sched.Name, ctx, and err when making these changes.

@intojhanurag (Contributor, Author) commented

> resolve conflicts

Resolved :)
Let me know if anything else is needed.

@bupd force-pushed the feat/graceful-shutdown-persistance branch 2 times, most recently from 3f0cbb6 to 7a02bb7, on February 10, 2026 18:46
intojhanurag and others added 9 commits on February 11, 2026 00:40
Signed-off-by: intojhanurag <aojharaj2004@gmail.com> (8 commits)
Signed-off-by: bupd <bupdprasanth@gmail.com> (1 commit)
bupd added 2 commits on February 11, 2026 00:42
Signed-off-by: bupd <bupdprasanth@gmail.com> (2 commits)
@bupd force-pushed the feat/graceful-shutdown-persistance branch from 7a02bb7 to 033ca05 on February 10, 2026 19:13
@bupd merged commit c16a76f into container-registry:main on Feb 10, 2026
13 of 14 checks passed