Skip to content

Reduce number of Account Controller retries#5365

Open
jmwample wants to merge 8 commits into
developfrom
jmwample/ac-retry-reduce
Open

Reduce number of Account Controller retries#5365
jmwample wants to merge 8 commits into
developfrom
jmwample/ac-retry-reduce

Conversation

@jmwample
Copy link
Copy Markdown
Contributor

@jmwample jmwample commented May 19, 2026

Ticket

JIRA-NYM-XXXX

Description

The account controller is retrying way to many times way to quickly.

This PR reduces the number of retries down to 2 -- which is actually 9 HTTP request attempts as the HTTP client is configured with 3 retries per attempt, and inital + 2 sync retry attempts.

Also the wait between attempts now starts at a 2s delay.

This PR removes the retry attempts from the account controller state machine. If the account controller encounters a failed step it transitions to the error state directly.

Also adds a check for stored sufficient zk_nyms before transitioning to the RequestingZkNyms state so we skip an API access if it is not necessary.


This change is Reviewable

Summary by CodeRabbit

  • Bug Fixes
    • Removed repeated retry/backoff during account sync so failures progress faster to pending, error, or ready states.
    • Improved handling of zk-nym and sync failures to reduce stuck states and speed recovery.
    • Reduced verbosity of account debug output for cleaner logs.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removed attempt-based retry/backoff from SyncingState and RequestingZkNymsState; simplified failure/finished routing to PendingSubscriptionState, ErrorState, ReadyState, or RequestingZkNymsState; updated function signatures and all state-machine call sites; adjusted logging and imports.

Changes

Remove attempts-based retry and simplify sync flow

Layer / File(s) Summary
SyncingState core refactor and event transitions
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs
Removed attempts field and parameter, deleted backoff/get_delay and related config/imports, updated spawned sync task and sync_account signature, changed SyncEvent::Failure/Finished routing, and changed account-summary logging format.
RequestingZkNymsState retry and error handling
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs
Removed ZK_NYM_MAX_FAILS and attempts, simplified handle_retrieved_zk_nym error branching to directly choose ReadyState/ErrorState when appropriate, and derived thiserror::Error for ZkNymError variants.
State-machine call-site updates
nym-vpn-core/crates/nym-vpn-account-controller/src/{controller.rs,state_machine/*}
Updated all call sites (controller, error_state, logged_out_state, offline_state, pending_subscription_state, ready_state, upgrade_mode_state) to use SyncingState::enter(shared_state) or RequestingZkNymsState::enter(shared_state, ...) with the new signatures.

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers:

  • neacsu

"I hopped through code with ears held high,
Removed the counters that tried and tried,
Simpler paths now guide the sync away,
Tickets checked — then ready, or request we relay.
🐇✨"

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.53% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description is partially complete but lacks the required 'Changelog' checklist item completion and the 'JIRA-NYM-XXXX' placeholder should reference an actual ticket number. Complete the Changelog checklist item and replace 'JIRA-NYM-XXXX' with the actual JIRA ticket identifier.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: removing retry attempts from the account controller state machine.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jmwample/ac-retry-reduce

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`:
- Around line 32-36: The retry-budget check has an off-by-one: change the stop
condition that currently reads `self.attempts > MAX_SYNCING_ATTEMPTS` to use
`>=` (i.e. `self.attempts >= MAX_SYNCING_ATTEMPTS`) so the allowed attempts
match the declared MAX_SYNCING_ATTEMPTS, and if you prefer to keep the original
comparison change the constant instead (adjust `MAX_SYNCING_ATTEMPTS` to the
intended value); update the check that references `self.attempts` and
`MAX_SYNCING_ATTEMPTS` in the syncing state logic accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06d1e439-a4a2-482b-a1b4-55116de497d1

📥 Commits

Reviewing files that changed from the base of the PR and between e54c7a2 and 5652311.

📒 Files selected for processing (1)
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs (1)

224-229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t restart SyncingState on tokio::task::JoinError from the zk-nym fetch task

JoinHandle returns Err(JoinError) when the spawned task panicked or was cancelled; the current code treats that as recoverable by re-entering SyncingState, bypassing the state machine’s “API-retry only” behavior and risking repeated restart cycles on deterministic failures. Surface it as ErrorState::enter(AccountControllerErrorStateReason::Internal { ... }) instead.

Suggested fix
         let join_result = match zknym_result {
             Ok(join_result) => join_result,
             Err(err) => {
                 error!("Failed to join on the fetching task : {err}");
-                return NextAccountControllerState::NewState(SyncingState::enter(shared_state));
+                return NextAccountControllerState::NewState(ErrorState::enter(
+                    AccountControllerErrorStateReason::Internal {
+                        context: ZK_NYM_STATE_CONTEXT.to_string(),
+                        details: format!("failed to join zk-nym fetching task: {err}"),
+                    },
+                ));
             }
         };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`
around lines 224 - 229, The match arm handling zknym_result currently treats any
Err (a tokio::task::JoinError) as recoverable and restarts SyncingState; instead
detect the JoinError and transition into the error state: replace the Err(err)
branch that returns
NextAccountControllerState::NewState(SyncingState::enter(shared_state)) with a
call that surfaces the join error to the state machine, e.g. return
NextAccountControllerState::NewState(ErrorState::enter(AccountControllerErrorStateReason::Internal
{ source: format!("{err}") })) (or equivalent construction used by
ErrorState::enter), so JoinError from the zk-nym fetch task is reported as an
internal ErrorState rather than re-entering SyncingState. Ensure you reference
zknym_result, NextAccountControllerState, SyncingState::enter, ErrorState::enter
and AccountControllerErrorStateReason::Internal when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`:
- Around line 224-229: The match arm handling zknym_result currently treats any
Err (a tokio::task::JoinError) as recoverable and restarts SyncingState; instead
detect the JoinError and transition into the error state: replace the Err(err)
branch that returns
NextAccountControllerState::NewState(SyncingState::enter(shared_state)) with a
call that surfaces the join error to the state machine, e.g. return
NextAccountControllerState::NewState(ErrorState::enter(AccountControllerErrorStateReason::Internal
{ source: format!("{err}") })) (or equivalent construction used by
ErrorState::enter), so JoinError from the zk-nym fetch task is reported as an
internal ErrorState rather than re-entering SyncingState. Ensure you reference
zknym_result, NextAccountControllerState, SyncingState::enter, ErrorState::enter
and AccountControllerErrorStateReason::Internal when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 651ca686-9c67-4825-8c29-28066a69e592

📥 Commits

Reviewing files that changed from the base of the PR and between 5652311 and 3c6fd34.

📒 Files selected for processing (9)
  • nym-vpn-core/crates/nym-vpn-account-controller/src/controller.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/error_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/logged_out_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/offline_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/pending_subscription_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/ready_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/upgrade_mode_state.rs

@jmwample jmwample added this to the v2026.10 - Tatry milestone May 21, 2026
@jmwample jmwample requested a review from simonwicky May 21, 2026 18:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs (2)

36-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Stale comment — the ZK_NYM_MAX_FAILS constant it described has been removed.

The comment on line 36 ("The maximum number of zk-nym requests that can fail in a row") no longer matches the constant immediately below it, which is the unrelated ZK_NYM_STATE_CONTEXT. Drop the orphaned comment to avoid misleading readers.

♻️ Proposed fix
-// The maximum number of zk-nym requests that can fail in a row
 const ZK_NYM_STATE_CONTEXT: &str = "ZK_NYM_STATE";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`
around lines 36 - 37, Remove the stale/incorrect comment above the constant
ZK_NYM_STATE_CONTEXT: the line "The maximum number of zk-nym requests that can
fail in a row" no longer applies and should be dropped (or replaced with a
short, accurate comment describing ZK_NYM_STATE_CONTEXT if you prefer). Locate
the constant named ZK_NYM_STATE_CONTEXT in requesting_zknym_state.rs and either
delete the orphaned comment immediately above it or update the comment text to
reflect that this constant is a context string for the zk-nym state.

43-43: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Update the rustdoc — retry mechanism description is now incorrect.

The struct doc still claims "A retry mechanism is in place, if the error is in the API requests. Other errors lead a single retry, then to the ErrorState since they are not recoverable." After this PR, no retries occur in this state: failures route directly to ReadyState (when sufficient tickets are stored) or ErrorState. Please update to match the new behavior so the contract documented here matches the implementation.

♻️ Suggested doc update
-/// A retry mechanism is in place, if the error is in the API requests. Other errors lead a single retry, then to the ErrorState since they are not recoverable.  
+/// On any zk-nym fetch failure, the state inspects local credential storage:
+/// if all ticket types are above the minimal threshold (or, for `BandwidthExceeded`,
+/// simply non-empty), it transitions to `ReadyState`; otherwise it transitions to
+/// `ErrorState`. There is no in-state retry/backoff.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`
at line 43, The rustdoc for the RequestingZkNyMState struct is stale: it still
describes a retry mechanism. Update the doc comment in requesting_zknym_state.rs
for the RequestingZkNyMState struct to state that this state does not perform
retries; on failure it transitions directly to ReadyState when sufficient
tickets are stored or to ErrorState otherwise, and remove the previous text
about retrying API requests. Ensure the doc references the RequestingZkNyMState,
ReadyState, and ErrorState symbols so the documented contract matches the
current implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`:
- Around line 36-37: Remove the stale/incorrect comment above the constant
ZK_NYM_STATE_CONTEXT: the line "The maximum number of zk-nym requests that can
fail in a row" no longer applies and should be dropped (or replaced with a
short, accurate comment describing ZK_NYM_STATE_CONTEXT if you prefer). Locate
the constant named ZK_NYM_STATE_CONTEXT in requesting_zknym_state.rs and either
delete the orphaned comment immediately above it or update the comment text to
reflect that this constant is a context string for the zk-nym state.
- Line 43: The rustdoc for the RequestingZkNyMState struct is stale: it still
describes a retry mechanism. Update the doc comment in requesting_zknym_state.rs
for the RequestingZkNyMState struct to state that this state does not perform
retries; on failure it transitions directly to ReadyState when sufficient
tickets are stored or to ErrorState otherwise, and remove the previous text
about retrying API requests. Ensure the doc references the RequestingZkNyMState,
ReadyState, and ErrorState symbols so the documented contract matches the
current implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0298a3a-3345-43b0-9709-2360de0dc56a

📥 Commits

Reviewing files that changed from the base of the PR and between 3c6fd34 and 55d41d4.

📒 Files selected for processing (1)
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs

Copy link
Copy Markdown
Collaborator

@pronebird pronebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pronebird made 1 comment.
Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on jmwample, neacsu, simonwicky, and trojanfoe).


a discussion (no related file):
As noted on Zulip. I am not entirely sure this will yield better results. It would be better to persist account controller state and reduce number of requests for account summary.

@github-actions github-actions Bot requested a review from pronebird May 22, 2026 15:45
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs (2)

54-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale documentation: retry mechanism has been removed.

This comment states "A retry mechanism is in place" but this PR removes retry attempts from the state machine. The documentation should be updated to reflect the new behavior.

📝 Suggested doc update
-/// A retry mechanism is in place, if the error is in the API requests. Other errors lead to the ErrorState since they are not recoverable.
+/// Errors in API requests or other non-recoverable errors lead to the ErrorState.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`
at line 54, Update the module-level comment in syncing_state/mod.rs to remove
the stale line "A retry mechanism is in place" and instead document the current
behavior: that retry attempts have been removed, API request errors are not
retried by the state machine, and non-recoverable errors transition to
ErrorState; locate the comment near the top of the syncing_state module (the
existing sentence describing retry behavior) and replace it with a brief note
reflecting the new non-retrying behavior and error transition semantics.

259-259: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Misleading log message: no retry occurs.

The log says "waiting before retrying" but this PR removes retry logic. The state machine transitions to PendingSubscriptionState and does not retry from here.

📝 Suggested fix
-                                tracing::debug!("Subscription is pending, waiting before retrying");
+                                tracing::debug!("Subscription is pending, transitioning to PendingSubscriptionState");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`
at line 259, The log message "Subscription is pending, waiting before retrying"
is misleading because no retry occurs; update the tracing::debug! call (the one
that logs when transitioning to PendingSubscriptionState) to accurately reflect
the behavior, e.g., state transition or that the machine has entered
PendingSubscriptionState and will await external events rather than retrying.
Ensure the new message mentions PendingSubscriptionState so it's easy to locate
in logs and keep the tracing::debug! call at the same point in the transition
code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`:
- Line 54: Update the module-level comment in syncing_state/mod.rs to remove the
stale line "A retry mechanism is in place" and instead document the current
behavior: that retry attempts have been removed, API request errors are not
retried by the state machine, and non-recoverable errors transition to
ErrorState; locate the comment near the top of the syncing_state module (the
existing sentence describing retry behavior) and replace it with a brief note
reflecting the new non-retrying behavior and error transition semantics.
- Line 259: The log message "Subscription is pending, waiting before retrying"
is misleading because no retry occurs; update the tracing::debug! call (the one
that logs when transitioning to PendingSubscriptionState) to accurately reflect
the behavior, e.g., state transition or that the machine has entered
PendingSubscriptionState and will await external events rather than retrying.
Ensure the new message mentions PendingSubscriptionState so it's easy to locate
in logs and keep the tracing::debug! call at the same point in the transition
code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: edc1f169-b5c9-4ef2-b1e5-f0442b007def

📥 Commits

Reviewing files that changed from the base of the PR and between c676737 and 17bcab9.

📒 Files selected for processing (1)
  • nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants