Reduce number of Account Controller retries#5365
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemoved attempt-based retry/backoff from SyncingState and RequestingZkNymsState; simplified failure/finished routing to PendingSubscriptionState, ErrorState, ReadyState, or RequestingZkNymsState; updated function signatures and all state-machine call sites; adjusted logging and imports. ChangesRemove attempts-based retry and simplify sync flow
🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers:
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`:
- Around line 32-36: The retry-budget check has an off-by-one: change the stop
condition that currently reads `self.attempts > MAX_SYNCING_ATTEMPTS` to use
`>=` (i.e. `self.attempts >= MAX_SYNCING_ATTEMPTS`) so the allowed attempts
match the declared MAX_SYNCING_ATTEMPTS, and if you prefer to keep the original
comparison change the constant instead (adjust `MAX_SYNCING_ATTEMPTS` to the
intended value); update the check that references `self.attempts` and
`MAX_SYNCING_ATTEMPTS` in the syncing state logic accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 06d1e439-a4a2-482b-a1b4-55116de497d1
📒 Files selected for processing (1)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs (1)
224-229:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t restart
SyncingStateontokio::task::JoinErrorfrom the zk-nym fetch task
JoinHandlereturnsErr(JoinError)when the spawned task panicked or was cancelled; the current code treats that as recoverable by re-enteringSyncingState, bypassing the state machine’s “API-retry only” behavior and risking repeated restart cycles on deterministic failures. Surface it asErrorState::enter(AccountControllerErrorStateReason::Internal { ... })instead.Suggested fix
let join_result = match zknym_result { Ok(join_result) => join_result, Err(err) => { error!("Failed to join on the fetching task : {err}"); - return NextAccountControllerState::NewState(SyncingState::enter(shared_state)); + return NextAccountControllerState::NewState(ErrorState::enter( + AccountControllerErrorStateReason::Internal { + context: ZK_NYM_STATE_CONTEXT.to_string(), + details: format!("failed to join zk-nym fetching task: {err}"), + }, + )); } };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs` around lines 224 - 229, The match arm handling zknym_result currently treats any Err (a tokio::task::JoinError) as recoverable and restarts SyncingState; instead detect the JoinError and transition into the error state: replace the Err(err) branch that returns NextAccountControllerState::NewState(SyncingState::enter(shared_state)) with a call that surfaces the join error to the state machine, e.g. return NextAccountControllerState::NewState(ErrorState::enter(AccountControllerErrorStateReason::Internal { source: format!("{err}") })) (or equivalent construction used by ErrorState::enter), so JoinError from the zk-nym fetch task is reported as an internal ErrorState rather than re-entering SyncingState. Ensure you reference zknym_result, NextAccountControllerState, SyncingState::enter, ErrorState::enter and AccountControllerErrorStateReason::Internal when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`:
- Around line 224-229: The match arm handling zknym_result currently treats any
Err (a tokio::task::JoinError) as recoverable and restarts SyncingState; instead
detect the JoinError and transition into the error state: replace the Err(err)
branch that returns
NextAccountControllerState::NewState(SyncingState::enter(shared_state)) with a
call that surfaces the join error to the state machine, e.g. return
NextAccountControllerState::NewState(ErrorState::enter(AccountControllerErrorStateReason::Internal
{ source: format!("{err}") })) (or equivalent construction used by
ErrorState::enter), so JoinError from the zk-nym fetch task is reported as an
internal ErrorState rather than re-entering SyncingState. Ensure you reference
zknym_result, NextAccountControllerState, SyncingState::enter, ErrorState::enter
and AccountControllerErrorStateReason::Internal when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 651ca686-9c67-4825-8c29-28066a69e592
📒 Files selected for processing (9)
nym-vpn-core/crates/nym-vpn-account-controller/src/controller.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/error_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/logged_out_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/offline_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/pending_subscription_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/ready_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rsnym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/upgrade_mode_state.rs
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs (2)
36-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winStale comment — the
ZK_NYM_MAX_FAILSconstant it described has been removed.The comment on line 36 ("The maximum number of zk-nym requests that can fail in a row") no longer matches the constant immediately below it, which is the unrelated
ZK_NYM_STATE_CONTEXT. Drop the orphaned comment to avoid misleading readers.♻️ Proposed fix
-// The maximum number of zk-nym requests that can fail in a row const ZK_NYM_STATE_CONTEXT: &str = "ZK_NYM_STATE";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs` around lines 36 - 37, Remove the stale/incorrect comment above the constant ZK_NYM_STATE_CONTEXT: the line "The maximum number of zk-nym requests that can fail in a row" no longer applies and should be dropped (or replaced with a short, accurate comment describing ZK_NYM_STATE_CONTEXT if you prefer). Locate the constant named ZK_NYM_STATE_CONTEXT in requesting_zknym_state.rs and either delete the orphaned comment immediately above it or update the comment text to reflect that this constant is a context string for the zk-nym state.
43-43: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winUpdate the rustdoc — retry mechanism description is now incorrect.
The struct doc still claims "A retry mechanism is in place, if the error is in the API requests. Other errors lead a single retry, then to the ErrorState since they are not recoverable." After this PR, no retries occur in this state: failures route directly to
ReadyState(when sufficient tickets are stored) orErrorState. Please update to match the new behavior so the contract documented here matches the implementation.♻️ Suggested doc update
-/// A retry mechanism is in place, if the error is in the API requests. Other errors lead a single retry, then to the ErrorState since they are not recoverable. +/// On any zk-nym fetch failure, the state inspects local credential storage: +/// if all ticket types are above the minimal threshold (or, for `BandwidthExceeded`, +/// simply non-empty), it transitions to `ReadyState`; otherwise it transitions to +/// `ErrorState`. There is no in-state retry/backoff.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs` at line 43, The rustdoc for the RequestingZkNyMState struct is stale: it still describes a retry mechanism. Update the doc comment in requesting_zknym_state.rs for the RequestingZkNyMState struct to state that this state does not perform retries; on failure it transitions directly to ReadyState when sufficient tickets are stored or to ErrorState otherwise, and remove the previous text about retrying API requests. Ensure the doc references the RequestingZkNyMState, ReadyState, and ErrorState symbols so the documented contract matches the current implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs`:
- Around line 36-37: Remove the stale/incorrect comment above the constant
ZK_NYM_STATE_CONTEXT: the line "The maximum number of zk-nym requests that can
fail in a row" no longer applies and should be dropped (or replaced with a
short, accurate comment describing ZK_NYM_STATE_CONTEXT if you prefer). Locate
the constant named ZK_NYM_STATE_CONTEXT in requesting_zknym_state.rs and either
delete the orphaned comment immediately above it or update the comment text to
reflect that this constant is a context string for the zk-nym state.
- Line 43: The rustdoc for the RequestingZkNyMState struct is stale: it still
describes a retry mechanism. Update the doc comment in requesting_zknym_state.rs
for the RequestingZkNyMState struct to state that this state does not perform
retries; on failure it transitions directly to ReadyState when sufficient
tickets are stored or to ErrorState otherwise, and remove the previous text
about retrying API requests. Ensure the doc references the RequestingZkNyMState,
ReadyState, and ErrorState symbols so the documented contract matches the
current implementation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0298a3a-3345-43b0-9709-2360de0dc56a
📒 Files selected for processing (1)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/requesting_zknym_state.rs
pronebird
left a comment
There was a problem hiding this comment.
@pronebird made 1 comment.
Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on jmwample, neacsu, simonwicky, and trojanfoe).
a discussion (no related file):
As noted on Zulip. I am not entirely sure this will yield better results. It would be better to persist account controller state and reduce number of requests for account summary.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs (2)
54-54:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale documentation: retry mechanism has been removed.
This comment states "A retry mechanism is in place" but this PR removes retry attempts from the state machine. The documentation should be updated to reflect the new behavior.
📝 Suggested doc update
-/// A retry mechanism is in place, if the error is in the API requests. Other errors lead to the ErrorState since they are not recoverable. +/// Errors in API requests or other non-recoverable errors lead to the ErrorState.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs` at line 54, Update the module-level comment in syncing_state/mod.rs to remove the stale line "A retry mechanism is in place" and instead document the current behavior: that retry attempts have been removed, API request errors are not retried by the state machine, and non-recoverable errors transition to ErrorState; locate the comment near the top of the syncing_state module (the existing sentence describing retry behavior) and replace it with a brief note reflecting the new non-retrying behavior and error transition semantics.
259-259:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMisleading log message: no retry occurs.
The log says "waiting before retrying" but this PR removes retry logic. The state machine transitions to
PendingSubscriptionStateand does not retry from here.📝 Suggested fix
- tracing::debug!("Subscription is pending, waiting before retrying"); + tracing::debug!("Subscription is pending, transitioning to PendingSubscriptionState");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs` at line 259, The log message "Subscription is pending, waiting before retrying" is misleading because no retry occurs; update the tracing::debug! call (the one that logs when transitioning to PendingSubscriptionState) to accurately reflect the behavior, e.g., state transition or that the machine has entered PendingSubscriptionState and will await external events rather than retrying. Ensure the new message mentions PendingSubscriptionState so it's easy to locate in logs and keep the tracing::debug! call at the same point in the transition code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs`:
- Line 54: Update the module-level comment in syncing_state/mod.rs to remove the
stale line "A retry mechanism is in place" and instead document the current
behavior: that retry attempts have been removed, API request errors are not
retried by the state machine, and non-recoverable errors transition to
ErrorState; locate the comment near the top of the syncing_state module (the
existing sentence describing retry behavior) and replace it with a brief note
reflecting the new non-retrying behavior and error transition semantics.
- Line 259: The log message "Subscription is pending, waiting before retrying"
is misleading because no retry occurs; update the tracing::debug! call (the one
that logs when transitioning to PendingSubscriptionState) to accurately reflect
the behavior, e.g., state transition or that the machine has entered
PendingSubscriptionState and will await external events rather than retrying.
Ensure the new message mentions PendingSubscriptionState so it's easy to locate
in logs and keep the tracing::debug! call at the same point in the transition
code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: edc1f169-b5c9-4ef2-b1e5-f0442b007def
📒 Files selected for processing (1)
nym-vpn-core/crates/nym-vpn-account-controller/src/state_machine/syncing_state/mod.rs



Ticket
JIRA-NYM-XXXX
Description
The account controller is retrying way to many times way to quickly.
This PR reduces the number of retries down to 2 -- which is actually 9 HTTP request attempts as the HTTP client is configured with 3 retries per attempt, and inital + 2 sync retry attempts.Also the wait between attempts now starts at a 2s delay.This PR removes the retry attempts from the account controller state machine. If the account controller encounters a failed step it transitions to the error state directly.
Also adds a check for stored sufficient zk_nyms before transitioning to the RequestingZkNyms state so we skip an API access if it is not necessary.
This change is
Summary by CodeRabbit