sweep: start tracking input spending status in the fee bumper #9447

Merged: 19 commits, Feb 21, 2025

Conversation

yyforyongyu
Member

@yyforyongyu yyforyongyu commented Jan 26, 2025

There are two places tracking the spending status of a given input - one in the sweeper, the other in the fee bumper. We now move the tracking to be handled in the fee bumper so we always have a single source of truth. By the end of this fix, we should see that,

  • the fee func will be kept on the original line when retrying sweeps.
  • both the sweeper and the fee bumper can recover their state from a restart.
  • for the neutrino backend, the initial sweeping tx is now always RBF-compliant.

The fix is made of two PRs to keep the size small - the first PR will enable tracking the spending status of inputs in the fee bumper, and the second will fix the rest.

Depends on,



@yyforyongyu yyforyongyu added the utxo sweeping, no-itest, no-changelog, and size/micro labels Jan 26, 2025
Member

@Roasbeef Roasbeef left a comment

First pass. One thing that wasn't immediately obvious to me: where do we fix the issue so that the state of the fee function is properly carried over into the new batch (instead of being reset) when one of the inputs in a cohort is spent?

		continue
	}

	log.Warnf("Detected third party spent of output=%v "+
		"in tx=%v", op, spend.SpendingTx.TxHash())
	spendingTx := spend.SpendingTx
Member

I wonder if we should actually block here, even if just for a moment, to allow the scheduler to run the goroutine that does the dispatch.

Spent a bit of time to re-familiarize myself with the notifier after the latest set of refactors, and I don't see an area where we'll insta dispatch the response before exiting the initial method call.

Member Author

You mean we perform a tiny sleep (something like case <-time.After(100ms)) instead?
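
A minimal sketch of what such a bounded wait could look like, using a select with time.After. The spendEvent type, the waitForSpend helper, and the 100ms grace period are all hypothetical names and values chosen for illustration; they are not taken from the lnd codebase.

```go
package main

import (
	"fmt"
	"time"
)

// spendEvent is a hypothetical stand-in for a spend notification.
type spendEvent struct {
	spendingTxID string
}

// defaultWait is an illustrative grace period, not a value taken from lnd.
const defaultWait = 100 * time.Millisecond

// waitForSpend blocks until a spend arrives on spendChan or the timeout
// elapses. The bool reports whether a spend was received in time.
func waitForSpend(spendChan <-chan spendEvent,
	timeout time.Duration) (spendEvent, bool) {

	select {
	case ev := <-spendChan:
		return ev, true

	case <-time.After(timeout):
		return spendEvent{}, false
	}
}

func main() {
	spendChan := make(chan spendEvent, 1)

	// No notification has been dispatched yet, so the wait times out.
	_, ok := waitForSpend(spendChan, defaultWait)
	fmt.Println("spend seen:", ok)

	// Once the notifier dispatches the event, it is picked up right away.
	spendChan <- spendEvent{spendingTxID: "abc"}
	ev, ok := waitForSpend(spendChan, defaultWait)
	fmt.Println("spend seen:", ok, "tx:", ev.spendingTxID)
}
```

A wait like this only narrows the race window; it cannot close it, which is why the discussion moves on to reading the block data directly.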

We did refactor this area a bit in 1200b75, which makes sure the block is always sent before the tx. However, we cannot guarantee that this order is maintained: the pipeline is a bit deep, so we cannot be sure they are read in the same order.

A previous attempt was to implement a method HasOutpointSpent on Blockbeat. The idea is that whenever we are notified of a block, we can access the block data to see whether the watched outputs are spent, which eliminates the race and guarantees we won't miss a spending event. However, there was some difficulty in implementing it for neutrino, as discussed here. Now that you mention it, I think it's still worth keeping as a TODO, since we can just register the spend when it's neutrino to avoid fetching full blocks, and read the block data when it's a full node.
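
To illustrate the HasOutpointSpent idea, here is a rough sketch that scans a block's transactions for an input spending a watched outpoint. The outPoint, tx, and block types are simplified, hypothetical stand-ins; the real Blockbeat interface and wire types look different, and this is not the method that was attempted.

```go
package main

import "fmt"

// outPoint, tx and block are simplified, hypothetical stand-ins for the
// real wire types.
type outPoint struct {
	txid  string
	index uint32
}

type tx struct {
	txid   string
	inputs []outPoint
}

type block struct {
	height int32
	txns   []tx
}

// hasOutpointSpent scans every input of every tx in the block and returns
// the ID of the tx spending op, if there is one.
func (b *block) hasOutpointSpent(op outPoint) (string, bool) {
	for _, t := range b.txns {
		for _, in := range t.inputs {
			if in == op {
				return t.txid, true
			}
		}
	}

	return "", false
}

func main() {
	watched := outPoint{txid: "funding", index: 1}
	blk := &block{
		height: 100,
		txns: []tx{
			{txid: "coinbase"},
			{txid: "sweep", inputs: []outPoint{watched}},
		},
	}

	spender, spent := blk.hasOutpointSpent(watched)
	fmt.Println(spent, spender)
}
```

Because the check runs against the block that was just delivered, it does not depend on the ordering of a separate spend notification. The cost is that the full block must be available, which is the difficulty mentioned for neutrino.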

@yyforyongyu yyforyongyu force-pushed the yy-prepare-fee-replace branch from a738e7f to b98542b Compare February 5, 2025 11:53
@yyforyongyu yyforyongyu force-pushed the yy-sweeper-fix branch 2 times, most recently from 0ca8914 to 18df4fb Compare February 5, 2025 12:49
@yyforyongyu
Member Author

Note to reviewers - the itest is disabled here and the CI should pass in #9448. I think once approved we can merge #9448 to this branch, and merge this PR to master.

@yyforyongyu yyforyongyu changed the base branch from yy-prepare-fee-replace to master February 5, 2025 14:49
Collaborator

@morehouse morehouse left a comment

The code itself looks pretty good to me. Only nits there.

The first commit dcd5119 looks like it doesn't belong in this PR, since the added test isn't fixed until the next PR.

Also TxUnknownSpend is currently unhandled in UtxoSweeper, which means things would break if we merged this PR alone. We should really add a minimal switch case to UtxoSweeper.handleBumpEvent that keeps the existing behavior for TxUnknownSpend (i.e. same as TxFailed case).
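
A minimal sketch of the suggested switch case, where the unknown-spend event shares the failure path. The bumpEvent type, its lowercase constants, and the returned action strings are hypothetical simplifications for illustration, not the actual UtxoSweeper code.

```go
package main

import "fmt"

// bumpEvent is a hypothetical, simplified mirror of the fee bumper's
// event type.
type bumpEvent uint8

const (
	txPublished bumpEvent = iota
	txFailed
	txConfirmed
	txUnknownSpend
)

// handleBumpEvent returns a description of the action the sweeper would
// take for the given event.
func handleBumpEvent(ev bumpEvent) string {
	switch ev {
	case txPublished:
		return "mark inputs as published"

	case txConfirmed:
		return "mark inputs as swept"

	// Keep the existing behavior for unknown spends by sharing the
	// failure path.
	case txFailed, txUnknownSpend:
		return "mark inputs as failed and retry"

	default:
		return "ignore unrecognized event"
	}
}

func main() {
	fmt.Println(handleBumpEvent(txFailed))
	fmt.Println(handleBumpEvent(txUnknownSpend))
}
```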

@yyforyongyu yyforyongyu force-pushed the yy-sweeper-fix branch 2 times, most recently from cb67094 to e2a7210 Compare February 7, 2025 14:14
@yyforyongyu
Member Author

> The first commit dcd5119 looks like it doesn't belong in this PR, since the added test isn't fixed until the next PR.

Moved to the next PR.

> Also TxUnknownSpend is currently unhandled in UtxoSweeper, which means things would break if we merged this PR alone. We should really add a minimal switch case to UtxoSweeper.handleBumpEvent that keeps the existing behavior for TxUnknownSpend (i.e. same as TxFailed case).

We can't merge this PR back to master alone though. The plan is to merge #9448 into this one, and then merge this one into master, otherwise the itests would fail. Maybe I should've just created one PR instead. I was thinking about reducing each PR's size, but the split could've been done better, I guess.

Collaborator

@morehouse morehouse left a comment

Code LGTM. Will wait to approve until tests pass.

I'll start looking at the next PR today.

@Roasbeef Roasbeef requested a review from morehouse February 12, 2025 02:46
Member

@Roasbeef Roasbeef left a comment

Reviewed 5 of 5 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @morehouse and @yyforyongyu)

@yyforyongyu yyforyongyu force-pushed the yy-sweeper-fix branch 2 times, most recently from cfbf023 to 53f84ed Compare February 13, 2025 15:14
@lightninglabs-deploy

@morehouse: review reminder
@yyforyongyu, remember to re-request review from reviewers when ready

To track the input and its spending tx, which will be used later to
detect unknown spends.

This commit refactors `processRecords` to always handle the inputs
spent when processing the records. We now make sure to handle unknown
spends for all backends (previously only neutrino), and rely solely on
the spending notification to give us the onchain status of inputs.

We now rename "third party" to "unknown" as the inputs can be spent via
an older sweeping tx, a third party (anchor), or a remote party (pin).
In the fee bumper we don't have the info to distinguish the above cases,
so we leave them to be further handled by the sweeper, which has more
context.

This commit adds a new field `InputsSpent` to the `BumpResult` so it
can be used to track inputs spent by txns not recognized by the fee
bumper.
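
The unknown-spend detection described in the commits above can be sketched roughly as follows. This assumes a simplified record that holds the current sweeping tx ID and its inputs, plus a map of spend notifications keyed by outpoint. The names monitorRecord, bumpResult, inputsSpent, and checkSpends are hypothetical simplifications, and the event is modelled as a plain string rather than the real event type.

```go
package main

import "fmt"

// outPoint is a simplified, hypothetical stand-in for the wire type.
type outPoint struct {
	txid  string
	index uint32
}

// monitorRecord is a simplified record of the sweeping tx being monitored.
type monitorRecord struct {
	txid   string
	inputs []outPoint
}

// bumpResult carries the event plus the inputs spent by txns the fee
// bumper does not recognize, keyed by outpoint.
type bumpResult struct {
	event       string
	inputsSpent map[outPoint]string
}

// checkSpends compares the spend notifications (outpoint -> spending txid)
// against the record. Any input spent by a tx other than the record's own
// tx is reported as an unknown spend.
func checkSpends(r monitorRecord, spends map[outPoint]string) *bumpResult {
	unknown := make(map[outPoint]string)
	spentByUs := false

	for _, op := range r.inputs {
		spender, ok := spends[op]
		if !ok {
			continue
		}

		if spender == r.txid {
			spentByUs = true
			continue
		}

		unknown[op] = spender
	}

	switch {
	case len(unknown) > 0:
		return &bumpResult{event: "TxUnknownSpend", inputsSpent: unknown}

	case spentByUs:
		return &bumpResult{event: "TxConfirmed"}

	default:
		// Nothing spent yet, keep monitoring.
		return nil
	}
}

func main() {
	a := outPoint{txid: "chan", index: 0}
	rec := monitorRecord{txid: "sweep-v2", inputs: []outPoint{a}}

	res := checkSpends(rec, map[outPoint]string{a: "sweep-v1"})
	fmt.Println(res.event, res.inputsSpent[a])
}
```

Note that an older sweeping tx of our own also lands in the unknown bucket here, which matches the reasoning for renaming "third party" to "unknown": the fee bumper only knows that the spender is not the tx it is currently monitoring.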

We now start handling `TxUnknownSpend` in our sweeper to make sure the
failed inputs are retried when possible.

This is a minor refactor so the `createAndPublishTx` flow becomes
clearer. It also prepares for the following commit where we start to
handle missing inputs.

A minor refactor to break the method `handleUnknownSpent` into two
steps, which prepares for the following commit where we start handling
missing inputs.

This commit refactors `handleInitialTxError` and `createAndCheckTx` to
take a `monitorRecord` param, which prepares for the following commit
where we start handling missing inputs.

This commit handles the case when the input is missing during the RBF
process, which could happen when the bumped tx has inputs being spent by
a third party. Normally we should be able to catch the spend early via
the spending notification and never attempt to fee bump the record.
However, due to the possible race between block notification and spend
notification, this cannot be guaranteed. Thus, we need to handle the
case during the RBF when seeing an `ErrMissingInputs`, which can only
happen when the inputs are spent by others.
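
As a rough illustration of that fallback, the sketch below branches on a missing-inputs error with errors.Is and routes it to the unknown-spend path. The errMissingInputs sentinel, the publishReplacement helper, and the action strings are hypothetical and only mirror the idea described in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingInputs is a hypothetical stand-in for `ErrMissingInputs`.
var errMissingInputs = errors.New("missing inputs")

// publishReplacement is a hypothetical publisher that fails with a wrapped
// errMissingInputs when one of the inputs has already been spent.
func publishReplacement(inputSpent bool) error {
	if inputSpent {
		return fmt.Errorf("publish replacement: %w", errMissingInputs)
	}

	return nil
}

// nextStep decides what to do after an RBF attempt.
func nextStep(err error) string {
	switch {
	case err == nil:
		return "monitor replacement tx"

	// The spend notification lost the race, so the spend is detected
	// here instead and handled as an unknown spend.
	case errors.Is(err, errMissingInputs):
		return "handle unknown spend"

	default:
		return "keep current tx and retry later"
	}
}

func main() {
	fmt.Println(nextStep(publishReplacement(false)))
	fmt.Println(nextStep(publishReplacement(true)))
}
```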

This commit adds the failed tx to the result when marking the input as
fatal, which is used in the commit resolver when handling revoked
outputs.

Previously, when a given input was found spent in the mempool, we'd mark
it as Published and never offer it to the fee bumper. This is dangerous
as the input will never be fee bumped. We now fix it by always
initializing the input with state Init, and only use the mempool to
check for fee and fee rate.

This changes the current restart behavior. Previously, if the node shut
down after a sweeping tx was broadcast, then on startup the input would
be offered to the sweeper again, but not to the fee bumper, which meant
the sweeping tx would stay in the mempool with the last-tried fee rate.
After this change, the input will be swept again after a restart, and
the fee bumper will monitor its status. The restart will also behave
like a fee bump if there's already an existing sweeping tx in the
mempool.
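
A small sketch of the new initialization rule: the input always starts in the Init state, and a mempool hit only contributes fee information. The sweepState, mempoolSpend, and pendingInput types and the newPendingInput helper are hypothetical simplifications, and how the recorded fee rate is used afterwards is an assumption, not something the commit message spells out.

```go
package main

import "fmt"

// sweepState is a hypothetical, simplified input state.
type sweepState uint8

const (
	stateInit sweepState = iota
	statePublished
)

// mempoolSpend is a hypothetical result of looking up an input's spender
// in the mempool.
type mempoolSpend struct {
	found   bool
	feeRate uint64
}

// pendingInput is a simplified view of an input offered to the sweeper.
type pendingInput struct {
	state       sweepState
	lastFeeRate uint64
}

// newPendingInput always starts the input in stateInit, so it is offered
// to the fee bumper again. A mempool hit only contributes the fee rate.
func newPendingInput(ms mempoolSpend) pendingInput {
	pi := pendingInput{state: stateInit}
	if ms.found {
		pi.lastFeeRate = ms.feeRate
	}

	return pi
}

func main() {
	pi := newPendingInput(mempoolSpend{found: true, feeRate: 2500})
	fmt.Println(pi.state == stateInit, pi.lastFeeRate)
}
```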

So we can focus on testing normal flow vs persistence flow.

Before this commit, the only error returned from `IsOurTx` is when the
root bucket was not created. In that case, we should consider the tx to
be not found in our db, since technically our db is empty.

A future PR may consider treating our wallet as the single source of
truth and query the wallet instead to check for past sweeping txns.
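
The `IsOurTx` change can be pictured with the sketch below, where a missing root bucket is treated as "not found" rather than surfaced as an error. The txStore type (with a nil map modelling the missing bucket), the errNoBucket sentinel, and the lookup helper are hypothetical; the real store is backed by a database.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoBucket is a hypothetical error for a root bucket that was never
// created.
var errNoBucket = errors.New("sweeper tx bucket not created")

// txStore is a hypothetical store. A nil map models a missing root bucket.
type txStore struct {
	txns map[string]struct{}
}

// lookup reports whether txid is stored, failing if the bucket is missing.
func (s *txStore) lookup(txid string) (bool, error) {
	if s.txns == nil {
		return false, errNoBucket
	}

	_, ok := s.txns[txid]

	return ok, nil
}

// isOurTx treats a missing bucket as an empty db, so the tx is simply not
// found and no error needs to be returned to the caller.
func (s *txStore) isOurTx(txid string) bool {
	found, err := s.lookup(txid)
	if errors.Is(err, errNoBucket) {
		return false
	}

	return found
}

func main() {
	empty := &txStore{}
	fmt.Println(empty.isOurTx("sweep"))

	store := &txStore{txns: map[string]struct{}{"sweep": {}}}
	fmt.Println(store.isOurTx("sweep"))
}
```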
Collaborator

@morehouse morehouse left a comment

LGTM

@yyforyongyu
Member Author

check commits failed with no space left error again, weird.

@morehouse
Collaborator

It would be good to figure out why it keeps doing that.

But IMO this PR is still good to go. I did a cursory check that commits appear in the same order as on #9448, and the only diff is the final commit added to satisfy the linter: 9f7e2bf.

@guggero
Collaborator

guggero commented Feb 20, 2025

> It would be good to figure out why it keeps doing that.

I think it's just that the available space on the GitHub runners is very low. So it's probably just the build cache getting too large with all the different commits being compiled one-by-one.
Perhaps we also shouldn't use the GitHub cache feature, as that will stack up even more, since it combines the build caches from multiple runs.

I cleaned the GitHub cache and re-ran the step.

@guggero
Collaborator

guggero commented Feb 20, 2025

Actually, turns out we're caching things twice, since actions/setup-go now automatically caches the build and module cache.
Fixing that in #9535, so this should become even less likely.

@Roasbeef Roasbeef requested a review from morehouse February 21, 2025 00:53
Member

@Roasbeef Roasbeef left a comment

LGTM

Reviewed 6 of 12 files at r3.
Reviewable status: 7 of 13 files reviewed, 9 unresolved discussions (waiting on @morehouse and @yyforyongyu)

@Roasbeef Roasbeef merged commit 553899b into lightningnetwork:master Feb 21, 2025
31 of 34 checks passed