
Fix DryRunApi client-facing XCM versions #7438

Merged
merged 25 commits into paritytech:master on Feb 24, 2025

Conversation

@mrshiposha (Contributor) commented Feb 3, 2025

Description

Fixes #7413

Integration

This PR updates the DryRunApi: the signature of dry_run_call is changed, and the XCM version of the values returned by dry_run_xcm now follows the version of the input XCM program.

Review Notes

  • The DryRunApi is modified (a sketch of the resulting API surface follows this list)
  • Router::clear_messages is added to the common dry_run_xcm implementation
  • The xcmp-queue Router's clear_messages is fixed: the channel details' first_index and last_index are now reset when clearing
  • The MIN_XCM_VERSION constant is added
  • The common implementation in pallet-xcm is updated accordingly
  • The DryRunApi tests are extended to cover older XCM versions
  • The pallet-xcm implementation is now used where it previously was not (including in the DryRunApi tests)
  • All runtime implementations are updated to match the runtime API change
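
For reference, a minimal sketch of the client-facing runtime API after this change (a sketch only: the parameter name and position follow the review suggestion accepted below, and trait bounds and helper types are abbreviated):

sp_api::decl_runtime_apis! {
    pub trait DryRunApi<Call, Event, OriginCaller>
    where
        Call: Encode,
        Event: Decode,
        OriginCaller: Encode,
    {
        /// Dry run call; the caller chooses the XCM version used for the returned effects.
        fn dry_run_call(origin: OriginCaller, call: Call, result_xcms_version: XcmVersion) -> Result<CallDryRunEffects<Event>, Error>;

        /// Dry run XCM program; the returned effects follow the version of the input `xcm`.
        fn dry_run_xcm(origin_location: VersionedLocation, xcm: VersionedXcm<Call>) -> Result<XcmDryRunEffects<Event>, Error>;
    }
}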

@mrshiposha requested a review from a team as a code owner on February 3, 2025 15:30
@bkchr (Member) commented Feb 7, 2025

This PR makes breaking changes to the DryRunApi. The signature of the dry_run_call is changed, and the XCM version of the return values of dry_run_xcm now follows the version of the input XCM program.

Can you not just determine the XCM version of the input XCM program and use it for the output? Otherwise you should at least bump the version of the runtime API.

@mrshiposha (Contributor Author) commented:

@bkchr

> Can you not just determine the XCM version of the input XCM program and use it for the output?

I did this for dry_run_xcm, where it was possible. As for dry_run_call, there is no input XCM program, just an arbitrary RuntimeCall.

> Otherwise you should at least bump the version of the runtime API.

Sure, will do.
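
For illustration, a rough sketch of the dry_run_xcm side of that idea, assuming the xcm crate's IdentifyVersion and IntoVersion helpers (a sketch of the approach, not the PR's exact code): the version of the input program is detected, and the forwarded messages are converted to it before being returned.

use xcm::{IdentifyVersion, IntoVersion, VersionedLocation, VersionedXcm};

// Convert the forwarded message queues to the version of the caller-supplied program.
fn align_to_input_version<Call>(
    input: &VersionedXcm<Call>,
    forwarded: Vec<(VersionedLocation, Vec<VersionedXcm<()>>)>,
) -> Result<Vec<(VersionedLocation, Vec<VersionedXcm<()>>)>, ()> {
    let version = input.identify_version();
    forwarded
        .into_iter()
        .map(|(dest, msgs)| {
            let dest = dest.into_version(version)?;
            let msgs = msgs
                .into_iter()
                .map(|m| m.into_version(version))
                .collect::<Result<Vec<_>, _>>()?;
            Ok((dest, msgs))
        })
        .collect()
}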

@bkchr (Member) commented Feb 7, 2025

> As for dry_run_call, there is no input XCM program, just an arbitrary RuntimeCall.

Somewhere in this RuntimeCall there will be an XCM? If not, you could just assume the minimum XCM version?

@mrshiposha (Contributor Author) commented:

We could examine the call and see if it is, say, transfer_assets from pallet-xcm. That seems to be a good enough default for the common implementation.

My only worry is about alternative XCM frontends like ORML's pallet-xtokens: if the common implementation only considers pallet-xcm, pallet-xtokens users will always receive the minimum XCM version.

CC @xlc

@acatangiu (Contributor) commented:

Assuming the minimum is brittle: some new instructions have no equivalents in older versions, so conversion will fail. You could start with the minimum and step through versions until one successfully converts, but that feels like overkill (a hypothetical sketch of that idea follows below).

I'm personally happy with the current PR state where an explicit version is requested by the caller.
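
For completeness, a hypothetical sketch of that "start with the minimum and step up" idea (not what this PR implements; MIN_XCM_VERSION is the constant the PR adds, passed in here as a plain parameter, and the conversion helpers are the xcm crate's IdentifyVersion/IntoVersion traits):

use xcm::{IdentifyVersion, IntoVersion, VersionedXcm};

// Find the lowest version, starting from `min_version` (e.g. MIN_XCM_VERSION),
// into which the program still converts; fall back to its own version otherwise.
fn lowest_convertible_version<Call: Clone>(xcm: &VersionedXcm<Call>, min_version: u32) -> u32 {
    let current = xcm.identify_version();
    (min_version..current)
        .find(|v| xcm.clone().into_version(*v).is_ok())
        .unwrap_or(current)
}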

@@ -64,7 +64,7 @@ sp_api::decl_runtime_apis! {
 OriginCaller: Encode
 {
 /// Dry run call.
-fn dry_run_call(origin: OriginCaller, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
+fn dry_run_call(origin: OriginCaller, xcms_version: XcmVersion, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
Contributor:

Having this param between origin and call makes one think it is somehow related to the call.

I would move it to the back, and also rename it to something more explicit, like result_xcm_version/output_xcm_version/desired_xcm_version.

Suggested change:
-fn dry_run_call(origin: OriginCaller, xcms_version: XcmVersion, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
+fn dry_run_call(origin: OriginCaller, call: Call, result_xcms_version: XcmVersion) -> Result<CallDryRunEffects<Event>, Error>;

Contributor Author:

Done

frame_system::Pallet::<Runtime>::reset_events(); // To make sure we only record events from current call.

// To make sure we only record events from current call.
Router::clear_messages();
Contributor:

nice! I see this was done for dry_run_call but not for dry_run_xcm 🤦‍♂️

Contributor Author:

Yet, this fix somehow breaks the penpal runtime...

Contributor Author:

It seems the xcmp-queue's clear_messages doesn't reset the page indices in OutboundXcmpStatus, and this causes penpal to break when using the common DryRunApi::dry_run_xcm implementation from pallet-xcm (see CI: broken tests::xcm_fee_estimation::multi_hop_works and tests::xcm_fee_estimation::multi_hop_pay_fees_works).

Contributor Author:

Does it influence prod, though? I have a feeling that these storages should be empty either way when running the Runtime API properly.

Maybe the issue is with the test itself?

Member:

> Does it influence prod, though? I have a feeling that these storages should be empty either way when running the Runtime API properly.

Why should it be empty? Calling clear_messages here is correct, as otherwise the function will return invalid messages.

Member:

(That is, messages that got sent when building the block the runtime API function is called on.)

Contributor:

Yes, calling clear_messages() here is correct. Can you look into the tests and find out why exactly they are failing? Is it a penpal runtime config issue or a test issue?

Contributor Author:

> Why should it be empty?

I just thought that maybe the Runtime API runs on the state of the last finalized block, and I remembered that ParachainSystem's on_finalize clears the XCMP storages. However, it seems it only does so partially. So yeah, manual clearing before dry-running is needed anyway.

I will look into that.

Contributor Author:

It turned out it wasn't Penpal: the bug appeared in the Westend/Rococo AssetHub runtimes.

See the fix: 98dea1d

As to why it appeared:

  1. The xcmp-queue's OutboundXcmpMessages and OutboundXcmpStatus are supposed to be in sync. It isn't enough to clear OutboundXcmpMessages without also resetting the indices kept in OutboundXcmpStatus. If the storages are out of sync, it is possible (and that is what happened in the tests) that the have_active flag is erroneously set and execution takes the wrong branch. See here. (A sketch of the fix follows this list.)
  2. How they got out of sync: the integration tests use an emulated environment and call runtime APIs directly as Rust functions, without things like with_transaction. This way, the dry run turns into an actual run, and all the networks process the resulting messages. AssetHub is the reserve chain in the test scenario, and its storages are modified after the PenpalA chain "dry-runs" a transfer call. When we then try to "dry-run" a forwarded XCM program, we hit the out-of-sync storage issue if we use the clear_messages implementation without the provided fix. You can verify this reasoning by looking at this code.
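
A minimal sketch of the shape of that fix (storage and field names follow pallet-xcmp-queue, but treat the exact details as assumptions and see commit 98dea1d for the real change): clearing the queued outbound messages now also resets the per-channel page indices, so the two storages cannot drift apart.

// Meant to live inside pallet-xcmp-queue, where Config, OutboundXcmpMessages
// and OutboundXcmpStatus are in scope.
impl<T: Config> Pallet<T> {
    fn clear_messages() {
        // Drop all queued outbound pages...
        let _ = OutboundXcmpMessages::<T>::clear(u32::MAX, None);
        // ...and reset the page range tracked per channel, which would otherwise
        // still point at the pages that were just removed.
        OutboundXcmpStatus::<T>::mutate(|channels| {
            for details in channels.iter_mut() {
                details.first_index = 0;
                details.last_index = 0;
            }
        });
    }
}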

Contributor:

nice find!

@acatangiu added the labels T6-XCM (This PR/Issue is related to XCM.) and T4-runtime_API (This PR/Issue is related to runtime APIs.) on Feb 10, 2025
@acatangiu (Contributor) left a comment:

nice!

@acatangiu added this pull request to the merge queue on Feb 24, 2025
@acatangiu added the A4-needs-backport (Pull request must be backported to all maintained releases.) label on Feb 24, 2025
Merged via the queue into paritytech:master with commit 963f0d7 on Feb 24, 2025
238 of 266 checks passed
@paritytech-cmd-bot-polkadot-sdk (Contributor) commented:

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7438-to-stable2407
git worktree add --checkout .worktree/backport-7438-to-stable2407 backport-7438-to-stable2407
cd .worktree/backport-7438-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 963f0d734e386023957354add6bb1dba62b2776c
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk (Contributor) commented:

Created backport PR for stable2409:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7438-to-stable2409
git worktree add --checkout .worktree/backport-7438-to-stable2409 backport-7438-to-stable2409
cd .worktree/backport-7438-to-stable2409
git reset --hard HEAD^
git cherry-pick -x 963f0d734e386023957354add6bb1dba62b2776c
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk (Contributor) commented:

Created backport PR for stable2412:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7438-to-stable2412
git worktree add --checkout .worktree/backport-7438-to-stable2412 backport-7438-to-stable2412
cd .worktree/backport-7438-to-stable2412
git reset --hard HEAD^
git cherry-pick -x 963f0d734e386023957354add6bb1dba62b2776c
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk (Contributor) commented:

Created backport PR for stable2503:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7438-to-stable2503
git worktree add --checkout .worktree/backport-7438-to-stable2503 backport-7438-to-stable2503
cd .worktree/backport-7438-to-stable2503
git reset --hard HEAD^
git cherry-pick -x 963f0d734e386023957354add6bb1dba62b2776c
git push --force-with-lease

Labels
A4-needs-backport Pull request must be backported to all maintained releases. T4-runtime_API This PR/Issue is related to runtime APIs. T6-XCM This PR/Issue is related to XCM.
Development
Successfully merging this pull request may close these issues: DryRunApi.dryRunCall always produces the latest XCM version report
5 participants