
Fix DryRunApi client-facing XCM versions #7438

Open · mrshiposha wants to merge 25 commits into master

Conversation

@mrshiposha (Contributor) commented Feb 3, 2025

Description

Fixes #7413

Integration

This PR updates the DryRunApi. The signature of dry_run_call is changed, and the XCM version of the return values of dry_run_xcm now follows the version of the input XCM program.
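
For orientation, a minimal sketch of the resulting API surface (illustrative only, not the exact merged code; the result_xcms_version parameter name follows a review suggestion adopted later in this thread):

sp_api::decl_runtime_apis! {
	pub trait DryRunApi<Call, Event, OriginCaller>
	where
		Call: Encode,
		Event: Decode,
		OriginCaller: Encode,
	{
		/// Dry-run a call: forwarded XCMs in the returned effects are
		/// converted to `result_xcms_version` for the client.
		fn dry_run_call(
			origin: OriginCaller,
			call: Call,
			result_xcms_version: XcmVersion,
		) -> Result<CallDryRunEffects<Event>, Error>;

		/// Dry-run an XCM program: the returned effects reuse the version
		/// of the input `xcm` itself.
		fn dry_run_xcm(
			origin_location: VersionedLocation,
			xcm: VersionedXcm<Call>,
		) -> Result<XcmDryRunEffects<Event>, Error>;
	}
}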

Review Notes

  • The DryRunApi is modified
  • Router::clear_messages is added to the common dry_run_xcm implementation
  • The xcmp-queue Router's clear_messages is fixed: the channel details' first_index and last_index are now reset when clearing
  • MIN_XCM_VERSION is added
  • The common implementation in pallet-xcm is modified accordingly
  • The DryRunApi tests are modified to account for testing old XCM versions
  • The common implementation from pallet-xcm is now used where it previously wasn't (including in the DryRunApi tests)
  • All runtime implementations are updated according to the Runtime API change

@mrshiposha requested a review from a team as a code owner on February 3, 2025 15:30
@bkchr (Member) commented Feb 7, 2025

This PR makes breaking changes to the DryRunApi. The signature of the dry_run_call is changed, and the XCM version of the return values of dry_run_xcm now follows the version of the input XCM program.

Can you not just determine the XCM version of the input XCM program and use this for the output? Otherwise you should at least bump the version of the runtime API.

@mrshiposha (Author):

@bkchr

Can you not just determine the XCM version of input xcm program and use this for the output?

I did this for dry_run_xcm, where it was possible. As for dry_run_call, there is no input XCM program, just an arbitrary RuntimeCall.
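
For illustration, reading the output version off the input program is a one-liner (a sketch, assuming the xcm crate's IdentifyVersion helper trait; not the PR's exact code):

use xcm::{IdentifyVersion, Version, VersionedXcm};

// The output version simply follows the input program's version, i.e. the
// variant of the `VersionedXcm` enum the client handed us.
fn result_version<Call>(program: &VersionedXcm<Call>) -> Version {
	program.identify_version()
}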

Otherwise you should at least bump the version of the runtime API.

Sure, will do.

@bkchr (Member) commented Feb 7, 2025

As for the dry_run_call, there is no input XCM program, just an arbitrary RuntimeCall.

Somewhere in this RuntimeCall there will be an XCM? If not, you could just assume the minimum XCM version?

@mrshiposha (Author):

We could examine the call and see if it is, say, transfer_assets from pallet-xcm. That seems to be a good enough default for the common implementation.

My only worry is about alternative XCM frontends like ORML's pallet-xtokens. If the common implementation only considers pallet-xcm, pallet-xtokens users will always receive the minimum XCM version.

CC @xlc

@acatangiu (Contributor):

Assuming the minimum is brittle: some new instructions have no equivalents in older versions, and conversion will fail. You could start with the minimum and go through versions until one successfully converts, but that feels like overkill.

I'm personally happy with the current PR state where an explicit version is requested by the caller.
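
For concreteness, the "start with minimum and go through them" idea dismissed above might look like this (a hypothetical helper, not part of the PR; MIN_XCM_VERSION here stands for the constant this PR introduces):

use xcm::{Version, VersionedXcm, XCM_VERSION};

// Hypothetical: walk versions upward from `min` (e.g. MIN_XCM_VERSION) and
// return the first one the message converts into successfully. Workable,
// but overkill compared to letting the caller request an explicit version.
fn lowest_convertible_version<Call: Clone>(
	xcm: &VersionedXcm<Call>,
	min: Version,
) -> Option<Version> {
	(min..=XCM_VERSION).find(|&v| xcm.clone().into_version(v).is_ok())
}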

@@ -64,7 +64,7 @@ sp_api::decl_runtime_apis! {
 OriginCaller: Encode
 {
 /// Dry run call.
-fn dry_run_call(origin: OriginCaller, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
+fn dry_run_call(origin: OriginCaller, xcms_version: XcmVersion, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
Contributor:

Having this param between origin and call makes one think it is somehow related to the call.

I would move it to the back, and also rename it to something more explicit like result_xcm_version / output_xcm_version / desired_xcm_version.

Suggested change:
-fn dry_run_call(origin: OriginCaller, xcms_version: XcmVersion, call: Call) -> Result<CallDryRunEffects<Event>, Error>;
+fn dry_run_call(origin: OriginCaller, call: Call, result_xcms_version: XcmVersion) -> Result<CallDryRunEffects<Event>, Error>;

@mrshiposha (Author):

Done

 frame_system::Pallet::<Runtime>::reset_events(); // To make sure we only record events from current call.
+
+// To make sure we only record messages from the current call.
+Router::clear_messages();
Contributor:

nice! I see this was done for dry_run_call but not for dry_run_xcm 🤦‍♂️

@mrshiposha (Author):

Yet, this fix breaks the penpal runtime somehow...

@mrshiposha (Author):

It seems the xcmp-queue's clear_messages doesn't fix the page indices in OutboundXcmpStatus, and this causes penpal to break when using the common DryRunApi::dry_run_xcm implementation from pallet-xcm (see CI: the broken tests are tests::xcm_fee_estimation::multi_hop_works and tests::xcm_fee_estimation::multi_hop_pay_fees_works).

@mrshiposha (Author):

Does it influence prod, though? I have a feeling that these storages should be empty either way when running the Runtime API properly.

Maybe the issue is with the test itself?

Member:

Does it influence prod, though? I have a feeling that these storages should be empty either way when running Runtime API properly.

Why should it be empty? Calling clear_messages here is correct, as otherwise the function will return invalid messages.

Member:

(That is, messages that got sent when building the block the runtime API function is called on.)

Contributor:

Yes, calling clear_messages() here is correct. Can you look into the tests and find out why exactly they are failing? Is it a penpal runtime config issue or a test issue?

@mrshiposha (Author):

Why should it be empty?

I just thought that maybe the Runtime API runs with the state of the last finalized block, and I remembered that ParachainSystem's on_finalize clears the XCMP storages. However, it seems it only does so partially anyway. So yes, manual clearing before dry-running is needed either way.

I will look into that.

@mrshiposha (Author):

It turned out it wasn't Penpal. The bug appeared in the Westend/Rococo AssetHub runtimes.

See the fix: 98dea1d

As to why it appeared:

  1. The xcmp-queue's OutboundXcmpMessages and OutboundXcmpStatus are supposed to stay in sync. It isn't enough to clear OutboundXcmpMessages without also resetting the indices in OutboundXcmpStatus (see the sketch below). If the storages are out of sync, it is possible (and that is what happened in the tests) that the have_active flag is erroneously set and execution takes the wrong branch. See here.
  2. How they got out of sync: the integration tests use an emulated environment and call runtime APIs directly as Rust functions, without things like with_transaction. This way, the dry run turns into an actual run, and all the networks process the resulting messages. AssetHub is the reserve chain in the test scenario, and its storages are modified after the PenpalA chain "dry-runs" a transfer call. When we then try to "dry-run" a forwarded XCM program, we run into the out-of-sync storage issue if we use the clear_messages implementation without the provided fix. You can verify this reasoning by looking at this code.
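
For illustration, the shape of the fix is roughly the following (a hedged sketch of the idea, not the exact code from 98dea1d; storage and field names as in pallet-xcmp-queue):

// When clearing outbound XCMP messages for a dry run, also reset the
// per-channel page indices so `OutboundXcmpStatus` stays consistent with
// the now-empty `OutboundXcmpMessages`.
fn clear_messages() {
	let _ = OutboundXcmpMessages::<T>::clear(u32::MAX, None);
	OutboundXcmpStatus::<T>::mutate(|channels| {
		for details in channels.iter_mut() {
			// Without this reset, `have_active` can be computed as true
			// for a channel whose message pages were just removed.
			details.first_index = 0;
			details.last_index = 0;
		}
	});
}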

Contributor:

nice find!

@acatangiu added the T6-XCM (This PR/Issue is related to XCM.) and T4-runtime_API (This PR/Issue is related to runtime APIs.) labels Feb 10, 2025
@acatangiu (Contributor) left a review comment:

nice!

Merging this pull request may close: DryRunApi.dryRunCall always produces the latest XCM version report (#7413)