Skip to content

feat(ymax-planner): Fallback RPC endpoint support#12624

Open
usmanmani1122 wants to merge 1 commit into
masterfrom
fallback-rpc-in-ymax-planner
Open

feat(ymax-planner): Fallback RPC endpoint support#12624
usmanmani1122 wants to merge 1 commit into
masterfrom
fallback-rpc-in-ymax-planner

Conversation

@usmanmani1122

Copy link
Copy Markdown
Contributor

closes: #XXXX
refs: #XXXX

Description

Adds fallback RPC support to the ymax planner service. When a primary RPC node is unavailable or falling behind, requests automatically fail over to a configured fallback node

Key changes:

  • src/main.ts: Adds overrideNetworkConfigFromEnvironmentVariables to support OVERRIDE_RPC_ADDRESS, FALLBACK_RPC_ADDRESS, and OVERRIDE_WEBSOCKET_URL env vars. Switches makeSmartWalletKit and makeSigningSmartWalletKit imports from @agoric/client-utils to local wrappers that are fallback-aware
  • src/wrappers/stargate-client.ts (new): Wraps SigningStargateClient so that broadcastTx, sign, and signAndBroadcast each verify the primary node's health before calling it, falling back to the secondary node on failure
  • src/wrappers/vstorage-kit.ts (new): Wraps vstorage's readStorage and readStorageMeta with the same primary/fallback pattern
  • src/wrappers/smart-wallet.ts (new): Thin adapter that constructs a SmartWalletKit using the fallback-aware vstorage kit
  • src/wrappers/signing-smart-wallet.ts (new): Thin adapter that constructs a SigningSmartWalletKit using the fallback-aware stargate client
  • src/wrappers/utils.ts (new): verifyStatus helper that hits the RPC /status endpoint with a 5-second cache TTL, rejecting if the node is catching up

Security Considerations

No new authorities are introduced. The fallback RPC address is configured via a server-side environment variable FALLBACK_RPC_ADDRESS, not user input. The status check uses AbortSignal.timeout to prevent hanging requests. Trust boundaries remain the same — both primary and fallback nodes are expected to be trusted RPC endpoints configured by the operator

Scaling Considerations

Each RPC call now incurs an additional /status health check against the primary node, cached for 5 seconds (roughly one block interval). This adds one lightweight HTTP request per ~5 seconds under load. When a fallback client is configured, an additional SigningStargateClient and VStorage connection are maintained, doubling the connection footprint to RPC nodes

Documentation Considerations

N/A

Testing Considerations

N/A

Upgrade Considerations

This is an additive change to the ymax planner service. Existing deployments without the new environment variables will behave identically to before — no fallback client is created unless FALLBACK_RPC_ADDRESS is set. To adopt, operators set the env vars and redeploy; no data migration or on-chain changes required

@gibson042 gibson042 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely do not want to be in the business of maintaining this collection of parallel client-utils infrastructure. If changes are worth making, they're worth making in the source package.

Comment on lines +77 to +94
/**
* Parts of the network config fetched from the chain
* can be overriden using env vars starting from
* `OVERRIDE_`
*/
const overrideNetworkConfigFromEnvironmentVariables = (
{ chainName, rpcAddrs }: MinimalNetworkConfig,
{ env }: { env: NodeJS.ProcessEnv },
) => {
if (env.OVERRIDE_RPC_ADDRESS) rpcAddrs[0] = env.OVERRIDE_RPC_ADDRESS;
if (env.FALLBACK_RPC_ADDRESS) rpcAddrs.push(env.FALLBACK_RPC_ADDRESS);

return {
chainName,
rpcAddrs,
websocketUrl: env.OVERRIDE_WEBSOCKET_URL || rpcAddrs.find(Boolean)!,
};
};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it will interact poorly with AGORIC_NET. I think I'd rather expand that to support e.g. "$fqdn1,…,$chainId".

return makeVstorageKitFromVstorage({ vstorage, networkConfig });
};

const makeVstoragetWrapper = (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Comment on lines +48 to +49
import makeSmartWalletKit from './wrappers/smart-wallet.ts';
import makeSigningSmartWalletKit from './wrappers/signing-smart-wallet.ts';

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name reüse makes it super hard to keep track of what's going on, and will even hinder following likes in client-utils.

Comment on lines +110 to +118
try {
if (fallbackVstorage) await verifyStatus({ now }, priorityRpcAddress);

curr = await vstorage.readStorageMeta(path, {
height: config?.height,
kind: config?.kind,
});
} catch (err) {
if (fallbackVstorage) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much redundancy; too much forward maintenance burden.

await null;

try {
if (fallbackVstorage) await verifyStatus({ now }, priorityRpcAddress);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An extra query per block?

await null;

try {
if (fallbackClient) await verifyStatus({ now }, priorityRpcAddress);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the tx signer care whether or not the receiving RPC node is catching up?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the tx signer care whether or not the receiving RPC node is catching up?

It's likely/possible the signed message will be using an out of sync seq num if the RPC is behind.

@mhofman mhofman left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully reviewed but I was hoping there would be a way to more directly hook into the startgate client to avoid patching on top of it.

Comment on lines +181 to +183
client.broadcastTx = broadcastTx;
client.sign = sign;
client.signAndBroadcast = signAndBroadcast;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to override signAndBroadcastSync too for completeness ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But really I'm worried this is "patching on top" and that all the other StargateClient methods wouldn't end up using the fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants