section | title | lang | permalink | excerpt |
---|---|---|---|---|
sdk | Security & Recovery in Scroll SDK | en | sdk/operation/security-and-recovery | Learn more about security and recovery in Scroll SDK |
This article covers many aspects of the protocol's security, but attack vectors against you, your network, and your users go well beyond your Scroll SDK configuration. Consider whether you pass [the Rekt test](https://blog.trailofbits.com/2023/08/14/can-you-pass-the-rekt-test/).

The well-being of your users is your responsibility, and we urge you to consider all aspects of security when running a chain. Hiring a security professional is advised.
For a comprehensive overview of the security of the protocol, L2Beat's overview of Scroll is a great place to understand the risks, centralization points, and permissioned operators on Scroll chain. Because Scroll is a single entity (which also built the tech), the risk factors may increase as you coordinate with external parties (e.g., RaaS providers).
For any discoveries of critical vulnerabilities outside of the scope of the bug bounty program, please also send reports to [email protected].

For a list of independent audits of the Scroll protocol, see Audits & Bug Bounty.
Additionally, Scroll SDK has undergone the following audits:
- Alternative Gas Token Contracts and Gas Oracle
  - Trail of Bits (Report to be released)
Audits don't guarantee the absence of security vulnerabilities. Using blockchains comes with risk, and Scroll is no exception. We encourage users to use the protocol with caution and at their own risk.

Because the Owner Role has the ability to upgrade smart contracts, it can compromise the bridge and user funds. This account should be a multi-sig wallet, and we encourage you to review the best practices for creating a Security Council.
If a RaaS provider is used, create a plan for multi-sig upgrades where the provider cannot arbitrarily upgrade the contracts.
The following accounts are given roles that have special permissions and should be managed with extra care:
- `DEPLOYER`
  - Used to deploy initial contracts and has permissions to set the initial `OWNER`
  - Private key held in `contracts` service
- `OWNER`
  - Can upgrade contracts, set important parameters, whitelist accounts to grant them roles.
  - Should be a multi-sig wallet, with the RaaS provider having no more signing authority than the other signers.
- `L1_GAS_ORACLE_SENDER`
  - Permissioned to report L2 gas prices to L1 `L1_SCROLL_MESSENGER` contract
  - Private key held in `gas-oracle` service (unless using Web3Signer)
- `L2_GAS_ORACLE_SENDER`
  - Permissioned to report L1 gas prices to L2 `L1_GAS_PRICE_ORACLE` contract
  - Private key held in `gas-oracle` service (unless using Web3Signer)
- `L1_COMMIT_SENDER_ADDR`
  - Permissioned to submit batches to the L1 `ScrollChain` contract
  - Private key held in `rollup-node` service (unless using Web3Signer)
- `L1_FINALIZE_SENDER`
  - Permissioned to submit proofs and finalize batches on L1 `ScrollChain` contract
  - Private key held in `rollup-node` service (unless using Web3Signer)
For additional assessments on protocol permissions and to see how Scroll manages multisigs and timelocks, see L2Beat's Scroll permissions.
By default, Scroll SDK's production deployments are configured to store "hot" private keys in the service and a secret manager service. We use ExternalSecrets to support a variety of secret manager services, but by default, the CLI tool only automates AWS Secrets Manager and an insecure, development-only deployment of HashiCorp Vault.
We intend to add support for Web3Signer in the future as well, allowing more restricted access to apply to a single service.
For more information on implementing access control to specific parts of your cluster, see Kubernetes: Using RBAC Authorization.
In a severe security incident, you may need to pause the bridge. The quickest way for an infrastructure operator to do this is to take the rollup node offline. Even if blocks continue to be produced, finalization (and thus new withdrawals) will not be processed until the `rollup-node` is back online.
Rotating the keys for the `gas-oracle` and `rollup-node` accounts is a manual process requiring involvement from the `OWNER` role.
At a high level, you simply need to add the new key to the whitelist, restart your services, and then remove the old key from the whitelist.
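The order of these steps matters: the new key is authorized before any service starts using it, and the old key is revoked only afterwards, so there is never a moment with no whitelisted sender. The following is a minimal Python sketch of that sequence against an in-memory whitelist. The `Whitelist` class, the `rotate_sender` function, and the `restart_service` callback are hypothetical names used for illustration only; they do not correspond to Scroll SDK contracts or tooling.

```python
from typing import Callable, Set


class Whitelist:
    """Hypothetical in-memory stand-in for an on-chain sender whitelist."""

    def __init__(self, initial: Set[str]) -> None:
        self._allowed = set(initial)

    def add(self, account: str) -> None:
        self._allowed.add(account)

    def remove(self, account: str) -> None:
        self._allowed.discard(account)

    def is_allowed(self, account: str) -> bool:
        return account in self._allowed


def rotate_sender(
    whitelist: Whitelist,
    old_key: str,
    new_key: str,
    restart_service: Callable[[str], None],
) -> None:
    """Rotate a sender key without leaving a gap in authorization."""
    # 1. Authorize the new key first (an OWNER action in a real deployment).
    whitelist.add(new_key)
    # 2. Restart the service so it begins signing with the new key.
    restart_service(new_key)
    # 3. Only now revoke the old key.
    whitelist.remove(old_key)


if __name__ == "__main__":
    wl = Whitelist({"0xOLD"})
    rotate_sender(wl, "0xOLD", "0xNEW", lambda key: print(f"restarted with {key}"))
    print(wl.is_allowed("0xNEW"), wl.is_allowed("0xOLD"))
```

In a real deployment each whitelist change is an on-chain transaction from the `OWNER` account, so every step should be confirmed before moving on to the next.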
Rotating sequencer keys requires careful coordination to ensure continuous block production. The process involves running two sequencer nodes temporarily - the active sequencer and a new sequencer with the new keys.
Before rotating, prepare the following:

1. Update your L2 Geth nodes to the latest version
2. Prepare a second value file for the new sequencer with:
   - New keystore and password
   - New nodekey
   - Updated peer list
3. Ensure all L2 Geth services have both sequencers in their `L2GETH_PEER_LIST`

Then perform the rotation:

1. Deploy the new sequencer node with the new keys
2. Verify the new sequencer is fully synced and connected to peers
3. On the active sequencer, connect to the Geth console:

   ```bash
   geth attach /l2geth/data/geth.ipc
   ```

   or, if using `kubectl`:

   ```bash
   kubectl exec -it l2-sequencer-0 -- geth attach /l2geth/data/geth.ipc
   ```
4. Check the current active signer: `clique.getSigners()`
5. Propose the new signer (replace with your new signer address): `clique.propose("0xNEW_SIGNER_ADDRESS", true)`
6. Wait for one block to be generated, then verify both signers are active: `clique.getSigners()` (should show both addresses)
7. Remove the old signer from both nodes:
   - On the old sequencer: `clique.propose("0xOLD_SIGNER_ADDRESS", false)`
   - On the new sequencer: `clique.propose("0xOLD_SIGNER_ADDRESS", false)`
8. After two blocks are generated, verify only the new signer remains: `clique.getSigners()` (should show only the new signer)
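The reason the removal is proposed on both nodes, while the addition is proposed on only one, follows from how upstream go-ethereum's Clique engine tallies votes: a proposal takes effect once more than half of the current signers have voted for it. The sketch below models that rule in Python, assuming Scroll's L2 Geth keeps the upstream behavior. The function names are hypothetical, and the model ignores epochs and vote expiry.

```python
from __future__ import annotations


def votes_needed(signer_count: int) -> int:
    """Smallest number of votes that is strictly more than half the signers."""
    return signer_count // 2 + 1


def apply_proposal(
    signers: set[str], target: str, authorize: bool, voters: set[str]
) -> set[str]:
    """Return the signer set after `voters` propose adding or removing `target`."""
    # Only votes cast by current signers count toward the tally.
    valid_votes = voters & signers
    if len(valid_votes) < votes_needed(len(signers)):
        return set(signers)  # No majority yet: the signer set is unchanged.
    updated = set(signers)
    if authorize:
        updated.add(target)
    else:
        updated.discard(target)
    return updated


if __name__ == "__main__":
    signers = {"0xOLD"}
    # One signer: the old sequencer's single vote is already a majority.
    signers = apply_proposal(signers, "0xNEW", True, {"0xOLD"})
    # Two signers: one vote is not enough to remove the old signer...
    assert apply_proposal(signers, "0xOLD", False, {"0xOLD"}) == {"0xOLD", "0xNEW"}
    # ...so both sequencers must propose the removal.
    signers = apply_proposal(signers, "0xOLD", False, {"0xOLD", "0xNEW"})
    print(sorted(signers))
```

With a single signer, one vote is a majority, which is why step 5 only runs on the active sequencer. With two signers, both votes are required, which is why step 7 runs on both.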
Recovering from an infrastructure failure will depend on what components are affected.
For a managed database recovery, we recommend maintaining backups, ideally in an alternate region. If you operate your own database, be sure to take snapshots, and consider backups to alternate cloud providers. We plan to provide further guidance for database recovery in the future.
If your sequencer host goes down:
We recommend having at least one hot standby sequencer to take its place. This sequencer can be configured with different keys than the original sequencer (and be fully synced in case you need to rotate the sequencer keys), but a simple configuration change will allow it to reboot using the original sequencer's keys to immediately resume block production.
If all of your sequencer machines are lost:
You will need to do one of the following:
- Sync a new full node from genesis (assuming there are full nodes remaining somewhere in your p2p network).
- Repurpose a synced RPC node, "converting" it into the sequencer by creating a new sequencer chart that takes over the RPC node's Persistent Volume Claim.
If all full nodes in the network are lost:
If you cannot sync from other network nodes, you will need to sync from L1 data. As of version 0.1.0, this is unsupported, but we plan to add support for this in the near future.
Please reach out to the Scroll team if you need assistance reviewing your recovery plan.
It is important to plan for your incident response and recovery before an incident occurs. Here is a list of potential issues, their implications, and things to consider as a team.
- Symptoms: Delays in block production or finalization
- Impact:
  - Write operations may be temporarily unavailable
  - Bridge withdrawals may be delayed
  - Read operations remain functional
- Response:
  - Monitor block production metrics
  - Engage backup systems if necessary
  - Communicate status to users

- Symptoms: RPC nodes rejecting blocks
- Impact: Chain appears offline for writes
- Response:
  - Investigate sequencer logs
  - Prepare for potential rollback
  - Maintain read-only access

- Symptoms: Proof generation failures
- Response:
  - Coordinate with Scroll team
  - Potential prover upgrade
  - Possible L1 batch revocation
  - Prepare for L2 reorg

- Highest Risk Scenario
- Required Actions:
  - Immediate escalation to Scroll team
  - Potential emergency shutdown
  - Review of all recent proofs
  - External party verification

- Monitoring: Track gas price anomalies
- Impact Assessment:
  - Transaction cost implications
  - Potential chain usability issues
- Resolution Steps:
  - Oracle parameter adjustment
  - Emergency price override if necessary
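As an illustration of tracking gas price anomalies, the following Python sketch flags any reported price that differs from the median of the preceding values by more than a chosen factor. It is a sketch under stated assumptions, not a description of how the `gas-oracle` service works: the function name, the window size, and the `max_ratio` threshold are hypothetical and would need tuning for a real chain.

```python
from statistics import median
from typing import List, Sequence


def find_gas_price_anomalies(
    prices: Sequence[int], window: int = 5, max_ratio: float = 3.0
) -> List[int]:
    """Return indexes of prices that deviate sharply from the recent median.

    A price is flagged when it is more than `max_ratio` times larger or
    smaller than the median of the previous `window` prices.
    """
    anomalies = []
    for i in range(window, len(prices)):
        baseline = median(prices[i - window:i])
        if baseline <= 0:
            continue  # No meaningful baseline to compare against.
        ratio = prices[i] / baseline
        if ratio > max_ratio or ratio < 1 / max_ratio:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical oracle readings (in gwei) with one spike and one collapse.
    readings = [10, 11, 10, 12, 11, 10, 50, 11, 10, 1]
    print(find_gas_price_anomalies(readings))
```

Using the median rather than the mean keeps a single spike from distorting the baseline for the readings that follow it.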
- Backup Infrastructure
  - Maintain 1-2 fullnodes in alternate regions
  - Regular database snapshots
  - Off-site backup storage
  - Cross-region K8s cluster capability
- Recovery Procedures
  - Sequencer Role Recovery:
    - Deploy new sequencer with original keys
    - Verify chain sync status
    - Resume block production
  - Signer Change Process:
    - Follow documented key rotation
    - Update necessary configurations
    - Verify new signer functionality
- Temporary Outages
  - Maintain hot standby in alternate region
  - Automated DNS failover configuration
  - Regular failover testing
  - Document recovery procedures
- Permanent Migration
  - Platform-agnostic deployment readiness
  - Alternative cloud provider prerequisites:
    - Pre-configured K8s clusters
    - Network configuration templates
    - DNS management strategy
  - Migration checklist:
    - Sequencer deployment
    - RPC node setup
    - Database migration
    - DNS updates
    - Security configuration verification
- Monitor all privileged key usage
- Track gas oracle values for anomalies
- Watch for unusual block proposal patterns
- Monitor bridge activity for suspicious patterns
- Track system resource utilization
- Monitor network latency and availability
- Maintain an up-to-date incident response plan
- Document escalation procedures
- Keep backup RaaS provider details readily available
- Regular testing of recovery procedures
- Maintain communication templates for various scenarios
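As one illustration of the monitoring points above, delays in block production can be detected from block timestamps alone. The Python sketch below is a minimal example under stated assumptions: the function names and the 30-second threshold are hypothetical, and in practice the timestamps would come from your own RPC node or metrics pipeline.

```python
from typing import List, Sequence, Tuple


def find_block_gaps(
    timestamps: Sequence[int], max_gap_seconds: int = 30
) -> List[Tuple[int, int]]:
    """Return (block_index, gap_seconds) for blocks that arrived late.

    A block is late when it was produced more than `max_gap_seconds`
    after the block before it.
    """
    gaps = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > max_gap_seconds:
            gaps.append((i, gap))
    return gaps


def is_stalled(last_block_timestamp: int, now: int, max_gap_seconds: int = 30) -> bool:
    """Return True when no block has been produced within the allowed gap."""
    return now - last_block_timestamp > max_gap_seconds


if __name__ == "__main__":
    # Hypothetical block timestamps in seconds; the fourth block is late.
    block_times = [0, 3, 6, 50, 53]
    print(find_block_gaps(block_times))
    print(is_stalled(last_block_timestamp=53, now=100))
```

A check like `is_stalled` is suited to alerting, while `find_block_gaps` is more useful when reviewing an incident after the fact.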