Commit 29f421c

Added doc/guides/rollback.md
1 parent 38af795 commit 29f421c

2 files changed

+156
-6
lines changed

README.md

Lines changed: 5 additions & 6 deletions
@@ -46,8 +46,8 @@ The Pocket Core application will allow anyone to spin up a Pocket Network full n
4646
- [Contributing](#contributing)
4747
- [Seeds (MainNet \& TestNet)](#seeds-mainnet--testnet)
4848
- [Docker Image](#docker-image)
49+
- [Rollback Recovery Process](#rollback-recovery-process)
4950
- [Support \& Contact](#support--contact)
50-
- [GPokT](#gpokt)
5151
- [License](#license)
5252

5353
## Installation
@@ -102,7 +102,6 @@ For more detailed command information, see the [usage section](doc/specs/cli/).
102102
## Documentation
103103
104104
- Visit [our user documentation](https://docs.pokt.network) for tutorials and technical information on the Pocket Network.
105-
- Visit [pocket-core-doc](https://github.com/pokt-network/pocket-core-doc) repo for operations such as protocol upgrade, chainhalt recovery, etc.
106105
107106
## Portal
108107
@@ -154,6 +153,10 @@ The latest image can be pulled like so:
154153
docker pull ghcr.io/pokt-network/pocket-v0:latest
155154
```
156155
156+
## Rollback Recovery Process
157+
158+
An example of a rollback recovery process can be found [here](doc/guides/rollback.md).
159+
157160
## Support & Contact
158161
159162
<div>
@@ -162,10 +165,6 @@ docker pull ghcr.io/pokt-network/pocket-v0:latest
162165
<a href="https://research.pokt.network"><img src="https://img.shields.io/discourse/https/research.pokt.network/posts.svg"></a>
163166
</div>
164167
165-
### GPokT
166-
167-
You can also use our chatbot, [GPokT](https://gpoktn.streamlit.app), to ask questions about Pocket Network. As of updating this documentation, please note that it may require you to provide your own LLM API token. If the deployed version of GPokT is down, you can deploy your own version by following the instructions [here](https://github.com/pokt-network/gpokt).
168-
169168
## License
170169
171170
This project is licensed under the MIT License; see the [LICENSE.md](LICENSE.md) file for details

doc/guides/rollback.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
# [WIP] Chain Halt Rollback Recovery Guide <!-- omit in toc -->
2+
3+
:::warning
4+
5+
This rollback guide is a WIP.
6+
7+
It is a port of this [notion doc](https://www.notion.so/Recovery-guide-641ec21aead74cae806166f5f9e61394)
8+
by @msmania while managing a chain halt during a previous test, and should only
9+
be treated as a reference, not a definitive general-purpose guide.
10+
11+
:::
12+
13+
- [Issue description](#issue-description)
14+
- [Hotfix](#hotfix)
15+
- [How to check if my data is correct or not](#how-to-check-if-my-data-is-correct-or-not)
16+
- [Recovery steps](#recovery-steps)
17+
18+
## Issue description
19+
20+
[The test scenario](https://www.notion.so/RC-0-11-0-Release-Plan-a76509e73f854d0b8ca91ea62f52ca9e?pvs=21) assigned a new set of reward
21+
delegators to the node `5b18a8c268ffcbf0530f61f70e3ee14f064bdf0f` at block `128746`, and then replaced it with another set at block `128747`, as shown below.
22+
23+
```text
24+
Address: 5b18a8c268ffcbf0530f61f70e3ee14f064bdf0f
25+
Public Key: 0b787e54e66b3db3a3396c2322d8314287e08990f7696c490153b078bada7e94
26+
Jailed: false
27+
Status: Staked
28+
Tokens: 18000000000
29+
ServiceUrl: https://poktt1698102386.c0d3r.org:443
30+
Chains: [0001 0002 005A 005B 005C 005D]
31+
Unstaking Completion Time: 0001-01-01 00:00:00 +0000 UTC
32+
Output Address: 42846261e1798fc08e1dfd97325af7b280f815b0
33+
Reward Delegators: {"54751ae3431c015a6e24d711c9d1ed4e5a276479":20,"8147ed5182da6e7dea33f36d78db6327f9df6ba0":10}
34+
```
35+
36+
Because the marshaling order of a map is not deterministic, the `RewardDelegators` field is serialized as either `{"54751ae3431c015a6e24d711c9d1ed4e5a276479": 20, "8147ed5182da6e7dea33f36d78db6327f9df6ba0": 10}` or `{"8147ed5182da6e7dea33f36d78db6327f9df6ba0": 10, "54751ae3431c015a6e24d711c9d1ed4e5a276479": 20}`. This can fork the world state into two versions, depending on whether `54751ae3` or `8147ed51` comes first.
37+
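To illustrate why the key order matters, here is a minimal Go sketch that hashes the two serializations shown above. It uses plain JSON strings and SHA-256 purely as stand-ins; the actual encoding and state hashing in pocket-core are different, and `digest` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digest is a hypothetical helper: it returns the first 4 bytes of the
// SHA-256 hash of s as a hex string.
func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return fmt.Sprintf("%x", sum[:4])
}

func main() {
	// The same logical map, serialized with its two possible key orders.
	a := `{"54751ae3431c015a6e24d711c9d1ed4e5a276479":20,"8147ed5182da6e7dea33f36d78db6327f9df6ba0":10}`
	b := `{"8147ed5182da6e7dea33f36d78db6327f9df6ba0":10,"54751ae3431c015a6e24d711c9d1ed4e5a276479":20}`

	// Different bytes give different hashes, so nodes that pick different
	// orders disagree on the resulting state hash.
	fmt.Println("hashes differ:", digest(a) != digest(b))
}
```

Any hash that feeds into the AppHash behaves the same way: two nodes holding identical data but serializing it in different orders end up with different hashes.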
38+
During the upgrade, the network got stuck in round 13 at height 128748. The proposer (validator1; 77E608D8AE4CD7B812F122DC82537E79DD3565CB) proposed a block on top of a block 128747 whose AppHash was 624553DE, but the majority of the validators held a block 128747 whose AppHash was c407eb25.
39+
40+
All nodes in the network, including non-validators and seeds, can get stuck due to this bug. This explains why we had peer connection issues so often during the upgrade.
41+
42+
## Hotfix
43+
44+
The fix is to sort the `RewardDelegators` field when marshaling. We decided to marshal a validator with [protobuf’s standard marshaler](https://pkg.go.dev/github.com/gogo/protobuf/plugin/marshalto?utm_source=godoc), which uses reverse alphabetical order.
45+
46+
https://github.com/pokt-network/pocket-core/pull/1591
47+
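The idea behind the fix can be sketched in Go as follows. This is not the code from the PR above, and it does not use the protobuf marshaler; `marshalDelegators` is a hypothetical helper that only shows how sorting the keys before writing them makes the output independent of map iteration order. Descending order is used here to mirror the "reverse alphabetical" order mentioned above; any fixed order would give determinism.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// marshalDelegators is a hypothetical helper that serializes a delegator
// map with its keys in a fixed (descending) order.
func marshalDelegators(m map[string]uint32) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// Sort keys in descending order so every node emits the same bytes.
	sort.Sort(sort.Reverse(sort.StringSlice(keys)))

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%q:%d", k, m[k]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	d := map[string]uint32{
		"54751ae3431c015a6e24d711c9d1ed4e5a276479": 20,
		"8147ed5182da6e7dea33f36d78db6327f9df6ba0": 10,
	}
	// 8147ed51... is always written first, regardless of map iteration order.
	fmt.Println(marshalDelegators(d))
}
```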
48+
## How to check if my data is correct or not
49+
50+
During the upgrade, the world state of Testnet was forked into two versions. With the fix, on block 128746, the delegator address `8147ed51` should have come first, which means AppHash 84cbe5d0 is correct. On the actual chain, however, the incorrect version was chosen, so we need to roll back Testnet.
51+
52+
I created a small tool, https://github.com/msmania/pocket-appdb-parser.git, to print the latest AppHash in application.db. Follow these steps to check the state of your node.
53+
54+
1. Clone and build the tool
55+
56+
```bash
57+
git clone https://github.com/msmania/pocket-appdb-parser.git
58+
cd pocket-appdb-parser
59+
go build -o pocket-appdb-parser .
60+
```
61+
62+
1. Stop pocket and run the following command.
63+
You cannot run it while pocket is running, because GoLevelDB does not allow access from multiple processes.
64+
65+
```bash
66+
./pocket-appdb-parser <path to application.db>
67+
```
68+
69+
1. If a node is on 128747, the output will be one of the following
70+
71+
1. 128747: c407eb25 (wrong)
72+
73+
```text
74+
128747: c407eb25c5e5192a67514132838217741011991625e410e7f16233dad7d8705c
75+
main:128747 ea6e1849b4bf587027401a8f105901b134f8be94d7ecab2ea170c7b5d96e4cf5
76+
pocketcore:128747 85c3aae78d51de66ca88a2f7d61c88a6a076013f2ca385eeee820e5d1bca2859
77+
auth:128747 5faa7669ef6aa9d393a584e03041c42772cf43ccf861326ca2a70544c97ca844
78+
pos:128747 87bc3da27011f645ae3e856e44cbec2a1691ebe2b0b3f98464d5c184a57265ac
79+
application:128747 2db9f5e3c2aa8fa064eb284beee5189e66344e48cb06f51b53d984af0ec2dbe7
80+
gov:128747
81+
params:128747 3c77022618e6d32441a3de1d22092f3d2e5a4221ea569cca7a7adfa22d08131c
82+
```
83+
84+
2. 128747: 624553de (wrong)
85+
86+
```text
87+
128747: 624553de014c6546f56167b47f4d92e46f72f18ae2e08e3ae254981f9914c95e
88+
gov:128747
89+
application:128747 2db9f5e3c2aa8fa064eb284beee5189e66344e48cb06f51b53d984af0ec2dbe7
90+
params:128747 3c77022618e6d32441a3de1d22092f3d2e5a4221ea569cca7a7adfa22d08131c
91+
pos:128747 56114cdc51d8217075255106c4f067e1c117d1cc3876d3da3fbb9e1a2d6689f7
92+
auth:128747 5faa7669ef6aa9d393a584e03041c42772cf43ccf861326ca2a70544c97ca844
93+
pocketcore:128747 85c3aae78d51de66ca88a2f7d61c88a6a076013f2ca385eeee820e5d1bca2859
94+
main:128747 ea6e1849b4bf587027401a8f105901b134f8be94d7ecab2ea170c7b5d96e4cf5
95+
```
96+
97+
1. If a node is on 128746, the output will be one of the following
98+
99+
1. 128746: 84cbe5d0 (correct; no need to resync)
100+
101+
```text
102+
128746: 84cbe5d012fbd52c34775351d56762afb738888d33cfb80c96d84900ed3f3a82
103+
pocketcore:128746 3ecb0f4b97339e0918a57902b48b46ddbb1e5f221b905cbb09349e830ce64f21
104+
params:128746 3c77022618e6d32441a3de1d22092f3d2e5a4221ea569cca7a7adfa22d08131c
105+
pos:128746 9b6de08f1d3f4eb724fe3f9ee04dd7116103060da8755088fba8582914fc4e67
106+
application:128746 2db9f5e3c2aa8fa064eb284beee5189e66344e48cb06f51b53d984af0ec2dbe7
107+
gov:128746
108+
auth:128746 3c5fea92e0ec2846a21acc4faee6e78d7b6ff4ef9303c60b5152ebcee1216b3d
109+
main:128746 ea6e1849b4bf587027401a8f105901b134f8be94d7ecab2ea170c7b5d96e4cf5
110+
```
111+
112+
2. 128746: dca3d2fb (wrong)
113+
114+
```text
115+
128746: dca3d2fb8848e6915b3745bd4db22003cfce09659436b39d289e9e5d51cabbc5
116+
pocketcore:128746 3ecb0f4b97339e0918a57902b48b46ddbb1e5f221b905cbb09349e830ce64f21
117+
gov:128746
118+
auth:128746 3c5fea92e0ec2846a21acc4faee6e78d7b6ff4ef9303c60b5152ebcee1216b3d
119+
pos:128746 a8b7de0316b42f13e0e39d872301ad7139aec20ec84269e354e9d65866784c7c
120+
application:128746 2db9f5e3c2aa8fa064eb284beee5189e66344e48cb06f51b53d984af0ec2dbe7
121+
main:128746 ea6e1849b4bf587027401a8f105901b134f8be94d7ecab2ea170c7b5d96e4cf5
122+
params:128746 3c77022618e6d32441a3de1d22092f3d2e5a4221ea569cca7a7adfa22d08131c
123+
```
124+
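The checks above can be summarized as a small lookup. The following Go sketch is not part of `pocket-appdb-parser`; `classify` and `verdicts` are hypothetical names, and the sketch assumes the first output line has the `<height>: <apphash>` form shown in the samples above. It only knows the four hashes listed in this guide.

```go
package main

import (
	"fmt"
	"strings"
)

// verdicts maps "<height>:<first 8 hex chars of AppHash>" to the status
// listed in this guide.
var verdicts = map[string]string{
	"128747:c407eb25": "wrong",
	"128747:624553de": "wrong",
	"128746:84cbe5d0": "correct",
	"128746:dca3d2fb": "wrong",
}

// classify takes the first line of the parser output, for example
// "128746: 84cbe5d0...", and returns "correct", "wrong", or "unknown".
func classify(line string) string {
	parts := strings.SplitN(line, ":", 2)
	if len(parts) != 2 {
		return "unknown"
	}
	height := strings.TrimSpace(parts[0])
	hash := strings.ToLower(strings.TrimSpace(parts[1]))
	if len(hash) < 8 {
		return "unknown"
	}
	if v, ok := verdicts[height+":"+hash[:8]]; ok {
		return v
	}
	return "unknown"
}

func main() {
	line := "128746: 84cbe5d012fbd52c34775351d56762afb738888d33cfb80c96d84900ed3f3a82"
	fmt.Println(classify(line))
}
```

A node at any other height, or with a hash not listed here, is reported as `unknown` and needs to be checked by hand.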
125+
## Recovery steps
126+
127+
Everyone needs to run the patched version on **all nodes, not only validators but also non-validators like servicers and seeds**.
128+
129+
The commands vary depending on your environment.
130+
131+
1. Stop pocket
132+
133+
```bash
134+
sudo systemctl stop pocket
135+
```
136+
137+
2. Upgrade the binary
138+
139+
```bash
140+
cd <path to pocket_core repo>
141+
git pull origin staging
142+
go build -o <path to pocket> app/cmd/pocket_core/main.go
143+
```
144+
145+
3. Apply the snapshot https://link.storjshare.io/s/jxzmjzjz4dzkalgwxlyxzzuzb6sa/pocket-snapshots/[email protected] to all managed nodes
146+
1. Managed nodes (seeds and validators) need to be isolated..?
147+
4. Start pocket and pray
148+
149+
```bash
150+
sudo systemctl start pocket
151+
```
