@@ -8,7 +8,14 @@ Title: Storage migration
8
8
- [ But we have storage\_ mux.ml] ( #but-we-have-storage_muxml )
9
9
- [ Thought experiments on an alternative design] ( #thought-experiments-on-an-alternative-design )
10
10
- [ Design] ( #design )
11
- - [ SMAPIv1 Migration] ( #smapiv1-migration )
11
+ - [ SMAPIv1 migration] ( #smapiv1-migration )
12
+ - [ SMAPIv3 migration] ( #smapiv3-migration )
13
+ - [ Error Handling] ( #error-handling )
14
+ - [ Preparation (SMAPIv1 and SMAPIv3)] ( #preparation-smapiv1-and-smapiv3 )
15
+ - [ Snapshot and mirror failure (SMAPIv1)] ( #snapshot-and-mirror-failure-smapiv1 )
16
+ - [ Mirror failure (SMAPIv3)] ( #mirror-failure-smapiv3 )
17
+ - [ Copy failure (SMAPIv1)] ( #copy-failure-smapiv1 )
18
+ - [ SMAPIv1 Migration implementation detail] ( #smapiv1-migration-implementation-detail )
12
19
- [ Receiving SXM] ( #receiving-sxm )
13
20
- [ Xapi code] ( #xapi-code )
14
21
- [ Storage code] ( #storage-code )
@@ -113,8 +120,100 @@ Note that later on storage_smapi{v1,v3}_migrate.ml will still have the flexibili
113
120
to call remote SMAPIv2 functions, such as ` Remote.VDI.attach dest_sr vdi ` , and
114
121
it will be handled just as before.
115
122
123
+ ## SMAPIv1 migration
116
124
117
- ## SMAPIv1 Migration
125
+ At a high level, mirror establishment for SMAPIv1 works as follows:
+
+ 1. Take a snapshot of a VDI that is attached to VM1. This gives us an immutable
+    copy of the current state of the VDI, with all the data up to the point we took
+    the snapshot. This is illustrated in the diagram as a VDI and its snapshot connecting
+    to a shared parent, which stores the shared content for the snapshot and the writable
+    VDI from which we took the snapshot (snapshot)
+ 2. Mirror the writable VDI to the server hosts: this means that all writes that go to the
+    client VDI will also be written to the mirrored VDI on the remote host (mirror)
+ 3. Copy the immutable snapshot from our local host to the remote (copy)
+ 4. Compose the mirror and the snapshot to form a single VDI
+ 5. Destroy the snapshot on the local host (cleanup)
137
+
138
+
139
+ More detail to come...
140
+
141
+ ## SMAPIv3 migration
142
+
143
+ More detail to come...
144
+
145
+ ## Error Handling
146
+
147
+ Storage migration is a long-running process, and is prone to failures in each
+ step. Hence it is important to specify what errors could be raised at each step
+ and their significance. This is beneficial both for the user and for triaging.
150
+
151
+ There are two general cleanup functions in SXM: `MIRROR.receive_cancel` and
+ `MIRROR.stop`. The former is for cleaning up whatever has been created by `MIRROR.receive_start`
+ on the destination host (such as VDIs for receiving mirrored data). The latter is
+ a more comprehensive function that attempts to "undo" all the side effects that
+ were produced during the SXM, and also calls `receive_cancel` as part of its operations.
156
+
157
+ Currently, error handling is done by building up a list of cleanup functions in
+ the `on_fail` list ref as the function executes. For example, if `receive_start`
+ has completed successfully, `receive_cancel` is added to the list of cleanup functions.
+ Whenever an exception is encountered, whatever has been added
+ to the `on_fail` list ref is executed. This is convenient, but it entangles all the error
+ handling logic with the core SXM logic itself, making the code rather hard
+ to understand and maintain.
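
To make the pattern concrete, here is a minimal sketch of an `on_fail` list ref. It is not the actual xapi code: `migrate`, `record`, `log` and the `fail_at_mirror` flag are hypothetical names used only for illustration, and the strings stand in for the real calls.

```ocaml
(* Hypothetical sketch of the [on_fail] pattern; not the real xapi code. *)

(* Records which steps ran, so the effect of the cleanup is observable. *)
let log : string list ref = ref []

let record msg = log := msg :: !log

let migrate ~fail_at_mirror =
  (* Cleanup actions are pushed here as each step succeeds. *)
  let on_fail : (unit -> unit) list ref = ref [] in
  try
    record "receive_start" ;
    (* receive_start succeeded, so register its undo action. *)
    on_fail := (fun () -> record "receive_cancel") :: !on_fail ;
    if fail_at_mirror then failwith "mirror failed" ;
    record "mirror" ;
    Ok ()
  with e ->
    (* Run everything registered so far, most recent first. *)
    List.iter (fun f -> try f () with _ -> ()) !on_fail ;
    Error (Printexc.to_string e)
```

Note how the registration of cleanup actions is interleaved with the main logic, which is the entanglement described above.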
164
+
165
+ The idea to fix this is to introduce explicit "stages" during the SXM and define
+ explicitly what error handling should be done if it fails at a certain stage. This
+ helps separate the error handling logic into the `with` part of a `try with` block,
+ which is where it belongs. Since we need to accommodate the existing
+ SMAPIv1 migration (which has more stages than SMAPIv3), the following stages are
+ introduced: preparation (v1, v3), snapshot (v1), mirror (v1, v3), copy (v1). Note that
+ each stage also roughly corresponds to a helper function that is called within `MIRROR.start`,
+ which is the wrapper function that initiates storage migration. Each helper
+ function also has its own error handling logic as
+ needed (e.g. see `Storage_smapiv1_migrate.receive_start`) to deal with exceptions
+ that happen within that helper function.
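
As a rough illustration of the staged approach for the SMAPIv1 path, the sketch below maps each stage to the stage-level cleanup described in the following sections. The `stage` type, the `Stage_failed` exception, `cleanup_for` and `start` are assumptions made for this example, and the strings stand in for calls to the real cleanup functions.

```ocaml
(* Hypothetical sketch of stage-based error handling (SMAPIv1 stages). *)
type stage = Preparation | Snapshot | Mirror | Copy

(* Raised by a stage helper to signal that the given stage failed. *)
exception Stage_failed of stage

(* Stage-level cleanup to run when a given stage fails. *)
let cleanup_for = function
  | Preparation ->
      [] (* receive_start handles its own partial failures *)
  | Snapshot | Mirror ->
      ["receive_cancel"]
  | Copy ->
      ["stop"]

(* Runs the stages in order and returns the cleanup that was selected,
   or the empty list if every stage succeeded. *)
let start ~run_stage =
  try
    List.iter run_stage [Preparation; Snapshot; Mirror; Copy] ;
    []
  with Stage_failed s -> cleanup_for s
```

With this shape, the main sequence of stages reads top to bottom, and all stage-level cleanup decisions live in one place in the `with` branch.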
176
+
177
+ ### Preparation (SMAPIv1 and SMAPIv3)
178
+
179
+ The preparation stage generally corresponds to what is done in `receive_start`, and
+ this function itself will handle exceptions when there are partial failures within
+ it, such as an exception after the receiving VDI is created.
+ It will use the old-style `on_fail` function but only with a limited scope.
+
+ There is nothing to be done at a higher level (i.e. within `MIRROR.start`, which
+ calls `receive_start`) if preparation has failed.
186
+
187
+ ### Snapshot and mirror failure (SMAPIv1)
188
+
189
+ For SMAPIv1, the mirror is done in a somewhat cumbersome way. The end goal is to establish
+ connections between two tapdisk processes on the source and destination hosts.
+ To achieve this goal, xapi will do two main jobs: 1. create a connection between two
+ hosts and pass the connection to tapdisk; 2. create a snapshot as a starting point
+ of the mirroring process.
+
+ Therefore the handling of failures at these two stages is similar: clean up what was
+ done in the preparation stage by calling `receive_cancel`, and that is almost it.
+ Again, we will leave whatever is needed for partial failure handling within those
+ functions themselves and only clean up at a stage level in `storage_migrate.ml`.
199
+
200
+ Note that `receive_cancel` is a multiplexed function for SMAPIv1 and SMAPIv3, which
+ means different cleanup logic will be executed depending on what type of SR we
+ are migrating from.
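
One way to picture this multiplexing is a dispatch over first-class modules, sketched below. This is only an assumption about the general shape: the `sr_backend` type, the `MIRROR` signature as written here, and the `V1`/`V3` modules are hypothetical and much simpler than the real interfaces.

```ocaml
(* Hypothetical sketch of multiplexing [receive_cancel] by SR type. *)
type sr_backend = SMAPIv1 | SMAPIv3

module type MIRROR = sig
  (* Returns a description of the cleanup it performed. *)
  val receive_cancel : unit -> string
end

module V1 : MIRROR = struct
  let receive_cancel () = "smapiv1 cleanup"
end

module V3 : MIRROR = struct
  let receive_cancel () = "smapiv3 cleanup"
end

(* Pick the implementation that matches the SR type. *)
let choose_backend = function
  | SMAPIv1 -> (module V1 : MIRROR)
  | SMAPIv3 -> (module V3 : MIRROR)

let receive_cancel backend =
  let module M = (val choose_backend backend : MIRROR) in
  M.receive_cancel ()
```

Callers only see a single `receive_cancel`, while the backend-specific cleanup stays inside each implementation.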
203
+
204
+ ### Mirror failure (SMAPIv3)
205
+
206
+ To be filled...
207
+
208
+ ### Copy failure (SMAPIv1)
209
+
210
+ The final step of storage migration for SMAPIv1 is to copy the snapshot from the
+ source to the destination. At this stage, most of the side-effectful work has been
+ done, so we do need to call `MIRROR.stop` to clean things up if we experience a
+ failure during copying.
214
+
215
+
216
+ ## SMAPIv1 Migration implementation detail
118
217
119
218
``` mermaid
120
219
sequenceDiagram
@@ -1877,3 +1976,4 @@ let pre_deactivate_hook ~dbg ~dp ~sr ~vdi =
1877
1976
s.failed <- true
1878
1977
)
1879
1978
```
1979
+