README.md
+45-23
@@ -3,9 +3,15 @@ Author: David Meredith + JK
3
3
4
4
[this ascii file is coded in "markdown" and is best viewed in a markdown enabled browser, see https://en.wikipedia.org/wiki/Markdown for more details]
5
5
6
-
This repo contains the service and cron scripts used to run a failover gocdb instance, includes:
7
-
* A Cron script (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the failover DB. This runs separately from the autoEngageFailover process.
8
-
* A Service script (```gocdb-autofailover.sh```) that monitors the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
6
+
This repo contains the service and cron scripts used to run a failover gocdb instance. It includes the following dirs:
7
+
* autoEngageFailover/
8
+
* Contains a Service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
9
+
* importDBdmpFile/
10
+
* Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to download and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
11
+
* nsupdate_goc/
12
+
* Scripts for switching the DNS to/from the production/failover instance.
13
+
* archiveDmpDownload/
14
+
* Contains a script to download/archive dmp files in a separate process
9
15
10
16
# Packages
11
17
* The following scripts need to be installed and configured for your installation:
@@ -15,8 +21,8 @@ This repo contains the service and cron scripts used to run a failover gocdb ins
15
21
|_ gocdb-autofailover.sh # MAIN SERVICE SCRIPT to monitor production instance
16
22
|_ engageFailover.sh # Child script, run if prolonged outage is detected
17
23
18
-
importDBdmpFile/ # Cron scripts download/install a .dmp of the prod data
19
-
|_ 1_runDbUpdate.sh # MAIN CRON SCRIPT, invokes scripts below
24
+
importDBdmpFile/ # Scripts download/install a .dmp of the prod data
25
+
|_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
20
26
|_ ora11gEnvVars.sh # Setup oracle env
21
27
|_ getDump.sh # Download a .dmp of the production data
22
28
|_ dropGocdbUser.sh # Drops the current DB schema
@@ -28,6 +34,10 @@ This repo contains the service and cron scripts used to run a failover gocdb ins
28
34
nsupdate_goc/ # Scripts for switching the DNS to the failover
29
35
|_ goc_failover.sh # Points DNS to failover instance
30
36
|_ goc_production.sh # Points DNS to production instance
37
+
38
+
archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process e.g from cron.daily
39
+
|_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
40
+
|_ archive/ # Contains archive/dmp files
31
41
```
32
42
33
43
## /root/autoEngageFailover/
@@ -43,8 +53,8 @@ following:
43
53
checked to see that it is current,
44
54
* the hourly cron that downloads the dmp file is stopped (see
45
55
importDBdmpFile below),
46
-
* symbolic links to the server cert/key are updated so they
47
-
point to the 'goc.egi.eu' cert/key
56
+
*<strike>symbolic links to the server cert/key are updated so they
57
+
point to the 'goc.egi.eu' cert/key</strike> (note, no longer needed as cert contains dual SAN)
48
58
* the DNS scripts are invoked to change the dns (see
49
59
nsupdate_goc below).
50
60
@@ -72,6 +82,13 @@ the 'goc.egi.eu' top level DNS alias to point to either the
72
82
production instance or the failover.
73
83
74
84
85
+
## /root/archiveDmpDownload/
86
+
Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir.
87
+
The script also deletes archived files that are older than 'x' days.
88
+
This script can be called in a separate process, e.g. from cron.daily to build a
89
+
set of backups.
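
The "delete archived files older than 'x' days" step described above could be sketched roughly as follows. This is an illustrative sketch only, not the contents of `archiveDump.sh`: the function name `prune_archive`, its arguments, and the `*.dmp` file pattern are all assumptions, and the download step is omitted because the README does not describe how it is done.

```bash
#!/bin/bash
# Hypothetical helper (not taken from archiveDump.sh):
# remove archived dump files older than a given number of days.
#   $1 = archive directory (e.g. the archive/ sub-dir)
#   $2 = retention in days (the 'x' mentioned above)
prune_archive() {
    local archive_dir="$1"
    local keep_days="$2"
    # -mtime +N matches files last modified more than N whole days ago;
    # -maxdepth 1 keeps the search inside the archive dir itself.
    find "$archive_dir" -maxdepth 1 -type f -name '*.dmp' \
        -mtime +"$keep_days" -delete
}
```

Assuming a layout like the one in the Packages tree, a daily cron job might call `prune_archive /root/archiveDmpDownload/archive 30` after saving the new dump; the 30-day retention here is an arbitrary example value.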
90
+
91
+
75
92
# Failover Instructions
76
93
* Choose from options 1) 2) 3)
77
94
@@ -81,17 +98,20 @@ instance and engage the failover automatically during prolonged outages
81
98
82
99
Run as a service:
83
100
84
-
```
101
+
```bash
85
102
chkconfig --list | grep gocdb-auto
86
103
/sbin/service gocdb-autofailover stop
87
104
/sbin/service gocdb-autofailover start
88
105
/sbin/service gocdb-autofailover status
106
+
89
107
```
90
108
91
109
Directly (not as a service):
92
-
```
110
+
111
+
```bash
93
112
cd /root/autoEngageFailover
94
113
./gocdb-autofailover.sh {start|stop|restart}
114
+
95
115
```
96
116
97
117
## To manually engage the failover immediately
@@ -116,8 +136,8 @@ Engage the failover now:
116
136
You will need to manually revert the steps executed by the
117
137
failover so the dns points back to the production instance
118
138
and restore/restart the failover process. This includes:
119
-
* restore the symlinks to the goc.dl.ac.uk server cert and key
120
-
(see details below)
139
+
*<strike>restore the symlinks to the goc.dl.ac.uk server cert and key
140
+
(see details below)</strike> (no longer needed as cert contains dual SAN)
121
141
* restore the hourly cron to download the dmp of the DB
122
142
* run nsupdate procedure to repoint 'goc.egi.eu' back to
123
143
'gocdb-base.esc.rl.ac.uk'
@@ -126,16 +146,18 @@ and restore/restart the failover process. This includes:
126
146
127
147
#### Restore Walkthrough
128
148
At end of downtime (production instance ready to be restored) first re-point DNS:
129
-
```
149
+
150
+
```bash
130
151
echo We first switch dns to point to production instance
131
152
cd /root/nsupdate_goc
132
153
./goc_production.sh
154
+
133
155
```
134
156
135
157
Now wait for DNS to settle. This takes approx **2hrs** and during this time the goc.egi.eu domain will
136
158
switch between the failover instance and the production instance. You should monitor this using nslookup:
137
159
138
-
```
160
+
```bash
139
161
nslookup goc.egi.eu
140
162
# check this returns the following output referring to
141
163
# next.gocdb.eu
@@ -148,21 +170,21 @@ nslookup goc.egi.eu
148
170
After DNS has become stable the production instance will now be serving requests.
149
171
Only after this ~2hr period should we re-start failover service:
150
172
151
-
```
173
+
```bash
152
174
echo First go check production instance and confirm it is up
153
175
echo running ok and that dns is stable
154
176
rm /root/autoEngageFailover/engage.lock
155
177
mv cronRunDbUpdate.sh /etc/cron.hourly
156
178
157
-
echo Change server certificate and key back for goc.dl.ac.uk