Commit b1f25ec

Merge pull request #8 from GOCDB/hartree
Replace old failover references with new failover
2 parents 6b637cd + d2f43df commit b1f25ec

4 files changed

+104
-106
lines changed

README.md

+42-42
@@ -5,59 +5,59 @@ Author: David Meredith + JK
55

66
This repo contains the service and cron scripts used to run a failover gocdb instance, includes the following dirs:
77
* autoEngageFailover/
8-
* Contains a service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
8+
* Contains a service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
99
* importDBdmpFile/
10-
* Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
10+
* Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
1111
* nsupdate_goc/
12-
* Scripts for switching the DNS to/from the production/failover instance.
12+
* Scripts for switching the DNS to/from the production/failover instance.
1313
* archiveDmpDownload/
14-
* Contains a script to download/archive dmp files in a separate process
14+
* Contains a script to download/archive dmp files in a separate process
1515

1616
# Packages
17-
* The following scripts need to be installed and configured for your installation:
17+
* The following scripts need to be installed and configured for your installation:
1818
```
1919
/root/
2020
autoEngageFailover/ # Scripts to mon the production instance and engage failover
2121
|_ gocdb-autofailover.sh # MAIN SERVICE SCRIPT to mon production instance
2222
|_ engageFailover.sh # Child script, run if prolonged outage is detected
23-
23+
2424
importDBdmpFile/ # Scripts fetch/install a .dmp of the prod data
25-
|_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
25+
|_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
2626
|_ ora11gEnvVars.sh # Setup oracle env
27-
|_ getDump.sh # Fetch a .dmp of the production data
27+
|_ getDump.sh # Fetch a .dmp of the production data
2828
|_ dropGocdbUser.sh # Drops the current DB schema
2929
|_ loadData.sh # Load the last successfully fetched DB dmp into the RDBMS
3030
|_ gatherStats.sh # Oracle gathers stats to re-index
3131
|_ pass_file_exemplar.txt # Sample pwd file for DB (rename to pass_file)
3232
3333
nsupdate_goc/ # Scripts for switching the DNS to the failover
3434
|_ goc_failover.sh # Points DNS to failover instance
35-
|_ goc_production.sh # Points DNS to production instance
35+
|_ goc_production.sh # Points DNS to production instance
3636
3737
archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process, e.g. from cron.daily
38-
|_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
39-
|_ archive/ # Contains archive/dmp files
38+
|_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
39+
|_ archive/ # Contains archive/dmp files
4040
```
4141

42-
## /root/autoEngageFailover/
42+
## /root/autoEngageFailover/
4343
Start in this dir. Dir contains the 'gocdb-autofailover.sh'
4444
service script which should be installed as a service in
4545
'/etc/init.d/gocdb-autofailover'. This service invokes
4646
'engageFailover.sh' which monitors the production instance
4747
with a ping-check. If a continued outage is detected,
4848
the script starts the failover procedure which includes the
49-
following:
50-
* the gocdb admins are emailed,
49+
following:
50+
* the gocdb admins are emailed,
5151
* the age of the last successfully imported dmp file is
52-
checked to see that it is current,
52+
checked to see that it is current,
5353
* the hourly cron that fetches the dmp file is stopped (see
54-
importDBdmpFile below),
54+
importDBdmpFile below),
5555
* <strike>symbolic links to the server cert/key are updated so they
56-
point to the 'goc.egi.eu' cert/key</strike> (note, no longer needed as cert contains dual SAN)
56+
point to the 'goc.egi.eu' cert/key</strike> (note, no longer needed as cert contains dual SAN)
5757
* the DNS scripts are invoked to change the DNS (see
5858
nsupdate_goc below).
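
The check on the age of the last imported dmp file could look roughly like the sketch below. It is not taken from `engageFailover.sh`: the function name `is_dump_fresh`, the file path, and the 120-minute threshold are assumptions made for illustration only.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repo): succeed only if FILE exists
# and was last modified less than MAX_MINUTES minutes ago.
is_dump_fresh() {
    local file="$1" max_minutes="$2"
    [ -f "$file" ] || return 1
    # find prints the path only when the modification time is recent enough
    [ -n "$(find "$file" -mmin "-${max_minutes}" -print)" ]
}

# Example call; the path and threshold are placeholders, not values from the repo:
# is_dump_fresh /path/to/last_imported.dmp 120 || echo "dmp file is stale"
```

A stale or missing file makes the function return non-zero, so a caller can decide whether to warn the admins before continuing.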
5959

60-
## /root/importDBdmpFile/
60+
## /root/importDBdmpFile/
6161
Contains scripts that fetch the .dmp file and install this
6262
dmp file into the local Oracle XE instance. The master script
6363
is '1_runDbUpdate.sh' which needs to be invoked from an hourly
@@ -70,29 +70,29 @@ cron:
7070
/root/importDBdmpFile/1_runDbUpdate.sh
7171
```
7272

73-
You will also need to:
73+
You will also need to:
7474
* generate a public/private key pair using `ssh-keygen` and ensure the public
7575
key is present on the host with the database dmp file.
7676
* populate `importDBdmpFile/failover_TEMPLATE.sh` with
7777
appropriate values and copy it to `/etc/gocdb/failover.sh`
78-
78+
7979
## /root/nsupdate_goc/
8080
Contains the nsupdate keys and nsupdate scripts for switching
8181
the 'goc.egi.eu' top level DNS alias to point to either the
82-
production instance or the failover.
82+
production instance or the failover.
8383

8484

8585
## /root/archiveDmpDownload/
8686
Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir.
87-
The script also deletes archived files that are older than 'x' days.
88-
This script can be called in a separate process, e.g. from cron.daily to build a
89-
set of backups.
87+
The script also deletes archived files that are older than 'x' days.
88+
This script can be called in a separate process, e.g. from cron.daily to build a
89+
set of backups.
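
Deleting archived files older than 'x' days is commonly done with `find`. The sketch below shows one way to do it and is not the code in `archiveDump.sh`: the function name `prune_archive` and the example arguments are assumptions.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repo): delete regular files directly
# inside DIR that were last modified more than DAYS days ago.
# Note: find counts age in whole 24-hour periods, rounding down.
prune_archive() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -mtime "+${days}" -delete
}

# Example call; directory and retention period are placeholders:
# prune_archive /root/archiveDmpDownload/archive 30
```

Restricting the search with `-maxdepth 1` and `-type f` keeps the deletion limited to files in the archive directory itself.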
9090

9191

92-
# Failover Instructions
92+
# Failover Instructions
9393
* Choose from options 1) 2) 3)
9494

95-
## To start/stop the auto failover service
95+
## To start/stop the auto failover service
9696
This will continuously monitor the production
9797
instance and engage the failover automatically during prolonged outages
9898

@@ -105,24 +105,24 @@ chkconfig --list | grep gocdb-auto
105105
/sbin/service gocdb-autofailover status
106106

107107
```
108-
109-
Directly (not as a service):
108+
109+
Directly (not as a service):
110110

111111
```bash
112112
cd /root/autoEngageFailover
113113
./gocdb-autofailover.sh {start|stop|restart}
114114

115115
```
116116

117-
## To manually engage the failover immediately
117+
## To manually engage the failover immediately
118118
E.g. for known/scheduled outages, run the following
119119
passing 'now' as the first command-line argument:
120120

121-
Stop the service:
121+
Stop the service:
122122
```
123123
service gocdb-autofailover stop
124124
```
125-
Or to stop if running manually:
125+
Or to stop if running manually:
126126
```
127127
cd /root/autoEngageFailover
128128
./gocdb-autofailover.sh stop
@@ -136,16 +136,16 @@ Engage the failover now:
136136
You will need to manually revert the steps executed by the
137137
failover so the dns points back to the production instance
138138
and restore/restart the failover process. This includes:
139-
* <strike>restore the symlinks to the goc.dl.ac.uk server cert and key
140-
(see details below)</strike> (no longer needed as cert contains dual SAN)
139+
* <strike>restore the symlinks to the gocdb.hartree.stfc.ac.uk server cert and key
140+
(see details below)</strike> (no longer needed as cert contains dual SAN)
141141
* restore the hourly cron to fetch the dmp of the DB
142142
* run nsupdate procedure to repoint 'goc.egi.eu' back to
143143
'gocdb-base.esc.rl.ac.uk'
144-
MUST read /root/nsupdate_goc/nsupdateReadme.txt.
144+
MUST read /root/nsupdate_goc/nsupdateReadme.txt.
145145
* restart the failover service
146146

147147
#### Restore Walkthrough
148-
At end of downtime (production instance ready to be restored) first re-point DNS:
148+
At end of downtime (production instance ready to be restored) first re-point DNS:
149149

150150
```bash
151151
echo We first switch dns to point to production instance
@@ -154,7 +154,7 @@ cd /root/nsupdate_goc
154154

155155
```
156156

157-
Now wait for DNS to settle. This takes approx. **2hrs**, and during this time the goc.egi.eu domain will
157+
Now wait for DNS to settle. This takes approx. **2hrs**, and during this time the goc.egi.eu domain will
158158
switch between the failover instance and the production instance. You should monitor this using nslookup:
159159

160160
```bash
@@ -167,7 +167,7 @@ nslookup goc.egi.eu
167167
Address: 130.246.143.160
168168
```
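
When checking repeatedly, it can help to extract only the answer address from the `nslookup` output so it can be compared with the address you expect. The following is a minimal sketch: the function name `answer_addresses` is hypothetical, and which address belongs to which instance is not stated here, so supply the expected value yourself.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repo): read nslookup output on stdin
# and print the address(es) listed after the "Name:" line, i.e. the answer
# section rather than the address of the resolver that was queried.
answer_addresses() {
    awk '/^Name:/ { in_answer = 1; next } in_answer && /^Address/ { print $NF }'
}

# Example call:
# nslookup goc.egi.eu | answer_addresses
```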
169169

170-
After DNS has become stable the production instance will now be serving requests.
170+
After DNS has become stable the production instance will now be serving requests.
171171
Only after this ~2hr period should we re-start failover service:
172172

173173
```bash
@@ -177,14 +177,14 @@ rm /root/autoEngageFailover/engage.lock
177177
mv cronRunDbUpdate.sh /etc/cron.hourly
178178

179179
# Below server cert change no longer needed as cert contains dual SAN
180-
# This means a server restart is no longer needed.
181-
#echo Change server certificate and key back for goc.dl.ac.uk
182-
#ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
183-
#ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
180+
# This means a server restart is no longer needed.
181+
#echo Change server certificate and key back for gocdb.hartree.stfc.ac.uk
182+
#ln -sf /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
183+
#ln -sf /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
184184
#service httpd restart
185185
#service gocdb-autofailover start
186186
#service gocdb-autofailover status
187-
# gocdb-autofailover is running...
187+
# gocdb-autofailover is running...
188188
```
189189

190190
Now check the '/root/autoEngageFailover/pingCheckLog.txt' and
