Commit 2c78197

Comment out swap server cert and swap dns script calls
1 parent 286c73a commit 2c78197

4 files changed

+126
-70
lines changed

README.md

+45-23
@@ -3,9 +3,15 @@ Author: David Meredith + JK
33

44
[this ascii file is coded in "markdown" and is best viewed in a markdown enabled browser, see https://en.wikipedia.org/wiki/Markdown for more details]
55

6-
This repo contains the service and cron scripts used to run a failover gocdb instance, includes:
7-
* A Cron script (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the failover DB. This runs separtely from the autoEngageFailover process.
8-
* A Service script (```gocdb-autofailover.sh```) that monitors the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is swtiched from the production instance to the failover instance. This switch can also be performed manually when needed.
6+
This repo contains the service and cron scripts used to run a failover GOCDB instance. It includes the following dirs:
7+
* autoEngageFailover/
8+
* Contains a Service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
9+
* importDBdmpFile/
10+
* Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to download and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
11+
* nsupdate_goc/
12+
* Scripts for switching the DNS to/from the production/failover instance.
13+
* archiveDmpDownload/
14+
* Contains a script to download/archive dmp files in a separate process
915

1016
# Packages
1117
* The following scripts need to be installed and configured for your installation:
@@ -15,8 +21,8 @@ This repo contains the service and cron scripts used to run a failover gocdb ins
1521
|_ gocdb-autofailover.sh# MAIN SERVICE SCRIPT to mon production instance
1622
|_ engageFailover.sh # Child script, run if prolonged outage is detected
1723
18-
importDBdmpFile/ # Cron scripts download/install a .dmp of the prod data
19-
|_ 1_runDbUpdate.sh # MAIN CRON SCRIPT, invokes scripts below
24+
importDBdmpFile/ # Scripts download/install a .dmp of the prod data
25+
|_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
2026
|_ ora11gEnvVars.sh # Setup oracle env
2127
|_ getDump.sh # Download a .dmp of the production data
2228
|_ dropGocdbUser.sh # Drops the current DB schema
@@ -28,6 +34,10 @@ This repo contains the service and cron scripts used to run a failover gocdb ins
2834
nsupdate_goc/ # Scripts for switching the DNS to the failover
2935
|_ goc_failover.sh # Points DNS to failover instance
3036
|_ goc_production.sh # Points DNS to production instance
37+
38+
archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process, e.g. from cron.daily
39+
|_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
40+
|_ archive/ # Contains archive/dmp files
3141
```
3242

3343
## /root/autoEngageFailover/
@@ -43,8 +53,8 @@ following:
4353
checked to see that it is current,
4454
* the hourly cron that downloads the dmp file is stopped (see
4555
importDBdmpFile below),
46-
* symbolic links to the server cert/key are updated so they
47-
point to the 'goc.egi.eu' cert/key
56+
* <strike>symbolic links to the server cert/key are updated so they
57+
point to the 'goc.egi.eu' cert/key</strike> (note, no longer needed as cert contains dual SAN)
4858
* the dnscripts are invoked to change the dns (see
4959
nsupdate_goc below).
5060

@@ -72,6 +82,13 @@ the 'goc.egi.eu' top level DNS alias to point to either the
7282
production instance or the failover.
7383

7484

85+
## /root/archiveDmpDownload/
86+
Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir.
87+
The script also deletes archived files that are older than 'x' days.
88+
This script can be called in a separate process, e.g. from cron.daily to build a
89+
set of backups.
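
The pruning step described above (deleting archived files older than 'x' days) could be sketched as follows. This is a minimal illustration, not the contents of `archiveDump.sh`: the function name `prune_archive`, the `*.dmp*` file pattern and the use of the GNU/BSD `find` options `-maxdepth` and `-delete` are all assumptions.

```bash
#!/bin/bash
# Hypothetical helper (not taken from archiveDump.sh).
# Deletes files matching *.dmp* directly under $1 whose modification
# time is more than $2 whole days in the past.
prune_archive() {
    local archive_dir="$1" max_age_days="$2"
    if [ ! -d "$archive_dir" ]; then
        echo "no such dir: $archive_dir" >&2
        return 1
    fi
    # -mtime +N matches files last modified more than N*24 hours ago
    find "$archive_dir" -maxdepth 1 -type f -name '*.dmp*' \
        -mtime +"$max_age_days" -delete
}
```

A call such as `prune_archive /root/archiveDmpDownload/archive 30` would keep roughly the last month of dumps; the retention period is a site-specific choice.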
90+
91+
7592
#Failover Instructions
7693
* Choose from options 1) 2) 3)
7794

@@ -81,17 +98,20 @@ instance and engage the failover automatically during prolonged outages
8198

8299
Run as a service:
83100

84-
```
101+
```bash
85102
chkconfig --list | grep gocdb-auto
86103
/sbin/service gocdb-autofailover stop
87104
/sbin/service gocdb-autofailover start
88105
/sbin/service gocdb-autofailover status
106+
89107
```
90108

91109
Directly (not as a service):
92-
```
110+
111+
```bash
93112
cd /root/autoEngageFailover
94113
./gocdb-autofailover.sh {start|stop|restart}
114+
95115
```
96116

97117
## To manually engage the failover immediately
@@ -116,8 +136,8 @@ Engage the failover now:
116136
You will need to manually revert the steps executed by the
117137
failover so the dns points back to the production instance
118138
and restore/restart the failover process. This includes:
119-
* restore the symlinks to the goc.dl.ac.uk server cert and key
120-
(see details below)
139+
* <strike>restore the symlinks to the goc.dl.ac.uk server cert and key
140+
(see details below)</strike> (no longer needed as cert contains dual SAN)
121141
* restore the hourly cron to download the dmp of the DB
122142
* run nsupdate procedure to repoint 'goc.egi.eu' back to
123143
'gocdb-base.esc.rl.ac.uk'
@@ -126,16 +146,18 @@ and restore/restart the failover process. This includes:
126146

127147
####Restore Walkthrough
128148
At end of downtime (production instance ready to be restored) first re-point DNS:
129-
```
149+
150+
```bash
130151
echo We first switch dns to point to production instance
131152
cd /root/nsupdate_goc
132153
./goc_production.sh
154+
133155
```
134156

135157
Now wait for DNS to settle, this takes approx **2hrs** and during this time the goc.egi.eu domain will
136158
switch between the failover instance and the production instance. You should monitor this using nslookup:
137159

138-
```
160+
```bash
139161
nslookup goc.egi.eu
140162
# check this returns the following output referring to
141163
# next.gocdb.eu
@@ -148,21 +170,21 @@ nslookup goc.egi.eu
148170
After DNS has become stable the production instance will now be serving requests.
149171
Only after this ~2hr period should we re-start failover service:
150172

151-
```
173+
```bash
152174
echo First go check production instance and confirm it is up
153175
echo running ok and that dns is stable
154176
rm /root/autoEngageFailover/engage.lock
155177
mv cronRunDbUpdate.sh /etc/cron.hourly
156178

157-
echo Change server certificate and key back for goc.dl.ac.uk
158-
ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
159-
160-
ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
161-
162-
service httpd restart
163-
service gocdb-autofailover start
164-
service gocdb-autofailover status
165-
gocdb-autofailover is running...
179+
# Below server cert change no longer needed as cert contains dual SAN
180+
# This means a server restart is no longer needed.
181+
#echo Change server certificate and key back for goc.dl.ac.uk
182+
#ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
183+
#ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
184+
#service httpd restart
185+
#service gocdb-autofailover start
186+
#service gocdb-autofailover status
187+
# gocdb-autofailover is running...
166188
```
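
The manual "wait for DNS to settle" check in the walkthrough above could be automated along these lines. This is only a sketch under assumptions: the function name `wait_for_stable_dns`, its defaults and the idea of requiring several consecutive matching lookups are illustrative and are not part of the repo's scripts. The expected string is whatever `nslookup goc.egi.eu` prints for the production instance at your site.

```bash
#!/bin/bash
# Hypothetical helper (not part of the failover scripts).
# Succeeds once the nslookup output for $1 contains the string $2 on
# $3 consecutive checks, pausing $4 seconds between checks and giving
# up after $5 checks in total.
wait_for_stable_dns() {
    local name="$1" expected="$2" needed="${3:-3}" pause="${4:-300}"
    local max_checks="${5:-48}" streak=0 i
    for ((i = 0; i < max_checks; i++)); do
        if nslookup "$name" 2>/dev/null | grep -qF -- "$expected"; then
            streak=$((streak + 1))
            if [ "$streak" -ge "$needed" ]; then
                return 0
            fi
        else
            # A lookup still pointing elsewhere resets the streak
            streak=0
        fi
        sleep "$pause"
    done
    return 1
}
```

With the defaults this polls every five minutes for up to four hours, which comfortably covers the ~2hr settling period; only restart the failover service once it returns success and the production instance has been checked by hand.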
167189

168190
Now check the '/root/autoEngageFailover/pingCheckLog.txt' and

autoEngageFailover/engageFailover.sh

+34-46
@@ -22,6 +22,9 @@ updateLog=/root/autoEngageFailover/pingCheckLog.txt
2222
errorEngageFailoverLog=/root/autoEngageFailover/errorEngageFailoverLog.txt
2323
lockFile=/root/autoEngageFailover/engage.lock
2424

25+
# Dir containing the import DB scripts and log file
26+
importDBdmpFile=/root/importDBdmpFile
27+
2528
# maintain the current fail count
2629
failcount=0
2730

@@ -173,7 +176,8 @@ email "Detected successive failures. Attempting to engage the failover - please
173176

174177

175178
# - Test that the last goc.dmp imported ok by parsing /root/importDBdmpFile/updateLog.txt
176-
cd /root/importDBdmpFile
179+
#cd /root/importDBdmpFile
180+
cd $importDBdmpFile
177181
if [ "$(tail -1 ./updateLog.txt)" != "completed ok" ]; then
178182
errorLogger "Last import of dmp file did not complete ok, exiting auto-failover early before cert and dns switch "
179183
exit 1
@@ -186,54 +190,38 @@ mv /etc/cron.hourly/cronRunDbUpdate.sh /root
186190

187191
errorLogger "Swapping server certs"
188192

189-
# - Swap server cert
190-
unlink /etc/grid-security/hostcert.pem
191-
unlink /etc/pki/tls/private/hostkey.pem
192-
ln -s /etc/grid-security/goc.egi.eu.cert.pem /etc/grid-security/hostcert.pem
193-
ln -s /etc/pki/tls/private/goc.egi.eu.key.pem /etc/pki/tls/private/hostkey.pem
194-
# note, after the main instance has been restored, you will need to revert this swap:
195-
# ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
196-
# ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
193+
## - Swap server cert
194+
## Not needed e.g. if your server cert has a dual SAN
195+
#unlink /etc/grid-security/hostcert.pem
196+
#unlink /etc/pki/tls/private/hostkey.pem
197+
#ln -s /etc/grid-security/goc.egi.eu.cert.pem /etc/grid-security/hostcert.pem
198+
#ln -s /etc/pki/tls/private/goc.egi.eu.key.pem /etc/pki/tls/private/hostkey.pem
199+
## note, after the main instance has been restored, you will need to revert this swap:
200+
## ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
201+
## ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
197202

198-
errorLogger "After server cert swap"
199-
cd /root/nsupdate_goc
200-
201-
# Old nsupdate scripts when domain was hosted at nikhef
202-
# Run 1st nsupdate script to delete the goc.egi.eu domain
203-
#errorLogger "Running 1st nsupdate script"
204-
#deleteGocDomainOut=$(./nsupdate-goc.sh ./nsupdateLiveGocFiles/delete_goc.egi.eu 2>&1)
205-
#deleteGocDomainCode=$?
206-
#errorLogger "$deleteGocDomainOut"
207-
#if [ $deleteGocDomainCode != 0 ]; then
208-
# errorLogger "Deleting GOC Domain failed: $deleteGocDomainCode"
209-
# exit $deleteGocDomainCode
210-
#fi
203+
#errorLogger "After server cert swap"
211204

212-
# Run 2nd nsupdate script to point goc.egi.eu to goc.dl.ac.uk
213-
#errorLogger "Running 2nd nsupdate script"
214-
#addGocDomainOut=$(./nsupdate-goc.sh ./nsupdateLiveGocFiles/point_goc.egi.eu_To_goc.dl.ac.uk 2>&1)
215-
#addGocDomainCode=$?
216-
#errorLogger "$addGocDomainCode"
217-
#if [ $addGocDomainCode != 0 ]; then
218-
# errorLogger "Adding GOC Domain failed: $addGocDomainCode"
219-
# exit $addGocDomainCode
220-
#fi
221-
222-
errorLogger "Running nsupdate script"
223-
nsupdateOut=$(./goc_failover.sh 2>&1)
224-
nsupdateCode=$?
225-
errorLogger "nspdateCode was: $nsupdateCode"
226-
errorLogger "nsupdateOut was: $nsupdateOut"
227-
if [ $nsupdateCode != 0 ]; then
228-
errorLogger "nsupdate Failed"
229-
# dont need to exit, restart apache won't hurt anyway
230-
#exit $nsupdateCode
231-
fi
232205

233-
234-
# Restart apache
235-
errorLogger "Restarting apache"
236-
service httpd restart
206+
#errorLogger "Changing DNS"
207+
#cd /root/nsupdate_goc
208+
#
209+
#
210+
#errorLogger "Running nsupdate script"
211+
#nsupdateOut=$(./goc_failover.sh 2>&1)
212+
#nsupdateCode=$?
213+
#errorLogger "nspdateCode was: $nsupdateCode"
214+
#errorLogger "nsupdateOut was: $nsupdateOut"
215+
#if [ $nsupdateCode != 0 ]; then
216+
# errorLogger "nsupdate Failed"
217+
# # dont need to exit, restart apache won't hurt anyway
218+
# #exit $nsupdateCode
219+
#fi
220+
#
221+
#
222+
## Restart apache
223+
#errorLogger "Restarting apache"
224+
#service httpd restart
237225

238226

239227
# Finally create the lockFile to indicate the failover ran ok

importDBdmpFile/1_runDbUpdate.sh

+4-1
@@ -54,7 +54,7 @@ fi
5454

5555
# Drop gocdb user (gocdb5 user is recreated when doing the impdb)
5656
####################################
57-
dropGocdbUserOutput=`./dropGocdbUser.sh 2>&1`
57+
dropGocdbUserOutput=`./dropGocdbUser2.sh 2>&1`
5858
dropGocdbExitCode=$?
5959
#echo $dropGocdbUserOutput
6060
if [ $dropGocdbExitCode != 0 ]; then
@@ -138,6 +138,9 @@ fi
138138
# Create a copy of the last successfully imported dmp file and
139139
# store this in the lastImportedDmpFile dir with the time and date appended
140140
# to the file name
141+
if [ ! -d lastImportedDmpFile ]; then
142+
mkdir lastImportedDmpFile
143+
fi
141144

142145
# cd into the dir (this directory must exist)
143146
cd lastImportedDmpFile
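
The directory guard added in this hunk, together with the "copy with the time and date appended" step described in the comments, could be expressed as one small function. This is a sketch, not the script's actual code: the function name `save_timestamped_copy` and the timestamp format are assumptions, and `mkdir -p` replaces the explicit `if [ ! -d ... ]` test.

```bash
#!/bin/bash
# Hypothetical helper (not taken from 1_runDbUpdate.sh).
# Copies file $1 into directory $2, appending a timestamp to the name,
# and prints the path of the copy.
save_timestamped_copy() {
    local src="$1" dest_dir="$2" stamp dest
    if [ ! -f "$src" ]; then
        echo "missing file: $src" >&2
        return 1
    fi
    # mkdir -p creates the dir if needed and succeeds if it already exists
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y-%m-%d_%H-%M-%S)
    dest="$dest_dir/$(basename "$src")_$stamp"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Because the function takes the destination directory as an argument, it does not depend on the caller having run `cd` first.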

importDBdmpFile/dropGocdbUser2.sh

+43
@@ -0,0 +1,43 @@
1+
#!/bin/bash
2+
3+
#service oracle-xe restart
4+
5+
# Script below will drop the gocdb5 user only if it exists
6+
# rather than using a simple drop in the script: drop user gocdb5 cascade;
7+
#
8+
# We drop the user because impdb will create the user for us (most DBAs let the impdb
9+
# create the user - note, we need to use the imported user's pw).
10+
# If the user already exists when we call impdb we get an error reporting the user already
11+
# exists as described at:
12+
# http://www.dba-oracle.com/t_ora_31684_import_impdp.htm
13+
# Or, we can use 'exclude=user' in the impdb command as described at same link.
14+
#
15+
# http://www.oracle-base.com/articles/misc/oracle-shell-scripting.php
16+
#
17+
# Here doc: The minus sign after the << will ignore tab characters (Note: not all whitespace!)
18+
# at the start of a line, so you can indent your data to increase the readability
19+
#
20+
# RESULT=$(sqlplus system/XXXXXXX <<- ENDSQL
21+
22+
# PASSFILE is in "parfile" format for impdp
23+
PASSFILE=/root/gocdb-failover-scripts/importDBdmpFile/pass_file
24+
USER_PASS=$(cat "${PASSFILE}" | grep "^userid=" | cut -d "=" -f 2)
25+
26+
# Need to exit gracefully if pass_file doesn't exist or
27+
# above filter fails
28+
29+
# /nolog runs sqlplus without connecting to anything
30+
# Having CONNECT in script keeps ${USER_PASS} off the commandline
31+
# And hence not visible to "ps"
32+
RESULT=$(sqlplus /nolog <<- ENDSQL
33+
CONNECT ${USER_PASS};
34+
create or replace directory dmpdir as '/tmp';
35+
DROP USER GOCDB5 CASCADE;
36+
EXIT;
37+
ENDSQL
38+
)
39+
echo $RESULT
40+
41+
# sample successful $RESULT
42+
#SQL*Plus: Release 11.2.0.2.0 Production on Fri Sep 27 14:28:37 2013 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production SQL> Directory created. SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 PL/SQL procedure successfully completed. SQL> Disconnected from Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
43+
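
The script's own comment notes that it still needs to exit gracefully when `pass_file` is missing or the `userid=` filter finds nothing. One possible way to do that is sketched below. The function name `read_userid` is hypothetical; the sketch also uses `cut -f 2-` rather than `-f 2`, on the assumption that a connect string may itself contain an `=` character.

```bash
#!/bin/bash
# Hypothetical helper (not part of dropGocdbUser2.sh).
# Prints the value of the first "userid=" line in parfile $1, or
# returns non-zero if the file is unreadable or has no such entry.
read_userid() {
    local passfile="$1" value
    if [ ! -r "$passfile" ]; then
        echo "pass file not readable: $passfile" >&2
        return 1
    fi
    # -m 1 keeps only the first match; -f 2- keeps any '=' in the value
    value=$(grep -m 1 '^userid=' "$passfile" | cut -d '=' -f 2-)
    if [ -z "$value" ]; then
        echo "no userid= entry found in $passfile" >&2
        return 1
    fi
    printf '%s\n' "$value"
}
```

In the script this could be used as `USER_PASS=$(read_userid "${PASSFILE}") || exit 1`, placed before the `sqlplus /nolog` call so that no `CONNECT` is attempted with an empty credential.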
