diff --git a/README.md b/README.md
index 8a337c1..31a8751 100644
--- a/README.md
+++ b/README.md
@@ -5,26 +5,26 @@ Author: David Meredith + JK
This repo contains the service and cron scripts used to run a failover gocdb instance, includes the following dirs:
* autoEngageFailover/
- * Contains a Service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
+ * Contains a Service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
* importDBdmpFile/
- * Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
+ * Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
* nsupdate_goc/
- * Scripts for switching the DNS to/from the production/failover instance.
+ * Scripts for switching the DNS to/from the production/failover instance.
* archiveDmpDownload/
- * Contains a script to download/archive dmp files in a separate process
+ * Contains a script to download/archive dmp files in a separate process
# Packages
-* The following scripts need to be installed and configured for your installation:
+* The following scripts need to be installed and configured for your installation:
```
/root/
autoEngageFailover/ # Scripts to mon the production instance and engage failover
|_ gocdb-autofailover.sh# MAIN SERVICE SCRIPT to mon production instance
|_ engageFailover.sh # Child script, run if prolonged outage is detected
-
+
importDBdmpFile/ # Scripts fetch/install a .dmp of the prod data
- |_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
+ |_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
|_ ora11gEnvVars.sh # Setup oracle env
- |_ getDump.sh # Fetch a .dmp of the production data
+ |_ getDump.sh # Fetch a .dmp of the production data
|_ dropGocdbUser.sh # Drops the current DB schema
|_ loadData.sh # Load the last successfully fetched DB dmp into the RDBMS
|_ gatherStats.sh # Oracle gathers stats to re-index
@@ -32,32 +32,32 @@ This repo contains the service and cron scripts used to run a failover gocdb ins
nsupdate_goc/ # Scripts for switching the DNS to the failover
|_ goc_failover.sh # Points DNS to failover instance
- |_ goc_production.sh # Points DNS to production instance
+ |_ goc_production.sh # Points DNS to production instance
archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process e.g from cron.daily
- |_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
- |_ archive/ # Contains archive/dmp files
+ |_ archiveDump.sh # Main script that downloads dmp and saves in a sub-dir
+ |_ archive/ # Contains archive/dmp files
```
-## /root/autoEngageFailover/
+## /root/autoEngageFailover/
Start in this dir. Dir contains the 'gocdb-autofailover.sh'
service script which should be installed as a service in
'/etc/init.d/gocdb-autofailover'. This service invokes
'engageFailover.sh' which monitors the production instance
with a ping-check. If a continued outage is detected,
the script starts the failover procedure which includes the
-following:
-* the gocdb admins are emailed,
+following:
+* the gocdb admins are emailed,
* the age of the last successfully imported dmp file is
- checked to see that it is current,
+ checked to see that it is current,
* the hourly cron that fetches the dmp file is stopped (see
- importDBdmpFile below),
+ importDBdmpFile below),
* symbolic links to the server cert/key are updated so they
- point to the 'goc.egi.eu' cert/key (note, no longer needed as cert contains dual SAN)
+ point to the 'goc.egi.eu' cert/key (note, no longer needed as cert contains dual SAN)
* the DNS scripts are invoked to change the DNS (see
nsupdate_goc below).
-## /root/importDBdmpFile/
+## /root/importDBdmpFile/
Contains scripts that fetch the .dmp file and install this
dmp file into the local Oracle XE instance. The master script
is '1_runDbUpdate.sh' which needs to be invoked from an hourly
@@ -70,29 +70,29 @@ cron:
/root/importDBdmpFile/1_runDbUpdate.sh
```
-You will also need to:
+You will also need to:
* generate a public/private key pair using `ssh-keygen` and ensure the public
key is present on the host with the database dmp file.
* populate `importDBdmpFile/failover_TEMPLATE.sh` with
appropriate values and copy it to `/etc/gocdb/failover.sh`
-
+
## /root/nsupdate_goc/
Contains the nsupdate keys and nsupdate scripts for switching
the 'goc.egi.eu' top level DNS alias to point to either the
-production instance or the failover.
+production instance or the failover.
## /root/archiveDmpDownload/
Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir.
-The script also deletes archived files that are older than 'x' days.
-This script can be called in a separate process, e.g. from cron.daily to build a
-set of backups.
+The script also deletes archived files that are older than 'x' days.
+This script can be called in a separate process, e.g. from cron.daily to build a
+set of backups.
-# Failover Instructions
+# Failover Instructions
* Choose from options 1) 2) 3)
-## To start/stop the auto failover service
+## To start/stop the auto failover service
This will continuously monitor the production
instance and engage the failover automatically during prolonged outages
@@ -105,8 +105,8 @@ chkconfig --list | grep gocdb-auto
/sbin/service gocdb-autofailover status
```
-
-Directly (not as a service):
+
+Directly (not as a service):
```bash
cd /root/autoEngageFailover
@@ -114,15 +114,15 @@ cd /root/autoEngageFailover
```
-## To manually engage the failover immediately
+## To manually engage the failover immediately
E.g. for known/scheduled outages, run the following
passing 'now' as the first command-line argument:
-Stop the service:
+Stop the service:
```
service gocdb-autofailover stop
```
-Or to stop if running manually:
+Or to stop if running manually:
```
cd /root/autoEngageFailover
./gocdb-autofailover.sh stop
@@ -136,16 +136,16 @@ Engage the failover now:
You will need to manually revert the steps executed by the
failover so the dns points back to the production instance
and restore/restart the failover process. This includes:
-* restore the symlinks to the goc.dl.ac.uk server cert and key
- (see details below) (no longer needed as cert contains dual SAN)
+* restore the symlinks to the gocdb.hartree.stfc.ac.uk server cert and key
+ (see details below) (no longer needed as cert contains dual SAN)
* restore the hourly cron to fetch the dmp of the DB
* run nsupdate procedure to repoint 'goc.egi.eu' back to
'gocdb-base.esc.rl.ac.uk'
- MUST read /root/nsupdate_goc/nsupdateReadme.txt.
+ MUST read /root/nsupdate_goc/nsupdateReadme.txt.
* restart the failover service
#### Restore Walkthrough
-At end of downtime (production instance ready to be restored) first re-point DNS:
+At end of downtime (production instance ready to be restored) first re-point DNS:
```bash
echo We first switch dns to point to production instance
@@ -154,7 +154,7 @@ cd /root/nsupdate_goc
```
-Now wait for DNS to settle, this takes approx **2hrs** and during this time the goc.egi.eu domain will
+Now wait for DNS to settle, this takes approx **2hrs** and during this time the goc.egi.eu domain will
switch between the failover instance and the production instance. You should monitor this using nslookup:
```bash
@@ -167,7 +167,7 @@ nslookup goc.egi.eu
Address: 130.246.143.160
```
-After DNS has become stable the production instance will now be serving requests.
+After DNS has become stable the production instance will now be serving requests.
Only after this ~2hr period should we re-start failover service:
```bash
@@ -177,14 +177,14 @@ rm /root/autoEngageFailover/engage.lock
mv cronRunDbUpdate.sh /etc/cron.hourly
# Below server cert change no longer needed as cert contains dual SAN
-# This means a server restart is no longer needed.
-#echo Change server certificate and key back for goc.dl.ac.uk
-#ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-#ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+# This means a server restart is no longer needed.
+#echo Change server certificate and key back for gocdb.hartree.stfc.ac.uk
+#ln -sf /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+#ln -sf /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
#service httpd restart
#service gocdb-autofailover start
#service gocdb-autofailover status
-# gocdb-autofailover is running...
+# gocdb-autofailover is running...
```
Now check the '/root/autoEngageFailover/pingCheckLog.txt' and
diff --git a/autoEngageFailover/engageFailover.sh b/autoEngageFailover/engageFailover.sh
index 845425c..6468aac 100644
--- a/autoEngageFailover/engageFailover.sh
+++ b/autoEngageFailover/engageFailover.sh
@@ -1,23 +1,23 @@
#!/bin/bash
-# Usage: ./engageFailover.sh [now]
-# where now is optional. If 'now' is specified as the first cmd line arg, then
-# the failover is engaged immediately rather than on detection of a prolonged outage.
-#
-# Script will fail early if the lockFile from previous engage is present.
+# Usage: ./engageFailover.sh [now]
+# where now is optional. If 'now' is specified as the first cmd line arg, then
+# the failover is engaged immediately rather than on detection of a prolonged outage.
#
-# Note, after the main instance has been restored, you will need to manually
-# do the following steps:
+# Script will fail early if the lockFile from previous engage is present.
+#
+# Note, after the main instance has been restored, you will need to manually
+# do the following steps:
# Revert this swap:
-# ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-# ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+# ln -s /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+# ln -s /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
#
-# Restore hourly cron job:
-# mv /root/cronRunDbUpdate.sh /etc/cron.hourly/
+# Restore hourly cron job:
+# mv /root/cronRunDbUpdate.sh /etc/cron.hourly/
# ====================Setup Variables===========================
-# setup log files
+# setup log files
updateLog=/root/autoEngageFailover/pingCheckLog.txt
errorEngageFailoverLog=/root/autoEngageFailover/errorEngageFailoverLog.txt
lockFile=/root/autoEngageFailover/engage.lock
@@ -28,11 +28,11 @@ importDBdmpFile=/root/importDBdmpFile
# maintain the current fail count
failcount=0
-# server certificate / key
-# note, in production we will use the goc.dl.ac.uk server/host cert and key which has no
-# password protecting the private key.
-userkey="/etc/pki/tls/private/goc.dl.ac.uk.key.pem"
-usercert="/etc/grid-security/goc.dl.ac.uk.cert.pem"
+# server certificate / key
+# note, in production we will use the gocdb.hartree.stfc.ac.uk server/host cert and key which has no
+# password protecting the private key.
+userkey="/etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem"
+usercert="/etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem"
# URL to monitor for the main production instance
pingUrl="https://goc.egi.eu/portal/GOCDB_monitor/ops_monitor_check.php"
@@ -40,17 +40,17 @@ pingUrl="https://goc.egi.eu/portal/GOCDB_monitor/ops_monitor_check.php"
# An external url to check that local network can reach outside
externalPingUrl="http://google.co.uk"
-# number of secs between re-pings (600secs = 10mins)
+# number of secs between re-pings (600secs = 10mins)
sleepTime=600s
# number of successive fails before invoking failover (30 * 10mins = 300mins = 5hrs)
failCountLimit=30
-# email subject and to address for notification that failover is engaged
+# email subject and to address for notification that failover is engaged
SUBJECT="gocdb failover warning"
TO="some.body@world.com,a.n.other@world.com"
-# Determine whether to engage the failover immediately
+# Determine whether to engage the failover immediately
ENGAGENOW="false"
# =====================================================
@@ -65,7 +65,7 @@ if [ -n "$1" ] ; then
fi
-# email all given args to $TO
+# email all given args to $TO
function email {
/bin/mail -s "$SUBJECT" "$TO" <> $updateLog
}
@@ -132,28 +132,28 @@ fi
-# Create the log if it doesn't already exist
-touch $updateLog
+# Create the log if it doesn't already exist
+touch $updateLog
touch $errorEngageFailoverLog
logger "==============================Starting up $(date)====================================="
errorLogger "===================================Starting up $(date)=================================="
# loop if not engaging now
if [ $ENGAGENOW == "false" ] ; then
- # loop while global failcount is less than x
+ # loop while global failcount is less than x
while [ $failcount -lt $failCountLimit ]
do
- pingCode=$(pingcheck)
+ pingCode=$(pingcheck)
if [ $pingCode != 0 ]; then
- # if ping failed then increment failcount
+ # if ping failed then increment failcount
(( failcount++ ))
else
# else if ping worked re-set failcount (back) to zero
failcount=0
- #logger "ping ok $(date) : $pingUrl"
+ #logger "ping ok $(date) : $pingUrl"
fi
-
- #echo "failcount is: $failcount, pingcode is: $pingCode"
+
+ #echo "failcount is: $failcount, pingcode is: $pingCode"
sleep $sleepTime
done
fi
@@ -162,20 +162,20 @@ fi
# 'N' consecutive failures encountered. Next invoke failover script
# =================================================================
-# - log the date
+# - log the date
errorLogger "=============Start Failover Switch================="
errorLogger "Detected successive failures on $(date)"
errorLogger "Starting engage failover"
email "Detected successive failures. Attempting to engage the failover - please see the logs: $updateLog $errorEngageFailoverLog"
-# While developing, force an exit here (will have to practice below using
-# the provided test.egi.eu goc domain)
+# While developing, force an exit here (will have to practice below using
+# the provided test.egi.eu goc domain)
#exit 0
-# - Test that the last goc.dmp imported ok by parsing /root/importDBdmpFile/updateLog.txt
+# - Test that the last goc.dmp imported ok by parsing /root/importDBdmpFile/updateLog.txt
#cd /root/importDBdmpFile
cd $importDBdmpFile
if [ "$(tail -1 ./updateLog.txt)" != "completed ok" ]; then
@@ -185,20 +185,20 @@ fi
errorLogger "Attempting to move cron"
-# - Move hourly cron job to disable (don't want this to execute while in failover mode)
-mv /etc/cron.hourly/cronRunDbUpdate.sh /root
+# - Move hourly cron job to disable (don't want this to execute while in failover mode)
+mv /etc/cron.hourly/cronRunDbUpdate.sh /root
errorLogger "Swapping server certs"
-## - Swap server cert
+## - Swap server cert
## Not needed e.g. if your server cert has a dual SAN
-#unlink /etc/grid-security/hostcert.pem
+#unlink /etc/grid-security/hostcert.pem
#unlink /etc/pki/tls/private/hostkey.pem
#ln -s /etc/grid-security/goc.egi.eu.cert.pem /etc/grid-security/hostcert.pem
#ln -s /etc/pki/tls/private/goc.egi.eu.key.pem /etc/pki/tls/private/hostkey.pem
-## note, after the main instance has been restored, you will need to revert this swap:
-## ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-## ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+## note, after the main instance has been restored, you will need to revert this swap:
+## ln -s /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+## ln -s /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
#errorLogger "After server cert swap"
@@ -219,15 +219,15 @@ errorLogger "Swapping server certs"
#fi
#
#
-## Restart apache
+## Restart apache
#errorLogger "Restarting apache"
-#service httpd restart
+#service httpd restart
-# Finally create the lockFile to indicate the failover ran ok
+# Finally create the lockFile to indicate the failover ran ok
touch $lockFile
email "Failover script completed"
-# End
+# End
errorLogger "==========================End failover switch======================="
diff --git a/nsupdate_goc/goc_failover.sh b/nsupdate_goc/goc_failover.sh
index a960ddd..e962520 100644
--- a/nsupdate_goc/goc_failover.sh
+++ b/nsupdate_goc/goc_failover.sh
@@ -1,14 +1,14 @@
-#echo "changing goc.egi.eu DNS record at ns.muni.cz to goc.dl.ac.uk"
+#echo "changing goc.egi.eu DNS record at ns.muni.cz to gocdb.hartree.stfc.ac.uk"
#echo
nsupdate -k goc.egi.eu_ns.muni.cz_key.conf <