From 18b4bb449c77c8c95f489ae737d9b3b955e1ddad Mon Sep 17 00:00:00 2001 From: Greg Corbett Date: Mon, 9 Aug 2021 16:47:23 +0100 Subject: [PATCH 1/2] Remove whitespace --- README.md | 76 +++++++++++++-------------- autoEngageFailover/engageFailover.sh | 78 ++++++++++++++-------------- nsupdate_goc/goc_failover.sh | 2 +- nsupdate_goc/nsupdateReadme.md | 28 +++++----- 4 files changed, 91 insertions(+), 93 deletions(-) diff --git a/README.md b/README.md index 8a337c1..9c008ec 100644 --- a/README.md +++ b/README.md @@ -5,26 +5,26 @@ Author: David Meredith + JK This repo contains the service and cron scripts used to run a failover gocdb instance, includes the following dirs: * autoEngageFailover/ - * Contians a Service script (```gocdb-autofailover.sh```) and child scripts that monitors the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is swtiched from the production instance to the failover instance. This switch can also be performed manually when needed. + * Contians a Service script (```gocdb-autofailover.sh```) and child scripts that monitors the main production instance. If a prolonged outage is detected, the GOCDB top DNS alias 'goc.egi.eu' is swtiched from the production instance to the failover instance. This switch can also be performed manually when needed. * importDBdmpFile/ - * Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separtely from the autoEngageFailover process. + * Contains a script that should be invoked by cron hourly (```1_runDbUpdate.sh```) to fetch and install a .dmp of the production DB into the local failover DB. This runs separtely from the autoEngageFailover process. * nsupdate_goc/ - * Scripts for switching the DNS to/from the production/failover instance. + * Scripts for switching the DNS to/from the production/failover instance. 
* archiveDmpDownload/ - * Contains a script to download/archive dmp files in a separate process + * Contains a script to download/archive dmp files in a separate process # Packages -* The following scripts needs to be installed and configuired for your installation: +* The following scripts needs to be installed and configuired for your installation: ``` /root/ autoEngageFailover/ # Scripts to mon the production instance and engage failover |_ gocdb-autofailover.sh# MAIN SERVICE SCRIPT to mon production instance |_ engageFailover.sh # Child script, run if prolonged outage is detected - + importDBdmpFile/ # Scripts fetch/install a .dmp of the prod data - |_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below + |_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below |_ ora11gEnvVars.sh # Setup oracle env - |_ getDump.sh # Fetch a .dmp of the production data + |_ getDump.sh # Fetch a .dmp of the production data |_ dropGocdbUser.sh # Drops the current DB schema |_ loadData.sh # Load the last successfully fetched DB dmp into the RDBMS |_ gatherStats.sh # Oracle gathers stats to re-index @@ -32,32 +32,32 @@ This repo contains the service and cron scripts used to run a failover gocdb ins nsupdate_goc/ # Scripts for switching the DNS to the failover |_ goc_failover.sh # Points DNS to failover instance - |_ goc_production.sh # Points DNS to production instance + |_ goc_production.sh # Points DNS to production instance archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process e.g from cron.daily - |_ archiveDump.sh # Main script that dowloads dmp and saves in a sub-dir - |_ archive/ # Contains archive/dmp files + |_ archiveDump.sh # Main script that dowloads dmp and saves in a sub-dir + |_ archive/ # Contains archive/dmp files ``` -## /root/autoEngageFailover/ +## /root/autoEngageFailover/ Start in this dir. 
Dir contains the 'gocdb-autofailover.sh' service script which should be installed as a service in '/etc/init.d/gocdb-autofailover'. This service invokes 'engageFailover.sh' which monitors the production instance with a ping-check. If a continued outage is detected; the script starts the failover procedure which includes the -following: -* the gocdb admins are emailed, +following: +* the gocdb admins are emailed, * the age of the last successfully imported dmp file is - checked to see that it is current, + checked to see that it is current, * the hourly cron that fetches the dmp file is stopped (see - importDBdmpFile below), + importDBdmpFile below), * symbolic links to the server cert/key are updated so they - point to the 'goc.egi.eu' cert/key (note, no longer needed as cert contains dual SAN) + point to the 'goc.egi.eu' cert/key (note, no longer needed as cert contains dual SAN) * the dnscripts are invoked to change the dns (see nsupdate_goc below). -## /root/importDBdmpFile/ +## /root/importDBdmpFile/ Contains scripts that fetches the .dmp file and install this dmp file into the local Oracle XE instance. The master script is '1_runDbUpdate.sh' which needs to be invoked from an hourly @@ -70,29 +70,29 @@ cron: /root/importDBdmpFile/1_runDbUpdate.sh ``` -You will also need to: +You will also need to: * generate a public/private key pair using `ssh-keygen` and ensure the public key is present on the host with the database dmp file. * populate `importDBdmpFile/failover_TEMPLATE.sh` with appropriate values and copy it to `/etc/gocdb/failover.sh` - + ## /root/nsupdate_goc/ Contains the nsupdate keys and nsupdate scripts for switching the 'goc.egi.eu' top level DNS alias to point to either the -production instance or the failover. +production instance or the failover. ## /root/archiveDmpDownload/ Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir. -The script also deletes archived files that are older than 'x' days. 
-This script can be called in a separate process, e.g. from cron.daily to build a -set of backups. +The script also deletes archived files that are older than 'x' days. +This script can be called in a separate process, e.g. from cron.daily to build a +set of backups. -#Failover Instructions +#Failover Instructions * Choose from options 1) 2) 3) -## To start/stop the auto failover service +## To start/stop the auto failover service This will continuously monitor the production instance and engage the failover automatically during prolonged outages @@ -105,8 +105,8 @@ chkconfig --list | grep gocdb-auto /sbin/service gocdb-autofailover status ``` - -Directly (not as a service): + +Directly (not as a service): ```bash cd /root/autoEngageFailover @@ -114,15 +114,15 @@ cd /root/autoEngageFailover ``` -## To manually engage the failover immediately +## To manually engage the failover immediately E.g. for known/scheduled outages, run the following passing 'now' as the first command-line argument: -Stop the service: +Stop the service: ``` service gocdb-autofailover stop ``` -Or to stop if running manually: +Or to stop if running manually: ``` cd /root/autoEngageFailover ./gocdb-autofailover.sh stop @@ -137,15 +137,15 @@ You will need to manually revert the steps executed by the failover so the dns points back to the production instance and restore/restart the failover process. This includes: * restore the symlinks to the goc.dl.ac.uk server cert and key - (see details below) (no longer needed as cert contains dual SAN) + (see details below) (no longer needed as cert contains dual SAN) * restore the hourly cron to fetch the dmp of the DB * run nsupdate procedure to repoint 'goc.egi.eu' back to 'gocdb-base.esc.rl.ac.uk' - MUST read /root/nsupdate_goc/nsupdateReadme.txt. + MUST read /root/nsupdate_goc/nsupdateReadme.txt. 
* restart the failover service ####Restore Walkthrough -At end of downtime (production instance ready to be restored) first re-point DNS: +At end of downtime (production instance ready to be restored) first re-point DNS: ```bash echo We first switch dns to point to production instance @@ -154,7 +154,7 @@ cd /root/nsupdate_goc ``` -Now wait for DNS to settle, this takes approx **2hrs** and during this time the goc.egi.eu domain will +Now wait for DNS to settle, this takes approx **2hrs** and during this time the goc.egi.eu domain will swtich between the failover instance and the production instance. You should monitor this using nsupdate: ```bash @@ -167,7 +167,7 @@ nslookup goc.egi.eu Address: 130.246.143.160 ``` -After DNS has become stable the production instance will now be serving requests. +After DNS has become stable the production instance will now be serving requests. Only after this ~2hr period should we re-start failover service: ```bash @@ -177,14 +177,14 @@ rm /root/autoEngageFailover/engage.lock mv cronRunDbUpdate.sh /etc/cron.hourly # Below server cert change no longer needed as cert contains dual SAN -# This means a server restart is no longer needed. +# This means a server restart is no longer needed. #echo Change server certificate and key back for goc.dl.ac.uk #ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem #ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem #service httpd restart #service gocdb-autofailover start #service gocdb-autofailover status -# gocdb-autofailover is running... +# gocdb-autofailover is running... ``` Now check the '/root/autoEngageFailover/pingCheckLog.txt' and diff --git a/autoEngageFailover/engageFailover.sh b/autoEngageFailover/engageFailover.sh index 845425c..8f0145d 100644 --- a/autoEngageFailover/engageFailover.sh +++ b/autoEngageFailover/engageFailover.sh @@ -1,23 +1,23 @@ #!/bin/bash -# Usage: ./autoEnageFailover.sh [now] -# where now is optional. 
If 'now' is specified as the first cmd line arg, then -# the failover is engaged immediately rather than on detection of a prolongued outage. -# -# Script will fail early if the lockFile from previous engage is present. +# Usage: ./autoEnageFailover.sh [now] +# where now is optional. If 'now' is specified as the first cmd line arg, then +# the failover is engaged immediately rather than on detection of a prolongued outage. # -# Note, after the main instance has been restored, you will need to manually -# do the following steps: +# Script will fail early if the lockFile from previous engage is present. +# +# Note, after the main instance has been restored, you will need to manually +# do the following steps: # Revert this swap: # ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem # ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem # -# Restore hourly cron job: -# mv /root/cronRunDbUpdate.sh /etc/cron.hourly/ +# Restore hourly cron job: +# mv /root/cronRunDbUpdate.sh /etc/cron.hourly/ # ====================Setup Variables=========================== -# setup log files +# setup log files updateLog=/root/autoEngageFailover/pingCheckLog.txt errorEngageFailoverLog=/root/autoEngageFailover/errorEngageFailoverLog.txt lockFile=/root/autoEngageFailover/engage.lock @@ -28,9 +28,9 @@ importDBdmpFile=/root/importDBdmpFile # maintainthe current fail count failcount=0 -# server certificate / key -# note, in production we will use the goc.dl.ac.uk server/host cert and key which has no -# password protecting the private key. +# server certificate / key +# note, in production we will use the goc.dl.ac.uk server/host cert and key which has no +# password protecting the private key. 
userkey="/etc/pki/tls/private/goc.dl.ac.uk.key.pem" usercert="/etc/grid-security/goc.dl.ac.uk.cert.pem" @@ -40,17 +40,17 @@ pingUrl="https://goc.egi.eu/portal/GOCDB_monitor/ops_monitor_check.php" # An external url to check that local network can reach outside externalPingUrl="http://google.co.uk" -# number of secs between re-pings (600secs = 10mins) +# number of secs between re-pings (600secs = 10mins) sleepTime=600s # number of successive fails before invoking failover (30 * 10mins = 300mins = 5hrs) failCountLimit=30 -# email subject and to address for notification that failover is engaged +# email subject and to address for notification that failover is engaged SUBJECT="gocdb failover warning" TO="some.body@world.com,a.n.other@world.com" -# Determine whether to engage the failover immediately +# Determine whether to engage the failover immediately ENGAGENOW="false" # ===================================================== @@ -65,7 +65,7 @@ if [ -n "$1" ] ; then fi -# email all given args to $TO +# email all given args to $TO function email { /bin/mail -s "$SUBJECT" "$TO" <> $updateLog } @@ -132,28 +132,28 @@ fi -# Create the log if it don't already exist -touch $updateLog +# Create the log if it don't already exist +touch $updateLog touch $errorEngageFailoverLog logger "==============================Starting up $(date)=====================================" errorLogger "===================================Starting up $(date)==================================" # loop if not engaging now if [ $ENGAGENOW == "false" ] ; then - # loop while global failcount is less than x + # loop while global failcount is less than x while [ $failcount -lt $failCountLimit ] do - pingCode=$(pingcheck) + pingCode=$(pingcheck) if [ $pingCode != 0 ]; then - # if ping failed then increment failcount + # if ping failed then increment failcount (( failcount++ )) else # else if ping worked re-set failcount (back) to zero failcount=0 - #logger "ping ok $(date) : $pingUrl" + #logger "ping ok 
$(date) : $pingUrl" fi - - #echo "failcount is: $failcount, pingcode is: $pingCode" + + #echo "failcount is: $failcount, pingcode is: $pingCode" sleep $sleepTime done fi @@ -162,20 +162,20 @@ fi # 'N' consecutive failures encountered. Next invoke failover script # ================================================================= -# - log the date +# - log the date errorLogger "=============Start Failover Swtich=================" errorLogger "Detected successive failues on $(date)" errorLogger "Starting engage failover" email "Detected successive failures. Attempting to engage the failover - please see the logs: $updateLog $errorEngageFailoverLog" -# While developing, force an exit here (will have to practice below using -# the provided test.egi.eu goc domain) +# While developing, force an exit here (will have to practice below using +# the provided test.egi.eu goc domain) #exit 0 -# - Test that the last goc.dmp imported ok by parsing /root/importDBdmpFile/updateLog.txt +# - Test that the last goc.dmp imported ok by parsing /root/importDBdmpFile/updateLog.txt #cd /root/importDBdmpFile cd $importDBdmpFile if [ "$(tail -1 ./updateLog.txt)" != "completed ok" ]; then @@ -185,18 +185,18 @@ fi errorLogger "Attempting to move cron" -# - Move hourly cron job to disable (don't want this to execute while in failover mode) -mv /etc/cron.hourly/cronRunDbUpdate.sh /root +# - Move hourly cron job to disable (don't want this to execute while in failover mode) +mv /etc/cron.hourly/cronRunDbUpdate.sh /root errorLogger "Swapping server certs" -## - Swap server cert +## - Swap server cert ## Not needed e.g. 
if your server cert has a dual SAN -#unlink /etc/grid-security/hostcert.pem +#unlink /etc/grid-security/hostcert.pem #unlink /etc/pki/tls/private/hostkey.pem #ln -s /etc/grid-security/goc.egi.eu.cert.pem /etc/grid-security/hostcert.pem #ln -s /etc/pki/tls/private/goc.egi.eu.key.pem /etc/pki/tls/private/hostkey.pem -## note, after the main instance has been restored, you will need to revert this swap: +## note, after the main instance has been restored, you will need to revert this swap: ## ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem ## ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem @@ -219,15 +219,15 @@ errorLogger "Swapping server certs" #fi # # -## Restart apache +## Restart apache #errorLogger "Restarting apache" -#service httpd restart +#service httpd restart -# Finally create the lockFile to indicate the failover ran ok +# Finally create the lockFile to indicate the failover ran ok touch $lockFile email "Failover script completed" -# End +# End errorLogger "==========================End failover switch=======================" diff --git a/nsupdate_goc/goc_failover.sh b/nsupdate_goc/goc_failover.sh index a960ddd..f547f13 100644 --- a/nsupdate_goc/goc_failover.sh +++ b/nsupdate_goc/goc_failover.sh @@ -8,7 +8,7 @@ update add goc.egi.eu. 60 CNAME goc.dl.ac.uk. show send EOF -#echo +#echo #echo "verifying the change ..." #echo #nslookup goc.egi.eu ns.muni.cz diff --git a/nsupdate_goc/nsupdateReadme.md b/nsupdate_goc/nsupdateReadme.md index 67be06b..1865171 100644 --- a/nsupdate_goc/nsupdateReadme.md +++ b/nsupdate_goc/nsupdateReadme.md @@ -1,13 +1,13 @@ # DNS Switch via nsupdate This recipe describes the actions to carry in order to swap GOCDB top DNS -alias from one instance to another +alias from one instance to another Author: David Meredith DNS switching is used so that goc.egi.eu domain points to the failover at DL. 
-Note: -* we have the nsupdate keys safe - * on gocdb-base.esc.rl.ac.uk in /root/nsupdate_goc - * on goc.dl.ac.uk in /root/nsupdate_goc +Note: +* we have the nsupdate keys safe + * on gocdb-base.esc.rl.ac.uk in /root/nsupdate_goc + * on goc.dl.ac.uk in /root/nsupdate_goc * Updated nsupdate files for controlling 'goc.egi.eu' are given below @@ -17,7 +17,7 @@ Note: cd /root/nsupdate_goc ``` -* Run goc_failover.sh to swich the dns to the failover +* Run goc_failover.sh to swich the dns to the failover ``` [root@goc nsupdate_goc]# goc_failover.sh ``` @@ -32,24 +32,22 @@ cd /root/nsupdate_goc nslookup goc.egi.eu ns.muni.cz ``` -* And globally using: +* And globally using: ``` -nslookup goc.egi.eu +nslookup goc.egi.eu host goc.egi.eu ``` ## Background GOCDB top level DNS alias (goc.egi.eu) is maintained and operated by CESNET. -The alias points to the GOCDB production web server, and can be swapped -between different instances in order to transparently bring up a replica server +The alias points to the GOCDB production web server, and can be swapped +between different instances in order to transparently bring up a replica server when the master is unreachable. Swaping is only allowed with an authorised key. -Be aware that owning the key is vital to be able to change the entry -for gog.egi.eu. Anybody who has them can do that using nsupdate with the nsupdate file below. +Be aware that owning the key is vital to be able to change the entry +for gog.egi.eu. Anybody who has them can do that using nsupdate with the nsupdate file below. -The main script (goc_failover.sh) does a nsupdate command passing in the private key and +The main script (goc_failover.sh) does a nsupdate command passing in the private key and using a 'here' document to redirect the config file into the command. Note, nsupdate with the -k option, nsupdate reads the shared secret from the file keyfile. 
-
From d2f43dfec7ccf7e69051297448c2116a7d35d935 Mon Sep 17 00:00:00 2001
From: Greg Corbett
Date: Mon, 9 Aug 2021 16:48:26 +0100
Subject: [PATCH 2/2] Replace old failover references with new failover

---
 README.md                            |  8 ++++----
 autoEngageFailover/engageFailover.sh | 14 +++++++-------
 nsupdate_goc/goc_failover.sh         |  4 ++--
 nsupdate_goc/nsupdateReadme.md       |  4 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 9c008ec..31a8751 100644
--- a/README.md
+++ b/README.md
@@ -136,7 +136,7 @@ Engage the failover now:
 You will need to manually revert the steps executed by the failover so
 the dns points back to the production instance and restore/restart the failover process.
 This includes:
-* restore the symlinks to the goc.dl.ac.uk server cert and key
+* restore the symlinks to the gocdb.hartree.stfc.ac.uk server cert and key
   (see details below) (no longer needed as cert contains dual SAN)
 * restore the hourly cron to fetch the dmp of the DB
 * run nsupdate procedure to repoint 'goc.egi.eu' back to
@@ -178,9 +178,9 @@ mv cronRunDbUpdate.sh /etc/cron.hourly
 
 # Below server cert change no longer needed as cert contains dual SAN
 # This means a server restart is no longer needed.
-#echo Change server certificate and key back for goc.dl.ac.uk
-#ln -sf /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-#ln -sf /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+#echo Change server certificate and key back for gocdb.hartree.stfc.ac.uk
+#ln -sf /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+#ln -sf /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
 #service httpd restart
 #service gocdb-autofailover start
 #service gocdb-autofailover status
diff --git a/autoEngageFailover/engageFailover.sh b/autoEngageFailover/engageFailover.sh
index 8f0145d..6468aac 100644
--- a/autoEngageFailover/engageFailover.sh
+++ b/autoEngageFailover/engageFailover.sh
@@ -9,8 +9,8 @@
 # Note, after the main instance has been restored, you will need to manually
 # do the following steps:
 # Revert this swap:
-# ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-# ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+# ln -s /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+# ln -s /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
 #
 # Restore hourly cron job:
 # mv /root/cronRunDbUpdate.sh /etc/cron.hourly/
@@ -29,10 +29,10 @@ importDBdmpFile=/root/importDBdmpFile
 failcount=0
 
 # server certificate / key
-# note, in production we will use the goc.dl.ac.uk server/host cert and key which has no
+# note, in production we will use the gocdb.hartree.stfc.ac.uk server/host cert and key which has no
 # password protecting the private key.
-userkey="/etc/pki/tls/private/goc.dl.ac.uk.key.pem"
-usercert="/etc/grid-security/goc.dl.ac.uk.cert.pem"
+userkey="/etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem"
+usercert="/etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem"
 
 # URL to monitor for the main production instance
 pingUrl="https://goc.egi.eu/portal/GOCDB_monitor/ops_monitor_check.php"
@@ -197,8 +197,8 @@ errorLogger "Swapping server certs"
 #ln -s /etc/grid-security/goc.egi.eu.cert.pem /etc/grid-security/hostcert.pem
 #ln -s /etc/pki/tls/private/goc.egi.eu.key.pem /etc/pki/tls/private/hostkey.pem
 ## note, after the main instance has been restored, you will need to revert this swap:
-## ln -s /etc/pki/tls/private/goc.dl.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
-## ln -s /etc/grid-security/goc.dl.ac.uk.cert.pem /etc/grid-security/hostcert.pem
+## ln -s /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
+## ln -s /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
 
 #errorLogger "After server cert swap"
 
diff --git a/nsupdate_goc/goc_failover.sh b/nsupdate_goc/goc_failover.sh
index f547f13..e962520 100644
--- a/nsupdate_goc/goc_failover.sh
+++ b/nsupdate_goc/goc_failover.sh
@@ -1,10 +1,10 @@
-#echo "changing goc.egi.eu DNS record at ns.mui.cz to goc.dl.ac.uk"
+#echo "changing goc.egi.eu DNS record at ns.mui.cz to gocdb.hartree.stfc.ac.uk"
 #echo
 nsupdate -k goc.egi.eu_ns.muni.cz_key.conf <