Replace old failover references with new failover #8

Merged 2 commits on Aug 10, 2021.
84 changes: 42 additions & 42 deletions README.md
Author: David Meredith + JK

This repo contains the service and cron scripts used to run a failover GOCDB instance. It includes the following dirs:
* autoEngageFailover/
    * Contains a service script (```gocdb-autofailover.sh```) and child scripts that monitor the main production instance. If a prolonged outage is detected, the GOCDB top-level DNS alias 'goc.egi.eu' is switched from the production instance to the failover instance. This switch can also be performed manually when needed.
* importDBdmpFile/
    * Contains a script (```1_runDbUpdate.sh```) that should be invoked hourly by cron to fetch and install a .dmp of the production DB into the local failover DB. This runs separately from the autoEngageFailover process.
* nsupdate_goc/
    * Scripts for switching the DNS to/from the production/failover instance.
* archiveDmpDownload/
    * Contains a script to download/archive dmp files in a separate process.

# Packages
* The following scripts need to be installed and configured for your installation:
```
/root/
autoEngageFailover/ # Scripts to monitor the production instance and engage failover
|_ gocdb-autofailover.sh # MAIN SERVICE SCRIPT to monitor production instance
|_ engageFailover.sh # Child script, run if prolonged outage is detected

importDBdmpFile/ # Scripts to fetch/install a .dmp of the prod data
|_ 1_runDbUpdate.sh # MAIN SCRIPT that can be called from cron, invokes child scripts below
|_ ora11gEnvVars.sh # Setup oracle env
|_ getDump.sh # Fetch a .dmp of the production data
|_ dropGocdbUser.sh # Drops the current DB schema
|_ loadData.sh # Load the last successfully fetched DB dmp into the RDBMS
|_ gatherStats.sh # Oracle gathers stats to re-index
|_ pass_file_exemplar.txt # Sample pwd file for DB (rename to pass_file)

nsupdate_goc/ # Scripts for switching the DNS to the failover
|_ goc_failover.sh # Points DNS to failover instance
|_ goc_production.sh # Points DNS to production instance

archiveDmpDownload/ # Contains script to download/archive dmp files in a separate process e.g from cron.daily
|_ archiveDump.sh # Main script that downloads dmp and saves it in a sub-dir
|_ archive/ # Contains archive/dmp files
```
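
The layout above lends itself to a quick pre-flight check. The sketch below is a hypothetical helper, not part of the repo: `check_failover_layout` only tests that each listed file exists under a root directory (default `/root`), and it assumes `pass_file` has already been created from `pass_file_exemplar.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the repo.
# check_failover_layout [ROOT_DIR]: print each expected file that is missing
# under ROOT_DIR (default /root) and return the number of missing files.
check_failover_layout() {
  local root=${1:-/root} missing=0 f
  # importDBdmpFile/pass_file is the renamed copy of pass_file_exemplar.txt
  local files=(
    autoEngageFailover/gocdb-autofailover.sh
    autoEngageFailover/engageFailover.sh
    importDBdmpFile/1_runDbUpdate.sh
    importDBdmpFile/ora11gEnvVars.sh
    importDBdmpFile/getDump.sh
    importDBdmpFile/dropGocdbUser.sh
    importDBdmpFile/loadData.sh
    importDBdmpFile/gatherStats.sh
    importDBdmpFile/pass_file
    nsupdate_goc/goc_failover.sh
    nsupdate_goc/goc_production.sh
    archiveDmpDownload/archiveDump.sh
  )
  for f in "${files[@]}"; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=$(( missing + 1 ))
    fi
  done
  return "$missing"
}
```

For example, `check_failover_layout && echo 'layout OK'` prints one line per missing file to stderr; the return value is the number of missing files, so zero means everything is in place.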

## /root/autoEngageFailover/
Start in this dir. It contains the 'gocdb-autofailover.sh'
service script, which should be installed as a service in
'/etc/init.d/gocdb-autofailover'. This service invokes
'engageFailover.sh', which monitors the production instance
with a ping check. If a continued outage is detected,
the script starts the failover procedure, which includes the
following:
* the gocdb admins are emailed,
* the age of the last successfully imported dmp file is
checked to see that it is current,
* the hourly cron that fetches the dmp file is stopped (see
importDBdmpFile below),
* <strike>symbolic links to the server cert/key are updated so they
point to the 'goc.egi.eu' cert/key</strike> (note, no longer needed as cert contains dual SAN)
* the DNS scripts are invoked to change the DNS (see
nsupdate_goc below).
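
Two of the steps above, the ping-based outage detection and the dmp age check, can be sketched as follows. This is a minimal illustration under stated assumptions (Linux `iputils` ping, GNU `stat`), with hypothetical function names and thresholds; it is not the logic of `engageFailover.sh` itself.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of two checks described above; not the repo's code.

# probe_host HOST: succeed if HOST answers one ping within 5 seconds
# (Linux iputils ping: -c is the count, -W the timeout in seconds).
probe_host() {
  ping -c 1 -W 5 "$1" >/dev/null 2>&1
}

# wait_for_prolonged_outage HOST MAX_FAILURES INTERVAL_SECONDS
# Returns once HOST has failed MAX_FAILURES consecutive probes. Any successful
# probe resets the counter, so a brief blip does not trigger a failover.
wait_for_prolonged_outage() {
  local host=$1 max_failures=$2 interval=$3 failures=0
  while (( failures < max_failures )); do
    if probe_host "$host"; then
      failures=0
    else
      failures=$(( failures + 1 ))
    fi
    if (( failures < max_failures )); then
      sleep "$interval"
    fi
  done
}

# dmp_is_current FILE MAX_AGE_SECONDS
# Succeeds if FILE exists and was modified less than MAX_AGE_SECONDS ago.
dmp_is_current() {
  local file=$1 max_age=$2 mtime now
  [ -f "$file" ] || return 1
  mtime=$(stat -c %Y "$file") || return 1   # GNU stat: mtime in epoch seconds
  now=$(date +%s)
  (( now - mtime < max_age ))
}
```

For instance, `wait_for_prolonged_outage gocdb-base.esc.rl.ac.uk 10 60` would return after ten consecutive failed pings taken a minute apart, and `dmp_is_current <last imported dmp> 7200` would accept a dump up to two hours old; both thresholds are illustrative. Counting consecutive failures, rather than reacting to a single lost ping, is what separates a prolonged outage from a transient one.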

## /root/importDBdmpFile/
Contains scripts that fetch the .dmp file and install this
dmp file into the local Oracle XE instance. The master script
is '1_runDbUpdate.sh', which needs to be invoked from an hourly
cron:

[…]

```
/root/importDBdmpFile/1_runDbUpdate.sh
```

You will also need to:
* generate a public/private key pair using `ssh-keygen` and ensure the public
key is present on the host with the database dmp file.
* populate `importDBdmpFile/failover_TEMPLATE.sh` with
appropriate values and copy it to `/etc/gocdb/failover.sh`

## /root/nsupdate_goc/
Contains the nsupdate keys and nsupdate scripts for switching
the 'goc.egi.eu' top level DNS alias to point to either the
production instance or the failover.
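
As an illustration of what such a switch involves, the sketch below prints the commands that `nsupdate` reads on standard input. It assumes 'goc.egi.eu' is a CNAME record; the server, zone, key path, target host and TTL are placeholders, not values taken from the repo, so check `nsupdateReadme.txt` for the real procedure.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the repo's goc_failover.sh / goc_production.sh.
# nsupdate_commands SERVER ZONE RECORD TARGET [TTL]
# Prints an nsupdate(1) script that replaces RECORD's CNAME with TARGET.
# Assumes RECORD is a CNAME; TTL defaults to an illustrative 300 seconds.
nsupdate_commands() {
  local server=$1 zone=$2 record=$3 target=$4 ttl=${5:-300}
  printf 'server %s\n' "$server"
  printf 'zone %s\n' "$zone"
  # Remove the existing CNAME, then add one pointing at the new target.
  printf 'update delete %s. CNAME\n' "$record"
  printf 'update add %s. %s CNAME %s.\n' "$record" "$ttl" "$target"
  printf 'send\n'
}
```

Running `nsupdate_commands ns1.example.org egi.eu goc.egi.eu gocdb-base.esc.rl.ac.uk` on its own is a dry run; piping it into `nsupdate -k /path/to/keyfile` would submit the change. Generating the commands separately from sending them makes it easy to review an update before it is applied.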


## /root/archiveDmpDownload/
Contains a script that downloads the dmp file and stores the file in the archive/ sub-dir.
The script also deletes archived files that are older than 'x' days.
This script can be called in a separate process, e.g. from cron.daily to build a
set of backups.
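
The archive-and-prune behaviour described above can be sketched as two functions. This is a hypothetical illustration, not `archiveDump.sh`: the download step is omitted, the timestamped naming scheme is an assumption, and pruning relies on GNU `find`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the archive-and-prune pattern; not the repo's
# archiveDump.sh. The download itself is left out: archive_dmp copies a
# .dmp that has already been fetched.

# archive_dmp SRC_FILE ARCHIVE_DIR: copy SRC_FILE into ARCHIVE_DIR under a
# timestamped name so earlier copies are never overwritten.
archive_dmp() {
  local src=$1 archive_dir=$2 stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$archive_dir" || return 1
  cp "$src" "$archive_dir/$(basename "$src" .dmp)-$stamp.dmp"
}

# prune_archive ARCHIVE_DIR DAYS: delete archived .dmp files older than DAYS.
# find rounds ages down to whole days, so -mtime +DAYS matches files that
# are at least DAYS+1 days old.
prune_archive() {
  local archive_dir=$1 days=$2
  find "$archive_dir" -maxdepth 1 -type f -name '*.dmp' -mtime +"$days" -delete
}
```

A daily job might then run `archive_dmp /path/to/latest.dmp /root/archiveDmpDownload/archive && prune_archive /root/archiveDmpDownload/archive 14`, where 14 stands in for the 'x' days retention period.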


# Failover Instructions
* Choose from options 1) 2) 3)

## To start/stop the auto failover service
This will continuously monitor the production
instance and engage the failover automatically during prolonged outages.

```
# […]
chkconfig --list | grep gocdb-auto
# […]
/sbin/service gocdb-autofailover status

```
Directly (not as a service):

```bash
cd /root/autoEngageFailover
./gocdb-autofailover.sh {start|stop|restart}

```
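
For readers unfamiliar with this kind of script, here is a minimal sketch of how a `{start|stop|restart}` entry point can manage a background monitor through a PID file. The PID file path, the extra `status` action and `monitor_loop` are assumptions for illustration; the repo's `gocdb-autofailover.sh` may work differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a {start|stop|restart|status} entry point; the repo's
# gocdb-autofailover.sh may be structured differently.
PIDFILE=${PIDFILE:-/var/run/gocdb-autofailover.pid}   # assumed location

# Placeholder for the real ping-check loop. On TERM it kills its current
# sleep so no orphaned child is left behind, then exits.
monitor_loop() {
  trap 'kill "$child" 2>/dev/null; exit 0' TERM
  while true; do
    sleep 60 &
    child=$!
    wait "$child"
  done
}

# is_running: succeed if the PID file names a live process.
is_running() {
  [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

autofailover_ctl() {
  case "$1" in
    start)
      if is_running; then echo "already running"; return 0; fi
      monitor_loop </dev/null >/dev/null 2>&1 &
      echo "$!" > "$PIDFILE"
      ;;
    stop)
      if is_running; then kill "$(cat "$PIDFILE")"; fi
      rm -f "$PIDFILE"
      ;;
    restart)
      autofailover_ctl stop
      autofailover_ctl start
      ;;
    status)
      if is_running; then echo "running"; else echo "stopped"; return 3; fi
      ;;
    *)
      echo "Usage: $0 {start|stop|restart|status}" >&2
      return 2
      ;;
  esac
}
```

The PID file is what lets a later `stop` find the background loop started by an earlier `start`, which is also why `start` refuses to launch a second copy while the recorded process is alive.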

## To manually engage the failover immediately
E.g. for known/scheduled outages, run the following
passing 'now' as the first command-line argument:

Stop the service:
```
service gocdb-autofailover stop
```
Or to stop if running manually:
```
cd /root/autoEngageFailover
./gocdb-autofailover.sh stop
```

Engage the failover now:

[…]
You will need to manually revert the steps executed by the
failover so the DNS points back to the production instance
and restore/restart the failover process. This includes:
* <strike>restore the symlinks to the gocdb.hartree.stfc.ac.uk server cert and key
(see details below)</strike> (no longer needed as cert contains dual SAN)
* restore the hourly cron to fetch the dmp of the DB
* run nsupdate procedure to repoint 'goc.egi.eu' back to
'gocdb-base.esc.rl.ac.uk'
MUST read /root/nsupdate_goc/nsupdateReadme.txt.
* restart the failover service

#### Restore Walkthrough
At the end of the downtime (once the production instance is ready to be restored), first re-point the DNS:

```bash
echo We first switch dns to point to production instance
cd /root/nsupdate_goc
# […]

```

Now wait for the DNS to settle. This takes approx. **2 hrs**, during which the goc.egi.eu domain will
switch between the failover instance and the production instance. You should monitor this using nslookup:

```bash
nslookup goc.egi.eu
# […]
Address: 130.246.143.160
```
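
Rather than re-running nslookup by hand, the check can be scripted. The sketch below is a hypothetical helper that uses `dig` (shipped alongside `nslookup` and `nsupdate` in the bind-utils package on RHEL-family systems) and waits until the alias has returned one expected address on several consecutive checks.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo.
# resolve_alias NAME: print the IPv4 addresses NAME currently resolves to,
# one per line (dig +short also prints CNAME targets; grep keeps only IPs).
resolve_alias() {
  dig +short "$1" A | grep -E '^[0-9]+(\.[0-9]+){3}$' | sort -u
}

# wait_for_stable_dns NAME EXPECTED_IP REQUIRED INTERVAL_SECONDS
# Returns once NAME has resolved to exactly EXPECTED_IP on REQUIRED
# consecutive checks; any other answer resets the streak.
wait_for_stable_dns() {
  local name=$1 expected=$2 required=$3 interval=$4 streak=0
  while (( streak < required )); do
    if [ "$(resolve_alias "$name")" = "$expected" ]; then
      streak=$(( streak + 1 ))
    else
      streak=0
    fi
    if (( streak < required )); then
      sleep "$interval"
    fi
  done
}
```

For example, `wait_for_stable_dns goc.egi.eu 130.246.143.160 12 600` (the address from the example output above, checked every 10 minutes) returns only after about 2 hours of consistent answers. A stable answer from one resolver does not prove that caches elsewhere have expired, so this complements rather than replaces the ~2 hr wait.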

After the DNS has become stable, the production instance will be serving requests.
Only after this ~2 hr period should you restart the failover service:

```bash
# […]
rm /root/autoEngageFailover/engage.lock
mv cronRunDbUpdate.sh /etc/cron.hourly

# Below server cert change no longer needed as cert contains dual SAN
# This means a server restart is no longer needed.
#echo Change server certificate and key back for gocdb.hartree.stfc.ac.uk
#ln -sf /etc/pki/tls/private/gocdb.hartree.stfc.ac.uk.key.pem /etc/pki/tls/private/hostkey.pem
#ln -sf /etc/grid-security/gocdb.hartree.stfc.ac.uk.cert.pem /etc/grid-security/hostcert.pem
#service httpd restart
service gocdb-autofailover start
service gocdb-autofailover status
# gocdb-autofailover is running...
```
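
Before relying on the failover again, a small check can confirm that the two file-level restore steps above took effect. This is a hypothetical helper: the default paths are the ones used in the walkthrough, the parameters exist only so it can be pointed elsewhere, and `service gocdb-autofailover status` remains the check for the service itself.

```bash
#!/usr/bin/env bash
# Hypothetical post-restore check; not part of the repo.
# check_restored [LOCK_FILE] [CRON_SCRIPT]: report restore steps that appear
# incomplete and return the number of problems found (0 = looks restored).
check_restored() {
  local lock=${1:-/root/autoEngageFailover/engage.lock}
  local cron=${2:-/etc/cron.hourly/cronRunDbUpdate.sh}
  local problems=0
  # The walkthrough removes engage.lock before restarting the service.
  if [ -e "$lock" ]; then
    echo "lock file still present: $lock" >&2
    problems=$(( problems + 1 ))
  fi
  # The hourly dmp import must be back in cron.hourly and executable.
  if [ ! -x "$cron" ]; then
    echo "hourly dmp import not re-enabled: $cron missing or not executable" >&2
    problems=$(( problems + 1 ))
  fi
  return "$problems"
}
```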

Now check the '/root/autoEngageFailover/pingCheckLog.txt' and