diff --git a/documentation/automated_recovery.md b/documentation/automated_recovery.md
index 989bfd69..701e163f 100644
--- a/documentation/automated_recovery.md
+++ b/documentation/automated_recovery.md
@@ -2,7 +2,29 @@
 
 These instructions provide automated procedures for recovering from select failures of PE components which are managed by PEADM.
 
-Additional manual procedures are documented in [recovery.md](recovery.md)
+Manual procedures are documented in [recovery.md](recovery.md)
+
+## Recover from failed primary Puppet server
+
+1. Promote the replica ([official docs](https://puppet.com/docs/pe/2019.8/dr_configure.html#dr-promote-replica))
+2. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
+
+## Replace missing or failed replica Puppet server
+
+This procedure uses the following placeholder references.
+
+* _\_ - The FQDN and certname of the primary Puppet server
+* _\_ - The FQDN and certname of the PE-PostgreSQL server which resides in the same availability group as the replacement replica Puppet server
+* _\_ - The FQDN and certname of the replacement replica Puppet server
+
+1. Run `peadm::add_replica` plan to deploy replacement replica Puppet server
+    1. For Standard and Large deployments
+
+            bolt plan run peadm::add_replica primary_host= replica_host=
+
+    2. For Extra Large deployments
+
+            bolt plan run peadm::add_replica primary_host= replica_host= replica_postgresql_host=
 
 ## Replace failed PE-PostgreSQL server (A or B side)
 
@@ -22,7 +44,7 @@ Procedure:
 
 2. Temporarily set both primary and replica server nodes so that they use the remaining healthy PE-PostgreSQL server
 
-        bolt plan run peadm::util::update_db_setting --target , primary_postgresql_host= override=true
+        bolt plan run peadm::util::update_db_setting --target , postgresql_host= override=true
 
 3. Restart `pe-puppetdb.service` on Puppet server primary and replica
 
@@ -34,4 +56,18 @@ Procedure:
 
 5. Run `peadm::add_database` plan to deploy replacement PE-PostgreSQL server
 
-        bolt plan run peadm::add_database -t primary_host=
\ No newline at end of file
+        bolt plan run peadm::add_database -t primary_host=
+
+## Replace failed replica puppet server AND failed replica pe-postgresql server
+
+This procedure uses the following placeholder references.
+
+* _\_ - The FQDN and certname of the primary Puppet server
+* _\_ - The FQDN and certname of the failed replica Puppet server
+
+1. Ensure the old replica server is forgotten.
+
+        bolt command run "/opt/puppetlabs/bin/puppet infrastructure forget " --targets
+
+2. [Replace failed PE-PostgreSQL server (A or B side)](#replace-failed-pe-postgresql-server-a-or-b-side)
+3. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
\ No newline at end of file
diff --git a/plans/add_replica.pp b/plans/add_replica.pp
index c906332d..49d38864 100644
--- a/plans/add_replica.pp
+++ b/plans/add_replica.pp
@@ -31,6 +31,15 @@
     $replica_postgresql_target,
   ]))
 
+  # Get current peadm config to ensure we forget active replicas
+  $peadm_config = run_task('peadm::get_peadm_config', $primary_target).first.value
+
+  # Make list of all possible replicas, configured and provided
+  $replicas = peadm::flatten_compact([
+    $replica_host,
+    $peadm_config['params']['replica_host']
+  ]).unique
+
   $certdata = run_task('peadm::cert_data', $primary_target).first.value
   $primary_avail_group_letter = $certdata['extensions'][peadm::oid('peadm_availability_group')]
   $replica_avail_group_letter = $primary_avail_group_letter ? { 'A' => 'B', 'B' => 'A' }
@@ -40,7 +49,9 @@
   $dns_alt_names = [$replica_target.peadm::certname()] + (pick($certdata['dns-alt-names'], []) - $certdata['certname'])
 
   # This has the effect of revoking the node's certificate, if it exists
-  run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica_target.peadm::certname()}", $primary_target, _catch_errors => true)
+  $replicas.each |$replica| {
+    run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica}", $primary_target, _catch_errors => true)
+  }
 
   run_plan('peadm::subplans::component_install', $replica_target,
     primary_host => $primary_target,
@@ -76,7 +87,8 @@
     server_a_host => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
     server_b_host => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
     internal_compiler_a_pool_address => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
-    internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef }
+    internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
+    peadm_config => $peadm_config
   )
 
   # Source the global hiera.yaml from Primary and synchronize to new Replica
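
Review note: the change to `plans/add_replica.pp` builds a de-duplicated list of replica certnames (the one passed to the plan plus the one recorded in the current peadm config) and runs `puppet infrastructure forget` once per entry. The dry-run shell sketch below only illustrates that list handling. It is not part of the change: `forget_commands` and the hostnames are hypothetical, an empty string stands in for "no replica configured", and it assumes `peadm::flatten_compact` drops unset entries, which the diff itself does not show. The function prints the commands rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper, not part of peadm):
# print one "puppet infrastructure forget" command per distinct,
# non-empty certname passed as an argument.
forget_commands() {
  # One candidate per line; awk keeps non-empty lines and only the
  # first occurrence of each name, preserving argument order.
  printf '%s\n' "$@" | awk 'NF && !seen[$0]++' | while IFS= read -r certname; do
    printf '/opt/puppetlabs/bin/puppet infrastructure forget %s\n' "$certname"
  done
}

# Example call with made-up hostnames: the provided replica, an unset
# config value (empty string), the same replica again, and an old replica.
forget_commands replica-new.example.com "" replica-new.example.com replica-old.example.com
```

With these arguments the sketch prints two commands, one for each distinct certname. This mirrors the intent of the `.unique` call in the plan: a replica that is both provided and already configured is forgotten once, not twice.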