Commit 9d99770

Merge pull request #269 from ody/beyond_with_docs
Updates documentation
2 parents 57f8930 + 29614cb commit 9d99770

2 files changed

+53
-5
lines changed
documentation/automated_recovery.md

+39 -3
@@ -2,7 +2,29 @@
 
 These instructions provide automated procedures for recovering from select failures of PE components which are managed by PEADM.
 
-Additional manual procedures are documented in [recovery.md](recovery.md)
+Manual procedures are documented in [recovery.md](recovery.md)
+
+## Recover from failed primary Puppet server
+
+1. Promote the replica ([official docs](https://puppet.com/docs/pe/2019.8/dr_configure.html#dr-promote-replica))
+2. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
+
+## Replace missing or failed replica Puppet server
+
+This procedure uses the following placeholder references.
+
+* _\<primary-server-fqdn\>_ - The FQDN and certname of the primary Puppet server
+* _\<replica-postgres-server-fqdn\>_ - The FQDN and certname of the PE-PostgreSQL server which resides in the same availability group as the replacement replica Puppet server
+* _\<replacement-replica-fqdn\>_ - The FQDN and certname of the replacement replica Puppet server
+
+1. Run `peadm::add_replica` plan to deploy replacement replica Puppet server
+    1. For Standard and Large deployments
+
+            bolt plan run peadm::add_replica primary_host=<primary-server-fqdn> replica_host=<replacement-replica-fqdn>
+
+    2. For Extra Large deployments
+
+            bolt plan run peadm::add_replica primary_host=<primary-server-fqdn> replica_host=<replacement-replica-fqdn> replica_postgresql_host=<replica-postgres-server-fqdn>
 
 ## Replace failed PE-PostgreSQL server (A or B side)
 
@@ -22,7 +44,7 @@ Procedure:
 
 2. Temporarily set both primary and replica server nodes so that they use the remaining healthy PE-PostgreSQL server
 
-        bolt plan run peadm::util::update_db_setting --target <primary-server-fqdn>,<replica-server-fqdn> primary_postgresql_host=<working-postgres-server-fqdn> override=true
+        bolt plan run peadm::util::update_db_setting --target <primary-server-fqdn>,<replica-server-fqdn> postgresql_host=<working-postgres-server-fqdn> override=true
 
 3. Restart `pe-puppetdb.service` on Puppet server primary and replica
 
@@ -34,4 +56,18 @@ Procedure:
 
 5. Run `peadm::add_database` plan to deploy replacement PE-PostgreSQL server
 
-        bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>
+        bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>
+
+## Replace failed replica puppet server AND failed replica pe-postgresql server
+
+This procedure uses the following placeholder references.
+
+* _\<primary-server-fqdn\>_ - The FQDN and certname of the primary Puppet server
+* _\<failed-replica-fqdn\>_ - The FQDN and certname of the failed replica Puppet server
+
+1. Ensure the old replica server is forgotten.
+
+        bolt command run "/opt/puppetlabs/bin/puppet infrastructure forget <failed-replica-fqdn>" --targets <primary-server-fqdn>
+
+2. [Replace failed PE-PostgreSQL server (A or B side)](#replace-failed-pe-postgresql-server-a-or-b-side)
+3. [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server)
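
The two `peadm::add_replica` invocations in the replica replacement procedure differ only in the optional `replica_postgresql_host` parameter. As a rough sketch, a wrapper could assemble the command from the placeholder values. `build_add_replica_cmd` is a hypothetical helper introduced here for illustration, not part of PEADM or Bolt, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of PEADM or Bolt): prints the bolt command
# for peadm::add_replica without executing it.
# Usage: build_add_replica_cmd <primary-fqdn> <replacement-replica-fqdn> [<replica-postgres-fqdn>]
build_add_replica_cmd() {
  _primary="${1:-}"
  _replica="${2:-}"
  _replica_pg="${3:-}"

  # Both the primary and the replacement replica are required.
  if [ -z "$_primary" ] || [ -z "$_replica" ]; then
    echo "usage: build_add_replica_cmd <primary> <replica> [<replica-postgres>]" >&2
    return 1
  fi

  _cmd="bolt plan run peadm::add_replica primary_host=${_primary} replica_host=${_replica}"

  # Extra Large deployments also name the PE-PostgreSQL server that sits in
  # the same availability group as the replacement replica.
  if [ -n "$_replica_pg" ]; then
    _cmd="${_cmd} replica_postgresql_host=${_replica_pg}"
  fi

  printf '%s\n' "$_cmd"
}
```

Calling it with two arguments yields the Standard and Large form; a third argument yields the Extra Large form. Printing instead of executing leaves room to review the command before it touches the infrastructure.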

plans/add_replica.pp

+14 -2
@@ -31,6 +31,15 @@
     $replica_postgresql_target,
   ]))
 
+  # Get current peadm config to ensure we forget active replicas
+  $peadm_config = run_task('peadm::get_peadm_config', $primary_target).first.value
+
+  # Make list of all possible replicas, configured and provided
+  $replicas = peadm::flatten_compact([
+    $replica_host,
+    $peadm_config['params']['replica_host']
+  ]).unique
+
   $certdata = run_task('peadm::cert_data', $primary_target).first.value
   $primary_avail_group_letter = $certdata['extensions'][peadm::oid('peadm_availability_group')]
   $replica_avail_group_letter = $primary_avail_group_letter ? { 'A' => 'B', 'B' => 'A' }
@@ -40,7 +49,9 @@
   $dns_alt_names = [$replica_target.peadm::certname()] + (pick($certdata['dns-alt-names'], []) - $certdata['certname'])
 
   # This has the effect of revoking the node's certificate, if it exists
-  run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica_target.peadm::certname()}", $primary_target, _catch_errors => true)
+  $replicas.each |$replica| {
+    run_command("/opt/puppetlabs/bin/puppet infrastructure forget ${replica}", $primary_target, _catch_errors => true)
+  }
 
   run_plan('peadm::subplans::component_install', $replica_target,
     primary_host => $primary_target,
@@ -76,7 +87,8 @@
     server_a_host => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
     server_b_host => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
     internal_compiler_a_pool_address => $replica_avail_group_letter ? { 'A' => $replica_host, default => undef },
-    internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef }
+    internal_compiler_b_pool_address => $replica_avail_group_letter ? { 'B' => $replica_host, default => undef },
+    peadm_config => $peadm_config
   )
 
   # Source the global hiera.yaml from Primary and synchronize to new Replica
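
The plan change above collects the replica passed to the plan together with the replica recorded in the current peadm config, drops duplicates, and runs `puppet infrastructure forget` once per remaining host. The shell sketch below approximates that idea outside of Puppet. `forget_commands` is a hypothetical helper, empty strings stand in for undefined values, and the function only prints the commands; it is not how the plan itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of PEADM): given candidate replica certnames,
# print one "puppet infrastructure forget" command per distinct, non-empty
# name, preserving first-seen order.
forget_commands() {
  printf '%s\n' "$@" \
    | awk 'NF && !seen[$0]++ { print "/opt/puppetlabs/bin/puppet infrastructure forget " $0 }'
}
```

Passing the same certname twice, or passing an empty value where no replica is configured, results in a single command per real host, which mirrors the intent of building the list before iterating.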
