
Fix classification when adding some components #258


Merged: 13 commits merged into puppetlabs:main from use_failed_primary on Jun 15, 2022

Conversation

@ody ody (Member) commented May 6, 2022

Ensure classification is updated appropriately. Without classification updates, plans are only able to replace components that have the same name but are entirely unconfigured.
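
To illustrate the intent, here is a minimal sketch of the kind of classification update involved. The subplan name is taken from this PR, but the parameter names are assumptions for illustration only, not its real interface:

```puppet
# Illustrative sketch only. Parameter names are assumed and do not
# necessarily match peadm::util::update_classification's real interface.
plan example::add_replica_classification (
  TargetSpec $primary_host,
  TargetSpec $replica_host,
) {
  # Update PE classification so the new node is recognised as a replica,
  # instead of relying on it re-using the certname of an existing,
  # unconfigured component.
  run_plan('peadm::util::update_classification', {
    'targets'      => $primary_host,
    'replica_host' => $replica_host,
  })
}
```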

@ody ody force-pushed the use_failed_primary branch 14 times, most recently from a6806a7 to 9ea0cf5 on May 11, 2022 22:56
@ody ody changed the title from "(WIP) Deploy new replica hosts" to "Fix classification of some added components" on May 12, 2022
@ody ody added the bugfix label on May 12, 2022
@ody ody changed the title from "Fix classification of some added components" to "Fix classification when adding some components" on May 12, 2022
@ody ody force-pushed the use_failed_primary branch 13 times, most recently from 9c9945c to 9085054 on May 13, 2022 17:27
@ody ody force-pushed the use_failed_primary branch 6 times, most recently from f6f5b7e to 828050f on May 20, 2022 20:39
ody added 12 commits May 31, 2022 20:50
Fixes the lack of classification in the add_replica plan so that it
does not fail when adding a replica to a deployment which was not
previously configured with one.

Without this fix, the plan could only replace failed replicas of the
same name.
Changes to add_replica that fix classification invalidate existing tests;
this commit makes them valid again.
Previously, the utility plan update_classification made unnecessary
assumptions about primaries and replicas. This commit ensures those
assumptions are no longer made and that classification is based solely on
availability group letter.
The switch to availability-group-based classification necessitated
changes to add_database for it to continue working. Also does a little
cleanup of various cruft along the way.
It is not guaranteed to be in the PATH.
When reusing failed infrastructure components, they may be configured for
a different primary than the current one and may have an old certificate
revocation list. This commit ensures that agent configuration is updated for
the current primary and fetches the CRL from that primary (see the first
sketch after this commit list).

Includes a little cleanup lifted from the add_compiler plan.
When running peadm::subplans::modify_certificate, also get the status of the
certificate from the perspective of the primary to detect whether the
certificate has been revoked.

Introduces a new task, peadm::cert_valid_status, which checks for different
failure scenarios when validating certificates (see the second sketch after
this commit list).
The set of acceptable failures when running clean on a primary is expanded to
address scenarios where an infrastructure component has already been cleaned
by another process, e.g. puppet infrastructure forget.
Creates a utility plan used by the add_replica plan to source the
primary's global Hiera configuration and distribute it to the replica
target (see the third sketch after this commit list).

Without this, data in the console is not available when compiling
catalogs after the replica is promoted.
Adds the capability to set the PuppetDB database backend address to anything.
Previously, peadm::util::update_db_setting would always attempt to pair
configuration with the appropriate availability group letter, but in DR
scenarios this is not appropriate (see the last sketch after this commit list).
The addition of the peadm::cert_valid_status task triggered test suite
failures. This commit fixes them.
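
To illustrate the agent reconfiguration and CRL refresh described in the commit above, a minimal sketch, assuming $new_primary and $targets are already resolved in the plan (the actual peadm implementation may differ):

```puppet
# Point the reused agent at the current primary.
run_command("puppet config set server ${new_primary} --section agent", $targets)

# Refresh the CRL from that primary so a stale revocation list on the reused
# node does not interfere with its new certificate.
run_command(
  "curl --silent --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem -o /etc/puppetlabs/puppet/ssl/crl.pem https://${new_primary}:8140/puppet-ca/v1/certificate_revocation_list/ca",
  $targets,
)
```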
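Roughly the kind of certificate check involved, as a sketch only, not the actual peadm::cert_valid_status task; the file paths are standard puppetserver CA locations, and $certname and $primary_target are assumed plan variables:

```puppet
# Verify the component's certificate against the primary's CA cert and CRL so
# that a revoked certificate is caught before the component is reused.
$verify_cmd = "openssl verify -crl_check -CAfile /etc/puppetlabs/puppetserver/ca/ca_crt.pem -CRLfile /etc/puppetlabs/puppetserver/ca/ca_crl.pem /etc/puppetlabs/puppetserver/ca/signed/${certname}.pem"
$check = run_command($verify_cmd, $primary_target, { '_catch_errors' => true }).first

unless $check.ok {
  out::message("Certificate for ${certname} is not valid from the primary's perspective: ${check.value['stderr']}")
}
```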
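A minimal sketch of the Hiera distribution idea, assuming $primary_target and $replica_target are already resolved Targets (the actual utility plan may distribute the file differently):

```puppet
# Read the primary's global Hiera configuration...
$global_hiera = run_command(
  'cat /etc/puppetlabs/puppet/hiera.yaml',
  $primary_target,
).first.value['stdout']

# ...and lay it down on the replica so console data remains resolvable after
# the replica is promoted.
apply_prep($replica_target)
apply($replica_target) {
  file { '/etc/puppetlabs/puppet/hiera.yaml':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => $global_hiera,
  }
}
```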
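Finally, the PuppetDB setting in question is the JDBC subname. A sketch of pointing it at an arbitrary host, assuming $pg_host and $targets are plan parameters and the puppetlabs-inifile module is available (the real peadm::util::update_db_setting may manage this differently):

```puppet
apply($targets) {
  # Point PuppetDB's backend connection at an explicit PostgreSQL host rather
  # than one derived from an availability group letter.
  ini_setting { 'puppetdb database subname':
    ensure  => present,
    path    => '/etc/puppetlabs/puppetdb/conf.d/database.ini',
    section => 'database',
    setting => 'subname',
    value   => "//${pg_host}:5432/pe-puppetdb",
  }
}
```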
@ody ody force-pushed the use_failed_primary branch from 828050f to 8b678e4 on May 31, 2022 20:50
@ody ody (Member, Author) commented May 31, 2022

Ran this through the orchestrator. Bolt runs successfully until reaching the final step, where it reports that it failed to connect to the rbac-api:

Starting: task peadm::provision_replica on pe-server-2b9722-0.us-west1-a.c.slice-cody.internal
Finished: task peadm::provision_replica with 1 failure in 74.6 sec
Finished: plan peadm::add_replica in 2 min, 53 sec
Failed on pe-server-2b9722-0.us-west1-a.c.slice-cody.internal:
  Could not connect to server with https://pe-server-2b9722-0.us-west1-a.c.slice-cody.internal:4433/rbac-api/v2/auth/token/authenticate
Failed on 1 target: pe-server-2b9722-0.us-west1-a.c.slice-cody.internal
Ran on 1 target

The orchestrator does continue running the final task, though, and it is ultimately successful, resulting in a functional, fully provisioned replica. The test was completed on the CLI via Bolt, using a token. I presume the rbac-api connection failure is related to the puppet infrastructure enable replica command causing restarts of pe-puppetserver.

@ody ody (Member, Author) commented Jun 8, 2022

Noticed today while testing some workflows that the add_database code does not take PostgreSQL 14 into consideration.

@mcka1n mcka1n (Contributor) commented Jun 14, 2022

Hey @ody the logic looks good to me 👍

@ody ody marked this pull request as ready for review June 15, 2022 22:17
@ody ody requested a review from a team as a code owner June 15, 2022 22:17
@ody ody merged commit 33317df into puppetlabs:main Jun 15, 2022
@ody ody deleted the use_failed_primary branch June 15, 2022 23:43