Fix group letter assignments during upgrade #336

Merged
merged 49 commits into from
Feb 17, 2023
49 commits
8e59121
code layout
timidri Sep 23, 2022
5827944
code layout
timidri Sep 23, 2022
6b25eba
typo
timidri Sep 23, 2022
f43e363
docs and formatting
timidri Sep 27, 2022
93a212e
formatting
timidri Sep 28, 2022
18c851c
formatting
timidri Sep 30, 2022
acbef6b
add perform_failover spec plan
timidri Sep 30, 2022
85a3334
output error when rbac token request fails
timidri Sep 30, 2022
ab85bc9
fix failover test plan and GA
timidri Sep 30, 2022
64b0322
fix architecture
timidri Sep 30, 2022
3a3b3ef
fix provision
timidri Sep 30, 2022
331a810
fix architecture
timidri Sep 30, 2022
4c29ac5
swap primary and replica; correct params
timidri Sep 30, 2022
6ad3cd0
add trace
timidri Sep 30, 2022
87c16ad
add logging
timidri Oct 1, 2022
c826700
revert back to using targets
timidri Oct 4, 2022
26d7e17
formatting
timidri Oct 5, 2022
437cf23
use .name to convert target to string
timidri Oct 5, 2022
3e7955e
change name to uri
timidri Oct 5, 2022
6f06cab
debugging
timidri Oct 6, 2022
bf027cd
debugging
timidri Oct 10, 2022
04611f6
fix
timidri Oct 10, 2022
249364c
remove certname() call
timidri Oct 10, 2022
54970a8
purge failed primary before promoting
timidri Oct 11, 2022
0b0a707
fix
timidri Oct 11, 2022
7374797
moving purge to after promote
timidri Oct 12, 2022
b041010
add timeout 0 to shutdown command
timidri Oct 14, 2022
9531601
destroy ssldir, use infra forget
timidri Oct 14, 2022
ee9de3f
disable start of networking on failed primary
timidri Oct 14, 2022
12bab14
generate rbac token before infra forget
timidri Oct 17, 2022
1f0f3ee
list active nodes
timidri Oct 18, 2022
c92d5e3
add puppetdb queries for debugging
timidri Nov 3, 2022
aab97fe
set param "legacy" to false for provision_replica
timidri Nov 3, 2022
a5cd053
add different query
timidri Nov 10, 2022
d2843b1
adding catch_errors to the provision_replica task call
timidri Nov 10, 2022
4abfcaa
disable _catch_errors
timidri Nov 11, 2022
a3125cd
merge main
timidri Feb 6, 2023
793b7f4
fix syntax and spec tests
timidri Feb 6, 2023
0585f6a
add task to delete certname from psql db
timidri Feb 9, 2023
213d8db
don't attept to run puppet on the failed primary
timidri Feb 10, 2023
bdd0c89
fix primary/replica swap sed command
timidri Feb 10, 2023
2fb1e52
fix swap
timidri Feb 10, 2023
7715a98
fix log level
timidri Feb 10, 2023
7d973da
another sed command fix
timidri Feb 10, 2023
9ab4deb
add updating inventory file
timidri Feb 12, 2023
8ad4fdf
Merge branch 'main' into SOLARCH-674
timidri Feb 12, 2023
b793f1f
display peadm config before and after node manager config
timidri Feb 14, 2023
4e3e2c4
determine avail groups properly
timidri Feb 15, 2023
bcfadfb
merge main
timidri Feb 17, 2023
53 changes: 32 additions & 21 deletions .github/workflows/test-failover.yaml
@@ -20,6 +20,10 @@ on:
description: 'Boolean; whether or not to pause for ssh debugging'
required: true
default: 'false'
log_level:
description: 'Bolt log level'
required: false
default: 'debug'

env:
HONEYCOMB_WRITEKEY: 7f3c63a70eecc61d635917de46bea4e6
@@ -37,13 +41,12 @@ jobs:
strategy:
fail-fast: false
matrix:
architecture:
- "extra-large-with-dr-and-spare-replica"
version:
- "${{ github.event.inputs.version }}"
image:
- "${{ github.event.inputs.image }}"

architecture:
- "extra-large-with-dr"
steps:
- name: 'Start SSH session'
if: ${{ github.event.inputs.ssh-debugging == 'true' }}
@@ -89,7 +92,7 @@ jobs:
echo STEP_START=$(date +%s) >> $GITHUB_ENV
echo ::endgroup::

- name: 'Provision test cluster (specified architecture with added DR)'
- name: 'Provision test cluster (XL with spare replica)'
timeout-minutes: 15
run: |
echo ::group::prepare
@@ -106,13 +109,7 @@ jobs:
--modulepath spec/fixtures/modules \
provider=provision_service \
image=${{ matrix.image }} \
architecture=${{ matrix.architecture }}-with-dr
buildevents cmd $TRACE_ID $STEP_ID 'bolt task run provision::provision_service' -- \
bundle exec bolt bolt task run provision::provision_service \
--modulepath spec/fixtures/modules \
action=provision
platform=${{ matrix.image }} \
vars="role: primary"
architecture=${{ matrix.architecture }}-and-spare-replica
echo ::endgroup::

echo ::group::info:request
@@ -136,7 +133,7 @@ jobs:
timeout-minutes: 120
run: |
buildevents cmd $TRACE_ID $STEP_ID 'bolt plan run peadm_spec::install_test_cluster' -- \
bundle exec bolt plan run peadm_spec::install_test_cluster \
bundle exec bolt plan run peadm_spec::install_test_cluster --log_level ${{ github.event.inputs.log_level }} \
--inventoryfile spec/fixtures/litmus_inventory.yaml \
--modulepath spec/fixtures/modules \
architecture=${{ matrix.architecture }} \
@@ -154,11 +151,9 @@ jobs:
- name: 'Perform failover'
run: |
buildevents cmd $TRACE_ID $STEP_ID 'bolt plan run peadm_spec::perform_failover' -- \
bundle exec bolt plan run peadm_spec::perform_failover \
bundle exec bolt plan run peadm_spec::perform_failover --log_level ${{ github.event.inputs.log_level }} \
--inventoryfile spec/fixtures/litmus_inventory.yaml \
--modulepath spec/fixtures/modules \
platform=${{ matrix.image }} \
vars="role: primary"
--modulepath spec/fixtures/modules

- name: "Honeycomb: Record falover time"
if: ${{ always() }}
@@ -178,20 +173,36 @@ jobs:
done
echo "${HOME}/pause absent, continuing workflow."

- name: Set up yq
uses: frenck/action-setup-yq@v1
with:
version: v4.30.5

- name: 'Update inventory'
run: |
# Remove failed primary
yq -i 'del(.groups[].targets[] | select(.vars.role == "primary"))' spec/fixtures/litmus_inventory.yaml
# Swap primary and replica nodes
sed -i.sedbak 's/primary/__tmp__/;s/spare-replica/__tmp2__/;s/replica/primary/;s/__tmp__/replica/;s/__tmp2__/replica/' \
spec/fixtures/litmus_inventory.yaml
echo ::group::info:inventory
sed -e 's/password: .*/password: "[redacted]"/' < spec/fixtures/litmus_inventory.yaml || true
echo ::endgroup::

- name: 'Upgrade PE on test cluster'
if: ${{ always() && github.event.inputs.version_to_upgrade != '' }}
if: ${{ success() && github.event.inputs.version_to_upgrade != '' }}
timeout-minutes: 120
run: |
buildevents cmd $TRACE_ID $STEP_ID 'bolt plan run peadm_spec::upgrade_test_cluster' -- \
bundle exec bolt plan run peadm_spec::upgrade_test_cluster \
bundle exec bolt plan run peadm_spec::upgrade_test_cluster --log_level ${{ github.event.inputs.log_level }} \
--inventoryfile spec/fixtures/litmus_inventory.yaml \
--modulepath spec/fixtures/modules \
architecture='extra-large-with-dr' \
architecture=${{ matrix.architecture }} \
download_mode='direct' \
version=${{ matrix.version_to_upgrade }}
version=${{ github.event.inputs.version_to_upgrade }}

- name: "Honeycomb: Record upgrade time"
if: ${{ always() && github.event.inputs.version_to_upgrade != '' }}
if: ${{ success() && github.event.inputs.version_to_upgrade != '' }}
run: |
echo ::group::honeycomb
buildevents step $TRACE_ID $STEP_ID $STEP_START 'Upgrade PE on test cluster'
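Review note on the 'Update inventory' step above: the sed expression routes `primary` and `spare-replica` through placeholder tokens so that the later `replica` to `primary` rename cannot clobber them. A minimal sketch of that behaviour, assuming one role value per line. The `swap_roles` helper and the sample lines are hypothetical and not part of this PR; only the sed expression itself is copied from the workflow.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): applies the same sed expression as the
# 'Update inventory' step to text on stdin instead of editing a file in place.
# Each s/// command has no 'g' flag, so it rewrites only the first match per line.
swap_roles() {
  sed 's/primary/__tmp__/;s/spare-replica/__tmp2__/;s/replica/primary/;s/__tmp__/replica/;s/__tmp2__/replica/'
}

# Sample role lines (assumed); the real input is spec/fixtures/litmus_inventory.yaml
# after the yq step has already deleted the failed primary.
printf '%s\n' \
  'role: replica' \
  'role: spare-replica' \
  'role: primary-pdb-postgresql' | swap_roles
```

With these inputs the old replica is relabelled as the primary, the spare replica becomes the replica, and the `primary-pdb-postgresql` role is relabelled as `replica-pdb-postgresql`. The substitution applies to every line of the file, not only to `role:` values, so a hostname that happens to contain one of these words would be rewritten as well.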
1 change: 1 addition & 0 deletions .gitignore
@@ -36,3 +36,4 @@ spec/docker/**/*.tar.gz
spec/docker/**/*.asc
spec/docker/**/files/puppet-enterprise*
spec/docker/.task_cache.json
.vscode/settings.json
2 changes: 1 addition & 1 deletion functions/assert_supported_architecture.pp
@@ -33,7 +33,7 @@ function peadm::assert_supported_architecture (
# lint:ignore:strict_indent
default: { # Invalid
out::message(inline_epp(@(HEREDOC)))
Invalid architecture! Recieved:
Invalid architecture! Received:
- primary
<% if $replica_host { -%>
- primary-replica
4 changes: 2 additions & 2 deletions functions/assert_supported_pe_version.pp
@@ -12,7 +12,7 @@ function peadm::assert_supported_pe_version (
if $permit_unsafe_versions {
# lint:ignore:strict_indent
warning(@("WARN"/L))
WARNING: Permitting unsafe PE versions. This is not supported or tested.
WARNING: Permitting unsafe PE versions. This is not supported or tested.
Proceeding with this action could result in a broken PE Infrastructure.
| WARN
# lint:endignore
@@ -21,7 +21,7 @@ function peadm::assert_supported_pe_version (
if (!$supported and $permit_unsafe_versions) {
# lint:ignore:strict_indent
warning(@("WARN"/L))
WARNING: PE version ${version} is NOT SUPPORTED!
WARNING: PE version ${version} is NOT SUPPORTED!
| WARN
# lint:endignore
}
8 changes: 5 additions & 3 deletions plans/add_replica.pp
@@ -5,10 +5,11 @@
# @summary Replace a replica host for a Standard or Large architecture.
# Supported use cases:
# 1: The existing replica is broken, we have a fresh new VM we want to provision the replica to.
# The new replica should have the same certname as the broken one.
# @param primary_host - The hostname and certname of the primary Puppet server
# @param replica_host - The hostname and certname of the replica VM
# @param replica_postgresql_host - The hostname and certname of the host with the replica PE-PosgreSQL database.
# @param replica_postgresql_host - The hostname and certname of the host with the replica PE-PosgreSQL database.
# @param token_file - (optional) the token file in a different location than the default.
#
# Can be a separate host in an XL architecture, or undef in Standard or Large.
plan peadm::add_replica(
# Standard or Large
@@ -119,7 +120,8 @@
# Race condition, where the provision command checks PuppetDB status and
# probably gets "starting", but fails out because that's not "running".
# Can remove flag when that issue is fixed.
legacy => true,
legacy => false,
# _catch_errors => true, # testing
)

# start puppet service
1 change: 1 addition & 0 deletions plans/modify_certificate.pp
@@ -19,6 +19,7 @@
# TODO: convert $add_extensions and $remov_extensions to OIDs, if friendly
# names have been given

out::message("peadm::modify_certificate: primary host: ${primary_target} - ${primary_target.name} - ${primary_target.uri}")
$primary_certname = run_task('peadm::cert_data', $primary_target).first['certname']

# Do the primary first, if it's in the list
2 changes: 1 addition & 1 deletion plans/subplans/install.pp
@@ -138,7 +138,7 @@
# lint:ignore:strict_indent
warning(@("HEREDOC"))
WARNING: Target name / hostname mismatch: target ${name} reports ${result['hostname']}
Certificate name will be set to target name. Please ensure target name is correct and resolvable
Certificate name will be set to target name. Please ensure target name is correct and resolvable
|-HEREDOC
# lint:endignore
}
2 changes: 1 addition & 1 deletion plans/subplans/modify_certificate.pp
@@ -106,7 +106,7 @@
# The docs are broken, and the process is unclean. Sadface.
run_task('service', $target, { action => 'stop', name => 'pe-puppetserver' })
run_command(@("HEREDOC"/L), $target)
rm -f \
rm -f \
/etc/puppetlabs/puppet/ssl/certs/${certname}.pem \
/etc/puppetlabs/puppet/ssl/private_keys/${certname}.pem \
/etc/puppetlabs/puppet/ssl/public_keys/${certname}.pem \
6 changes: 5 additions & 1 deletion plans/subplans/prepare_agent.pp
@@ -8,6 +8,9 @@
$agent_target = peadm::get_targets($targets, 1)
$primary_target = peadm::get_targets($primary_host, 1)

out::message("Preparing agent ${agent_target} to connect to ${primary_target}")
out::message("agent target ${agent_target} to connect to ${primary_target}")

$dns_alt_names_flag = $dns_alt_names? {
undef => [],
default => ["main:dns_alt_names=${dns_alt_names.join(',')}"],
@@ -80,8 +83,9 @@

# If agent certificate is good but lacks appropriate extensions, plan will still
# regenerate certificate
out::message("primary target: ${primary_target}, certname: ${primary_target.peadm::certname()}, uri: ${primary_target[0].uri}")
run_plan('peadm::modify_certificate', $agent_target,
primary_host => $primary_target.peadm::certname(),
primary_host => $primary_target,
add_extensions => $certificate_extensions,
force_regenerate => $force_regenerate
)
33 changes: 29 additions & 4 deletions plans/upgrade.pp
@@ -255,20 +255,45 @@
},
)

# Log the peadm configuration before node manager setup
run_task('peadm::get_peadm_config', $primary_target)

# Update classification. This needs to be done now because if we don't, and
# the PE Compiler node groups are wrong, then the compilers won't be able to
# successfully classify and update

# First, determine the correct hosts for the A and B availability groups
$server_a_host = $cert_extensions.dig($primary_target.peadm::certname(), peadm::oid('peadm_availability_group')) ? {
'A' => $primary_target.peadm::certname(),
default => $replica_target.peadm::certname(),
}

$server_b_host = $server_a_host ? {
$primary_target.peadm::certname() => $replica_target.peadm::certname(),
default => $primary_target.peadm::certname(),
}

$postgresql_a_host = $cert_extensions.dig($primary_postgresql_target.peadm::certname(), peadm::oid('peadm_availability_group')) ? {
'A' => $primary_postgresql_target.peadm::certname(),
default => $replica_postgresql_target.peadm::certname(),
}

$postgresql_b_host = $postgresql_a_host ? {
$primary_postgresql_target.peadm::certname() => $replica_postgresql_target.peadm::certname(),
default => $primary_postgresql_target.peadm::certname(),
}

apply($primary_target) {
class { 'peadm::setup::node_manager_yaml':
primary_host => $primary_target.peadm::certname(),
}

class { 'peadm::setup::node_manager':
primary_host => $primary_target.peadm::certname(),
server_a_host => $primary_target.peadm::certname(),
server_b_host => $replica_target.peadm::certname(),
postgresql_a_host => $primary_postgresql_target.peadm::certname(),
postgresql_b_host => $replica_postgresql_target.peadm::certname(),
server_a_host => $server_a_host,
server_b_host => $server_b_host,
postgresql_a_host => $postgresql_a_host,
postgresql_b_host => $postgresql_b_host,
compiler_pool_address => $compiler_pool_address,
internal_compiler_a_pool_address => $internal_compiler_a_pool_address,
internal_compiler_b_pool_address => $internal_compiler_b_pool_address,
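Review note on the hunk above: the new selectors assign the A and B hosts from the primary's recorded availability group letter instead of assuming the current primary is always in group A, which stops holding after a failover. A rough shell rendering of the server selector pair, as a sketch only. The `pick_group_hosts` function and the host names are hypothetical; the actual plan reads the letter from `$cert_extensions` using `peadm::oid('peadm_availability_group')`.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR) mirroring the two Puppet selectors.
# Usage: pick_group_hosts PRIMARY REPLICA PRIMARY_GROUP_LETTER
# Prints "<server_a_host> <server_b_host>".
pick_group_hosts() {
  primary=$1
  replica=$2
  primary_group=$3

  # Selector 1: the primary is the A host only if its certificate says 'A';
  # any other value (including a missing letter) falls through to the replica.
  if [ "$primary_group" = "A" ]; then
    server_a=$primary
  else
    server_a=$replica
  fi

  # Selector 2: the B host is whichever of the pair was not picked as A.
  if [ "$server_a" = "$primary" ]; then
    server_b=$replica
  else
    server_b=$primary
  fi

  printf '%s %s\n' "$server_a" "$server_b"
}

# Before failover: the primary carries letter A (assumed host names).
pick_group_hosts pe-one.example pe-two.example A
# After failover: the promoted primary still carries letter B.
pick_group_hosts pe-two.example pe-one.example B
```

Both calls yield the same A/B assignment, so the group letters stay attached to the hosts rather than to the roles they currently hold. The PostgreSQL selectors in the hunk follow the same pattern with the `*_postgresql_target` values.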
4 changes: 1 addition & 3 deletions spec/acceptance/peadm_spec/plans/add_replica.pp
@@ -1,6 +1,5 @@
plan peadm_spec::add_replica(
){

) {
$t = get_targets('*')
wait_until_available($t)

@@ -22,5 +21,4 @@
replica_host => $replica_host,
replica_postgresql_host => $replica_postgresql_host ? { [] => undef, default => $replica_postgresql_host },
)

}
29 changes: 14 additions & 15 deletions spec/acceptance/peadm_spec/plans/install_test_cluster.pp
@@ -6,7 +6,6 @@
Boolean $permit_unsafe_versions = false,
Enum['enable', 'disable'] $fips = 'disable'
) {

$t = get_targets('*')
wait_until_available($t)

@@ -34,36 +33,36 @@

$arch_params =
case $architecture {
'standard': {{
'standard': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
}}
'standard-with-dr': {{
} }
'standard-with-dr': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
replica_host => $t.filter |$n| { $n.vars['role'] == 'replica' },
}}
'large': {{
} }
'large': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
compiler_hosts => $t.filter |$n| { $n.vars['role'] == 'compiler' },
}}
'large-with-dr': {{
} }
'large-with-dr': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
replica_host => $t.filter |$n| { $n.vars['role'] == 'replica' },
compiler_hosts => $t.filter |$n| { $n.vars['role'] == 'compiler' },
}}
'extra-large': {{
} }
'extra-large': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
primary_postgresql_host => $t.filter |$n| { $n.vars['role'] == 'primary-pdb-postgresql' },
compiler_hosts => $t.filter |$n| { $n.vars['role'] == 'compiler' },
}}
'extra-large-with-dr': {{
} }
'extra-large-with-dr': { {
primary_host => $t.filter |$n| { $n.vars['role'] == 'primary' },
primary_postgresql_host => $t.filter |$n| { $n.vars['role'] == 'primary-pdb-postgresql' },
replica_host => $t.filter |$n| { $n.vars['role'] == 'replica' },
replica_postgresql_host => $t.filter |$n| { $n.vars['role'] == 'replica-pdb-postgresql' },
compiler_hosts => $t.filter |$n| { $n.vars['role'] == 'compiler' },
}}
default: { fail('Invalid architecture!') }
}
} }
default: { fail('Invalid architecture!') }
}

$install_result =
run_plan('peadm::install', $arch_params + $common_params)