Upgrade from 2018 #101

Merged

reidmv merged 17 commits into master from upgrade-from-2018 on Jun 24, 2020

Conversation

@reidmv (Contributor) commented Jun 5, 2020

This PR adds the ability to upgrade from PE 2018.1 to PE 2019.7.

reidmv added 13 commits June 2, 2020 16:13
This will allow the convert plan to be used to update trusted
certificate extensions without enforcing node group changes. Such a
capability is useful for upgrading from 2018.1 to 2019.7
Puppet 5 doesn't have the `puppet ssl` command (see the sketch after these commit messages).
New style compilers don't have a peadm_role cert extension anymore; they
only have a pp_auth_role.
So that compilers classify successfully and can run Puppet
When using the orchestrator transport there is a problem with the
built-in service task when the orchestrator is upgraded but the
pxp-agents are not. Switching to run_command and `systemctl stop` during
this time avoids the problem.
Otherwise, the certs potentially can't be signed due to having
authorization extensions.
If it was stopped before, it should still be stopped after.
We now use this plan at a time when the master is 2019.x but agents
could be 2018.x. So, make it compatible.
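The `puppet ssl` caveat above matters because the upgrade has to operate against Puppet 5 (2018.1) agents that lack the command. A minimal, hypothetical sketch of the kind of version branch involved; $agent_version, $target, and the fallback command are illustrative assumptions, not code from this PR:

# Hypothetical sketch only: use `puppet ssl` where it exists (Puppet 6+).
# $agent_version and $target are assumed to be resolved elsewhere in the plan.
if versioncmp($agent_version, '6.0.0') >= 0 {
  run_command('/opt/puppetlabs/bin/puppet ssl download_cert', $target)
} else {
  # Puppet 5 agents pick up their signed certificate on a normal agent run
  run_command('/opt/puppetlabs/bin/puppet agent --onetime --no-daemonize', $target)
}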
@reidmv requested a review from a team as a code owner on June 5, 2020 02:20
@timidri (Contributor) commented Jun 5, 2020

@reidmv there is 1 rubocop issue (single quotes preferred over double quotes) breaking the syntax check.

@timidri (Contributor) commented Jun 5, 2020

@reidmv Can you recommend a straightforward way of testing this?
I tried to set up a 2018.1 architecture using peadm but received error messages saying peadm doesn't support that version and suggesting I check out older versions. I reverted to 1.2.0 to no avail (similar message), then to 0.4.2, which didn't work for other reasons, and then I gave up :-)

@vchepkov (Contributor) commented Jun 5, 2020

I don't think you can install PE 2018.1 using this module at all, only upgrade from it.
It uses the Deferred type during installation, which is not available in Puppet 5.

@reidmv (Contributor, Author) commented Jun 5, 2020

@timidri the 0.4.x branch, in which the module is named "pe_xl" rather than "peadm", is the only one that can be used to actually install 2018.1. I've been using the autope project to deploy test stacks, modifying the plan to change out "peadm" for "pe_xl".

The 2.x version of peadm supports installing 2019.7 (nothing older), and upgrading from 2018.1.x or from 2019.1.0 and newer.

@timidri (Contributor) commented Jun 5, 2020

@reidmv I did the same, but using the AWS provider pe_xl fails with an error we have since fixed:

  "kind": "bolt/plan-failure",
  "msg": "Hostname / DNS name mismatch: target ec2-3-120-158-94.eu-central-1.compute.amazonaws.com reports 'ip-10-138-1-58.eu-central-1.compute.internal'",
  "details": {
  }
}```

I'll try the GCP provider now.

@timidri (Contributor) commented Jun 5, 2020

And btw, I've created a symlink from peadm to pe_xl and the plan worked under its old name

@reidmv (Contributor, Author) commented Jun 5, 2020

Yeah, the 0.4.x version insisted on hostnames exactly matching the inventory names used to connect. I can confirm that in GCP that condition is met, so the deployment seems to go smoothly.

@logicminds (Contributor)

I use the docker examples in this repo to create the 2018 stack. Will need to switch to the 0.4.x stack to do so. @vchepkov @timidri

See docker examples

@@ -13,6 +13,9 @@
   # Common Configuration
   String $compiler_pool_address = $master_host,
   Array[String] $dns_alt_names = [ ],
+
+  # Options
+  Boolean $configure_node_groups = true,
Review comment (Contributor):

I think the $configure_node_groups should be dynamically generated rather than relying on a human. Create a task to find the value of PE version in order to make the boolean.

@vchepkov (Contributor) commented Jun 14, 2020

> I think the $configure_node_groups should be dynamically generated rather than relying on a human. Create a task to find the value of PE version in order to make the boolean.

I think the option not to create additional classification is very useful. There is no need for it in a standard configuration with only a primary and a replica. Also, the module doesn't provide a plan to promote a replica, and one would have trouble removing the classifications added by the module when the primary is not available to run the standard promote procedure.

  $pe_version = run_task('peadm::read_file', $master_target,
    path => '/opt/puppetlabs/server/pe_version',
  )[0][content].chomp

Review comment (Contributor):

Oh we have the PE version here already. So my comment above should use this value.
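A sketch of what the dynamic computation could look like, using the $pe_version read just above; the '2019.7.0' comparison boundary is an assumption for illustration, not a decision made in this PR:

# Illustration only: derive the flag from the detected PE version instead of
# asking the operator; the version threshold here is assumed.
$configure_node_groups = versioncmp($pe_version, '2019.7.0') >= 0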

)
# Shut down PuppetDB on CMs that use the PM's PDB PG. Use run_command instead
# of run_task(service, ...) so that upgrading from 2018.1 works over PCP.
run_command('systemctl stop pe-puppetdb', $compiler_m1_targets)
Review comment (Contributor):

This assumes systemctl is available on the host. Since 2018.1 supports RHEL 6, this wouldn't work in all cases.

https://puppet.com/docs/pe/2018.1/supported_operating_systems.html#supported_operating_systems
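A hedged sketch of one way to cover non-systemd hosts such as RHEL 6, falling back to the SysV service wrapper when systemctl is absent; this is an illustration, not code proposed in the PR:

# Illustration only: prefer systemctl when present, otherwise the SysV wrapper.
$stop_puppetdb = @(SHELL)
  if command -v systemctl >/dev/null 2>&1; then
    systemctl stop pe-puppetdb
  else
    service pe-puppetdb stop
  fi
  | SHELL
run_command($stop_puppetdb, $compiler_m1_targets)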

@@ -13,6 +13,9 @@
   # Common Configuration
   String $compiler_pool_address = $master_host,
   Array[String] $dns_alt_names = [ ],
+
+  # Options
+  Boolean $configure_node_groups = true,
Review comment (Contributor):

Would like to see this dynamically calculated instead. We can't trust humans to figure this out.

# Shut down PuppetDB on CMs that use the replica's PDB PG. Use run_command
# instead of run_task(service, ...) so that upgrading from 2018.1 works
# over PCP.
run_command('systemctl stop pe-puppetdb', $compiler_m2_targets)
Review comment (Contributor):

Same systemctl issue here

@vchepkov (Contributor) commented Jun 6, 2020

I don't want to muddy the water here, and I can open a ticket if v0.4 is still supported, but the module doesn't work for me. I use Vagrant and this JSON:

$ python -m json.tool < provision.json 
{
    "console_password": "puppet2018",
    "dns_alt_names": [
        "puppet.localdomain"
    ],
    "master_host": "primary.localdomain",
    "master_replica_host": "replica.localdomain",
    "pe_conf_data": {
        "pe_install::disable_mco": false,
        "puppet_enterprise::profile::console::display_local_time": true,
        "puppet_enterprise::profile::master::check_for_updates": false,
        "puppet_enterprise::send_analytics_data": false
    },
    "stagingdir": "/opt/staging",
    "version": "2018.1.15"
}

$ cat Puppetfile 
#
# Forge modules
#
forge 'http://forge.puppetlabs.com'

mod 'WhatsARanjit/node_manager', :latest
mod 'puppetlabs/stdlib', :latest

# To upgrade PE2018
mod 'peadm',
  :git => 'https://github.com/puppetlabs/puppetlabs-peadm.git',
  :ref => '320b60e8404c85b6cf3a78fed5149201f98c3e6b'
# To install PE2018
mod 'pe_xl',
  :git => 'https://github.com/puppetlabs/puppetlabs-peadm.git',
  :tag => '0.4.2'

Plan fails:

$ bolt plan run pe_xl::provision --params @provision.json 
Starting: plan pe_xl::provision
Starting: plan pe_xl::unit::install
Starting: task pe_xl::hostname on primary.localdomain, replica.localdomain
Finished: task pe_xl::hostname with 0 failures in 1.05 sec
Starting: file upload from /var/folders/nn/3h090sqs4g58grz5h4bhvpsr0000gn/T/pe_xl20200606-13813-1irj15t to /tmp/pe.conf on primary.localdomain
Finished: file upload from /var/folders/nn/3h090sqs4g58grz5h4bhvpsr0000gn/T/pe_xl20200606-13813-1irj15t to /tmp/pe.conf with 0 failures in 1.11 sec
Starting: plan pe_xl::util::retrieve_and_upload
Starting: task pe_xl::filesize on local://localhost
Finished: task pe_xl::filesize with 0 failures in 0.12 sec
Starting: task pe_xl::filesize on primary.localdomain
Finished: task pe_xl::filesize with 0 failures in 0.92 sec
Finished: plan pe_xl::util::retrieve_and_upload in 1.07 sec
Starting: task pe_xl::mkdir_p_file on primary.localdomain
Finished: task pe_xl::mkdir_p_file with 0 failures in 1.34 sec
Starting: task pe_xl::pe_install on primary.localdomain
Finished: task pe_xl::pe_install on primary.localdomain
Starting: task pe_xl::mkdir_p_file on primary.localdomain
Finished: task pe_xl::mkdir_p_file with 1 failure in 0.83 sec
Finished: plan pe_xl::unit::install in 6.38 sec
Finished: plan pe_xl::provision in 6.45 sec
Failed on primary.localdomain:
  The task failed with exit code 1 and no stdout, but stderr contained:
  chown: invalid user: ‘pe-puppet’
Failed on 1 target: primary.localdomain
Ran on 1 target

@timidri (Contributor) commented Jun 8, 2020

I had the same result as @vchepkov when using autope in GCP.
However, in my case it was due to the pe_xl::filesize task assuming it's executed on Linux - the stat flags on Darwin are different. The current version of the task takes the differences into account. I replaced the task and the plan succeeded for me.
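For reference, the Darwin/Linux difference comes down to stat's flags: GNU stat prints a file size with --format=%s, BSD stat with -f %z. A minimal sketch of handling both (not the actual pe_xl::filesize task; $tarball is a placeholder variable):

# Illustration only: pick the stat variant the local system actually has.
# $tarball is a hypothetical variable holding the path being measured.
$size_cmd = @("SHELL")
  if stat --version >/dev/null 2>&1; then
    stat --format=%s '${tarball}'   # GNU coreutils stat (Linux)
  else
    stat -f %z '${tarball}'         # BSD stat (macOS/Darwin)
  fi
  | SHELL
run_command($size_cmd, 'localhost')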

@reidmv (Contributor, Author) commented Jun 8, 2020

@vchepkov I talked to @timidri and he found one issue on 0.4.x which might be the same one you're running into.

When the plan fails, it is failing right after pe_xl::pe_install:

...
Starting: task pe_xl::pe_install on primary.localdomain
Finished: task pe_xl::pe_install on primary.localdomain
Starting: task pe_xl::mkdir_p_file on primary.localdomain
Finished: task pe_xl::mkdir_p_file with 1 failure in 0.83 sec

Because the PE installer is expected to fail on first install (Puppet can't run successfully before the database node is installed as well), the plan doesn't halt there. In 0.4.x particularly, any failure is ignored, and the plan proceeds. It looks like the pe_xl::mkdir_p_file is failing, but the real problem will have been pe_xl::pe_install failing. One way it could fail is if the PE tarball doesn't exist in /tmp at all, and so the PE installer can't be run. If the PE installer can't be run, then the pe-puppet user won't be created. If the pe-puppet user doesn't exist, an attempt to chown files to it will fail (causing pe_xl::mkdir_p_file to fail).
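For context on how a plan can tolerate an expected failure without ignoring everything, Bolt's _catch_errors option hands the failing result back to the plan instead of halting it. A minimal sketch under that assumption ($master_target and the omitted installer parameters are placeholders); this is not the actual 0.4.x code, which simply proceeds past any failure:

# Illustration only: let the expected first-install failure through, then report it.
$install_result = run_task('pe_xl::pe_install', $master_target,
  '_catch_errors' => true,
  # installer parameters omitted
)
unless $install_result.ok {
  out::message("pe_install reported a failure (expected on first install): ${install_result.first.error}")
}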

The bug @timidri found in 0.4.x is that pe_xl::filesize doesn't work correctly on Mac OSX, which causes the PE installer file not to be uploaded to target systems and pe_xl::pe_install to fail. I was able to successfully provision in GCP because I was running pe_xl::provision from a Linux machine. Dimitri saw this failure because he was running from his Macbook.

If you are running from a Mac OSX machine you may have seen the same failure. If you are not, try running with the --verbose flag to get more detailed information about what is going wrong.

bolt plan run pe_xl::provision --params @provision.json --verbose

@vchepkov (Contributor) commented Jun 8, 2020

Yes, I am indeed using a Mac, and I discovered that the installation file isn't being copied, so I transferred it manually; the plan then failed when provisioning the replica instead.
I am not that concerned about it, though. It's highly unlikely I will provision a new 2018 PE infrastructure anytime soon; I'm waiting for a new LTS at this point. Thanks for following up. As I offered earlier, I can submit a separate issue if you still want to support PE 2018 deployments at this point.

@reidmv (Contributor, Author) commented Jun 8, 2020

@vchepkov ah, understood. If you don't need to deploy 2018.1 yourself I would say then let's not worry about it. I definitely don't have any unsolved need to deploy it; the only reason it's coming up is @timidri trying to help me test the ability to upgrade from it.

New deployments of 2018.1 are not considered to be supported. 😄

@timidri (Contributor) commented Jun 15, 2020

@reidmv Unfortunately I couldn't complete my test run (with a 2018 environment created by pe_xl v0.4.2). The famous last words from my peadm::upgrade plan before hanging forever were:

Starting: task peadm::puppet_infra_upgrade on pe-master-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal
Starting: task peadm::puppet_infra_upgrade on pe-master-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal
Running task peadm::puppet_infra_upgrade with '{"type":"compiler","targets":["pe-compiler-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal","pe-compiler-6ec0c0-2.europe-west4-c.c.dbt-gcp-project.internal"],"_task":"peadm::puppet_infra_upgrade"}' on ["pe-master-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal"]
Running task 'peadm::puppet_infra_upgrade' on pe-master-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal
Initializing ssh connection to pe-master-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal
Opened session
Running '/Users/dimitri.tischenko/git/puppetlabs-autope/modules/peadm/tasks/puppet_infra_upgrade.rb' with {"type":"compiler","targets":["pe-compiler-6ec0c0-0.europe-west4-a.c.dbt-gcp-project.internal","pe-compiler-6ec0c0-2.europe-west4-c.c.dbt-gcp-project.internal"],"_task":"peadm::puppet_infra_upgrade"}
Executing: mkdir -m 700 /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2
Command returned successfully
Uploading /Users/dimitri.tischenko/git/puppetlabs-autope/modules/peadm/tasks/puppet_infra_upgrade.rb, to /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2/puppet_infra_upgrade.rb
Executing: chmod u\+x /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2/puppet_infra_upgrade.rb
Command returned successfully
Executing: chmod u\+x /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2/wrapper.sh
Command returned successfully
Executing: id -g root
Command returned successfully
Executing: sudo -S -H -u root -p \[sudo\]\ Bolt\ needs\ to\ run\ as\ another\ user,\ password:\  sh -c cd\;\ chown\ -R\ root:0\ /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2
Command returned successfully
Executing: sudo -S -H -u root -p \[sudo\]\ Bolt\ needs\ to\ run\ as\ another\ user,\ password:\  sh -c cd\;\ /tmp/c73f2498-ccb3-47d4-a566-23f3e108c0d2/wrapper.sh

reidmv added 2 commits June 15, 2020 15:26
Otherwise it seems there's a chance it'll re-create files we need to be
absent.
@reidmv merged commit 5653b66 into master on Jun 24, 2020
@reidmv deleted the upgrade-from-2018 branch on July 7, 2020 20:40