(MODULES-10023) Fix multiple xtrabackup regressions #1245
Conversation
I've successfully verified that both full and incremental backups are working perfectly fine after applying the proposed fixes.

Verified: Full backups

The daily cron job works as expected...

...and produces the following directory structure (depends on the time when the cron job ran):

Note: I ran the cron job twice manually for this test case.

Verified: Incremental backups

The weekly cron job creates the full backup just fine...

...and the daily cron job is able to perform incremental backups based on the weekly full backup...

...and it produces the following directory structure (depends on the time when the cron job ran):

Note: I had to rename the manually created first full backup to match last Sunday's date to make this manual test case work. But I'm confident that the cron job works perfectly fine now; I just did not want to wait until Sunday :)
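For reference, the weekly-full / daily-incremental scheme described above can be sketched with plain xtrabackup invocations. This is an illustration only: the directory naming and `BACKUP_ROOT` are made up, not the module's exact cron entries.

```shell
# Illustrative sketch: the xtrabackup commands are echoed rather than
# executed, since xtrabackup may not be installed where this runs.
BACKUP_ROOT=$(mktemp -d)                             # stands in for the real backup dir
FULL_DIR="${BACKUP_ROOT}/$(date +%F)_full"           # weekly full backup
INCR_DIR="${BACKUP_ROOT}/$(date +%F_%H-%M-%S)_incr"  # daily incremental
mkdir -p "$FULL_DIR" "$INCR_DIR"

# Weekly cron job: a full backup into its own dated directory.
echo "xtrabackup --backup --target-dir=$FULL_DIR"
# Daily cron job: an incremental backup based on the latest full one.
echo "xtrabackup --backup --target-dir=$INCR_DIR --incremental-basedir=$FULL_DIR"
```

The key point is that each run gets its own target directory, so a full backup and later incrementals can coexist.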
@fraenki Thanks for the PR and for the comprehensive description and test coverage you've added. I'm looking through the change with my colleagues and we'll get back to you a.s.a.p.
@fraenki After some discussion internally, we were thinking it might be good to add an acceptance test for the xtrabackup functionality to ensure we don't run into this scenario again. Do you feel that's something you can add? We can help/support if need be.
@cmccrisken-puppet Sure, I'll have a look. Are there any existing acceptance tests from Puppetlabs that I should use as a reference?
@lionce Before I dive into this: are the module's tests expected to run without failure? Because they ALL fail for me:
Hey @fraenki, the tests should run successfully; e.g. you can check all the steps and the results in the Travis report.
If you want to provision just one platform, you should use:
Cheers!
@cmccrisken-puppet @lionce I've added the following to this PR:
One step closer, only two CI failures that I can't explain:
Any advice on these two issues? :)
@fraenki Reopening the PR to rerun the Travis jobs. Thank you.
Hi @fraenki, Both these failures seem to be related to the installation of the percona package and its dependencies:
I recommend you try running the installation of the percona package manually from inside the containers to make these issues easier to debug. Using Litmus you can provision the images on your machine using: In your inventory.yaml file (created after provisioning) you will find the necessary details to SSH into the container and run the installation manually from there.
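As a sketch, the Litmus workflow described above looks roughly like this. The rake task names come from puppet_litmus; the Docker image name is an illustrative assumption, and the commands are echoed rather than executed since they require Docker and a module checkout.

```shell
# Sketch of the Litmus debugging workflow (commands echoed, not run).
PROVISION="bundle exec rake 'litmus:provision[docker, litmusimage/centos:7]'"
INSTALL_AGENT="bundle exec rake 'litmus:install_agent'"
INSTALL_MODULE="bundle exec rake 'litmus:install_module'"
echo "$PROVISION"       # creates the container and writes inventory.yaml
echo "$INSTALL_AGENT"   # installs the puppet agent in the container
echo "$INSTALL_MODULE"  # installs the module under test
# Then: look up the container's connection details in inventory.yaml,
# get a shell inside it, and try installing the percona package by hand.
```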
@carabasdaniel Thanks for the hints. All tests are working now. Ready for merge :) (Fixes in 3e0e11a: On Debian and older versions of Ubuntu the xtrabackup package name was wrong, and Debian also requires the gnupg2 package. On Scientific Linux 6 the value of $releasever in YUM differs from RedHat/CentOS, making the Percona repo incompatible (an upstream issue, nothing that should be handled in this module); as a workaround, $releasever is set to the same value that works on RedHat/CentOS.)
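The $releasever workaround mentioned above can be illustrated with yum's variable-override mechanism. This is a sketch under assumptions, not the module's actual implementation: yum reads variable overrides from files under /etc/yum/vars, and a temp file stands in for the real path here so the sketch is safe to run.

```shell
# Sketch: on Scientific Linux 6, $releasever expands to a value that does
# not match the "6" used in the Percona repo layout on RHEL/CentOS, so
# pinning it makes the repo URL resolve. A temp file stands in for the
# real /etc/yum/vars/releasever.
RELEASEVER_FILE=$(mktemp)
echo "6" > "$RELEASEVER_FILE"
cat "$RELEASEVER_FILE"
```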
Hi @fraenki Thanks again for submitting this PR. Looks really great now. In order to get this PR merged we have to ensure that this feature works on all the OSs this module supports. Running the full suite of acceptance tests seems to point out there are still failures on installing percona-xtrabackup on a few machines:
@carabasdaniel I've addressed these issues in 7d43a58. How can I run this "full suite of acceptance tests"? Besides that, after the latest commit the RHEL6/7 tests fail on Travis, although they work perfectly fine locally. I have no idea how this Travis "timeout" error would be useful to anyone. :-/ Maybe you have access to logs or debug output.
@fraenki Thank you for incorporating the comments. We use vmpoolers to run the full acceptance tests; as of now we don't have an option to run the full acceptance tests outside of Puppet's infrastructure. It is failing with the following error on the following platforms: Please let us know if you need more information. Thank you
@david22swan Thanks for fixing the Travis tests! 👍 @sheenaajay I've addressed the CentOS/RHEL 5 errors in 3efa123. However, RHEL5 is EOL, so the available Docker images are pretty useless and couldn't run all tests. I couldn't find Docker images for SLES11, so I can't test it at all. Regarding the EL6 failures: why don't these failures show up in Travis, or, why does Puppetlabs use different images for the final tests? Since I can't trigger these failures, I have to ask you for the full error messages for all EL6-like operating systems.
Thank you for incorporating the comments. At the PR level we perform testing on the following Docker images, which are available: debian8, debian9, debian10, ubuntu14.04, ubuntu16.04, ubuntu18.04, centos6, scientificlinux6, centos7, oraclelinux7, scientificlinux7. During bug-fix or release testing we cover most of the remaining supported platforms. Will send you the error information for the failing platforms.
Hi @fraenki, please find the failures below. Let us know if you need more information.

sles-11-x86_64:
Failures:

oracle-5-x86_64:
Failures:

redhat-8-x86_64:
Failures:
@sheenaajay Commit 1f8eefc disables the new xtrabackup tests on all EOL operating systems, so it should finally pass your internal tests. NOTE: Other tests started failing because the tests are now run on Puppet 6.14; this Puppet Agent bug is to blame. There are no failures when running these tests on Puppet 6.13 or the Puppet 6 nightly (aka the 6.15 prerelease). @sheenaajay Please, if there's yet another minor issue that could easily be fixed, may I ask you to just merge this PR and commit the fix afterwards? It's been 5 months already, and this is starting to get really frustrating.
@fraenki Sorry for the delay. I will be talking to the team about prioritizing the review of this PR. We can fix the minor problems too. Thanks a lot for your contribution; your help is much appreciated.
@fraenki The PR is at the top of the list to get merged. There is one small change needed to exclude it from running on the Oracle 6 box (the os[:family] value is Redhat instead of redhat). We will make the changes now.
https://tickets.puppetlabs.com/browse/MODULES-10023
1. Regression: xtrabackup nonfunctional due to backup dir change
Currently the xtrabackup provider is nonfunctional when `$incremental_backups` is set to `false`. Performing the backup will produce an error starting with the 2nd run (the very first backup is successful):

This regression was introduced in #1188 with commit 155f8a1. The value of `--target-dir` was changed from its previous value to just `${backupdir}`. As a result, every backup is written to the exact same target directory, which is not supported by xtrabackup and results in the above error message. So after performing the first backup (which succeeds because the target directory is still empty), all following backup jobs will fail.
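To make the failure mode concrete, here is a sketch (not the module's code) contrasting a fixed `--target-dir` with a per-run directory; the directory names are illustrative.

```shell
# Sketch only: directories are temporary and xtrabackup itself is not run.
BACKUP_ROOT=$(mktemp -d)

# Broken scheme: every run uses the same target directory, so the second
# run finds a non-empty directory and xtrabackup aborts.
FIXED_DIR="${BACKUP_ROOT}/backup"
mkdir -p "$FIXED_DIR"

# Working scheme: each run gets its own dated directory (literal dates
# here stand in for values computed at run time).
RUN1_DIR="${BACKUP_ROOT}/2024-01-01_00-00-01"
RUN2_DIR="${BACKUP_ROOT}/2024-01-02_00-00-01"
mkdir -p "$RUN1_DIR" "$RUN2_DIR"
ls "$BACKUP_ROOT"   # independent per-run backups can coexist and rotate
```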
Furthermore, this change renders the `$backuprotate` parameter nonfunctional, because only one backup can be kept this way. I think this was an oversight and should be seen as a regression, because it effectively breaks xtrabackup for full backups. That's why I propose to revert `--target-dir` back to the old (working) value.

2. Regression: backup rotation nonfunctional due to incompatible find
While working on the first regression, I've noticed a second regression that is directly related. #1176 added backup rotation for xtrabackup. However, the `find` that was added in commit 9834d9a is incompatible.

Issues:

- xtrabackup does not produce backups with a `.sql` file extension, but instead multiple other file types.
- `${PREFIX}` is nowhere to be found.
- `-maxdepth 1` is incompatible, because a backup may contain several (sub-)directories, and every full backup should be stored in its own directory (see the previous regression).

It looks like this `find` was copied from another backup provider, so I propose to change it in a way that deals with xtrabackup's uniqueness. The new approach assumes that every backup (incremental or full) resides in its own sub-directory, so it's pretty easy to remove an obsolete backup by simply removing the whole directory.

Small quality of life improvements
While working on fixing the regressions, I've implemented small changes that improve the xtrabackup experience:

`date`
difference.
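The rotation approach proposed for the second regression (each backup in its own sub-directory, rotation deleting whole directories) could be sketched like this. The find options and the 30-day retention are illustrative, not the module's exact code.

```shell
# Sketch: rotate a directory-per-backup layout by age. BACKUP_ROOT and
# the retention period are made-up values for illustration.
BACKUP_ROOT=$(mktemp -d)
mkdir -p "$BACKUP_ROOT/2020-01-01_00-00-00" "$BACKUP_ROOT/recent"
touch -t 202001010000 "$BACKUP_ROOT/2020-01-01_00-00-00"   # make it look old

# Delete whole backup directories older than the retention period;
# -mindepth 1 -maxdepth 1 keeps find at the per-backup directory level.
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
ls "$BACKUP_ROOT"
```

Removing a whole directory cleanly handles the multiple file types and sub-directories that a single xtrabackup run produces, which the old `.sql`-matching `find` could not.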