Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grubcmdline notifies reboot handler unnecessarily #36

Open
DavidFair opened this issue Dec 20, 2024 · 0 comments
Open

grubcmdline notifies reboot handler unnecessarily #36

DavidFair opened this issue Dec 20, 2024 · 0 comments

Comments

@DavidFair
Copy link

Whilst configuring machines with the grubcmdline the check for the command line is not idempotent, so the same machine will reboot every time.

I suspect this is either

  • related to crashkernel appearing multiple times in the args (which we should fix in the caller, but should the role does not guard against) or
  • resume=UUID gets packed into the cmdline multiple times

The role should either throw a fatal: message, or not notify the reboot handler in these cases

Logs

TASK [stackhpc.linux.grubcmdline : Notify reboot handler] **********************
Friday 20 December 2024  11:22:41 +0000 (0:00:00.049)       0:00:08.921 ******* 
changed: [hv.example.com] => 
  msg: Old command line was BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-503.15.1.el9_5.x86_64 root=UUID=6caabe9f-f426-4105-a38e-52e98545a68a ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=
c38017eb-0531-46fa-bf97-a9265a9caed4 rhgb quiet crashkernel=auto rhgb quiet crashkernel=auto
....
TASK [stackhpc.linux.grubcmdline : Display GRUB_CMDLINE_LINUX_DEFAULT] *********
Friday 20 December 2024  11:22:42 +0000 (0:00:00.039)       0:00:09.543 ******* 
ok: [hv.example.com] => 
  grub_cmdline_linux_default: crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=c38017eb-0531-46fa-bf97-a9265a9caed4 rhgb quiet crashkernel=auto rhgb quiet crashkernel=auto
...
TASK [stackhpc.linux.grubcmdline : Display GRUB_CMDLINE_LINUX] *****************
Friday 20 December 2024  11:22:42 +0000 (0:00:00.038)       0:00:09.582 ******* 
ok: [hv.example.com] => 
  grub_cmdline_linux: crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=c38017eb-0531-46fa-bf97-a9265a9caed4 rhgb quiet crashkernel=auto rhgb quiet crashkernel=auto
.....
.....
  
  
TASK [stackhpc.linux.grubcmdline : Display newly computed GRUB_CMDLINE_LINUX_DEFAULT] ***
Friday 20 December 2024  11:22:42 +0000 (0:00:00.046)       0:00:09.816 ******* 
ok: [hv.example.com] => 
  grub_cmdline_linux_new:
  - crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
  - resume=UUID=c38017eb-0531-46fa-bf97-a9265a9caed4
  - rhgb
  - quiet
  - crashkernel=auto
  - rhgb
  - quiet
  - crashkernel=auto

After reboot:

cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-503.15.1.el9_5.x86_64 root=UUID=6caabe9f-f426-4105-a38e-52e98545a68a ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=c38017eb-0531-46fa-bf97-a9265a9caed4 rhgb quiet crashkernel=auto rhgb quiet crashkernel=auto

Calling Module

  vars:
    kernel_cmdline_args:
      - "rhgb"
      - "quiet"
      - "crashkernel=auto"
    kernel_cmdline_args_remove:
      - hugepage
      - selinux
  roles:
    - name: stackhpc.linux.grubcmdline
      vars:
        kernel_cmdline: "{{ kernel_cmdline_args }}"
        kernel_cmdline_remove: "{{ kernel_cmdline_args_remove }}"

Own thoughts:

  • Changing the default parameters on /etc/grub2.cfg or grubby args is working around the existing tooling built into the OS and grub:

Adding args should be done through templating files into /etc/grub.d for each override (RH Grubby Docs and StackOverflow example), such as /etc/grub.d/60-stackhpc-grubcmdline-quiet then simply invoke grubby or grub-mkconfig without args

Removing entries would still edit /etc/default/grub, but ansible.builtin.lineinfile should do the "surgical cuts" instead to minimise the changes to the file rather than trying generate a "new" cmdline using Jinja templates (which could generate wrong and result in an unbootable machine).
Alternatively, a grub script can be templated out /etc/grub.d which iterates over the GRUB_CMDLINE_LINUX* variables and appends every argument back except the matching one using the for in and if statements built into grub script. This would be more correct, but I don't have an adaptable example to hand.

The biggest advantages are:

  • Prevents the role fighting upstream changes if they change the default cmdline (e.g. in-place OS upgrades)
  • Would avoid the problems with the UUID
  • Makes troubleshooting easy to diagnose if it's a distro or kernel flag change: mv /etc/grub.d/*stackhpc-grubcmdline /etc/grub.d.bak && grubby ...
  • Idempotent by default, template through ansible then use a handler for re-running grubby and notify reboot if any template / lineinfile has changed

If not the above changes:

  • I think old_cmdline != kernel_cmdline | select() | sort | list is potentially brittle as there will be more problems like this in the future. Instead it could be something akin to the following pseudo code:
changed_when: not(all(grub_cmdline_linux_new in kernel_cmdline)) or any(grub_cmdline_linux_remove in kernel_cmdline)

This would only check for keywords the user has explicitly set in the role, rather than being affected by parameters (such as root=UUID=abc...) which they have not specified at all. However, this still leaves the potential problems around directly generating /etc/default/grub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant