Summary
We have some large master nodes with large etcd databases. During an upgrade, the rolling restart can hit a timeout while restarting the service. The restart then fails on that node and the upgrade moves on to restarting the next one, which is not wanted.
What we can do here:
- Ignore this error and add another task, with retries, that checks whether the unit has started.
- Add the ability to provide an override.conf for the service, e.g. with an extended timeout (TimeoutSec, TimeoutStartSec).
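
To make the two options concrete, here is a rough sketch in Python, for illustration only; the real change would be tasks in the upgrade itself. The function names, the retry count, the delay and the `etcd.service` unit name are all assumptions, not existing code. It polls `systemctl is-active` until the unit comes up, and builds the text of a drop-in that extends the start timeout.

```python
import subprocess
import time
from typing import Callable


def unit_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def wait_until_active(
    unit: str,
    retries: int = 30,          # hypothetical default, tune per cluster
    delay: float = 10.0,        # seconds between attempts, also hypothetical
    check: Callable[[str], bool] = unit_is_active,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the unit up to `retries` times instead of failing on the first timeout."""
    for attempt in range(retries):
        if check(unit):
            return True
        # No point sleeping after the final attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False


def render_override(timeout_start_sec: int) -> str:
    """Build the text of a systemd drop-in that extends the start timeout."""
    return f"[Service]\nTimeoutStartSec={timeout_start_sec}\n"


if __name__ == "__main__":
    # "etcd.service" is an assumed unit name for illustration.
    print(render_override(300), end="")
    print(wait_until_active("etcd.service", retries=3, delay=5.0))
```

With systemd, a drop-in like this would normally be placed at `/etc/systemd/system/<unit>.d/override.conf` and followed by `systemctl daemon-reload`. The node should only be treated as failed if the retry check still reports the unit as not active after the last attempt.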
Issue Type
Feature Idea