HA failover doesn't work when a K8s host fails #633

@nobuto-m

Description

From: https://bugs.launchpad.net/snap-openstack/+bug/2106020

Steps to reproduce

The complete reproducer is attached; the rough steps are:

  1. Bootstrap a local LXD controller (for simplicity, instead of using MAAS as in a real-world scenario).
  2. Deploy k8s on 3 KVM machines.
  3. Deploy mysql-k8s on those 3 Kubernetes workers to get an HA MySQL cluster.
  4. Kill one of the Kubernetes workers, i.e. one of the KVM machines.

Expected behavior

MySQL failover succeeds as long as 2 of the 3 replicas are still alive.
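
For context, this expectation matches the majority-quorum rule used by MySQL Group Replication, which InnoDB Cluster builds on: a group keeps operating while more than half of its members are reachable. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not part of the charm or of MySQL Shell.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def has_quorum(total: int, alive: int) -> bool:
    """True when the surviving members still form a majority of the group."""
    return alive >= quorum_size(total)


if __name__ == "__main__":
    # The 3-unit cluster from this report after one host is killed.
    print(has_quorum(total=3, alive=2))  # True: 2 of 3 is a majority
    # Losing a second member would break quorum.
    print(has_quorum(total=3, alive=1))  # False
```

By this rule, losing one of three members should leave a working majority, which is why the behavior below is reported as a bug.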

Actual behavior

MySQL stops working. The two surviving replicas report "error" and "offline":

$ juju status mysql
Model           Controller  Cloud/Region             Version  SLA          Timestamp
openstack-test  localhost   myk8scloud-test/default  3.6.8    unsupported  04:57:04Z

App    Version                  Status   Scale  Charm      Channel   Rev  Address        Exposed  Message
mysql  8.0.41-0ubuntu0.22.04.1  waiting    2/3  mysql-k8s  8.0/edge  261  10.152.183.23  no       installing agent

Unit      Workload     Agent  Address     Ports  Message
mysql/0   maintenance  idle   10.1.2.72          error
mysql/1   unknown      lost   10.1.0.215         agent lost, see 'juju show-status-log mysql/1'
mysql/2*  maintenance  idle   10.1.1.148         offline
$ kubectl get pod -A -o wide | grep mysql-
openstack-test   keystone-mysql-router-0               2/2     Running       0             50m   10.1.2.172   juju-768c48-1   <none>           <none>
openstack-test   keystone-mysql-router-1               2/2     Terminating   0             50m   10.1.0.227   juju-768c48-0   <none>           <none>
openstack-test   keystone-mysql-router-2               2/2     Running       0             50m   10.1.1.224   juju-768c48-2   <none>           <none>
openstack-test   mysql-0                               2/2     Running       0             50m   10.1.2.72    juju-768c48-1   <none>           <none>
openstack-test   mysql-1                               2/2     Terminating   0             50m   10.1.0.215   juju-768c48-0   <none>           <none>
openstack-test   mysql-2                               2/2     Running       0             50m   10.1.1.148   juju-768c48-2   <none>           <none>
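
To make the pod listing easier to read, the sketch below groups the pods by their STATUS column. It assumes the usual `kubectl get pod -A -o wide` column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE, IP, NODE) and a single-token RESTARTS value, as in the output above; the helper name is hypothetical.

```python
from collections import defaultdict

# Pod lines copied from the output above; the trailing <none> columns are dropped.
KUBECTL_OUTPUT = """\
openstack-test   keystone-mysql-router-0   2/2   Running       0   50m   10.1.2.172   juju-768c48-1
openstack-test   keystone-mysql-router-1   2/2   Terminating   0   50m   10.1.0.227   juju-768c48-0
openstack-test   keystone-mysql-router-2   2/2   Running       0   50m   10.1.1.224   juju-768c48-2
openstack-test   mysql-0                   2/2   Running       0   50m   10.1.2.72    juju-768c48-1
openstack-test   mysql-1                   2/2   Terminating   0   50m   10.1.0.215   juju-768c48-0
openstack-test   mysql-2                   2/2   Running       0   50m   10.1.1.148   juju-768c48-2
"""


def pods_by_status(output: str) -> dict[str, list[tuple[str, str]]]:
    """Group (pod, node) pairs by the STATUS column of the listing."""
    grouped = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        _namespace, pod, _ready, status, _restarts, _age, _ip, node = fields[:8]
        grouped[status].append((pod, node))
    return dict(grouped)


if __name__ == "__main__":
    for status, pods in pods_by_status(KUBECTL_OUTPUT).items():
        print(status, pods)
```

Both Terminating pods are on juju-768c48-0, which suggests that this is the killed host; the pods on the other two nodes are still reported as Running.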

Versions

Operating system: 24.04 LTS

Juju CLI: 3.6.8

Juju agent: 3.6.8

Charm revision: mysql-k8s 8.0/edge 261

microk8s: N/A
k8s: v1.32.6 3789 latest/stable canonical✓ classic,held

Log output

Juju debug log: debug.log

unit-mysql-2: 04:34:55 ERROR unit.mysql/2.juju-log Failed to execute mysql-shell command
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-2/charm/src/mysql_k8s_helpers.py", line 724, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-mysql-2/charm/venv/lib/python3.10/site-packages/ops/pebble.py", line 1771, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['timeout', '30', '/usr/bin/mysqlsh', '--passwords-from-stdin', '--uri=serverconfig@mysql-2.mysql-endpoints.openstack-test.svc.cluster.local.:33062', '--python', '--verbose=0', '-c', "shell.options.set('useWizards', False)\nprint('###')\ncluster = dba.get_cluster('cluster-9635420284348b0774eae65a3c46ad4a')\nprint(cluster.status({'extended': False}))"], stdout="\x1b[1mPlease provide the password for 'serverconfig@mysql-2.mysql-endpoints.openstack-test.svc.cluster.local.:33062': \x1b[0m###\n", stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nTraceback (most recent call last):\n  File "<string>", line 3, in <module>\nmysqlsh.Error: Shell Error (51314): Dba.get_cluster: This function is not available through a session to an InnoDB Cluster that belongs to an InnoDB ClusterSet but is not ONLINE\n'
unit-mysql-2: 04:34:55 ERROR unit.mysql/2.juju-log Failed to get cluster status for cluster-9635420284348b0774eae65a3c46ad4a
unit-mysql-2: 04:35:01 WARNING unit.mysql/2.juju-log Failed to execute mysql-shell command
unit-mysql-2: 04:35:01 WARNING unit.mysql/2.juju-log Failed to get cluster set status
unit-mysql-2: 04:35:01 ERROR unit.mysql/2.juju-log Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-2/charm/src/mysql_k8s_helpers.py", line 906, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.charm.get_cluster_endpoints(relation_name)
  File "/var/lib/juju/agents/unit-mysql-2/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1065, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 789, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster topology")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster topology

Additional context

Reproducer:

Labels: bug (Something isn't working as expected)
