[skip secret scan] Topology solution: Zookeeper requires leader restart #254

Open. Wants to merge 1 commit into master (base branch).
22 changes: 11 additions & 11 deletions hybrid/multi-region-clusters/external-access/README.md
@@ -343,7 +343,7 @@ kubectl apply -f $TUTORIAL_HOME/confluent-platform/rolebindings/c3-rolebindings.
```

### Deploy ZK and Kafka clusters
Here, you'll deploy a 3 node Zookeeper cluster - one node each in the `central`, `east` and `west` regions.
Here, you'll deploy a 5 node Zookeeper cluster - one node in the `central` region and two nodes in each of the `east` and `west` regions.
You'll deploy 1 Kafka cluster with 6 brokers - two in `central`, two in `east` and two in `west` regions.
```
kubectl apply -f $TUTORIAL_HOME/confluent-platform/zookeeper/zookeeper-central.yaml --context mrc-central
@@ -491,29 +491,29 @@ kubectl delete ns central --context mrc-central

### Kafka not starting up

If you see in the ZK logs
If you see in the ZK logs:

```
org.apache.zookeeper.server.NettyServerCnxnFactory exceptionCaught - Exception caught
java.lang.NullPointerException
Have smaller server identifier, so dropping the connection
```

or
An immediate fix is to try to restart the Zookeeper leader (by deleting that pod) and wait for it to come up again.
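
As a minimal sketch of that restart step, the helper below only prints the `kubectl delete pod` command for review; it does not run it. `zk_restart_cmd` is a hypothetical name, and the sketch assumes the tutorial's convention of namespace `<region>` and context `mrc-<region>`. Working out which pod currently holds the leader role is outside this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial): print, without running it,
# the command that deletes one Zookeeper pod so that it is recreated.
# Assumes namespace <region> and kube context mrc-<region>.
zk_restart_cmd() {
  local region="$1" pod="$2"
  printf 'kubectl delete pod %s -n %s --context mrc-%s\n' "$pod" "$region" "$region"
}

# Example: the central node is pod zookeeper0-0 after this change.
zk_restart_cmd central zookeeper0-0
```

Pod names for the other regions (for example `zookeeper10-0` in `east`) are inferred from the resource names in this change and should be confirmed with `kubectl get pods` before deleting anything.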

```
Have smaller server identifier, so dropping the connection
```
The long-term solution is to mitigate https://issues.apache.org/jira/browse/ZOOKEEPER-2938 by configuring Zookeeper to:

* Bind to `0.0.0.0` instead of a DNS name
* Have only one replica per `Zookeeper` resource

Try to restart the Zookeeper leader (by deleting that pod) and wait for it to come up again.
You can configure multiple, separate `Zookeeper` resources in a region if required.

The root cause of this error is likely to be https://issues.apache.org/jira/browse/ZOOKEEPER-3988 which is fixed in Zookeeper 3.6.4 and 3.7.1/3. Upgrading to Confluent for Kubernetes 2.6.1 and Confluent Platform 7.4.1 is recommended.
For a complete discussion of why this topology is required, see https://docs.confluent.io/operator/current/co-multi-region.html#issue-zk-does-not-start-up
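
The two rules above can be illustrated with a small sketch that builds the `peers` list for one node: the node's own entry binds to `0.0.0.0`, and every other entry keeps its DNS name. `zk_peers` is a hypothetical helper written for this illustration, not part of Confluent for Kubernetes. The server ids (0, 10, 11, 20, 21) and hostnames are the ones that appear in the external-access manifests of this change.

```shell
#!/usr/bin/env bash
# Sketch only: zk_peers is a hypothetical helper, not part of CFK or this repo.
# Usage: zk_peers SELF_ID ID=HOST [ID=HOST ...]
# Prints one peers entry per node. The entry whose ID equals SELF_ID binds to
# 0.0.0.0; all other entries keep their DNS name.
zk_peers() {
  local self_id="$1"
  shift
  local entry id host
  for entry in "$@"; do
    id="${entry%%=*}"    # text before the first '='
    host="${entry#*=}"   # text after the first '='
    if [ "$id" = "$self_id" ]; then
      host="0.0.0.0"
    fi
    printf -- '- server.%s=%s:2888:3888\n' "$id" "$host"
  done
}

# Peers for the first east node (server id 10), using the tutorial's hostnames.
domain="platformops.dev.gcp.devel.cpdev.cloud"
zk_peers 10 \
  "0=zk-central0.$domain" \
  "10=zk-east0.$domain" \
  "11=zk-east1.$domain" \
  "20=zk-west0.$domain" \
  "21=zk-west1.$domain"
```

Because each `Zookeeper` resource runs a single replica, each resource needs its own list with exactly one `0.0.0.0` entry, which is why the manifests in this change repeat the list once per node.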

### Check that Kafka is using the Zookeeper deployments

Look at the ZK nodes.

```
$ kubectl exec -it zookeeper-0 -n central -c zookeeper --context mrc-central -- bash
$ kubectl exec -it zookeeper0-0 -n central -c zookeeper --context mrc-central -- bash

bash-4.4$ zookeeper-shell 127.0.0.1:2181

@@ -1,8 +1,10 @@
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
name: zookeeper
name: zookeeper0
namespace: central
annotations:
platform.confluent.io/zookeeper-myid-offset: "0"
spec:
externalAccess:
type: loadBalancer
@@ -17,15 +19,15 @@ spec:
secretRef: credential
type: digest
peers:
- server.0=zookeeper-0.zookeeper.central.svc.cluster.local:2888:3888
- server.0=0.0.0.0:2888:3888
- server.10=zk-east0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.11=zk-east1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.20=zk-west0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.21=zk-west1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.0
init: confluentinc/confluent-init-container:2.6.0
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 1
tls:
@@ -3,7 +3,7 @@ kind: Zookeeper
metadata:
annotations:
platform.confluent.io/zookeeper-myid-offset: "10"
name: zookeeper
name: zookeeper10
namespace: east
spec:
externalAccess:
@@ -20,15 +20,50 @@ spec:
type: digest
peers:
- server.0=zk-central0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.10=zookeeper-0.zookeeper.east.svc.cluster.local:2888:3888
- server.11=zookeeper-1.zookeeper.east.svc.cluster.local:2888:3888
- server.10=0.0.0.0:2888:3888
- server.11=zk-east1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.20=zk-west0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.21=zk-west1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.0
init: confluentinc/confluent-init-container:2.6.0
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 2
replicas: 1
tls:
autoGeneratedCerts: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
annotations:
platform.confluent.io/zookeeper-myid-offset: "11"
name: zookeeper11
namespace: east
spec:
externalAccess:
type: loadBalancer
loadBalancer:
domain: platformops.dev.gcp.devel.cpdev.cloud
advertisedURL:
enabled: true
prefix: zk-east
prefix: zk-east
authentication:
jaasConfig:
secretRef: credential
type: digest
peers:
- server.0=zk-central0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.10=zk-east0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.11=0.0.0.0:2888:3888
- server.20=zk-west0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.21=zk-west1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 1
tls:
autoGeneratedCerts: true
@@ -3,7 +3,7 @@ kind: Zookeeper
metadata:
annotations:
platform.confluent.io/zookeeper-myid-offset: "20"
name: zookeeper
name: zookeeper20
namespace: west
spec:
externalAccess:
@@ -22,13 +22,48 @@ spec:
- server.0=zk-central0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.10=zk-east0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.11=zk-east1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.20=zookeeper-0.zookeeper.west.svc.cluster.local:2888:3888
- server.21=zookeeper-1.zookeeper.west.svc.cluster.local:2888:3888
- server.20=0.0.0.0:2888:3888
- server.21=zk-west1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.0
init: confluentinc/confluent-init-container:2.6.0
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 2
replicas: 1
tls:
autoGeneratedCerts: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
annotations:
platform.confluent.io/zookeeper-myid-offset: "21"
name: zookeeper21
namespace: west
spec:
externalAccess:
type: loadBalancer
loadBalancer:
domain: platformops.dev.gcp.devel.cpdev.cloud
advertisedURL:
enabled: true
prefix: zk-west
prefix: zk-west
authentication:
jaasConfig:
secretRef: credential
type: digest
peers:
- server.0=zk-central0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.10=zk-east0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.11=zk-east1.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.20=zk-west0.platformops.dev.gcp.devel.cpdev.cloud:2888:3888
- server.21=0.0.0.0:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 1
tls:
autoGeneratedCerts: true
30 changes: 15 additions & 15 deletions hybrid/multi-region-clusters/internal-listeners/README.md
@@ -111,7 +111,7 @@ kubectl create secret tls ca-pair-sslcerts \
--cert=$TUTORIAL_HOME/../../../assets/certs/generated/ca.pem \
--key=$TUTORIAL_HOME/../../../assets/certs/generated/ca-key.pem \
-n east --context mrc-east

kubectl create secret tls ca-pair-sslcerts \
--cert=$TUTORIAL_HOME/../../../assets/certs/generated/ca.pem \
--key=$TUTORIAL_HOME/../../../assets/certs/generated/ca-key.pem \
@@ -143,15 +143,15 @@ kubectl create secret generic credential \
--from-file=plain.txt=$TUTORIAL_HOME/confluent-platform/credentials/kafka-users-client.txt \
--from-file=ldap.txt=$TUTORIAL_HOME/confluent-platform/credentials/ldap-client.txt \
-n central --context mrc-central

kubectl create secret generic credential \
--from-file=digest-users.json=$TUTORIAL_HOME/confluent-platform/credentials/zk-users-server.json \
--from-file=digest.txt=$TUTORIAL_HOME/confluent-platform/credentials/zk-users-client.txt \
--from-file=plain-users.json=$TUTORIAL_HOME/confluent-platform/credentials/kafka-users-server.json \
--from-file=plain.txt=$TUTORIAL_HOME/confluent-platform/credentials/kafka-users-client.txt \
--from-file=ldap.txt=$TUTORIAL_HOME/confluent-platform/credentials/ldap-client.txt \
-n east --context mrc-east

kubectl create secret generic credential \
--from-file=digest-users.json=$TUTORIAL_HOME/confluent-platform/credentials/zk-users-server.json \
--from-file=digest.txt=$TUTORIAL_HOME/confluent-platform/credentials/zk-users-client.txt \
@@ -168,12 +168,12 @@ kubectl create secret generic mds-token \
--from-file=mdsPublicKey.pem=$TUTORIAL_HOME/../../../assets/certs/mds-publickey.txt \
--from-file=mdsTokenKeyPair.pem=$TUTORIAL_HOME/../../../assets/certs/mds-tokenkeypair.txt \
-n central --context mrc-central

kubectl create secret generic mds-token \
--from-file=mdsPublicKey.pem=$TUTORIAL_HOME/../../../assets/certs/mds-publickey.txt \
--from-file=mdsTokenKeyPair.pem=$TUTORIAL_HOME/../../../assets/certs/mds-tokenkeypair.txt \
-n east --context mrc-east

kubectl create secret generic mds-token \
--from-file=mdsPublicKey.pem=$TUTORIAL_HOME/../../../assets/certs/mds-publickey.txt \
--from-file=mdsTokenKeyPair.pem=$TUTORIAL_HOME/../../../assets/certs/mds-tokenkeypair.txt \
@@ -184,11 +184,11 @@ kubectl create secret generic mds-token \
kubectl create secret generic mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n central --context mrc-central

kubectl create secret generic mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n east --context mrc-east

kubectl create secret generic mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n west --context mrc-west
@@ -198,11 +198,11 @@ kubectl create secret generic mds-client \
kubectl create secret generic sr-mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/sr-mds-client.txt \
-n central --context mrc-central

kubectl create secret generic sr-mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/sr-mds-client.txt \
-n east --context mrc-east

kubectl create secret generic sr-mds-client \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/sr-mds-client.txt \
-n west --context mrc-west
@@ -218,11 +218,11 @@ kubectl create secret generic c3-mds-client \
kubectl create secret generic kafka-rest-credential \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n central --context mrc-central

kubectl create secret generic kafka-rest-credential \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n east --context mrc-east

kubectl create secret generic kafka-rest-credential \
--from-file=bearer.txt=$TUTORIAL_HOME/confluent-platform/credentials/mds-client.txt \
-n west --context mrc-west
@@ -238,7 +238,7 @@ kubectl apply -f $TUTORIAL_HOME/confluent-platform/rolebindings/c3-rolebindings.
```

### Deploy ZK and Kafka clusters
Here, you'll deploy a 3 node Zookeeper cluster - one node each in the `central`, `east` and `west` regions.
Here, you'll deploy a 5 node Zookeeper cluster - one node in the `central` region and two nodes in each of the `east` and `west` regions.
You'll deploy 1 Kafka cluster with 6 brokers - two in `central`, two in `east` and two in `west` regions.
```
kubectl apply -f $TUTORIAL_HOME/confluent-platform/zookeeper/zookeeper-central.yaml --context mrc-central
@@ -258,8 +258,8 @@ kubectl apply -f $TUTORIAL_HOME/confluent-platform/kafkarestclass.yaml -n west -
```

### Deploy Schema Registry and Control Center
Now, you'll deploy a 5 node Schema Registry cluster - 1 replica in `central`, 2 in `east` and 2 in the `west` regions;
and a single instance of Control Center running in the `central` region.
Now, you'll deploy a 5 node Schema Registry cluster - 1 replica in `central`, 2 in `east` and 2 in the `west` regions;
and a single instance of Control Center running in the `central` region.
```
kubectl apply -f $TUTORIAL_HOME/confluent-platform/schemaregistry/schemaregistry-central.yaml --context mrc-central
kubectl apply -f $TUTORIAL_HOME/confluent-platform/schemaregistry/schemaregistry-east.yaml --context mrc-east
@@ -384,7 +384,7 @@ kubectl delete ns central --context mrc-central
Look at the ZK nodes.

```
$ kubectl exec -it zookeeper-0 -n central -c zookeeper --context mrc-central -- bash
$ kubectl exec -it zookeeper0-0 -n central -c zookeeper --context mrc-central -- bash

bash-4.4$ zookeeper-shell 127.0.0.1:2181

@@ -1,7 +1,7 @@
apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
name: zookeeper
name: zookeeper0
namespace: central
spec:
authentication:
@@ -10,15 +10,15 @@ spec:
type: digest
configOverrides:
peers:
- server.0=zookeeper-0.zookeeper.central.svc.cluster.local:2888:3888
- server.0=0.0.0.0:2888:3888
- server.10=zookeeper-0.zookeeper.east.svc.cluster.local:2888:3888
- server.11=zookeeper-1.zookeeper.east.svc.cluster.local:2888:3888
- server.20=zookeeper-0.zookeeper.west.svc.cluster.local:2888:3888
- server.21=zookeeper-1.zookeeper.west.svc.cluster.local:2888:3888
dataVolumeCapacity: 10Gi
image:
application: confluentinc/cp-zookeeper:7.4.0
init: confluentinc/confluent-init-container:2.6.0
application: confluentinc/cp-zookeeper:7.4.1
init: confluentinc/confluent-init-container:2.6.1
logVolumeCapacity: 10Gi
replicas: 1
tls: