Hi, it's me again 👋
I'm spiking running a production-grade Vault cluster in Enclaver.
I'm having issues joining a second node to the cluster at the very last step, where the existing leader node needs to communicate with the joining node over mTLS.
The client certificate is self-signed and generated by Vault; see this excerpt from the official documentation:
[...]
For the request forwarding method, the servers need direct communication with each other. In order to perform this securely, the active node also advertises, via the encrypted data store entry, a newly-generated private key (ECDSA-P521) and a newly-generated self-signed certificate designated for client and server authentication. Each standby uses the private key and certificate to open a mutually-authenticated TLS 1.2 connection to the active node via the advertised cluster address. When client requests come in, the requests are serialized, sent over this TLS-protected communication channel, and acted upon by the active node. The active node then returns a response to the standby, which sends the response back to the requesting client.
Unfortunately, this communication fails with the following error message from Vault:
{
  "@level": "error",
  "@message": "failed to heartbeat to",
  "@module": "storage.raft",
  "@timestamp": "2023-12-01T09:15:23.527220Z",
  "backoff time": 2500000000,
  "error": "dial tcp 10.1.54.175:8201: connect: network is unreachable",
  "peer": "10.1.54.175:8201"
}
Things I've confirmed:
- The IP address is correct.
- The nodes can communicate over HTTP on port 8200: before that last step, the joining node makes an HTTP call to the existing leader node to request to join the cluster.
- The Enclaver manifest file allows both ingress on port 8201 for the existing leader and egress to the VPC CIDR for the joining node:
# https://edgebit.io/enclaver/docs/0.x/manifest/
version: v1
name: "enclaver-vault"
sources:
  # Name and tag of the Docker container that contains the application code
  app: "$SOURCE_DOCKER_IMAGE_NAME"
# Name and tag of the Docker container outputted from the build process
target: "$TARGET_DOCKER_IMAGE_NAME"
ingress:
  # Vault listens on both 8200 (API) and 8201 (node-to-node communication)
  - listen_port: 8200
  - listen_port: 8201
egress:
  allow:
    # IMDS
    - 169.254.169.254
    # EC2 APIs for auto-join discovery
    - ec2.*.amazonaws.com
    # VPC CIDR
    - 10.1.0.0/16
    # EC2 host (I don't think we need this one)
    - host
kms_proxy:
  listen_port: 9999
defaults:
  memory_mb: 2000
- I tried the same setup by running the "bare" source Docker images and the node-to-node communication works fine, i.e. the second node did complete joining the cluster.
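In case it helps with reproducing this, below is a minimal sketch of a raw TCP probe that could be run from inside the enclave and from the bare container to compare results on both ports. The `probe_tcp` helper is hypothetical (it is not part of Vault or Enclaver) and it only checks a plain TCP connect, not the mTLS handshake; the peer address is the one from the log above.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer within the timeout (e.g. packets silently dropped)
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port
        return "refused"
    except OSError as exc:
        # ENETUNREACH corresponds to the "network is unreachable" error in the Vault log
        if exc.errno == errno.ENETUNREACH:
            return "network unreachable"
        return f"error: {exc}"


if __name__ == "__main__":
    peer = "10.1.54.175"  # peer address taken from the Vault log
    for port in (8200, 8201):
        print(port, probe_tcp(peer, port))
```

If port 8201 reports "network unreachable" inside the enclave but "connected" in the bare container, that would suggest the direct TCP dial itself is what fails, independent of the certificate setup.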
Do you know if there's something in Enclaver that would prevent this from happening, or if maybe there's a way to make this work?
Thanks, please let me know if you need additional information.