# VPC Egress Gateway

VPC Egress Gateway controls how Pods within VPCs (including the default VPC) access the external network. Its design draws on that of the VPC NAT Gateway, implementing load balancing based on ECMP routing and high availability based on BFD. IPv6 and dual-stack are also supported.

> VPC Egress Gateway supports both the default VPC and custom VPCs.

## Requirements

Like the VPC NAT Gateway, the VPC Egress Gateway requires [Multus-CNI](https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md){: target="_blank" }.

> No ConfigMap needs to be configured to use the VPC Egress Gateway.

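If Multus-CNI is not yet installed, it can typically be deployed with the thick-plugin DaemonSet from the Multus quickstart linked above; verify the manifest path and DaemonSet name against the quickstart for your version:

```shell
# Deploy Multus-CNI (thick plugin) as described in the Multus quickstart;
# check the linked quickstart for the manifest matching your cluster.
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml

# Wait for the Multus DaemonSet to become ready (the name may differ with other manifests).
kubectl -n kube-system rollout status daemonset/kube-multus-ds
```
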
## Usage

### Creating a Network Attachment Definition

The VPC Egress Gateway uses multiple NICs to access both the VPC and the external network, so you need to create a Network Attachment Definition for the external network. An example using the `macvlan` plugin with IPAM provided by Kube-OVN is shown below:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: eth1
  namespace: default
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "kube-ovn",
        "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
        "provider": "eth1.default"
      }
    }'
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: macvlan1
spec:
  protocol: IPv4
  provider: eth1.default
  cidrBlock: 172.17.0.0/16
  gateway: 172.17.0.1
  excludeIps:
  - 172.17.0.0..172.17.0.10
```

> You can create a Network Attachment Definition with any CNI plugin that can access the corresponding network.

For details on how to use multiple NICs, please refer to [Manage Multiple Interface](./multi-nic.en.md).

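Once applied, a quick sanity check confirms that the Network Attachment Definition and the `macvlan1` subnet exist before the gateway is created:

```shell
# The NetworkAttachmentDefinition created above lives in the default namespace.
kubectl -n default get network-attachment-definitions.k8s.cni.cncf.io eth1

# The macvlan1 subnet should show the expected provider and CIDR.
kubectl get subnet macvlan1
```
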
### Creating a VPC Egress Gateway

Create a VPC Egress Gateway resource as shown in the example below:

```yaml
apiVersion: kubeovn.io/v1
kind: VpcEgressGateway
metadata:
  name: gateway1
  namespace: default
spec:
  vpc: ovn-cluster
  replicas: 1
  externalSubnet: macvlan1
  policies:
    - snat: true
      subnets:
        - ovn-default
```

The above resource creates a VPC Egress Gateway named `gateway1` in the `default` namespace for the VPC `ovn-cluster`. All Pods under the `ovn-default` subnet (10.16.0.0/16) within the `ovn-cluster` VPC will access the external network via the `macvlan1` subnet, with SNAT applied.

After the creation is complete, check the VPC Egress Gateway resource:

```shell
$ kubectl get veg gateway1
NAME       VPC           REPLICAS   BFD ENABLED   EXTERNAL SUBNET   PHASE       READY   AGE
gateway1   ovn-cluster   1          false         macvlan1          Completed   true    13s
```

To view more information:

```shell
$ kubectl get veg gateway1 -o wide
NAME       VPC           REPLICAS   BFD ENABLED   EXTERNAL SUBNET   PHASE       READY   INTERNAL IPS     EXTERNAL IPS      WORKING NODES         AGE
gateway1   ovn-cluster   1          false         macvlan1          Completed   true    ["10.16.0.12"]   ["172.17.0.11"]   ["kube-ovn-worker"]   82s
```

To view the workload:

```shell
$ kubectl get deployment -l ovn.kubernetes.io/vpc-egress-gateway=gateway1
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
gateway1   1/1     1            1           4m40s

$ kubectl get pod -l ovn.kubernetes.io/vpc-egress-gateway=gateway1 -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE              NOMINATED NODE   READINESS GATES
gateway1-b9f8b4448-76lhm   1/1     Running   0          4m48s   10.16.0.12   kube-ovn-worker   <none>           <none>
```

To view IP addresses, routes, and iptables rules in the Pod:

```shell
$ kubectl exec gateway1-b9f8b4448-76lhm -c gateway -- ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: net1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:d8:71:90:7b:86 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.11/16 brd 172.17.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::60d8:71ff:fe90:7b86/64 scope link
       valid_lft forever preferred_lft forever
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
    link/ether 36:7c:6b:c7:82:6b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.16.0.12/16 brd 10.16.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::347c:6bff:fec7:826b/64 scope link
       valid_lft forever preferred_lft forever

$ kubectl exec gateway1-b9f8b4448-76lhm -c gateway -- ip route show
default via 172.17.0.1 dev net1
10.16.0.0/16 dev eth0 proto kernel scope link src 10.16.0.12
172.17.0.0/16 dev net1 proto kernel scope link src 172.17.0.11

$ kubectl exec gateway1-b9f8b4448-76lhm -c gateway -- iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A POSTROUTING -s 10.16.0.0/16 -j MASQUERADE --random-fully
```

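The traffic in the capture below comes from a Pod in the `ovn-default` subnet pinging the external gateway `172.17.0.1`; a minimal way to generate such traffic (the Pod name and image are illustrative):

```shell
# Start a throwaway client Pod in the default VPC/subnet (name and image are examples).
kubectl run egress-test --image=docker.io/library/busybox:stable --restart=Never -- sleep 3600

# Ping the external gateway; the request should leave the gateway Pod SNATed to its external IP.
kubectl exec egress-test -- ping -c 4 172.17.0.1
```
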
Capture packets in the Gateway Pod to verify network traffic:

```shell
$ kubectl exec -ti gateway1-b9f8b4448-76lhm -c gateway -- bash
nobody@gateway1-b9f8b4448-76lhm:/kube-ovn$ tcpdump -i any -nnve icmp and host 172.17.0.1
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
06:50:58.936528 eth0  In  ifindex 17 92:26:b8:9e:f2:1c ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 30481, offset 0, flags [DF], proto ICMP (1), length 84)
    10.16.0.9 > 172.17.0.1: ICMP echo request, id 37989, seq 0, length 64
06:50:58.936574 net1  Out ifindex 2 62:d8:71:90:7b:86 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 62, id 30481, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.0.11 > 172.17.0.1: ICMP echo request, id 39449, seq 0, length 64
06:50:58.936613 net1  In  ifindex 2 02:42:39:79:7f:08 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 26701, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.0.1 > 172.17.0.11: ICMP echo reply, id 39449, seq 0, length 64
06:50:58.936621 eth0  Out ifindex 17 36:7c:6b:c7:82:6b ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 26701, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.0.1 > 10.16.0.9: ICMP echo reply, id 37989, seq 0, length 64
```

Routing policies (or, for custom VPCs, static routes) are automatically created on the OVN Logical Router:

```shell
$ kubectl ko nbctl lr-policy-list ovn-cluster
Routing Policies
     31000 ip4.dst == 10.16.0.0/16    allow
     31000 ip4.dst == 100.64.0.0/16   allow
     30000 ip4.dst == 172.18.0.2      reroute 100.64.0.3
     30000 ip4.dst == 172.18.0.3      reroute 100.64.0.2
     30000 ip4.dst == 172.18.0.4      reroute 100.64.0.4
     29100 ip4.src == 10.16.0.0/16    reroute 10.16.0.12
     29000 ip4.src == $ovn.default.kube.ovn.control.plane_ip4   reroute 100.64.0.2
     29000 ip4.src == $ovn.default.kube.ovn.worker2_ip4         reroute 100.64.0.4
     29000 ip4.src == $ovn.default.kube.ovn.worker_ip4          reroute 100.64.0.3
```

To enable load balancing, increase `.spec.replicas` as shown in the following example:

```shell
$ kubectl patch veg gateway1 --type=merge -p '{"spec": {"replicas": 2}}'
vpcegressgateway.kubeovn.io/gateway1 patched

$ kubectl get veg gateway1
NAME       VPC           REPLICAS   BFD ENABLED   EXTERNAL SUBNET   PHASE       READY   AGE
gateway1   ovn-cluster   2          false         macvlan1          Completed   true    39m

$ kubectl get pod -l ovn.kubernetes.io/vpc-egress-gateway=gateway1 -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE               NOMINATED NODE   READINESS GATES
gateway1-b9f8b4448-76lhm   1/1     Running   0          40m   10.16.0.12   kube-ovn-worker    <none>           <none>
gateway1-b9f8b4448-zd4dl   1/1     Running   0          64s   10.16.0.13   kube-ovn-worker2   <none>           <none>

$ kubectl ko nbctl lr-policy-list ovn-cluster
Routing Policies
     31000 ip4.dst == 10.16.0.0/16    allow
     31000 ip4.dst == 100.64.0.0/16   allow
     30000 ip4.dst == 172.18.0.2      reroute 100.64.0.3
     30000 ip4.dst == 172.18.0.3      reroute 100.64.0.2
     30000 ip4.dst == 172.18.0.4      reroute 100.64.0.4
     29100 ip4.src == 10.16.0.0/16    reroute 10.16.0.12, 10.16.0.13
     29000 ip4.src == $ovn.default.kube.ovn.control.plane_ip4   reroute 100.64.0.2
     29000 ip4.src == $ovn.default.kube.ovn.worker2_ip4         reroute 100.64.0.4
     29000 ip4.src == $ovn.default.kube.ovn.worker_ip4          reroute 100.64.0.3
```

### Enabling BFD-based High Availability

BFD-based high availability relies on the VPC BFD LRP feature, so you first need to modify the VPC resource to enable the BFD Port. Here is an example:

```yaml
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
spec:
  bfdPort:
    enabled: true
    ip: 10.255.255.255
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet1
spec:
  vpc: vpc1
  protocol: IPv4
  cidrBlock: 192.168.0.0/24
```

After the BFD Port is enabled, an LRP (Logical Router Port) dedicated to BFD is automatically created on the corresponding OVN Logical Router:

```shell
$ kubectl ko nbctl show vpc1
router 0c1d1e8f-4c86-4d96-88b2-c4171c7ff824 (vpc1)
    port bfd@vpc1
        mac: "8e:51:4b:16:3c:90"
        networks: ["10.255.255.255"]
    port vpc1-subnet1
        mac: "de:c9:5c:38:7a:61"
        networks: ["192.168.0.1/24"]
```

After that, set `.spec.bfd.enabled` to `true` on the VPC Egress Gateway. An example is shown below:

```yaml
apiVersion: kubeovn.io/v1
kind: VpcEgressGateway
metadata:
  name: gateway2
  namespace: default
spec:
  vpc: vpc1
  replicas: 2
  internalSubnet: subnet1
  externalSubnet: macvlan
  bfd:
    enabled: true
  policies:
    - snat: true
      ipBlocks:
        - 192.168.0.0/24
```

To view VPC Egress Gateway information:

```shell
$ kubectl get veg gateway2 -o wide
NAME       VPC    REPLICAS   BFD ENABLED   EXTERNAL SUBNET   PHASE       READY   INTERNAL IPS                    EXTERNAL IPS                    WORKING NODES                            AGE
gateway2   vpc1   2          true          macvlan           Completed   true    ["192.168.0.2","192.168.0.3"]   ["172.17.0.13","172.17.0.14"]   ["kube-ovn-worker","kube-ovn-worker2"]   58s

$ kubectl get pod -l ovn.kubernetes.io/vpc-egress-gateway=gateway2 -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE               NOMINATED NODE   READINESS GATES
gateway2-fcc6b8b87-8lgvx   1/1     Running   0          2m18s   192.168.0.3   kube-ovn-worker2   <none>           <none>
gateway2-fcc6b8b87-wmww6   1/1     Running   0          2m18s   192.168.0.2   kube-ovn-worker    <none>           <none>

$ kubectl ko nbctl lr-route-list vpc1
IPv4 Routes
Route Table <main>:
           192.168.0.0/24              192.168.0.2 src-ip ecmp ecmp-symmetric-reply bfd
           192.168.0.0/24              192.168.0.3 src-ip ecmp ecmp-symmetric-reply bfd

$ kubectl ko nbctl list bfd
_uuid               : 223ede10-9169-4c7d-9524-a546e24bfab5
detect_mult         : 3
dst_ip              : "192.168.0.2"
external_ids        : {af="4", vendor=kube-ovn, vpc-egress-gateway="default/gateway2"}
logical_port        : "bfd@vpc1"
min_rx              : 1000
min_tx              : 1000
options             : {}
status              : up

_uuid               : b050c75e-2462-470b-b89c-7bd38889b758
detect_mult         : 3
dst_ip              : "192.168.0.3"
external_ids        : {af="4", vendor=kube-ovn, vpc-egress-gateway="default/gateway2"}
logical_port        : "bfd@vpc1"
min_rx              : 1000
min_tx              : 1000
options             : {}
status              : up
```

To view BFD connections:

```shell
$ kubectl exec gateway2-fcc6b8b87-8lgvx -c bfdd -- bfdd-control status
There are 1 sessions:
Session 1
 id=1 local=192.168.0.3 (p) remote=10.255.255.255 state=Up

$ kubectl exec gateway2-fcc6b8b87-wmww6 -c bfdd -- bfdd-control status
There are 1 sessions:
Session 1
 id=1 local=192.168.0.2 (p) remote=10.255.255.255 state=Up
```

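To observe failover behavior, delete one of the gateway Pods and re-check the BFD sessions; while the Deployment recreates the Pod, the session of the deleted instance is expected to leave the `up` state and traffic keeps flowing through the remaining instance (the Pod name below is taken from the output above):

```shell
# Delete one gateway instance; its Deployment recreates it automatically.
kubectl delete pod gateway2-fcc6b8b87-8lgvx

# Watch the BFD session status on the OVN side; the session for the deleted
# instance should go down until the replacement Pod is ready again.
kubectl ko nbctl list bfd
```
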
### Configuration Parameters

#### VPC BFD Port

| Fields | Type | Optional | Default Value | Description | Example |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `enabled` | `boolean` | Yes | `false` | Whether to enable the BFD Port. | `true` |
| `ip` | `string` | No | - | The IP address used by the BFD Port. Must NOT conflict with other addresses. IPv4, IPv6 and dual-stack are supported. | `169.255.255.255` / `fdff::1` / `169.255.255.255,fdff::1` |
| `nodeSelector` | `object` | Yes | - | Label selector used to select the nodes that carry the BFD Port. The BFD Port is bound to an OVN HA Chassis Group consisting of the selected nodes and works in Active/Backup mode. If this field is not specified, Kube-OVN automatically selects up to three nodes. You can view all OVN HA Chassis Group resources by executing `kubectl ko nbctl list ha_chassis_group`. | - |
| `nodeSelector.matchLabels` | `dict/map` | Yes | - | A map of {key,value} pairs. | - |
| `nodeSelector.matchExpressions` | `object array` | Yes | - | A list of label selector requirements. The requirements are ANDed. | - |

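For example, restricting the BFD Port to a specific set of nodes might look like the sketch below, built from the fields in the table above (the label is illustrative):

```yaml
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
spec:
  bfdPort:
    enabled: true
    ip: 10.255.255.255
    # Only nodes matching this selector are candidates for hosting the BFD Port;
    # the label key/value below is just an example.
    nodeSelector:
      matchLabels:
        kubernetes.io/os: linux
```
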
#### VPC Egress Gateway

Spec:

| Fields | Type | Optional | Default Value | Description | Example |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `vpc` | `string` | Yes | Name of the default VPC (`ovn-cluster`) | VPC name. | `vpc1` |
| `replicas` | `integer/int32` | Yes | `1` | Number of replicas. | `2` |
| `prefix` | `string` | Yes | - | Prefix of the workload Deployment name. This field is immutable. | `veg-` |
| `image` | `string` | Yes | - | The image used by the workload Deployment. | `docker.io/kubeovn/kube-ovn:v1.14.0-debug` |
| `internalSubnet` | `string` | Yes | Name of the default subnet within the VPC. | Name of the subnet used to access the VPC network. | `subnet1` |
| `externalSubnet` | `string` | No | - | Name of the subnet used to access the external network. | `ext1` |
| `internalIPs` | `string array` | Yes | - | IP addresses used for accessing the VPC network. IPv4, IPv6 and dual-stack are supported. The number of IPs specified must NOT be less than `replicas`. It is recommended to provide `<replicas> + 1` addresses to avoid corner cases where a Pod cannot be created properly. | `10.16.0.101` / `fd00::11` / `10.16.0.101,fd00::11` |
| `externalIPs` | `string array` | Yes | - | IP addresses used for accessing the external network. IPv4, IPv6 and dual-stack are supported. The number of IPs specified must NOT be less than `replicas`. It is recommended to provide `<replicas> + 1` addresses to avoid corner cases where a Pod cannot be created properly. | `10.16.0.101` / `fd00::11` / `10.16.0.101,fd00::11` |
| `bfd` | `object` | Yes | - | BFD configuration. | - |
| `policies` | `object array` | Yes | - | Egress policies. At least one policy must be configured. | - |
| `nodeSelector` | `object array` | Yes | - | Node selectors applied to the workload. The workload (Deployment/Pod) will run on the selected nodes. | - |

BFD Configuration:

| Fields | Type | Optional | Default Value | Description | Example |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `enabled` | `boolean` | Yes | `false` | Whether to enable BFD. | `true` |
| `minRX` | `integer/int32` | Yes | `1000` | BFD minRX in milliseconds. | `500` |
| `minTX` | `integer/int32` | Yes | `1000` | BFD minTX in milliseconds. | `500` |
| `multiplier` | `integer/int32` | Yes | `3` | BFD detection multiplier. | `1` |

Egress Policies:

| Fields | Type | Optional | Default Value | Description | Example |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `snat` | `boolean` | Yes | `false` | Whether to enable SNAT/MASQUERADE. | `true` |
| `ipBlocks` | `string array` | Yes | - | IP addresses or CIDR blocks to which this gateway applies. Both IPv4 and IPv6 are supported. | `192.168.0.1` / `192.168.0.0/24` |
| `subnets` | `string array` | Yes | - | Names of the VPC subnets to which this gateway applies. IPv4, IPv6 and dual-stack subnets are supported. | `subnet1` |

Node selector:

| Fields | Type | Optional | Default Value | Description | Example |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `matchLabels` | `dict/map` | Yes | - | A map of {key,value} pairs. | - |
| `matchExpressions` | `object array` | Yes | - | A list of label selector requirements. The requirements are ANDed. | - |
| `matchFields` | `object array` | Yes | - | A list of field selector requirements. The requirements are ANDed. | - |

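Putting the spec fields together, a more complete gateway could look like the purely illustrative sketch below; all names, IP addresses, and labels are examples and should be adapted to your environment:

```yaml
apiVersion: kubeovn.io/v1
kind: VpcEgressGateway
metadata:
  name: gateway3
  namespace: default
spec:
  vpc: vpc1
  replicas: 2
  prefix: veg-                 # optional prefix for the workload Deployment name
  internalSubnet: subnet1
  externalSubnet: macvlan
  internalIPs:                 # at least <replicas> addresses; <replicas> + 1 recommended
    - 192.168.0.101
    - 192.168.0.102
    - 192.168.0.103
  externalIPs:
    - 172.17.0.101
    - 172.17.0.102
    - 172.17.0.103
  bfd:
    enabled: true
    minRX: 500
    minTX: 500
    multiplier: 3
  policies:
    - snat: true               # masquerade traffic from subnet1
      subnets:
        - subnet1
    - snat: false              # route, but do not SNAT, this extra range
      ipBlocks:
        - 192.168.1.0/24
  nodeSelector:
    - matchLabels:
        kubernetes.io/os: linux
```
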
Status:

| Fields | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `ready` | `boolean` | Whether the gateway is ready. | `true` |
| `phase` | `string` | The gateway processing phase. | `Pending` / `Processing` / `Completed` |
| `internalIPs` | `string array` | IP addresses used to access the VPC network. | - |
| `externalIPs` | `string array` | IP addresses used to access the external network. | - |
| `workload` | `object` | Workload information. | - |
| `workload.apiVersion` | `string` | Workload API version. | `apps/v1` |
| `workload.kind` | `string` | Workload kind. | `Deployment` |
| `workload.name` | `string` | Workload name. | `gateway1` |
| `workload.nodes` | `string array` | Names of the nodes where the workload resides. | - |
| `conditions` | `object array` | - | - |

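These status fields can be read directly with standard kubectl field selection, for example:

```shell
# Print the processing phase and readiness of the gateway.
kubectl get veg gateway1 -o jsonpath='{.status.phase}{"\n"}{.status.ready}{"\n"}'

# Print the nodes where the gateway workload is running.
kubectl get veg gateway1 -o jsonpath='{.status.workload.nodes}{"\n"}'
```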
