Commit 3070b45

Merge pull request kubernetes#8655 from thockin/lb-type-docs
Update docs about new Services work
2 parents e873387 + 3917776 commit 3070b45

docs/services.md

Lines changed: 180 additions & 78 deletions
@@ -16,14 +16,15 @@ Enter `Services`.

 A Kubernetes `Service` is an abstraction which defines a logical set of `Pods`
 and a policy by which to access them - sometimes called a micro-service. The
-set of `Pods` targeted by a `Service` is determined by a [`Label
-Selector`](labels.md).
+set of `Pods` targeted by a `Service` is (usually) determined by a [`Label
+Selector`](labels.md) (see below for why you might want a `Service` without a
+selector).

 As an example, consider an image-processing backend which is running with 3
 replicas. Those replicas are fungible - frontends do not care which backend
 they use. While the actual `Pods` that compose the backend set may change, the
-frontend clients should not need to manage that themselves. The `Service`
-abstraction enables this decoupling.
+frontend clients should not need to be aware of that or keep track of the list
+of backends themselves. The `Service` abstraction enables this decoupling.

 For Kubernetes-native applications, Kubernetes offers a simple `Endpoints` API
 that is updated whenever the set of `Pods` in a `Service` changes. For
@@ -37,16 +38,12 @@ REST objects, a `Service` definition can be POSTed to the apiserver to create a
 new instance. For example, suppose you have a set of `Pods` that each expose
 port 9376 and carry a label "app=MyApp".

-
 ```json
 {
     "kind": "Service",
     "apiVersion": "v1beta3",
     "metadata": {
         "name": "my-service",
-        "labels": {
-            "environment": "testing"
-        }
     },
     "spec": {
         "selector": {
@@ -64,22 +61,34 @@ port 9376 and carry a label "app=MyApp".
 ```

 This specification will create a new `Service` object named "my-service" which
-targets TCP port 9376 on any `Pod` with the "app=MyApp" label. Every `Service`
-is also assigned a virtual IP address (called the "portal IP"), which is used by
-the service proxies (see below). The `Service`'s selector will be evaluated
-continuously and the results will be posted in an `Endpoints` object also named
-"my-service".
+targets TCP port 9376 on any `Pod` with the "app=MyApp" label. This `Service`
+will also be assigned an IP address (sometimes called the "portal IP"), which
+is used by the service proxies (see below). The `Service`'s selector will be
+evaluated continuously and the results will be posted in an `Endpoints` object
+also named "my-service".
+
+Note that a `Service` can map an incoming port to any `targetPort`. By default
+the `targetPort` is the same as the `port` field. Perhaps more interesting is
+that `targetPort` can be a string, referring to the name of a port in the
+backend `Pod`s. The actual port number assigned to that name can be different
+in each backend `Pod`. This offers a lot of flexibility for deploying and
+evolving your `Service`s. For example, you can change the port number that
+pods expose in the next version of your backend software, without breaking
+clients.
+
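For illustration, a minimal sketch of a `Service` that refers to a named `targetPort` might look like the following; the port name "http" is hypothetical and assumes the backend `Pod`s declare a container port with that name:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": "http"
            }
        ]
    }
}
```

Each backend `Pod` can then map the name "http" to whatever port number its software actually listens on, so the `Service` keeps working across backend version changes.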
+Kubernetes `Service`s support `TCP` and `UDP` for protocols. The default
+is `TCP`.

 ### Services without selectors

-Services, in addition to providing abstractions to access `Pods`, can also
-abstract any kind of backend. For example:
+Services generally abstract access to Kubernetes `Pods`, but they can also
+abstract other kinds of backends. For example:
 - you want to have an external database cluster in production, but in test
-  you use your own databases.
+  you use your own databases
 - you want to point your service to a service in another
-  [`Namespace`](namespaces.md) or on another cluster.
+  [`Namespace`](namespaces.md) or on another cluster
 - you are migrating your workload to Kubernetes and some of your backends run
-  outside of Kubernetes.
+  outside of Kubernetes

 In any of these scenarios you can define a service without a selector:

@@ -102,7 +111,8 @@ In any of these scenarios you can define a service without a selector:
 }
 ```

-Then you can manually map the service to a specific endpoint(s):
+Because this has no selector, the corresponding `Endpoints` object will not be
+created. You can manually map the service to your own specific endpoints:

 ```json
 {
@@ -135,8 +145,8 @@ watches the Kubernetes master for the addition and removal of `Service`
 and `Endpoints` objects. For each `Service` it opens a port (random) on the
 local node. Any connections made to that port will be proxied to one of the
 corresponding backend `Pods`. Which backend to use is decided based on the
-AffinityPolicy of the `Service`. Lastly, it installs iptables rules which
-capture traffic to the `Service`'s `Port` on the `Service`'s portal IP (which
+`SessionAffinity` of the `Service`. Lastly, it installs iptables rules which
+capture traffic to the `Service`'s `Port` on the `Service`'s cluster IP (which
 is entirely virtual) and redirects that traffic to the previously described
 port.

@@ -146,12 +156,59 @@ appropriate backend without the clients knowing anything about Kubernetes or

 ![Services overview diagram](services_overview.png)

-By default, the choice of backend is random. Client-IP-based session affinity
-can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"`.
+By default, the choice of backend is random. Client-IP based session affinity
+can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
+default is `"None"`).
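As a minimal sketch, setting client-IP affinity in the `spec` might look roughly like this (field placement follows the `service.spec.sessionAffinity` path described above):

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ],
        "sessionAffinity": "ClientIP"
    }
}
```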

 As of Kubernetes 1.0, `Service`s are a "layer 3" (TCP/UDP over IP) construct. We do not
 yet have a concept of "layer 7" (HTTP) services.

+## Multi-Port Services
+
+Many `Service`s need to expose more than one port. For this case, Kubernetes
+supports multiple port definitions on a `Service` object. When using multiple
+ports you must give all of your ports names, so that endpoints can be
+disambiguated. For example:
+
+```json
+{
+    "kind": "Service",
+    "apiVersion": "v1beta3",
+    "metadata": {
+        "name": "my-service",
+    },
+    "spec": {
+        "selector": {
+            "app": "MyApp"
+        },
+        "ports": [
+            {
+                "name": "http",
+                "protocol": "TCP",
+                "port": 80,
+                "targetPort": 9376
+            },
+            {
+                "name": "https",
+                "protocol": "TCP",
+                "port": 443,
+                "targetPort": 9377
+            }
+        ]
+    }
+}
+```
+
+## Choosing your own PortalIP address
+
+A user can specify their own `PortalIP` address as part of a `Service` creation
+request. For example, if they already have an existing DNS entry that they
+wish to replace, or legacy systems that are configured for a specific IP
+address and difficult to re-configure. The `PortalIP` address that a user
+chooses must be a valid IP address and within the portal_net CIDR range that is
+specified by flag to the API server. If the PortalIP value is invalid, the
+apiserver returns a 422 HTTP status code to indicate that the value is invalid.
+
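A rough sketch of such a request (the address shown is only illustrative and must fall within your cluster's portal_net range):

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "portalIP": "10.0.171.239",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```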
 ### Why not use round-robin DNS?

 A question that pops up every now and then is why we do all this stuff with
@@ -208,66 +265,104 @@ DNS records for each. If DNS has been enabled throughout the cluster then all
 For example, if you have a `Service` called "my-service" in Kubernetes
 `Namespace` "my-ns" a DNS record for "my-service.my-ns" is created. `Pods`
 which exist in the "my-ns" `Namespace` should be able to find it by simply doing
-a name lookup for "my-service". `Pods` which exist in other `Namespaces` must
+a name lookup for "my-service". `Pods` which exist in other `Namespace`s must
 qualify the name as "my-service.my-ns". The result of these name lookups is the
-virtual portal IP.
+cluster IP.
+
+We will soon add DNS support for multi-port `Service`s in the form of SRV
+records.

 ## Headless Services

-Sometimes you don't need or want a single virtual IP. In this case, you can
-create "headless" services by specifying "None" for the PortalIP. For such
-services, a virtual IP is not allocated, DNS is not configured (this will be
-fixed), and service-specific environment variables for pods are not created.
-Additionally, the kube proxy does not handle these services and there is no
-load balancing or proxying done by the platform for them. The endpoints
-controller will still create endpoint records in the API for such services.
-These services also take advantage of any UI, readiness probes, etc. that are
-applicable for services in general.
-
-The tradeoff for a developer would be whether to couple to the Kubernetes API
-or to a particular discovery system. Applications can still use a
-self-registration pattern and adapters for other discovery systems could be
-built upon this API, as well.
+Sometimes you don't need or want a single service IP. In this case, you can
+create "headless" services by specifying `"None"` for the `PortalIP`. For such
+`Service`s, a cluster IP is not allocated and service-specific environment
+variables for `Pod`s are not created. DNS is configured to return multiple A
+records (addresses) for the `Service` name, which point directly to the `Pod`s
+backing the `Service`. Additionally, the kube proxy does not handle these
+services and there is no load balancing or proxying done by the platform for
+them. The endpoints controller will still create `Endpoints` records in the
+API.
+
+This option allows developers to reduce coupling to the Kubernetes system, if
+they desire, but leaves them freedom to do discovery in their own way.
+Applications can still use a self-registration pattern and adapters for other
+discovery systems could easily be built upon this API.
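A minimal sketch of a headless `Service`; the only difference from the earlier examples is the `"None"` portal IP (field casing here follows the `portalIP` field shown in the LoadBalancer example below):

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "portalIP": "None",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```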

 ## External Services

 For some parts of your application (e.g. frontends) you may want to expose a
 Service onto an external (outside of your cluster, maybe public internet) IP
-address.
-
-On cloud providers which support external load balancers, this should be as
-simple as setting the `createExternalLoadBalancer` flag of the `Service` spec
-to `true`. This sets up a cloud-specific load balancer and populates the
-`publicIPs` field of the spec (see below). Traffic from the external load
-balancer will be directed at the backend `Pods`, though exactly how that works
-depends on the cloud provider.
-
-For cloud providers which do not support external load balancers, there is
-another approach that is a bit more "do-it-yourself" - the `publicIPs` field.
-Any address you put into the `publicIPs` array will be handled the same as the
-portal IP - the kube-proxy will install iptables rules which proxy traffic
-through to the backends. You are then responsible for ensuring that traffic to
-those IPs gets sent to one or more Kubernetes `Nodes`. As long as the traffic
-arrives at a Node, it will be be subject to the iptables rules.
-
-An common situation is when a `Node` has both internal and an external network
-interfaces. If you put that `Node`'s external IP in `publicIPs`, you can
-then aim traffic at the `Service` port on that `Node` and it will be proxied to
-the backends. If you set all `Node`s' external IPs as `publicIPs` you can then
-reach a `Service` through any `Node`, which means you can build your own
-load-balancer or even just use DNS round-robin. The downside to this approach
-is that all such `Service`s share a port space - only one of them can have port
-80, for example.
+address. Kubernetes supports two ways of doing this: `NodePort`s and
+`LoadBalancer`s.

-## Choosing your own PortalIP address
+Every `Service` has a `Type` field which defines how the `Service` can be
+accessed. Valid values for this field are:
+- ClusterIP: use a cluster-internal IP (portal) only - this is the default
+- NodePort: use a cluster IP, but also expose the service on a port on each
+  node of the cluster (the same port on each)
+- LoadBalancer: use a ClusterIP and a NodePort, but also ask the cloud
+  provider for a load balancer which forwards to the `Service`

-A user can specify their own `PortalIP` address as part of a service creation
-request. For example, if they already have an existing DNS entry that they
-wish to replace, or legacy systems that are configured for a specific IP
-address and difficult to re-configure. The `PortalIP` address that a user
-chooses must be a valid IP address and within the portal net CIDR range that is
-specified by flag to the API server. If the PortalIP value is invalid, the
-apiserver returns a 422 HTTP status code to indicate that the value is invalid.
+Note that while `NodePort`s can be TCP or UDP, `LoadBalancer`s only support TCP
+as of Kubernetes 1.0.
+
+### Type = NodePort
+
+If you set the `type` field to `"NodePort"`, the Kubernetes master will
+allocate you a port (from a flag-configured range) on each node for each port
+exposed by your `Service`. That port will be reported in your `Service`'s
+`spec.ports[*].nodePort` field. If you specify a value in that field, the
+system will allocate you that port or else will fail the API transaction.
+
+This gives developers the freedom to set up their own load balancers, to
+configure cloud environments that are not fully supported by Kubernetes, or
+even to just expose one or more nodes' IPs directly.
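For illustration, a minimal `NodePort` sketch; the node port value is only an example and must fall within the flag-configured range:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "type": "NodePort",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ]
    }
}
```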
+
+### Type = LoadBalancer
+
+On cloud providers which support external load balancers, setting the `type`
+field to `"LoadBalancer"` will provision a load balancer for your `Service`.
+The actual creation of the load balancer happens asynchronously, and
+information about the provisioned balancer will be published in the `Service`'s
+`status.loadBalancer` field. For example:
+
+```json
+{
+    "kind": "Service",
+    "apiVersion": "v1beta3",
+    "metadata": {
+        "name": "my-service",
+    },
+    "spec": {
+        "selector": {
+            "app": "MyApp"
+        },
+        "ports": [
+            {
+                "protocol": "TCP",
+                "port": 80,
+                "targetPort": 9376,
+                "nodePort": 30061
+            }
+        ],
+        "portalIP": "10.0.171.239",
+        "type": "LoadBalancer"
+    },
+    "status": {
+        "loadBalancer": {
+            "ingress": [
+                {
+                    "ip": "146.148.47.155"
+                }
+            ]
+        }
+    }
+}
+```
+
+Traffic from the external load balancer will be directed at the backend `Pods`,
+though exactly how that works depends on the cloud provider.

 ## Shortcomings

@@ -280,6 +375,13 @@ details.
 Using the kube-proxy obscures the source-IP of a packet accessing a `Service`.
 This makes some kinds of firewalling impossible.

+LoadBalancers only support TCP, not UDP.
+
+The `Type` field is designed as nested functionality - each level adds to the
+previous. This is not strictly required on all cloud providers (e.g. GCE does
+not need to allocate a `NodePort` to make `LoadBalancer` work, but AWS does)
+but the current API requires it.
+
 ## Future work

 In the future we envision that the proxy policy can become more nuanced than
@@ -293,11 +395,11 @@ eliminate userspace proxying in favor of doing it all in iptables. This should
 perform better and fix the source-IP obfuscation, though is less flexible than
 arbitrary userspace code.

-We hope to make the situation around external load balancers and public IPs
-simpler and easier to comprehend.
-
 We intend to have first-class support for L7 (HTTP) `Service`s.

+We intend to have more flexible ingress modes for `Service`s which encompass
+the current `ClusterIP`, `NodePort`, and `LoadBalancer` modes and more.
+
 ## The gory details of portals

 The previous information should be sufficient for many people who just want to
@@ -348,9 +450,9 @@ When a client connects to the portal the iptables rule kicks in, and redirects
 the packets to the `Service proxy`'s own port. The `Service proxy` chooses a
 backend, and starts proxying traffic from the client to the backend.

-This means that `Service` owners can choose any `Service` port they want without
-risk of collision. Clients can simply connect to an IP and port, without
-being aware of which `Pods` they are actually accessing.
+This means that `Service` owners can choose any port they want without risk of
+collision. Clients can simply connect to an IP and port, without being aware
+of which `Pod`s they are actually accessing.

 ![Services detailed diagram](services_detail.png)
