
Commit 6f8d38d

docs: best practices (#907)
1 parent 9c07b6d commit 6f8d38d

1 file changed (+44, -29 lines)


docs/documentation/patterns-best-practices.md

@@ -17,54 +17,69 @@ See also best practices in [Operator SDK](https://sdk.operatorframework.io/docs/
### Reconcile All the Resources All the Time

The reconciliation can be triggered by events from multiple sources. It could be tempting to check the events and
reconcile just the related resource or the subset of resources that the controller manages. However, this is
**considered an anti-pattern** in operators. If triggered, all resources should be reconciled. Usually this means only
comparing the target state with the current state in the cache for most of the resources. The reason behind this is
that events are not reliable in general: events can be lost. In addition, the operator can crash and miss the events
that arrive while it is down.

In addition, such an approach might even complicate the implementation logic in the `Reconciler`: since parallel
execution of the reconciler is not allowed for the same custom resource, multiple events can be received for the same
resource or its dependent resources during an ongoing execution, and ordering those events could also be challenging.

Since there is a consensus regarding this in the industry, from v2 the events are not even accessible to
the `Reconciler`.
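
To illustrate, here is a minimal sketch of such a `Reconciler` (the `WebApp` custom resource class and the helper
methods are hypothetical, and exact signatures vary slightly between framework versions). The method receives only the
custom resource and a context, never the triggering events, so it simply reconciles all dependent resources every time:

```java
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

@ControllerConfiguration
public class WebAppReconciler implements Reconciler<WebApp> {

  @Override
  public UpdateControl<WebApp> reconcile(WebApp resource, Context<WebApp> context) {
    // No event details are available here: whatever triggered this execution,
    // reconcile every dependent resource against the desired state.
    reconcileDeployment(resource, context);
    reconcileService(resource, context);
    return UpdateControl.noUpdate();
  }

  private void reconcileDeployment(WebApp resource, Context<WebApp> context) { /* ... */ }

  private void reconcileService(WebApp resource, Context<WebApp> context) { /* ... */ }
}
```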

### EventSources and Caching

As mentioned above, during a reconciliation the best practice is to reconcile all the dependent resources managed by
the controller. This means that we want to compare a desired target state with the actual state of the cluster. Reading
the actual state of a resource directly from the Kubernetes API Server every time would mean a significant load.
Therefore, it's a common practice to instead create a watch for the dependent resources and cache their latest state,
following the Informer pattern. In Java Operator SDK, the informer is wrapped into an event source to integrate it with
the eventing system of the framework, resulting in the `InformerEventSource`.

A new event that triggers the reconciliation is only propagated once the actual resource is already in the cache. So
all the reconciler needs to do is compare the calculated target state of a dependent resource with the actual state
from the cache of the event source. If the resource is not in the cache it needs to be created; if it differs from the
target state, it needs to be updated.
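
A rough sketch of this comparison, assuming an `InformerEventSource` for `Deployment` has been registered for the
controller (the `WebApp` resource and the `buildDesiredDeployment`, `matches`, `create` and `update` helpers are
hypothetical):

```java
import java.util.Optional;

import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

public UpdateControl<WebApp> reconcile(WebApp resource, Context<WebApp> context) {
  Deployment desired = buildDesiredDeployment(resource);
  // Served from the InformerEventSource cache, not read from the API server.
  Optional<Deployment> actual = context.getSecondaryResource(Deployment.class);
  if (actual.isEmpty()) {
    create(desired);                          // not in the cache: create it
  } else if (!matches(desired, actual.get())) {
    update(desired);                          // differs from the target: update it
  }
  return UpdateControl.noUpdate();
}
```
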
### Idempotency

Since all the resources are reconciled during an execution, an execution can be triggered quite often, and retries of
a reconciliation happen naturally in operators, the implementation of a `Reconciler` needs to be idempotent. Luckily,
since operators usually manage resources that are already declarative, this is trivial to do in most cases.
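
For example, an idempotent step is phrased as "make the cluster match the desired state" rather than as a one-shot
action; a minimal sketch with the Fabric8 Kubernetes client (assumed here; newer client versions favor server-side
apply for the same purpose):

```java
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClient;

// Safe to call any number of times: the end state is the same whether the
// ConfigMap is missing, outdated, or already matches the desired state.
void ensureConfigMap(KubernetesClient client, ConfigMap desired) {
  client.configMaps()
      .inNamespace(desired.getMetadata().getNamespace())
      .createOrReplace(desired);
}
```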

### Sync or Async Way of Resource Handling

In an implementation of reconciliation there can be a point when the reconciler needs to wait a non-insignificant
amount of time for a resource to get up and running. For example, the reconciler would execute some additional step
only once a Pod is ready to receive requests. This problem can be approached either synchronously or asynchronously.

The async way is to just return from the reconciler: if the informers are properly in place for the target resource,
the reconciliation will be triggered again on change. During that reconciliation the pod can be read from the
informer's cache and the check on its state can be conducted again. The benefit of this approach is that it frees up
the thread, so it can be used to reconcile other resources.

The sync way would be to periodically poll the informer's cache for the pod's state until the target state is reached.
This blocks the thread until then, which in some cases could take quite long.
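
A sketch of the async variant, assuming an informer-backed event source for `Pod` is registered (`Readiness` is the
Fabric8 readiness helper, whose package location varies between client versions, and the follow-up step is
hypothetical):

```java
import java.util.Optional;

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.readiness.Readiness;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

public UpdateControl<WebApp> reconcile(WebApp resource, Context<WebApp> context) {
  Optional<Pod> pod = context.getSecondaryResource(Pod.class);
  if (pod.isEmpty() || !Readiness.isPodReady(pod.get())) {
    // Async: return instead of blocking; the informer will trigger a new
    // reconciliation as soon as the pod's state changes.
    return UpdateControl.noUpdate();
  }
  sendInitializationRequest(resource, pod.get()); // hypothetical follow-up step
  return UpdateControl.noUpdate();
}
```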

## Why Have Automated Retries?

Automatic retries are in place by default. They can be fine-tuned, but in general it's not advised to turn off
automatic retries. One of the reasons is that issues like network errors happen naturally and are usually solved by a
retry. Another typical situation is when a dependent resource or the custom resource is updated: during the update
there is usually optimistic concurrency control in place, so if someone updated the resource during the
reconciliation, maybe using `kubectl` or another process, the update would fail with a conflict. A retry solves this
problem simply by executing the reconciliation again.
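
For illustration, a sketch of fine-tuning (rather than disabling) the retry policy with the framework's
`GenericRetry`; the exact builder methods, and how the `Retry` instance is wired into the controller configuration,
may differ between versions:

```java
import io.javaoperatorsdk.operator.processing.retry.GenericRetry;
import io.javaoperatorsdk.operator.processing.retry.Retry;

// Cap the number of attempts and back off exponentially instead of turning
// automatic retries off entirely.
Retry limitedRetry() {
  return GenericRetry.defaultLimitedExponentialRetry()
      .setMaxAttempts(5)
      .setInitialInterval(2000)      // milliseconds
      .setIntervalMultiplier(1.5);
}
```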

## Managing State

When managing only Kubernetes resources, explicit state about the resources is usually not necessary: their state can
be read or watched, filtered using labels, or derived just by following some naming convention. However, when managing
external resources there can be situations where, for example, the created resource can only be addressed by an ID
that was generated when the resource was created. This ID needs to be stored so that on the next reconciliation it can
be used to address the resource. One place where it could go is the status sub-resource; on the other hand, by
definition the status should be just the result of a reconciliation. Therefore, it's generally advised to put such
state into a separate resource, usually a Kubernetes Secret or ConfigMap, or a dedicated CustomResource where the
structure can also be validated.
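
A sketch of that approach with the Fabric8 client (names are illustrative): after the external resource is created,
its generated ID is persisted into a ConfigMap that subsequent reconciliations read before addressing the resource.

```java
import java.util.Map;

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;

// Persist the externally generated ID so the next reconciliation can address
// the external resource again.
void storeExternalId(KubernetesClient client, String namespace, String name, String externalId) {
  ConfigMap configMap = new ConfigMapBuilder()
      .withNewMetadata().withName(name).withNamespace(namespace).endMetadata()
      .withData(Map.of("externalResourceId", externalId))
      .build();
  client.configMaps().inNamespace(namespace).createOrReplace(configMap);
}
```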
