Commit fb5f142

docs: stop or not operator at startup in case of informer errors (#1577)
1 parent 15287e3 commit fb5f142

2 files changed

+20
-4
lines changed

docs/documentation/patterns-best-practices.md

+18-2
@@ -84,7 +84,7 @@ possible to completely deactivate the feature, though we advise against it. The
 configure automatic retries for your `Reconciler` is due to the fact that errors occur quite
 often due to the distributed nature of Kubernetes: transient network errors can be easily dealt
 with by automatic retries. Similarly, resources can be modified by different actors at the same
-time so it's not unheard of to get conflicts when working with Kubernetes resources. Such
+time, so it's not unheard of to get conflicts when working with Kubernetes resources. Such
 conflicts can usually be quite naturally resolved by reconciling the resource again. If it's
 done automatically, the whole process can be completely transparent.

@@ -94,7 +94,7 @@ Thanks to the declarative nature of Kubernetes resources, operators that deal on
 Kubernetes resources can operate in a stateless fashion, i.e. they do not need to maintain
 information about the state of these resources, as it should be possible to completely rebuild
 the resource state from its representation (that's what declarative means, after all).
-However, this usually doesn't hold true anymore when dealing with external resources and it
+However, this usually doesn't hold true anymore when dealing with external resources, and it
 might be necessary for the operator to keep track of this external state so that it is available
 when another reconciliation occurs. While such state could be put in the primary resource's
 status sub-resource, this could become quickly difficult to manage if a lot of state needs to be
@@ -105,3 +105,19 @@ advised to put such state into a separate resource meant for this purpose such a
 Kubernetes Secret or ConfigMap or even a dedicated Custom Resource, which structure can be more
 easily validated.

+## Stopping (or not) Operator in case of Informer Errors
+
+It can
+be [configured](https://github.com/java-operator-sdk/java-operator-sdk/blob/2cb616c4c4fd0094ee6e3a0ef2a0ea82173372bf/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationService.java#L168-L168)
+whether the operator should stop if any informer error happens on startup. By default, if there is an error on
+startup, for example because an informer has no permission to list the target resources (either the primary resource or
+secondary resources), the operator will stop immediately. This behavior can be altered by setting the mentioned flag
+to `false`, so the operator will start even if some informers cannot be started. In this case, just as when an informer
+starts successfully but runs into problems later, the informer will retry the connection indefinitely with an
+exponential backoff. The operator will only stop if there is a fatal
+error, [currently](https://github.com/java-operator-sdk/java-operator-sdk/blob/0e55c640bf8be418bc004e51a6ae2dcf7134c688/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java#L64-L66)
+that is when a resource cannot be deserialized. The typical use case for changing this flag is when a list of namespaces
+is watched by a controller. It is better to start up the operator, so it can handle other namespaces while there
+might be a permission issue for some resources in another namespace.
+
+
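The behavior described in the added section can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use the Java Operator SDK API, and the names `InformerStartupSketch`, `start`, and `backoffMillis`, as well as the namespace names and delay values, are hypothetical. It shows the difference between stopping on the first startup error and starting with the informers that work, plus the shape of a capped exponential backoff schedule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; none of these names come from the Java Operator SDK.
public class InformerStartupSketch {

  /** Delay before retry number `attempt` (0-based): initial * 2^attempt, capped at max. */
  public static long backoffMillis(int attempt, long initialMillis, long maxMillis) {
    long delay = initialMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMillis);
  }

  /**
   * Simulates starting one informer per namespace. `canList` maps a namespace to whether
   * the (simulated) informer is permitted to list resources there.
   * Returns the namespaces whose informers started.
   */
  public static List<String> start(Map<String, Boolean> canList, boolean stopOnStartupError) {
    List<String> started = new ArrayList<>();
    for (Map.Entry<String, Boolean> entry : canList.entrySet()) {
      if (entry.getValue()) {
        started.add(entry.getKey());
      } else if (stopOnStartupError) {
        // Flag set to true: the first failing informer aborts startup.
        throw new IllegalStateException("Cannot list resources in " + entry.getKey());
      }
      // Flag set to false: skip the failing namespace for now; in this sketch the
      // retry loop is not modeled, only its delay schedule (see backoffMillis).
    }
    return started;
  }

  public static void main(String[] args) {
    Map<String, Boolean> canList = new LinkedHashMap<>();
    canList.put("team-a", true);
    canList.put("team-b", false); // simulated missing permission
    canList.put("team-c", true);

    System.out.println(start(canList, false)); // prints [team-a, team-c]

    // Assumed schedule: 1s initial delay, capped at 8s.
    for (int attempt = 0; attempt < 5; attempt++) {
      System.out.println(backoffMillis(attempt, 1000, 8000)); // 1000, 2000, 4000, 8000, 8000
    }
  }
}
```

With the flag set to `true`, the same input would throw on `team-b` and nothing would start, which is why relaxing the flag is attractive when several namespaces are watched.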

operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java

+2-2
@@ -22,8 +22,8 @@
 import static org.junit.jupiter.api.Assertions.assertThrows;

 /**
- * The test relies on a special minikube configuration: "min-request-timeout" to have a very low
- * value, see: "minikube start --extra-config=apiserver.min-request-timeout=3"
+ * The test relies on a special api server configuration: "min-request-timeout" to have a very low
+ * value, use: "minikube start --extra-config=apiserver.min-request-timeout=3"
  *
  * <p>
  * This is important when tests are affected by permission changes, since the watch permissions are
