docs: stop or not operator at startup in case of informer errors (#1577)

csviri · csviri · commit fb5f142fdc24 · 2022-10-31T09:39:33.000+01:00
diff --git a/docs/documentation/patterns-best-practices.md b/docs/documentation/patterns-best-practices.md
@@ -84,7 +84,7 @@ possible to completely deactivate the feature, though we advise against it. The
 configure automatic retries for your `Reconciler` is due to the fact that errors occur quite
 often due to the distributed nature of Kubernetes: transient network errors can be easily dealt
 with by automatic retries. Similarly, resources can be modified by different actors at the same
-time so it's not unheard of to get conflicts when working with Kubernetes resources. Such
+time, so it's not unheard of to get conflicts when working with Kubernetes resources. Such
 conflicts can usually be quite naturally resolved by reconciling the resource again. If it's
 done automatically, the whole process can be completely transparent.
 
@@ -94,7 +94,7 @@ Thanks to the declarative nature of Kubernetes resources, operators that deal on
 Kubernetes resources can operator in a stateless fashion, i.e. they do not need to maintain
 information about the state of these resources, as it should be possible to completely rebuild
 the resource state from its representation (that's what declarative means, after all).
-However, this usually doesn't hold true anymore when dealing with external resources and it
+However, this usually doesn't hold true anymore when dealing with external resources, and it
 might be necessary for the operator to keep track of this external state so that it is available
 when another reconciliation occurs. While such state could be put in the primary resource's
 status sub-resource, this could become quickly difficult to manage if a lot of state needs to be
@@ -105,3 +105,19 @@ advised to put such state into a separate resource meant for this purpose such a
 Kubernetes Secret or ConfigMap or even a dedicated Custom Resource, which structure can be more
 easily validated.
 
+## Stopping (or not) Operator in case of Informer Errors
+
+You can
+[configure](https://github.com/java-operator-sdk/java-operator-sdk/blob/2cb616c4c4fd0094ee6e3a0ef2a0ea82173372bf/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationService.java#L168-L168)
+whether the operator should stop if any informer error happens on startup. By default, if there is an error on
+startup, for example because the informer has no permission to list the target resources (either the primary resource or
+secondary resources), the operator will stop instantly. This behavior can be altered by setting the mentioned flag
+to `false`, so the operator will start even if some informers fail to start. In this case, just as when an informer
+starts successfully but experiences problems later, the connection is retried indefinitely with an
+exponential backoff. The operator will only stop if there is a fatal
+error, [currently](https://github.com/java-operator-sdk/java-operator-sdk/blob/0e55c640bf8be418bc004e51a6ae2dcf7134c688/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java#L64-L66)
+that is when a resource cannot be deserialized. The typical use case for changing this flag is when a list of namespaces
+is watched by a controller. It is better to start up the operator, so it can handle other namespaces while there
+might be a permission issue for some resources in another namespace.
+
+
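The startup behavior described in the added documentation section can be illustrated with a minimal, self-contained sketch. It is not the SDK's implementation: `InformerStartupSketch`, `Informer`, `startAll`, and `backoffMillis` are hypothetical names invented for this example, and the doubling-with-cap schedule is only an assumption about what "exponential backoff" could look like.

```java
import java.util.ArrayList;
import java.util.List;

public class InformerStartupSketch {

    /** Hypothetical stand-in for an informer; this is not a JOSDK type. */
    interface Informer {
        void start() throws Exception;
    }

    /**
     * Delay before retry number `attempt` (0-based): the initial delay is
     * doubled once per previous attempt and capped at maxMillis.
     * The schedule itself is an assumption made for illustration.
     */
    static long backoffMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /**
     * Tries to start every informer. With stopOnError == true the first
     * failure aborts startup; with false, failed informers are collected
     * so the caller can keep retrying them with backoff.
     */
    static List<Informer> startAll(List<Informer> informers, boolean stopOnError) {
        List<Informer> failed = new ArrayList<>();
        for (Informer informer : informers) {
            try {
                informer.start();
            } catch (Exception e) {
                if (stopOnError) {
                    throw new IllegalStateException("Informer failed on startup", e);
                }
                failed.add(informer); // left for later retries
            }
        }
        return failed;
    }
}
```

With `stopOnError` set to `false`, a controller watching several namespaces would still come up for the namespaces whose informers started, which is the use case the documentation text gives for changing the flag.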
diff --git a/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java b/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java
@@ -22,8 +22,8 @@
 import static org.junit.jupiter.api.Assertions.assertThrows;
 
 /**
- * The test relies on a special minikube configuration: "min-request-timeout" to have a very low
- * value, see: "minikube start --extra-config=apiserver.min-request-timeout=3"
+ * The test relies on a special api server configuration: "min-request-timeout" to have a very low
+ * value, use: "minikube start --extra-config=apiserver.min-request-timeout=3"
  *
  * <p>
  * This is important when tests are affected by permission changes, since the watch permissions are