doc(katib): update push-based metrics collector.

Signed-off-by: Electronic-Waste <[email protected]>
kubeflow · Sep 4, 2024 · 3d96660
1 parent 89d8f79
commit 3d96660
Showing 1 changed file with 73 additions and 8 deletions.
diff --git a/content/en/docs/components/katib/user-guides/metrics-collector.md b/content/en/docs/components/katib/user-guides/metrics-collector.md
@@ -6,16 +6,23 @@ weight = 40
 
 This guide describes how Katib metrics collector works.
 
-## Metrics Collector
+## Overview
+
+There are two ways to collect metrics:
+
+1. Pull-based: Katib collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
+the main container in the Kubernetes Pod.
+
+2. Push-based: your training script pushes the metrics directly to Katib DB.
 
 In the `metricsCollectorSpec` section of the Experiment YAML configuration file, you can
 define how Katib should collect the metrics from each Trial, such as the accuracy and loss metrics.
 
-Your training code can record the metrics into `StdOut` or into arbitrary output files. Katib
-collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
-the main container in the Kubernetes Pod.
+## Pull-based Metrics Collector
+
+Your training code can record the metrics into `StdOut` or into arbitrary output files.
 
-To define the metrics collector for your Experiment:
+To define the pull-based metrics collector for your Experiment:
 
 1. Specify the collector type in the `.collector.kind` field.
    Katib's metrics collector supports the following collector types:
@@ -51,9 +58,6 @@ To define the metrics collector for your Experiment:
      in the `.collector.customCollector` field. Check the
      [custom metrics collector example](https://github.com/kubeflow/katib/blob/ea46a7f2b73b2d316b6b7619f99eb440ede1909b/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L14-L36).
 
-   - `None`: Specify this value if you don't need to use Katib's metrics collector. For example,
-     your training code may handle the persistent storage of its own metrics.
-
 2. Write code in your training container to print or save to the file metrics in the format
    specified in the `.source.filter.metricsFormat` field. The default metrics format value is:
 
@@ -79,3 +83,64 @@ To define the metrics collector for your Experiment:
    recall=0.55
    precision=.5
    ```
+
+## Push-based Metrics Collector
+
+Your training code needs to call the [`report_metrics`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in the Python SDK to record metrics.
+
+To define the push-based metrics collector for your Experiment, you have two options:
+
+- YAML File
+
+    1. Specify the collector type `Push` in the `.collector.kind` field.
+
+    2. Write code in your training container to call `report_metrics` to report metrics.
+
+- [`tune`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L166) function
+
+    Use the `tune` function and specify the `metrics_collector_config` parameter. You can refer to the following example:
+
+    ```python
+    import kubeflow.katib as katib
+
+    # Step 1. Create an objective function with push-based metrics collection.
+    def objective(parameters):
+      # Import required packages.
+      import time
+      import kubeflow.katib as katib
+      time.sleep(5)
+      # Calculate objective function.
+      result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
+      # Push metrics to Katib DB.
+      katib.report_metrics({"result": result})
+
+    # Step 2. Create HyperParameter search space.
+    parameters = {
+      "a": katib.search.int(min=10, max=20),
+      "b": katib.search.double(min=0.1, max=0.2)
+    }
+
+    # Step 3. Create Katib Experiment with 4 Trials and 2 CPUs per Trial.
+    # We install the latest changes of the Python SDK because `report_metrics` is not supported in the released SDK yet.
+    # Thus, the base image must have the `git` command to download the package.
+    katib_client = katib.KatibClient(namespace="kubeflow")
+    name = "tune-experiment"
+    katib_client.tune(
+      name=name,
+      objective=objective,
+      parameters=parameters,
+      base_image="electronicwaste/push-metrics-collector:v0.0.9", # python:3.11-slim + git
+      objective_metric_name="result",
+      max_trial_count=4,
+      resources_per_trial={"cpu": "2"},
+      packages_to_install=["git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1"],
+      # packages_to_install=["kubeflow-katib==0.18.0"],
+      metrics_collector_config={"kind": "Push"},
+    )
+
+    # Step 4. Wait until the Katib Experiment is complete.
+    katib_client.wait_for_experiment_condition(name=name)
+
+    # Step 5. Get the best HyperParameters.
+    print(katib_client.get_optimal_hyperparameters(name))
+    ```