doc(katib): update push-based metrics collector.
Signed-off-by: Electronic-Waste <[email protected]>
Electronic-Waste committed Sep 4, 2024
1 parent 89d8f79 commit 3d96660
Showing 1 changed file with 73 additions and 8 deletions.
81 changes: 73 additions & 8 deletions content/en/docs/components/katib/user-guides/metrics-collector.md
@@ -6,16 +6,23 @@ weight = 40

This guide describes how the Katib metrics collector works.

## Metrics Collector
## Overview

There are two ways to collect metrics:

1. Pull-based: collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
the main container in the Kubernetes Pod.

2. Push-based: users push the metrics directly to the Katib DB from their training scripts.

In the `metricsCollectorSpec` section of the Experiment YAML configuration file, you can
define how Katib should collect the metrics from each Trial, such as the accuracy and loss metrics.
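
As a rough illustration, the fragment below builds the `metricsCollectorSpec` section as a Python dictionary and prints it as JSON. The helper name `metrics_collector_spec` is hypothetical, and only the `.collector.kind` field named in this guide is set; a real Experiment needs the rest of its spec as well.

```python
import json


def metrics_collector_spec(kind):
    """Return a metricsCollectorSpec fragment with only .collector.kind set.

    Hypothetical helper for illustration; it is not part of the Katib SDK.
    """
    return {"metricsCollectorSpec": {"collector": {"kind": kind}}}


# `StdOut` reads metrics from the training output; `Push` selects
# push-based collection.
fragment = metrics_collector_spec("StdOut")
print(json.dumps(fragment, indent=2))
```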

Your training code can record the metrics into `StdOut` or into arbitrary output files. Katib
collects the metrics using a _sidecar_ container. A sidecar is a utility container that supports
the main container in the Kubernetes Pod.
## Pull-based Metrics Collector

Your training code can record the metrics into `StdOut` or into arbitrary output files.

To define the metrics collector for your Experiment:
To define the pull-based metrics collector for your Experiment:

1. Specify the collector type in the `.collector.kind` field.
Katib's metrics collector supports the following collector types:
@@ -51,9 +58,6 @@ To define the metrics collector for your Experiment:
in the `.collector.customCollector` field. Check the
[custom metrics collector example](https://github.com/kubeflow/katib/blob/ea46a7f2b73b2d316b6b7619f99eb440ede1909b/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L14-L36).
- `None`: Specify this value if you don't need to use Katib's metrics collector. For example,
your training code may handle the persistent storage of its own metrics.
2. Write code in your training container to print the metrics, or save them to a file, in the format
specified in the `.source.filter.metricsFormat` field. The default metrics format value is:
@@ -79,3 +83,64 @@ To define the metrics collector for your Experiment:
recall=0.55
precision=.5
```
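
The lines above follow a `name=value` layout, one metric per line. A minimal sketch of training code that prints metrics this way is shown below. The helper `format_metrics` and the metric values are illustrative assumptions, and the layout must match whatever `.source.filter.metricsFormat` is set to in your Experiment.

```python
def format_metrics(metrics):
    """Render each metric as a `name=value` line (hypothetical helper)."""
    return [f"{name}={value}" for name, value in metrics.items()]


# Illustrative values only; a real training loop would compute these.
for line in format_metrics({"recall": 0.55, "precision": 0.5}):
    print(line)
```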

## Push-based Metrics Collector

Your training code needs to call the [`report_metrics`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/report_metrics.py#L26) function in the Python SDK to record metrics.

To define the push-based metrics collector for your Experiment, you have two options:

- YAML File

1. Specify the collector type `Push` in the `.collector.kind` field.

2. Write code in your training container to call `report_metrics` to report metrics.

- [`tune`](https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L166) function

Use the `tune` function and specify the `metrics_collector_config` parameter. Refer to the following example:

```
import kubeflow.katib as katib


# Step 1. Create an objective function with push-based metrics collection.
def objective(parameters):
    # Import required packages.
    import time
    import kubeflow.katib as katib

    time.sleep(5)
    # Calculate objective function.
    result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
    # Push metrics to Katib DB.
    katib.report_metrics({"result": result})


# Step 2. Create HyperParameter search space.
parameters = {
    "a": katib.search.int(min=10, max=20),
    "b": katib.search.double(min=0.1, max=0.2),
}

# Step 3. Create Katib Experiment with 4 Trials and 2 CPUs per Trial.
# We install the latest changes of the Python SDK because `report_metrics` is not supported in a release yet.
# Thus, the base image must have the `git` command to download the package.
katib_client = katib.KatibClient(namespace="kubeflow")
name = "tune-experiment"
katib_client.tune(
    name=name,
    objective=objective,
    parameters=parameters,
    base_image="electronicwaste/push-metrics-collector:v0.0.9",  # python:3.11-slim + git
    objective_metric_name="result",
    max_trial_count=4,
    resources_per_trial={"cpu": "2"},
    packages_to_install=["git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1"],
    # packages_to_install=["kubeflow-katib==0.18.0"],
    metrics_collector_config={"kind": "Push"},
)

# Step 4. Wait until Katib Experiment is complete.
katib_client.wait_for_experiment_condition(name=name)

# Step 5. Get the best HyperParameters.
print(katib_client.get_optimal_hyperparameters(name))
```
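
For the YAML file option, the training script itself is responsible for calling `report_metrics`. The sketch below reuses the objective calculation from the example above but takes the reporting function as an argument, so it can be tried outside of a Trial. The names `train` and `report` are hypothetical. Inside a Trial container you would pass `kubeflow.katib.report_metrics` instead of `print`, assuming the SDK is installed in the training image.

```python
def train(parameters, report):
    """Compute the objective and hand the metrics to `report`.

    `report` is any callable that accepts a dict of metrics. In a Trial
    this would be `kubeflow.katib.report_metrics` (assumption: the SDK is
    installed in the training image).
    """
    result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
    report({"result": result})
    return result


# Local dry run: print the metrics instead of pushing them to the Katib DB.
train({"a": "10", "b": "0.1"}, report=print)
```

Injecting the reporter keeps the training logic testable without a running Katib deployment; only the call site changes between a local run and a Trial.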
