feat(sdk): support volume mount in tune API #2508

Open
wants to merge 1 commit into
base: master
Conversation

truc0
Contributor

@truc0 truc0 commented Feb 5, 2025

What this PR does / why we need it:

As discussed in #2247, providing a clean and simple way to specify volume mounts in the tune API of the Katib Python SDK will improve the developer experience.

This PR adds a storage_per_trial argument to the KatibClient.tune() method.

Design

The storage_per_trial argument is designed to have the following type:

List[TypedDict({
    "volume": client.V1Volume,
    "mount_path": client.V1VolumeMount,
})]

To simplify the usage, there are some enhancements:

  1. Users can omit the outer list wrapper when only one storage is to be mounted. This changes the type of storage_per_trial to: Union[TuneStoragePerTrial, List[TuneStoragePerTrial]]
  2. Users can pass a dict instead of a client.V1Volume object when the storage config is simple:
    {
        "name": "volume-name",  # Required: Name of the volume
        "type": "pvc|secret|config_map|empty_dir",  # Required: Volume type
        # Optional fields based on type:
        # For PVC:
        "claim_name": "pvc-name",
        "read_only": False,
        # For Secret:
        "secret_name": "secret-name",
        "items": [...],
        "default_mode": 0o644,
        "optional": False,
        # For ConfigMap:
        "config_map_name": "config-name",
        "items": [...],
        "default_mode": 0o644,
        "optional": False,
        # For EmptyDir:
        "medium": None,
        "size_limit": None
    }
  3. Users can specify mount_path as a str instead of a client.V1VolumeMount.
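
The three shorthands above amount to a normalization step that runs before the trial spec is built. A minimal sketch of that step follows. It assumes plain dicts as stand-ins for client.V1Volume and client.V1VolumeMount so that it runs without the kubernetes package, and normalize_storage_per_trial is a hypothetical helper name, not the function this PR adds:

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical stand-in for one TuneStoragePerTrial entry.
StorageEntry = Dict[str, Any]


def normalize_storage_per_trial(
    storage: Optional[Union[StorageEntry, List[StorageEntry]]],
) -> List[StorageEntry]:
    """Return a list of entries whose mount is always in dict form."""
    if storage is None:
        return []
    # Shorthand 1: a single entry may be passed without the list wrapper.
    entries = [storage] if isinstance(storage, dict) else list(storage)

    normalized = []
    for entry in entries:
        volume = entry["volume"]  # assumed to be the dict form in this sketch
        mount = entry["mount_path"]
        # Shorthand 3: a bare string becomes a mount that references
        # the volume by name.
        if isinstance(mount, str):
            mount = {"name": volume["name"], "mountPath": mount}
        normalized.append({"volume": volume, "mount": mount})
    return normalized
```

Normalizing once at the top of tune() would let the rest of the code handle a single shape, whichever shorthand the caller used.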

Example Usage

storage_per_trial = [
    {
        "volume": {
            "name": "data",
            "type": "pvc",
            "claim_name": "my-data-pvc"
        },
        "mount_path": "/data"
    },
    {
        "volume": {
            "name": "config",
            "type": "config_map",
            "config_map_name": "model-config"
        },
        "mount_path": "/etc/config"
    }
]
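
For reference, the simplified volume dicts in this example correspond to ordinary Kubernetes volume manifests. The sketch below illustrates that mapping with plain dicts. volume_manifest is a hypothetical helper, not code from this PR, and the optional fields (items, default_mode, optional, medium, size_limit) are left out for brevity:

```python
from typing import Any, Dict


def volume_manifest(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a simplified volume dict into a Kubernetes volume manifest."""
    name, volume_type = spec["name"], spec["type"]
    if volume_type == "pvc":
        source = {
            "persistentVolumeClaim": {
                "claimName": spec["claim_name"],
                "readOnly": spec.get("read_only", False),
            }
        }
    elif volume_type == "secret":
        source = {"secret": {"secretName": spec["secret_name"]}}
    elif volume_type == "config_map":
        source = {"configMap": {"name": spec["config_map_name"]}}
    elif volume_type == "empty_dir":
        source = {"emptyDir": {}}
    else:
        raise ValueError(f"unsupported volume type: {volume_type!r}")
    return {"name": name, **source}


example = [
    {
        "volume": {"name": "data", "type": "pvc", "claim_name": "my-data-pvc"},
        "mount_path": "/data",
    },
    {
        "volume": {
            "name": "config",
            "type": "config_map",
            "config_map_name": "model-config",
        },
        "mount_path": "/etc/config",
    },
]

for entry in example:
    # Show each manifest next to the path it would be mounted at.
    print(volume_manifest(entry["volume"]), "->", entry["mount_path"])
```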

Full Example

import kubeflow.katib as katib
from kubernetes import client
import json
import base64

VOLUME_NAME = "katib-secret-volume"
SECRET_NAME = "katib-secret-test"

# create the secret (for testing purposes, optional)
secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name=SECRET_NAME),
    data={
        "credentials.json": base64.b64encode(
            json.dumps({"username": "admin", "password": "password"}).encode()
        ).decode()
    },
)
# client.CoreV1Api().create_namespaced_secret(namespace="kubeflow", body=secret)


def objective(parameters):
    import os

    result = 4 * int(parameters["x"]) - float(parameters["y"]) ** 2
    # The result will be negative if the secret is mounted successfully
    if os.path.exists("/secret/credentials.json"):
        result *= -1
    print(f"result={result}")


parameters = {
    "x": katib.search.int(min=10, max=20),
    "y": katib.search.double(min=0.1, max=0.2),
}


katib_client = katib.KatibClient(namespace="kubeflow")

storage_per_trial = {
    "volume": {
        "name": VOLUME_NAME,
        "type": "secret",
        "secret_name": SECRET_NAME,
    },
    "mount_path": "/secret",
}

name = "katib-experiment-secret"
katib_client.tune(
    name=name,
    parameters=parameters,
    objective=objective,
    objective_metric_name="result",
    max_trial_count=12,
    resources_per_trial={"cpu": "2"},
    storage_per_trial=storage_per_trial,
)

katib_client.wait_for_experiment_condition(name=name)
print(katib_client.get_optimal_hyperparameters(name=name))

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2247

Checklist:

  • Docs included if any changes are user facing
  • Support mounting volume
  • Documenting usage of storage_per_trial

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@truc0 truc0 force-pushed the 2247-katib-python-sdk-specify-volume-mounts branch from a2d2c5f to 898d047 Compare February 7, 2025 14:14
@google-oss-prow google-oss-prow bot added size/M and removed size/S labels Feb 7, 2025
@truc0 truc0 force-pushed the 2247-katib-python-sdk-specify-volume-mounts branch from 898d047 to 222020d Compare February 7, 2025 14:18
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Feb 7, 2025
@truc0 truc0 force-pushed the 2247-katib-python-sdk-specify-volume-mounts branch 3 times, most recently from a0fbd20 to ec38767 Compare February 7, 2025 14:27
@truc0 truc0 marked this pull request as ready for review February 7, 2025 14:43
@Electronic-Waste
Member

@truc0 Amazing! Thanks for doing this, and I'm sorry for the late reply. I'll review this PR this week.

cc @kubeflow/wg-automl-leads @helenxie-bit @mahdikhashan

@mahdikhashan
Member

@truc0 thank you. Would you please add unit tests and e2e tests for your changes:

unit tests in this path: sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py
e2e here: test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

I'll review the code and functionality after that. Thanks for your time.

@truc0
Contributor Author

truc0 commented Feb 14, 2025

Sure, I will add it soon

Member

@Electronic-Waste Electronic-Waste left a comment

@truc0 I'm so sorry for the late response. Here are my initial comments for you.

Please also take a look at this PR if you are available @kubeflow/wg-automl-leads @helenxie-bit @mahdikhashan @Doris-xm

Comment on lines +45 to +52
TuneStoragePerTrialType = TypedDict(
    "TuneStoragePerTrial",
    {
        "volume": Union[client.V1Volume, Dict[str, Any]],
        "mount_path": Union[str, client.V1VolumeMount],
    },
)

Member

@truc0 That's an amazing optimization from the perspective of Data Scientists. Could you also help with the review of kubeflow/trainer#2449 (comment), which might be similar to this scenario?👀

@@ -198,6 +206,7 @@ def tune(
        env_per_trial: Optional[
            Union[Dict[str, str], List[Union[client.V1EnvVar, client.V1EnvFromSource]]]
        ] = None,
        storage_per_trial: Optional[List[TuneStoragePerTrialType]] = None,
Member

I'm thinking about the naming convention here. We already have a storage_config parameter with similar functionality. Do we have any ideas for dealing with both? Or should we combine them?

- volume: Either a kubernetes.client.V1Volume object or a dictionary
  containing volume configuration with required fields:
    - name: Name of the volume
    - type: One of "pvc", "secret", "config_map", or "empty_dir"
Member

Suggested change
- type: One of "pvc", "secret", "config_map", or "empty_dir"
- type: One of "pvc", "secret", "configmap", or "empty_dir"

Usually, we name it configmap instead of config_map :)
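
One possible way to adopt this rename without rejecting the spelling used in the PR description is to map both spellings to one canonical name before dispatching on the type. This is sketched as an assumption and is not something proposed in this thread. The alias table and canonical_volume_type are hypothetical names:

```python
# Hypothetical alias table: every accepted spelling maps to one canonical name.
_VOLUME_TYPE_ALIASES = {
    "pvc": "pvc",
    "secret": "secret",
    "configmap": "configmap",
    "config_map": "configmap",  # spelling used in the PR description
    "empty_dir": "empty_dir",
}


def canonical_volume_type(volume_type: str) -> str:
    """Return the canonical volume type, or raise ValueError if unknown."""
    try:
        return _VOLUME_TYPE_ALIASES[volume_type.lower()]
    except KeyError:
        allowed = ", ".join(sorted(set(_VOLUME_TYPE_ALIASES.values())))
        raise ValueError(
            f"unknown volume type {volume_type!r}; expected one of: {allowed}"
        ) from None
```

With a single lookup like this, the later elif chain would only need to test the canonical names.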

Comment on lines +310 to +311
- For config_map: config_map_name, items (optional), default_mode
(optional), optional (optional)
Member

Same naming convention issue.

Comment on lines +579 to +583
elif volume_type == "config_map":
    volume = client.V1Volume(
        name=volume_name,
        config_map=client.V1ConfigMapVolumeSource(
            name=storage["volume"].get("config_map_name"),
Member

Same naming convention issue.

@Electronic-Waste
Member

/ok-to-test
/rerun-all

@Electronic-Waste
Member

Could you please fix the pre-commit error? You can now rerun the CI tests yourself with /rerun-all.

@mahdikhashan
Member

@truc0 I'm so sorry for the late response. Here are my initial comments for you.

Please also take a look at this PR if you are available @kubeflow/wg-automl-leads @helenxie-bit @mahdikhashan @Doris-xm

Thanks @Electronic-Waste, I have already requested some changes here: #2508 (comment)

cc: @truc0

@mahdikhashan

This comment was marked as duplicate.

Development

Successfully merging this pull request may close these issues.

Katib Python SDK Specify Volume Mounts
3 participants