[GSoC] Add unit tests for tune API #2423

Merged
merged 27 commits into from
Jan 24, 2025
Changes from 7 commits
5dc6fc5
add unit tests for tune api
helenxie-bit Sep 5, 2024
04a7e39
update
helenxie-bit Sep 5, 2024
8c4d65a
fix format
helenxie-bit Sep 5, 2024
b0195a6
update unit tests and fix api errors
helenxie-bit Sep 5, 2024
a92de67
fix format
helenxie-bit Sep 5, 2024
7b7e347
test
helenxie-bit Sep 5, 2024
e4f7922
test
helenxie-bit Sep 5, 2024
e621fc6
update unit tests
helenxie-bit Sep 9, 2024
9c0a9e6
undo changes to Makefile
helenxie-bit Sep 9, 2024
f5c4bce
delete debug code
helenxie-bit Sep 9, 2024
5ddcc30
fix format
helenxie-bit Sep 9, 2024
4909456
update unit test
helenxie-bit Sep 11, 2024
1e78840
fix format
helenxie-bit Sep 11, 2024
e68fe38
update the version of training operator
helenxie-bit Sep 12, 2024
d3a3404
adjust 'list_namespaced_persistent_volume_claim' to be called with ke…
helenxie-bit Oct 9, 2024
6d5c20e
create constant for namespace when check pvc creation error
helenxie-bit Oct 9, 2024
b25f7ba
add type check for 'trainer_parameters'
helenxie-bit Oct 9, 2024
3ebbe76
fix format
helenxie-bit Oct 9, 2024
0498237
update test names
helenxie-bit Oct 10, 2024
15f6a7a
fix format
helenxie-bit Oct 10, 2024
86db6d5
add verification for key Experiment information & add 'kubeflow-train…
helenxie-bit Oct 22, 2024
b24b44f
rerun tests
helenxie-bit Oct 22, 2024
5dfd1a3
add verification for objective metric name
helenxie-bit Jan 23, 2025
a0dbeeb
resolve conflict
helenxie-bit Jan 23, 2025
018ec33
delete unnecessary changes
helenxie-bit Jan 23, 2025
8cd3d0c
unify objective function
helenxie-bit Jan 23, 2025
b1d07ce
unify objective function
helenxie-bit Jan 23, 2025
12 changes: 11 additions & 1 deletion .github/workflows/test-python.yaml
@@ -21,7 +21,17 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.11
+          python-version: '3.10'
+
+      - name: Install Katib SDK
+        shell: bash
+        run: pip install --prefer-binary -e sdk/python/v1beta1
+
+      - name: Install Training Operator SDK
+        shell: bash
+        run: |
+          pip install git+https://github.com/kubeflow/[email protected]#subdirectory=sdk/python
+          pip install peft==0.3.0 datasets==2.15.0 transformers==4.38.0

       - name: Run Python test
         run: make pytest
1 change: 1 addition & 0 deletions Makefile
@@ -172,6 +172,7 @@ pytest: prepare-pytest prepare-pytest-testdata
 	pytest ./test/unit/v1beta1/suggestion --ignore=./test/unit/v1beta1/suggestion/test_skopt_service.py
 	pytest ./test/unit/v1beta1/earlystopping
 	pytest ./test/unit/v1beta1/metricscollector
+	pytest ./test/unit/v1beta1/tune-api
 	cp ./pkg/apis/manager/v1beta1/python/api_pb2.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2.py
 	cp ./pkg/apis/manager/v1beta1/python/api_pb2_grpc.py ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
 	sed -i "s/api_pb2/kubeflow\.katib\.katib_api_pb2/g" ./sdk/python/v1beta1/kubeflow/katib/katib_api_pb2_grpc.py
18 changes: 15 additions & 3 deletions sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
@@ -416,6 +416,9 @@ class name in this argument.

         # If users choose to use a custom objective function.
         if objective is not None:
+            if not base_image or not parameters:
+                raise ValueError("One of the required parameters is None")
+
             # Add metrics collector to the Katib Experiment.
             # Up to now, we only support parameter `kind`, of which default value
             # is `StdOut`, to specify the kind of metrics collector.
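
The check added in this hunk can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical stand-in function `validate_custom_objective` that mirrors only the new guard; it is not the real `KatibClient.tune` signature, and the argument values are made up for illustration.

```python
def validate_custom_objective(objective, base_image, parameters):
    """Hypothetical stand-in that mirrors only the guard added in this hunk."""
    if objective is not None:
        # Same truthiness test as in the diff: None, "" and {} all fail it.
        if not base_image or not parameters:
            raise ValueError("One of the required parameters is None")
    return True


def raises_value_error(*args):
    """Return True if the stand-in rejects the given arguments."""
    try:
        validate_custom_objective(*args)
    except ValueError:
        return True
    return False


def objective(parameters):
    """Placeholder objective function; its body is irrelevant to the check."""
    return None


# Missing parameters with a custom objective: rejected.
missing_params = raises_value_error(objective, "example-image", None)
# No custom objective: the guard is skipped entirely.
no_objective = raises_value_error(None, None, None)
```

Because the guard uses `not`, an empty string for `base_image` or an empty dict for `parameters` is rejected as well, even though the error message only mentions `None`.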
@@ -633,6 +636,8 @@ class name in this argument.
                     model_provider_parameters.model_uri,
                     "--transformer_type",
                     model_provider_parameters.transformer_type.__name__,
+                    "--num_labels",
+                    str(model_provider_parameters.num_labels),
                     "--model_dir",
                     VOLUME_PATH_MODEL,
                     "--dataset_dir",
@@ -643,7 +648,11 @@ class name in this argument.
                     f"'{training_args}'",
                 ],
                 volume_mounts=[STORAGE_INITIALIZER_VOLUME_MOUNT],
-                resources=resources_per_trial.resources_per_worker,
+                resources=(
+                    resources_per_trial.resources_per_worker
+                    if resources_per_trial
+                    else None
+                ),
             )

             # Create the worker and the master pod.
@@ -677,7 +686,10 @@ class name in this argument.
                 ),
             )

-            if resources_per_trial.num_procs_per_worker:
+            if (
+                resources_per_trial is not None
+                and resources_per_trial.num_procs_per_worker
+            ):
                 pytorchjob.spec.nproc_per_node = str(
                     resources_per_trial.num_procs_per_worker
                 )
@@ -689,7 +701,7 @@ class name in this argument.
                 )
             )

-            if resources_per_trial.num_workers > 1:
+            if resources_per_trial is not None and resources_per_trial.num_workers > 1:
                 pytorchjob.spec.pytorch_replica_specs["Worker"] = (
                     training_models.KubeflowOrgV1ReplicaSpec(
                         replicas=resources_per_trial.num_workers - 1,
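
The three `resources_per_trial` changes in this file share one pattern: guard every attribute access so that passing `None` no longer raises `AttributeError`. A rough sketch of that pattern follows, assuming a hypothetical `TrialResources` dataclass and `plan_replicas` helper; neither exists in the SDK, and the returned dict is only an illustration of what the real code sets on the PyTorchJob spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialResources:
    """Hypothetical stand-in for the object passed as `resources_per_trial`."""

    num_workers: int = 1
    num_procs_per_worker: Optional[int] = None
    resources_per_worker: Optional[dict] = None


def plan_replicas(resources_per_trial: Optional[TrialResources]) -> dict:
    """Hypothetical helper mirroring the guarded branches in the diff."""
    plan = {
        "Master": 1,
        # Conditional expression, as in the container spec hunk.
        "resources": (
            resources_per_trial.resources_per_worker
            if resources_per_trial
            else None
        ),
    }
    # Short-circuit `and` keeps the attribute access from running on None.
    if (
        resources_per_trial is not None
        and resources_per_trial.num_procs_per_worker
    ):
        plan["nproc_per_node"] = str(resources_per_trial.num_procs_per_worker)
    if resources_per_trial is not None and resources_per_trial.num_workers > 1:
        # One replica is the master, so the workers are the remainder.
        plan["Worker"] = resources_per_trial.num_workers - 1
    return plan


without_resources = plan_replicas(None)
with_resources = plan_replicas(
    TrialResources(num_workers=3, num_procs_per_worker=2)
)
```

With `None`, only the master entry is produced and no worker replicas are added; with three workers and two processes per worker, the sketch yields two worker replicas and `nproc_per_node` set to the string `"2"`.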