katib: [USERGUIDE] LLM Hyperparameter Optimization API #3952
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @mahdikhashan. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
hi @andreyvelich, shall I keep it under …?
Sure, I think we can create a new page for this feature.
Part of: kubeflow/katib#2339
/ok-to-test
@helenxie-bit: GitHub didn't allow me to request PR reviews from the following users: kubeflow/wg-automl-leads. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@varodrig @pdarshane @rimolive @hbelmiro Please help with reviewing this documentation page for LLM HP Tuning API.
content/en/docs/components/katib/user-guides/hp-tuning/llm-hp-optimization.md
Thanks for the update! /lgtm
or the [Kubeflow Katib GitHub](https://github.com/kubeflow/katib/issues).
{{% /alert %}}

This page describes Large Language Models hyperparameter (HP) optimization Python API that Katib supports and how to configure
describes how to implement Hyperparameter optimization (HPO) using Python API ...
done. thank you.
+++

{{% alert title="Warning" color="warning" %}}
This feature is in **alpha** stage and the Kubeflow community is looking for your feedback. Please
each web page has a feedback button at the bottom for users to add their feedback and create an issue if needed.
cc @andreyvelich
We explicitly added this warning for this guide, since this feature might be unstable, and we want to hear user feedback.
@@ -0,0 +1,351 @@
+++
title = "How to Optimize Hyperparameters of LLMs with Kubeflow"
suggestion:
**How to implement Hyperparameter optimization (HPO)**
@andreyvelich to add comments on this.
Should we keep this name:
How to Optimize Hyperparameters for LLMs Fine-Tuning with Kubeflow
done.
- [Optimizing Hyperparameters of Large Language Models](#optimizing-hyperparameters-of-large-language-models)
- [Example: Optimizing Hyperparameters of Llama-3.2 for Binary Classification on IMDB Dataset](#example-optimizing-hyperparameters-of-llama-32-for-binary-classification-on-imdb-dataset)

## Prerequisites
thanks for including the prerequisites. I'm wondering if these prerequisites should apply to all of the docs/components/katib/user-guides/hp-tuning/ pages and, in that case, whether they should be listed on this page.
I'm not sure - I checked some of the other similar docs under Katib, and I'd say for them it may not make sense.
Usually, we don't need it, since these prerequisites are explained in the Getting Started guide.
@@ -0,0 +1,351 @@
+++
title = "How to Optimize Hyperparameters of LLMs with Kubeflow"
description = "API description"
The description could include more information about this page.
Additionally, it would be great to have a short paragraph explaining more about this topic, what we are trying to achieve, and why. And include a reference to this topic for the audience to learn more about it.
yes, you are right - I'll extend it. thanks for reminding me of this.
done.
| `parallel_trial_count` | Number of trials to run in parallel, set to `2`. |
| `resources_per_trial` | Resources allocated for each trial: 2 GPUs, 4 CPUs, 10GB memory. |
@mahdikhashan if you haven't tested the code yet, we should mark this PR as hold. Please let us know. Thank you.
@varodrig thanks for your time and help with reviewing it - I'll address your requested changes. NB example issue: kubeflow/katib#2480. There is an in-progress PR related to it (regarding e2e tests; it's not specifically about this change, but I have held off to incorporate the latest possible changes).
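For readers following this thread, here is a minimal sketch of how the two settings from the quoted table could be passed to Katib's `tune()` API. The keyword names follow the Katib Python SDK's LLM tuning signature, and `model_params`, `dataset_params`, and `trainer_params` are hypothetical placeholders for the objects discussed later in this conversation; treat it as a sketch, not the final documented API.

```python
import kubeflow.katib as katib

cl = katib.KatibClient(namespace="kubeflow")

# Sketch only: model_params, dataset_params, and trainer_params are
# hypothetical placeholders defined elsewhere; keyword names follow the
# Katib SDK's LLM tune() signature and may change while this feature is alpha.
cl.tune(
    name="llm-hp-tuning",
    model_provider_parameters=model_params,
    dataset_provider_parameters=dataset_params,
    trainer_parameters=trainer_params,
    algorithm_name="random",
    max_trial_count=10,
    parallel_trial_count=2,                                     # from the quoted table
    resources_per_trial={"gpu": 2, "cpu": 4, "memory": "10G"},  # 2 GPUs, 4 CPUs, 10GB
)
```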
New changes are detected. LGTM label has been removed.
Thank you for this effort @mahdikhashan!
I left a few comments.
@@ -0,0 +1,351 @@
+++
I would keep this guide under /user-guides/llm-hp-optimization.md for now for more visibility.
WDYT @mahdikhashan @helenxie-bit @Electronic-Waste?
agreed. done.
This page describes how to implement Hyperparameter Optimization (HPO) using Python API that Katib supports and how to configure
it.
Modify this message to say that this page describes how to optimize HPs in the process of LLMs Fine-Tuning.
done.
This page describes how to implement Hyperparameter Optimization (HPO) using Python API that Katib supports and how to configure
it.

## Sections
We can remove this Sections list, since the website has an outline in the right panel.
done.
)
```

#### HuggingFaceModelParams
Can we move these sections to the Training Operator doc and cross-reference it from this doc?
https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/fine-tuning/
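To make the cross-reference concrete, here is a minimal sketch of the model and dataset parameters under discussion, assuming the import path used by the Training Operator fine-tuning guide (`kubeflow.storage_initializer.hugging_face`); the Llama-3.2 model URI and IMDB repo id are illustrative values mirroring the example this guide describes.

```python
import transformers
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceDatasetParams,
    HuggingFaceModelParams,
)

# Illustrative values mirroring the Llama-3.2 / IMDB example in this guide.
model_params = HuggingFaceModelParams(
    model_uri="hf://meta-llama/Llama-3.2-1B",  # base model to fine-tune
    transformer_type=transformers.AutoModelForSequenceClassification,
)

dataset_params = HuggingFaceDatasetParams(
    repo_id="stanfordnlp/imdb",  # IMDB dataset for binary classification
)
```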
### Key Parameters for LLM Hyperparameter Tuning

| **Parameter** | **Description** | **Required** |
Not all of these parameters should be used for LLMs.
Please exclude the ones that can't be used with the LLM Trainer (e.g. objective).
    secret_key="YOUR_SECRET_KEY"
)
```

## Optimizing Hyperparameters of Large Language Models
We should clearly say that right now the user can tune parameters from training_parameters and lora_config.
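A hedged sketch of what that could look like: search spaces are declared with `katib.search` only inside `training_parameters` and `lora_config`, assuming the `HuggingFaceTrainerParams` type from the Training Operator SDK and `peft.LoraConfig`; the exact values are illustrative.

```python
import kubeflow.katib as katib
import transformers
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams
from peft import LoraConfig

# Per the comment above, only fields inside training_parameters and
# lora_config are tunable right now; katib.search declares the search space.
trainer_params = HuggingFaceTrainerParams(
    training_parameters=transformers.TrainingArguments(
        output_dir="results",
        learning_rate=katib.search.double(min=1e-05, max=5e-04),
        per_device_train_batch_size=katib.search.categorical([8, 16, 32]),
        num_train_epochs=2,
    ),
    lora_config=LoraConfig(
        r=katib.search.int(min=8, max=32),  # LoRA rank as a tunable integer
        lora_alpha=16,
        lora_dropout=0.1,
    ),
)
```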
    algorithm_name = "random",
    max_trial_count = 10,
    parallel_trial_count = 2,
    resources_per_trial={
I guess we should use TrainerResources here, shouldn't we @mahdikhashan @helenxie-bit?
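If that suggestion is adopted, the plain dict above would become something like the following sketch; the `TrainerResources` field names are assumptions based on the Katib SDK and may change while the feature is alpha.

```python
import kubeflow.katib as katib

# Assumed field names: a TrainerResources object makes worker count and
# per-worker resources explicit, instead of a single flat dict per trial.
resources = katib.TrainerResources(
    num_workers=1,
    num_procs_per_worker=1,
    resources_per_worker={"gpu": 2, "cpu": 4, "memory": "10G"},
)

# It would then be passed as: cl.tune(..., resources_per_trial=resources)
```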
cl.wait_for_experiment_condition(name=exp_name)

# Get the best hyperparameters.
print(cl.get_optimal_hyperparameters(exp_name))
We need to show output for the Experiment here.
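Until the guide adds real output, here is a hedged sketch of inspecting the result, assuming `get_optimal_hyperparameters` returns the SDK's `V1beta1OptimalTrial` (with `parameter_assignments` and `observation.metrics`) and that `cl` and `exp_name` come from the quoted snippet; attribute names may differ by release.

```python
# Assumes cl (KatibClient) and exp_name from the snippet above, and that the
# return type is V1beta1OptimalTrial; attribute names may change while alpha.
best = cl.get_optimal_hyperparameters(exp_name)

for assignment in best.parameter_assignments:
    print(f"{assignment.name} = {assignment.value}")  # best hyperparameter values

for metric in best.observation.metrics:
    print(f"{metric.name}: {metric.latest}")  # e.g. final train loss
```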
ref: #3951