katib: [USERGUIDE] LLM Hyperparameter Optimization API #3952

Open
mahdikhashan wants to merge 58 commits into master

Conversation

mahdikhashan
Member

ref: #3951


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Hi @mahdikhashan. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mahdikhashan
Member Author

mahdikhashan commented Jan 7, 2025

Hi @andreyvelich, shall I keep it under user-guides/hp-tuning/?

@andreyvelich
Member

Sure, I think we can create a new page for this feature.
FYI, please follow the contribution guide to sign the commits: https://www.kubeflow.org/docs/about/contributing/#getting-started
cc @helenxie-bit

@andreyvelich
Member

Part of: kubeflow/katib#2339

@Arhell
Member

/ok-to-test


@helenxie-bit: GitHub didn't allow me to request PR reviews from the following users: kubeflow/wg-automl-leads.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Thanks for the great contribution!

/lgtm

/cc @kubeflow/wg-automl-leads @andreyvelich @Arhell

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow bot added the lgtm label Jan 18, 2025
@andreyvelich
Member

@varodrig @pdarshane @rimolive @hbelmiro Please help with reviewing this documentation page for LLM HP Tuning API.

google-oss-prow bot removed the lgtm label Jan 27, 2025
@helenxie-bit
Contributor

Thanks for the update!

/lgtm

or the [Kubeflow Katib GitHub](https://github.com/kubeflow/katib/issues).
{{% /alert %}}

This page describes Large Language Models hyperparameter (HP) optimization Python API that Katib supports and how to configure
Contributor

describes how to implement Hyperparameter optimization (HPO) using Python API ...

Member Author

done. thank you.

+++

{{% alert title="Warning" color="warning" %}}
This feature is in **alpha** stage and the Kubeflow community is looking for your feedback. Please
Contributor

Each web page has a feedback button at the bottom for users to add their feedback and create an issue if needed.
cc @andreyvelich

Member

We explicitly added this warning for this guide, since this feature might be unstable, and we want to hear user feedback.

@@ -0,0 +1,351 @@
+++
title = "How to Optimize Hyperparameters of LLMs with Kubeflow"
Contributor

suggestion:

**How to implement Hyperparameter optimization (HPO)**

@andreyvelich to add comments on this.

Member

Should we keep this name:

How to Optimize Hyperparameters for LLMs Fine-Tuning with Kubeflow

Member Author

done.

- [Optimizing Hyperparameters of Large Language Models](#optimizing-hyperparameters-of-large-language-models)
- [Example: Optimizing Hyperparameters of Llama-3.2 for Binary Classification on IMDB Dataset](#example-optimizing-hyperparameters-of-llama-32-for-binary-classification-on-imdb-dataset)

## Prerequisites
Contributor

Thanks for including the prerequisites. I'm wondering if these prerequisites should be applied to all of docs/components/katib/user-guides/hp-tuning/ and, in that case, whether they should be listed on this page.

Member Author

I'm not sure - I checked some of the other similar docs under Katib, and I'd say for them it may not make sense.

Member

Usually, we don't need it since these Prerequisites are explained in the Getting Started guide.

@@ -0,0 +1,351 @@
+++
title = "How to Optimize Hyperparameters of LLMs with Kubeflow"
description = "API description"
Contributor

The description could include more information about this page.
Additionally, it would be great to have a short paragraph explaining more about this topic, what we are trying to achieve and why, and to include a reference to this topic for the audience to learn more about it.

Member Author

yes, you are right - I'll extend it. thanks for reminding me of this.

Member Author

done.

| `parallel_trial_count` | Number of trials to run in parallel, set to `2`. |
| `resources_per_trial` | Resources allocated for each trial: 2 GPUs, 4 CPUs, 10GB memory. |

```python
Contributor

@mahdikhashan if you haven't tested the code yet, we should mark this PR as hold. Please let us know. Thank you.

@mahdikhashan
Member Author

@varodrig thanks for your time and help with reviewing it - I'll address your requested changes.
Regarding the code, we have a notebook (nb) example that we (@helenxie-bit and I) are collaborating on together.

nb example issue: kubeflow/katib#2480

There is an in-progress PR related to this (regarding e2e tests; it's not specifically about this, but I have held off to incorporate the latest possible changes).

google-oss-prow bot removed the lgtm label Feb 12, 2025

New changes are detected. LGTM label has been removed.

@andreyvelich changed the title from "[USERGUIDE] LLM Hyperparameter Optimization API" to "katib: [USERGUIDE] LLM Hyperparameter Optimization API" on Feb 13, 2025
@andreyvelich
Member

Thank you for this effort @mahdikhashan!
I left a few comments.

@@ -0,0 +1,351 @@
+++
title = "How to Optimize Hyperparameters of LLMs with Kubeflow"
Member

Should we keep this name:

How to Optimize Hyperparameters for LLMs Fine-Tuning with Kubeflow

@@ -0,0 +1,351 @@
+++
Member

I would keep this guide under /user-guides/llm-hp-optimization.md for now for more visibility.
WDYT @mahdikhashan @helenxie-bit @Electronic-Waste?

Member Author

agreed. done.

+++

{{% alert title="Warning" color="warning" %}}
This feature is in **alpha** stage and the Kubeflow community is looking for your feedback. Please
Member

We explicitly added this warning for this guide, since this feature might be unstable, and we want to hear user feedback.

Comment on lines 13 to 14
This page describes how to implement Hyperparameter Optimization (HPO) using Python API that Katib supports and how to configure
it.
Member

Modify this message to say that this page describes how to optimize HPs in the process of LLM fine-tuning.

Member Author

done.

This page describes how to implement Hyperparameter Optimization (HPO) using Python API that Katib supports and how to configure
it.

## Sections
Member

We can remove this Sections list, since the website has an outline in the right panel.

Member Author

done.

)
```

#### HuggingFaceModelParams
Member

Can we move these sections to the Training Operator doc and cross-reference it from this doc?
https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/fine-tuning/
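
For context in this thread, here is a minimal sketch of what the HuggingFaceModelParams section covers. The import path and field values are assumptions based on the Training Operator (kubeflow-training) SDK, not the final documented API:

```python
# Hedged sketch: import path and fields are assumptions based on the
# kubeflow-training storage initializer and may differ from the final docs.
import transformers
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceDatasetParams,
    HuggingFaceModelParams,
)

# Model to fine-tune, pulled from the Hugging Face Hub
# (gated models such as Llama also need a Hugging Face access token).
model_params = HuggingFaceModelParams(
    model_uri="hf://meta-llama/Llama-3.2-1B",
    transformer_type=transformers.AutoModelForSequenceClassification,
)

# Dataset used by the fine-tuning trials, IMDB as in the example section.
dataset_params = HuggingFaceDatasetParams(
    repo_id="stanfordnlp/imdb",
    split="train[:1000]",
)
```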


### Key Parameters for LLM Hyperparameter Tuning

| **Parameter** | **Description** | **Required** |
Member

Not all of these parameters should be used for LLMs.
Please exclude the ones that can't be used with the LLM Trainer (e.g. objective).

secret_key="YOUR_SECRET_KEY"
)
```
## Optimizing Hyperparameters of Large Language Models
Member

We should clearly say that right now the user can tune parameters from training_parameters and lora_config.
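
For readers of this thread, a minimal sketch of what "tunable from training_parameters and lora_config" could look like. The class names and import paths are assumptions based on the Training Operator SDK and Katib's search API, not the final documented interface:

```python
# Hedged sketch: Katib search spaces embedded in the HuggingFace training
# arguments and the PEFT LoRA config. Import paths are assumptions.
import kubeflow.katib as katib
import transformers
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

trainer_parameters = HuggingFaceTrainerParams(
    # Hyperparameters searched inside transformers.TrainingArguments.
    training_parameters=transformers.TrainingArguments(
        output_dir="results",
        learning_rate=katib.search.double(min=1e-05, max=5e-05),
        num_train_epochs=3,
    ),
    # Hyperparameters searched inside the LoRA adapter config.
    lora_config=LoraConfig(
        r=katib.search.int(min=8, max=32),
        lora_alpha=8,
        lora_dropout=0.1,
    ),
)
```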

algorithm_name = "random",
max_trial_count = 10,
parallel_trial_count = 2,
resources_per_trial={
Member

I guess we should use TrainerResource here, shouldn't we, @mahdikhashan @helenxie-bit?
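
For context, a hedged sketch of passing per-trial resources through a TrainerResources-style object instead of a plain dict. The class name, import path, and field names are assumptions and may not match the released SDK:

```python
# Hedged sketch: check the Katib SDK for the exact TrainerResources definition;
# the import path and fields below are assumptions.
from kubeflow.katib.types import TrainerResources

resources_per_trial = TrainerResources(
    num_workers=1,                # training workers launched per trial
    num_procs_per_worker=1,       # processes (e.g. GPUs) per worker
    resources_per_worker={"gpu": 2, "cpu": 4, "memory": "10G"},
)
```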

cl.wait_for_experiment_condition(name=exp_name)

# Get the best hyperparameters.
print(cl.get_optimal_hyperparameters(exp_name))
Member

We need to show output for the Experiment here.
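
For context, a hedged sketch of one way the Experiment result could be surfaced in that section. The client methods are from the Katib Python SDK; the Experiment name is hypothetical and the printed output depends on the actual cluster run, so no sample output is shown here:

```python
# Hedged sketch: inspect the finished Experiment instead of only printing
# the optimal hyperparameters. Output depends on the actual run.
from kubeflow.katib import KatibClient

exp_name = "llm-hp-tuning"  # hypothetical Experiment name
cl = KatibClient(namespace="kubeflow")

# Best hyperparameter assignments found across all trials.
print(cl.get_optimal_hyperparameters(exp_name))

# The Experiment object also carries trial counts and conditions,
# which is useful when showing expected output in the guide.
experiment = cl.get_experiment(name=exp_name)
print(experiment.status)
```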
