From 49f5f02c377ee9e09fb9f2d0ae942bae05181beb Mon Sep 17 00:00:00 2001
From: Electronic-Waste <2690692950@qq.com>
Date: Wed, 19 Feb 2025 03:56:59 +0000
Subject: [PATCH] doc: update proposal & move FSDP config to alternatives.

Signed-off-by: Electronic-Waste <2690692950@qq.com>
---
 docs/proposals/2401-llm-trainer-v2/README.md | 72 ++++++++++----------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/docs/proposals/2401-llm-trainer-v2/README.md b/docs/proposals/2401-llm-trainer-v2/README.md
index fde19eb7c1..5da6f1757d 100644
--- a/docs/proposals/2401-llm-trainer-v2/README.md
+++ b/docs/proposals/2401-llm-trainer-v2/README.md
@@ -41,11 +41,9 @@ By now, Kubeflow Training V1 has implemented a [Trainer for LLM](../2003-train-a
 `torchtune` is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs. It provides rich support for LLM fine-tuning:
 
 1. Modular native-PyTorch implementations of popular LLMs
-2. Training recipes for a variety of fine-tuning techniques
+2. Training [recipes](https://pytorch.org/torchtune/main/overview.html#key-concepts) for a variety of fine-tuning techniques
 3. Support for distributed training using [FSDP2](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md)
-4. YAML configs for easily configuring training runs
-
-`torchtune` is something like our LLM Trainer, because its [core concepts](https://pytorch.org/torchtune/main/overview.html#key-concepts) "recipes" and "configs" can be easily corresponded to our “LLM Trainer Script” and “[Trainer field in TrainJob](https://github.com/kubeflow/training-operator/blob/cf741267f8f8ec96592178532b6787bab3f11110/pkg/apis/kubeflow.org/v2alpha1/trainjob_types.go#L110-L111)”. **It’s the easiest way for us to implement the LLM Trainer**.
+4. YAML [configs](https://pytorch.org/torchtune/main/overview.html#key-concepts) for easily configuring training runs
 
 An example for using `torchtune`:
 
@@ -73,6 +71,8 @@ INFO:torchtune.utils.logging:Learning rate scheduler is initialized.
 1|52|Loss: 2.3697006702423096: 0%|▏ | 52/25880 [00:24<3:55:01, 1.83it/s]
 ```
 
+By adopting `torchtune` as the low-level runtime for LLM fine-tuning, we gain the flexibility, efficiency, and scalability of its "recipe-config" design, which streamlines and scales LLM fine-tuning on Kubernetes.
+
 ## Design Details
 
 ### `torchtune` Plugin
@@ -169,38 +169,6 @@ class QLoraConfig:
 
 ```
 
-#### FSDP Config
-
-The *FsdpConfig* represents the config of FSDP we use to fine-tune the model.
-
-| Parameters | What is it? |
-| - | - |
-| mixed_precision | Whether to enable mixed precision training |
-| use_fp16 | Whether to use FP16 during the mixed precision training |
-| fsdp_cpu_offload | Whether to offload some weights and optimizer states to cpu |
-| sharding_strategy | The sharding strategy for FSDP, e.g. FULL_SHARD (default), HYBRID_SHARD, SHARD_GRAD_OP, NO_SHARD. |
-| hsdp | Whether to enable Hybrid Shard Data Parallel (HSDP) |
-| sharding_group_size | Specify the GPU num in the sharding group when hsdp set to true |
-| replica_group_size | The number of sharding groups |
-| checkpoint_type | Specify the type of model checkpoints |
-| fsdp_activation_checkpointing | Whether to enable Activation Checkpointing |
-
-```python
-# FsdpConfig DataClass
-@dataclass
-class FsdpConfig:
-    mixed_precision: bool = True
-    use_fp16: bool = False
-    fsdp_cpu_offload: bool=False
-    sharding_strategy: ShardingStrategy = ShardingStrategy.FULL_SHARD
-    hsdp: bool = False
-    sharding_group_size: int = 0 # requires hsdp to be set.
-    replica_group_size: int = 0 #requires hsdp to be set.
-    checkpoint_type: StateDictType = StateDictType.SHARDED_STATE_DICT
-    fsdp_activation_checkpointing: bool = True
-
-```
-
 ## Implementation History
 
 - 2025-01-31: Create KEP-2401 doc
@@ -466,6 +434,38 @@ class PrefixConfig:
 
 ```
 
+**FSDP Config(TBD)**
+
+The *FsdpConfig* represents the FSDP configuration we use to fine-tune the model.
+
+| Parameters | What is it? |
+| - | - |
+| mixed_precision | Whether to enable mixed precision training |
+| use_fp16 | Whether to use FP16 during mixed precision training |
+| fsdp_cpu_offload | Whether to offload some weights and optimizer states to CPU |
+| sharding_strategy | The sharding strategy for FSDP, e.g. FULL_SHARD (default), HYBRID_SHARD, SHARD_GRAD_OP, NO_SHARD. |
+| hsdp | Whether to enable Hybrid Shard Data Parallel (HSDP) |
+| sharding_group_size | The number of GPUs in each sharding group (requires hsdp to be true) |
+| replica_group_size | The number of sharding groups (requires hsdp to be true) |
+| checkpoint_type | The type of model checkpoint to save |
+| fsdp_activation_checkpointing | Whether to enable activation checkpointing |
+
+```python
+# FsdpConfig DataClass
+@dataclass
+class FsdpConfig:
+    mixed_precision: bool = True
+    use_fp16: bool = False
+    fsdp_cpu_offload: bool = False
+    sharding_strategy: ShardingStrategy = ShardingStrategy.FULL_SHARD
+    hsdp: bool = False
+    sharding_group_size: int = 0  # requires hsdp to be set to true.
+    replica_group_size: int = 0  # requires hsdp to be set to true.
+    checkpoint_type: StateDictType = StateDictType.SHARDED_STATE_DICT
+    fsdp_activation_checkpointing: bool = True
+
+```
+
 **ZeRO Config(TBD)**
 
 The *ZeroConfig* represents the config of DeepSpeed ZeRO we use to fine-tune the model.
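The `FsdpConfig` snippet in the patch omits its imports. The sketch below is illustrative only: the dataclass body is copied from the patch, `ShardingStrategy` and `StateDictType` are imported from `torch.distributed.fsdp`, and the `to_overrides` helper plus its `key=value` override format are assumptions for demonstration, not part of the proposal or of `torchtune`.

```python
# Illustrative sketch only (not part of the patch): FsdpConfig with the imports it
# needs, plus a hypothetical helper that flattens it into key=value overrides.
from dataclasses import dataclass, asdict
from enum import Enum

from torch.distributed.fsdp import ShardingStrategy, StateDictType


@dataclass
class FsdpConfig:
    mixed_precision: bool = True
    use_fp16: bool = False
    fsdp_cpu_offload: bool = False
    sharding_strategy: ShardingStrategy = ShardingStrategy.FULL_SHARD
    hsdp: bool = False
    sharding_group_size: int = 0  # requires hsdp to be set to true.
    replica_group_size: int = 0  # requires hsdp to be set to true.
    checkpoint_type: StateDictType = StateDictType.SHARDED_STATE_DICT
    fsdp_activation_checkpointing: bool = True


def to_overrides(cfg: FsdpConfig) -> list[str]:
    """Flatten the config into `key=value` strings (hypothetical override format)."""
    overrides = []
    for key, value in asdict(cfg).items():
        # Enum members are rendered by name, e.g. FULL_SHARD, SHARDED_STATE_DICT.
        if isinstance(value, Enum):
            value = value.name
        overrides.append(f"{key}={value}")
    return overrides


if __name__ == "__main__":
    # Example: HSDP with 8-GPU sharding groups replicated across 2 groups.
    print(to_overrides(FsdpConfig(hsdp=True, sharding_group_size=8, replica_group_size=2)))
```

Under those assumptions, a flattening step like this is one possible way to map the dataclass onto config-style overrides once the FSDP config lands in the alternatives section.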