TRC Welcome Page

Welcome to the TRC platform! Let's get started by setting up eopod and running some fine-tuning examples.

Note:

EasyDeL is more than just a collection of scripts. It was designed to be both hackable and performant. To unlock the full power of EasyDeL—its performance, speed, and flexibility—we encourage you to write your own code, scripts, or even create customized models and runtime environments. This approach allows you to tailor EasyDeL to your specific needs and fully leverage its capabilities.

Installation and Configuration

First, install eopod using pip:

pip install eopod

Note:

If you see an error such as "eopod: command not found", run the following command to add pip's user install directory to your PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
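As a quick optional check, you can confirm the shell now resolves the executable (which is a standard shell utility and assumes nothing about eopod itself):

which eopod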

Next, configure eopod with your project details:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME
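For example, with placeholder values (the project ID, zone, and TPU name below are hypothetical; substitute your own):

eopod configure --project-id my-gcp-project --zone us-central2-b --tpu-name my-tpu-pod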

Install the necessary dependencies and the latest EasyDeL release:

eopod run pip install tensorflow tensorflow-datasets  # Required for training
eopod run pip install torch --index-url https://download.pytorch.org/whl/cpu  # Required for model conversion
eopod run pip install easydel
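Optionally, before launching a job, you can verify that JAX (installed as an EasyDeL dependency) can see the TPU devices. This is only a sanity check and assumes the standard JAX TPU runtime on the VM:

eopod run "python -c 'import jax; print(jax.devices())'"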

Log in to your Weights & Biases and Hugging Face accounts:

eopod run "python -c 'from huggingface_hub import login; login(token=\"<API-TOKEN-HERE>\")'"
eopod run python -m wandb login <API-TOKEN-HERE>
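To confirm the Hugging Face token was saved correctly, you can optionally query your account (whoami is part of huggingface_hub):

eopod run "python -c 'from huggingface_hub import whoami; print(whoami())'"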

Notes

Each of the following examples includes customizable parameters. To explore available options, use the --help flag. For example:

python -m easydel.scripts.finetune.dpo --help
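The same pattern works for every script referenced below; for example, to list the SFT options on the TPU VM:

eopod run python -m easydel.scripts.finetune.sft --help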

Fine-Tuning Examples

1. DPO Fine-Tuning

eopod run python -m easydel.scripts.finetune.dpo \
  --repo_id meta-llama/Llama-3.1-8B-Instruct \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --dataset_split "train[:90%]" \
  --refrence_model_repo_id meta-llama/Llama-3.3-70B-Instruct \
  --attn_mechanism auto \
  --beta 0.08 \
  --loss_type sigmoid \
  --max_length 2048 \
  --max_prompt_length 1024 \
  --ref_model_sync_steps 128 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type tqdm \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb

2. ORPO Fine-Tuning

eopod run python -m easydel.scripts.finetune.orpo \
  --repo_id meta-llama/Llama-3.1-8B-Instruct \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --dataset_split "train" \
  --attn_mechanism auto \
  --beta 0.12 \
  --max_length 2048 \
  --max_prompt_length 1024 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type json \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb

3. Supervised Fine-Tuning (SFT)

eopod run python -m easydel.scripts.finetune.sft \
  --repo_id Qwen/Qwen2.5-VL-72B-Instruct \
  --dataset_name trl-lib/Capybara \
  --dataset_split "train" \
  --dataset_text_field messages \
  --sharding_axis 1,-1,1,1 \
  --attn_mechanism auto \
  --max_sequence_length 2048 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type json \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb

4. GRPO GSM8K-OAI Fine-Tuning

eopod run python -m easydel.scripts.finetune.gsm8k_grpo \
  --repo_id meta-llama/Llama-3.1-8B-Instruct \
  --attn_mechanism auto \
  --sharding_axis 1,1,1,-1 \
  --max_prompt_length 2048 \
  --max_completion_length 1024 \
  --beta 0.04 \
  --top_p 0.95 \
  --top_k 50 \
  --num_return_sequences 4 \
  --xml_reward 0.125 \
  --xml_full_match_reward 0.5 \
  --xml_full_match_reject 0.0 \
  --correctness_reward 2.0 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type tqdm \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb \
  --kv-cache-quantization 8bit

5. Reward Model Training

eopod run python -m easydel.scripts.finetune.reward \
  --repo_id meta-llama/Llama-3.1-8B-Instruct \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --dataset_split "train" \
  --attn_mechanism vanilla \
  --max_sequence_length 2048 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type json \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb

6. NuminaMath GRPO

eopod run python -m easydel.scripts.finetune.numinamath_grpo \
  --repo_id meta-llama/Llama-3.1-8B-Instruct \
  --attn_mechanism auto \
  --sharding_axis 1,1,1,-1 \
  --max_prompt_length 2048 \
  --max_completion_length 1024 \
  --beta 0.04 \
  --top_p 0.95 \
  --top_k 50 \
  --num_return_sequences 4 \
  --total_batch_size 16 \
  --learning_rate 1e-6 \
  --learning_rate_end 6e-7 \
  --log_steps 50 \
  --shuffle_train_dataset \
  --report_steps 1 \
  --progress_bar_type tqdm \
  --num_train_epochs 3 \
  --auto_shard_states \
  --optimizer adamw \
  --scheduler linear \
  --do_last_save \
  --save_steps 1000 \
  --use_wandb \
  --kv-cache-quantization 8bit