- This is a fork of RLHF-Reward-Modeling
- Supports models that can handle Japanese
- Supports Unsloth, which reduces VRAM usage during training and improves training efficiency
- Supports wandb logging
| model | support |
|---|---|
| google/gemma-2b-it | ✅ |
| llm-jp/llm-jp-3-1.8b-instruct | ✅ |
| dataset | support |
|---|---|
| hendrydong/preference_700K | ✅ |
| xxxx | - |
```shell
# Clone the fork
git clone https://github.com/ohashi3399/RLHF-Reward-Modeling.git && cd RLHF-Reward-Modeling

# Set API tokens
export HUGGINGFACE_API_KEY=<Your HUGGINGFACE_API token>
export WANDB_API_KEY=<Your WANDB_API token>

# Install dependencies, then move to the Bradley-Terry reward-model directory
source setup.sh && cd bradley-terry-rm

# Launch training
source tune_bt_rm.sh
```
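The `bradley-terry-rm` directory and `tune_bt_rm.sh` script refer to Bradley-Terry reward modeling, which trains the model to score a chosen response above a rejected one by minimizing `-log σ(r_chosen − r_rejected)` over preference pairs. A minimal sketch of that objective in plain Python (the function name is illustrative, not from this repo):

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so the loss is
    -log(sigmoid(margin)), which shrinks as the margin grows.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (chosen scored higher) gives a small loss,
# inverted ranking gives a large one.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # → 0.1269
print(round(bradley_terry_loss(0.0, 2.0), 4))  # → 2.1269
```

During training, `r_chosen` and `r_rejected` come from the reward head of the base model (e.g. google/gemma-2b-it) evaluated on each side of a preference pair from the dataset.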