Newbie Qs: RLHF fine-tuning & dataset #6587
Unanswered
vtharmalingam asked this question in Q&A
First off, thank you for the awesome library!!
I want to fine-tune Qwen with RLHF.
Here is the use case context: my LLM responds to user queries, and both the query and the response are tracked for human validation. The human feedback is given as a scalar value between 0 and 1. Together, these make up the dataset for fine-tuning the model.
So the question here is:
What is the accepted dataset format? Will the below format work for fine-tuning? Also, please shed some light on whether the dataset structure/format is flexible enough to add an additional key/value to the JSON for my domain/context needs. If so, which Python file or configuration do I need to edit to support the new field?
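As a sketch of what I have in mind (the field names here are my own guesses, not a format required by the library), each JSONL record would carry the query, the response, the scalar human feedback, plus any extra domain key:

```python
import json

# One feedback record (field names are illustrative assumptions):
#   "query"    - the user prompt sent to the model
#   "response" - the model's answer
#   "reward"   - human feedback as a scalar in [0, 1]
#   "domain"   - an extra domain-specific key, to test whether the
#                loader tolerates additional fields
record = {
    "query": "What is the capital of France?",
    "response": "The capital of France is Paris.",
    "reward": 0.9,
    "domain": "geography",
}

# Serialize as one JSON object per line (JSONL), a common dataset layout.
line = json.dumps(record)

# Round-trip to confirm the record survives serialization intact.
parsed = json.loads(line)
assert 0.0 <= parsed["reward"] <= 1.0
assert "\n" not in line  # each record stays on a single line
```

Would a file of such lines be loadable as-is, or does the reward key need a specific name?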
Thanks,
Tharma