Newbie Qs: RLHF fine-tuning & dataset #6587
Unanswered
vtharmalingam asked this question in Q&A
First off, thank you for the awesome library!!
I want to fine-tune Qwen with RLHF.
Here is the use case context: my LLM responds to user queries, and both the query and the response are tracked for human validation. The human feedback is given as a scalar value between 0 and 1. Together, these make up the dataset for fine-tuning the model.
So the question here is:
What is the accepted dataset format? Will the below format work for fine-tuning? Also, please shed some light on whether the dataset structure/format is flexible enough to add an additional key/value to the JSON for my domain/context needs. If so, which Python file or configuration do I need to edit to support the new field?
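As a sketch of what I have in mind (the field names here are my own guesses, not a format required by the library), each JSONL record would carry the query, the response, the scalar human feedback, plus any extra domain key:

```python
import json

# One feedback record (field names are illustrative assumptions):
#   "query"    - the user prompt sent to the model
#   "response" - the model's answer
#   "reward"   - human feedback as a scalar in [0, 1]
#   "domain"   - an extra domain-specific key, to test whether the
#                loader tolerates additional fields
record = {
    "query": "What is the capital of France?",
    "response": "The capital of France is Paris.",
    "reward": 0.9,
    "domain": "geography",
}

# Serialize as one JSON object per line (JSONL), a common dataset layout.
line = json.dumps(record)

# Round-trip to confirm the record survives serialization intact.
parsed = json.loads(line)
assert 0.0 <= parsed["reward"] <= 1.0
assert "\n" not in line  # each record stays on a single line
```

Would a file of such lines be loadable as-is, or does the reward key need a specific name?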
Thanks,
Tharma