RLHF. Personalization and Reinforcement Learning. #679

Ph0rk0z · 2023-03-31T12:26:27Z


So what are some ideas for incorporating reinforcement learning? Are there any projects that look promising and could support thumbs up/thumbs down ratings on responses, so you can personalize your LLM to you?

I found some stuff such as this: https://github.com/allenai/RL4LMs and TRL (used like this: https://huggingface.co/blog/trl-peft).

Perhaps just collecting the rating data from the chat and then using the "good" generations to train a LoRA might be a better method? Especially since 4-bit LoRAs are working, and the LLaMA repo for them can be used to load (and probably train) more than just LLaMA with a bit of modification.
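To make the "collect ratings, keep the good ones" idea concrete, here is a rough sketch of what the data side might look like. Everything in it is hypothetical: the `RatedTurn` record, the JSONL log format, and the `log_rating` / `load_good_examples` helpers are my own placeholders, not part of any existing UI or library, and the actual LoRA training step is left out.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class RatedTurn:
    """One chat exchange plus the user's rating (hypothetical record layout)."""
    prompt: str
    response: str
    rating: int  # +1 for thumbs up, -1 for thumbs down


def log_rating(path: Path, turn: RatedTurn) -> None:
    """Append one rated exchange to a JSONL log file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(turn)) + "\n")


def load_good_examples(path: Path) -> List[Dict[str, str]]:
    """Return only the thumbs-up exchanges as prompt/completion pairs."""
    examples = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            if record["rating"] > 0:
                examples.append(
                    {"prompt": record["prompt"], "completion": record["response"]}
                )
    return examples
```

The pairs returned by `load_good_examples` could then be fed to whatever LoRA fine-tuning script you already use. Note that this simply throws away the thumbs-down responses, whereas an RL-style approach would presumably use them as a negative signal.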

Good idea? Bad idea?

Replies: 0 comments
