So what are some ideas for incorporating reinforcement learning? Are there any promising projects that would let you give responses a thumbs up/thumbs down and use that to personalize your LLM to you?
I found some stuff such as RL4LMs (https://github.com/allenai/RL4LMs) and TRL (used like this: https://huggingface.co/blog/trl-peft).
Perhaps just collecting the rating data from the chat and then using the "good" generations to train a LoRA would be a better method? Especially since 4-bit LoRAs are working, and the LLaMA repo for them can be used to load (and probably train) more than just LLaMA with a bit of modification.
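To make the "collect ratings, keep the good ones" idea concrete, here is a rough sketch of what the data side might look like. It only uses the Python standard library. The file name `ratings.jsonl`, the record fields, and the function names `log_rating` and `load_good_examples` are all made up for illustration and are not part of any existing UI or repo. The actual LoRA training step is left out, since it depends on which trainer you use.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RatedExchange:
    """One chat turn plus the user's rating (hypothetical schema)."""
    prompt: str
    response: str
    rating: int  # +1 for thumbs up, -1 for thumbs down


def log_rating(path, prompt, response, thumbs_up):
    """Append one rated exchange to a JSONL file."""
    record = RatedExchange(prompt, response, 1 if thumbs_up else -1)
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_good_examples(path):
    """Return only the thumbs-up exchanges as prompt/completion pairs."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            if rec["rating"] > 0:
                examples.append(
                    {"prompt": rec["prompt"], "completion": rec["response"]}
                )
    return examples
```

The chat UI would call `log_rating(...)` whenever a thumbs button is clicked, and `load_good_examples("ratings.jsonl")` would produce the list you feed into whatever supervised fine-tuning script trains the LoRA. The thumbs-down records are kept in the file rather than thrown away, so they would still be available later if you wanted to try a reward model or an RL approach like the projects above.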
Good idea? Bad idea?