docs: Created a cookbook that walks you through finetuning a Model with GRPO #1559
base: master
Conversation
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Thanks, @apokryphosx! Looks good to me, but there are some things that still need to be improved: https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#pull-request-item-stage
2. Can we add some chat history when using this GRPO model?
Sure thing! I'll take care of it.
Thanks for the contribution, @apokryphosx! Could you leave the link to the Colab notebook and make it public? That would be helpful for the review.
Description
I added a cookbook that walks a user through finetuning a model with GRPO.
Motivation and Context
Finetuning agents with RL is a necessary step towards AGI, and GRPO (Group Relative Policy Optimization) has emerged as a compute-cheap alternative to PPO.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!