docs: Created a cookbook that walks you through finetuning a Model with GRPO #1559

apokryphosx · 2025-02-06T02:14:13Z

Description

I added a cookbook that walks a user through finetuning a model with GRPO.

Motivation and Context

Finetuning agents with RL is a necessary step towards AGI, and GRPO has emerged as a compute-cheap alternative to PPO.

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

[x] I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

review-notebook-app · 2025-02-06T02:14:17Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

zjrwtx · 2025-02-06T10:22:54Z

Thanks, @apokryphosx! Looks good to me, but there are some things that still need to be improved:
1. About the PR title, it would be better as: docs: Finetuning a Model with GRPO

Please refer to the contributing docs:

https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#pull-request-item-stage

zjrwtx · 2025-02-06T10:35:19Z

2. Can we add some chat history from using this GRPO model?
3. Can we add a final preview of the model after it is uploaded to Hugging Face?
4. Can we have a demo that directly uses the Hugging Face model we upload?

apokryphosx · 2025-02-06T21:07:51Z

2. Can we add some chat history from using this GRPO model?
3. Can we add a final preview of the model after it is uploaded to Hugging Face?
4. Can we have a demo that directly uses the Hugging Face model we upload?

Sure thing! I'll take care of it.

Wendong-Fan · 2025-02-10T16:07:17Z

Thanks for the contribution, @apokryphosx! Could you share the link to the Colab notebook and make it public? That would be helpful for the review.

Created a cookbook that walks you through finetuning a Model with GRPO

f46ec63

zjrwtx assigned zjrwtx and unassigned zjrwtx Feb 6, 2025

zjrwtx self-requested a review February 6, 2025 10:19

Wendong-Fan assigned apokryphosx Feb 6, 2025

Wendong-Fan added the Data Related to camel data processing label Feb 6, 2025

Wendong-Fan added this to the Sprint 22 milestone Feb 6, 2025

Wendong-Fan linked an issue Feb 6, 2025 that may be closed by this pull request

[Feature Request] Implement GRPO, PPO and potentially other policy gradient methods to finetune LM Agents #1528

Open

2 tasks

Wendong-Fan changed the title ~~Created a cookbook that walks you through finetuning a Model with GRPO~~ docs: Created a cookbook that walks you through finetuning a Model with GRPO Feb 6, 2025

Wendong-Fan added the cookbook label Feb 9, 2025

Wendong-Fan requested a review from mohamadkav February 10, 2025 16:04
