[Feature] TinyZero reproduction of R1-Zero: experience the Ahah moment yourself for < $30 #1323

Open
brragorn opened this issue Jan 31, 2025 · 1 comment
Labels
Feature triage This issue needs review by the core team.

Comments

@brragorn
Contributor

Feature request

From: https://x.com/karpathy/status/1884678601704169965
TinyZero reproduction of R1-Zero
"experience the Ahah moment yourself for < $30"

Given a base model, the RL finetuning can be relatively very cheap and quite accessible.

From: https://x.com/jiayi_pirate/status/1882839370505621655
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops self-verification and search abilities all on its own

You can experience the Ahah moment yourself for < $30
Code: http://github.com/Jiayi-Pan/TinyZero
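
For anyone sketching a notebook around this, the piece of the CountDown setup that is easiest to show in isolation is the rule-based reward: the model is given a few numbers and a target, and an answer counts only if it uses each number exactly once and evaluates to the target. The snippet below is a minimal illustrative sketch under that assumption, not TinyZero's actual reward code; the function name `countdown_reward` and the binary 1.0/0.0 scoring are hypothetical choices made here for illustration.

```python
import ast
import operator
from collections import Counter

# Binary operators permitted in a Countdown-style expression (assumed set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST node, appending each integer literal to `used`."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary operators, ...) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical reward: 1.0 if `expression` uses every number in `numbers`
    exactly once and evaluates to `target`, otherwise 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0


print(countdown_reward("(25 - 5) * 3", [3, 5, 25], 60))
print(countdown_reward("25 * 3", [3, 5, 25], 75))
```

The expression is parsed with `ast` rather than passed to `eval`, so model output that is not plain arithmetic is scored 0.0 instead of being executed. A real training setup would also need to extract the answer from the model's full response and would likely add a separate format reward; both are left out of this sketch.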

Motivation / references

Is it possible for us to have quick notebooks that show this kind of work (https://x.com/karpathy/status/1884678601704169965)? My guess is that many folks are going to try this, and if we can make it very easy, that will create a lot of buzz.

Your contribution

linguistic

@taenin taenin added the Feature and triage labels Jan 31, 2025
@oelachqar oelachqar changed the title from "[Feature]: TinyZero reproduction of R1-Zero: experience the Ahah moment yourself for < $30" to "[Feature] TinyZero reproduction of R1-Zero: experience the Ahah moment yourself for < $30" Feb 4, 2025
@oelachqar oelachqar removed their assignment Feb 4, 2025
@eric-haibin-lin

The verl community would also love to see verl used or integrated in oumi-ai!

4 participants