Questions regarding the evaluation of rollouts on SFT training set using WebArena-Lite reward function #40

Closed
weizhepei opened this issue Feb 10, 2025 · 4 comments

weizhepei commented Feb 10, 2025

@QZH-777 @Xiao9905 Hi Zehan and Xiao, thanks for the great work! I have some questions about evaluating rollouts on the SFT training set using the WebArena-Lite reward function. As discussed in issue-24, $D_0$ is provided in WebArena-Lite_info.json; however, there are no reference answers or built-in WebArena-Lite reward functions associated with $D_0$ for evaluation.

Am I missing something here? If not, can you explain a bit about how the evaluation process described in Line 3 of Algorithm 1 was done?

Collaborator

QZH-777 commented Feb 13, 2025

You can use the ORM we provide to score the rollouts. For the specific prompt, refer to issue 22.

QZH-777 closed this as completed Feb 13, 2025
@weizhepei
Author

Thanks for the response! However, I'm still a bit confused about the implication here: are you suggesting that you were using the ORM to evaluate $D_{rollout}$ in Line 3, instead of using the WebArena-Lite reward functions?

In that case, I think this would be inconsistent with what's described in Algorithm 1?

Collaborator

QZH-777 commented Feb 14, 2025

The training data for the ORM consists of rollouts from WebArena-Lite tasks, with success determined by the corresponding reward functions. The ORM can potentially serve as an alternative to the reward function. For certain reasons, we do not plan to publicly release the WebArena-Lite data along with the reward function at this time.
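
To make the substitution concrete, here is a minimal sketch of how a rollout set could be labeled with either judge. It is not WebRL's actual code: `Rollout`, `label_rollouts`, and `threshold_judge` are hypothetical names, and the 0.5 threshold is an assumption. The only idea taken from this thread is that a binary success label can come from a task reward function or, in its place, from an ORM score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Rollout:
    """Hypothetical rollout record; the field names are illustrative only."""
    instruction: str
    actions: List[str]


# A judge maps a rollout to success (True) or failure (False). It could wrap
# a task-specific reward function or a thresholded ORM score.
Judge = Callable[[Rollout], bool]


def label_rollouts(rollouts: Iterable[Rollout], judge: Judge) -> List[Tuple[Rollout, int]]:
    """Attach a binary reward (1 = success, 0 = failure) to each rollout."""
    return [(r, 1 if judge(r) else 0) for r in rollouts]


def threshold_judge(score_fn: Callable[[Rollout], float], threshold: float = 0.5) -> Judge:
    """Turn a scalar scorer (e.g. an ORM success probability) into a judge."""
    return lambda r: score_fn(r) >= threshold
```

Because both judges share one interface, swapping the reward function for the ORM only changes which callable is passed to `label_rollouts`.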

@weizhepei
Author

Thank you for the clarification and for being transparent about the situation!

Looking forward to the potential future release of these resources, which would greatly help the community accurately reproduce the training process of WebRL and contribute to further extensions.
