Questions regarding the evaluation of rollouts on SFT training set using WebArena-Lite reward function #40

weizhepei · 2025-02-10T23:51:47Z

@QZH-777 @Xiao9905 Hi Zehan and Xiao, thanks for the great work! I have some questions about evaluating rollouts on the SFT training set with the WebArena-Lite reward function. As discussed in issue-24, $D_0$ is provided in WebArena-Lite_info.json; however, there are no reference answers or built-in WebArena-Lite reward functions associated with $D_0$ for evaluation.

Am I missing something here? If not, can you explain a bit about how the evaluation process described in Line 3 of Algorithm 1 was done?

QZH-777 · 2025-02-13T13:37:48Z

You can use the ORM we provide to score the rollouts. For the specific prompt, please refer to issue 22.

weizhepei · 2025-02-13T18:58:36Z

Thanks for the response! However, I'm still a bit confused about the implication here: are you suggesting that you were using the ORM to evaluate $D_{rollout}$ in Line 3, instead of using the WebArena-Lite reward functions?

In that case, I think this would be inconsistent with what's described in Algorithm 1?

QZH-777 · 2025-02-14T03:12:30Z

The training data for the ORM consists of rollouts from WebArena-Lite tasks, with success determined by the corresponding reward functions. The ORM can potentially serve as an alternative to the reward function. For certain reasons, we do not plan to publicly release the WebArena-Lite data along with the reward function at this time.
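To illustrate the substitution described above, here is a minimal sketch in which rollout labeling depends only on a scoring callable, so an ORM-based scorer could in principle stand in for a reward function. All names here (`Rollout`, `Scorer`, `label_rollouts`, `toy_orm_scorer`) are hypothetical and not part of the WebRL codebase, and the toy scorer is a placeholder heuristic rather than the actual ORM or its prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    """Hypothetical container for one trajectory on a task."""
    task: str
    actions: List[str]


# A scorer maps a rollout to a binary label: 1 = success, 0 = failure.
# Either a rule-based reward function or an ORM wrapper could fit this type.
Scorer = Callable[[Rollout], int]


def label_rollouts(rollouts: List[Rollout], scorer: Scorer) -> List[Tuple[Rollout, int]]:
    """Attach a success label to every rollout using the given scorer."""
    return [(r, scorer(r)) for r in rollouts]


def toy_orm_scorer(rollout: Rollout) -> int:
    """Placeholder for an ORM call (assumption: a real ORM would be a model
    prompted to judge success). Here a rollout counts as successful only if
    its last action is an exit action."""
    return 1 if rollout.actions and rollout.actions[-1].startswith("exit") else 0


rollouts = [
    Rollout(task="t1", actions=["click", "exit"]),
    Rollout(task="t2", actions=["click"]),
]
labeled = label_rollouts(rollouts, toy_orm_scorer)
successes = [r.task for r, label in labeled if label == 1]
```

Because `label_rollouts` only sees the `Scorer` interface, swapping in a different judge would not change the surrounding filtering logic. Whether the labels agree with the original reward functions depends entirely on the ORM's accuracy.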

weizhepei · 2025-02-14T03:38:20Z

Thank you for the clarification and for being transparent about the situation!

Looking forward to the potential future release of these resources, which would greatly help the community accurately reproduce the training process of WebRL and contribute to further extensions.

QZH-777 closed this as completed Feb 13, 2025
