First, great work!
I would like to understand how good the reported performance of "40.29%(Avg@8) on the GAIA-text-103 subset" is. You mentioned that this score was obtained by fine-tuning your MiroRL-14B-SingleAgent-Preview-v0.1 model with GRPO, but you didn't report the original model's performance before fine-tuning.
In addition, I found that MiroThinker-14B-SFT-v0.1 scores 44.4 on the same benchmark. I assume that MiroRL-14B-SingleAgent-Preview-v0.1 itself scores lower than 40.29. Could you explain the differences between MiroRL-14B-SingleAgent-Preview-v0.1 and MiroThinker-14B-SFT-v0.1?
Thanks