Details of the Evaluation on Math Benchmarks

Hi Multiverse Team,

Thank you for your remarkable work! Could you share more details on how you conducted the evaluation of Multiverse-32B on AIME24, AIME25, and MATH500—for example, the prompts used, generation configurations, and the maximum number of new tokens?