Actions: openai/evals

Run unit tests

192 workflow runs

Updates for Solvers
Run unit tests #1602: Pull request #1461 opened by JunShern
January 26, 2024 08:22 2m 28s jun/solvers-update
Logged spec now includes overridden args (#1460)
Run unit tests #1601: Commit 3040d6f pushed by JunShern
January 26, 2024 07:12 2m 2s main
Add run_id to final_report from LocalRecorder (#1452)
Run unit tests #1600: Commit cf002f2 pushed by JunShern
January 26, 2024 07:07 2m 7s main
Logged spec now includes overridden args
Run unit tests #1599: Pull request #1460 opened by ojaffe
January 17, 2024 09:39 1m 57s ojaffe:ollie/logging_fix
Fix formatting/typing so pre-commit hooks pass (#1451)
Run unit tests #1596: Commit c66b5c1 pushed by etr2460
January 10, 2024 16:25 3m 34s main
icelandic gec eval (#1400)
Run unit tests #1595: Commit 105c2b9 pushed by etr2460
January 10, 2024 16:23 2m 11s main
Add eval yaml for Theory of Mind eval (#1453)
Run unit tests #1593: Commit 877d555 pushed by JunShern
January 9, 2024 02:34 2m 9s main
Add eval yaml for Theory of Mind eval
Run unit tests #1592: Pull request #1453 opened by ojaffe
January 8, 2024 10:37 1m 57s ojaffe:ollie/tom_fix
Improve MMMU performance with prompt engineering (#1450)
Run unit tests #1588: Commit 2981e65 pushed by etr2460
January 3, 2024 18:20 2m 14s main
Improve MMMU performance with prompt engineering
Run unit tests #1587: Pull request #1450 opened by etr2460
January 3, 2024 18:15 2m 6s erik/mmmu-tuning
Add eval japanese prime minister (#1422)
Run unit tests #1586: Commit f1bb7cb pushed by etr2460
January 3, 2024 16:49 2m 1s main
Solve #1394 (#1395)
Run unit tests #1585: Commit 10b02c6 pushed by logankilpatrick
January 3, 2024 16:48 2m 13s main
Add a recorder for function calls (#1389)
Run unit tests #1584: Commit 0647721 pushed by logankilpatrick
January 3, 2024 16:46 2m 31s main
Add gpt-3.5-turbo-16k support to ctx len getter (#1388)
Run unit tests #1583: Commit bbe26f8 pushed by logankilpatrick
January 3, 2024 16:45 2m 12s main
Fixed parameter incorrect (#1378)
Run unit tests #1582: Commit 1dd2ea2 pushed by logankilpatrick
January 3, 2024 16:43 2m 16s main
Log model and usage stats in record.sampling
Run unit tests #1581: Pull request #1449 opened by JunShern
January 3, 2024 04:12 2m 8s jun/log-token-counts
Randomly select MMMU answer when none is returned from the model (#1447)
Run unit tests #1580: Commit ded9382 pushed by etr2460
December 24, 2023 19:23 2m 8s main
Randomly select MMMU answer when none is returned from the model
Run unit tests #1579: Pull request #1447 opened by etr2460
December 24, 2023 05:40 2m 1s erik/mmmu-random
Change wrong kwargs name (#1435)
Run unit tests #1578: Commit 02f35cc pushed by etr2460
December 21, 2023 17:46 2m 12s main
Fix Pydantic warning on data_test run (#1445)
Run unit tests #1577: Commit dd38662 pushed by etr2460
December 21, 2023 17:41 2m 4s main
Fix small typo in oaieval run function (#1438)
Run unit tests #1576: Commit 23ae8ab pushed by etr2460
December 21, 2023 17:40 2m 5s main
Fix Pydantic warning on data_test run
Run unit tests #1575: Pull request #1445 opened by inwaves
December 21, 2023 10:44 2m 2s inwaves:fix/PydanticWarningOnTestRun