Planning Documentation/Templates for Future Automation of Evals #37

Open
3 tasks
jarumihooi opened this issue Dec 13, 2023 · 1 comment
jarumihooi commented Dec 13, 2023

Because

the goal is to improve the automation of future evaluation tasks (as opposed to retrofitting current evaluations to run automatically), we should brainstorm which components could be used to make running the evaluations more automatic.

This issue focuses on which formats, templates, and common practices should be adopted to allow for better automation of this process.

Done when

  • A general template for future tasks, using goldretriever and sharing the same invocation, should be created; future tasks can then follow it. Importantly, it should include some error handling, especially basic sanity checks such as verifying that the expected number of output files is produced (see the sketch after this list).
  • A template for READMEs should be created that describes, until full automation is in place, how to run the apps.
  • A template for the reports generated by each evaluation should be created. However, how similar each task's report should be is still under discussion, and it remains to be determined how much of the report can be generated automatically.
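As a starting point, here is a minimal sketch of what the shared invocation and sanity check could look like, assuming a per-task `evaluate()` function and `--preds`/`--golds`/`--out` flags; all names here are hypothetical, not an existing interface:

```python
"""Hypothetical skeleton for an evaluation task following a shared invocation.

Assumptions (not from the issue itself): argparse flags --preds/--golds/--out,
a task-specific evaluate() function, and one report per prediction MMIF.
"""
import argparse
import sys
from pathlib import Path


def evaluate(pred_files, gold_dir, out_dir):
    """Task-specific evaluation logic goes here; returns paths of written reports."""
    raise NotImplementedError


def main():
    parser = argparse.ArgumentParser(description="Run one AAPB evaluation task.")
    parser.add_argument("--preds", required=True, help="directory of prediction MMIF files")
    parser.add_argument("--golds", required=True, help="directory of gold annotations (e.g. fetched via goldretriever)")
    parser.add_argument("--out", default="results", help="directory for generated reports")
    args = parser.parse_args()

    pred_files = sorted(Path(args.preds).glob("*.mmif"))
    if not pred_files:
        sys.exit(f"Sanity check failed: no .mmif files found in {args.preds}")

    out_dir = Path(args.out)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = evaluate(pred_files, Path(args.golds), out_dir)

    # Basic sanity check: one report per prediction file.
    if len(reports) != len(pred_files):
        sys.exit(f"Sanity check failed: expected {len(pred_files)} reports, got {len(reports)}")


if __name__ == "__main__":
    main()
```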

Additional context

Questions to consider for moving towards automation:

  • How does one make the process more automatic than downloading the eval repo, placing the MMIF predictions in the correct place, and running the code?
    • Should the evaluation code be wrapped in a Docker app that also contains the environment and modules needed to run it?
    • Perhaps one could run a single app, select the task to evaluate, and let the app handle the rest? (See the dispatcher sketch after this list.)
  • Where should generated MMIFs or other system outputs be placed? Should any of that be automatic?
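To illustrate the single-app idea from the second sub-question, here is a rough dispatcher sketch; the task names and the convention that each task module exposes a `run(preds, golds)` function are assumptions for illustration only:

```python
"""Hypothetical single-entry-point dispatcher for evaluation tasks.

The task names and module layout below are illustrative assumptions,
not the actual structure of the aapb-evaluations repository.
"""
import argparse
import importlib

# Assumed mapping from task name to a module exposing run(preds, golds).
TASKS = {
    "asr": "asr_eval.evaluate",
    "ner": "ner_eval.evaluate",
    "scene-recognition": "sr_eval.evaluate",
}


def main():
    parser = argparse.ArgumentParser(description="Dispatch to one evaluation task.")
    parser.add_argument("task", choices=sorted(TASKS), help="which evaluation to run")
    parser.add_argument("--preds", required=True, help="directory of prediction MMIFs")
    parser.add_argument("--golds", required=True, help="directory of gold annotations")
    args = parser.parse_args()

    module = importlib.import_module(TASKS[args.task])
    module.run(args.preds, args.golds)  # each task module handles the rest


if __name__ == "__main__":
    main()
```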

What other to-dos and concerns should be addressed to improve this process flow?

@jarumihooi jarumihooi changed the title Updates and Planning for More Automation of Evals Planning Documentation/Templates for Future Automation of Evals Dec 13, 2023
jarumihooi (Contributor, Author) commented:

How does one make the process more automatic than downloading the eval repo, placing the MMIF predictions in the correct place, and running the code?

It seems like the evaluation could be triggered automatically when a new commit of prediction MMIFs lands in a certain place, for instance inside a task subdirectory of the aapb-evaluations repository. Such a commit could trigger an Action that runs the evaluation code.
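A minimal sketch of such a trigger as a GitHub Actions workflow, assuming a hypothetical predictions/<task>/ layout for committed MMIFs and a run.py entry point; the paths, file names, and flags below are illustrative assumptions, not the repository's actual structure:

```yaml
# Hypothetical workflow: run an evaluation when prediction MMIFs are pushed.
# The predictions/ layout and run.py entry point are assumptions, not the
# repository's actual structure.
name: run-evaluation
on:
  push:
    paths:
      - 'predictions/**/*.mmif'
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: python run.py --preds predictions --golds golds --out results
```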
