Because
The goal is to improve the automation of future evaluation tasks (as opposed to retrofitting the current evaluations to run automatically). To that end, we should brainstorm which components could be used to make running the evaluations more automatic.
This issue focuses on what formats, templates, and common practices should be adhered to in order to allow better automation of this process.
Done when
A template for future tasks in general is created, using goldretriever and the same invocation pattern, so that future tasks can follow it. Importantly, it should include some error handling, especially basic sanity checks such as verifying that the expected number of output files is produced (see the sketch after this list).
A template for READMEs is created that describes, until full automation is in place, how to run the apps.
A template for the reports generated by each evaluation is created. However, how similar the reports for different tasks should be is still under discussion, as is how much of a report can be generated automatically.
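As a starting point for the task template, here is a minimal sketch of a common evaluation entrypoint. All names are illustrative, and it assumes goldretriever exposes a download_golds(url) helper that returns a local gold directory, as in existing tasks; the task-specific scoring is left as a placeholder.

```python
# Sketch of a common evaluation entrypoint (hypothetical names; assumes
# goldretriever.download_golds(url) returns a local directory of gold files).
import argparse
import sys
from pathlib import Path

import goldretriever  # assumed importable from the task directory


def count_files(directory, suffix):
    """Count files with a given suffix, e.g. '.mmif' predictions."""
    return len(list(Path(directory).glob(f"*{suffix}")))


def main():
    parser = argparse.ArgumentParser(description="Run one evaluation task.")
    parser.add_argument("-p", "--preds", required=True, help="directory of prediction MMIF files")
    parser.add_argument("-g", "--gold-url", required=True, help="URL of the gold annotations")
    parser.add_argument("-o", "--out", default="results", help="directory for results/report")
    args = parser.parse_args()

    gold_dir = goldretriever.download_golds(args.gold_url)

    # Basic sanity check on the way in: predictions exist and roughly match the golds.
    n_preds = count_files(args.preds, ".mmif")
    n_golds = len(list(Path(gold_dir).iterdir()))
    if n_preds == 0:
        sys.exit("No prediction MMIFs found; aborting.")
    if n_preds != n_golds:
        print(f"Warning: {n_preds} predictions vs {n_golds} golds", file=sys.stderr)

    out_dir = Path(args.out)
    out_dir.mkdir(parents=True, exist_ok=True)
    # ... task-specific scoring goes here, writing per-file and aggregate results ...

    # Basic sanity check on the way out: did we produce the expected number of files?
    n_results = count_files(out_dir, ".csv")
    if n_results != n_preds:
        print(f"Warning: expected {n_preds} result files, wrote {n_results}", file=sys.stderr)


if __name__ == "__main__":
    main()
```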
Additional context
Questions to consider for moving towards automation:
How does one make the process more automatic than downloading the eval repo, placing the MMIF predictions in the correct place, and running the code?
Should the evaluation code be wrapped in a Docker app that also contains the environment and modules needed to run it?
Could one run a single app, select the task to evaluate, and have the app handle the rest? (See the dispatcher sketch after this list.)
Where should generated MMIFs or other system outputs be placed? Should any of that be automatic?
What other to-dos and concerns are there for improving this process flow?
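As a rough illustration of the "one app that handles the rest" idea, the sketch below uses a hypothetical layout in which each task lives in its own subdirectory of aapb-evaluations and exposes an evaluate.py following the common invocation above; a thin dispatcher then selects the task and forwards the arguments.

```python
# Hypothetical dispatcher: pick a task subdirectory and run its evaluate.py
# with a shared argument convention. Names and layout are assumptions.
import argparse
import subprocess
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).parent


def list_tasks():
    """Treat every subdirectory containing an evaluate.py as a task."""
    return sorted(p.parent.name for p in REPO_ROOT.glob("*/evaluate.py"))


def main():
    parser = argparse.ArgumentParser(description="Run a selected evaluation task.")
    parser.add_argument("task", choices=list_tasks(), help="which evaluation to run")
    parser.add_argument("eval_args", nargs=argparse.REMAINDER,
                        help="arguments forwarded to the task's evaluate.py")
    args = parser.parse_args()

    script = REPO_ROOT / args.task / "evaluate.py"
    # Forward everything after the task name to the task's own entrypoint.
    result = subprocess.run([sys.executable, str(script), *args.eval_args])
    sys.exit(result.returncode)


if __name__ == "__main__":
    main()
```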
jarumihooi changed the title from "Updates and Planning for More Automation of Evals" to "Planning Documentation/Templates for Future Automation of Evals" on Dec 13, 2023.
How does one make the process more automatic than downloading the eval repo, placing the MMIF predictions in the correct place, and running the code?
It seems like the evaluation could be triggered automatically when a new commit of prediction MMIFs lands in a certain place, for instance inside a task subdirectory of the aapb-evaluations repository. That commit could then trigger an action that runs the evaluation code automatically.
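As a rough sketch of what such a trigger could invoke (for example from a CI workflow running on push), the script below uses hypothetical names and assumes the task-subdirectory layout sketched earlier: it looks at which MMIF files changed in the latest commit and runs the corresponding task's evaluation.

```python
# Hypothetical CI helper: find task directories whose MMIF predictions changed
# in the latest commit and run each task's evaluate.py. Layout is assumed.
import subprocess
import sys
from pathlib import Path


def changed_files():
    """List files touched by the most recent commit (requires git)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [Path(line) for line in out.stdout.splitlines() if line]


def main():
    # A changed file like <task>/preds/<batch>/file.mmif re-triggers <task>.
    tasks = {f.parts[0] for f in changed_files()
             if f.suffix == ".mmif" and len(f.parts) > 1}
    if not tasks:
        print("No changed MMIF predictions; nothing to evaluate.")
        return
    for task in sorted(tasks):
        script = Path(task) / "evaluate.py"
        if script.exists():
            print(f"Running evaluation for {task}")
            subprocess.run([sys.executable, str(script)], check=False)


if __name__ == "__main__":
    main()
```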