# New Repository Proposal: eval

## Summary

This document proposes a new repository under the `instructlab` GitHub organization:

- `instructlab/eval`
## Background

The `instructlab/instructlab` repository currently includes no real implementation
of Evaluation as described by the [LAB paper](https://arxiv.org/abs/2403.01081). The
closest existing implementation in `instructlab/instructlab` is the `ilab test` command.

As of this writing, `ilab test` is only implemented for macOS with M-series chips. It uses
a JSON Lines file and a LoRA adapter to compare the output of a given model before and after
LoRA training with MLX, hence the macOS M-series dependency.
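The before/after comparison described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual `ilab test` implementation: the real command runs inference with MLX, and the `generate` function here is a hypothetical stand-in for that step.

```python
# Hypothetical sketch of the before/after comparison performed by
# `ilab test`. `generate` is a placeholder for real model inference
# (the actual command uses MLX on macOS), not the real implementation.
import json


def generate(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would run inference with either
    # the base model or the LoRA-adapted model.
    return f"[{model}] response to: {prompt}"


def compare_outputs(jsonl_text: str) -> list[dict]:
    """Run each JSON Lines prompt through the base model and the
    LoRA-trained model, collecting both outputs side by side."""
    results = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        prompt = record["user"]
        results.append({
            "prompt": prompt,
            "before": generate("base", prompt),
            "after": generate("lora", prompt),
        })
    return results


sample = '{"user": "What is InstructLab?"}'
print(compare_outputs(sample)[0]["before"])
```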
We want to build out a library of methods that satisfy the evaluation approach described
in the paper, using higher-level evaluation schemes such as the
[Multi-turn Benchmark](https://arxiv.org/abs/2306.05685) (MT-Bench) for skills and
[Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300) (MMLU) for
knowledge. We propose a new repository to house this code that publishes a new Python
library called `instructlab-eval`. The reasoning for a new repository and library includes:

- We expect multiple consumers of this code. The `ilab` CLI is one, but we also envision
  building a REST API around it to help support scaling out this functionality on a cluster.
- We expect there is broader community interest in an open-source library and service for
  evaluation. We envision this library could support other evaluation techniques over time.
- We also realize that much of model evaluation is generally useful outside the context of
  InstructLab. Other libraries may emerge in the broader ecosystem that handle parts of what
  we need, while this library will always remain to handle the InstructLab-specific details
  of how evaluation works in our workflow.
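To make the "multiple consumers, multiple benchmarks" idea above concrete, here is a hedged sketch of what a library surface like this could look like. The names (`Evaluator`, `MTBenchEvaluator`, `MMLUEvaluator`, `run`) are illustrative assumptions for this proposal, not a committed `instructlab-eval` API.

```python
# Hypothetical sketch of an `instructlab-eval` library surface. All class
# and method names here are assumptions for illustration, not a real API.
from abc import ABC, abstractmethod


class Evaluator(ABC):
    """Common interface so the ilab CLI, a REST API, or other consumers
    can drive any evaluation scheme the same way."""

    @abstractmethod
    def run(self, model_path: str) -> dict:
        ...


class MTBenchEvaluator(Evaluator):
    """Multi-turn benchmark (MT-Bench) for skills evaluation."""

    def run(self, model_path: str) -> dict:
        # Placeholder result; a real implementation would generate answers
        # and score them with a judge model.
        return {"benchmark": "mt_bench", "model": model_path, "score": None}


class MMLUEvaluator(Evaluator):
    """MMLU multiple-choice benchmark for knowledge evaluation."""

    def run(self, model_path: str) -> dict:
        # Placeholder result; a real implementation would score the model
        # on MMLU's multiple-choice questions.
        return {"benchmark": "mmlu", "model": model_path, "score": None}


# Any consumer iterates over evaluators uniformly:
results = [e.run("models/example") for e in (MTBenchEvaluator(), MMLUEvaluator())]
print([r["benchmark"] for r in results])
```

A shared interface like this is one way the same evaluation code could back both the `ilab` CLI and a cluster-facing REST API without either consumer knowing benchmark-specific details.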
## Maintainers

The initial team of maintainers for this repository will be a copy of the
`Backend Maintainers` GitHub team.
## Alternatives Considered

### Add to `instructlab/instructlab`

We could add this code to the existing `instructlab/instructlab` repository.

The primary argument against this approach is that we expect the scope of an
`instructlab-eval` library to expand beyond what would be run by the
`ilab` CLI. We instead envision a different community of contributors organizing
around Evaluation specifically.