Commit 3ee5f52

Merge pull request #79 from nathan-weinberg/new-eval
Proposal for new Evaluation repo
2 parents 92750e8 + 686870e commit 3ee5f52

2 files changed (+55, -3 lines)


.spellcheck-en-custom.txt (+6, -3)
@@ -34,6 +34,7 @@ Dropdown
 env
 EP
 Eval
+eval
 Excalidraw
 exfiltrate
 exfiltrating
@@ -52,6 +53,7 @@ Inferencing
 instructlab
 ISA
 JIT
+JSON
 Jupyter
 KAGGLE
 Kaggle
@@ -63,19 +65,20 @@ LLM
 llms
 LLVM
 lora
-md
 Markdownlint
+md
 Mergify
 Merlinite
 mimimum
 Miniforge
 Mixtral
 MLX
 mlx
+MMLU
 NVidia
 Nvidia
-ollama
 Ollama
+ollama
 orchestrator
 ots
 Pareja
@@ -104,12 +107,12 @@ RX
 safetensors
 Salawu
 SDG
-Sigstore
 sdg
 sexualized
 SHA
 Shivchander
 Signoff
+Sigstore
 Srivastava
 subdirectory
 Sudalairaj

docs/evaluation/eval-repo.md (new file, +49)

# New Repository Proposal: eval

## Summary

This document proposes a new repository under the `instructlab` GitHub organization:

- `instructlab/eval`

## Background

The `instructlab/instructlab` repository currently includes no real implementation of Evaluation as described by the [LAB paper](https://arxiv.org/abs/2403.01081). The closest existing implementation in `instructlab/instructlab` is the `ilab test` command.

`ilab test`, as of this writing, is only implemented for macOS with M-series chips. It uses a JSON Lines file and a LoRA adapter to compare the output of a given model before and after LoRA training with MLX, hence the macOS M-series dependency.
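
To make that comparison concrete, here is a minimal sketch of the before/after check in Python. It is illustrative only: the `prompt` field name and the `generate_base`/`generate_lora` callables are hypothetical placeholders, not the actual `ilab test` data format or the MLX API.

```python
import json

def compare_before_after(test_file, generate_base, generate_lora):
    """Run each prompt from a JSON Lines test file through the base model
    and the LoRA-tuned model so the two outputs can be inspected side by side.

    `generate_base` and `generate_lora` are hypothetical callables wrapping
    whatever inference backend is in use (MLX on Apple silicon for `ilab test`).
    """
    results = []
    with open(test_file, encoding="utf-8") as f:
        for line in f:                # one JSON object per line (JSON Lines)
            case = json.loads(line)
            prompt = case["prompt"]   # assumed field name, for illustration only
            results.append({
                "prompt": prompt,
                "before": generate_base(prompt),
                "after": generate_lora(prompt),
            })
    return results
```

A caller would then render the `before`/`after` pairs side by side to judge whether LoRA training improved the responses.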

We desire to build out a library of methods that satisfy the evaluation described in the paper, using higher-level evaluation schemes such as the [Multi-turn Benchmark](https://arxiv.org/abs/2306.05685) (MT-Bench) for skills and [Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300) (MMLU) for knowledge. We propose a new repository to house this code and publish a new Python library called `instructlab-eval`. The reasoning for a new repository and library includes the following; an illustrative sketch of MMLU-style scoring appears after the list:

- We expect multiple consumers of this code. The `ilab` CLI is one, but we also envision building a REST API around it to help support scaling out this functionality on a cluster.
- We expect there is broader community interest in an open-source library and service for evaluation. We envision this library could support other evaluation techniques over time.
- We also realize that much of model evaluation is generally useful outside the context of InstructLab. Other libraries may emerge in the broader ecosystem that handle parts of what we need, while this library will always remain to handle the InstructLab-specific details of how evaluation works in our workflow.
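
As a purely illustrative aside, MMLU-style knowledge evaluation boils down to multiple-choice accuracy, typically reported per subject. The sketch below shows that calculation in Python; the field names and the `predict_choice` hook are assumptions made for this example, not a committed `instructlab-eval` API.

```python
from collections import defaultdict

def mmlu_style_accuracy(questions, predict_choice):
    """Score MMLU-style multiple-choice items, returning accuracy per subject.

    `questions` is an iterable of dicts with illustrative keys "subject",
    "question", "choices", and "answer" (index of the correct choice).
    `predict_choice` is any callable returning the model's chosen index.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        total[q["subject"]] += 1
        if predict_choice(q["question"], q["choices"]) == q["answer"]:
            correct[q["subject"]] += 1
    return {subject: correct[subject] / total[subject] for subject in total}
```

Whatever shape the real library takes, both the `ilab` CLI and a future REST service could consume this kind of function and report the same scores.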

## Maintainers

The initial team of maintainers for this repository will be a copy of the `Backend Maintainers` GitHub team.

## Alternatives Considered

### Add to `instructlab/instructlab`

We could add this code to the existing `instructlab/instructlab` repository.

The primary argument against this approach is that we expect the scope of an `instructlab-eval` library to expand beyond what would be run by the `ilab` CLI. We instead envision a different community of contributors organizing around Evaluation specifically.
