Commit a04079d

jjasghar, nathan-weinberg, and cdoern authored
Updates for Dec 13th granite-3.0-8b-lab-community (#26)
* Updates for Dec 13th granite-3.0-8b-lab-community

  These are the direct steps for the build that happened on Dec 13th.

  Signed-off-by: JJ Asghar <[email protected]>

* Update docs/cmb/build_process.md

  Co-authored-by: Nathan Weinberg <[email protected]>
  Signed-off-by: JJ Asghar <[email protected]>

* Update docs/cmb/build_process.md

  Co-authored-by: Charlie Doern <[email protected]>
  Signed-off-by: JJ Asghar <[email protected]>

* Update docs/cmb/build_process.md

  Co-authored-by: Charlie Doern <[email protected]>
  Signed-off-by: JJ Asghar <[email protected]>

* Update docs/cmb/build_process.md

  Co-authored-by: Charlie Doern <[email protected]>
  Signed-off-by: JJ Asghar <[email protected]>

* Update build_process.md

  Signed-off-by: JJ Asghar <[email protected]>

---------

Signed-off-by: JJ Asghar <[email protected]>
Co-authored-by: Nathan Weinberg <[email protected]>
Co-authored-by: Charlie Doern <[email protected]>
1 parent 54385bb commit a04079d

File tree

1 file changed: +72 −84 lines changed


docs/cmb/build_process.md (+72 −84)
@@ -1,20 +1,32 @@
 
 !!! note
     This document is the Community Build Process; these are the general steps to get the cmb built.
+    If you are looking for the [config.yaml](https://gist.github.com/jjasghar/436931fbee1d34f029f3c099311301c3) that worked for `granite-3.0-8b-base`, there it is.
+
 
 ## Community Model Build diagram
 
 ![](../images/instructlab_cmb_build.png)
 
-## Add the PRs to the local tree
+We have created a default `build.sh` script, which will live in a repository (soon). The actual commands are
+explained here, and this should be considered the source of truth.
+
+## Add the PRs to the build machine's taxonomy tree
 
-Add the PRs you want to be built into the run. Tag the PRs with "cmb-running."
+Add the [PRs](https://github.com/instructlab/taxonomy/pulls) you want to be built into the run. Tag the PRs with "cmb-running."
 
+Example:
 ```bash
 mkdir -p compositional_skills/general/synonyms
 vi compositional_skills/general/synonyms/attribution.txt
 vi compositional_skills/general/synonyms/qna.yaml
 ```
+Or if you are pulling from GitHub:
+```bash
+cd ~/.local/share/instructlab/taxonomy
+git fetch origin pull/ID/head:BRANCH_NAME
+git checkout BRANCH_NAME
+```
 
 ## Verify changes
 ```bash
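As a concrete illustration of the `git fetch` pattern above: `ID` is the taxonomy pull-request number and `BRANCH_NAME` is any local name you choose. A minimal sketch, assuming a hypothetical PR #123:

```bash
# Hypothetical PR number 123; substitute the real taxonomy PR ID
cd ~/.local/share/instructlab/taxonomy
git fetch origin pull/123/head:pr-123-cmb
git checkout pr-123-cmb
```

Repeat for each PR tagged "cmb-running", merging the fetched branches into the working tree as needed so the run includes all candidates.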
@@ -30,6 +42,50 @@ ilab taxonomy diff
 3rd, 2024. If you have different hardware you'll need a different profile, and different
 options.
 
+## Reset the build directories
+
+Move the old build directories away, or save them. Something along these lines:
+```bash
+mv /home/instructlab/.local/share/instructlab/phased/journalfile.yaml /home/instructlab/.local/share/instructlab/phased/journalfile.yaml_$DATE
+mv /home/instructlab/.local/share/instructlab/datasets /home/instructlab/.local/share/instructlab/datasets_$DATE
+mv /home/instructlab/.local/share/instructlab/phased /home/instructlab/.local/share/instructlab/phased_$DATE
+```
+
+Create the directories you moved away:
+```bash
+mkdir /home/instructlab/.local/share/instructlab/phased
+mkdir /home/instructlab/.local/share/instructlab/datasets
+```
+
+## Add the `instructlab_community` mixin
+For the community build, off the `base` model, you should add the community data set; these are the steps:
+```bash
+cd ~/.local/share/instructlab/datasets/
+wget https://huggingface.co/datasets/instructlab/InstructLabCommunity/resolve/main/instructlab_community.jsonl
+cd ~
+```
+## Modify your config
+`ilab config edit`
+
+Find the `general` section of your config and ensure it matches the following:
+
+```yaml
+general:
+  # Debug level for logging.
+  # Default: 0
+  debug_level: 0
+  # Log format. https://docs.python.org/3/library/logging.html#logrecord-attributes
+  # Default: %(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s
+  log_format: '%(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s'
+  # Log level for logging.
+  # Default: INFO
+  log_level: INFO
+  # Use legacy IBM Granite chat template (default uses 3.0 Instruct template)
+  # Default: False
+  use_legacy_tmpl: true
+```
+
+`use_legacy_tmpl` must be true in order to generate data for and train the granite-3.0-8b-base model.
 ## Create the data
 ```bash
 # announce the start of the SDG
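Note that `$DATE` in the reset step above is assumed to be set in the shell beforehand; a minimal sketch of that assumption:

```bash
# Assumption: a day stamp used as the archive suffix for the old build dirs
DATE=$(date +%F)   # e.g. 2024-12-13
```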
@@ -40,19 +96,19 @@ ilab data generate --pipeline full --gpus 8
 ## Run the training after the generate is complete
 ```bash
 # announce the start of the training
-ilab model train --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/knowledge_train_msgs_XXXXXXX.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/skills_train_msgs_XXXXXXX.jsonl --skip-user-confirm --pipeline accelerated --force-clear-phased-cache
+ilab model train --strategy lab-multiphase --phased-phase1-data /home/instructlab/.local/share/instructlab/datasets/knowledge_train_msgs_*.jsonl --phased-phase2-data /home/instructlab/.local/share/instructlab/datasets/skills_train_msgs_*.jsonl --skip-user-confirm --force-clear-phased-cache
 # announce the completion of the training
 ```
 
-## Post training evaluation steps
+## (optional) Post-training evaluation steps
 
 If you want to run a quick sanity check, you can set these two variables to evaluate on a subset:
 ```bash
 export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # mtbench
 export INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true # mmlu
 ```
 
-(optional in case of sanity of a specific Sample Model creation)
+(In case you want to sanity-check a specific sample model)
 ```bash
 ilab model evaluate --benchmark mt_bench --model ~/.local/share/instructlab/checkpoints/hf_format/samples_XXXXXX
 ```
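The `# announce ...` comments above are placeholders for notifying the team; one hedged sketch, assuming a hypothetical Slack incoming-webhook URL:

```bash
# Hypothetical webhook URL; substitute your channel's real endpoint
WEBHOOK="https://hooks.slack.com/services/T000/B000/XXXX"
curl -sf -X POST -H 'Content-Type: application/json' \
     --data '{"text":"CMB: multiphase training started"}' \
     "$WEBHOOK"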
@@ -63,6 +119,7 @@ ilab model evaluate --benchmark mt_bench --model ~/.local/share/instructlab/chec
 
 - `mmlu`: general model knowledge, general facts; it's a knowledge score out of 100
 - `mt_bench`: skill-based (extraction, etc.); scored out of 10
+
 !!! note
     we want around a 7.1 `mt_bench` average for a model candidate
 
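To make the 7.1 bar actionable in a script, a small sketch; the score value is an assumption standing in for the average that the `mt_bench` evaluation reports:

```bash
SCORE=7.27   # assumption: paste the mt_bench average reported by `ilab model evaluate`
awk -v s="$SCORE" 'BEGIN { exit !(s >= 7.1) }' \
  && echo "candidate meets the 7.1 bar" \
  || echo "candidate is below the 7.1 bar"
```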
@@ -78,93 +135,24 @@ ilab model evaluate --benchmark mmlu_branch --model ~/.local/share/checkpoints/h
 ilab model evaluate --benchmark mt_bench_branch --model ~/.local/share/checkpoints/hf_format/<checkpoint> --taxonomy-path ~/.local/share/instructlab/taxonomy --judge-model ~/.cache/instructlab/models/prometheus-8x7b-v2-0 --base-model ~/.cache/instructlab/models/granite-7b-redhat-lab --base-branch main --branch main
 ```
 
-## Hosting the release candidates
-
-!!! warning
-    This needs to be revisited as a process; this was a hack to start.
-
-rsync over the files:
-```bash
-mkdir $(date +%F)
-cd $(date +%F)
-rsync --info=progress2 -avz -e ssh <USERNAME>@<REMOTE>:~/.local/share/checkpoints/hf_format/samples_xxxxx ./
-```
-
-Set up (if needed):
-```bash
-python3.11 -m venv venv
-source venv/bin/activate
-pip install vllm
-./run.sh
-```
+## Publish to Huggingface
 
-`run.sh`
+Sanity check the model to make sure it does what you are expecting:
 ```bash
-#!/bin/bash
-
-DIRECTORY=$1
-
-DATE=$(date +%F)
-RANDOM_STRING=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 10; echo)
-RANDOM_PORT=$(shuf -i 8001-8800 -n 1)
-API_KEY=$RANDOM_STRING-$DATE
-
-echo "$DIRECTORY,$API_KEY,$RANDOM_PORT" >> model_hosting.csv
-
-echo "ilab model chat --endpoint-url http://cmb-staging.DOMAIN.xx:$RANDOM_PORT/v1 --api-key $API_KEY --model $DIRECTORY" >> model_ilab_scripting.sh
-
-python -m vllm.entrypoints.openai.api_server --model $DIRECTORY --api-key $API_KEY --host 0.0.0.0 --port $RANDOM_PORT --tensor-parallel-size 2
-```
-
-Find the `ilab` command to host the model, and send that on after the PR letter:
+ilab model chat --model /home/instructlab/.local/share/instructlab/phased/phase2/checkpoints/hf_format/samples_XXXXX
 ```
-cat model_ilab_scripting.sh
-```
-
-## Form letter for PRs
-
-Hi! 👋
-Thank you for submitting this PR. We are ready to do some validation now, and we have a few candidates to see if they improve the model.
-We have some resources to run these release candidates, but we need _you_ to help us. Can you reach out to me either on Slack (@awesome) or email me at awesomeATinstructlab.ai so I can get you access via `ilab model chat`?
-We can only run these models for a "week" or so, so please reach out as soon as possible and tell me which one is best for you on this PR.
-
-## With confirmed success
-
-With confirmed success, tag the PR with "ready-for-merge" and remove the "community-build-ready" tags. Wait till the "week" is up before shutting down the staging instance, and merge in all the ones that have been tagged.
-
-## Steps to Merge and Release
-
-After you have merged the PRs into the taxonomy, you now need to push this to Hugging Face; if you don't have access to HuggingFace, you will need to find someone to add you to it ;).
-
-1) Clone down the repository on the staging box if you haven't already
 
+Copy the checkpoint to the repository directory:
 ```bash
-git clone https://huggingface.co/instructlab/granite-7b-lab
-cd granite-7b-lab
-vi .git/config
-# url = [email protected]:instructlab/granite-7b-lab
-# verify you can authenticate with hf.com: ssh -T [email protected]
+cp /home/instructlab/.local/share/instructlab/phased/phase2/checkpoints/hf_format/samples_XXXX/* ~/huggingface_repos/granite-3.0-8b-lab-community/
 ```
 
-2) Copy the `samples_xxxx` into the granite-7b-lab
-3) `git add . && git commit`
-4) Write up a good commit message
-5) Tag and push
-
+Add and commit the changes to the repository:
 ```bash
-git tag cmb-run-XXXXX
+cd ~/huggingface_repos/granite-3.0-8b-lab-community/
+git add .
+git commit -s
 git push origin main
-git push origin cmb-run-XXXXX
-```
-
-## Convert to `gguf`
-```bash
-git clone https://github.com/ggerganov/llama.cpp.git
-cd llama.cpp/
-pip install -r requirements.txt
-make -j8
-./convert_hf_to_gguf.py ../granite-7b-lab --outfile granite-7b-fp16.gguf
-./llama-quantize granite-7b-fp16.gguf granite-7b-_Q4_K_M.gguf Q4_K_M
-./llama-cli -m granite-7b-_Q4_K_M.gguf -p "who is batman?" -n 128
 ```
 
+Congratulations! These are the core steps for building the safetensors to publish to Hugging Face.
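One caveat on the publish steps in this diff: Hugging Face Git repos store large weight files via Git LFS, so it is worth confirming the safetensors files are tracked before committing. A minimal sketch, assuming the repository paths used above:

```bash
cd ~/huggingface_repos/granite-3.0-8b-lab-community/
git lfs install                          # no-op if LFS is already configured
git lfs track "*.safetensors" "*.bin"    # ensure the weights go through LFS
git add .gitattributes
```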
