Commit 4a862cf

Reorganize custom qnas and e2e workflow
Signed-off-by: Kelly Brown <[email protected]>
1 parent 6174543 commit 4a862cf

8 files changed

+117
-16
lines changed

+99
@@ -0,0 +1,99 @@
1+
---
2+
title: About taxonomy trees and QnA YAMLs
3+
description: An overview of the InstructLab 🐶 taxonomy.
4+
logo: images/ilab_dog.png
5+
---
6+
## About taxonomy trees and QnA YAMLs
7+
8+
InstructLab 🐶 uses a novel synthetic data-based alignment tuning method for
9+
Large Language Models (LLMs). The "**lab**" in Instruct**Lab** 🐶 stands for
10+
[**L**arge-Scale **A**lignment for Chat**B**ots](https://arxiv.org/abs/2403.01081) [1].
11+
12+
The LAB method is driven by taxonomies, which are largely created manually and
13+
with care.
14+
15+
This repository contains a taxonomy tree that allows you to create models
16+
tuned with your data (enhanced via synthetic data generation) using the LAB 🐶
17+
method.
18+
19+
[1] Shivchander Sudalairaj*, Abhishek Bhandwaldar*, Aldo Pareja*, Kai Xu, David D. Cox, Akash Srivastava*. "LAB: Large-Scale Alignment for ChatBots." arXiv preprint arXiv:2403.01081, 2024. (* denotes equal contribution)
20+
21+
## Intro to skills and knowledge
22+
23+
Skills and knowledge are the two types of data that you can add to the taxonomy tree. You can then use this data to train a model on information it might not already know.
24+
25+
### Knowledge
26+
27+
Knowledge for an AI model consists of data and facts. When you create a knowledge set, you give the model additional information so that it can answer questions more accurately. Whereas skills teach a model how to do something, knowledge improves the model's ability to answer questions that involve facts, data, or references. For example, you can create a data set from a product's documentation so that the model learns the information in that documentation.
28+
29+
### Skills
30+
31+
A skill is a capability that you train the AI model to perform. When you create a skill, you teach the model how to do a task. Skills on RHEL AI are split into two categories:
32+
33+
* Compositional skills: Compositional skills allow AI models to perform specific tasks or functions. There are two types of compositional skills:
34+
    * Freeform compositional skills: These are performative skills that do not require additional context or information to function.
35+
    * Grounded compositional skills: These are performative skills that require additional context. For example, you can teach the model to read a table, where the additional context is an example of the table layout.
36+
* Foundational skills: Foundational skills are skills that involve math, reasoning, and coding.
37+
38+
## InstructLab QnA YAML files
39+
40+
You can teach LLMs new information by creating a `qna.yaml` file that contains the facts for your knowledge or the details of your skill.
41+
42+
For more information on creating skills and knowledge YAML files, see:
43+
44+
* [Skills overview](skills/index.md)
45+
* [Knowledge overview](knowledge/index.md)
46+
47+
## Choosing domains for the taxonomy
48+
49+
In general, we use the Dewey Decimal Classification (DDC) System to determine our domains (and subdomains) in the taxonomy. This [DDC SUMMARIES document](https://www.oclc.org/content/dam/oclc/dewey/resources/summaries/deweysummaries.pdf) is a great resource for determining where a topic might be classified.
50+
51+
If you are unsure where to put your knowledge or compositional skill, create a folder in the `miscellaneous_unknown` folder under the `knowledge` or `compositional_skills` folders.
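
The fallback rule above can be sketched in a few lines of Python. This is only an illustration, not part of any InstructLab tooling: the function name `target_folder` and its parameters are hypothetical, and the only folder names taken from this document are `knowledge`, `compositional_skills`, and `miscellaneous_unknown`.

```python
from pathlib import PurePosixPath
from typing import Optional

# Top-level folders and the fallback folder named in this document.
CONTRIBUTION_ROOTS = {"knowledge", "compositional_skills"}
FALLBACK_FOLDER = "miscellaneous_unknown"


def target_folder(root: str, topic: str, domain: Optional[str] = None) -> PurePosixPath:
    """Return the folder for a new contribution, relative to the taxonomy root.

    Hypothetical helper: `root`, `topic`, and `domain` are illustrative names.
    """
    if root not in CONTRIBUTION_ROOTS:
        raise ValueError(f"unknown root: {root!r}")
    # No domain chosen yet: place the topic under miscellaneous_unknown.
    return PurePosixPath(root, domain or FALLBACK_FOLDER, topic)
```

For example, `target_folder("knowledge", "phoenix")` yields `knowledge/miscellaneous_unknown/phoenix`, while passing a domain such as `"science"` yields `knowledge/science/phoenix`.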
52+
53+
## Taxonomy tree layout
54+
55+
The taxonomy tree is organized in a cascading directory structure. At the end of
56+
each branch, there is a YAML file (`qna.yaml`) that contains the examples for that
57+
domain. Maintainers can decide to change the names of the existing branches or to add new branches.
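
To make the layout concrete, here is a minimal Python sketch that walks a local checkout and lists each branch that ends in a `qna.yaml` file. It assumes only what is described above (a nested directory tree with `qna.yaml` at the leaves). The helper names `find_qna_files` and `branch_of` are hypothetical, not part of the InstructLab CLI.

```python
from pathlib import Path
from typing import List


def find_qna_files(taxonomy_root: Path) -> List[Path]:
    """Return every qna.yaml under taxonomy_root, sorted for stable output."""
    return sorted(taxonomy_root.rglob("qna.yaml"))


def branch_of(qna_file: Path, taxonomy_root: Path) -> str:
    """Return the branch (folder path) that a qna.yaml sits at the end of."""
    return qna_file.parent.relative_to(taxonomy_root).as_posix()


if __name__ == "__main__":
    # Assumption: the current directory is a checkout of the taxonomy tree.
    root = Path(".")
    for qna in find_qna_files(root):
        print(branch_of(qna, root))
```

Run from the root of a taxonomy checkout, this prints one line per branch that holds a `qna.yaml`.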
58+
59+
!!! important
60+
    Folder names must not contain spaces. Use underscores between words.
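
As a rough illustration of the naming rule, the following Python sketch converts a title into a folder name and checks an existing name. Both function names are hypothetical, and the sketch enforces only the rule stated here (no spaces, underscores between words). It does not change letter case or apply any other convention.

```python
import re


def to_folder_name(title: str) -> str:
    """Turn a human-readable title into a folder name without spaces."""
    # Collapse each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", title.strip())


def is_valid_folder_name(name: str) -> bool:
    """Check the one rule stated above: folder names contain no whitespace."""
    return bool(name) and not re.search(r"\s", name)
```

A check like `is_valid_folder_name` could run before opening a pull request to catch names such as `theory of mind`, which should be `theory_of_mind`.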
61+
62+
## Taxonomy diagram
63+
64+
!!! note
65+
    This diagram shows a subset of the taxonomy. It is not a complete representation.
66+
67+
```mermaid
68+
flowchart TD;
69+
na[not accepting contributions\n at this time]:::na
70+
taxonomy --> foundational_skill & compositional_skills & knowledge
71+
72+
foundational_skill:::na --> reasoning:::na
73+
reasoning:::na --> common_sense_reasoning:::na
74+
reasoning:::na --> mathematical_reasoning:::na
75+
reasoning:::na --> theory_of_mind:::na
76+
77+
compositional_skills --> engineering
78+
compositional_skills --> grounded
79+
compositional_skills --> linguistics
80+
81+
grounded --> grounded/arts
82+
grounded --> grounded/geography
83+
grounded --> grounded/history
84+
grounded --> grounded/science
85+
86+
knowledge --> knowledge/arts
87+
88+
knowledge --> knowledge/miscellaneous_unknown
89+
knowledge --> knowledge/science
90+
knowledge --> knowledge/technology
91+
knowledge/science --> animals --> birds --> black_capped_chickadee --> black_capped_chikadee-a & black_capped_chikadee-q
92+
knowledge/science --> astronomy --> constellations --> phoenix --> phoenix-a & phoenix-q
93+
94+
black_capped_chikadee-a{attribution.txt}
95+
black_capped_chikadee-q{qna.yaml}
96+
phoenix-a{attribution.txt}
97+
phoenix-q{qna.yaml}
98+
classDef na fill:#EEE
99+
```

docs/taxonomy/index.md

+1-1
@@ -131,4 +131,4 @@ To contribute to this repo, you'll use the *Fork and Pull* model common in many
131131
This taxonomy repository will be used as the seed to synthesize the training
132132
data for InstructLab-trained models. We intend to retrain the model(s) using the main
133133
branch as often as possible (at least weekly).
134-
Fast iteration of the model(s) benefits the open source community and enables model developers who do not have access to the necessary compute infrastructure.
134+
Fast iteration of the model(s) benefits the open source community and enables model developers who do not have access to the necessary compute infrastructure.

mkdocs.yml

+17-15
@@ -17,32 +17,34 @@ nav:
1717
- Initialize InstructLab: getting-started/initilize_ilab.md
1818
- Download models: getting-started/download_models.md
1919
- Intro to serve and chat: getting-started/serve_and_chat.md
20+
- Creating new knowledge or skills:
21+
- About taxonomy trees and QnA YAMLs: creating-skills-knowledge/index.md
22+
- Skills Creation: creating-skills-knowledge/skills/index.md
23+
- Skills Guidelines: creating-skills-knowledge/skills/skills_guide.md
24+
- Knowledge Creation: creating-skills-knowledge/knowledge/index.md
25+
- Knowledge Guidelines: creating-skills-knowledge/knowledge/guide.md
2026
- Adding data to a model:
27+
- Teaching a model new skills and knowledge: adding-data-to-model/creating_new_knowledge_or_skills.md
2128
- Creating a new Wikipedia based qna.yaml: adding-data-to-model/creating_new_wikipedia_based_qna.md
22-
- Creating New Knowledge or Skills: adding-data-to-model/creating_new_knowledge_or_skills.md
23-
- Community:
24-
- Code of Conduct: community/CODE_OF_CONDUCT.md
25-
- Code of Conduct Committee: community/CODE_OF_CONDUCT_COMMITTEE.md
26-
- FAQ: community/FAQ.md
27-
- Governance Policy: community/GOVERNANCE.md
28-
- Slack Guide: community/InstructLab_SLACK_GUIDE.md
29-
- Slack Moderation Guide: community/InstructLab_SLACK_MODERATION_GUIDE.md
30-
- Taxonomy:
29+
- Contributing to the InstructLab taxonomy:
3130
- About Taxonomy: taxonomy/index.md
32-
- Skills Overview: taxonomy/skills/index.md
33-
- Skills Guide: taxonomy/skills/skills_guide.md
34-
- Knowledge Overview: taxonomy/knowledge/index.md
35-
- Knowledge Guide: taxonomy/knowledge/guide.md
3631
- Knowledge Contribution Details: taxonomy/knowledge/contribution_details.md
3732
- User Interface:
3833
- UI Overview: user-interface/ui_overview.md
3934
- Chat with the model: user-interface/playground_chat.md
4035
- Contribute knowledge to community model: user-interface/knowledge_contributions.md
4136
- Contribute skills to community model: user-interface/skills_contributions.md
4237
- UI Configurations: user-interface/env_oauth_config.md
43-
- Community Model Build:
44-
- About Community Model Build: cmb/index.md
38+
- Community Model Builds:
39+
- About Community Model Builds: cmb/index.md
4540
- Community Model Build Process: cmb/build_process.md
41+
- Community:
42+
- Code of Conduct: community/CODE_OF_CONDUCT.md
43+
- Code of Conduct Committee: community/CODE_OF_CONDUCT_COMMITTEE.md
44+
- FAQ: community/FAQ.md
45+
- Governance Policy: community/GOVERNANCE.md
46+
- Slack Guide: community/InstructLab_SLACK_GUIDE.md
47+
- Slack Moderation Guide: community/InstructLab_SLACK_MODERATION_GUIDE.md
4648
- References:
4749
- Additional Resources: resources/RESOURCES.md
4850
- Contributors: resources/CONTRIBUTORS.md
