
Commit 8b2ad4c

rebase (#1714)
rebase fix notes
1 parent b5a4994 commit 8b2ad4c

7 files changed: +246 −461 lines

7 files changed

+246
-461
lines changed

.ci/spellcheck/.pyspelling.wordlist.txt

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,8 @@ AutoTokenizer
 backend
 backends
 Baevski
+Baichuan
+baichuan
 BaseSpeakerTTS
 BasicUNet
 bboxes
@@ -551,6 +553,7 @@ PyTorchVideo
 QFormer
 Qianwen
 Qi
+QKV
 qrcode
 quant
 quantized
@@ -687,6 +690,7 @@ surya
 svc
 SVTR
 Swin
+SwiGLU
 SwinV
 TaskManager
 TartanAir

notebooks/254-llm-chatbot/254-llm-chatbot.ipynb

Lines changed: 110 additions & 170 deletions
Large diffs are not rendered by default.

notebooks/254-llm-chatbot/254-rag-chatbot.ipynb

Lines changed: 97 additions & 263 deletions
Large diffs are not rendered by default.

notebooks/254-llm-chatbot/README.md

Lines changed: 4 additions & 3 deletions
@@ -15,17 +15,18 @@ The available options are:
 * **tiny-llama-1b-chat** - This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens with the adoption of the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. More details about model can be found in [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
 * **mini-cpm-2b-dpo** - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in [model_card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).
 * **red-pajama-3b-chat** - A 2.8B parameter pretrained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).
-* **llama-2-7b-chat** - LLama 2 is the second generation of LLama models developed by Meta. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. llama-2-7b-chat is 7 billions parameters version of LLama 2 finetuned and optimized for dialogue use case. More details about model can be found in the [paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/), [repository](https://github.com/facebookresearch/llama) and [HuggingFace model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
+* **llama-2-7b-chat** - LLama 2 is the second generation of LLama models developed by Meta. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. llama-2-7b-chat is 7 billions parameters version of LLama 2 finetuned and optimized for dialogue use case. More details about model can be found in the [paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/), [repository](https://github.com/facebookresearch/llama) and [HuggingFace model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
+* **qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model card](https://huggingface.co/Qwen/Qwen1.5-7B-Chat).
 >**Note**: run model with demo, you will need to accept license agreement.
 >You must be a registered user in 🤗 Hugging Face Hub. Please visit [HuggingFace model card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), carefully read terms of usage and click accept button. You will need to use an access token for downloading model. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
 * **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. More details about model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).
-* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.
 * **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)
 * **mistral-7b** - The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about model in the [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).
 * **zephyr-7b-beta** - Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-beta is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained on on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). You can find more details about model in [technical report](https://arxiv.org/abs/2310.16944) and [HuggingFace model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).
 * **neural-chat-7b-v3-1** - Mistral-7b model fine-tuned using Intel Gaudi. The model fine-tuned on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with [Direct Preference Optimization (DPO) algorithm](https://arxiv.org/abs/2305.18290). More details can be found in [model card](https://huggingface.co/Intel/neural-chat-7b-v3-3) and [blog post](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).
 * **notus-7b-v1** - Notus is a collection of fine-tuned models using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). and related [RLHF](https://huggingface.co/blog/rlhf) techniques. This model is the first version, fine-tuned with DPO over zephyr-7b-sft. Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. Proposed approach for dataset creation helps to effectively fine-tune Notus-7b that surpasses Zephyr-7B-beta and Claude 2 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). More details about model can be found in [model card](https://huggingface.co/argilla/notus-7b-v1).
-* **youri-7b-chat** - Youri-7b-chat is a Llama2 based model. [Rinna Co., Ltd.](https://rinna.co.jp/) conducted further pre-training for the Llama2 model with a mixture of English and Japanese datasets to improve Japanese task capability. The model is publicly released on Hugging Face hub. You can find detailed information at the [rinna/youri-7b-chat project page](https://huggingface.co/rinna/youri-7b).
+* **youri-7b-chat** - Youri-7b-chat is a Llama2 based model. [Rinna Co., Ltd.](https://rinna.co.jp/) conducted further pre-training for the Llama2 model with a mixture of English and Japanese datasets to improve Japanese task capability. The model is publicly released on Hugging Face hub. You can find detailed information at the [rinna/youri-7b-chat project page](https://huggingface.co/rinna/youri-7b).
+* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size.
 
 The image below illustrates the provided user instruction and model answer examples.
 
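As the note in the README diff says, gated checkpoints such as meta-llama/Llama-2-7b-chat-hf require an accepted license and a Hugging Face access token before download. For reference, a minimal sketch of logging in beforehand (the token placeholder is an assumption, not part of this commit):

```python
# Minimal sketch: authenticate to the Hugging Face Hub once per environment
# before downloading gated models such as meta-llama/Llama-2-7b-chat-hf.
from huggingface_hub import login

login(token="<your-hf-access-token>")  # read-scoped token from huggingface.co/settings/tokens
```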
notebooks/254-llm-chatbot/config.py

Lines changed: 17 additions & 4 deletions
@@ -129,14 +129,13 @@ def youri_partial_text_processor(partial_text, new_text):
 Context: {context}
 Answer: <im_end><|im_start|>assistant""",
     },
-    "qwen-7b-chat": {
-        "model_id": "Qwen/Qwen-7B-Chat",
-        "remote": True,
+    "qwen1.5-7b-chat": {
+        "model_id": "Qwen/Qwen1.5-7B-Chat",
+        "remote": False,
         "start_message": f"<|im_start|>system\n {DEFAULT_SYSTEM_PROMPT_CHINESE }<|im_end|>",
         "history_template": "<|im_start|>user\n{user}<im_end><|im_start|>assistant\n{assistant}<|im_end|>",
         "current_message_template": '"<|im_start|>user\n{user}<im_end><|im_start|>assistant\n{assistant}',
         "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
-        "revision": "2abd8e5777bb4ce9c8ab4be7dbbd0fe4526db78d",
         "prompt_template": f"""<|im_start|>system
 {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
     + """
@@ -224,6 +223,20 @@ def youri_partial_text_processor(partial_text, new_text):
         "tokenizer_kwargs": {"add_special_tokens": False},
         "partial_text_processor": youri_partial_text_processor,
     },
+    "baichuan2-7b-chat": {
+        "model_id": "baichuan-inc/Baichuan2-7B-Chat",
+        "remote": True,
+        "start_message": f"{DEFAULT_SYSTEM_PROMPT_CHINESE }",
+        "roles": [195, 196],
+        "tokenizer_kwargs": {"add_special_tokens": False},
+        "stop_tokens": [2],
+        "prompt_template": f"""{DEFAULT_RAG_PROMPT_CHINESE }"""
+    + """
+问题: {question}
+已知内容: {context}
+回答:
+""",
+    },
 }
 
 SUPPORTED_EMBEDDING_MODELS = {

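For context, a sketch of how an entry like the new `baichuan2-7b-chat` block might be consumed when building a RAG prompt. It assumes the entries live in a `SUPPORTED_LLM_MODELS` dict alongside the `SUPPORTED_EMBEDDING_MODELS` dict visible in this diff; the helper function and sample strings are illustrative, not code from this commit:

```python
# Hypothetical helper: fill in the RAG prompt template of a configured model.
# The {question}/{context} placeholders survive because they sit in the plain
# (non-f) string that is concatenated onto the system prompt at import time.
from config import SUPPORTED_LLM_MODELS  # assumed dict name

def build_rag_prompt(model_name: str, question: str, context: str) -> str:
    cfg = SUPPORTED_LLM_MODELS[model_name]
    return cfg["prompt_template"].format(question=question, context=context)

print(build_rag_prompt("baichuan2-7b-chat", "什么是 OpenVINO?", "OpenVINO is an inference toolkit..."))
```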
notebooks/254-llm-chatbot/converter.py

Lines changed: 9 additions & 17 deletions
@@ -19,6 +19,7 @@
 def register_configs():
     from optimum.exporters.tasks import TasksManager
     TasksManager._SUPPORTED_MODEL_TYPE["minicpm"] = TasksManager._SUPPORTED_MODEL_TYPE["llama"]
+    TasksManager._SUPPORTED_MODEL_TYPE["qwen2"] = TasksManager._SUPPORTED_MODEL_TYPE["llama"]
 
 def patch_stateful(ov_model, model_type):
     key_value_input_names = [
@@ -142,48 +143,39 @@ def ts_patched_forward(
     del pt_model
 
 
-def _update_qwen_rotary_embedding_cache(model):
-    model.transformer.rotary_emb(2048)
-
-
-def convert_qwen(pt_model: torch.nn.Module, model_path: Path):
+def convert_baichuan(pt_model: torch.nn.Module, model_path: Path):
     """
-    Qwen model conversion function
-
+    Baichuan model conversion function
     Params:
       pt_model: PyTorch model
       model_path: path for saving model
     Returns:
       None
     """
-    _update_qwen_rotary_embedding_cache(pt_model)
     ov_out_path = Path(model_path) / "openvino_model.xml"
     pt_model.config.save_pretrained(ov_out_path.parent)
     pt_model.config.use_cache = True
     outs = pt_model(
         input_ids=torch.ones((1, 10), dtype=torch.long),
         attention_mask=torch.ones((1, 10), dtype=torch.long),
     )
-    inputs = ["input_ids"]
+    inputs = ["input_ids", "attention_mask"]
     outputs = ["logits"]
 
     dynamic_shapes = {
         "input_ids": {0: "batch_size", 1: "seq_len"},
         "attention_mask": {0: "batch_size", 1: "seq_len"},
-        "token_type_ids": {0: "batch_size", 1: "seq_len"},
     }
     for idx in range(len(outs.past_key_values)):
         inputs.extend([f"past_key_values.{idx}.key", f"past_key_values.{idx}.value"])
-        dynamic_shapes[inputs[-1]] = {0: "batch_size", 1: "past_sequence + sequence"}
-        dynamic_shapes[inputs[-2]] = {0: "batch_size", 1: "past_sequence + sequence"}
+        dynamic_shapes[inputs[-1]] = {0: "batch_size", 2: "past_sequence + sequence"}
+        dynamic_shapes[inputs[-2]] = {0: "batch_size", 2: "past_sequence + sequence"}
         outputs.extend([f"present.{idx}.key", f"present.{idx}.value"])
 
-    inputs += ["attention_mask", "token_type_ids"]
     dummy_inputs = {
         "input_ids": torch.ones((1, 2), dtype=torch.long),
-        "past_key_values": outs.past_key_values,
         "attention_mask": torch.ones((1, 12), dtype=torch.long),
-        "token_type_ids": torch.ones((1, 2), dtype=torch.long),
+        "past_key_values": outs.past_key_values,
     }
     pt_model.config.torchscript = True
     ov_model = ov.convert_model(pt_model, example_input=dummy_inputs)
@@ -205,7 +197,7 @@ def convert_qwen(pt_model: torch.nn.Module, model_path: Path):
 
     ov_model.validate_nodes_and_infer_types()
     if make_stateful is not None:
-        patch_stateful(ov_model, "qwen")
+        patch_stateful(ov_model, "baichuan")
     ov.save_model(ov_model, ov_out_path)
     del ov_model
     cleanup_torchscript_cache()
@@ -410,8 +402,8 @@ def convert_bert(pt_model: torch.nn.Module, model_path: Path):
 converters = {
     # LLM models
     "mpt": convert_mpt,
-    "qwen": convert_qwen,
     "chatglm3": convert_chatglm,
+    "baichuan2": convert_baichuan,
     # embedding models
     "all-mpnet-base-v2": convert_mpnet,
     "text2vec-large-chinese": convert_bert,

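One detail worth noting in `convert_baichuan` relative to the removed Qwen path is the cache axis: the change marks axis 2 of each `past_key_values` tensor as the growing "past_sequence + sequence" dimension instead of axis 1, which matches a key/value cache laid out as [batch, num_heads, seq_len, head_dim]. A standalone sketch of the bookkeeping the loop above produces (the layer count is shrunk for illustration and is an assumption, not a property of the real model):

```python
# Illustrative only: the dynamic_shapes mapping that convert_baichuan builds.
num_layers = 2  # kept tiny for the sketch; a real checkpoint has many more layers

inputs = ["input_ids", "attention_mask"]
dynamic_shapes = {
    "input_ids": {0: "batch_size", 1: "seq_len"},
    "attention_mask": {0: "batch_size", 1: "seq_len"},
}
for idx in range(num_layers):
    inputs.extend([f"past_key_values.{idx}.key", f"past_key_values.{idx}.value"])
    # Axis 2 is the sequence axis of the cached key/value tensors.
    dynamic_shapes[inputs[-1]] = {0: "batch_size", 2: "past_sequence + sequence"}
    dynamic_shapes[inputs[-2]] = {0: "batch_size", 2: "past_sequence + sequence"}

print(dynamic_shapes["past_key_values.0.key"])  # {0: 'batch_size', 2: 'past_sequence + sequence'}
```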
notebooks/254-llm-chatbot/ov_llm_model.py

Lines changed: 5 additions & 4 deletions
@@ -204,7 +204,7 @@ def _from_pretrained(
         )
 
 
-class OVQWENModel(OVModelForCausalLM):
+class OVBAICHUANModel(OVModelForCausalLM):
     """
     Optimum intel compatible model wrapper for QWEN
     """
@@ -219,7 +219,7 @@ def __init__(
         model_save_dir: Optional[Union[str, Path]] = None,
         **kwargs,
     ):
-        NormalizedConfigManager._conf["qwen"] = NormalizedTextConfig.with_args(
+        NormalizedConfigManager._conf["baichuan"] = NormalizedTextConfig.with_args(
            num_layers="num_hidden_layers",
            num_attention_heads="num_attention_heads",
            hidden_size="hidden_size",
@@ -270,12 +270,13 @@ def _from_pretrained(
         )
 
         model = cls.load_model(model_cache_path, load_in_8bit=load_in_8bit)
-        init_cls = OVQWENModel
+        init_cls = OVBAICHUANModel
 
         return init_cls(
             model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs
         )
 
+
 class OVCHATGLMModel(OVModelForCausalLM):
     """
     Optimum intel compatible model wrapper for CHATGLM2
@@ -364,6 +365,6 @@ def _from_pretrained(
 
 model_classes = {
     "mpt": OVMPTModel,
-    "qwen": OVQWENModel,
+    "baichuan2": OVBAICHUANModel,
     "chatglm3": OVCHATGLMModel,
 }

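For orientation, a sketch of how the `model_classes` lookup might be used on the notebook side to pick the right wrapper for a converted model. The `model_type` string, output directory, and fallback behaviour below are assumptions for illustration, not code from this commit:

```python
# Illustrative only: dispatch over model_classes to load a converted model.
from pathlib import Path

from optimum.intel.openvino import OVModelForCausalLM

from ov_llm_model import model_classes

model_type = "baichuan2"                  # e.g. parsed from the selected model id
model_dir = Path("baichuan2-7b-chat/ov")  # assumed folder written by converter.py

# Architectures without a custom wrapper fall back to the stock optimum-intel class.
model_class = model_classes.get(model_type, OVModelForCausalLM)
ov_model = model_class.from_pretrained(model_dir, trust_remote_code=True)
```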