
Commit 4caee85

Merge branch 'latest' into nm/update_outetts
2 parents f057369 + f0deb2a

File tree: 6 files changed, +262 -326 lines


notebooks/deepseek-r1/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ The tutorial supports different models, you can select one from the provided opt
 * **DeepSeek-R1-Distill-Llama-8B** is a distilled model based on [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), that prioritizes high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) for more info.
 * **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, at the same time its programming capabilities are limited. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.
 * **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen-2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning and can be less suited for complex coding tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.
-* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-15B) for more info.
+* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.
 
 ## Notebook Contents

notebooks/deepseek-r1/deepseek-r1.ipynb

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@
     "* **DeepSeek-R1-Distill-Llama-8B** is a distilled model based on [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), that prioritizes high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) for more info.\n",
     "* **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, at the same time its programming capabilities are limited. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.\n",
     "* **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen-2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning and can be less suited for complex coding tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.\n",
-    "* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-15B) for more info.\n",
+    "* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.\n",
     "\n",
     "[Weight compression](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html) is a technique for enhancing the efficiency of models, especially those with large memory requirements. This method reduces the model’s memory footprint, a crucial factor for Large Language Models (LLMs). We provide several options for model weight compression:\n",
     "\n",

notebooks/hugging-face-hub/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 The Hugging Face (HF) Model Hub is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.
 Hugging Face provides Python packages that serve as APIs and tools to easily download and fine tune state-of-the-art pretrained models, namely [transformers] and [diffusers] packages.
 
-![](https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png)
+![](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png)
 
 ## Contents:
 Throughout this notebook we will learn:

notebooks/hugging-face-hub/hugging-face-hub.ipynb

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
     "The Hugging Face (HF) [Model Hub](https://huggingface.co/models) is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.\n",
     "Hugging Face provides Python packages that serve as APIs and tools to easily download and fine tune state-of-the-art pretrained models, namely [transformers](https://github.com/huggingface/transformers) and [diffusers](https://github.com/huggingface/diffusers) packages.\n",
     "\n",
-    "![](https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png)\n",
+    "![](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png)\n",
     "\n",
     "Throughout this notebook we will learn:\n",
     "1. How to load a HF pipeline using the `transformers` package and then convert it to OpenVINO.\n",

notebooks/llm-rag-langchain/ov_langchain_helper.py

Lines changed: 3 additions & 9 deletions
@@ -62,8 +62,6 @@ def from_model_path(
         model_path: str,
         device: str = "CPU",
         tokenizer: Any = None,
-        draft_model_path: Optional[str] = None,
-        draft_model_device: Optional[str] = "CPU",
         **kwargs: Any,
     ) -> OpenVINOLLM:
         """Construct the oepnvino object from model_path"""
@@ -206,11 +204,7 @@ def put(self, token_id: int) -> bool:
                     return False
                 return super().put(token_id)
 
-        if draft_model_path is not None:
-            draft_model = openvino_genai.draft_model(draft_model_path, draft_model_device)
-            pipe = openvino_genai.LLMPipeline(model_path, device, draft_model=draft_model)
-        else:
-            pipe = openvino_genai.LLMPipeline(model_path, device)
+        pipe = openvino_genai.LLMPipeline(model_path, device, **kwargs)
 
         config = pipe.get_generation_config()
         if tokenizer is None:
@@ -245,7 +239,7 @@ def _call(
         input_ids = tokens["input_ids"]
         attention_mask = tokens["attention_mask"]
         prompt = openvino_genai.TokenizedInputs(ov.Tensor(input_ids), ov.Tensor(attention_mask))
-        output = self.pipe.generate(prompt, self.config)
+        output = self.pipe.generate(prompt, self.config, **kwargs)
         if not isinstance(self.tokenizer, openvino_genai.Tokenizer):
             output = self.tokenizer.batch_decode(output.tokens, skip_special_tokens=True)[0]
         return output
@@ -280,7 +274,7 @@ def generate_and_signal_complete() -> None:
             genration function for single thread
             """
             self.streamer.reset()
-            self.pipe.generate(prompt, self.config, self.streamer)
+            self.pipe.generate(prompt, self.config, self.streamer, **kwargs)
             stream_complete.set()
             self.streamer.end()
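
Net effect of these hunks: the dedicated draft_model_path/draft_model_device arguments are gone, and any extra keyword arguments are forwarded to openvino_genai.LLMPipeline instead. A hedged usage sketch (directory paths are placeholders) showing that speculative decoding stays reachable through the same draft_model keyword the removed branch used:

# Sketch of the new calling convention; directory paths are placeholders.
import openvino_genai
from ov_langchain_helper import OpenVINOLLM

llm = OpenVINOLLM.from_model_path(
    model_path="path/to/main/model",
    device="CPU",
    # Forwarded verbatim to openvino_genai.LLMPipeline, so speculative
    # decoding is still reachable without the removed named arguments.
    draft_model=openvino_genai.draft_model("path/to/draft/model", "CPU"),
)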

notebooks/sdxl-turbo/sdxl-turbo.ipynb

Lines changed: 255 additions & 313 deletions
Large diffs are not rendered by default.
