
Commit 4caee85

Merge branch 'latest' into nm/update_outetts
2 parents f057369 + f0deb2a

File tree: 6 files changed, +262 -326 lines


notebooks/deepseek-r1/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ The tutorial supports different models, you can select one from the provided opt
 * **DeepSeek-R1-Distill-Llama-8B** is a distilled model based on [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), that prioritizes high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) for more info.
 * **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, at the same time its programming capabilities are limited. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.
 * **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen-2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning and can be less suited for complex coding tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.
-* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-15B) for more info.
+* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.
 
 ## Notebook Contents

notebooks/deepseek-r1/deepseek-r1.ipynb

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@
     "* **DeepSeek-R1-Distill-Llama-8B** is a distilled model based on [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), that prioritizes high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) for more info.\n",
     "* **DeepSeek-R1-Distill-Qwen-1.5B** is the smallest DeepSeek-R1 distilled model based on [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B). Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, at the same time its programming capabilities are limited. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for more info.\n",
     "* **DeepSeek-R1-Distill-Qwen-7B** is a distilled model based on [Qwen-2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B). The model demonstrates a good balance between mathematical and factual reasoning and can be less suited for complex coding tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) for more info.\n",
-    "* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-15B) for more info.\n",
+    "* **DeepSeek-R1-Distil-Qwen-14B** is a distilled model based on [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) that has great competence in factual reasoning and solving complex mathematical tasks. Check [model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) for more info.\n",
     "\n",
     "[Weight compression](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html) is a technique for enhancing the efficiency of models, especially those with large memory requirements. This method reduces the model’s memory footprint, a crucial factor for Large Language Models (LLMs). We provide several options for model weight compression:\n",
     "\n",

notebooks/hugging-face-hub/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 The Hugging Face (HF) Model Hub is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.
 Hugging Face provides Python packages that serve as APIs and tools to easily download and fine tune state-of-the-art pretrained models, namely [transformers] and [diffusers] packages.
 
-![](https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png)
+![](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png)
 
 ## Contents:
 Throughout this notebook we will learn:

notebooks/hugging-face-hub/hugging-face-hub.ipynb

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
     "The Hugging Face (HF) [Model Hub](https://huggingface.co/models) is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.\n",
     "Hugging Face provides Python packages that serve as APIs and tools to easily download and fine tune state-of-the-art pretrained models, namely [transformers](https://github.com/huggingface/transformers) and [diffusers](https://github.com/huggingface/diffusers) packages.\n",
     "\n",
-    "![](https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png)\n",
+    "![](https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/logo/hf_intel_logo.png)\n",
     "\n",
     "Throughout this notebook we will learn:\n",
     "1. How to load a HF pipeline using the `transformers` package and then convert it to OpenVINO.\n",

notebooks/llm-rag-langchain/ov_langchain_helper.py

Lines changed: 3 additions & 9 deletions
@@ -62,8 +62,6 @@ def from_model_path(
         model_path: str,
         device: str = "CPU",
         tokenizer: Any = None,
-        draft_model_path: Optional[str] = None,
-        draft_model_device: Optional[str] = "CPU",
         **kwargs: Any,
     ) -> OpenVINOLLM:
         """Construct the oepnvino object from model_path"""
@@ -206,11 +204,7 @@ def put(self, token_id: int) -> bool:
                     return False
                 return super().put(token_id)
 
-        if draft_model_path is not None:
-            draft_model = openvino_genai.draft_model(draft_model_path, draft_model_device)
-            pipe = openvino_genai.LLMPipeline(model_path, device, draft_model=draft_model)
-        else:
-            pipe = openvino_genai.LLMPipeline(model_path, device)
+        pipe = openvino_genai.LLMPipeline(model_path, device, **kwargs)
 
         config = pipe.get_generation_config()
         if tokenizer is None:
@@ -245,7 +239,7 @@ def _call(
         input_ids = tokens["input_ids"]
         attention_mask = tokens["attention_mask"]
         prompt = openvino_genai.TokenizedInputs(ov.Tensor(input_ids), ov.Tensor(attention_mask))
-        output = self.pipe.generate(prompt, self.config)
+        output = self.pipe.generate(prompt, self.config, **kwargs)
         if not isinstance(self.tokenizer, openvino_genai.Tokenizer):
             output = self.tokenizer.batch_decode(output.tokens, skip_special_tokens=True)[0]
         return output
@@ -280,7 +274,7 @@ def generate_and_signal_complete() -> None:
             genration function for single thread
             """
             self.streamer.reset()
-            self.pipe.generate(prompt, self.config, self.streamer)
+            self.pipe.generate(prompt, self.config, self.streamer, **kwargs)
             stream_complete.set()
             self.streamer.end()
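
Net effect of these hunks: the dedicated draft_model_path/draft_model_device arguments are gone, and any extra keyword arguments are forwarded to openvino_genai.LLMPipeline instead. A hedged usage sketch (directory paths are placeholders) showing that speculative decoding stays reachable through the same draft_model keyword the removed branch used:

# Sketch of the new calling convention; directory paths are placeholders.
import openvino_genai
from ov_langchain_helper import OpenVINOLLM

llm = OpenVINOLLM.from_model_path(
    model_path="path/to/main/model",
    device="CPU",
    # Forwarded verbatim to openvino_genai.LLMPipeline, so speculative
    # decoding is still reachable without the removed named arguments.
    draft_model=openvino_genai.draft_model("path/to/draft/model", "CPU"),
)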

notebooks/sdxl-turbo/sdxl-turbo.ipynb

Lines changed: 255 additions & 313 deletions
Large diffs are not rendered by default.
