Commit d045c75

Merge branch 'ea/flex2' of https://github.com/eaidova/openvino_notebooks into ea/flex2

2 parents: 7cfdd06 + 337a6b8
9 files changed: +1404 −3 lines

.ci/ignore_treon_docker.txt

Lines changed: 1 addition & 0 deletions

@@ -39,6 +39,7 @@ notebooks/stable-diffusion-ip-adapter/stable-diffusion-ip-adapter.ipynb
  notebooks/kosmos2-multimodal-large-language-model/kosmos2-multimodal-large-language-model.ipynb
  notebooks/photo-maker/photo-maker.ipynb
  notebooks/openvoice/openvoice.ipynb
+ notebooks/openvoice2-and-melotts/openvoice2-and-melotts.ipynb
  notebooks/surya-line-level-text-detection/surya-line-level-text-detection.ipynb
  notebooks/instant-id/instant-id.ipynb
  notebooks/stable-diffusion-keras-cv/stable-diffusion-keras-cv.ipynb

.ci/skipped_notebooks.yml

Lines changed: 4 additions & 0 deletions

@@ -546,3 +546,7 @@
    skips:
      - python:
          - "3.9"
+ - notebook: notebooks/openvoice2-and-melotts/openvoice2-and-melotts.ipynb
+   skips:
+     - os:
+         - macos-13

.ci/spellcheck/.pyspelling.wordlist.txt

Lines changed: 1 addition & 0 deletions

@@ -543,6 +543,7 @@ md
  MediaPipe
  medprob
  mel
+ MeloTTS
  Mels
  MERCHANTABILITY
  MF

notebooks/README.md

Lines changed: 5 additions & 0 deletions

@@ -49,6 +49,7 @@
  - [Text-to-image generation using PhotoMaker and OpenVINO](./photo-maker/photo-maker.ipynb)
  - [Multimodal assistant with Phi-4-multimodal and OpenVINO](./phi-4-multimodal/phi-4-multimodal.ipynb)
  - [Visual-language assistant with Phi3-Vision and OpenVINO](./phi-3-vision/phi-3-vision.ipynb)
+ - [Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO](./openvoice2-and-melotts/openvoice2-and-melotts.ipynb)
  - [Voice tone cloning with OpenVoice and OpenVINO](./openvoice/openvoice.ipynb)
  - [Running OpenCLIP models using OpenVINO™](./open-clip/open-clip.ipynb)
  - [Screen Parsing with OmniParser-v2.0 and OpenVINO](./omniparser/omniparser.ipynb)

@@ -147,6 +148,7 @@
  - [Line-level text detection with Surya](./surya-line-level-text-detection/surya-line-level-text-detection.ipynb)
  - [Convert a PyTorch Model to OpenVINO™ IR](./pytorch-to-openvino/pytorch-to-openvino.ipynb)
  - [Convert a PaddlePaddle Model to OpenVINO™ IR](./paddle-to-openvino/paddle-to-openvino-classification.ipynb)
+ - [Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO](./openvoice2-and-melotts/openvoice2-and-melotts.ipynb)
  - [Voice tone cloning with OpenVoice and OpenVINO](./openvoice/openvoice.ipynb)
  - [OpenVINO Tokenizers: Incorporate Text Processing Into OpenVINO Pipelines](./openvino-tokenizers/openvino-tokenizers.ipynb)
  - [Object detection and masking from prompts with GroundedSAM (GroundingDINO + SAM) and OpenVINO](./grounded-segment-anything/grounded-segment-anything.ipynb)

@@ -178,6 +180,7 @@
  - [Person Tracking with OpenVINO™](./person-tracking-webcam/person-tracking.ipynb)
  - [Person Counting System using YOLOV8 and OpenVINO™](./person-counting-webcam/person-counting.ipynb)
  - [PaddleOCR with OpenVINO™](./paddle-ocr-webcam/paddle-ocr-webcam.ipynb)
+ - [Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO](./openvoice2-and-melotts/openvoice2-and-melotts.ipynb)
  - [Voice tone cloning with OpenVoice and OpenVINO](./openvoice/openvoice.ipynb)
  - [Live Object Detection with OpenVINO™](./object-detection-webcam/object-detection.ipynb)
  - [CLIP model with Jina CLIP and OpenVINO](./jina-clip/jina-clip.ipynb)

@@ -250,6 +253,7 @@
  - [Text-to-speech (TTS) with Parler-TTS and OpenVINO](./parler-tts-text-to-speech/parler-tts-text-to-speech.ipynb)
  - [Text-to-Speech synthesis using OuteTTS and OpenVINO](./outetts-text-to-speech/outetts-text-to-speech.ipynb)
  - [Optical Character Recognition (OCR) with OpenVINO™](./optical-character-recognition/optical-character-recognition.ipynb)
+ - [Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO](./openvoice2-and-melotts/openvoice2-and-melotts.ipynb)
  - [Voice tone cloning with OpenVoice and OpenVINO](./openvoice/openvoice.ipynb)
  - [Running OpenCLIP models using OpenVINO™](./open-clip/open-clip.ipynb)
  - [Universal Segmentation with OneFormer and OpenVINO](./oneformer-segmentation/oneformer-segmentation.ipynb)

@@ -344,6 +348,7 @@
  - [Quantization Aware Training with NNCF, using PyTorch framework](./pytorch-quantization-aware-training/pytorch-quantization-aware-training.ipynb)
  - [Post-Training Quantization of PyTorch models with NNCF](./pytorch-post-training-quantization-nncf/pytorch-post-training-quantization-nncf.ipynb)
  - [Optimize Preprocessing](./optimize-preprocessing/optimize-preprocessing.ipynb)
+ - [Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO](./openvoice2-and-melotts/openvoice2-and-melotts.ipynb)
  - [Voice tone cloning with OpenVoice and OpenVINO](./openvoice/openvoice.ipynb)
  - [OpenVINO Tokenizers: Incorporate Text Processing Into OpenVINO Pipelines](./openvino-tokenizers/openvino-tokenizers.ipynb)
  - [Quantize NLP models with Post-Training Quantization in NNCF](./language-quantize-bert/language-quantize-bert.ipynb)

notebooks/llm-rag-langchain/llm-rag-langchain-genai.ipynb

Lines changed: 2 additions & 2 deletions

@@ -880,7 +880,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": null,
    "id": "d0bab20b",
    "metadata": {},
    "outputs": [],

@@ -892,7 +892,7 @@
   "display(Markdown(f\"`{export_command}`\"))\n",
   "\n",
   "if not Path(rerank_model_id.value).exists():\n",
-  "    optimum_cli(rerank_model_configuration[\"model_id\"], str(rerank_model_id.value), show_command=False, additional_args={\"task\": \"text-classificaton\"})"
+  "    optimum_cli(rerank_model_configuration[\"model_id\"], str(rerank_model_id.value), show_command=False, additional_args={\"task\": \"text-classification\"})"
   ]
  },
  {
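
The corrected `task` value matters because it selects which model head `optimum-cli` exports for the reranker. As a minimal, hypothetical sketch of the equivalent export through the optimum-intel Python API (the model ID and output directory below are illustrative, not the notebook's exact values):

```python
from pathlib import Path

from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "BAAI/bge-reranker-v2-m3"  # illustrative reranker checkpoint
output_dir = Path("bge-reranker-ov")  # illustrative output path

if not output_dir.exists():
    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly,
    # which corresponds to the `--task text-classification` export the cell runs via optimum-cli.
    ov_model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
    ov_model.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(model_id).save_pretrained(output_dir)
```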

notebooks/llm-rag-llamaindex/llm-rag-llamaindex.ipynb

Lines changed: 3 additions & 1 deletion

@@ -1224,7 +1224,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": null,
    "id": "f7f708db-8de1-4efd-94b2-fcabc48d52f4",
    "metadata": {},
    "outputs": [

@@ -1288,7 +1288,9 @@
   "import openvino.properties as props\n",
   "import openvino.properties.hint as hints\n",
   "import openvino.properties.streams as streams\n",
+  "import openvino\n",
   "\n",
+  "core = openvino.Core()\n",
   "\n",
   "if model_to_run.value == \"INT4\":\n",
   "    model_dir = int4_model_dir\n",
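
The added `import openvino` and `core = openvino.Core()` lines give the cell a runtime handle next to the property-helper imports. A minimal sketch of how such a `Core` instance and those helpers are commonly combined (the device list and configuration values are illustrative assumptions, not the notebook's exact settings):

```python
import openvino
import openvino.properties as props
import openvino.properties.hint as hints
import openvino.properties.streams as streams

core = openvino.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU']

# The property helpers act as keys of a configuration dictionary passed at compile time.
ov_config = {
    hints.performance_mode(): hints.PerformanceMode.LATENCY,
    streams.num(): "1",
    props.cache_dir(): "",
}
```
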
notebooks/openvoice2-and-melotts/README.md

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@

# Voice tone cloning with OpenVoice2 and MeloTTS for Text-to-Speech by OpenVINO

<!-- TODO: insert link with the image/gif -->
![OpenVoice2 demo](https://github.com/openvinotoolkit/openvino_notebooks/assets/5703039/ca7eab80-148d-45b0-84e8-a5a279846b51)

OpenVoice2 is a versatile system for instant voice tone transfer that generates speech in multiple languages from just a brief audio snippet of the source speaker, using MeloTTS models as the base speakers. OpenVoice2 includes all features of V1 and introduces several enhancements: (i) better audio quality, thanks to a revised training strategy; (ii) native multilingual support for English, Spanish, French, Chinese, Japanese, and Korean; (iii) free commercial use, since both V2 and V1 have been released under the MIT License starting from April 2024.

OpenVoice2 retains the core strengths of OpenVoice V1, including accurate tone color cloning, flexible voice style control, and zero-shot cross-lingual voice cloning.

More details about the model can be found on the [project web page](https://research.myshell.ai/open-voice), in the [paper](https://arxiv.org/abs/2312.01479), and in the official [repository](https://github.com/myshell-ai/OpenVoice).

In this tutorial, we will explore how to convert and run OpenVoice2 and MeloTTS using OpenVINO.

## Notebook Contents

This notebook demonstrates voice tone cloning with [OpenVoice2](https://github.com/myshell-ai/OpenVoice) in OpenVINO.

The tutorial consists of the following steps:

- Install prerequisites
- Load the PyTorch models
- Convert the models to OpenVINO Intermediate Representation (IR) format (see the sketch below)
- Run OpenVINO model inference on a single example
- Launch an interactive demo
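
A minimal, generic sketch of the conversion step; a toy `torch.nn.Module` stands in for the real MeloTTS/OpenVoice2 models, and the path, shapes, and device are illustrative assumptions:

```python
import torch
import openvino as ov


class TinyNet(torch.nn.Module):
    """Toy stand-in for the TTS / tone-converter models converted in the notebook."""

    def forward(self, x):
        return torch.sin(x)


model = TinyNet().eval()
example_input = torch.zeros(1, 16)

# Trace the PyTorch model and save it as OpenVINO IR (.xml + .bin).
ov_model = ov.convert_model(model, example_input=example_input)
ov.save_model(ov_model, "tiny_net.xml")

# Compile for a target device and run a single inference.
compiled = ov.Core().compile_model("tiny_net.xml", "CPU")
result = compiled(example_input.numpy())[0]
```
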
## Installation Instructions

This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/openvoice2-and-melotts/README.md" />
Lines changed: 92 additions & 0 deletions

@@ -0,0 +1,92 @@

from typing import Callable
import gradio as gr


description = """
# OpenVoice2 accelerated by OpenVINO:

a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. OpenVoice also achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set.
"""

content = """
<div>
<strong>If the generated voice does not sound like the reference voice, please refer to <a href='https://github.com/myshell-ai/OpenVoice/blob/main/docs/QA.md'>this QnA</a>.</strong> <strong>For multi-lingual & cross-lingual examples, please refer to <a href='https://github.com/myshell-ai/OpenVoice/blob/main/demo_part2.ipynb'>this jupyter notebook</a>.</strong>
This online demo mainly supports <strong>English</strong>. The <em>default</em> style also supports <strong>Chinese</strong>. But OpenVoice can adapt to any other language as long as a base speaker is provided.
</div>
"""
wrapped_markdown_content = f"<div style='border: 1px solid #000; padding: 10px;'>{content}</div>"


# Example inputs: (text prompt, style, reference audio path, license agreement flag).
examples = [
    [
        "Did you ever hear a folk tale about a giant turtle?",
        "en_latest",
        "OpenVoice/resources/demo_speaker0.mp3",
        True,
    ],
    [
        "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。",
        "zh_default",
        "OpenVoice/resources/demo_speaker1.mp3",
        True,
    ],
]


def make_demo(fn: Callable):
    """Build the Gradio demo; `fn` performs the actual TTS synthesis and tone-color conversion."""
    with gr.Blocks(analytics_enabled=False) as demo:
        with gr.Row():
            gr.Markdown(description)
        with gr.Row():
            gr.HTML(wrapped_markdown_content)

        with gr.Row():
            with gr.Column():
                input_text_gr = gr.Textbox(
                    label="Text Prompt",
                    info="One or two sentences at a time is better. Up to 50 text characters.",
                    value="The bustling city square bustled with street performers, tourists, and local vendors.",
                )
                style_gr = gr.Dropdown(
                    label="Style",
                    info="Select a style of output audio for the synthesised speech. (Chinese only supports 'default' now.)",
                    choices=[
                        "en_latest",
                        "zh_default",
                    ],
                    max_choices=1,
                    value="en_latest",
                )
                ref_gr = gr.Audio(
                    label="Reference Audio",
                    # info="Click on the button to upload your own target speaker audio",
                    type="filepath",
                    value="OpenVoice/resources/demo_speaker0.mp3",
                )
                tos_gr = gr.Checkbox(
                    label="Agree",
                    value=False,
                    info="I agree to the terms of the MIT license: https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE",
                )

                tts_button = gr.Button("Send", elem_id="send-btn", visible=True)

            with gr.Column():
                out_text_gr = gr.Text(label="Info")
                audio_gr = gr.Audio(label="Synthesised Audio", autoplay=True)
                ref_audio_gr = gr.Audio(label="Reference Audio Used")

        gr.Examples(
            examples,
            label="Examples",
            inputs=[input_text_gr, style_gr, ref_gr, tos_gr],
            outputs=[out_text_gr, audio_gr, ref_audio_gr],
            fn=fn,
            cache_examples=False,
        )
        tts_button.click(
            fn,
            [input_text_gr, style_gr, ref_gr, tos_gr],
            outputs=[out_text_gr, audio_gr, ref_audio_gr],
        )
    return demo
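
A hypothetical usage sketch of this helper from the notebook side; the module name `gradio_helper` and the `synthesize` callable are assumptions, and the real notebook wires in its own OpenVINO-backed pipeline:

```python
import numpy as np

from gradio_helper import make_demo  # assumes the script above is saved as gradio_helper.py


def synthesize(text: str, style: str, reference_audio: str, agree: bool):
    """Placeholder callable matching make_demo's (text, style, reference, agree) inputs."""
    if not agree:
        return "Please accept the license terms first.", None, None
    sample_rate = 22050
    audio = np.zeros(sample_rate, dtype=np.float32)  # one second of silence as a stand-in
    # A real implementation would run MeloTTS synthesis plus OpenVoice2 tone-color conversion here.
    return "Synthesis finished.", (sample_rate, audio), reference_audio


demo = make_demo(fn=synthesize)
demo.launch()
```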
