End-to-end Text to Speech

An end-to-end processing pipeline that turns the main text of a website into speech audio.

Features:

Extract main text from websites using trafilatura
Preprocess text with NVIDIA NeMo and some custom code
Split text into sentences, taking maximum number of tokens into account
Generate speech with Coqui XTTS-v2
Validate speech samples with whisper-timestamped, regenerating a sample if validation fails
Concatenate the sentence samples into one WAV file
Enhance WAV with noisereduce
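
The splitting and validation steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names `split_into_chunks` and `synthesize_with_validation` are hypothetical, whitespace-separated words stand in for real tokenizer tokens, and the `synthesize` and `transcribe` callables are placeholders for whatever wraps XTTS-v2 and whisper-timestamped.

```python
import re
from typing import Any, Callable, List, Tuple


def split_into_chunks(text: str, max_tokens: int = 50) -> List[str]:
    """Split text into sentences, then cut any sentence longer than max_tokens."""
    # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    for sentence in sentences:
        # Assumption: whitespace-separated words approximate model tokens.
        words = sentence.split()
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace for comparison."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", text.lower()).split())


def synthesize_with_validation(
    sentence: str,
    synthesize: Callable[[str], Any],   # hypothetical TTS wrapper
    transcribe: Callable[[Any], str],   # hypothetical speech-to-text wrapper
    max_attempts: int = 3,
) -> Tuple[Any, bool]:
    """Generate audio and retry until the transcript matches the input text.

    Returns the last generated audio and whether it passed validation.
    """
    expected = _normalize(sentence)
    audio = None
    for _ in range(max_attempts):
        audio = synthesize(sentence)
        if _normalize(transcribe(audio)) == expected:
            return audio, True
    return audio, False
```

An exact transcript match is a deliberately strict check chosen for simplicity; a real pipeline might tolerate small differences, for example by comparing word error rate against a threshold.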

Setup

Install the following dependencies:

pip install noisereduce requests beautifulsoup4 trafilatura "nemo_toolkit[all]" whisper-timestamped TTS
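
After installation, a quick check along these lines can confirm that the packages are importable. It is only a sketch: the helper name `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are the usual ones for these packages (for example `bs4` for beautifulsoup4 and `whisper_timestamped` for whisper-timestamped), so adjust them if your installed versions differ.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Assumed top-level import names for the pip packages listed above.
REQUIRED_MODULES = [
    "noisereduce",
    "requests",
    "bs4",                  # beautifulsoup4
    "trafilatura",
    "nemo",                 # nemo_toolkit
    "whisper_timestamped",  # whisper-timestamped
    "TTS",                  # Coqui TTS
]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    # find_spec only locates the module; it does not import it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```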

Due to the dependency on whisper-timestamped, the whole project is licensed under AGPL-3.0.
