Darkest Dungeon Text-to-Speech Voice Cloning

A text-to-speech application that clones a voice from Darkest Dungeon game audio samples. It uses Coqui TTS (XTTS v2) to generate a deep, narrator-style voice that matches the game's atmosphere.

Features

✅ Voice Cloning from Darkest Dungeon audio samples
✅ Deep Voice Processing - Automatically deepens voice with pitch shifting and formant adjustment
✅ Web interface with Gradio (easy to use)
✅ CPU support (no GPU required)
✅ Automatic text splitting for long texts
✅ Automatic selection of best voice samples
✅ 100+ Darkest Dungeon-themed example texts shuffled on each app start
✅ Beautiful English UI focused on Darkest Dungeon atmosphere

System requirements

Python 3.10 or higher
Operating system: Windows, Linux, or macOS
RAM: 4GB minimum (8GB+ recommended)
Disk: ~5GB for models and dependencies

Installation

1. Clone the repository and enter the directory

cd Project_Sound_DD

2. Activate the virtual environment (if you have one)

Windows:

.\venv\Scripts\Activate.ps1

Linux/Mac:

source venv/bin/activate

3. Install dependencies

Method 1: Automated script (recommended)

Windows PowerShell:

.\install_dependencies.ps1

Method 2: Manual installation

pip install -e .
pip install bnnumerizer gruut

Important notes:

Some packages (gruut, jieba) need to be installed from source to avoid missing-file errors
The install_dependencies.ps1 script handles this automatically
The bnnumerizer package may fail to install. If it does, create a stub file:
- Create the file venv\Lib\site-packages\bnnumerizer.py with the following content:
```
def numerize(text):
    return text
```
The first run automatically downloads the TTS model (~2GB), which may take a few minutes.

Fixing dependency errors:

If you get a ModuleNotFoundError for gruut or jieba, force a clean reinstall:

.\venv\Scripts\pip.exe install --force-reinstall --no-cache-dir gruut==2.2.3
.\venv\Scripts\pip.exe install --force-reinstall --no-cache-dir jieba

If you get the error cannot import name 'BeamSearchScorer' from 'transformers', downgrade transformers:
```
.\venv\Scripts\pip.exe install "transformers>=4.33.0,<4.40.0"
```
If you get a Weights only load failed or weights_only error from PyTorch, downgrade PyTorch:
```
.\venv\Scripts\pip.exe install "torch>=2.0.0,<2.6.0" "torchaudio>=2.0.0,<2.6.0"
```
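
To see at a glance whether the installed packages fall inside the version windows used in the commands above, a small check along these lines may help. It is only a sketch: the `PINS` table and the helper names are illustrative and not part of the project, and the comparison looks at major and minor version numbers only.

```python
from importlib import metadata

# Version windows copied from the pip commands above:
# (inclusive lower bound, exclusive upper bound) as (major, minor).
PINS = {
    "transformers": ((4, 33), (4, 40)),
    "torch": ((2, 0), (2, 6)),
    "torchaudio": ((2, 0), (2, 6)),
}


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.5.1+cpu'."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def in_range(version: str, low: tuple[int, int], high: tuple[int, int]) -> bool:
    """True when low <= version < high, compared on (major, minor) only."""
    return low <= parse_major_minor(version) < high


def check_pins() -> dict[str, str]:
    """Return a status per package: 'ok', 'out of range (...)', or 'missing'."""
    report = {}
    for name, (low, high) in PINS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if in_range(installed, low, high):
            report[name] = "ok"
        else:
            report[name] = f"out of range ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_pins().items():
        print(f"{package}: {status}")
```

Run it inside the activated virtual environment so that it inspects the same packages the application will import.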

Usage

Run the web application

python app.py

Then open your browser and navigate to: http://localhost:7860

Use in Python code

English TTS (with voice cloning):

from src.voice_cloner import VoiceCloner

# Initialize voice cloner
cloner = VoiceCloner(sound_dir="Sound")
cloner.initialize()

# Generate audio from English text
audio = cloner.synthesize_simple(
    text="Hello, this is a text-to-speech application with voice cloning.",
    output_path="output.wav"
)

Deep Voice Processing

The application automatically applies deep voice processing to make the narrator voice deeper and more atmospheric:

Pitch Shifting: Lowers pitch by 4 semitones for a deeper sound
Formant Shifting: Adjusts vocal tract characteristics for a deeper timbre
Uses librosa for high-quality audio processing
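
To get a feel for what a 4-semitone drop means, the arithmetic can be worked out directly. The sketch below assumes standard 12-tone equal temperament, where each semitone scales frequency by 2^(1/12); the function names and the 120 Hz example value are illustrative and do not come from the project code.

```python
def semitones_to_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps semitones (12-tone equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def shifted_frequency(freq_hz: float, n_steps: float) -> float:
    """Frequency that freq_hz maps to after shifting by n_steps semitones."""
    return freq_hz * semitones_to_ratio(n_steps)


if __name__ == "__main__":
    ratio = semitones_to_ratio(-4.0)
    print(f"-4 semitones scales frequencies by {ratio:.4f}")
    # 120 Hz is an arbitrary example fundamental, not a measured value.
    print(f"A 120 Hz fundamental moves to about {shifted_frequency(120.0, -4.0):.1f} Hz")
```

A shift of -4 semitones therefore lowers every frequency to roughly 79% of its original value, which is a major third down in musical terms.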

Project structure

Project_Sound_DD/
├── Sound/                   # Sample .wav files (481 files)
├── src/
│   ├── __init__.py
│   ├── audio_processor.py   # Audio file processing and normalization
│   ├── tts_engine.py        # TTS engine built on Coqui TTS
│   ├── text_processor.py    # English text processing
│   └── voice_cloner.py      # Main voice cloning module
├── tests/                   # Unit tests
├── app.py                   # Gradio application
├── pyproject.toml           # Dependencies and configuration
└── README.md

CPU optimization

The model is tuned to run on CPU
Uses a limited number of threads to avoid overloading the machine
Automatically splits long texts into smaller pieces
Caches models for reuse
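
The automatic splitting of long texts can be pictured roughly as follows. This is a sketch of one plausible approach (greedy packing of sentences into chunks), not the project's actual implementation in src/text_processor.py; `split_text`, `_split_words`, and the 200-character default are assumptions made for illustration.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Pack sentences into chunks of up to max_chars characters.

    A single word longer than max_chars is kept whole, so such a chunk
    may exceed the limit.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Oversized sentences fall back to splitting on whitespace.
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Greedily pack words into parts of up to max_chars characters."""
    parts: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts
```

Each chunk could then be synthesized separately and the resulting audio concatenated, which keeps per-call memory and latency bounded on CPU.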

Troubleshooting

Error: "No voice samples available"

Make sure the Sound directory contains .wav files
Check file access permissions
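
Both checks can be scripted. The helper below is a hypothetical diagnostic, not a function from the project: it lists the .wav files under a directory (defaulting to Sound) that are non-empty and can be opened for reading.

```python
from pathlib import Path


def find_voice_samples(sound_dir: str = "Sound") -> list[Path]:
    """Return readable, non-empty .wav files under sound_dir, sorted by path."""
    root = Path(sound_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Sample directory not found: {root.resolve()}")
    samples = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".wav":
            continue
        try:
            if path.stat().st_size == 0:
                continue  # Skip empty files; they cannot hold audio.
            with path.open("rb"):
                pass  # Opening the file confirms read permission.
        except OSError:
            continue  # Unreadable file: skip it rather than fail.
        samples.append(path)
    return samples


if __name__ == "__main__":
    found = find_voice_samples("Sound")
    print(f"Found {len(found)} usable .wav file(s)")
```

If this reports zero files while the directory looks populated, file extensions or permissions are the likely cause.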

Error: "TTS model not initialized"

Check your internet connection (the model must be downloaded on first run)
Make sure all dependencies are installed

Slow performance

This is normal on CPU; each sentence can take 10-30 seconds
Shorten the input text to speed things up

Development

Run tests

pytest tests/

Install development dependencies

pip install -e ".[dev]"

License

MIT License

Author

This project was developed for Vietnamese TTS with voice cloning.

Notes

Voice Cloning: Uses XTTS v2 model with voice cloning from Darkest Dungeon audio samples in the Sound directory.
Deep Voice Processing: Automatically applies pitch shifting (-4 semitones) and formant adjustment to create a deep, narrator-style voice.
Voice quality depends on the quality of sample files in the Sound directory.
First-time use requires time to download and initialize the XTTS v2 model (~2GB).
Examples are shuffled on each app start for variety.
All examples are themed around Darkest Dungeon for an immersive experience.

Audio Processing

The deep voice processing includes:

Pitch Shifting: Lowers pitch using librosa's pitch_shift function
Formant Shifting: Adjusts frequency spectrum for deeper vocal characteristics
Automatic normalization to prevent clipping
Maintains audio quality and naturalness
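
The normalization step can be illustrated with simple peak normalization. This is a sketch of the general idea under stated assumptions, not the project's code: `peak_normalize` and the 0.95 target are hypothetical, and plain Python lists are used for clarity where a real pipeline would more likely work on NumPy arrays.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak.

    Keeping the peak below 1.0 leaves headroom, so the signal does not
    clip when written to a fixed-range format. Silence is returned as-is.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    # Example values chosen by hand; -2.0 would clip in a [-1, 1] format.
    print(peak_normalize([0.5, -2.0, 1.0]))
```

Because a single gain is applied to every sample, the waveform's shape is preserved and only its overall level changes.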

Adjust parameters in src/voice_cloner.py to customize the depth of voice:

pitch_shift_semitones: How much to lower pitch (default: -4.0)
formant_shift: Formant adjustment factor (default: 0.88)
