# .env.example (groxaxo/Local-VoiceMode-LLM, main branch)

# xAI TTS (cloud — last-resort fallback, used only if every local engine fails)
XAI_API_KEY=xai-your-key-here
XAI_TTS_VOICE=eve

# --- Remote providers (for slow CPUs — offload TTS/STT to a cloud endpoint) ---
# Local ONNX is CPU-tuned, but on an old or slow CPU even 8-step Supertonic can
# fall behind a live conversation. Point at any OpenAI-compatible endpoint
# (OpenAI, a hosted provider, or your own remote box) to offload. Full matrix:
# docs/providers.md.

# Generic OpenAI-compatible TTS (TTS_ENGINE=openai). Streams by sentence.
# OPENAI_TTS_URL=https://api.openai.com/v1   # any OpenAI-compatible base URL
# OPENAI_API_KEY=sk-your-key-here            # or OPENAI_TTS_KEY
# OPENAI_TTS_MODEL=gpt-4o-mini-tts           # e.g. tts-1, tts-1-hd, gpt-4o-mini-tts
# OPENAI_TTS_VOICE=alloy                     # alloy, echo, fable, onyx, nova, shimmer
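#
# Example (a sketch, not a tested setup): offloading TTS to your own remote box.
# The host and port below are placeholders, and whether that server needs an API
# key depends on how it is configured.
# TTS_ENGINE=openai
# OPENAI_TTS_URL=http://tts-box.example.lan:8000/v1
# OPENAI_TTS_MODEL=tts-1
# OPENAI_TTS_VOICE=alloy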

# Inworld expressive cloud TTS (TTS_ENGINE=inworld). Per-sentence steering tags.
# INWORLD_API_KEY=base64-basic-key           # or INWORLD_TTS_API — platform.inworld.ai/api-keys
# INWORLD_TTS_VOICE=Ashley                   # 260 voices
# INWORLD_STEER=auto                         # auto (on for tts-2) | 1 | 0 (disable → faster, flatter)

# Remote STT (point Parakeet's slot at OpenAI Whisper or any compatible endpoint)
# STT_ENGINE=remote
# STT_REMOTE_URL=https://api.openai.com/v1/audio/transcriptions
# STT_REMOTE_MODEL=whisper-1
# STT_API_KEY=sk-your-key-here               # bearer key; or STT_REMOTE_KEY / OPENAI_API_KEY
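#
# Example (a sketch): remote STT served from your own machine instead of OpenAI.
# The host is a placeholder; this assumes the remote server exposes the same
# /v1/audio/transcriptions path and model name as the local Parakeet backend.
# STT_ENGINE=remote
# STT_REMOTE_URL=http://stt-box.example.lan:5093/v1/audio/transcriptions
# STT_REMOTE_MODEL=parakeet-tdt-0.6b-v3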

# --- Local backends (auto-installed by setup.sh) ---

# Parakeet STT (ONNX, CPU, :5093) — local default on every platform
# STT_ENGINE=local                                  # local Parakeet ONNX/CPU (default)
# STT_MODEL=parakeet-tdt-0.6b-v3                     # CPU ONNX model (same everywhere)
# STT_URL=http://127.0.0.1:5093/v1/audio/transcriptions

# Supertonic TTS (ONNX, :8766) — local default
# TTS_ENGINE=supertonic
# SUPERTONIC_URL=http://127.0.0.1:8766   # override if your server runs elsewhere,
#                                        # e.g. :8765 when another service holds :8766
# SUPERTONIC_VOICE_STYLE=voice_styles/F4.json

# Supertonic 2 TTS (ONNX, :8880) — OPTIONAL, opt-in backend
# Supertonic Express 2 (onnx-community/Supertonic-TTS-2-ONNX) — same OpenAI-compatible
# API as Supertonic 3, multilingual (en/ko/es/pt/fr). Not auto-installed; run:
#   bash integrations/supertonic2/install.sh
# SUPERTONIC2_URL=http://127.0.0.1:8880
# SUPERTONIC2_VOICE=M1        # F1–F5 / M1–M5
# SUPERTONIC2_STEPS=8         # denoising steps 1–20 (defaults to the TTS_QUALITY preset)
# SUPERTONIC2_SPEED=1.05

# --- TTS engine selection ---
# Default is supertonic (local CPU); every other engine is secondary/opt-in. With
# a local primary, the local engines are always tried before any cloud engine.
# Choosing a remote engine (openai/inworld/xai) tries that engine first, then
# falls back to the local engines.
#   supertonic → neutts → xai                  (default)
#   neutts     → supertonic → xai
#   qwen       → supertonic → neutts → xai      (opt-in, Apple Silicon MLX)
#   openai     → supertonic → neutts            (remote; slow-CPU offload)
#   inworld    → qwen → supertonic → neutts     (remote; expressive)
#   xai        → supertonic → neutts            (remote; last resort)
# TTS_ENGINE=supertonic    # local ONNX, :8766 (default, auto-installed)
# TTS_ENGINE=neutts        # local GGUF, :8020 (requires separate NeuTTS backend)
# TTS_ENGINE=qwen          # local MLX, Apple Silicon (opt-in; separate backend)
# TTS_ENGINE=openai        # remote OpenAI-compatible /v1/audio/speech (slow-CPU offload)
# TTS_ENGINE=inworld       # remote expressive cloud (per-sentence steering)
# TTS_ENGINE=xai           # cloud xAI, last resort (requires XAI_API_KEY above)

# --- Talk mode ---
# TALK_AUTO_LISTEN=1      # After speaking, start listening automatically
# TALK_BARGE_IN=0         # Set to 1 to interrupt TTS on user speech (requires echo cancellation)
# TALK_IDLE_TIMEOUT_S=30  # Stop listening after N seconds without speech
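#
# Example (a sketch): a hands-free loop with barge-in enabled. The values are
# illustrative rather than recommended defaults, and this assumes your audio
# setup already provides echo cancellation.
# TALK_AUTO_LISTEN=1
# TALK_BARGE_IN=1
# TALK_IDLE_TIMEOUT_S=60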