Feature Request
The built-in whisper_stt extension works well but only supports Whisper models. A FunASR/SenseVoice STT extension would provide a faster, smaller alternative with richer output.
Why SenseVoice is compelling for text-generation-webui
| | Whisper small.en | SenseVoice-Small |
|---|---|---|
| Params | 244M | 234M |
| Architecture | Autoregressive | Non-autoregressive |
| Languages | English only | 50+ languages |
| Speed | Baseline | Up to 25x faster |
| Extras | None | Emotion + audio events |
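Because SenseVoice is non-autoregressive, it produces the whole transcript in a single forward pass instead of generating one token at a time, so decoding time does not grow with the length of the transcript the way it does with Whisper. Lower transcription latency makes voice conversations feel more natural.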
Implementation
The extension structure would mirror whisper_stt, with FunASR as the backend:
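    from funasr import AutoModel

    # Load the model once at import time so every request reuses it
    model = AutoModel(model="iic/SenseVoiceSmall")

    def do_stt(audio):
        # generate() returns a list of results; use the first entry's text
        result = model.generate(input=audio)
        return result[0]["text"]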
The funasr package installs via pip and supports both CPU and CUDA.
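One detail the extension would probably need to handle: as far as I can tell, the raw `text` returned by SenseVoice carries the language, emotion, and audio-event labels as inline `<|...|>` markers ahead of the transcript. The sketch below shows one way to separate those markers from the text before it goes into the chat box. The marker format and the sample string are assumptions based on my reading of the model's output, and `split_tags` is a hypothetical helper name, not part of FunASR. FunASR ships its own post-processing helper (`rich_transcription_postprocess`), which may be the better choice in the actual PR.

```python
import re
from typing import List, Tuple

# Assumption: labels appear as "<|label|>" markers, e.g. "<|en|><|HAPPY|>".
_TAG = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw: str) -> Tuple[List[str], str]:
    """Return (labels, transcript) from a SenseVoice-style tagged string.

    Hypothetical helper for illustration; not part of the FunASR API.
    """
    tags = _TAG.findall(raw)          # label names without the <| |> wrappers
    text = _TAG.sub("", raw).strip()  # transcript with all markers removed
    return tags, text


if __name__ == "__main__":
    # Hypothetical raw output; real strings may differ in label set and order.
    sample = "<|en|><|HAPPY|><|Speech|><|woitn|>hello there"
    tags, text = split_tags(sample)
    print(tags)
    print(text)
```

Keeping the labels as a separate list would let the extension send only the clean transcript to the model by default, while leaving room for an option that exposes the emotion or event labels later.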
Happy to contribute a PR for this extension if there's interest.
References: