Feature: FunASR/SenseVoice STT extension as Whisper alternative #7593

@LauraGPT

Feature Request

The built-in whisper_stt extension works well but only supports Whisper models. A FunASR/SenseVoice STT extension would provide a faster, smaller alternative with richer output.

Why SenseVoice is compelling for text-generation-webui

| | Whisper small.en | SenseVoice-Small |
|---|---|---|
| Params | 244M | 234M |
| Architecture | Autoregressive | Non-autoregressive |
| Languages | English only | 50+ languages |
| Speed | Baseline | Up to 25x faster |
| Extras | - | Emotion + audio events |

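SenseVoice's non-autoregressive architecture produces the whole transcript in a single forward pass instead of one token at a time, so decoding latency does not grow with the number of output tokens. Faster responses make voice conversations feel more natural.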

Implementation

The extension structure would mirror whisper_stt, with FunASR as the backend:

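from funasr import AutoModel

# Load the model once at import time so every request reuses it.
model = AutoModel(model="iic/SenseVoiceSmall")

def do_stt(audio):
    # `audio` is any input FunASR accepts, e.g. a path to an audio file.
    result = model.generate(input=audio)
    # generate() returns a list of results; the transcript is under "text".
    return result[0]["text"]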

The funasr package installs via pip and supports both CPU and CUDA.
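Since the emotion and audio-event information arrives inline with the transcript, the extension would probably need to separate it from the text before sending anything to the chat box. Below is a minimal sketch of that step. It assumes the raw output prefixes the transcript with markers of the form `<|en|><|HAPPY|><|Speech|>`; the function name `split_rich_text` is hypothetical, and FunASR's own `rich_transcription_postprocess` helper (in `funasr.utils.postprocess_utils`) may be the better choice in a real PR.

```python
import re

# Assumption: SenseVoice-style markers look like <|en|>, <|HAPPY|>, <|Speech|>.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_rich_text(raw):
    """Hypothetical helper: return (tags, plain_text) for a raw transcript."""
    # Collect the marker names in the order they appear.
    tags = TAG_PATTERN.findall(raw)
    # Remove every marker and trim surrounding whitespace.
    plain_text = TAG_PATTERN.sub("", raw).strip()
    return tags, plain_text
```

With this in place, `do_stt` could return only `plain_text` to the chat input while the tags stay available for display or logging.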

Happy to contribute a PR for this extension if there's interest.
