Based on our tests, only Gemini 2.5/3.x Flash/Pro can currently transcribe reliably without audio preprocessing. As far as I know, no offline model currently supports speech input, tool calling, and structured output at the same time, so using Gemini is recommended.
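To make the "speech + tools + structured output" requirement concrete, here is a minimal sketch of how one might screen a candidate model before wiring it into the transcription flow. The capability labels, `ModelProfile`, and both helper functions are hypothetical names made up for illustration; they are not AFFiNE or Ollama identifiers, and the capability sets would have to come from your own testing.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three requirements named in the reply above;
# these are illustration-only strings, not AFFiNE or Ollama identifiers.
REQUIRED_FOR_TRANSCRIPTION = frozenset(
    {"audio_input", "tool_calling", "structured_output"}
)


@dataclass(frozen=True)
class ModelProfile:
    """Illustrative description of a model; fill it in from your own tests."""

    name: str
    capabilities: frozenset = field(default_factory=frozenset)


def missing_capabilities(model, required=REQUIRED_FOR_TRANSCRIPTION):
    """Return the required capabilities the model lacks, sorted for stable output."""
    return sorted(required - model.capabilities)


def is_transcription_candidate(model):
    """A model qualifies only if it has all required capabilities at once."""
    return not missing_capabilities(model)


if __name__ == "__main__":
    # Placeholder local model: assumed to handle tools and structured output
    # but not audio input (an assumption for the example, not a test result).
    local = ModelProfile(
        "example-local-model",
        frozenset({"tool_calling", "structured_output"}),
    )
    print(local.name, "missing:", missing_capabilities(local))
```

The point of the check is that all three capabilities must hold simultaneously: a model that is strong at chat and structured generation still fails if it cannot accept audio directly.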
Hi, thanks for the great project.
I’m currently testing AFFiNE in a self-hosted/company environment, so I would prefer not to call external APIs directly. Because of that, I’m trying to use local models through Ollama instead of cloud providers.
I noticed that for Transcript audio, the official/default configuration seems to use gemini-2.5-flash. I want to ask:
1. Besides gemini-2.5-flash, what other models can be used for Transcript audio?
2. Does AFFiNE support using Ollama models for this feature?
3. If yes, are there any recommended models on Ollama that work well for audio transcription / transcript-related flows?
4. Are there any requirements for model capabilities, such as:
   - audio input support
   - tool calling support
   - structured output support
   - specific OpenAI-compatible API behavior
My use case is:
- self-hosted AFFiNE
- internal company network
- avoid external network/API access as much as possible
- prefer local deployment with Ollama
I have already tried replacing the default model with an Ollama model in some AI-related configuration, but I’m not sure whether Transcript audio has stricter requirements than normal chat/structured generation.
So I’d like to confirm:
- which models are officially supported for Transcript audio
- whether local Ollama models are feasible
- whether there are any known working model examples
Thanks a lot.