Low-latency voice dictation for Linux Wayland/X11.
It runs as a PyQt5 tray app, keeps a faster-whisper worker warm in the background, listens to your microphone with sounddevice, and types the cleaned transcript into the currently focused window with wtype.
Example tray appearance on Hyprland. tray_app.py The tray icon/menu (the literal microphone, fourth from the right) is the main control surface for starting dictation, switching microphones, and quitting the app.
- starts from a system tray icon
- prewarms the speech model once so toggling dictation is fast after launch
- records audio from the selected microphone
- detects speech with a lightweight VAD loop
- transcribes with
faster-whisper - cleans up noisy transcript formatting before typing
- types into the focused app with
wtype - exposes a local Unix socket so external keybinds can toggle dictation
- repairs/installs a Hyprland
Ctrl+Alt+Fruntime bind automatically when possible
tray_app.py— PyQt5 tray app, worker lifecycle, device menu, socket listener, Hyprland bind repairwhisper_dictate.py— warm worker, audio capture, VAD, transcription, transcript cleanup, typingtoggle_dictation.py— tiny CLI that sendstoggleover the Unix sockethotkey_listener.py— legacy raw-evdev listener forCtrl+Alt+Fwhen running with elevated input permissionsmic-on.png/mic-off.png— tray iconsexample.jpg— screenshot of the tray on Hyprlandutils/benchmark_latency.py— benchmark helper for worker boot/start/stop timingdocs/performance-audit-20260408-v1.md— current performance audit and optimization plan
There are two long-lived processes in the normal tray workflow:
-
tray_app.py- creates the tray icon and context menu
- loads/saves
config.json - listens on
.dictation.sock - prewarms the worker at startup
- sends
start,stop, andquitcommands to the worker
-
whisper_dictate.py --controlled- loads
faster-whisperonce - waits for commands on stdin
- starts/stops audio sessions without reloading the model
- writes state lines like
Worker ready,Listening..., andSession stopped
- loads
That warm-worker design is what removes the big per-toggle startup penalty.
System packages you need:
wtypeportaudio/libportaudio2- Python 3
Python packages used by the code:
PyQt5sounddevicenumpyfaster-whisperevdev(legacy raw hotkey path)ctranslate2viafaster-whisper
Use the project venv if you have one:
python tray_app.pyOr from the repo directory:
python tray_app.pyWhen the tray app starts it:
- opens the tray icon
- creates the local control socket
- prewarms the worker
- starts showing
mic-off.png - allows left-click toggle and right-click menu actions
The intended Wayland path is a compositor bind that triggers toggle_dictation.py.
Hyprland example:
exec = python /home/pry/wl-dictate/tray_app.py
bind = CTRL ALT, f, exec, python /home/pry/wl-dictate/toggle_dictation.pyNotes:
- the tray app should be launched once per session
- the bind talks to the tray app over the Unix socket
- the tray app also tries to repair/install the runtime Hyprland bind automatically if there is no conflicting bind already using
Ctrl+Alt+F
If you just want to trigger dictation from a shell or another launcher:
python toggle_dictation.pyThat command does not transcribe anything by itself. It only asks the running tray app to toggle state.
hotkey_listener.py still exists, but it reads raw /dev/input/event* devices. That means it is not the normal Hyprland path.
Run it only if you specifically want the raw-input mode and have the required device permissions:
sudo python hotkey_listener.pyYou can still run the transcription worker directly:
python whisper_dictate.py
python whisper_dictate.py 0
python whisper_dictate.py "USB Microphone"That bypasses the tray workflow and runs a direct dictation session.
The tray menu shows all input devices returned by sounddevice.query_devices().
Behavior:
- your selected device index is saved in
config.json - if the saved device disappears, the tray app falls back to the current default input device
- if no working input device exists, dictation will not start
Current config.json format:
{"input_device": 2}Before typing, the worker normalizes the transcript.
Current cleanup includes:
- removing bracketed and parenthesized noise like
[noise]or(music) - trimming leading whitespace, including weird Unicode whitespace
- collapsing repeated whitespace to a single space
- normalizing ellipses
- inserting missing spaces after sentence punctuation in cases like:
Testing.How are we doing today?One, two, three.- becomes
Testing. How are we doing today? One, two, three.
- preserving common numeric punctuation like decimals such as
3.14
Important current behavior from the codebase:
- the worker is prewarmed at tray startup
- dictation start now reuses the loaded model instead of spawning a fresh one each time
- VAD block duration is
0.2s - tray worker polling interval is
100ms - debug logging is off by default
- enable debug logs with:
WL_DICTATE_DEBUG=1 python tray_app.pyLatency benchmark helper:
python utils/benchmark_latency.pyTyping depends on wtype, which needs a valid Wayland session environment.
The worker caches:
WAYLAND_DISPLAYXDG_RUNTIME_DIR
If they are missing, the worker tries to guess them from the runtime directory.
Check:
- the tray app is actually running
.dictation.sockexists in the project directory- your Hyprland bind points to the current repo path, not an old directory
Check your live binds:
hyprctl -j binds | grep toggle_dictation.pyIf the bind points at an old path, update it and reload Hyprland.
Check:
wtypeis installed- a text field is focused
- the worker has valid
WAYLAND_DISPLAYandXDG_RUNTIME_DIR
The worker is loading the model during prewarm. After that, repeated toggles should be much faster.
The tray app validates the saved device. If it is gone, it falls back to the default input device and updates config.json.
If CUDA is unavailable, the worker falls back to CPU/int8 mode.
Useful checks:
python -m py_compile tray_app.py whisper_dictate.py toggle_dictation.py hotkey_listener.py utils/benchmark_latency.py
python utils/benchmark_latency.pyUse the tray app for normal use.
Use the Hyprland Ctrl+Alt+F bind to toggle it.
Keep the tray app running so the warm worker stays loaded and dictation starts fast.
