wl-dictate

Low-latency voice dictation for Linux Wayland/X11.

It runs as a PyQt5 tray app, keeps a faster-whisper worker warm in the background, listens to your microphone with sounddevice, and types the cleaned transcript into the currently focused window with wtype.

Example tray appearance on Hyprland. tray_app.py The tray icon/menu (the literal microphone, fourth from the right) is the main control surface for starting dictation, switching microphones, and quitting the app.

What it does

starts from a system tray icon
prewarms the speech model once so toggling dictation is fast after launch
records audio from the selected microphone
detects speech with a lightweight VAD loop
transcribes with faster-whisper
cleans up noisy transcript formatting before typing
types into the focused app with wtype
exposes a local Unix socket so external keybinds can toggle dictation
repairs/installs a Hyprland Ctrl+Alt+F runtime bind automatically when possible

Project files

tray_app.py — PyQt5 tray app, worker lifecycle, device menu, socket listener, Hyprland bind repair
whisper_dictate.py — warm worker, audio capture, VAD, transcription, transcript cleanup, typing
toggle_dictation.py — tiny CLI that sends toggle over the Unix socket
hotkey_listener.py — legacy raw-evdev listener for Ctrl+Alt+F when running with elevated input permissions
mic-on.png / mic-off.png — tray icons
example.jpg — screenshot of the tray on Hyprland
utils/benchmark_latency.py — benchmark helper for worker boot/start/stop timing
docs/performance-audit-20260408-v1.md — current performance audit and optimization plan

Runtime architecture

There are two long-lived processes in the normal tray workflow:

tray_app.py
- creates the tray icon and context menu
- loads/saves config.json
- listens on .dictation.sock
- prewarms the worker at startup
- sends start, stop, and quit commands to the worker
whisper_dictate.py --controlled
- loads faster-whisper once
- waits for commands on stdin
- starts/stops audio sessions without reloading the model
- writes state lines like Worker ready, Listening..., and Session stopped

That warm-worker design is what removes the big per-toggle startup penalty.

Requirements

System packages you need:

wtype
portaudio / libportaudio2
Python 3

Python packages used by the code:

PyQt5
sounddevice
numpy
faster-whisper
evdev (legacy raw hotkey path)
ctranslate2 via faster-whisper

Run the tray app

Use the project venv if you have one:

python tray_app.py

Or from the repo directory:

python tray_app.py

When the tray app starts it:

opens the tray icon
creates the local control socket
prewarms the worker
starts showing mic-off.png
allows left-click toggle and right-click menu actions

Hyprland setup

The intended Wayland path is a compositor bind that triggers toggle_dictation.py.

Hyprland example:

exec = python /home/pry/wl-dictate/tray_app.py
bind = CTRL ALT, f, exec, python /home/pry/wl-dictate/toggle_dictation.py

Notes:

the tray app should be launched once per session
the bind talks to the tray app over the Unix socket
the tray app also tries to repair/install the runtime Hyprland bind automatically if there is no conflicting bind already using Ctrl+Alt+F

Manual toggle command

If you just want to trigger dictation from a shell or another launcher:

python toggle_dictation.py

That command does not transcribe anything by itself. It only asks the running tray app to toggle state.

Legacy raw hotkey listener

hotkey_listener.py still exists, but it reads raw /dev/input/event* devices. That means it is not the normal Hyprland path.

Run it only if you specifically want the raw-input mode and have the required device permissions:

sudo python hotkey_listener.py

CLI worker mode

You can still run the transcription worker directly:

python whisper_dictate.py
python whisper_dictate.py 0
python whisper_dictate.py "USB Microphone"

That bypasses the tray workflow and runs a direct dictation session.

Microphone selection

The tray menu shows all input devices returned by sounddevice.query_devices().

Behavior:

your selected device index is saved in config.json
if the saved device disappears, the tray app falls back to the current default input device
if no working input device exists, dictation will not start

Current config.json format:

{"input_device": 2}

Transcript cleanup behavior

Before typing, the worker normalizes the transcript.

Current cleanup includes:

removing bracketed and parenthesized noise like [noise] or (music)
trimming leading whitespace, including weird Unicode whitespace
collapsing repeated whitespace to a single space
normalizing ellipses
inserting missing spaces after sentence punctuation in cases like:
- Testing.How are we doing today?One, two, three.
- becomes Testing. How are we doing today? One, two, three.
preserving common numeric punctuation like decimals such as 3.14

Performance notes

Important current behavior from the codebase:

the worker is prewarmed at tray startup
dictation start now reuses the loaded model instead of spawning a fresh one each time
VAD block duration is 0.2s
tray worker polling interval is 100ms
debug logging is off by default
enable debug logs with:

WL_DICTATE_DEBUG=1 python tray_app.py

Latency benchmark helper:

python utils/benchmark_latency.py

Wayland typing notes

Typing depends on wtype, which needs a valid Wayland session environment.

The worker caches:

WAYLAND_DISPLAY
XDG_RUNTIME_DIR

If they are missing, the worker tries to guess them from the runtime directory.

Troubleshooting

Tray app starts but toggle does nothing

Check:

the tray app is actually running
.dictation.sock exists in the project directory
your Hyprland bind points to the current repo path, not an old directory

`Ctrl+Alt+F` does nothing on Hyprland

Check your live binds:

hyprctl -j binds | grep toggle_dictation.py

If the bind points at an old path, update it and reload Hyprland.

Dictation starts but nothing gets typed

Check:

wtype is installed
a text field is focused
the worker has valid WAYLAND_DISPLAY and XDG_RUNTIME_DIR

Dictation feels slow the first time after launching the tray

The worker is loading the model during prewarm. After that, repeated toggles should be much faster.

Saved microphone broke after unplugging hardware

The tray app validates the saved device. If it is gone, it falls back to the default input device and updates config.json.

CUDA fails

If CUDA is unavailable, the worker falls back to CPU/int8 mode.

Development notes

Useful checks:

python -m py_compile tray_app.py whisper_dictate.py toggle_dictation.py hotkey_listener.py utils/benchmark_latency.py
python utils/benchmark_latency.py

Summary

Use the tray app for normal use. Use the Hyprland Ctrl+Alt+F bind to toggle it. Keep the tray app running so the warm worker stays loaded and dictation starts fast.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

wl-dictate

What it does

Project files

Runtime architecture

Requirements

Run the tray app

Hyprland setup

Manual toggle command

Legacy raw hotkey listener

CLI worker mode

Microphone selection

Transcript cleanup behavior

Performance notes

Wayland typing notes

Troubleshooting

Tray app starts but toggle does nothing

`Ctrl+Alt+F` does nothing on Hyprland

Dictation starts but nothing gets typed

Dictation feels slow the first time after launching the tray

Saved microphone broke after unplugging hardware

CUDA fails

Development notes

Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
docs		docs
utils		utils
.gitignore		.gitignore
AGENTS.md		AGENTS.md
AUDIT_REPORT.md		AUDIT_REPORT.md
README.md		README.md
build.sh		build.sh
config.json		config.json
example.jpg		example.jpg
hotkey_listener.py		hotkey_listener.py
mic-off.png		mic-off.png
mic-on.png		mic-on.png
pyproject.toml		pyproject.toml
toggle_dictation.py		toggle_dictation.py
tray_app.py		tray_app.py
whisper_dictate.py		whisper_dictate.py
wl_dictate.py		wl_dictate.py

Folders and files

Latest commit

History

Repository files navigation

wl-dictate

What it does

Project files

Runtime architecture

Requirements

Run the tray app

Hyprland setup

Manual toggle command

Legacy raw hotkey listener

CLI worker mode

Microphone selection

Transcript cleanup behavior

Performance notes

Wayland typing notes

Troubleshooting

Tray app starts but toggle does nothing

Ctrl+Alt+F does nothing on Hyprland

Dictation starts but nothing gets typed

Dictation feels slow the first time after launching the tray

Saved microphone broke after unplugging hardware

CUDA fails

Development notes

Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`Ctrl+Alt+F` does nothing on Hyprland

Packages