Para-Speak

Local speech-to-text CLI tool powered by the NVIDIA Parakeet model. Minimal idle footprint, powerful customizable shortcuts, audio feedback, and an extensible controller API for custom integrations.

Built in Rust for speed and minimal resource usage, it integrates Python ML models (Parakeet MLX) optimized for Apple Silicon.

Note: Para-speak is in its early stages and available on macOS only. Many decisions are still being made, and it will mature over time.

Quick Start

# 1. Set up environment and download model (first time only)
cargo run -p verify-cli

# 2. Run Para-speak:
./para-speak

# Note: Direct `cargo run -p para-speak-cli` requires PYO3_PYTHON env var

Features

Global keyboard shortcuts with advanced pattern support
Automatic text insertion at cursor position
Audio feedback for recording states
Spotify volume control during recording
Pause/resume recording

Configuration

All settings are configured via environment variables. Create a .env.local file or export them in your shell.

Important: Para-speak only listens for keyboard shortcuts - it doesn't consume them! The keypress events still pass through to your system, so choose shortcuts that won't conflict with your other applications.

Default Shortcuts

Para-speak comes with built-in default shortcuts:

Start recording: ControlLeft + ControlLeft (double tap)
Stop recording: ControlLeft
Cancel recording: Escape + Escape (double tap)
Pause/resume: No default shortcut

Note: Make sure the double-tap Control shortcut doesn't conflict with the macOS dictation shortcut under Keyboard > Dictation > Shortcut.

Custom Configuration

You can override the defaults using environment variables. Create a .env.local file in the root of the project directory:

# Keyboard shortcuts
PARA_START_KEYS="double(ControlLeft, 300); CommandLeft+ShiftLeft+KeyY"
PARA_STOP_KEYS="ControlLeft; CommandLeft+ShiftLeft+KeyY"
PARA_CANCEL_KEYS="double(Escape, 300)"
PARA_PAUSE_KEYS="CommandLeft+Alt+Shift+KeyU"

# Core functionality
PARA_PASTE=true                          # Auto-paste transcribed text at cursor

# Spotify integration
PARA_SPOTIFY_RECORDING_VOLUME=30         # Set Spotify to specific volume (0-100)
PARA_SPOTIFY_REDUCE_BY=50                # OR reduce volume by amount (0-100)

# Transcription behavior
PARA_TRANSCRIBE_ON_PAUSE=true            # Experimental: transcribe when pausing (not just on stop)

# Transcription post-processing
PARA_REPLACE="uh;Uh;um:;ui:UI"           # Replace/remove words in transcriptions

# Advanced
PARA_SHORTCUT_RESOLUTION_DELAY_MS=50     # Delay for resolving shortcut conflicts
PARA_MEMORY_MONITOR=true                 # Enable memory usage reporting

# Debugging
PARA_DEBUG=true                          # Enable debug mode with verbose output

All Configuration Options

Option	Environment Variable	Description	Default
`--paste`	`PARA_PASTE`	Automatically paste transcribed text at cursor	`false`
`--start-keys`	`PARA_START_KEYS`	Semicolon-separated list of key combinations to start recording	`double(ControlLeft, 300)`
`--stop-keys`	`PARA_STOP_KEYS`	Semicolon-separated list of key combinations to stop recording	`ControlLeft`
`--cancel-keys`	`PARA_CANCEL_KEYS`	Semicolon-separated list of key combinations to cancel recording	`double(Escape, 300)`
`--pause-keys`	`PARA_PAUSE_KEYS`	Semicolon-separated list of key combinations to pause recording	None
`--model`	`PARA_MODEL`	ML model to use for transcription	`mlx-community/parakeet-tdt-0.6b-v3`
`--force`	`PARA_FORCE`	Force using an unsupported model	`false`
`--spotify-recording-volume`	`PARA_SPOTIFY_RECORDING_VOLUME`	Set Spotify to specific volume (0-100) during recording	None
`--spotify-reduce-by`	`PARA_SPOTIFY_REDUCE_BY`	Reduce Spotify volume by amount (0-100) during recording	None
`--transcribe-on-pause`	`PARA_TRANSCRIBE_ON_PAUSE`	Transcribe when pausing (not just on stop)	`false`
`--realtime`	`PARA_REALTIME`	Experimental: Enable real-time transcription with streaming output during recording	`false`
`--replace`	`PARA_REPLACE`	Text replacements for transcription post-processing. Format: `"from:to"` for replacement, `"from:"` or `"from"` for removal. Separate multiple with semicolons. Example: `"uh;Uh;um:;ui:UI"`	None
`--shortcut-resolution-delay-ms`	`PARA_SHORTCUT_RESOLUTION_DELAY_MS`	Delay for resolving shortcut conflicts (ms)	`50`
`--debug`	`PARA_DEBUG`	Enable debug mode with verbose logging	`false`
`--memory-monitor`	`PARA_MEMORY_MONITOR`	Enable memory usage reporting	`false`
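
The `PARA_REPLACE` format described in the table can be illustrated with a small Rust sketch. This is not Para-speak's actual implementation: the names `ReplaceRule`, `parse_replace_rules`, and `apply_rules` are hypothetical, and matching whole whitespace-separated words exactly (so `uh,` with a trailing comma would not match) is an assumption made for the example.

```rust
/// One rule from a PARA_REPLACE-style spec (hypothetical type).
/// An empty `to` means the word is removed.
#[derive(Debug, PartialEq)]
struct ReplaceRule {
    from: String,
    to: String,
}

/// Parse a spec such as "uh;Uh;um:;ui:UI" into rules.
/// "from:to" replaces, "from:" and "from" both remove.
fn parse_replace_rules(spec: &str) -> Vec<ReplaceRule> {
    spec.split(';')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .filter_map(|entry| {
            // Split on the first ':' only; entries without ':' are removals.
            let (from, to) = entry.split_once(':').unwrap_or((entry, ""));
            if from.is_empty() {
                None
            } else {
                Some(ReplaceRule {
                    from: from.to_string(),
                    to: to.to_string(),
                })
            }
        })
        .collect()
}

/// Apply rules word by word (assumption: exact, case-sensitive word match).
fn apply_rules(text: &str, rules: &[ReplaceRule]) -> String {
    text.split_whitespace()
        .filter_map(|word| match rules.iter().find(|rule| rule.from == word) {
            Some(rule) if rule.to.is_empty() => None,
            Some(rule) => Some(rule.to.as_str()),
            None => Some(word),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, the example value `"uh;Uh;um:;ui:UI"` yields four rules: three removals (`uh`, `Uh`, `um`) and one replacement (`ui` to `UI`).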

Model Configuration

You can specify which ML model to use via the PARA_MODEL environment variable:

PARA_MODEL="mlx-community/parakeet-tdt-1.1b" ./para-speak

To use an unsupported model, add the --force flag:

PARA_MODEL="custom/model" ./para-speak --force

Shortcut Syntax

The shortcut system supports complex patterns:

Single key: "F1" or "Escape" or "ControlLeft"
Combination: "Cmd+Shift+A" or "CommandLeft+ShiftLeft+KeyY" (all pressed together)
Double-tap: "double(ControlLeft, 300)" (double-tap within 300ms)

Multiple shortcuts can be assigned to each action using semicolons:

export PARA_START_KEYS="F1; double(ControlLeft, 300)"  # F1 OR double-tap control
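
To make the grammar above concrete, here is a minimal Rust sketch of how such a shortcut string could be parsed. It is an illustration only, not Para-speak's real parser: the `Shortcut` enum and the `parse_shortcut` and `parse_shortcut_list` functions are hypothetical names, and the sketch does not check that key names such as `ControlLeft` or `Cmd` are valid.

```rust
/// A single parsed shortcut pattern (hypothetical type).
#[derive(Debug, PartialEq)]
enum Shortcut {
    /// One or more keys pressed together, e.g. "CommandLeft+ShiftLeft+KeyY".
    /// A single key such as "F1" is a combination of length one.
    Combo(Vec<String>),
    /// The same key tapped twice within `within_ms` milliseconds.
    DoubleTap { key: String, within_ms: u64 },
}

/// Parse one pattern: either "double(Key, ms)" or "KeyA+KeyB+...".
fn parse_shortcut(pattern: &str) -> Result<Shortcut, String> {
    let pattern = pattern.trim();

    // Double-tap form: strip the "double(" prefix and the closing ")".
    if let Some(args) = pattern
        .strip_prefix("double(")
        .and_then(|rest| rest.strip_suffix(')'))
    {
        let (key, ms) = args
            .split_once(',')
            .ok_or_else(|| format!("expected double(Key, ms), got {:?}", pattern))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(format!("missing key name in {:?}", pattern));
        }
        let within_ms = ms
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("invalid interval in {:?}: {}", pattern, e))?;
        return Ok(Shortcut::DoubleTap {
            key: key.to_string(),
            within_ms,
        });
    }

    // Combination form: key names joined with '+'.
    let keys: Vec<String> = pattern.split('+').map(|k| k.trim().to_string()).collect();
    if keys.iter().any(|k| k.is_empty()) {
        return Err(format!("empty key name in {:?}", pattern));
    }
    Ok(Shortcut::Combo(keys))
}

/// Parse a semicolon-separated list; any one of the shortcuts triggers the action.
fn parse_shortcut_list(spec: &str) -> Result<Vec<Shortcut>, String> {
    spec.split(';')
        .filter(|p| !p.trim().is_empty())
        .map(parse_shortcut)
        .collect()
}
```

With this sketch, `"F1; double(ControlLeft, 300)"` parses into a one-key combination followed by a double-tap with a 300 ms window, while malformed input such as `"double(Escape)"` is reported as an error rather than silently ignored.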

Extensibility & Controllers

Para-speak uses a controller system that makes it easy to extend functionality.

The Spotify controller is one example - it adjusts music volume during recording. The same pattern can be used to build any type of asynchronous integration, or trigger any automation after recording is transcribed.
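
As a rough illustration of the pattern, the following Rust sketch shows what a controller interface could look like. Everything here is hypothetical: the `Controller` trait, its method names, and the `VolumeDucker` and `TranscriptLog` types are invented for the example and are not Para-speak's actual API. The sketch is also synchronous and keeps the volume in memory instead of talking to Spotify.

```rust
/// Lifecycle hooks a controller might implement (hypothetical names).
/// Default bodies are empty so a controller only overrides what it needs.
trait Controller {
    fn on_recording_started(&mut self) {}
    fn on_recording_stopped(&mut self) {}
    fn on_transcribed(&mut self, _text: &str) {}
}

/// Lowers a volume while recording and restores it afterwards,
/// mirroring the behavior described for the Spotify controller.
struct VolumeDucker {
    volume: u8,
    recording_volume: u8,
    saved: Option<u8>,
}

impl Controller for VolumeDucker {
    fn on_recording_started(&mut self) {
        // Remember the current volume so it can be restored later.
        self.saved = Some(self.volume);
        self.volume = self.recording_volume;
    }

    fn on_recording_stopped(&mut self) {
        if let Some(previous) = self.saved.take() {
            self.volume = previous;
        }
    }
}

/// Collects every transcript, standing in for "any automation after
/// a recording is transcribed".
struct TranscriptLog {
    entries: Vec<String>,
}

impl Controller for TranscriptLog {
    fn on_transcribed(&mut self, text: &str) {
        self.entries.push(text.to_string());
    }
}

/// Drive one simulated recording session through all controllers.
fn run_session(controllers: &mut [&mut dyn Controller], transcript: &str) {
    for controller in controllers.iter_mut() {
        controller.on_recording_started();
    }
    for controller in controllers.iter_mut() {
        controller.on_recording_stopped();
    }
    for controller in controllers.iter_mut() {
        controller.on_transcribed(transcript);
    }
}
```

The design choice worth noting is that each integration stays independent: the volume controller ignores transcripts and the logging controller ignores recording state, so adding a new integration does not require touching existing ones.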

Usage

Optionally set up custom environment variables (or use defaults)
Run the application:
```
./para-speak
```
Use your configured shortcuts to start, stop, pause, and resume recording.
The transcribed text is inserted at your cursor (if paste is enabled), copied to the clipboard, and printed to the console (if debug is enabled).

Architecture

┌─────────────────┐         ┌──────────────────┐
│   Rust Core     │  PyO3   │  Python ML       │
├─────────────────┤◄───────►├──────────────────┤
│ • Audio capture │         │ • Parakeet MLX   │
│ • Shortcuts     │         │ • Model loading  │
│ • System APIs   │         │ • Transcription  │
│ • Components    │         │                  │
└─────────────────┘         └──────────────────┘

The Rust core handles all system integration and performance-critical paths, while Python handles ML inference using the MLX framework, which is optimized for Apple Silicon.

Platform Support

Para-speak is designed to be cross-platform and to support multiple models in the future, though it is currently available on macOS only.

Required Permissions

Microphone: For audio capture
Accessibility: For global keyboard shortcuts to work system-wide

Verify CLI

Tool for managing ML models:

`cargo run -p verify-cli` - Download and verify ML models
`cargo run -p verify-cli list` - List downloaded models with sizes

License

MIT
