Lightweight CLI for transcribing audio files using OpenAI Whisper on OS X. It automatically sets up a Python virtual environment, installs Whisper, and generates .txt transcripts.
whisperer/
├── app.py # Main script
├── media/ # Audio files
└── venv/ # Auto-created virtual environment
- Lists audio files in media/ and lets you choose
- Downloads audio from URLs (direct audio links or YouTube videos)
- VLC integration - Adds audio to existing VLC playlist or launches new instance (if installed)
- Transcribes using Whisper and saves as .txt
- Skips files that are already transcribed
- Automatically creates a Python venv
- Installs Whisper and yt-dlp if not present
- Default language: French (fr)
- Python 3.11+ (built and tested with Python 3.13)
- ffmpeg installed and available in your system path
- VLC (optional) - For automatic audio playback
- Install Homebrew
- Run:
    brew install [email protected] ffmpeg
- Verify:
    python3 --version
    ffmpeg -version
- (Optional) Install VLC for audio playback:
    brew install --cask vlc
The app automatically manages its Python dependencies via requirements.txt:
- PyTorch for CPU inference
- OpenAI Whisper for transcription
- yt-dlp for YouTube downloads
- Additional dependencies (numpy, tiktoken, etc.)
All dependencies are installed automatically when you first run the app.
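The automatic setup described above can be pictured with a minimal sketch. It assumes a macOS/Linux venv layout (`venv/bin/python`) and a `requirements.txt` next to the script; the function names `venv_python`, `install_command`, and `ensure_environment` are hypothetical and are not taken from app.py.

```python
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Interpreter inside the virtual environment (macOS/Linux layout)."""
    return venv_dir / "bin" / "python"


def install_command(venv_dir: Path, requirements: Path) -> list[str]:
    """pip invocation that installs the pinned dependencies into the venv."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", "-r", str(requirements)]


def ensure_environment(venv_dir: Path, requirements: Path, run=subprocess.run) -> bool:
    """Create the venv and install dependencies on first run.

    Returns True if the environment was created, False if it already existed.
    Assumption: dependencies are only installed when the venv is new.
    """
    if venv_python(venv_dir).exists():
        return False
    venv.create(venv_dir, with_pip=True)  # standard-library venv builder
    run(install_command(venv_dir, requirements), check=True)
    return True
```

Calling `ensure_environment(Path("venv"), Path("requirements.txt"))` a second time is a no-op, which matches the "first run" behaviour described above.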
- Place your audio files (.mp3, .wav, .m4a, .flac, etc.) in the media/ folder, or use the URL download feature.
- From terminal:
    cd whisperer
    ./whisperer
On first run, the script will:
- Create venv/
- Install Whisper and yt-dlp
- Show menu with options to download from URL or select existing files
- Generate a .txt transcript
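As a rough illustration of how the file menu could be built, the sketch below lists audio files in media/ and leaves out those that already have a transcript. The extension set comes from the list above; the helper name `pending_audio_files` and the choice to treat a `.noformat.txt` file as "already transcribed" are assumptions, not behaviour confirmed by app.py.

```python
from pathlib import Path

# Extensions named in this README; the app may accept more ("etc.").
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def pending_audio_files(media_dir: Path) -> list[Path]:
    """Return audio files in media_dir that have no transcript yet, sorted by name."""
    pending = []
    for path in sorted(media_dir.iterdir()):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # not an audio file
        has_transcript = (
            path.with_suffix(".txt").exists()
            or path.with_suffix(".noformat.txt").exists()  # assumption: counts as transcribed
        )
        if not has_transcript:
            pending.append(path)
    return pending
```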
By default, transcripts are automatically formatted to improve readability by joining sentence fragments and removing excessive line breaks.
If you want to preserve the original Whisper output without any formatting, you can manually create a .noformat.txt file:
- After transcription, rename the generated .txt file to .noformat.txt
- The app will detect this file and display the transcript without any formatting changes
- This is useful for song lyrics or other content where you want to preserve the original line structure
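To make "joining sentence fragments and removing excessive line breaks" concrete, here is a minimal sketch of one way such formatting could work. It is not the app's actual algorithm: the rule used here (a line ends a sentence when it finishes with `.`, `!`, `?` or `…`, optionally followed by a closing quote or bracket) and the name `join_fragments` are assumptions.

```python
import re

# A line "closes" a sentence if it ends with terminal punctuation,
# optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?…]["»)]?$')


def join_fragments(text: str) -> str:
    """Merge fragment lines into one line per sentence and drop blank lines."""
    sentences = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # remove excessive line breaks
        current.append(line)
        if SENTENCE_END.search(line):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing fragment without terminal punctuation
        sentences.append(" ".join(current))
    return "\n".join(sentences)
```

Song lyrics rarely end lines with punctuation, so a rule like this would merge whole verses into one line, which is why the .noformat.txt escape hatch is useful.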
The app automatically detects and separates different speakers in conversations:
- Speaker detection is enabled by default for all transcriptions
- Each speaker's content appears on a new line
- No blank lines or speaker labels - just clean line separation
Example:
media/
├── conversation.mp3
├── conversation.txt # Speaker-separated transcript
└── conversation.noformat.txt # Unformatted transcript (manually created)
Speaker-separated output example:
l'acétamipride
pour trois ans,
ils voyaient un peu d'espoir
par rapport à leurs concurrents
File Priority:
- .noformat.txt (if manually created)
- .txt (speaker-separated transcript)
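The priority order can be expressed in a few lines. This is a sketch under the assumption that transcripts sit next to the audio file and share its base name, as in the media/ example above; `transcript_to_display` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def transcript_to_display(audio_path: Path) -> Optional[Path]:
    """Pick the transcript to show: .noformat.txt wins over .txt, None if neither exists."""
    for suffix in (".noformat.txt", ".txt"):  # highest priority first
        candidate = audio_path.with_suffix(suffix)
        if candidate.exists():
            return candidate
    return None
```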
Speaker Detection Logic:
- Detects speaker changes based on line breaks, questions, and common phrases
- Works best with clear conversation patterns
- For more accurate detection, consider using dedicated speaker diarization tools
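For intuition only, a text-based heuristic of this kind might look like the sketch below. It is a deliberately simplified illustration, not the app's implementation: it starts a new speaker line after a question or when a line opens with a typical reply word, and the `REPLY_OPENERS` set and `split_speakers` name are hypothetical.

```python
# Hypothetical list of words that often open a reply; the app's real phrases are not documented here.
REPLY_OPENERS = {"oui", "non", "d'accord", "yes", "no", "okay"}


def split_speakers(lines: list[str]) -> list[str]:
    """Group transcript lines into turns, one output line per guessed speaker turn."""
    turns: list[str] = []
    previous_was_question = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        first_word = line.split()[0].strip(",.!?").lower()
        starts_reply = first_word in REPLY_OPENERS
        if not turns or previous_was_question or starts_reply:
            turns.append(line)  # guessed speaker change: start a new line
        else:
            turns[-1] += " " + line  # same speaker: keep on the current line
        previous_was_question = line.endswith("?")
    return turns
```

Because it only looks at the text, a heuristic like this cannot tell two voices apart when neither asks a question, which is the limitation the last bullet above points to.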
The app supports downloading audio from URLs:
- Supports direct links to audio files (.mp3, .wav, .m4a, .flac, etc.)
- Downloads the file directly to the media/ folder
- Automatically generates filenames if none are provided
- Supports YouTube URLs (youtube.com, youtu.be, etc.)
- Uses yt-dlp to extract audio tracks
- Converts to MP3 format for optimal compatibility
- Downloads to the media/ folder with timestamped filenames
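The two download paths can be sketched as follows. The host list, the filename patterns (`audio_…`, `youtube_…`), and the function names are assumptions made for illustration; only the yt-dlp options used (`--extract-audio`, `--audio-format`, `--output`) are real yt-dlp flags. The sketch builds the command without running it.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of hosts treated as YouTube; the app may recognise more.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """True if the URL's host is one of the known YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def direct_download_name(url: str, now: datetime) -> str:
    """Use the filename from the URL path, or generate a timestamped one if there is none."""
    name = Path(urlparse(url).path).name
    return name or f"audio_{now:%Y%m%d_%H%M%S}.mp3"


def youtube_command(url: str, media_dir: Path, now: datetime) -> list[str]:
    """yt-dlp invocation that extracts the audio track as MP3 into media_dir."""
    # %(ext)s is a yt-dlp output-template field, filled in by yt-dlp itself.
    output = media_dir / f"youtube_{now:%Y%m%d_%H%M%S}.%(ext)s"
    return ["yt-dlp", "--extract-audio", "--audio-format", "mp3", "--output", str(output), url]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from inside the venv where yt-dlp is installed.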
- Run the app: ./whisperer
- Select option "1. Download audio from URL"
- Enter the URL (YouTube or direct audio link)
- The file will be downloaded, automatically added to VLC playlist (or launched if not running), and transcribed
- Run the app: ./whisperer
- Select option "2. Change language (currently 'fr')"
- Choose from the available languages: fr (French, default), en (English), it (Italian), de (German)
- Your selection is automatically saved to settings.json (not tracked by git)
Your language choice is saved in settings.json
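A minimal sketch of how such a setting could be persisted is shown below. The JSON key `"language"` and the helper names are assumptions; the actual layout of settings.json is not documented here.

```python
import json
from pathlib import Path

SUPPORTED_LANGUAGES = {"fr": "French", "en": "English", "it": "Italian", "de": "German"}
DEFAULT_LANGUAGE = "fr"


def load_language(settings_path: Path) -> str:
    """Read the saved language code, falling back to the default on any problem."""
    try:
        data = json.loads(settings_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_LANGUAGE
    language = data.get("language") if isinstance(data, dict) else None  # hypothetical key
    return language if language in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def save_language(settings_path: Path, language: str) -> None:
    """Persist the chosen language code; reject codes the menu does not offer."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    settings_path.write_text(json.dumps({"language": language}, indent=2), encoding="utf-8")
```

Falling back to `fr` when the file is missing or malformed keeps the first run working before any choice has been saved.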
Install via Homebrew:
    brew install ffmpeg
The app uses CPU for transcription, which provides reliable and consistent performance across all systems. While GPU acceleration would be faster, the current CPU implementation ensures maximum compatibility and stability.
Note: The warning about FP16/FP32 is normal and expected when using CPU.
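For context, Whisper emits that warning when half precision (FP16) is requested on a CPU, and it then falls back to FP32. The sketch below shows a CPU transcription call with the openai-whisper Python API that requests FP32 explicitly. The model name `"base"` and the helper names are assumptions, and nothing here implies that app.py passes these exact options.

```python
def transcribe_options(language: str = "fr", device: str = "cpu") -> dict:
    """Decoding options: half precision only makes sense off the CPU."""
    return {"language": language, "fp16": device != "cpu"}


def transcribe(audio_path: str, model_name: str = "base", language: str = "fr") -> str:
    """Transcribe one file on the CPU and return the plain text."""
    import whisper  # from the openai-whisper package; imported lazily so the helper above works without it

    model = whisper.load_model(model_name, device="cpu")
    result = model.transcribe(audio_path, **transcribe_options(language, "cpu"))
    return result["text"]
```

With `fp16=False` the FP16/FP32 warning does not appear; leaving the default in place only produces the harmless message mentioned above.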
The app now supports Python 3.11+ and has been tested with Python 3.13.
If YouTube downloads fail:
- Check your internet connection
- Verify the YouTube URL is valid and accessible
- Check the logs in logs/whisperer.log for detailed error messages
- Some videos may be restricted or unavailable in your region
If direct audio downloads fail:
- Verify the URL is accessible and points to an audio file
- Check that the server allows direct downloads
- Ensure the file format is supported (.mp3, .wav, .m4a, .flac)
# Run all tests
python tests/runners/tests.py

License: MIT
