"Because Auto-Tune is expensive and your bathroom acoustics only get you so far."
WakkaQt is a free, open-source karaoke recording and production studio built with Qt6. Load a karaoke video, grab a mic, sing your heart out, and walk away with a finished, mixed, pitch-corrected MP4, complete with a webcam feed, vocal overlay, and a pitch indicator that will mercilessly show the world every flat note you tried to sneak past.
No subscriptions. No cloud. No judgment. (Well, maybe a little judgment from the pitch monitor.)
Current version: 2.1.3
Windows binaries (ready to run, no setup required): https://gu.pro.br/WakkaQt
Linux users: you get to build it yourself, which is a feature, not a bug. See the build instructions below.
Drop in any MP4, MKV, WebM, AVI, MOV, MP3, WAV, FLAC, or OPUS file. If Qt6 Multimedia can play it, WakkaQt will play it. You can also paste a YouTube URL and download the video directly from inside the app (powered by yt-dlp).
Select your microphone from a list of all detected devices. Hit 🎤 SING. Optionally capture your webcam at the same time, for those who want to remember exactly what they looked like belting out Bohemian Rhapsody at 2 AM.
Before rendering, the vocal track runs through a full DSP pipeline:
- Pitch correction: phase-vocoder pitch shifting with adjustable strength (0 = raw humanity, 100 = robot perfection)
- Scale-aware snapping: snap pitch to Major, Minor, Pentatonic, Blues, Dorian, Mixolydian, Lydian, Phrygian, Locrian, Harmonic Minor, Melodic Minor, Whole Tone, Diminished, or plain Chromatic, in any of the 12 keys
- Retune speed: 0 ms for that T-Pain effect, up to 300 ms for a natural glide
- Formant preservation: LPC-based envelope re-synthesis keeps your voice sounding human even after aggressive pitch shifting
- Noise reduction: spectral subtraction gate with adaptive noise-floor estimation (goodbye, fan noise)
- Reverb: Freeverb-style Schroeder reverb with room size, decay, and wet/dry controls
- Dynamics: compressor, limiter, and harmonic exciter for a polished, loud-enough final mix
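For the curious, here is a minimal C++ sketch of how scale-aware snapping, correction strength, and retune speed can fit together, assuming a pitch detector that delivers one frequency estimate per analysis hop. The scale tables (only a few shown, with the major form assumed for "Pentatonic"), the function names, and the one-pole glide used for retune speed are illustrative assumptions, not WakkaQt's actual code. The resulting ratio is the kind of value a phase-vocoder shifter would consume.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Semitone offsets from the tonic for a few of the scales listed above (illustrative subset).
const std::vector<int> kMajor = {0, 2, 4, 5, 7, 9, 11};
const std::vector<int> kMinor = {0, 2, 3, 5, 7, 8, 10};
const std::vector<int> kPentatonic = {0, 2, 4, 7, 9};  // major pentatonic assumed
const std::vector<int> kBlues = {0, 3, 5, 6, 7, 10};
const std::vector<int> kChromatic = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

// Frequency to fractional MIDI note number (A4 = 440 Hz = note 69).
double freqToMidi(double hz) { return 69.0 + 12.0 * std::log2(hz / 440.0); }

// Nearest note to `midi` whose pitch class belongs to `scale` in key `key` (0 = C ... 11 = B).
int snapToScale(double midi, const std::vector<int>& scale, int key) {
    assert(!scale.empty() && key >= 0 && key < 12);
    const int center = static_cast<int>(std::lround(midi));
    int best = center;
    double bestDist = 1e9;
    for (int n = center - 12; n <= center + 12; ++n) {
        const int degree = ((n - key) % 12 + 12) % 12;
        const bool inScale = std::find(scale.begin(), scale.end(), degree) != scale.end();
        if (inScale && std::abs(n - midi) < bestDist) {
            bestDist = std::abs(n - midi);
            best = n;
        }
    }
    return best;
}

// Correction in semitones. strength: 0 = leave the voice alone, 100 = snap fully.
double correctionSemitones(double detectedHz, const std::vector<int>& scale, int key,
                           double strength) {
    const double midi = freqToMidi(detectedHz);
    return (strength / 100.0) * (snapToScale(midi, scale, key) - midi);
}

// Retune speed as a one-pole glide applied once per hop (hypothetical model):
// retuneMs = 0 jumps straight to the target, larger values glide more slowly.
double glide(double current, double target, double retuneMs, double hopMs) {
    if (retuneMs <= 0.0) return target;
    const double alpha = 1.0 - std::exp(-hopMs / retuneMs);
    return current + alpha * (target - current);
}

// Semitones to the frequency ratio a pitch shifter expects.
double shiftRatio(double semitones) { return std::pow(2.0, semitones / 12.0); }
```

Per hop, one would compute `correctionSemitones()`, smooth it with `glide()`, and pass `shiftRatio()` of the smoothed value to the shifter. Gliding in semitones rather than Hz keeps the glide time independent of the interval size.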
All FFTW plans are created once and reused for the entire recording: no plan allocation mid-session, no glitches.
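The noise-reduction stage works on one spectrum per frame, which is exactly where reused FFT plans pay off. The sketch below shows a per-bin spectral-subtraction gate with an adaptive noise floor, operating on magnitudes only. The over-subtraction factor, spectral floor, asymmetric tracking rule, and the assumption that the first frame is noise-only are illustrative choices, not WakkaQt's tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-bin spectral-subtraction gate with an adaptive noise-floor estimate.
class SpectralGate {
public:
    SpectralGate(std::size_t bins, double overSubtract = 1.5, double floorGain = 0.1,
                 double riseRate = 0.05)
        : noise_(bins, 0.0), over_(overSubtract), floor_(floorGain), rise_(riseRate) {}

    // Attenuates one magnitude spectrum in place (e.g. |X[k]| from a reused FFT plan).
    void process(std::vector<double>& mag) {
        assert(mag.size() == noise_.size());
        for (std::size_t k = 0; k < mag.size(); ++k) {
            // Noise-floor tracking: drop immediately to quieter values, creep up slowly,
            // so steady fan hum is followed while sung notes barely move the estimate.
            if (!primed_ || mag[k] < noise_[k]) noise_[k] = mag[k];
            else noise_[k] += rise_ * (mag[k] - noise_[k]);

            if (mag[k] <= 0.0) continue;
            // Subtract the scaled noise estimate, but never attenuate below the floor gain.
            const double gain = std::max(floor_, 1.0 - over_ * noise_[k] / mag[k]);
            mag[k] *= gain;
        }
        primed_ = true;  // first frame seeded the estimate
    }

private:
    std::vector<double> noise_;
    double over_, floor_, rise_;
    bool primed_ = false;
};
```

The floor gain keeps noise-only bins from being zeroed outright, which tends to avoid the "musical noise" artifacts of hard spectral subtraction.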
A full-featured preview dialog lets you hear the processed vocal, adjust every enhancement parameter in real time, nudge the audio/video sync offset, and preview again, as many times as you need before committing to a render.
Output: a 1920×1080 MP4 with the karaoke video on top and your webcam below. The vocal is mixed in with all enhancements applied. A pitch indicator strip is burned into the webcam frame: green when you're in tune, yellow when you're drifting, red when you're… having a moment.
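The green/yellow/red indicator boils down to measuring how many cents a sung frequency sits from the nearest equal-tempered note. A minimal sketch, assuming A4 = 440 Hz and using the thresholds listed for the rendered pitch overlay (green < 10 ¢, yellow < 30 ¢, red ≥ 30 ¢); the function and enum names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>

enum class TuneColor { Green, Yellow, Red };

// Fractional MIDI note number for a frequency (A4 = 440 Hz = note 69).
double hzToMidi(double hz) {
    assert(hz > 0.0);
    return 69.0 + 12.0 * std::log2(hz / 440.0);
}

// Signed deviation from the nearest equal-tempered note, in cents (-50 .. +50).
double centsOff(double hz) {
    const double midi = hzToMidi(hz);
    return 100.0 * (midi - std::round(midi));
}

// Color thresholds as listed for the rendered pitch overlay.
TuneColor tuneColor(double cents) {
    const double a = std::abs(cents);
    if (a < 10.0) return TuneColor::Green;
    if (a < 30.0) return TuneColor::Yellow;
    return TuneColor::Red;
}

// Scientific pitch name of the nearest note, e.g. "C#4".
std::string noteName(double hz) {
    static const char* const kNames[12] = {"C", "C#", "D", "D#", "E", "F",
                                           "F#", "G", "G#", "A", "A#", "B"};
    const long midi = std::lround(hzToMidi(hz));
    assert(midi >= 0);
    return std::string(kNames[midi % 12]) + std::to_string(midi / 12 - 1);
}
```

For example, `noteName(440.0)` gives `"A4"`, and a tone 20 cents sharp of A4 lands in the yellow band.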
Native FFmpeg integration renders entirely in-process with a real-time progress bar. If the FFmpeg development libraries weren't present at build time, WakkaQt falls back gracefully to running the ffmpeg command-line tool as a subprocess.
Load any song and click 🎵 Backing Track. WakkaQt downloads the UVR-MDX-NET-Inst_HQ_3 ONNX vocal separation model (~80 MB, once) and runs it locally on your machine: no internet required after the first download, no cloud service, no privacy leak, no subscription.
The model separates vocals from the instrumental using MDX-Net deep learning, processed through a full STFT/iSTFT pipeline with FFTW3. The result is exported as WAV or MP3. Perfect for turning any song into a backing track for your next performance.
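As a rough illustration of the STFT/iSTFT part only (the MDX-Net model itself is not shown), the sketch below windows overlapping frames, transforms them, and rebuilds the signal with weighted overlap-add. A naive O(N²) DFT stands in for FFTW to keep it dependency-free; the Hann window, frame size, and hop are assumptions. In a separation pipeline, the model would modify each spectrum between `stft()` and `istft()`.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Spectrum = std::vector<Complex>;
constexpr double kPi = 3.14159265358979323846;

// Periodic Hann window of length n.
std::vector<double> hann(std::size_t n) {
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) w[i] = 0.5 - 0.5 * std::cos(2.0 * kPi * i / n);
    return w;
}

// Naive O(N^2) DFT (inverse = true also divides by N). A real pipeline would
// execute a pre-planned FFTW transform here instead.
Spectrum dft(const Spectrum& x, bool inverse) {
    const std::size_t n = x.size();
    const double sign = inverse ? 1.0 : -1.0;
    Spectrum out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
            acc += x[t] * std::polar(1.0, sign * 2.0 * kPi * double(k * t) / double(n));
        out[k] = inverse ? acc / double(n) : acc;
    }
    return out;
}

// Forward STFT: Hann-windowed frames of length n, advanced by `hop` samples.
std::vector<Spectrum> stft(const std::vector<double>& x, std::size_t n, std::size_t hop) {
    assert(n > 0 && hop > 0 && x.size() >= n);
    const auto w = hann(n);
    std::vector<Spectrum> frames;
    for (std::size_t start = 0; start + n <= x.size(); start += hop) {
        Spectrum buf(n);
        for (std::size_t i = 0; i < n; ++i) buf[i] = x[start + i] * w[i];
        frames.push_back(dft(buf, false));
    }
    return frames;
}

// Inverse STFT by weighted overlap-add: window each inverse frame again and divide
// by the accumulated squared window. Samples with zero total weight become 0.
std::vector<double> istft(const std::vector<Spectrum>& frames, std::size_t n,
                          std::size_t hop, std::size_t length) {
    const auto w = hann(n);
    std::vector<double> out(length, 0.0), norm(length, 0.0);
    for (std::size_t f = 0; f < frames.size(); ++f) {
        const Spectrum time = dft(frames[f], true);
        for (std::size_t i = 0; i < n && f * hop + i < length; ++i) {
            out[f * hop + i] += time[i].real() * w[i];
            norm[f * hop + i] += w[i] * w[i];
        }
    }
    for (std::size_t i = 0; i < length; ++i)
        out[i] = norm[i] > 1e-9 ? out[i] / norm[i] : 0.0;
    return out;
}
```

With a periodic Hann window at 50% overlap, every sample except the very first receives nonzero weight, so an unmodified round trip reproduces the input up to floating-point error.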
Every recording is saved to `~/.WakkaQt/library/` in its own UUID-named folder containing all source files and JSON metadata. The library dialog lists everything with timestamps. You can rename, delete, or re-render any session (with updated enhancement settings) at any time.
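A small sketch of the library layout, assuming one random (version 4) UUID per session. In a Qt application, `QUuid::createUuid()` would normally generate the ID; this standard-library version only shows the format and the resulting `~/.WakkaQt/library/<uuid>` path. The function names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <random>
#include <string>

// Random version-4 UUID in the canonical 8-4-4-4-12 hex form.
std::string makeUuidV4(std::mt19937_64& rng) {
    static const char kHex[] = "0123456789abcdef";
    std::uniform_int_distribution<int> nibble(0, 15);
    std::string s;
    s.reserve(36);
    for (int i = 0; i < 36; ++i) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            s += '-';
            continue;
        }
        int v = nibble(rng);
        if (i == 14) v = 4;                  // version nibble: 4 = random
        else if (i == 19) v = 8 | (v & 3);   // RFC 4122 variant bits: binary 10xx
        s += kHex[v];
    }
    return s;
}

// Folder for one session under the library root (hypothetical helper).
std::filesystem::path sessionDir(const std::filesystem::path& home, const std::string& uuid) {
    return home / ".WakkaQt" / "library" / uuid;
}
```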
| Feature | Status |
|---|---|
| MP4/MKV/WebM/MP3/WAV/FLAC playback | ✅ |
| Microphone recording (selectable device) | ✅ |
| Webcam recording | ✅ |
| Real-time pitch monitor (YIN, always visible) | ✅ |
| Real-time waveform visualizer | ✅ |
| Pitch correction (phase vocoder) | ✅ |
| Scale/key-aware pitch snapping | ✅ |
| Formant preservation (LPC) | ✅ |
| Noise reduction (spectral subtraction) | ✅ |
| Reverb (Freeverb/Schroeder) | ✅ |
| Compressor + limiter + harmonic exciter | ✅ |
| Preview dialog with live tweak | ✅ |
| Native FFmpeg rendering (in-process) | ✅ |
| Pitch overlay on rendered video | ✅ |
| Session library (save/rename/delete/re-render) | ✅ |
| YouTube download (via yt-dlp) | ✅ |
| AI vocal separation for backing tracks (ONNX) | ✅ |
| Cross-platform (Linux / Windows) | ✅ |
| Subscription required | ❌ |
| Phone home to a server | ❌ |
| Judgment about your singing | mostly ❌ |
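The pitch monitor in the table relies on YIN. Below is a compact sketch of the textbook YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation) for a single frame. The 60 to 1000 Hz search range and the 0.15 threshold are assumptions, and a real-time version would reuse its buffers rather than allocate on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal YIN pitch estimate for one frame. Returns 0 when no pitch is found.
double yinPitch(const std::vector<double>& x, double sampleRate,
                double minHz = 60.0, double maxHz = 1000.0, double threshold = 0.15) {
    assert(sampleRate > 0.0 && minHz > 0.0 && maxHz > minHz);
    const std::size_t maxLag = static_cast<std::size_t>(sampleRate / minHz);
    const std::size_t minLag =
        std::max<std::size_t>(2, static_cast<std::size_t>(sampleRate / maxHz));
    if (x.size() < 2 * maxLag) return 0.0;   // frame too short for the lowest pitch
    const std::size_t W = x.size() - maxLag; // integration window

    // Step 1: difference function d(tau).
    std::vector<double> d(maxLag + 1, 0.0);
    for (std::size_t tau = 1; tau <= maxLag; ++tau)
        for (std::size_t j = 0; j < W; ++j) {
            const double diff = x[j] - x[j + tau];
            d[tau] += diff * diff;
        }

    // Step 2: cumulative mean normalized difference d'(tau); d'(0) = 1.
    std::vector<double> cmnd(maxLag + 1, 1.0);
    double running = 0.0;
    for (std::size_t tau = 1; tau <= maxLag; ++tau) {
        running += d[tau];
        cmnd[tau] = running > 0.0 ? d[tau] * tau / running : 1.0;
    }

    // Step 3: first dip below the threshold, then slide down to its local minimum.
    for (std::size_t tau = minLag; tau < maxLag; ++tau) {
        if (cmnd[tau] < threshold) {
            while (tau + 1 < maxLag && cmnd[tau + 1] < cmnd[tau]) ++tau;
            // Step 4: parabolic interpolation for a sub-sample period estimate.
            const double s0 = cmnd[tau - 1], s1 = cmnd[tau], s2 = cmnd[tau + 1];
            const double denom = s0 - 2.0 * s1 + s2;
            const double shift = std::abs(denom) > 1e-12 ? 0.5 * (s0 - s2) / denom : 0.0;
            return sampleRate / (tau + shift);
        }
    }
    return 0.0;  // unvoiced, silent, or out of range
}
```

Taking the *first* dip below the threshold (rather than the global minimum) is what makes YIN resistant to picking a multiple of the true period, which would read as an octave too low.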
- Generate Backing Track: new feature powered by the UVR-MDX-NET-Inst_HQ_3 ONNX model. It downloads the model on first use (~80 MB), then separates vocals from any loaded track entirely offline. Output can be saved as WAV or MP3. The button appears automatically when a file is loaded and disappears when it isn't needed
- Save to existing file fixed: overwriting an existing WAV with the backing-track output no longer fails silently and loses the processed audio. The destination is removed first; if the copy fails, the temp file is preserved and its path is shown in the error dialog
- Playback stopped before separation: WakkaQt now stops playback before starting the ONNX separation process, freeing audio and CPU resources for the model run
- CMake: ONNX Runtime is optional. It is detected at configure time; if it is absent, the backing-track feature is disabled gracefully with a build warning and everything else continues to work
- Pitch overlay on rendered video: the note name (e.g. "C#4"), a cents-deviation bar, and a "+/−Nc" value are burned into the bottom of the webcam track. Color-coded: green < 10 ¢, yellow < 30 ¢, red ≥ 30 ¢
- `PitchMonitorWidget` always visible: the pitch display is shown at all times, not just during recording
- Library: multi-select deletion. Ctrl+click or Shift+click to select multiple sessions for batch delete
- Native sample rate preservation: recorded audio is processed at the microphone's native sample rate (e.g. 48 kHz), with no forced downsampling to 44100 Hz
- Library: `tuned.wav` not saved. The intermediate tuned vocal is regenerated from the raw audio on every re-render, so the preview dialog always reflects the current settings
- A/V sync: fixed video desync with manual offset. Both audio and video offsets are now set to `manualOffset` directly; previously, a formula involving system latency made the video lag or lead by hundreds of milliseconds
- `VocalEnhancer`: fixed progressive robotic distortion. The compressor and harmonic exciter were running inside the pitch-correction loop; they are now applied exactly once, after all pitch work
- `VocalEnhancer`: fixed stale pitch state at section boundaries. Pitch state is cleared after ~400 ms of silence, eliminating wrong shift ratios at phrase starts
- `VocalEnhancer`: octave detection guard. 2× / 0.5× detection errors are corrected when pitch confidence is low
- A/V sync: fixed the negative manual-offset case. Webcam pre-roll footage is now seeked past correctly in both directions
- Rendering: UI fully locked during FFmpeg. All controls are disabled for the entire render, regardless of code path
- FFmpeg native: keyframe alignment correction. The PTS of the first decoded frame is used to shift all subsequent video frames into exact alignment after a seek
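Two of the `VocalEnhancer` fixes above (resetting pitch state after ~400 ms of silence, and guarding against 2× / 0.5× octave errors when confidence is low) can be illustrated with a small tracker. This is a hypothetical sketch: the class, the 0 to 1 confidence scale, the 0.8 confidence cutoff, and the ±0.08-octave tolerance are assumptions, not the project's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical pitch tracker; thresholds and the confidence scale are assumptions.
class PitchTracker {
public:
    explicit PitchTracker(double silenceResetMs = 400.0, double minConfidence = 0.8,
                          double octaveTolerance = 0.08)
        : resetMs_(silenceResetMs), minConf_(minConfidence), tol_(octaveTolerance) {}

    // hz <= 0 means no pitch was detected in this hop.
    // Returns the pitch to correct toward, or 0 if there is none.
    double update(double hz, double confidence, double hopMs) {
        assert(hopMs > 0.0);
        if (hz <= 0.0) {
            silentMs_ += hopMs;
            if (silentMs_ >= resetMs_) last_ = 0.0;  // phrase over: drop stale state
            return 0.0;
        }
        silentMs_ = 0.0;
        if (last_ > 0.0 && confidence < minConf_) {
            // Distance from the previous pitch in octaves; about +1 or -1 suggests
            // the detector locked onto the wrong octave.
            const double octaves = std::log2(hz / last_);
            if (std::abs(octaves - 1.0) < tol_) hz *= 0.5;
            else if (std::abs(octaves + 1.0) < tol_) hz *= 2.0;
        }
        last_ = hz;
        return hz;
    }

    double last() const { return last_; }

private:
    double resetMs_, minConf_, tol_;
    double silentMs_ = 0.0;
    double last_ = 0.0;
};
```

Confident octave jumps pass through untouched, so a singer who really does leap an octave is not "corrected" back down.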
Install the required development packages (Debian/Ubuntu):
```bash
sudo apt install \
  build-essential cmake ninja-build \
  qt6-base-dev qt6-multimedia-dev \
  libqt6multimedia6 libqt6multimediawidgets6 \
  libfftw3-dev \
  libavformat-dev libavcodec-dev libavfilter-dev \
  libavutil-dev libswresample-dev libswscale-dev \
  libglib2.0-dev \
  pkg-config
```

For the AI backing-track feature, also install the ONNX Runtime development package.
Debian/Ubuntu:

```bash
sudo apt install libonnxruntime-dev
```

Fedora / RHEL (not in the standard repos; install from the official release):
```bash
ORT_VERSION=1.20.1
wget https://github.com/microsoft/onnxruntime/releases/download/v${ORT_VERSION}/onnxruntime-linux-x64-${ORT_VERSION}.tgz
tar -xzf onnxruntime-linux-x64-${ORT_VERSION}.tgz
# The release archive keeps its headers directly in include/;
# install them under include/onnxruntime/ (the same layout the Windows steps use).
sudo mkdir -p /usr/local/include/onnxruntime
sudo cp -r onnxruntime-linux-x64-${ORT_VERSION}/include/. /usr/local/include/onnxruntime/
sudo cp onnxruntime-linux-x64-${ORT_VERSION}/lib/libonnxruntime.so* /usr/local/lib/
sudo ldconfig
```

Windows (MinGW / MSVC):

- Download `onnxruntime-win-x64-*.zip` from the ONNX Runtime releases page.
- Extract it and copy the contents to `C:\Program Files (x86)\onnxruntime\`.
- Inside that folder, create the subfolder `include\onnxruntime\` and move all header files from `include\` into it. The final structure must be:

  ```
  C:\Program Files (x86)\onnxruntime\
    include\
      onnxruntime\
        onnxruntime_cxx_api.h
        onnxruntime_c_api.h
        ... (all other .h files)
    lib\
      onnxruntime.lib
      onnxruntime.dll
  ```

- Copy `onnxruntime.dll` next to `WakkaQt.exe` in your build/install folder; Windows needs the DLL at runtime.
CMake finds the library automatically in `C:\Program Files (x86)\onnxruntime` during configure.
ONNX Runtime is entirely optional β if it is not found at configure time, WakkaQt builds and runs normally without it and the backing-track button simply won't appear.
On Fedora/RHEL-based systems:
```bash
sudo dnf install \
  cmake ninja-build gcc-c++ \
  qt6-qtbase-devel qt6-qtmultimedia-devel \
  fftw-devel \
  ffmpeg-free-devel \
  glib2-devel \
  pkgconf
```

You also need `ffmpeg` and `yt-dlp` installed as runtime tools:
```bash
sudo apt install ffmpeg yt-dlp   # Debian/Ubuntu
sudo dnf install ffmpeg yt-dlp   # Fedora
```

Clone and build:

```bash
git clone https://github.com/guprobr/WakkaQt.git
cd WakkaQt
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
```

Debug build:

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel
```

Run:

```bash
./build/WakkaQt
```

Install system-wide:

```bash
sudo cmake --install build
```

This installs the binary to `/usr/bin/WakkaQt`, an icon to `/usr/share/icons/hicolor/256x256/apps/WakkaQt.png`, and a `.desktop` launcher to `/usr/share/applications/`.
| Tool | Purpose |
|---|---|
| `ffmpeg` | Render fallback when the FFmpeg dev libraries were absent at build time |
| `yt-dlp` | In-app video download from YouTube and other sites |
Both must be on $PATH at runtime. The ONNX model (~80 MB) is downloaded automatically on first use of the backing-track feature and cached in ~/.WakkaQt/models/.
See LICENSE for details.