- Overview
- Key Features
- Research on Guitar Notes
- How to Use the App
- Project Structure
- Installation Guide
- Technical Highlights
- Troubleshooting
- Future Enhancements
- Team
- License
This application provides musicians, producers, and enthusiasts with a powerful yet intuitive interface for audio processing. Using advanced models like Demucs, Basic Pitch, and Whisper, the app offers:
- Audio Separation: Extract instrumental and vocal stems.
- MIDI Conversion: Convert instrumental audio to editable MIDI files.
- MIDI Modification: Customize MIDI files using AI-powered prompts.
- Lyrics Extraction and Translation: Extract lyrics from songs and translate them into multiple languages.
- Karaoke Video Generation: Create karaoke videos with synchronized lyrics and a custom background.
**Audio Separation**
- Description: Extract individual components (e.g., vocals, bass, drums).
- Model Used: Demucs.
- Use Case: Isolate instrumental tracks for practice or remixing.

**MIDI Conversion**
- Description: Convert audio into MIDI format for further editing.
- Model Used: Basic Pitch.
- Use Case: Generate sheet music or integrate into DAWs.

**MIDI Modification**
- Description: Apply transformations like changing the scale or style using AI-generated prompts.
- Model Used: Google Gemini API.
- Use Case: Create unique renditions of existing tracks.

**Lyrics Extraction and Translation**
- Description: Extract and translate lyrics from vocal tracks.
- Model Used: Whisper.
- Use Case: Understand or repurpose song lyrics.

**Karaoke Video Generation**
- Description: Create karaoke videos with synchronized lyrics and custom backgrounds.
- Tools Used: FFmpeg for video processing.
- Use Case: Host karaoke sessions or share lyric videos online.
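As a sketch of the separation step, Demucs can be driven through its command-line interface. The flag names follow current Demucs releases, and the default model name `htdemucs` is an assumption about the installed version; the audio file name is a placeholder.

```python
import shlex

def demucs_command(audio_path, model="htdemucs", two_stems=None):
    """Build a Demucs CLI invocation for stem separation.
    `two_stems` collapses output to one stem plus the rest
    (e.g. "vocals" yields vocals/no_vocals)."""
    cmd = ["python", "-m", "demucs", "-n", model]
    if two_stems:
        cmd += ["--two-stems", two_stems]
    cmd.append(audio_path)
    return cmd

cmd = demucs_command("song.mp3", two_stems="vocals")
print(shlex.join(cmd))
```

Running the printed command writes the separated stems to a `separated/` folder by default.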
Before developing the comprehensive audio processing app, we conducted focused research on recognizing guitar notes using machine learning and signal processing. This foundational work guided our understanding of audio features and model capabilities.
**Feature Extraction for Audio**
- MFCC (Mel-Frequency Cepstral Coefficients): Captures the spectral envelope of audio signals.
- Mel Spectrogram: Provides a frequency-based visual representation.
- Chroma Features: Highlights harmonic pitch content.
- Spectral Contrast: Differentiates between peaks and valleys in the spectrum.
**Model Training**
- Used Convolutional Neural Networks (CNNs) with TensorFlow/Keras to classify guitar chords.
- Trained on diverse datasets of `.wav` files containing guitar notes at varying pitches, tones, and durations.
- Achieved 98.5% validation accuracy.
**Pitch Estimation and Signal Processing**
- Applied FFT (Fast Fourier Transform) and CQT (Constant-Q Transform) for frequency analysis.
- Estimated fundamental frequencies and converted them into MIDI notes.
- Segmented audio into smaller chunks for chord prediction.
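The FFT-based pitch-to-MIDI step can be sketched as a single-peak estimate on a clean tone. This is a simplification: real guitar audio needs more robust f0 tracking (e.g., harmonic weighting or CQT-based methods), and the chunking step is omitted here.

```python
import numpy as np

def estimate_midi_note(y, sr):
    """Estimate the dominant frequency via FFT and map it to a MIDI note."""
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), 1.0 / sr)
    f0 = freqs[np.argmax(spectrum)]                     # strongest spectral peak
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))    # A4 = 440 Hz = MIDI 69
    return f0, midi

sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * 440.0 * t)  # A4 test tone
f0, midi = estimate_midi_note(y, sr)
print(round(f0), midi)  # 440 69
```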
**Data Augmentation**
- Applied techniques such as white noise addition, time stretching, and pitch shifting to improve model robustness.
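Two of these augmentations can be sketched in plain NumPy. Note the time stretch here is a naive resampling stand-in that also shifts pitch; a phase-vocoder stretch (as in librosa's `time_stretch`) preserves pitch.

```python
import numpy as np

rng = np.random.default_rng(42)
sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.linspace(0, 1, sr, endpoint=False))

def add_white_noise(y, level=0.005):
    """White-noise addition: simulates recording noise."""
    return y + level * rng.standard_normal(len(y))

def stretch(y, rate=1.5):
    """Naive time stretch by linear resampling (rate > 1 shortens the clip)."""
    idx = np.arange(0, len(y) - 1, rate)
    return np.interp(idx, np.arange(len(y)), y)

noisy = add_white_noise(y)
stretched = stretch(y, rate=1.5)
print(len(noisy), len(stretched))
```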
**Outputs**
- Visualized predictions with CQT and FFT to validate chord recognition accuracy.
- Generated MIDI files for the predicted notes.
- Created music21 streams and MIDI files for playback and analysis.
- Example Output: `sweet_child_music21_with_chords.mid`.
**Interactive UI for Fine-Tuning**
- Implemented sliders to adjust CQT parameters for better flexibility and analysis.
Accompanying visuals: feature extraction outputs, model training evaluation, and audio processing visuals.
This research proved instrumental in identifying the strengths and limitations of CNNs for specific instruments. It informed our decision to later leverage pre-trained models like Demucs and Basic Pitch for broader functionality.
- Run `python app.py` from the root directory and open the Gradio interface in your browser.
- Go to the "Audio Separation" tab.
- Upload an audio file.
- Customize parameters (e.g., model version, bitrate).
- Click Separate Audio and download the stems.
- Switch to the "Audio to MIDI" tab.
- Upload an instrumental audio file.
- Adjust MIDI generation settings (e.g., note threshold).
- Click Convert to MIDI to generate and download the file.
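The note-threshold setting can be illustrated with a small filter over predicted note events. The tuple layout and field order here are assumptions for illustration, not Basic Pitch's actual output format:

```python
def filter_notes(note_events, onset_threshold=0.5):
    """Keep only predicted notes whose confidence clears the threshold.
    note_events: (start_sec, end_sec, midi_pitch, confidence) tuples
    (hypothetical layout for a pitch-prediction model's output)."""
    return [n for n in note_events if n[3] >= onset_threshold]

events = [(0.0, 0.5, 69, 0.9), (0.5, 1.0, 71, 0.3)]
print(filter_notes(events))  # [(0.0, 0.5, 69, 0.9)]
```

Raising the threshold trades recall for precision: fewer spurious notes, but quiet onsets may be dropped.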
- Select the "Modify MIDI" tab.
- Upload a MIDI file.
- Enter a text prompt (e.g., "Change to jazz style").
- Click Modify MIDI to apply changes.
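A prompt like "change the scale" ultimately maps to edits on MIDI note numbers. A minimal sketch of one such transformation (transposition), independent of the Gemini API:

```python
def transpose(notes, semitones):
    """Shift MIDI pitches by a number of semitones, clamped to 0-127."""
    return [min(127, max(0, n + semitones)) for n in notes]

# C-major triad up a whole step -> D-major triad
print(transpose([60, 64, 67], 2))  # [62, 66, 69]
```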
- Go to the "Lyrics Extraction" tab.
- Upload a vocal stem.
- Click Extract Lyrics to display text.
- Input a language code for translation (e.g., `en`, `es`, `fr`) and click Translate.
- Upload instrumental and vocal stems.
- Use Whisper to synchronize lyrics.
- Customize lyrics and background image.
- Generate a karaoke video using FFmpeg.
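The video-generation step above can be sketched as an FFmpeg command builder. The `subtitles` filter and flags are standard FFmpeg; the file names are placeholders:

```python
def build_karaoke_cmd(background, audio, subtitles, output):
    """Assemble an FFmpeg command that loops a background image over the
    instrumental track and burns in synchronized lyrics."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", background,   # still image as the video track
        "-i", audio,                      # instrumental stem
        "-vf", f"subtitles={subtitles}",  # burn in .srt/.ass lyrics
        "-shortest",                      # stop when the audio ends
        output,
    ]

cmd = build_karaoke_cmd("bg.png", "instrumental.wav", "lyrics.srt", "karaoke.mp4")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would render the video, assuming FFmpeg is on the PATH.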
audio_processing_app/
├── output_stems/          # Processed audio stems
├── output_midi/           # Generated MIDI files
├── karaoke_videos/        # Karaoke video outputs
├── notebooks/             # Development notebooks
├── utilities/             # Helper scripts
│   ├── separate_audio.py
│   ├── audio_to_midi.py
│   ├── modify_midi.py
│   └── lyrics_processing.py
├── app.py                 # Main Gradio application
└── requirements.txt       # Python dependencies
Before installing the required dependencies, make sure to install the correct versions of PyTorch, torchvision, and torchaudio based on your system and CUDA version. Follow the instructions below:
**Clone the Repository**

    git clone https://github.com/Corey-Holton/Group_3_Project.git
    cd Group_3_Project
**Set Up the Conda Environment**

    conda create -n audio_processing python=3.10 -y
    conda activate audio_processing
**Install Dependencies**

First install `torch`, `torchvision`, and `torchaudio`.

- **For GPU users:** Install the appropriate CUDA toolkit, then follow the PyTorch installation guide to install matching versions, replacing `cuXX` with your CUDA version (e.g., `cu118` for CUDA 11.8):

      pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cuXX
- **For CPU users:** Install the CPU builds of PyTorch, torchvision, and torchaudio:

      pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

- **For macOS users:** Use the CPU-only build, as macOS does not support CUDA:

      pip install torch torchvision torchaudio
Then install the remaining dependencies from `requirements.txt`:

    pip install -r requirements.txt
**Run the Application**

    python app.py
- Demucs: Audio stem separation.
- Basic Pitch: Audio-to-MIDI conversion.
- Whisper: Lyrics extraction and translation.
- Google Gemini API: AI-based MIDI modification.
- MFCC: Mel-Frequency Cepstral Coefficients.
- Chroma Features: Harmonic pitch representation.
- Spectral Contrast: Timbre differentiation.
- Mel Spectrogram: Frequency-based signal representation.
- White noise addition.
- Time stretching/shifting.
- Pitch shifting.
- Dependency Issues: Ensure all libraries in `requirements.txt` are installed.
- Missing Outputs: Verify write permissions for `output_stems/` and `output_midi/`.
- Model Compatibility: Use Python 3.10+ for TensorFlow compatibility.
- Expand model compatibility for non-guitar instruments.
- Real-time audio processing.
- Cloud storage integration for outputs.
- Enhanced lyrics editing features.
- Corey Holton
- Christian Palacios
- Edwin Lovera
- Montre Davis
This project is licensed under the MIT License.