Harmonize: AI-Powered Audio Processing Suite


Built with: Python · Anaconda · pandas · NumPy · TensorFlow · Keras · .ENV · OpenAI · Gradio

πŸ“– Table of Contents

  1. 🎡 Overview
  2. 🌟 Key Features
  3. 🎸 Research on Guitar Notes
  4. πŸ–₯️ How to Use the App
  5. πŸ“‚ Project Structure
  6. πŸ“¦ Installation Guide
  7. πŸ“Š Technical Highlights
  8. πŸ”§ Troubleshooting
  9. πŸš€ Future Enhancements
  10. πŸ‘©β€πŸ’» Team
  11. πŸ“œ License

🎡 Overview

This application provides musicians, producers, and enthusiasts with a powerful yet intuitive interface for audio processing. Using advanced models like Demucs, Basic Pitch, and Whisper, the app offers:

  • Audio Separation: Extract instrumental and vocal stems.
  • MIDI Conversion: Convert instrumental audio to editable MIDI files.
  • MIDI Modification: Customize MIDI files using AI-powered prompts.
  • Lyrics Extraction and Translation: Extract lyrics from songs and translate them into multiple languages.
  • Karaoke Video Generation: Create karaoke videos with synchronized lyrics and a custom background.

🌟 Key Features

1. Audio Separation

  • Description: Extract individual components (e.g., vocals, bass, drums).
  • Model Used: Demucs.
  • Use Case: Isolate instrumental tracks for practice or remixing.
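
As a sketch, Demucs can also be driven from Python through its CLI entry point. The snippet below assumes the demucs package is installed; the file name is illustrative.

    # A minimal sketch of driving Demucs from Python (assumes the `demucs`
    # package is installed; the file name is illustrative).
    import demucs.separate

    # Split the track into vocals and accompaniment ("two stems");
    # outputs land under ./separated/<model_name>/<track_name>/ by default.
    demucs.separate.main(["--two-stems", "vocals", "song.mp3"])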

2. Audio to MIDI Conversion

  • Description: Convert audio into MIDI format for further editing.
  • Model Used: Basic Pitch.
  • Use Case: Generate sheet music or integrate into DAWs.
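
For illustration, Basic Pitch exposes a small Python API; the sketch below assumes the basic-pitch package and an illustrative file name. Recent releases also accept onset/frame threshold keywords, which is roughly what the app's note-threshold setting controls.

    # A minimal sketch of Basic Pitch's Python API (assumes the
    # `basic-pitch` package; the file name is illustrative).
    from basic_pitch.inference import predict

    # Returns the raw model output, a PrettyMIDI object, and note events.
    model_output, midi_data, note_events = predict("instrumental.wav")
    midi_data.write("instrumental.mid")  # save the transcription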

3. Modify MIDI Files

  • Description: Apply transformations like changing the scale or style using AI-generated prompts.
  • Model Used: Google Gemini API.
  • Use Case: Create unique renditions of existing tracks.
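
One plausible wiring for this step, not necessarily what utilities/modify_midi.py does: encode the MIDI notes as text, send them to Gemini with the user's prompt, and parse the reply back into MIDI events. The snippet assumes the google-generativeai package and a GOOGLE_API_KEY environment variable; the model name and note encoding are illustrative.

    # One plausible wiring for prompt-driven MIDI edits (assumes the
    # `google-generativeai` package and a GOOGLE_API_KEY environment
    # variable; model name and note encoding are illustrative, and
    # utilities/modify_midi.py may work differently).
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    notes = "C4 E4 G4 C5"  # a text encoding of the MIDI notes
    prompt = f"Rewrite these notes in a jazz style, keeping the length: {notes}"
    response = model.generate_content(prompt)
    print(response.text)  # would be parsed back into MIDI events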

4. Lyrics Extraction and Translation

  • Description: Extract and translate lyrics from vocal tracks.
  • Model Used: Whisper.
  • Use Case: Understand or re-purpose song lyrics.
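
A minimal sketch with the open-source openai-whisper package (the file name is illustrative). Note that Whisper's built-in translation task targets English; other target languages would go through a separate translation step.

    # A minimal sketch with the open-source `openai-whisper` package
    # (the file name is illustrative).
    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("vocals.wav")  # or task="translate" for English
    print(result["text"])                    # full transcript
    # result["segments"] carries start/end timestamps per phrase,
    # which the karaoke step can reuse for synchronization.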

5. Karaoke Video Generation

  • Description: Create karaoke videos with synchronized lyrics and custom backgrounds.
  • Tools Used: FFmpeg for video processing.
  • Use Case: Host karaoke sessions or share lyric videos online.
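
A rough sketch of the FFmpeg call, assuming ffmpeg is on the PATH and built with libass (needed for the subtitles filter), and that a lyrics.srt with Whisper timings already exists; file names are illustrative.

    # A rough sketch of the FFmpeg call (assumes `ffmpeg` on PATH, built
    # with libass for the subtitles filter; file names are illustrative).
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-loop", "1", "-i", "background.png",  # still image as the video track
        "-i", "instrumental.wav",              # audio track
        "-vf", "subtitles=lyrics.srt",         # burn in the timed lyrics
        "-pix_fmt", "yuv420p",                 # widely compatible pixel format
        "-shortest",                           # stop when the audio ends
        "karaoke.mp4",
    ], check=True)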

🎸 Research on Guitar Notes

Before developing the comprehensive audio processing app, we conducted focused research on recognizing guitar notes using machine learning and signal processing. This foundational work guided our understanding of audio features and model capabilities.

Key Steps in the Research:

  1. Feature Extraction for Audio (see the sketch after this list)

    • MFCC (Mel-Frequency Cepstral Coefficients): Captures the spectral envelope of audio signals.
    • Mel Spectrogram: Provides a frequency-based visual representation.
    • Chroma Features: Highlights harmonic pitch content.
    • Spectral Contrast: Differentiates between peaks and valleys in the spectrum.
  2. Model Training

    • Used Convolutional Neural Networks (CNNs) with TensorFlow/Keras to classify guitar chords.
    • Trained on diverse datasets of .wav files containing guitar notes at varying pitches, tones, and durations.
    • Achieved 98.5% validation accuracy.
  3. Pitch Estimation and Signal Processing

    • Applied FFT (Fast Fourier Transform) and CQT (Constant-Q Transform) for frequency analysis.
    • Estimated fundamental frequencies and converted them into MIDI notes.
    • Segmented audio into smaller chunks for chord prediction.
  4. Data Augmentation

    • Applied techniques such as white noise addition, time stretching, and pitch shifting to improve model robustness.
  5. Outputs

    • Visualized predictions with CQT and FFT to validate chord recognition accuracy.
    • Generated MIDI files for the predicted notes.
    • Created music21 streams and MIDI files for playback and analysis.
    • Example Output: sweet_child_music21_with_chords.mid.
  6. Interactive UI for Fine-Tuning

    • Implemented sliders to adjust CQT parameters for better flexibility and analysis.
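
A condensed sketch of steps 1, 3, and 5 above, assuming librosa and music21. It is illustrative rather than the notebooks' exact code; in particular, the single FFT-peak pitch estimate is a deliberate simplification of the chord-level pipeline.

    # A condensed, illustrative version of steps 1, 3, and 5 (assumes
    # librosa and music21; the single FFT-peak pitch estimate is a
    # simplification of the chord-level pipeline in the notebooks).
    import librosa
    import numpy as np
    from music21 import note, stream

    y, sr = librosa.load("guitar_note.wav", sr=None)

    # Step 1: features of the kind fed to the CNN classifier.
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mel      = librosa.feature.melspectrogram(y=y, sr=sr)
    chroma   = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)

    # Step 3: frequency analysis and a crude fundamental estimate.
    cqt = np.abs(librosa.cqt(y, sr=sr))
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    f0 = freqs[np.argmax(spectrum)]

    # Step 5: convert Hz to a MIDI note (69 = A4 = 440 Hz) and write it out.
    n = note.Note()
    n.pitch.midi = int(round(librosa.hz_to_midi(f0)))
    s = stream.Stream()
    s.append(n)
    s.write("midi", fp="predicted_note.mid")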

Visualizations and Results:

  • Feature Extraction Outputs:

    • MFCC visualization.
    • Mel Spectrogram comparison (before/after training).
    • Chroma features labeled with pitch classes.

  • Model Training Evaluation:

    • Validation loss and accuracy graphs over training epochs.

  • Audio Processing Visuals:

    • Raw audio waveform.
    • FFT and CQT plots.
    • Predicted guitar notes.


Why This Research Matters:

This research proved instrumental in identifying the strengths and limitations of CNNs for specific instruments. It informed our decision to later leverage pre-trained models like Demucs and Basic Pitch for broader functionality.


πŸ–₯️ How to Use the App

Step-by-Step Instructions:

Audio Separation

  1. Run python app.py from the project root, then open the Gradio interface in your browser.
  2. Go to the "Audio Separation" tab.
  3. Upload an audio file.
  4. Customize parameters (e.g., model version, bitrate).
  5. Click Separate Audio and download the stems.


Audio to MIDI Conversion

  1. Switch to the "Audio to MIDI" tab.
  2. Upload an instrumental audio file.
  3. Adjust MIDI generation settings (e.g., note threshold).
  4. Click Convert to MIDI to generate and download the file.


Modify MIDI Files

  1. Select the "Modify MIDI" tab.
  2. Upload a MIDI file.
  3. Enter a text prompt (e.g., "Change to jazz style").
  4. Click Modify MIDI to apply changes.


Lyrics Extraction and Translation

  1. Go to the "Lyrics Extraction" tab.
  2. Upload a vocal stem.
  3. Click Extract Lyrics to display text.
  4. Input a language code for translation (e.g., en, es, fr) and click Translate.


Karaoke Video Generation

  1. Upload instrumental and vocal stems.
  2. Use Whisper to synchronize lyrics.
  3. Customize lyrics and background image.
  4. Generate a karaoke video using FFmpeg.

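To make the tab layout above concrete, here is a stripped-down sketch of how app.py might wire two of the tabs in Gradio. The helper imports and their signatures from utilities/ are assumptions for illustration, not the app's actual code.

    # A stripped-down sketch of how app.py might wire two of the tabs
    # (the imports and signatures from utilities/ are assumptions).
    import gradio as gr

    from utilities.separate_audio import separate_audio  # hypothetical helper
    from utilities.audio_to_midi import convert_to_midi  # hypothetical helper

    with gr.Blocks(title="Harmonize") as demo:
        with gr.Tab("Audio Separation"):
            audio_in = gr.Audio(type="filepath", label="Input audio")
            stems_out = gr.File(label="Separated stems")
            gr.Button("Separate Audio").click(separate_audio, audio_in, stems_out)

        with gr.Tab("Audio to MIDI"):
            midi_in = gr.Audio(type="filepath", label="Instrumental audio")
            midi_out = gr.File(label="MIDI file")
            gr.Button("Convert to MIDI").click(convert_to_midi, midi_in, midi_out)

    demo.launch()  # serves locally, by default at http://127.0.0.1:7860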


πŸ“‚ Project Structure

audio_processing_app/
β”œβ”€β”€ output_stems/        # Processed audio stems
β”œβ”€β”€ output_midi/         # Generated MIDI files
β”œβ”€β”€ karaoke_videos/      # Karaoke video outputs
β”œβ”€β”€ notebooks/           # Development notebooks
β”œβ”€β”€ utilities/           # Helper scripts
β”‚   β”œβ”€β”€ separate_audio.py
β”‚   β”œβ”€β”€ audio_to_midi.py
β”‚   β”œβ”€β”€ modify_midi.py
β”‚   └── lyrics_processing.py
β”œβ”€β”€ app.py               # Main Gradio application
└── requirements.txt     # Python dependencies

πŸ“¦ Installation Guide

Pre-Installation Steps

Before installing the required dependencies, make sure to install the correct versions of PyTorch, torchvision, and torchaudio based on your system and CUDA version. Follow the instructions below:


  1. Clone the Repository

    git clone https://github.com/Corey-Holton/Group_3_Project.git
    cd Group_3_Project
  2. Set Up Conda Environment

    conda create -n audio_processing python=3.10 -y
    conda activate audio_processing
  3. Install Dependencies

    • First, install torch, torchvision, and torchaudio:
      • For GPU Users:

        • Install the appropriate CUDA toolkit.
        • Use the PyTorch installation guide to install the correct versions:
          pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cuXX
          Replace cuXX with your specific CUDA version (e.g., cu118 for CUDA 11.8).
      • For CPU Users:

        • Install the CPU versions of PyTorch, torchvision, and torchaudio:
          pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
      • For macOS Users:

        • Use the CPU-only version of PyTorch as macOS does not support CUDA:
          pip install torch torchvision torchaudio
    • Then install the remaining dependencies from requirements.txt:
      pip install -r requirements.txt
  4. Run the Application

    python app.py

πŸ“Š Technical Highlights

Models and Techniques

  • Demucs: Audio stem separation.
  • Basic Pitch: Audio-to-MIDI conversion.
  • Whisper: Lyrics extraction and translation.
  • Google Gemini API: AI-based MIDI modification.

Key Audio Features:

  • MFCC: Mel-Frequency Cepstral Coefficients.
  • Chroma Features: Harmonic pitch representation.
  • Spectral Contrast: Timbre differentiation.
  • Mel Spectrogram: Frequency-based signal representation.

Data Augmentation:

  • White noise addition.
  • Time stretching/shifting.
  • Pitch shifting.
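
A minimal sketch of the three augmentations using librosa; the noise level and factors are illustrative.

    # A minimal sketch of the three augmentations (assumes librosa;
    # the noise level and factors are illustrative).
    import librosa
    import numpy as np

    y, sr = librosa.load("guitar_note.wav", sr=None)

    noisy     = y + 0.005 * np.random.randn(len(y))               # white noise
    stretched = librosa.effects.time_stretch(y, rate=0.8)         # slow down 20%
    shifted   = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # up a whole tone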



πŸ”§ Troubleshooting

  • Dependency Issues: Ensure all libraries in requirements.txt are installed.
  • Missing Outputs: Verify write permissions for output_stems/ and output_midi/.
  • Model Compatibility: Use Python 3.10+ for TensorFlow compatibility.

πŸš€ Future Enhancements

  1. Expand model compatibility for non-guitar instruments.
  2. Real-time audio processing.
  3. Cloud storage integration for outputs.
  4. Enhanced lyrics editing features.

πŸ‘©β€πŸ’» Team

  • Corey Holton
  • Christian Palacios
  • Edwin Lovera
  • Montre Davis

πŸ“œ License

This project is licensed under the MIT License.

