These are good for learning the basic libraries used for audio analysis, and for getting a feel for the data.
- Audio Processing in Python - C. MacLeod
- Understanding the Mel Spectrogram - L. Roberts
- Getting to know the Mel Spectrogram - D. Gartzman
- The Dummy's guide to MFCC - P. Nair (Medium)
I asked ChatGPT how one should build a project around exploring different music classifiers. Here's what it said:
Data Collection:
- Gather a diverse dataset of audio samples representing different music genres. Datasets such as GTZAN and the Free Music Archive are available through sources like Kaggle.
Feature Extraction:
- Extract MFCC features from the audio samples. Libraries like Librosa (Python) or MIRtoolbox (MATLAB) can help with this. Store the output of this stage locally so extraction only has to run once (see the sketch below).
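A minimal sketch of the caching idea using NumPy's `.npy` format; the file name is hypothetical and the array is a stand-in for a real MFCC matrix:

```python
import numpy as np

# Stand-in for a real MFCC matrix (n_mfcc coefficients x n_frames).
mfccs = np.zeros((13, 1292))

# Cache to disk so later stages can skip audio decoding entirely.
np.save("blues.00000.mfcc.npy", mfccs)          # hypothetical file name
mfccs_reloaded = np.load("blues.00000.mfcc.npy")
```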
Exploratory Data Analysis (EDA):
- Visualize the data to understand the distribution of MFCC features across genres.
Data Preprocessing:
- Normalize the MFCC features to ensure all features contribute equally.
- Split the data into training and testing sets.
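A minimal sketch of this step with scikit-learn, assuming each clip has already been reduced to a fixed-length vector (e.g. per-coefficient MFCC means); the arrays here are random stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 13)        # stand-in: one row of mean MFCCs per clip
y = np.random.randint(0, 10, 100)  # stand-in: genre labels 0-9

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, to avoid leaking test statistics.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```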
Dimensionality Reduction:
- Apply techniques like Principal Component Analysis (PCA) to reduce the dimensionality of MFCCs if the feature space is large.
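As a sketch, scikit-learn's PCA can be told to keep enough components for a target fraction of variance; the arrays below are stand-ins for the scaled training features:

```python
import numpy as np
from sklearn.decomposition import PCA

X_train = np.random.rand(80, 13)  # stand-in for the scaled training features
X_test = np.random.rand(20, 13)

# n_components < 1 is interpreted as the fraction of variance to retain.
pca = PCA(n_components=0.95)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
print(f"Kept {pca.n_components_} of {X_train.shape[1]} dimensions")
```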
Traditional Machine Learning Models:
- Random Forest:
- Random Forest classifiers work well for this task due to their ability to handle high-dimensional data.
- Support Vector Machine (SVM):
- SVM classifiers are effective for classification tasks and can handle complex relationships in the data.
- k-Nearest Neighbors (kNN):
- kNN classifiers are simple yet powerful for this scenario, especially for small to medium-sized datasets.
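A baseline loop over these three classifiers might look like the following sketch; the feature arrays are random stand-ins for the scaled MFCC vectors:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_train = np.random.rand(80, 13); y_train = np.random.randint(0, 10, 80)  # stand-ins
X_test = np.random.rand(20, 13);  y_test = np.random.randint(0, 10, 20)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "SVM (RBF)": SVC(kernel="rbf", gamma="scale"),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and compare held-out accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy {model.score(X_test, y_test):.3f}")
```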
Deep Learning Models (Optional):
- Multilayer Perceptron (MLP):
- MLPs are suitable for this task since they can handle high-dimensional data and learn complex relationships.
- Convolutional Neural Networks (CNNs):
- CNNs are suitable for analyzing spectrogram-like data. You can convert MFCCs into images and use CNN architectures.
- Recurrent Neural Networks (RNNs):
- RNNs can capture sequential patterns in music. You might consider transforming MFCCs into sequences and using RNN architectures.
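As a sketch of the simplest of these, scikit-learn's MLPClassifier works directly on flattened feature vectors; CNNs and RNNs would instead need a deep learning framework (e.g. Keras or PyTorch) and 2-D or sequential inputs. The data here is a random stand-in:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X_train = np.random.rand(80, 13); y_train = np.random.randint(0, 10, 80)  # stand-ins
X_test = np.random.rand(20, 13);  y_test = np.random.randint(0, 10, 20)

# Two hidden layers; max_iter raised because the default often stops too early.
mlp = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500, random_state=42)
mlp.fit(X_train, y_train)
print("MLP test accuracy:", mlp.score(X_test, y_test))
```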
Training:
- Train each selected model on the training data.
Evaluation:
- Evaluate models using metrics like accuracy, precision, recall, and F1-score.
- Perform cross-validation to assess models' generalizability.
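A sketch of both evaluation steps with scikit-learn; the labels and features are random stand-ins for the real split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 13); y = np.random.randint(0, 10, 100)  # stand-ins
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Per-class precision, recall, and F1 on the held-out test set.
print(classification_report(y_test, model.predict(X_test)))

# 5-fold cross-validation accuracy as a generalizability estimate.
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```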
Ensemble Methods:
- Experiment with ensemble methods like Voting Classifier or Stacking Classifier to combine predictions from multiple models.
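For example, a soft-voting ensemble over the three traditional models (stand-in features again):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_train = np.random.rand(80, 13); y_train = np.random.randint(0, 10, 80)  # stand-ins

# Soft voting averages predicted class probabilities across the base models;
# SVC needs probability=True for that to work.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("svm", SVC(kernel="rbf", probability=True)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
```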
Pretrained Models:
- Use pretrained models like VGGish or OpenL3 to extract features from audio samples and train classifiers on top of them.
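A sketch with the openl3 package (the get_audio_embedding call follows the OpenL3 docs, but treat the exact arguments as an assumption; the file path is hypothetical):

```python
import librosa
import numpy as np
import openl3  # pip install openl3

audio, sr = librosa.load("blues.00000.wav", sr=None)  # hypothetical path

# Returns one embedding per analysis frame, plus frame timestamps.
emb, ts = openl3.get_audio_embedding(audio, sr, content_type="music",
                                     embedding_size=512)

# Average over time to get a single fixed-length vector per clip,
# then train any of the classifiers above on these vectors instead of MFCCs.
clip_vector = np.mean(emb, axis=0)
```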
Error Analysis:
- Analyze misclassified samples to understand patterns or common mistakes made by the models.
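A confusion matrix is a natural starting point; the sketch below also collects indices of misclassified clips so you can listen back to them (random stand-in labels again):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_test = np.random.randint(0, 10, 100)  # stand-ins for true and predicted labels
y_pred = np.random.randint(0, 10, 100)

# Which genre pairs get confused most often?
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm).plot()
plt.show()

# Indices of misclassified clips, for listening back to the audio.
misclassified = np.where(y_pred != y_test)[0]
```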
Documentation:
- Document the entire process, including data preprocessing, model selection, hyperparameters, and evaluation results.
Report:
- Prepare a report summarizing findings, challenges faced, and insights gained during the analysis.
Some notes on specific steps to take:
This EDA step reveals how clip durations vary across genres, which matters when choosing model architectures and sequence lengths later in the model-building process.
```python
import os

import librosa
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

dataset_path = "/path/to/your/gtzan_dataset"

def extract_mfcc(file_path, n_mfcc=13, hop_length=512, n_fft=2048):
    audio, sr = librosa.load(file_path, sr=None)
    # Newer librosa versions require keyword arguments here.
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc,
                                 hop_length=hop_length, n_fft=n_fft)
    return mfccs

mfccs_list = []
labels = []

# One subfolder per genre, one audio file per clip.
for genre in os.listdir(dataset_path):
    genre_folder = os.path.join(dataset_path, genre)
    for file in os.listdir(genre_folder):
        file_path = os.path.join(genre_folder, file)
        mfccs = extract_mfcc(file_path)
        # Store MFCCs and genre label
        mfccs_list.append(mfccs)
        labels.append(genre)

# Clips vary slightly in length, so hold one MFCC matrix per clip
# in an object array rather than a regular 2-D array.
mfccs_array = np.array(mfccs_list, dtype=object)
labels_array = np.array(labels)

plt.figure(figsize=(12, 6))
sns.boxplot(x=labels_array, y=[mfcc.shape[1] for mfcc in mfccs_array])
plt.xlabel('Genre')
plt.ylabel('Number of Frames (Time Steps)')
plt.title('Distribution of Number of Frames for Each Genre')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
Checking the distributions of MFCCs between genres is a valuable step in EDA. While individual audio samples within a genre may exhibit considerable variation, aggregating the MFCCs over many samples within each genre can reveal meaningful patterns and differences. To compare the distributions, create violin plots or box plots for each MFCC coefficient across genres; these visualizations show the central tendency, spread, and shape of the MFCC distributions within each genre.
```python
import pandas as pd

# Summarize each clip's MFCCs as a per-coefficient mean and standard deviation.
mfcc_data = []
for i, mfcc in enumerate(mfccs_array):
    genre = labels_array[i]
    for j in range(mfcc.shape[0]):  # number of MFCC coefficients
        mfcc_data.append([genre, j, np.mean(mfcc[j]), np.std(mfcc[j])])

df = pd.DataFrame(mfcc_data,
                  columns=['Genre', 'MFCC_Coefficient', 'Mean', 'Standard_Deviation'])

plt.figure(figsize=(12, 6))
# seaborn's split=True only supports two hue levels, so it is omitted here
# (GTZAN has ten genres).
sns.violinplot(x='MFCC_Coefficient', y='Mean', hue='Genre', data=df, inner='quart')
plt.xlabel('MFCC Coefficient')
plt.ylabel('Mean Value')
plt.title('MFCC Distributions Across Genres')
plt.tight_layout()
plt.show()
```