The notebook is a complete hands-on lab for performing multi-class classification using a real-world dataset related to obesity levels. Here's a structured breakdown of its content:
Objective: demonstrate how to implement multi-class classification strategies in Python using scikit-learn on a labeled dataset about obesity levels.
- File: https://www.kaggle.com/datasets/ezzaldeenesmail/obesitydataset-raw-and-data-sinthetic
- Loaded with: pandas.read_csv()
- Target column: NObeyesdad (which represents obesity categories)
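The loading step might look like the sketch below. It assumes the CSV has been downloaded locally; the inline sample rows and the reduced column set are made-up stand-ins so the snippet runs without the real file. Only the `NObeyesdad` column name comes from the summary above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded Kaggle CSV: a few made-up rows
# with a small subset of columns, so the snippet runs without the real file.
SAMPLE_CSV = io.StringIO(
    "Gender,Age,Height,Weight,NObeyesdad\n"
    "Female,21,1.62,64.0,Normal_Weight\n"
    "Male,27,1.80,87.0,Overweight_Level_I\n"
    "Male,23,1.80,77.0,Normal_Weight\n"
    "Female,30,1.55,98.0,Obesity_Type_I\n"
)

# With the real file, pass its local path instead of SAMPLE_CSV.
data = pd.read_csv(SAMPLE_CSV)

TARGET = "NObeyesdad"  # target column named in the notebook summary

# Inspect the first rows and how many samples fall into each class.
print(data.head())
class_counts = data[TARGET].value_counts()
print(class_counts)
```

Checking the class counts early is useful because a strongly imbalanced target would make plain accuracy a misleading metric later on.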
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.multiclass import OneVsOneClassifier
from sklearn.metrics import accuracy_score
These are the main tools used for:
- Data processing
- Visualization
- Training and evaluating classification models
- Load the dataset into a DataFrame
- Display the first few records (data.head())
- Visualize the target variable distribution: sns.countplot(y='NObeyesdad', data=data)
- Apply One-Hot Encoding for categorical variables
- Use StandardScaler to normalize numerical features
- Split data into training and testing sets
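The preprocessing steps above can be sketched as follows. The toy DataFrame, its column names other than `NObeyesdad`, and the `transform` helper are hypothetical illustrations, not the notebook's actual code. The sketch also splits before fitting the scaler and encoder so they only see training rows; the notebook itself may transform first and split afterwards.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame standing in for the obesity dataset.
data = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Age": [21, 27, 23, 30, 35, 19, 41, 26],
    "Weight": [64.0, 87.0, 77.0, 98.0, 110.0, 50.0, 120.0, 70.0],
    "NObeyesdad": ["Normal", "Overweight", "Normal", "Obese",
                   "Obese", "Normal", "Obese", "Overweight"],
})

target = "NObeyesdad"
categorical_cols = ["Gender"]       # columns to one-hot encode
numeric_cols = ["Age", "Weight"]    # columns to standardize

# Split first so the transformers are fit on training rows only.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    data[categorical_cols + numeric_cols],
    data[target],
    test_size=0.25,
    random_state=42,
)

scaler = StandardScaler().fit(X_train_raw[numeric_cols])
encoder = OneHotEncoder(handle_unknown="ignore").fit(X_train_raw[categorical_cols])


def transform(frame):
    """Scale numeric columns, one-hot encode categorical ones, and join them."""
    scaled = scaler.transform(frame[numeric_cols])
    encoded = encoder.transform(frame[categorical_cols]).toarray()
    return np.hstack([scaled, encoded])


X_train = transform(X_train_raw)
X_test = transform(X_test_raw)
print(X_train.shape, X_test.shape)
```

Setting `handle_unknown="ignore"` keeps the encoder from raising an error if the test split contains a category that never appeared in training.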
- Implement a Logistic Regression model using:
  - One-vs-Rest (OvR) strategy
  - One-vs-One (OvO) strategy
# Example:
model = LogisticRegression()
ovo = OneVsOneClassifier(model)
ovo.fit(X_train, y_train)
- Evaluate model performance using accuracy_score
- Compare results from OvO and OvR classifiers
- Analyze which strategy works better for the dataset
- Show the final accuracy and insights
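A minimal sketch of the OvR versus OvO comparison is shown below. It runs on synthetic four-class data from `make_classification` rather than the obesity dataset, so the printed accuracies say nothing about the notebook's results. Wrapping the model in `OneVsRestClassifier` is an assumption made here for symmetry with the OvO example; the notebook may set up OvR differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class data as a stand-in for the preprocessed obesity features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Same base model, two different multi-class decomposition strategies.
strategies = {
    "OvR": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "OvO": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
}

scores = {}
for name, clf in strategies.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}, "
          f"binary models trained = {len(clf.estimators_)}")
```

With k classes, OvR trains k binary models while OvO trains k(k-1)/2, so for four classes the loop reports 4 and 6 fitted estimators respectively. That cost difference grows quickly with the number of classes and is worth weighing alongside accuracy.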