🐧 Penguin Classification using KNN and Naive Bayes

This project classifies penguin species with two popular machine learning algorithms, K-Nearest Neighbours (KNN) and Naive Bayes, and compares them across several evaluation metrics to determine which performs better. The notebook covers data loading, cleaning, and visualisation, followed by model training (with and without cross-validation), hyperparameter tuning, and evaluation. Model performance is assessed using confusion matrices, precision, recall, F1-score, ROC curves, and Area Under the Curve (AUC). The notebook also investigates how Principal Component Analysis (PCA) affects the performance of both models.

📜 Contents

This notebook covers the following steps:

Installation: Installing the necessary R packages. 📦
Data Setup: Loading and preparing the palmerpenguins dataset. 📁
Data Cleaning: Handling missing values and selecting relevant features. ✨
Data Visualisation: Exploring the distribution of key features. 📊
KNN without Cross-Validation: Training and evaluating a basic KNN model. 🤖
Naive Bayes without Cross-Validation: Training and evaluating a basic Naive Bayes model. 🤖
KNN with Cross-Validation: Implementing KNN with 10-fold cross-validation to tune its hyperparameters more reliably. 🔄
Naive Bayes with Cross-Validation: Implementing Naive Bayes with 10-fold cross-validation. 🔄
Principal Component Analysis (PCA): Investigating the effect of dimensionality reduction on model performance using KNN and Naive Bayes. 🔍
Evaluation: Comparing the performance of all models using confusion matrices, precision, recall, F1-score, ROC curves, and AUC. ✅
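
The steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebook's actual code: the four-measurement feature set, the 80/20 split, the seed, and the grid of k values are all assumptions chosen for the example.

```r
library(palmerpenguins)
library(caret)
library(naivebayes)

set.seed(42)  # arbitrary seed, only so this sketch is reproducible

# Assumed feature set: the four numeric body measurements
features <- c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g")
df <- na.omit(as.data.frame(penguins[, c("species", features)]))

# Stratified train/test split (the 80/20 ratio is an assumption)
train_idx <- createDataPartition(df$species, p = 0.8, list = FALSE)
train_df <- df[train_idx, ]
test_df  <- df[-train_idx, ]

# KNN: 10-fold cross-validation over a small, assumed grid of k,
# with centring and scaling because KNN is distance-based
knn_fit <- train(
  species ~ ., data = train_df,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid = data.frame(k = seq(1, 15, by = 2))
)
knn_pred <- predict(knn_fit, newdata = test_df)

# Naive Bayes with default (Gaussian) densities for numeric predictors
nb_fit <- naive_bayes(species ~ ., data = train_df)
nb_pred <- predict(nb_fit, newdata = test_df[, features])

# Selected k and per-model confusion matrices on the held-out set
print(knn_fit$bestTune)
print(confusionMatrix(knn_pred, test_df$species))
print(confusionMatrix(nb_pred, test_df$species))
```

To try the PCA variant in the same framework, caret's `preProcess` argument also accepts `"pca"` alongside `"center"` and `"scale"`; whether the notebook applies PCA this way is not stated here.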

📊 Dataset

This project uses the palmerpenguins dataset. 🐧
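
As a quick orientation, the dataset can be loaded and cleaned along these lines. This is a sketch under assumptions: the choice of the four numeric measurements as features and the use of `na.omit` to drop incomplete rows are illustrative and may differ from the notebook's cleaning step.

```r
library(palmerpenguins)

# Inspect class balance and missing values in the raw data
print(table(penguins$species))
print(colSums(is.na(penguins)))

# Keep the species label plus four numeric measurements (an assumed
# feature set), then drop any row that has a missing value
features <- c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g")
penguins_clean <- na.omit(as.data.frame(penguins[, c("species", features)]))

str(penguins_clean)
```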

Tech and Libraries

This project uses the R programming language and several libraries, including tidyverse, palmerpenguins, caret, class, scales, ggplot2, pROC, naivebayes, and e1071. 📚

Models

The following machine learning models are evaluated:

K-Nearest Neighbours (KNN)
Naive Bayes

🛠️ Installation

To run this notebook, you need R installed. The required packages can be installed from within the R environment using the code in the notebook. ⬇️
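
The notebook's own install cell is not reproduced here. A minimal equivalent, assuming the package list given under Tech and Libraries, might look like this:

```r
# Package list taken from the Tech and Libraries section
required <- c("tidyverse", "palmerpenguins", "caret", "class", "scales",
              "ggplot2", "pROC", "naivebayes", "e1071")

# Install only the packages that are not already present
missing_pkgs <- setdiff(required, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach every package for the current session
invisible(lapply(required, library, character.only = TRUE))
```

Installing only the missing packages keeps repeated runs fast, which helps in hosted environments where the session is reset often.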

📈 Results

The notebook presents the evaluation metrics for each model, both with and without cross-validation and PCA. The confusion matrices, precision, recall, F1-scores, ROC curves, and AUC values provide insights into the performance of KNN and Naive Bayes for penguin species classification on this dataset. 🎯
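
To make the per-class metrics concrete, the sketch below derives precision, recall, and F1-score from a confusion matrix in base R. The function name `per_class_metrics` is hypothetical, and the counts in `toy_cm` are made up purely to exercise the function; they are not results from the notebook.

```r
# cm: square matrix with rows = predicted class, columns = actual class
per_class_metrics <- function(cm) {
  tp <- diag(cm)                  # correct predictions per class
  precision <- tp / rowSums(cm)   # TP / everything predicted as the class
  recall <- tp / colSums(cm)      # TP / everything actually in the class
  f1 <- 2 * precision * recall / (precision + recall)
  data.frame(
    class = rownames(cm),
    precision = unname(precision),
    recall = unname(recall),
    f1 = unname(f1)
  )
}

# Made-up counts for illustration only (not notebook output)
toy_cm <- matrix(
  c(9, 1,  0,
    1, 8,  0,
    0, 1, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("A", "B", "C"),
                  actual    = c("A", "B", "C"))
)

print(per_class_metrics(toy_cm))
```

With three species, these metrics are computed one class at a time (one-vs-rest), which is why each class gets its own row.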

🙏 Acknowledgements

This project utilises the palmerpenguins dataset, generously provided by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. 🙌

How to Run the Notebook

To use this notebook:

Open the notebook in a compatible environment (such as Google Colab with an R kernel). 💻
Run the cells sequentially to follow the data analysis and model training process. ▶️
Examine the outputs and visualisations to understand the data and model performance. 👀

