Heart-Disease-Classification

This Google Colab notebook predicts the presence of cardiovascular disease from patient examination results in the Kaggle Cardiovascular Disease dataset (70,000 records). The dataset is split by gender, and K-Modes clustering is applied to further divide it into 4 clusters. This approach improved accuracy by almost 10% compared to previous models.
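The gender-based split described above can be sketched with pandas. The column name `gender` and its integer codes follow the public Kaggle cardio dataset's conventions and are assumptions here, not details taken from this notebook; the tiny frame below is illustrative, not real patient data.

```python
import pandas as pd

def split_by_gender(df: pd.DataFrame) -> dict:
    """Split the frame into one sub-frame per gender code.

    Assumes a 'gender' column coded 1/2, as in the public
    Kaggle cardio dataset (an assumption, not confirmed here).
    """
    return {g: sub.reset_index(drop=True) for g, sub in df.groupby("gender")}

# Tiny illustrative frame (not real patient data)
df = pd.DataFrame({
    "gender": [1, 2, 1, 2, 2],
    "age":    [50, 55, 61, 48, 59],
    "cardio": [0, 1, 1, 0, 1],
})
parts = split_by_gender(df)
```

Each resulting sub-frame can then be clustered and modelled separately, which is what lets gender-specific patterns improve accuracy.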

The algorithms used in this project are XGBoost, Random Forest, and a Multi-Layer Perceptron. The highest accuracy achieved was 87.28%, with the Multi-Layer Perceptron model.
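A minimal sketch of the train-and-compare step, using Scikit-Learn's Random Forest and Multi-Layer Perceptron on synthetic data standing in for the cardio features. XGBoost is omitted here to keep the sketch dependency-light, and none of the hyperparameters below are taken from the notebook; they are placeholder assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the tabular cardio features
X, y = make_classification(n_samples=600, n_features=11, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)  # fit on the training split
    scores[name] = accuracy_score(y_te, model.predict(X_te))
```

In the notebook the same loop runs per gender/cluster subset, and the best-scoring model is kept.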

To run this notebook, open it in Google Colab and follow the instructions. Make sure to upload the dataset to the notebook environment before running it.

FEATURES

• Gender-based dataset splitting: the dataset is split by gender for better accuracy.

• K-Modes clustering: K-Modes clustering further divides the dataset into 4 clusters, which improves accuracy.

• Multiple algorithms: the project uses XGBoost, Random Forest, and Multi-Layer Perceptron classifiers.

• High accuracy: the highest accuracy achieved was 87.28%, with the Multi-Layer Perceptron model.
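K-Modes is k-means adapted to categorical data: distance is the count of mismatching columns (Hamming distance) and each cluster centre is the per-column mode of its members. A minimal self-contained sketch, assuming features are coded as non-negative integers (the notebook itself likely uses a library implementation such as the `kmodes` package):

```python
import numpy as np

def k_modes(X, k=4, n_iter=10, seed=0):
    """Minimal K-Modes sketch for non-negative integer-coded categories."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    # Initialise centres as k distinct random rows
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each row to the centre with the fewest mismatching columns
        dist = (X[:, None, :] != centres[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update each centre to the column-wise mode of its cluster
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centres[j] = [np.bincount(col).argmax() for col in members.T]
    return labels, centres

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(60, 5))  # 60 rows, 5 categorical columns coded 0-2
labels, centres = k_modes(X, k=4)
```

Splitting the data into such clusters before training lets each model specialise on a more homogeneous subgroup, which is the source of the accuracy gain claimed above.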

We hope this project helps with the early detection and prevention of cardiovascular disease.

Notebook contains:

  • Exploratory data analysis (EDA) - the process of going through a dataset and finding out more about it.
  • Model training - create model(s) to learn to predict a target variable based on other variables.
  • Model evaluation - evaluating a model's predictions using problem-specific evaluation metrics.
  • Model comparison - comparing several different models to find the best one.
  • Model fine-tuning - once we've found a good model, how can we improve it?
  • Feature importance - since we're predicting the presence of heart disease, are there some things which are more important for prediction?
  • Cross-validation - if we do build a good model, can we be sure it will work on unseen data?
  • Reporting what we've found - if we had to present our work, what would we show someone?
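The cross-validation step in the list above can be sketched with Scikit-Learn's `cross_val_score`, again on synthetic stand-in data; the fold count and estimator settings are placeholder assumptions, not values from the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=11, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
# 5-fold cross-validation: five accuracy scores, one per held-out fold
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
```

Agreement across the folds (a small spread in `cv_scores`) is what gives confidence that the model will hold up on unseen data.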

To work through these topics, we'll use pandas, Matplotlib, and NumPy for data analysis, as well as Scikit-Learn for machine learning and modelling tasks.
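For the feature-importance step, Scikit-Learn's tree ensembles expose a `feature_importances_` attribute that pandas makes easy to rank. The column names below echo the public cardio dataset and are hypothetical labels for illustration, fitted here on synthetic data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names echoing the cardio dataset's columns
cols = ["age", "height", "weight", "ap_hi", "ap_lo", "cholesterol", "gluc"]
X, y = make_classification(n_samples=400, n_features=len(cols), random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Impurity-based importances, normalised to sum to 1, sorted descending
importances = pd.Series(clf.feature_importances_, index=cols)
importances = importances.sort_values(ascending=False)
```

Plotting `importances` as a bar chart with Matplotlib is a common way to report which examination results drive the prediction.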
