This project is a complete end-to-end machine learning pipeline for predicting customer churn. It demonstrates how to transform raw customer data into actionable insights and deploy models that can predict whether a customer is likely to leave a company.
This notebook covers:
- Data Cleaning & Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Training Multiple Machine Learning Models
- Evaluation with Accuracy, F1-Score, and ROC-AUC
- Insights & Recommendations

The dataset contains 7043 customers and 21 features, including demographics, subscription details, and payment information.

- Key columns: `gender`, `SeniorCitizen`, `Partner`, `Dependents`, `tenure`, `PhoneService`, `InternetService`, `OnlineSecurity`, `DeviceProtection`, `TechSupport`, `Contract`, `PaperlessBilling`, `PaymentMethod`, `MonthlyCharges`, `TotalCharges`, `Churn` (target column)
- Source: Publicly available customer churn dataset.

Data cleaning and preprocessing:

- Missing values handled (`TotalCharges` column).
- Categorical variables encoded using one-hot encoding.
- Features and target separated:
  - Features: `X` (shape: 7043 x 30)
  - Target: `y` (shape: 7043)
- Train-test split:
  - Training set: 5634 samples
  - Test set: 1409 samples
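
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the ten-row frame is made up, and filling missing `TotalCharges` with 0, using `drop_first=True`, stratifying the split, and the `random_state` value are all assumptions. The 5634/1409 sizes reported above are consistent with a 20% test split, which is what the sketch uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows that mimic a few of the dataset's columns (illustrative only).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 0, 8, 22, 10, 60, 5],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year", "Month-to-month",
                 "Two year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 52.55,
                       99.65, 89.10, 29.75, 105.50, 70.70],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", " ",
                     "820.5", "1949.4", "301.9", "6330.0", "353.5"],
    "Churn": ["No", "No", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes"],
})

# Blank strings cannot be parsed as numbers, so they become NaN and are then
# filled. Filling with 0 is an assumption; dropping the rows is another option.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)

# Separate the target from the features, then one-hot encode categoricals.
y = (df["Churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns="Churn"), drop_first=True)

# 80/20 split; stratifying keeps the churn ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X.shape, len(X_train), len(X_test))
```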
The project includes five different models for comparison:
| Model | Accuracy | F1-score | ROC-AUC |
|---|---|---|---|
| Logistic Regression | 0.8055 | 0.6029 | 0.8420 |
| Gradient Boosting | 0.7942 | 0.5672 | 0.8360 |
| XGBoost | 0.7821 | 0.5621 | 0.8166 |
| Random Forest | 0.7921 | 0.5594 | 0.8259 |
| Decision Tree | 0.7410 | 0.5020 | 0.6610 |
Best model: Logistic Regression, which scores highest on all three metrics (accuracy, F1-score, and ROC-AUC) in the table above.
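
A comparison like the one in the table can be produced with a loop over models. The sketch below is an assumption about how such a comparison might look, not the notebook's code: it uses synthetic data from `make_classification` instead of the churn dataset, only two of the five models, and default hyperparameters, so its scores will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data (not the real dataset).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for accuracy and F1
    y_prob = model.predict_proba(X_test)[:, 1]   # class-1 probability for ROC-AUC
    results[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        "F1-score": f1_score(y_test, y_pred),
        "ROC-AUC": roc_auc_score(y_test, y_prob),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Accuracy and F1-score are computed from predicted labels, while ROC-AUC is computed from predicted probabilities, which is why the loop keeps both.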
The notebook includes the following charts:
- Count plots for categorical features
- Correlation heatmaps
- Distribution plots for numerical features
- Churn rate comparisons across categories
These visualizations help understand patterns in customer churn and provide actionable insights for business strategy.
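
As an illustration of the churn rate comparison, the numbers behind such a chart can be computed with a pandas group-by. The six rows below are made up for the example and say nothing about the real dataset; only the column names `Contract` and `Churn` come from it.

```python
import pandas as pd

# Made-up rows (illustrative only); column names follow the dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Share of churned customers per contract type, highest first.
churn_rate = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)
print(churn_rate)

# With matplotlib installed, churn_rate.plot(kind="bar") draws the comparison.
```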
The project uses the following Python libraries:
- pandas
- numpy
- matplotlib
- seaborn
- plotly
- scikit-learn
- xgboost
- lightgbm
- notebook
- IPython

To run the notebook locally:

- Clone the repository: `git clone <your-repo-url>`
- Navigate to the project folder: `cd customer-churn-prediction`
- Install dependencies: `pip install -r requirements.txt`
- Open the Jupyter Notebook and run cells sequentially: `jupyter notebook`

Key insights:

- Customers on month-to-month contracts are more likely to churn.
- Offering online security and tech support reduces churn.
- High monthly charges increase churn probability.
- Businesses can use this model to target retention strategies for at-risk customers.
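
One way to turn the last point into practice is to rank customers by predicted churn probability and flag those above a cut-off. The sketch below is only an assumption about how that could be done: it trains on synthetic data, and `RISK_THRESHOLD` is a hypothetical value that a business would tune to its own retention budget.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the churn data (not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.73, 0.27], random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X[:400], y[:400])

# Predicted churn probability for 100 customers the model has not seen.
churn_prob = model.predict_proba(X[400:])[:, 1]

RISK_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the notebook

# Indices of flagged customers, sorted from highest to lowest risk.
at_risk = np.flatnonzero(churn_prob >= RISK_THRESHOLD)
at_risk = at_risk[np.argsort(churn_prob[at_risk])[::-1]]

print(f"{len(at_risk)} of {len(churn_prob)} customers flagged for retention offers")
```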
WAQAR ALI — Data Science Student
I am passionate about Machine Learning, Predictive Analytics, and Business-Oriented Data Solutions. This notebook demonstrates a complete end-to-end churn prediction system, designed with professional workflows and interpretable insights.
📊 Kaggle
💼 LinkedIn
🐙 GitHub
📘 Facebook
