Machine-Learning-Loan-Default

Loan default prediction using machine learning models (logistic regression, random forest, gradient boosting, neural network, stacked classifier).

Dataset

The Excel file (data_loan_default.xlsx) contains a dataset of 8929 loans issued between January 2007 and September 2017. There are 20 independent variables and one dependent variable. The dataset is imbalanced: about 15% of the loans default.
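
To illustrate why the imbalance matters, the short sketch below works through the arithmetic for a trivial baseline that predicts "no default" for every loan. The counts are derived from the approximate 15% figure above, so they are estimates rather than exact values from the dataset.

```python
# Approximate class counts, assuming roughly 15% of the 8929 loans default.
n_loans = 8929
default_rate = 0.15  # approximate share quoted above, not an exact figure
n_default = round(n_loans * default_rate)
n_non_default = n_loans - n_default

# A baseline that always predicts "no default" is right on every
# non-defaulting loan and wrong on every defaulting one.
baseline_accuracy = n_non_default / n_loans
baseline_recall_default = 0 / n_default  # no default is ever detected

print(f"defaults: {n_default}, non-defaults: {n_non_default}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline recall on defaults: {baseline_recall_default:.3f}")
```

The baseline looks accurate while detecting no defaults at all, which is why accuracy alone is a poor guide on this kind of data.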

Methodology

Five models are compared:

logistic regression,
random forest,
gradient boosting (XGBoost),
neural network,
stacked classifier (with the four models above as base models and a random forest as the meta-model).
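
As a rough sketch of the stacked classifier described above, the example below uses scikit-learn's StackingClassifier on synthetic data. It is not the notebook's code: the data come from make_classification, the hyperparameters are arbitrary, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the loan data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Four base models; the scale-sensitive ones get their own scaler.
base_models = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    # Stand-in for XGBoost (assumption, to keep the sketch dependency-free).
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    )),
]

# The meta-model is trained on cross-validated predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print(y_pred[:10])
```

Training the meta-model on out-of-fold predictions (the cv argument) keeps it from simply learning which base model overfits the training set most.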

First, the data is prepared for analysis in the following steps:

feature engineering,
dealing with missing values,
encoding categorical variables,
splitting the dataset,
feature scaling.
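
The preparation steps above can be sketched with pandas and scikit-learn as follows. Everything specific here is an assumption for illustration: the column names (loan_amount, annual_income, home_ownership, default), the engineered ratio, and the imputation strategies are hypothetical and not taken from the dataset. The sketch also fits the imputers, encoder, and scaler on the training split only, which may differ from the order used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names are illustrative only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "loan_amount": rng.uniform(1_000, 35_000, n),
    "annual_income": rng.uniform(20_000, 150_000, n),
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n),
    "default": rng.binomial(1, 0.15, n),
})
# Inject a few missing values so the imputers have something to do.
df.loc[rng.choice(n, 10, replace=False), "annual_income"] = np.nan
df.loc[rng.choice(n, 10, replace=False), "home_ownership"] = np.nan

# Feature engineering: a hypothetical loan-to-income ratio.
df["loan_to_income"] = df["loan_amount"] / df["annual_income"]

# Split before fitting any transformer to avoid leaking test statistics.
X = df.drop(columns="default")
y = df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

numeric_cols = ["loan_amount", "annual_income", "loan_to_income"]
categorical_cols = ["home_ownership"]

preprocess = ColumnTransformer(
    [
        # Missing values, then scaling, for numeric features.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        # Missing values, then one-hot encoding, for categorical features.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X_train_prep = preprocess.fit_transform(X_train)
X_test_prep = preprocess.transform(X_test)
print(X_train_prep.shape, X_test_prep.shape)
```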

Then, for each model, hyperparameters are tuned by cross-validation, using either grid search or Bayesian optimization, with macro-averaged recall as the optimization metric. Class imbalance is mitigated by balancing the class weights. The models are then fit, and predictions are made on out-of-sample data. For each model, we report several performance metrics as well as an estimate of feature importance.
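
A minimal sketch of this tuning loop for one model, logistic regression, is shown below. It assumes scikit-learn and synthetic data; the parameter grid, the number of folds, and the use of grid search rather than Bayesian optimization are illustrative choices, not the notebook's settings. The scoring string "recall_macro" selects the unweighted mean of per-class recall, and class_weight="balanced" reweights classes inversely to their frequency.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the loan data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Balanced class weights counter the 85/15 imbalance during fitting.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Illustrative grid over the regularization strength only.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    model,
    param_grid,
    scoring="recall_macro",  # macro-averaged recall as the tuning metric
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Evaluate the refit best model on held-out data.
y_pred = search.predict(X_test)
test_macro_recall = recall_score(y_test, y_pred, average="macro")
print(search.best_params_, round(test_macro_recall, 3))
```

Macro recall treats the minority class as equally important as the majority class, so a model that ignores defaults entirely scores only 0.5 however high its accuracy.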

Repository files: README.md, data_loan_default.xlsx, ml_loan_default.ipynb.

Robin-Guilliou/Machine-Learning-Loan-Default
