This project uses Hadoop and Power BI to analyze 600+ loan applications based on income, credit history, employment, and demographics. The project uncovers key approval factors such as credit history and marital status while highlighting disparities by gender and region. Interactive dashboards reveal patterns across loan amounts and applicant profiles.

jigyasaG18/Loan-Data-Analysis-Using-Big-Data

💳 Loan Application & Financial Risk Analysis

This project provides a big data-driven analysis of loan applications, approvals, and financial risk patterns. By combining the Hadoop ecosystem for data processing with Power BI for interactive visualization, the study evaluates how demographic, economic, and geographic factors influence loan approvals across 614 applications.


🛠️ Project Setup and Tools

Tools & Technologies Used

  • Cloudera Hadoop QuickStart VM: Hosting the Hadoop ecosystem and HDFS.
  • HDFS (Hadoop Distributed File System): For storing and managing structured loan application data.
  • Apache Hive: For SQL-like querying and schema design on the ingested dataset.
  • Hive LLAP: For seamless integration between Hive and Power BI.
  • Power BI Desktop: For visualizing approval patterns, demographic risk, and financial behaviors.

📊 Dataset Overview

  • Format: CSV
  • Size: ~10 MB
  • Records: 614 loan applications
  • Key Features:
    • Applicant & Co-applicant Income
    • Loan Amount & Term
    • Education, Gender, Marital Status
    • Credit History
    • Employment Type
    • Property Area
    • Loan Status (Approved / Rejected)
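The project itself loads this data through HDFS and Hive. As a rough illustration of the record layout only, the sketch below parses a CSV with these features in plain Python. The column names (`Gender`, `Married`, `Self_Employed`, `Credit_History`, and so on), the `Y`/`N` encoding of loan status, and the two sample rows are assumptions made for the example, not details confirmed by this repository.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoanApplication:
    gender: str
    married: str
    education: str
    self_employed: str
    applicant_income: float
    coapplicant_income: float
    loan_amount: Optional[float]    # None when the field is blank
    loan_term: Optional[float]      # None when the field is blank
    credit_history: Optional[int]   # assumed 1 = good history, 0 = not
    property_area: str
    approved: bool


def _to_float(value: str) -> Optional[float]:
    """Convert a CSV field to float, treating blanks as missing."""
    value = value.strip()
    return float(value) if value else None


def parse_applications(csv_text: str) -> List[LoanApplication]:
    """Parse CSV text with the assumed column names into typed records."""
    rows = []
    for r in csv.DictReader(io.StringIO(csv_text)):
        credit = _to_float(r["Credit_History"])
        rows.append(LoanApplication(
            gender=r["Gender"].strip(),
            married=r["Married"].strip(),
            education=r["Education"].strip(),
            self_employed=r["Self_Employed"].strip(),
            applicant_income=float(r["ApplicantIncome"]),
            coapplicant_income=float(r["CoapplicantIncome"]),
            loan_amount=_to_float(r["LoanAmount"]),
            loan_term=_to_float(r["Loan_Amount_Term"]),
            credit_history=None if credit is None else int(credit),
            property_area=r["Property_Area"].strip(),
            approved=r["Loan_Status"].strip().upper() == "Y",
        ))
    return rows


# Illustrative rows only; these are not taken from the project's dataset.
SAMPLE = (
    "Gender,Married,Education,Self_Employed,ApplicantIncome,"
    "CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,"
    "Property_Area,Loan_Status\n"
    "Male,Yes,Graduate,No,4583,1508,128,360,1,Rural,N\n"
    "Female,No,Graduate,Yes,3000,0,,360,,Urban,Y\n"
)

applications = parse_applications(SAMPLE)
print(len(applications), "applications parsed")
```

Blank numeric fields are kept as `None` rather than coerced to zero, so that later averages and approval rates are not skewed by missing values.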

🔍 Dashboard Highlights & Key Insights

Dashboard 1: Loan Application & Approval Overview

  • Total Applications: 614
  • Approval Rate: 69% (422 approvals)
  • Rejection Rate: 31%
  • Average Loan Amount: $145.72K
  • Key Insights:
    • Credit history was the dominant approval factor, with a strong correlation between good credit history and approval.
    • Male and married applicants had higher approval chances.
    • Semi-urban regions showed the highest approval volumes.
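The headline figures above follow from simple ratios: 422 approvals out of 614 applications is about 68.7%, which rounds to the reported 69%. The sketch below reproduces that arithmetic and shows one way a per-group approval rate could be computed. The dashboards themselves were built in Power BI; the record keys (`credit_history`, `approved`) and the toy records are hypothetical.

```python
from collections import defaultdict


def approval_rate(approved: int, total: int) -> float:
    """Approval rate as a percentage."""
    return 100.0 * approved / total


def approval_rate_by(records, key):
    """Approval rate (percent) for each distinct value of `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for rec in records:
        bucket = counts[rec[key]]
        if rec["approved"]:
            bucket[0] += 1
        bucket[1] += 1
    return {group: approval_rate(a, t) for group, (a, t) in counts.items()}


# Figures reported in the dashboard overview.
overall = approval_rate(422, 614)
print(f"Approval rate: {overall:.1f}%")         # Approval rate: 68.7%
print(f"Rejection rate: {100 - overall:.1f}%")  # Rejection rate: 31.3%

# Toy records for illustration only; not the project's data.
toy = [
    {"credit_history": 1, "approved": True},
    {"credit_history": 1, "approved": True},
    {"credit_history": 1, "approved": False},
    {"credit_history": 0, "approved": False},
]
print(approval_rate_by(toy, "credit_history"))
```

The same helper can be pointed at any categorical field, such as gender, marital status, or property area, to compare groups side by side.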

Dashboard 2: Financial and Demographic Analysis

  • Applicants with 3+ dependents requested higher loan amounts.
  • Self-employed individuals faced lower approval odds despite similar incomes.
  • Semi-urban property areas led in both approval volume and total loan amount.
  • Married applicants had higher approval rates and larger loan amounts than single applicants.
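Comparisons like these come down to averaging loan amounts within each segment. A minimal sketch of that grouping is shown below, assuming hypothetical record keys (`dependents`, `loan_amount`) and made-up values; it is not the Power BI logic used for the dashboard.

```python
from collections import defaultdict
from statistics import mean


def mean_loan_amount_by(records, key):
    """Mean requested loan amount per group, skipping missing amounts."""
    groups = defaultdict(list)
    for rec in records:
        if rec["loan_amount"] is not None:
            groups[rec[key]].append(rec["loan_amount"])
    return {group: mean(values) for group, values in groups.items()}


# Toy records for illustration only; not the project's data.
toy_requests = [
    {"dependents": "0", "loan_amount": 100.0},
    {"dependents": "0", "loan_amount": 120.0},
    {"dependents": "3+", "loan_amount": 150.0},
    {"dependents": "3+", "loan_amount": 170.0},
    {"dependents": "3+", "loan_amount": None},  # missing value is ignored
]

print(mean_loan_amount_by(toy_requests, "dependents"))
```

Passing a different key, for example marital status or property area, would give the corresponding segment averages.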

🧠 Strategic Risk Insights

| Risk Factor | Observation | Recommended Response |
|---|---|---|
| Credit History | 77% of approvals were linked to good credit | Promote credit education & building tools |
| Self-Employment | Approval rate significantly lower | Enhance financial profiling and support docs |
| Gender Disparity | Male applicants approved more often | Audit and enforce gender-neutral assessments |
| Region Dependence | Semi-urban areas dominate | Diversify into rural and urban zones |
| Household Size | More dependents = higher loan asks | Stricter income vetting for large families |

📝 Key Recommendations

  1. Deploy real-time risk scoring using machine learning for smarter approval decisions.
  2. Launch mobile-first loan application tools targeting rural and underserved areas.
  3. Automate rejection notifications with personalized improvement tips.
  4. Promote joint-loan products to improve approval odds for families.
  5. Partner with fintechs for alternative scoring and financial literacy, especially for self-employed applicants.

🎯 Conclusion

This end-to-end analytical solution demonstrates how integrating Big Data (Hadoop + Hive) with BI tools (Power BI) can uncover powerful insights in loan processing and risk management. The analysis not only reveals approval trends but also pinpoints demographic and behavioral patterns that can guide smarter, fairer, and more profitable lending strategies.

Future enhancements include:

  • Integrating Apache Spark for real-time loan scoring.
  • Building predictive models for default probability.
  • Developing mobile workflows for inclusive digital lending.
