This is a regression problem: predicting California housing prices.
The dataset contains 20,640 entries and 10 variables:
- Longitude
- Latitude
- Housing Median Age
- Total Rooms
- Total Bedrooms
- Population
- Households
- Median Income
- Median House Value
- Ocean Proximity
Median House Value is the target variable to be predicted.
The project has two parts. The first covers data analysis and cleaning, documented in `EDA and data cleaning.ipynb`. The second covers training machine learning models, documented in `Training Machine Learning Algorithms.ipynb`.
During exploratory data analysis, I applied the following manipulations to the data:
- Creating new features
- Removing outliers
- Transforming skewed features
- Checking for multicollinearity
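
The four steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the notebook: the column names, the `rooms_per_household` ratio, the 1.5 * IQR outlier rule, the `log1p` transform, and the correlation-matrix check are all assumptions chosen as common examples of each step.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are assumed, not taken from the notebooks.
rng = np.random.default_rng(0)
n = 500
households = rng.integers(50, 1000, size=n)
df = pd.DataFrame({
    "total_rooms": households * rng.uniform(3, 8, size=n),
    "households": households,
    "population": households * rng.lognormal(mean=1.0, sigma=0.4, size=n),
    "median_income": rng.lognormal(mean=1.2, sigma=0.5, size=n),
})

# 1. Create a new feature (a hypothetical ratio feature).
df["rooms_per_household"] = df["total_rooms"] / df["households"]

# 2. Remove outliers with the 1.5 * IQR rule (one common choice).
q1, q3 = df["median_income"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["median_income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = df[mask].copy()

# 3. Transform a long-tailed feature; log1p compresses large values.
cleaned["log_population"] = np.log1p(cleaned["population"])

# 4. Check for multicollinearity via pairwise Pearson correlations.
corr = cleaned[["total_rooms", "households", "population"]].corr()
print(corr.round(2))
```

Pairs of features with a correlation close to 1 (here `total_rooms` and `households`, by construction) are candidates for being dropped or combined before fitting linear models.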
In the second part, I trained several machine learning algorithms, including:
- Linear Regression
- Ridge Regression
- Support Vector Regression
- Gradient Boosting Regression
- Stacking of various models
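
As a rough sketch of how these models might be compared, the example below fits each one with scikit-learn on synthetic regression data and reports the test-set R² score. The data, the hyperparameters (such as `C=100` for the SVR), and the choice of base and final estimators in the stack are assumptions for illustration, not the settings used in the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for the cleaned housing features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVR is scale-sensitive, so it is wrapped in a pipeline with a scaler.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "gbr": GradientBoostingRegressor(random_state=0),
}

# Stacking: base models feed their cross-validated predictions
# into a final estimator (Ridge here, as an assumed choice).
models["stack"] = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("svr", make_pipeline(StandardScaler(), SVR(C=100.0))),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),
)

# Fit every model and record its R^2 score on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

On the real dataset, the categorical Ocean Proximity column would need to be encoded before any of these estimators can be fitted.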