Regression-Trees-on-TLC-Trip-Record-Data

This notebook builds and evaluates a regression tree that predicts taxi tip amounts (tip_amount) from NYC TLC trip record data. The key steps and findings are summarized below.

Key Steps

Data Preparation:
- Loaded the dataset containing taxi trip information
- Examined correlations between features and the target variable (tip_amount)
- Identified low-correlation features that could potentially be removed
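
The correlation check described above could be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper name `correlation_with_target`, the synthetic data, and the 0.1 cutoff are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "tip_amount") -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in for the trip data (not the real yellow_tripdata.csv).
rng = np.random.default_rng(0)
n = 500
fare = rng.uniform(5, 60, n)
demo = pd.DataFrame({
    "fare_amount": fare,
    "VendorID": rng.integers(1, 3, n),  # unrelated to the tip by construction
    "tip_amount": 0.15 * fare + rng.normal(0, 1, n),
})

ranked = correlation_with_target(demo)
print(ranked)

# Hypothetical cutoff for flagging removal candidates; tune it for real data.
LOW_CORR_THRESHOLD = 0.1
candidates = ranked[ranked.abs() < LOW_CORR_THRESHOLD].index.tolist()
print("Removal candidates:", candidates)
```

Ranking by absolute value keeps strongly negative correlations near the top, since they are as informative as strongly positive ones.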
Model Training:
- Split data into training (70%) and testing (30%) sets
- Built a Decision Tree Regressor with max_depth=8
- Trained the model on the training data
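
A minimal sketch of the split and fit with scikit-learn, using the 70/30 ratio and max_depth=8 stated above. The synthetic arrays and the `random_state` values are assumptions added so the example is self-contained and reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target standing in for the real trip data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(1000, 3))
y = 0.2 * X[:, 0] + rng.normal(0, 1, 1000)

# 70% training, 30% testing; random_state is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# max_depth caps how many successive splits any root-to-leaf path may contain.
tree = DecisionTreeRegressor(max_depth=8, random_state=42)
tree.fit(X_train, y_train)

print("train rows:", X_train.shape[0], "test rows:", X_test.shape[0])
print("fitted depth:", tree.get_depth())
```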
Evaluation:
- Evaluated model performance using MSE (Mean Squared Error) and R² score
- MSE: 1.784
- R²: 0.001 (the model explains about 0.1% of the variance in tip_amount, barely better than always predicting the mean tip)
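
To make the two metrics concrete, here is a small from-scratch sketch of both. The function names `mse` and `r_squared` and the toy tip values are illustrative assumptions; scikit-learn's `mean_squared_error` and `r2_score` compute the same quantities.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


tips = [1.0, 2.0, 3.0, 4.0]  # toy values, not taken from the dataset

# Always predicting the mean tip gives R^2 of exactly 0.
mean_baseline = [2.5, 2.5, 2.5, 2.5]
print(mse(tips, mean_baseline), r_squared(tips, mean_baseline))  # 1.25 0.0

# Predictions worse than the mean baseline push R^2 below 0.
bad_predictions = [4.0, 3.0, 2.0, 1.0]
print(mse(tips, bad_predictions), r_squared(tips, bad_predictions))  # 5.0 -3.0
```

An R² near 0 therefore means the model is roughly equivalent to the constant mean predictor, and a negative value means it is worse.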
Experimentation:
- Tested different max_depth values (4, 12)
- Removed low-correlation features to simplify the model
- Visualized the decision tree structure
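
The depth experiment and the tree inspection could look roughly like the sketch below. The data is synthetic, the two feature names are borrowed from this README for labeling only, and `export_text` is used here as a text-only stand-in for whatever plotting the notebook uses (`sklearn.tree.plot_tree` draws the same structure with matplotlib).

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target depends on the first feature plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 50, size=(2000, 2))
y = 0.2 * X[:, 0] + rng.normal(0, 2, 2000)
feature_names = ["fare_amount", "trip_distance"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Refit the tree at each depth and record the test-set R^2.
scores = {}
for depth in (4, 8, 12):
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    scores[depth] = r2_score(y_test, model.predict(X_test))
    print(f"max_depth={depth:>2}  test R^2={scores[depth]:.3f}")

# A very shallow tree keeps the printed structure readable.
shallow = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_train, y_train)
tree_text = export_text(shallow, feature_names=feature_names)
print(tree_text)
```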

Key Findings

Feature Importance:
- The three features most strongly correlated with tip amount are:
  1. fare_amount (highest correlation)
  2. tolls_amount
  3. trip_distance
Model Performance:
- The initial model (max_depth=8) performed poorly (R² ≈ 0)
- Reducing max_depth to 4 improved performance slightly
- Increasing max_depth to 12 made performance worse: R² turned negative, meaning the model did worse than always predicting the mean tip, which points to overfitting
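
One common way to check for overfitting is to compare R² on the training set with R² on the test set at each depth. The sketch below does this on synthetic, mostly noisy data; the helper `train_and_test_r2`, the data, and the seeds are assumptions for illustration and do not reproduce the notebook's numbers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic target that is mostly noise, so there is little real signal to learn.
rng = np.random.default_rng(7)
X = rng.uniform(0, 50, size=(2000, 3))
y = 0.02 * X[:, 0] + rng.normal(0, 1.3, 2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def train_and_test_r2(max_depth):
    """Fit a tree of the given depth and return (train R^2, test R^2)."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    # For regressors, .score() returns R^2.
    return model.score(X_train, y_train), model.score(X_test, y_test)


gaps = {}
for depth in (4, 12):
    train_r2, test_r2 = train_and_test_r2(depth)
    gaps[depth] = train_r2 - test_r2
    print(f"max_depth={depth:>2}  train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

A large gap between the two scores suggests the deeper tree is memorizing noise in the training rows instead of learning structure that carries over to unseen trips.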
Feature Selection:
- Removing low-correlation features (payment_type, VendorID, etc.) had minimal impact on model performance
- A simplified model using only the top 3 features performed similarly to the full-feature model
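
The full-versus-simplified comparison could be sketched as below. The column names come from this README, but the data is synthetic and the helper `holdout_r2` is a hypothetical name, so the scores only illustrate the procedure and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic trips: the tip depends on three columns; the other two are noise.
rng = np.random.default_rng(3)
n = 3000
trips = pd.DataFrame({
    "fare_amount": rng.uniform(5, 60, n),
    "tolls_amount": rng.uniform(0, 10, n),
    "trip_distance": rng.uniform(0, 20, n),
    "payment_type": rng.integers(1, 5, n),
    "VendorID": rng.integers(1, 3, n),
})
trips["tip_amount"] = (
    0.15 * trips["fare_amount"]
    + 0.3 * trips["tolls_amount"]
    + 0.05 * trips["trip_distance"]
    + rng.normal(0, 1, n)
)

TOP3 = ["fare_amount", "tolls_amount", "trip_distance"]
ALL_FEATURES = [c for c in trips.columns if c != "tip_amount"]


def holdout_r2(columns):
    """Train on the given columns and return R^2 on the 30% hold-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        trips[columns], trips["tip_amount"], test_size=0.3, random_state=42
    )
    model = DecisionTreeRegressor(max_depth=8, random_state=42)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


full_r2 = holdout_r2(ALL_FEATURES)
top3_r2 = holdout_r2(TOP3)
print(f"all features: {full_r2:.3f}   top 3 only: {top3_r2:.3f}")
```

Using the same `random_state` for both splits keeps the train and test rows identical, so any difference in score comes from the feature set alone.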

