Commit 185e8c3

Merge pull request RK1905101#101 from harshaparida/harshaparida'sContri
ML project (Python)
2 parents e27109c + 9a2dc85 commit 185e8c3

3 files changed: +1018 -0 lines changed
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Titanic Survival Prediction

This project aims to predict the survival of passengers aboard the Titanic using a Logistic Regression model. The model is trained on a dataset of passenger information and can predict whether a passenger would survive based on user-provided input features.

## Project Structure

- `train.csv`: The dataset containing information about Titanic passengers.
- `titanic_survival_prediction.py`: The main Python script that preprocesses the data, trains the model, and predicts survival based on user input.

## Requirements

- Python 3.x
- numpy
- pandas
- scikit-learn

## Setup

1. Ensure you have Python 3.x installed on your system.
2. Install the necessary Python packages using pip:
   ```sh
   pip install numpy pandas scikit-learn
   ```

## Usage

1. Place the `train.csv` file in the same directory as `titanic_survival_prediction.py`.
2. Run the `titanic_survival_prediction.py` script:
   ```sh
   python titanic_survival_prediction.py
   ```
3. Follow the prompts to enter passenger details:
   - Passenger class (enter `1`, `2`, or `3`; the script parses this as an integer)
   - Gender (Male/Female; any answer other than "female" is treated as male)
   - Age
   - Number of siblings or spouses aboard
   - Number of parents or children aboard
   - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
   - Fare

4. The script prints whether the passenger is predicted to survive, followed by the model's accuracy on the held-out test set.

## Script Details

### Data Preprocessing

The following preprocessing steps are applied to the dataset:
- Drop the `Cabin` column due to a large number of missing values.
- Fill missing `Age` values with the mean age.
- Fill missing `Embarked` values with the mode.
- Fill missing `Fare` values with the mean fare.
- Convert the categorical variables to numerical values: `Sex` (male = 0, female = 1) and `Embarked` (S = 0, C = 1, Q = 2).

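The preprocessing steps above can be sketched on a tiny in-memory table. This is a minimal illustration, not the script itself: the `toy` rows are made-up values standing in for `train.csv`, and the `preprocess` helper is a hypothetical name introduced here.

```python
import pandas as pd

# Hypothetical toy rows standing in for train.csv; the values are made up.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 30.0],
    "Embarked": ["S", None, "S"],
    "Fare": [7.25, 71.28, None],
    "Cabin": [None, "C85", None],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed preprocessing steps and return a new DataFrame."""
    out = df.drop(columns="Cabin")  # drop() returns a new frame
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out["Fare"] = out["Fare"].fillna(out["Fare"].mean())
    # Same integer codes as the script uses for training
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return out


clean = preprocess(toy)
print(clean)
```

Assigning each filled column back (rather than filling in place on a column selection) keeps the sketch free of chained-assignment warnings in recent pandas versions.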
### Model Training

- The features are obtained by dropping the identifier columns (`PassengerId`, `Name`, `Ticket`) and the target column (`Survived`) from the preprocessed dataset.
- The target variable is `Survived`.
- The dataset is split into training and testing sets (80-20 split, with `random_state=42` so the split is reproducible).
- A Logistic Regression model is trained on the training set.

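The split-and-fit steps above can be tried without `train.csv`. The sketch below is an illustration under stated assumptions: the feature table is synthetic (random values with a made-up labelling rule), and `max_iter=1000` is a choice made here to give the solver room to converge, not a setting taken from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features; not real passenger data.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),  # 1, 2, or 3
    "Sex": rng.integers(0, 2, size=n),     # 0 = male, 1 = female
    "Age": rng.uniform(1, 80, size=n),
})
# Made-up labelling rule, for illustration only
y = ((X["Sex"] == 1) | (X["Pclass"] == 1)).astype(int)

# 80/20 split with a fixed seed so the result is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(len(X_train), len(X_test))
print("Accuracy on held-out rows:", accuracy)
```

The accuracy printed here says nothing about the real dataset; it only shows that the score is computed on rows the model never saw during fitting.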
### Prediction Function

- Prompts the user for passenger details.
- Converts the answers into a one-row DataFrame with the same columns and numeric encodings as the training features.
- Predicts survival based on user input.
- Displays whether the passenger is predicted to survive or not.
- Prints the model's accuracy on the test set.

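The conversion step is where the prompt answers must line up with the training encodings. A minimal sketch of that step follows, assuming the answers arrive as raw strings; `encode_passenger`, `SEX_CODES`, and `EMBARKED_CODES` are hypothetical names introduced here. Unlike a fall-back-to-default approach, this variant rejects unrecognized answers, which is a design choice of the sketch.

```python
# Integer codes matching the preprocessing step (male=0, female=1; S=0, C=1, Q=2)
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}


def encode_passenger(pclass, sex, age, sibsp, parch, embarked, fare):
    """Turn raw prompt answers (strings) into a dict of numeric features."""
    sex_key = sex.strip().lower()
    port_key = embarked.strip().upper()
    if sex_key not in SEX_CODES:
        raise ValueError(f"unrecognized gender: {sex!r}")
    if port_key not in EMBARKED_CODES:
        raise ValueError(f"unrecognized port: {embarked!r}")
    return {
        "Pclass": int(pclass),
        "Sex": SEX_CODES[sex_key],
        "Age": float(age),
        "SibSp": int(sibsp),
        "Parch": int(parch),
        "Embarked": EMBARKED_CODES[port_key],
        "Fare": float(fare),
    }


row = encode_passenger("3", "Male", "22", "1", "0", "s", "7.25")
print(row)
```

The resulting dict can be wrapped in a one-row DataFrame whose columns follow the training features, so the model sees values in the order it was fitted on.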
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
dataset = pd.read_csv('train.csv')

# Data preprocessing
# Drop Cabin because it has a large number of missing values
dataset = dataset.drop(columns='Cabin')
# Assign the filled columns back instead of calling fillna(inplace=True) on a
# column selection, which recent pandas versions warn about
dataset['Age'] = dataset['Age'].fillna(dataset['Age'].mean())
dataset['Embarked'] = dataset['Embarked'].fillna(dataset['Embarked'].mode()[0])
dataset['Fare'] = dataset['Fare'].fillna(dataset['Fare'].mean())
# Encode the categorical columns as integers
dataset['Sex'] = dataset['Sex'].map({'male': 0, 'female': 1})
dataset['Embarked'] = dataset['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})

# Define features (X) and target (y)
X = dataset.drop(columns=['PassengerId', 'Name', 'Ticket', 'Survived'])
y = dataset['Survived']

# Split the dataset into training and testing sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model; max_iter is raised from the default of 100 to give the
# solver room to converge on these unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Function to get user input and make predictions
def predict_survival():
    user_input = {}
    user_input['Pclass'] = int(input("Enter passenger class (1, 2, or 3): "))
    # Any answer other than "female" is encoded as male (0)
    user_input['Sex'] = 1 if input("Enter passenger gender (Male/Female): ").strip().lower() == 'female' else 0
    user_input['Age'] = float(input("Enter passenger age: "))
    user_input['SibSp'] = int(input("Enter number of siblings or spouses aboard: "))
    user_input['Parch'] = int(input("Enter number of parents or children aboard: "))
    port = input("Enter port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton): ")
    # Use the same codes as in preprocessing (S=0, C=1, Q=2);
    # unrecognized input falls back to Southampton (0)
    user_input['Embarked'] = {'S': 0, 'C': 1, 'Q': 2}.get(port.strip().upper(), 0)
    user_input['Fare'] = float(input("Enter passenger fare: "))

    # One-row frame with the columns in the order the model was trained on
    user_df = pd.DataFrame([user_input], columns=X.columns)

    prediction = model.predict(user_df)

    if prediction[0] == 1:
        print("The passenger is predicted to survive.")
    else:
        print("The passenger is predicted not to survive.")

    # Report accuracy on the held-out test set
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Model Accuracy:", accuracy)


predict_survival()
