This project is a machine learning-based SMS spam detection system, developed using Python. The system classifies SMS messages as either spam or not spam by leveraging natural language processing (NLP) techniques and a Logistic Regression model. The model is trained on a labeled dataset of SMS messages and uses TF-IDF for feature extraction.
The project demonstrates text processing, feature extraction, and model training. The final model is saved so that it can be deployed to classify new SMS messages in real time.
- Programming Language: Python
- Libraries: Pandas (data processing), NLTK (natural language processing), Scikit-learn (machine learning and model evaluation), Joblib (model serialization)
- NLP Techniques: tokenization, stop-word removal, stemming (or lemmatization), TF-IDF vectorization
- Model: Logistic Regression
The dataset is loaded and cleaned: punctuation is removed, the text is converted to lowercase and tokenized, and stop words are removed.
Optionally, stemming or lemmatization is applied to reduce words to their base forms.
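The cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess` is hypothetical, whitespace splitting stands in for an NLTK tokenizer, and the tiny `STOP_WORDS` set is an assumption standing in for a fuller list such as NLTK's English stop words.

```python
import string

# Tiny illustrative stop-word set (an assumption); a real run would more
# likely use a complete list such as NLTK's English stop words.
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "and", "of"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization, used here as a simple stand-in tokenizer.
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Congratulations! You have WON a free prize."))
# ['congratulations', 'have', 'won', 'free', 'prize']
```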
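As an illustration of the stemming option, the sketch below uses NLTK's `PorterStemmer`. Whether the project uses this particular stemmer is an assumption, and the helper name `stem_tokens` is hypothetical.

```python
from nltk.stem import PorterStemmer

# PorterStemmer is rule-based and needs no extra corpus downloads.
stemmer = PorterStemmer()


def stem_tokens(tokens):
    """Reduce each token to its Porter stem."""
    return [stemmer.stem(tok) for tok in tokens]


print(stem_tokens(["winning", "prizes", "called"]))
# ['win', 'prize', 'call']
```

Lemmatization with NLTK's `WordNetLemmatizer` would be the alternative; it additionally requires the WordNet corpus to be downloaded first.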
TF-IDF vectorization is used to convert text data into numerical features for the model.
The Logistic Regression model is trained on the preprocessed data.
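Training could look roughly like the sketch below. The toy data, the label encoding (1 for spam, 0 for not spam), and `max_iter=1000` are all assumptions made for illustration; the real project trains on its labeled SMS dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (assumed); 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent you have won a lottery",
    "are we still meeting for lunch",
    "i will call you after work",
    "can you send me the report",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# max_iter is raised above the default only to be safe about convergence.
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

print(model.classes_)
```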
Model performance is evaluated using accuracy scores, classification reports, and cross-validation.
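The three evaluation tools mentioned here can be combined as in the following sketch. The toy messages, the 75/25 split, and the 3-fold setting are assumptions for illustration. Wrapping the vectorizer and classifier in a pipeline is one possible design choice: it keeps TF-IDF fitting inside each training fold during cross-validation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Toy data (assumed); 1 = spam, 0 = not spam.
spam = [
    "win a free prize now",
    "claim your free cash reward today",
    "urgent you have won a lottery",
    "free entry to win cash",
    "call now to claim your prize",
    "you won a free holiday reply now",
]
ham = [
    "are we still meeting for lunch",
    "i will call you after work",
    "can you send me the report",
    "see you at home tonight",
    "thanks for the help yesterday",
    "let me know when you arrive",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(
    classification_report(
        y_test,
        y_pred,
        labels=[0, 1],
        target_names=["not spam", "spam"],
        zero_division=0,
    )
)

# 3-fold cross-validation over the full toy dataset.
cv_scores = cross_val_score(pipeline, texts, labels, cv=3)
print("Cross-validation scores:", cv_scores)
```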
The trained model and TF-IDF vectorizer are saved using joblib for future use.
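Saving and reloading both objects with joblib might look like the sketch below. The file names `spam_model.joblib` and `tfidf_vectorizer.joblib` are hypothetical, the toy data is assumed, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data (assumed); 1 = spam, 0 = not spam.
texts = ["win a free prize now", "see you at lunch", "claim free cash", "call me later"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names; the project's real names may differ.
    model_path = os.path.join(tmp_dir, "spam_model.joblib")
    vectorizer_path = os.path.join(tmp_dir, "tfidf_vectorizer.joblib")

    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)

    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# The reloaded pair should reproduce the original predictions.
original = model.predict(vectorizer.transform(texts))
reloaded = loaded_model.predict(loaded_vectorizer.transform(texts))
same = bool((original == reloaded).all())
print("Predictions match after reload:", same)
```

The vectorizer has to be saved together with the model, because new messages must be transformed with the same vocabulary and IDF weights the model was trained on.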
SMS Spam Classification: The system can classify new SMS messages as spam or not spam using the trained Logistic Regression model.
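Classifying a single new message could be wrapped in a small helper, as sketched below. The function name `classify_sms`, the toy training data, and the label encoding (1 for spam) are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def classify_sms(message, model, vectorizer):
    """Return 'spam' or 'not spam' for a single message (1 is assumed to mean spam)."""
    # transform() reuses the fitted vocabulary; it must not be refit here.
    features = vectorizer.transform([message])
    label = model.predict(features)[0]
    return "spam" if label == 1 else "not spam"


# Toy model (assumed) so the example runs on its own.
texts = ["win a free prize now", "see you at lunch", "claim free cash", "call me later"]
labels = [1, 0, 1, 0]
vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

result = classify_sms("free prize waiting for you", model, vectorizer)
print(result)
```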
Model Performance Evaluation: Includes accuracy scores, classification reports, and cross-validation to assess the model's generalization to unseen data.
In a bash shell, run git clone followed by the repository URL.
Ensure that Python and all the libraries used in the project (Pandas, NLTK, Scikit-learn, Joblib) are installed.
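One way to check the installation is a short script like the one below. It is only a sketch: the helper name `missing_packages` is hypothetical, and the list reflects the libraries named in this README (note that Scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the libraries listed in this README.
REQUIRED = ["pandas", "nltk", "sklearn", "joblib"]


def missing_packages(names):
    """Return the names that cannot be found by Python's import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```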
Execute the Python script to train the model or use the pre-trained model for predictions.
[LinkedIn]: arixstoo
[E-mail]: [email protected]
[Discord]: arixstoo