cjgwang/sms-text-classifier: NLP SMS classification using Naive Bayes

This project applies natural language processing and machine learning to build a classifier that distinguishes spam from ham (non-spam) text messages. The goal is a model that accurately identifies and filters out unwanted texts.

How It Works

Data Collection and Labeling: We start with a dataset of SMS messages that are already labeled as either spam or ham.
Text Preprocessing: The raw message text is cleaned and converted into word-count features with CountVectorizer:

Removing punctuation and special characters.
Converting all text to lowercase.
Removing common, non-informative words (known as "stop words," like "the," "is," "a").

Model Training: We use a Naive Bayes classifier, a probabilistic algorithm well suited to text classification. The model is trained on the SMS data to learn which words are more likely to appear in spam than in ham.
Model Evaluation: The trained model is tested on a separate set of messages it has never seen. We measure its accuracy and other metrics to check that it is effective.
Classify Message: We can input any SMS message and the model will classify it as either spam or ham.
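The steps above can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not the project's project.py: instead of scikit-learn's CountVectorizer it hand-rolls roughly the same preprocessing (lowercasing, dropping punctuation, removing stop words), and it implements multinomial Naive Bayes directly (the model scikit-learn provides as MultinomialNB). A message is assigned the label c that maximizes log P(c) + Σ log P(word | c), with add-one smoothing so unseen word/label pairs do not zero out the score. The stop-word list, the TRAIN/TEST messages, and the names tokenize, NaiveBayesSpamFilter, and accuracy are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; scikit-learn's built-in
# English list (stop_words="english") is far longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "i", "in", "is", "it",
              "of", "on", "the", "to", "we", "you", "your"}


def tokenize(text):
    """Lowercase, strip punctuation/special characters, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in training, in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # log P(class): fraction of training messages carrying that label.
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """log P(c) plus the sum of log P(word | c) over known words in the message."""
        denom = self.total_words[c] + len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.word_counts[c][w] + 1) / denom)
            for w in tokenize(text)
            if w in self.vocab  # words never seen in training are ignored
        )

    def predict(self, text):
        """Return the label with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(text, c))


def accuracy(model, messages, labels):
    """Fraction of held-out messages whose predicted label matches the true one."""
    correct = sum(model.predict(m) == y for m, y in zip(messages, labels))
    return correct / len(labels)


# Toy, made-up messages standing in for the labeled spam.csv data.
TRAIN = [
    ("WINNER! Claim your free prize now, call today", "spam"),
    ("Free entry to win a cash prize, text WIN now", "spam"),
    ("URGENT! You have won a free holiday, claim now", "spam"),
    ("Are we still meeting for lunch tomorrow?", "ham"),
    ("I'll call you when I get home tonight", "ham"),
    ("Can you pick up some milk on the way back?", "ham"),
]
TEST = [
    ("Claim your free cash prize now", "spam"),
    ("See you at lunch tomorrow", "ham"),
]

model = NaiveBayesSpamFilter().fit([m for m, _ in TRAIN], [y for _, y in TRAIN])
print("held-out accuracy:", accuracy(model, [m for m, _ in TEST], [y for _, y in TEST]))
print(model.predict("Call now to claim your free prize"))  # → spam
```

With real data, the labeled messages would come from spam.csv and be split into training and test sets; in scikit-learn the same pipeline is typically expressed as CountVectorizer(stop_words="english") feeding a MultinomialNB classifier. Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.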

Files

.DS_Store
README.md
project.py
requirements.txt
spam.csv
test_project.py