cjgwang/sms-text-classifier


This project applies natural language processing and machine learning to SMS messages. It is a classifier that distinguishes between spam and ham (non-spam) texts. The goal is to develop a model that can accurately identify and filter unwanted texts.

How It Works

  1. Data Collection and Labeling: We start with a dataset of SMS messages that are already labeled as either spam or ham.

  2. Text Preprocessing: The raw message text is cleaned and converted to word counts with CountVectorizer:

  • Removing punctuation and special characters.
  • Converting all text to lowercase.
  • Removing common, non-informative words (known as "stop words," like "the," "is," "a").
  3. Model Training: We use a Naive Bayes classifier, a probabilistic algorithm that is well-suited to text classification. The model is trained on our SMS data to learn the patterns that differentiate spam from ham.

  4. Model Evaluation: The trained model is tested on a separate set of messages it has never seen before. We measure its accuracy and other metrics to check that it is effective.

  5. Classify Message: We can input any SMS message and the model will classify it as either spam or ham.
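
The steps above can be sketched end to end in a few lines of plain Python. This is a dependency-free illustration of the idea, not the project's actual code: the project uses CountVectorizer, while the sketch below swaps in a hand-written tokenizer and a small multinomial Naive Bayes with add-one smoothing. The stop-word list, the `tokenize` and `NaiveBayes` names, and all example messages are made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "an", "to", "you", "at", "for", "are", "we", "me", "on", "and"}


def tokenize(text):
    """Lowercase the text, drop punctuation and special characters, remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes over word counts."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any training message.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_messages = sum(self.class_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Start from the log prior of the class.
            score = math.log(self.class_counts[label] / total_messages)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Step 1: a toy labeled dataset (invented examples, not the project's data).
train_messages = [
    "WIN a free prize now! Call to claim",
    "Free entry: text WIN to claim your cash prize",
    "URGENT! Your account has won a cash reward, call now",
    "Are we still meeting for lunch today?",
    "Can you call me when you get home",
    "See you at the gym tonight",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Steps 2 and 3: preprocessing happens inside fit(), then the model is trained.
model = NaiveBayes().fit(train_messages, train_labels)

# Step 4: evaluate on held-out messages the model has not seen.
held_out = [
    ("Claim your free cash prize now", "spam"),
    ("Are you home for lunch?", "ham"),
]
correct = sum(model.predict(text) == label for text, label in held_out)
accuracy = correct / len(held_out)
print("accuracy:", accuracy)

# Step 5: classify a new message.
print(model.predict("Claim your free cash prize now"))
```

With a dataset this small the accuracy figure says little. A real evaluation would hold out a much larger share of the labeled messages and would also look at precision and recall for the spam class.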

About

NLP SMS classification using Naive Bayes
