A machine learning project that classifies crime report narratives into categories using NLTK and scikit-learn.
The goal is to demonstrate natural language processing (NLP) techniques for text preprocessing, feature extraction, and classification.
- Preprocessing of crime report text with:
  - Lowercasing, punctuation removal, and stemming (Snowball Stemmer).
  - Stopword filtering.
  - Word bigram generation for context.
- Two classification models implemented:
  - Linear SVC (Support Vector Classifier)
  - Maximum Entropy (MaxEnt)
- Training and testing pipeline with accuracy evaluation.
- Customizable training and test datasets (CSV format).
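
The features above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual code: the `preprocess` helper, the sample narratives, and the labels are hypothetical, scikit-learn's built-in English stopword list is used so that no NLTK corpus download is needed, and `LogisticRegression` stands in for MaxEnt (multinomial logistic regression is the same model family).

```python
import string

from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

stemmer = SnowballStemmer("english")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords, stem, then add word bigrams."""
    words = text.lower().translate(PUNCT_TABLE).split()
    stems = [stemmer.stem(w) for w in words if w not in ENGLISH_STOP_WORDS]
    # Join adjacent stems so each bigram becomes a single feature token.
    pairs = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return stems + pairs


# Hypothetical in-memory data; a real run would load these from the CSV files.
train_texts = [
    "Suspect broke the window and entered the residence.",
    "Unknown suspect pried open the door and removed property.",
    "Suspect punched the victim in the face during an argument.",
    "Victim was struck with a bottle by the suspect.",
]
train_labels = ["burglary", "burglary", "assault", "assault"]
test_texts = [
    "Suspect entered the residence through a broken window.",
    "Suspect struck the victim during an argument.",
]
test_labels = ["burglary", "assault"]

# Passing a callable as the analyzer makes the vectorizer use preprocess() as-is.
vectorizer = CountVectorizer(analyzer=preprocess)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "LinearSVC": LinearSVC(),
    "MaxEnt (LogisticRegression)": LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

With only a handful of narratives the accuracy figures are not meaningful; the sketch only shows how the preprocessing, feature extraction, and evaluation steps fit together.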
Clone the repository and install dependencies:
git clone https://github.com/Mohataseem89/lapd-classifier.git
cd lapd-classifier
pip install nltk scikit-learn