Text-Clustering

An implementation of Yelp review clustering with Naive Bayes and KNN. Java 8 is required, because Java Streams are used for multithreading.
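As a rough illustration of how Java 8 Streams can spread work across threads, the sketch below counts tokens over a list of reviews with `parallelStream()`. The class `ParallelTermCounter` and its method `countTerms` are hypothetical names made up for this example and are not taken from the repository; the repository may use streams differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the repository.
final class ParallelTermCounter {

    /**
     * Counts how often each lowercase token occurs across all reviews.
     * parallelStream() lets the runtime split the list across worker threads;
     * groupingBy merges the per-thread partial maps into one result.
     */
    static Map<String, Long> countTerms(List<String> reviews) {
        return reviews.parallelStream()
                // Split each review on runs of non-word characters.
                .flatMap(review -> Arrays.stream(review.toLowerCase().split("\\W+")))
                // split() can yield an empty leading token; drop it.
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Calling `ParallelTermCounter.countTerms(reviews)` on a `List<String>` of review texts returns a map from token to count. The lambda has no shared mutable state, which is what makes it safe to run in parallel.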

The code covers the entire pipeline: reading in the files, stemming, eliminating stop words, and storing the documents as sparse matrices or language models, then assigning clusters with Naive Bayes and random projection KNN.
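The preprocessing steps described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the repository's code: `ReviewPreprocessor`, its tiny stop-word list, and the crude suffix-stripping `stem` method (a stand-in for a real stemmer such as Porter's) are all hypothetical. Each review becomes one sparse row that maps a vocabulary column index to a term count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical preprocessing sketch; names and word lists are assumptions.
final class ReviewPreprocessor {

    // Tiny illustrative stop-word list; a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "was", "is", "of", "to"));

    // Maps each stemmed term to a column index, assigned on first sight.
    private final Map<String, Integer> vocabulary = new HashMap<>();

    /** Crude suffix stripper standing in for a real stemming algorithm. */
    static String stem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 4 && token.endsWith("ed")) {
            return token.substring(0, token.length() - 2);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    /** Returns a sparse row (column index to term count) with only non-zero entries. */
    Map<Integer, Integer> toSparseVector(String review) {
        Map<Integer, Integer> row = new TreeMap<>();
        for (String token : review.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens and stop words
            }
            String term = stem(token);
            Integer column = vocabulary.get(term);
            if (column == null) {
                column = vocabulary.size(); // next free column
                vocabulary.put(term, column);
            }
            row.merge(column, 1, Integer::sum);
        }
        return row;
    }

    int vocabularySize() {
        return vocabulary.size();
    }
}
```

Storing only the non-zero counts keeps memory proportional to the number of distinct terms in each review rather than to the full vocabulary, which is the usual motivation for a sparse representation of text.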

The models were compared using cross-validation, which is also implemented here.
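For readers unfamiliar with cross-validation, here is a minimal sketch of a k-fold split. It assumes a plain shuffled k-fold scheme; the README does not say which variant the repository uses, and `KFoldSplitter`, `split`, and `trainingIndices` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical k-fold helper for illustration; not the repository's implementation.
final class KFoldSplitter {

    /** Shuffles indices 0..n-1 with a fixed seed and deals them into k folds of near-equal size. */
    static List<List<Integer>> split(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<List<Integer>>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<Integer>());
        }
        // Deal shuffled indices round-robin so fold sizes differ by at most one.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Training indices for a given held-out fold: every index not in that fold. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int heldOut) {
        List<Integer> train = new ArrayList<Integer>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }
}
```

To compare two models, each one would be trained on `trainingIndices(folds, f)` and scored on `folds.get(f)` for every fold `f`, with the per-fold scores averaged. Using the same seed for both models keeps the comparison on identical splits.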