Implementation of clustering Yelp reviews with Naive Bayes/KNN. Java 8 required (Java Streams used for multithreading).
Files go through entire process from reading in files, stemming, eliminating stopwards, and storing them as sparse matrices or language models to getting clusters based on Naive Bayes and random projection KNN.
The models were compared with cross validation (also implemented).