This project evaluated support vector machines and convolutional neural networks as solutions for text categorization, using the following three datasets:
- IMDB movie reviews dataset: This dataset contains movie reviews from the IMDB website. We randomly selected 20K negative and 20K positive reviews to create a 40K-review dataset for evaluating sentiment classification performance.
- Amazon reviews dataset: This dataset consists of product reviews from Amazon. As with the IMDB dataset, we randomly selected 20K negative and 20K positive reviews to create a 40K-review dataset for evaluating sentiment classification performance.
- RCV1: This dataset consists of news stories from Reuters. The stories were originally labeled with 103 topic categories arranged in a hierarchy. We regrouped the dataset into 8 single-label categories for evaluating multi-topic classification performance.
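
The balanced sampling described for the two review datasets could be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the function name `balanced_sample`, the `(text, label)` input format, the 0/1 label encoding (0 = negative, 1 = positive), and the fixed seed are all assumptions made for the example.

```python
import random


def balanced_sample(reviews, per_class=20_000, seed=0):
    """Draw an equal number of negative (0) and positive (1) reviews.

    `reviews` is an iterable of (text, label) pairs. The label encoding
    and the seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)

    # Group the reviews by sentiment label.
    by_label = {0: [], 1: []}
    for text, label in reviews:
        by_label[label].append((text, label))

    sample = []
    for label, items in by_label.items():
        if len(items) < per_class:
            raise ValueError(
                f"label {label} has only {len(items)} reviews, "
                f"need {per_class}"
            )
        # Sample without replacement within each class.
        sample.extend(rng.sample(items, per_class))

    # Shuffle so the two classes are interleaved.
    rng.shuffle(sample)
    return sample
```

With the default `per_class=20_000`, the result holds 40K reviews split evenly between the two classes. Fixing the seed keeps the selection reproducible across runs.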