Text-Categorization-with-SVM-CNN

This project evaluates support vector machines (SVMs) and convolutional neural networks (CNNs) as approaches to text categorization.

Datasets

IMDB movie reviews dataset: This dataset contains movie reviews taken from the IMDB website. We randomly selected 20K negative reviews and 20K positive reviews to create a 40K-review dataset for evaluating sentiment classification performance.
Amazon reviews dataset: This dataset consists of product reviews from Amazon. As with the IMDB dataset, we randomly selected 20K negative reviews and 20K positive reviews to create a 40K-review dataset for evaluating sentiment classification performance.
RCV1: This dataset consists of news stories from Reuters. The stories were originally labeled with 103 topic categories arranged in a hierarchy. We regrouped the dataset into 8 single-label categories for evaluating multi-topic classification performance.
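
The dataset preparation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `balanced_sample` and `regroup_single_label`, the `(text, label)` input layout, and the `topic_to_group` mapping are all hypothetical. In particular, dropping stories that fall into more than one coarse group is only one possible way to obtain single-label categories, and is an assumption here.

```python
import random
from collections import defaultdict


def balanced_sample(reviews, per_class, seed=0):
    """Randomly pick `per_class` reviews for each label.

    `reviews` is a list of (text, label) pairs (assumed layout).
    Returns a shuffled list with the same number of reviews per label.
    """
    by_label = defaultdict(list)
    for text, label in reviews:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for label, items in sorted(by_label.items()):
        if len(items) < per_class:
            raise ValueError(f"label {label!r} has only {len(items)} reviews")
        sample.extend(rng.sample(items, per_class))  # without replacement
    rng.shuffle(sample)
    return sample


def regroup_single_label(doc_topics, topic_to_group):
    """Map fine-grained topic codes to coarse groups.

    `doc_topics` maps a document id to its list of topic codes and
    `topic_to_group` maps a topic code to a coarse group (hypothetical
    mapping). Only documents that land in exactly one group are kept.
    """
    regrouped = {}
    for doc_id, topics in doc_topics.items():
        groups = {topic_to_group[t] for t in topics if t in topic_to_group}
        if len(groups) == 1:
            regrouped[doc_id] = groups.pop()
    return regrouped
```

With `per_class=20000`, `balanced_sample` would yield a 40K-review set of the kind described for the IMDB and Amazon data, provided each class has at least 20K reviews available.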
