Exercise 6

In this exercise, we study text categorization using the 20 newsgroups (20ng) dataset. The dataset contains roughly 20,000 text documents (Usenet messages) in 20 categories (newsgroups, or topics). The RNN and CNN models use pre-trained 100-dimensional GloVe vectors for their embedding layers.
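As a rough illustration of how pre-trained GloVe vectors usually end up in an embedding layer, the sketch below parses lines in the GloVe text format (a word followed by its vector components) and builds an embedding matrix for a vocabulary. The names `parse_glove`, `build_embedding_matrix`, and `word_index` are hypothetical, and the toy 3-dimensional vectors stand in for the 100-dimensional ones; the exercise notebooks may handle this differently.

```python
def parse_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ...") into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero.

    Index 0 is assumed to be reserved for padding (a common convention).
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Toy stand-in for a GloVe file (3 dimensions here, not 100).
toy_lines = ["the 0.1 0.2 0.3", "space 0.4 0.5 0.6"]
toy_vectors = parse_glove(toy_lines)
toy_index = {"the": 1, "space": 2, "hockey": 3}  # hypothetical vocabulary
embedding_matrix = build_embedding_matrix(toy_index, toy_vectors, dim=3)
```

Words missing from the GloVe vocabulary (here "hockey") keep an all-zero row, which is one common choice; random initialization is another.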

Task 1

Try three different approaches for text classification with the 20 newsgroups (20ng) dataset: an RNN, a CNN, and BERT.

Run all three models and compare their accuracies and run times.
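One way to keep the comparison consistent is to run every model through the same small harness that records accuracy and wall-clock time. The following is a minimal sketch under stated assumptions: `benchmark` and the zero-argument `train_and_predict` callables are hypothetical, and the lambdas are placeholders for the real training runs.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def benchmark(models, y_true):
    """Run each model once and record its accuracy and wall-clock time.

    `models` maps a name to a zero-argument callable that trains the model
    and returns predictions for the test set (hypothetical interface).
    """
    results = []
    for name, train_and_predict in models.items():
        start = time.perf_counter()
        y_pred = train_and_predict()
        elapsed = time.perf_counter() - start
        results.append(
            {"model": name, "accuracy": accuracy(y_true, y_pred), "seconds": elapsed}
        )
    return results


# Placeholder callables; replace with the real RNN, CNN and BERT runs.
y_test = [0, 1, 2, 1]
dummy_models = {
    "RNN": lambda: [0, 1, 2, 0],
    "CNN": lambda: [0, 1, 2, 1],
    "BERT": lambda: [0, 1, 1, 1],
}
results = benchmark(dummy_models, y_test)
for row in results:
    print(f"{row['model']:<5} accuracy={row['accuracy']:.2f} time={row['seconds']:.4f}s")
```

Run times are only comparable when all models are trained on the same hardware, so it helps to note whether a GPU was used.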

Task 2

Pick one model (RNN, CNN or BERT) and try to improve the results, e.g., by tweaking the model or the training parameters (optimizer, batch size, number of epochs, etc.).
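Tweaking training parameters can be organized as a small grid search over the settings mentioned above. The sketch below is illustrative only: `grid_search`, `train_and_evaluate`, and the values in `grid` are assumptions rather than part of the exercise, and `fake_train_and_evaluate` returns made-up scores so the example runs without a model.

```python
from itertools import product


def grid_search(train_and_evaluate, grid):
    """Try every combination in `grid` and return (best_config, best_accuracy)."""
    keys = list(grid)
    best_config, best_acc = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        acc = train_and_evaluate(**config)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc


# Hypothetical search space; the values are examples, not recommendations.
grid = {
    "optimizer": ["adam", "rmsprop"],
    "batch_size": [32, 128],
    "epochs": [5, 10],
}


def fake_train_and_evaluate(optimizer, batch_size, epochs):
    """Stand-in that returns a made-up score so the sketch runs without a model."""
    score = 0.5 + 0.01 * epochs
    if optimizer == "adam":
        score += 0.05
    if batch_size == 32:
        score += 0.02
    return score


best_config, best_acc = grid_search(fake_train_and_evaluate, grid)
print(best_config, round(best_acc, 2))
```

In a real run, `train_and_evaluate` would train the chosen model and return its accuracy on a held-out validation split, so that the test set is used only for the final comparison.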

You can also try replacing BERT with another Transformer model (for example, DistilBERT). See also the Hugging Face Transformers documentation.
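With the Transformers `Auto*` classes, swapping models is often just a matter of changing the checkpoint name. The sketch below assumes the `bert-base-uncased` and `distilbert-base-uncased` checkpoints; the exercise notebook may use a different BERT variant. The `CHECKPOINTS` table and the `load_classifier` helper are hypothetical names introduced for illustration.

```python
CHECKPOINTS = {
    "bert": "bert-base-uncased",              # assumed checkpoint
    "distilbert": "distilbert-base-uncased",  # assumed checkpoint
}
NUM_LABELS = 20  # one label per newsgroup


def load_classifier(kind, num_labels=NUM_LABELS):
    """Return (tokenizer, model) for the chosen checkpoint.

    Downloads weights on first use, so it needs network access and the
    `transformers` package with a backend such as PyTorch.
    """
    if kind not in CHECKPOINTS:
        raise ValueError(f"unknown model kind: {kind!r}")
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = CHECKPOINTS[kind]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_classifier("distilbert")` instead of `load_classifier("bert")` would then leave the rest of the fine-tuning loop largely unchanged, although the tokenizer and the model must always come from the same checkpoint.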