Reddit Comment Classification App

This project is a Streamlit web application that classifies Reddit comments into three categories: "Veterinarian", "Medical Doctor", or "Others". The classification is performed using a fine-tuned BERT model.

Features

Upload a CSV file containing a column named 'comments'
Preprocess comments by removing duplicates while preserving their original order
Classify each comment into one of three categories
Download the classified results as a CSV file
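
The deduplication step listed above can be illustrated with a minimal sketch. The function name `dedupe_preserving_order` is hypothetical and the app's own preprocessing may work differently (for example with pandas); this only shows one way to drop exact duplicates while keeping the first occurrence of each comment in its original position.

```python
def dedupe_preserving_order(comments):
    """Return comments with exact duplicates removed, keeping first occurrences in order.

    Hypothetical helper for illustration; not taken from the app's source.
    """
    # dict keys keep insertion order (Python 3.7+), so fromkeys retains
    # the first copy of each comment and discards later repeats.
    return list(dict.fromkeys(comments))


if __name__ == "__main__":
    sample = [
        "My dog is limping after surgery",
        "I see patients every day",
        "My dog is limping after surgery",
    ]
    for comment in dedupe_preserving_order(sample):
        print(comment)
```

Note that this treats comments as duplicates only when they match exactly, including whitespace and letter case.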

Setup Instructions

Prerequisites

Python 3

Clone the Repository

Open your terminal or command prompt.
Clone the repository using the following command:
```
git clone <your-repo-url>
```
Navigate to the project directory:
```
cd <your-repo-name>
```

Set Up the Virtual Environment

Create a virtual environment:
```
python3 -m venv myenv
```
Activate the virtual environment:
- On macOS/Linux:
```
source myenv/bin/activate
```
- On Windows:
```
myenv\Scripts\activate
```
Install the required packages:
```
pip install -r requirements.txt
```

Run the Streamlit App

Ensure the virtual environment is activated.
Run the Streamlit app:
```
streamlit run app.py
```
Open your web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).

Usage

Upload a CSV file containing a column named 'comments'.
The app will preprocess and classify each comment.
View the classified results in the app.
Download the classified results as a CSV file.
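
The flow above can be sketched outside Streamlit using only the standard library. This is a rough illustration under stated assumptions: `classify_stub` is a hypothetical placeholder for the fine-tuned BERT model, and the output column name `category` is assumed, since this README does not specify the layout of the downloaded file.

```python
import csv
import io

# The three categories named in this README.
LABELS = ("Veterinarian", "Medical Doctor", "Others")


def classify_stub(comment):
    """Hypothetical stand-in for the fine-tuned BERT model; always returns 'Others'."""
    return "Others"


def classify_csv(csv_text, classify=classify_stub):
    """Read a CSV with a 'comments' column and return CSV text with a label per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reject input that lacks the required column, as the upload step expects it.
    if reader.fieldnames is None or "comments" not in reader.fieldnames:
        raise ValueError("CSV must contain a column named 'comments'")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # 'category' is an assumed output column name.
    writer.writerow(["comments", "category"])
    for row in reader:
        label = classify(row["comments"])
        if label not in LABELS:
            raise ValueError(f"Unexpected label: {label!r}")
        writer.writerow([row["comments"], label])
    return out.getvalue()


if __name__ == "__main__":
    print(classify_csv("comments\nMy cat needs a checkup\nI work night shifts\n"))
```

Passing a different callable as `classify` is how a real model would be plugged into this sketch.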

Error handling on page

If the Streamlit app throws a 'Cannot import BertTokenizer' error, refresh the page so that the frontend reloads.
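
If the error persists after refreshing, it may help to check whether the relevant packages are visible in the active virtual environment. The sketch below assumes that `BertTokenizer` is imported from the Hugging Face `transformers` package, which this README does not state explicitly; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: BertTokenizer is provided by the 'transformers' package.
    for name in ("transformers", "streamlit"):
        status = "found" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

A "missing" result would suggest re-running `pip install -r requirements.txt` with the virtual environment activated.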