This project is a Streamlit web application that classifies Reddit comments into three categories: "Veterinarian", "Medical Doctor", or "Others". Classification is performed by a fine-tuned BERT model.
- Upload a CSV file containing a column named 'comments'
- Preprocess comments by removing duplicates while preserving their original order
- Classify each comment into one of three categories
- Download the classified results as a CSV file
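The preprocessing and label-selection steps listed above can be sketched in plain Python. This sketch assumes comments are deduplicated by exact string match and that the model's three output scores are ordered `Veterinarian`, `Medical Doctor`, `Others`. The fine-tuned model's actual label order is not documented here, and `dedupe_comments` and `logits_to_label` are hypothetical names, not functions from `app.py`.

```python
from typing import Iterable, List, Sequence

# Assumed label order; check it against the fine-tuned model's configuration.
LABELS = ["Veterinarian", "Medical Doctor", "Others"]


def dedupe_comments(comments: Iterable[str]) -> List[str]:
    """Drop repeated comments, keeping the first occurrence of each in its original position."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys() keeps first-seen order.
    return list(dict.fromkeys(comments))


def logits_to_label(logits: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Map one row of classifier scores to a category name by taking the argmax."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} scores, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best_index]


if __name__ == "__main__":
    print(dedupe_comments(["my dog is sick", "see a vet", "my dog is sick"]))
    # prints ['my dog is sick', 'see a vet']
    print(logits_to_label([0.2, 2.1, -0.5]))
    # prints Medical Doctor
```

Taking the argmax of the raw logits gives the same label as taking it after a softmax, because softmax preserves the ordering of the scores. Probabilities are only needed if the app displays confidence values.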
- Python 3
- Open your terminal or command prompt.
- Clone the repository using the following command:
  git clone <your-repo-url>
- Navigate to the project directory:
  cd <your-repo-name>
- Create a virtual environment:
  python3 -m venv myenv
- Activate the virtual environment:
  - On macOS/Linux:
    source myenv/bin/activate
  - On Windows:
    myenv\Scripts\activate
- Install the required packages:
  pip install -r requirements.txt
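Because the app should be installed and launched from the activated environment, a small Python check can confirm that the active interpreter belongs to a virtual environment. This sketch relies on how the standard-library `venv` module behaves: inside a venv, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base Python installation. The function name `in_virtualenv` is hypothetical.

```python
import sys
from typing import Optional


def in_virtualenv(prefix: Optional[str] = None, base_prefix: Optional[str] = None) -> bool:
    """Return True when running inside a venv-created virtual environment.

    The optional arguments only make the check easy to test; by default the
    running interpreter's own values are used.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # venv changes sys.prefix to the environment directory but leaves
    # sys.base_prefix pointing at the installation it was created from.
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run with the environment's `python` after activating `myenv`, it should report `True` and an interpreter path inside `myenv`. If it reports `False`, activate the environment again before running `pip install` or `streamlit run`.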
- Ensure the virtual environment is activated.
- Run the Streamlit app:
  streamlit run app.py
- Open your web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).
- Upload a CSV file containing a column named 'comments'.
- The app will preprocess and classify each comment.
- View the classified results in the app.
- Download the classified results as a CSV file.
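The upload, classify, and download flow above can be sketched with the standard-library `csv` module. This is not the app's code. `classify_csv` is a hypothetical helper, the output column name `category` is an assumption, and the `classify` argument stands in for the fine-tuned BERT model (any function that maps a comment string to a label). Keeping the model behind a plain function lets the CSV handling be tested without loading BERT.

```python
import csv
import io
from typing import Callable


def classify_csv(csv_text: str, classify: Callable[[str], str]) -> str:
    """Read CSV text with a 'comments' column and return CSV text with one predicted category per comment."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "comments" not in reader.fieldnames:
        raise ValueError("CSV must contain a column named 'comments'")

    # Remove duplicate comments while keeping their first-seen order.
    comments = list(dict.fromkeys(row["comments"] for row in reader))

    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["comments", "category"])  # assumed output header
    for comment in comments:
        writer.writerow([comment, classify(comment)])
    return output.getvalue()


if __name__ == "__main__":
    def toy_classifier(text: str) -> str:
        # Placeholder keyword rule standing in for the BERT model.
        return "Veterinarian" if "vet" in text.lower() else "Others"

    sample = "comments\nAsk your vet\nDrink water\nAsk your vet\n"
    print(classify_csv(sample, toy_classifier))
```

In a Streamlit app, the input text could come from `st.file_uploader(...)` via `.getvalue().decode("utf-8")`, and the returned string could be passed as `data` to `st.download_button`. How `app.py` actually wires these together is not shown in this README.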
- If the Streamlit app shows a 'Cannot import BertTokenizer' error, refresh the page to reload the required frontend.
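If that error persists after refreshing, it can help to check whether `BertTokenizer` imports at all from the activated environment, which separates a broken installation from a page-loading problem. A minimal sketch, assuming the app imports `BertTokenizer` from the `transformers` package (the README names the class but not its source); `check_import` is a hypothetical helper.

```python
import importlib
from typing import Optional


def check_import(module_name: str, attr_name: str) -> Optional[str]:
    """Return None if `from module_name import attr_name` works, else a short error description."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attr_name)
    except (ImportError, AttributeError, RuntimeError) as exc:
        # Lazily loaded packages such as transformers may report a failed
        # submodule import as RuntimeError rather than ImportError.
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    problem = check_import("transformers", "BertTokenizer")
    print("BertTokenizer importable" if problem is None else problem)
```

If this reports an error, reinstalling the requirements inside the activated environment is a more likely fix than refreshing the page.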