This repository contains the code for Automated Assessment of Arabic Essays.
- feature_extraction_wholeEssays.ipynb: Feature extraction script
- classification.ipynb: The baseline models for this project
- ranking.ipynb: The ranking SVM model
- d2v.model: The trained doc2vec model
- augmented_essays.tsv: The augmented essays generated by GPT-4
- words_features.csv: Features of the words in the essays
- documents_features.csv: Features of each document
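
As a quick sanity check after cloning, the data files listed above can be inspected with a few lines of Python. This is only a minimal sketch: it assumes the `.tsv` file is tab-separated, the `.csv` files are comma-separated, all of them are UTF-8 encoded, and each has a header row. None of this is confirmed here, and `load_table` and `summarize` are hypothetical helper names, not part of the notebooks.

```python
import csv
from pathlib import Path


def load_table(path):
    """Read a delimited text file and return (column_names, rows).

    Assumption: .tsv files are tab-separated, everything else is
    comma-separated, and the first line is a header row.
    """
    path = Path(path)
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


def summarize(path):
    """Return the row count and column names, or None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    columns, rows = load_table(path)
    return {"rows": len(rows), "columns": columns}


if __name__ == "__main__":
    # File names are taken from the list above; run from the repository root.
    for name in ("augmented_essays.tsv", "words_features.csv", "documents_features.csv"):
        print(name, summarize(name))
```

The sketch uses only the standard library so it runs without extra dependencies; the notebooks themselves may load these files differently.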
The dataset used in this project is ZAEBUC, which can be downloaded from here.
The Samer Lexicon was used to extract the readability levels of the words. It can be downloaded from here.