
Build classifier for typed vs handwritten text #108

Open
funkyvoong opened this issue Jun 6, 2023 · 6 comments
@funkyvoong
Contributor

No description provided.

@funkyvoong
Contributor Author

Herbarium of the Future Paper: https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(22)00295-6
Detecting Handwritten and Printed Text from Doctor's Notes: https://www.proquest.com/docview/2505259735?pq-origsite=gscholar&fromopenview=true

@kabilanmohanraj
Collaborator

kabilanmohanraj commented Jun 27, 2023

Updates (27th June 2023):

Datasets in use:

  • Handwritten
  1. IAM HW
  2. IAM Online-HW
  3. IAM Washington
  • Machine Printed
  1. FUNSD
  2. SROIE2019 (typewriter-looking font)
  3. Synthetic data (generated with different fonts and styles)

Unused for now:

  1. COCO-Text (need to filter usable images strictly)
  2. CVL-HW (need to segment lines of text using CRAFT, working on this now)
  3. CVIT-HW (the text looks too organized, like printed text, and does not closely represent our data)
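Pulling these sources into one training set amounts to mapping each dataset's folder to a binary label. A minimal sketch, assuming a `<root>/<dataset>/*.png` layout (the folder names here are illustrative, not the repo's real structure):

```python
from pathlib import Path

# Dataset -> label map; 1 = handwritten, 0 = machine printed.
# Folder names are assumptions for illustration only.
LABELS = {"iam_hw": 1, "iam_online_hw": 1, "iam_washington": 1,
          "funsd": 0, "sroie2019": 0, "synthetic": 0}

def build_index(root):
    """Collect (image_path, label) pairs from <root>/<dataset>/*.png;
    datasets whose folders are absent are silently skipped."""
    return [(str(p), label)
            for name, label in LABELS.items()
            for p in sorted(Path(root, name).glob("*.png"))]
```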

Work done:

  1. Read through the literature on preprocessing steps for OCR. The aspect ratio of our images plays a major role, so I plotted its distribution and, based on that, wrote dataset-specific resizing and cropping transforms for each dataset we are currently using.
  2. Populated the test set with more images (working to increase the number of samples further).
  3. Testing the DenseNet model as an alternative to VGG16, because VGG16 overfits very quickly.
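The aspect-ratio handling in step 1 could be sketched as a resize-then-pad transform; the target size, padding value, and nearest-neighbour resampling here are illustrative choices, not the actual transforms in the repo:

```python
import numpy as np

def resize_keep_aspect(img, target_h=64, target_w=256, pad_value=255):
    """Resize a grayscale image to fit (target_h, target_w) while keeping
    its aspect ratio, then pad with white background on the right/bottom.
    Nearest-neighbour resampling keeps this sketch dependency-free."""
    h, w = img.shape
    scale = min(target_h / h, target_w / w)
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # Index maps for nearest-neighbour sampling of the source image
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols[None, :]]
    out = np.full((target_h, target_w), pad_value, dtype=img.dtype)
    out[:new_h, :new_w] = resized
    return out
```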

Model Performance:
Our pipeline accuracy currently tops out at 80%. I have identified the images the model misclassifies and am adding more suitable training images to counter this.

I think this is a good direction: the COCO-Text dataset was excluded this time (last week it was part of the training set), yet the model's performance barely dropped, which suggests the per-dataset preprocessing is effective.

Current Tasks:

  1. Add samples to test dataset
  2. Evaluate DenseNet model performance
  3. Preprocess CVL-HW dataset
  4. [Prof. Langdon] Add more samples with varied fonts to Typed text (Text+Font -> PDF -> Images)
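Task 4's Text+Font -> PDF -> Images chain can be approximated in a single step with Pillow for illustration; the font loading and canvas size below are assumptions, not the actual generation script:

```python
from PIL import Image, ImageDraw, ImageFont

def render_typed_sample(text, size=(256, 64), font=None):
    """Render a line of text on a white background, a one-step stand-in
    for the Text+Font -> PDF -> Images pipeline. Pass different TrueType
    fonts via `font` to vary the typed-text styles."""
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    font = font or ImageFont.load_default()
    draw.text((4, 4), text, fill=0, font=font)
    return img
```

Swapping in fonts loaded with `ImageFont.truetype(...)` would give the varied-font samples the task calls for.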

@kabilanmohanraj
Collaborator

kabilanmohanraj commented Jul 1, 2023

Updates (30th June 2023):

  1. Added synthetic font data with different font sizes and styles (LibreOffice UNO API -> ODT file -> PDF file -> JPG image -> CRAFT -> individual images). For sample data, please refer here [files] [images]. Adding the new data increased the accuracy score.
  2. Modifications to the preprocessing pipeline - added erosion (morphological operation)
  3. DenseNet121 model accuracy is over 88% (F1 score > 0.9). Unlocked more layers to fine-tune. Tuning the number of layers to unlock.
  4. Discussion with Freddie:
    a. Focus more on the data
    b. Metrics to classify plant sample images as handwritten or typed (looking into this)
    c. Pointers on DocAI models (Hugging Face, model distillation) (looking into this as well)
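The erosion step added in item 2 is a standard morphological operation (e.g. `cv2.erode` in OpenCV). A minimal dependency-free sketch of binary erosion with a square structuring element, kernel size illustrative:

```python
import numpy as np

def erode(binary, k=3):
    """Binary erosion with a k x k square structuring element: a pixel
    stays foreground (1) only if its entire k x k neighbourhood is
    foreground. Thins strokes and removes speckle noise after
    binarisation."""
    pad = k // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=0)
    out = np.ones_like(binary)
    for dy in range(k):
        for dx in range(k):
            # AND together every shifted view of the neighbourhood
            out &= padded[dy:dy + binary.shape[0], dx:dx + binary.shape[1]]
    return out
```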

@kabilanmohanraj
Collaborator

Updates (3rd July 2023):

  1. Added more test images. Over 100 images were handpicked.
  2. Both DenseNet121 and a custom VGG16-style model (trained from scratch) reach about 88% accuracy, with DenseNet slightly lower.
  3. Tried out some Document AI models hosted on Hugging Face; out of the box they do not identify our labels well. Looking to label some samples to fine-tune such models.
  4. Working on the post-processing pipeline to classify plant samples based on the average confidence scores for each segmentation classification. Scripting is almost done.
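The post-processing step in item 4 reduces per-segment classifier outputs to one label per specimen image. A minimal sketch, assuming each segment yields a P(handwritten) score and a simple mean-plus-threshold rule (the threshold is an assumption):

```python
def classify_sample(segment_scores, threshold=0.5):
    """Classify a whole plant-specimen image from per-segment classifier
    outputs. `segment_scores` is a list of P(handwritten) values, one per
    detected text segment; the sample label is decided by the mean
    confidence across segments."""
    mean_conf = sum(segment_scores) / len(segment_scores)
    label = "handwritten" if mean_conf >= threshold else "typed"
    return label, mean_conf
```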

@kabilanmohanraj
Collaborator

kabilanmohanraj commented Jul 25, 2023

Updates (24th July 2023):

  1. Classification task:
    1.1 Implemented a Transformer classifier exhibiting a classification accuracy of about 96%.
    1.2 Attempted to implement an AdaBoost-type training with a simple neural network. The accuracy was only about 64%.
    1.3 Updated the post-processing (plant sample classification) step with the Transformer-based pipeline from 1.1. The classified images are in their respective folders.
    1.4 Did background reading on Transformers, the attention mechanism, multi-head attention, TrOCR, and the Hugging Face implementation of the TrOCR pipeline (17th-24th July).
    1.5 Commented the code.
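The core of the Transformer classifier in 1.1 is the attention mechanism from the readings in 1.4. A minimal NumPy sketch of single-head scaled dot-product self-attention with a mean-pooled linear head; the weights, dimensions, and pooling choice are illustrative, not the actual model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of
    embeddings X (seq_len x d): softmax(QK^T / sqrt(d)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

def classify(X, Wq, Wk, Wv, w_out):
    """Mean-pool the attended sequence and apply a linear head; the sign
    of the logit picks handwritten (1) vs typed (0)."""
    pooled = self_attention(X, Wq, Wk, Wv).mean(axis=0)
    return int(pooled @ w_out > 0)
```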
