add more "seqtag" annotation for rfb training #88

keighrim · 2024-06-18T18:00:41Z

This issue tracks the effort to add more sequential tagging annotation data to improve RFB model performance.

A few notes:

For the next rounds, I highly recommend setting up a traceable data prep pipeline that links the images (or OCR results) back to their originating videos and timestamps.
Since everyone agrees that erroneous OCR results are the major cause of the model's poor performance, we also need to think about how to improve OCR while adding more of this (silver) seqtag data. As we are adding a new OCR engine (paddleOCR), the first thing we need to know is whether paddle can outperform docTR and thus replace it in the pipeline. (Waiting on "improve OCR evaluation script", aapb-evaluations#52.)
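
To make the traceability note concrete, here is a minimal sketch of what a provenance record could look like. Everything in it is an assumption for illustration, not the existing pipeline: the `FrameRecord` fields, the `<video_id>.<timestamp_ms>.png` naming scheme, and the CSV manifest are all hypothetical. The idea is only that every image (and its OCR output) carries enough information to get back to the source video and timestamp.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class FrameRecord:
    """One extracted frame plus its OCR output (hypothetical schema)."""
    video_id: str      # identifier of the originating video
    timestamp_ms: int  # position of the frame in the video, in milliseconds
    image_file: str    # file name of the extracted frame image
    ocr_engine: str    # which engine produced the text, e.g. "docTR"
    ocr_text: str      # raw OCR output for the frame


def image_name(video_id: str, timestamp_ms: int) -> str:
    """Encode provenance in the file name (assumed naming scheme)."""
    return f"{video_id}.{timestamp_ms:08d}.png"


def parse_image_name(name: str) -> tuple[str, int]:
    """Recover (video_id, timestamp_ms) from a name made by image_name."""
    stem = name.rsplit(".", 1)[0]           # drop the ".png" extension
    video_id, ts = stem.rsplit(".", 1)      # last dotted field is the timestamp
    return video_id, int(ts)


def write_manifest(records, fh) -> None:
    """Write one CSV row per record so annotations can be joined back later."""
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(FrameRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


if __name__ == "__main__":
    name = image_name("example-video-0001", 61500)
    rec = FrameRecord("example-video-0001", 61500, name, "docTR", "JANE DOE")
    buf = io.StringIO()
    write_manifest([rec], buf)
    print(buf.getvalue())
```

Keeping the OCR engine name in each row would also let docTR and paddleOCR outputs for the same frame sit side by side once an evaluation script is available.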

clams-bot added this to infra Jun 18, 2024

github-project-automation bot moved this to Todo in infra Jun 18, 2024
