Wrapper for docTR end-to-end text detection and recognition.
The wrapper takes a VideoDocument with SWT TimeFrame annotations. The app specifically uses the representative TimePoint annotations from SWT v4 TimeFrame annotations to extract specific frames for OCR.
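
The frame selection described above can be sketched roughly as follows. This is a minimal illustration that uses plain dictionaries rather than the mmif-python API. The property names `representatives` and `timePoint`, the sample values, and the helper `representative_timepoints` are assumptions made for this example and are not taken from the app's source.

```python
from typing import Dict, List

# Hypothetical, simplified stand-ins for MMIF annotations (plain dicts).
# The "representatives" and "timePoint" keys are assumed for illustration.
annotations: List[Dict] = [
    {"id": "tp_1", "type": "TimePoint", "timePoint": 1000},
    {"id": "tp_2", "type": "TimePoint", "timePoint": 2000},
    {"id": "tp_3", "type": "TimePoint", "timePoint": 3000},
    {
        "id": "tf_1",
        "type": "TimeFrame",
        "targets": ["tp_1", "tp_2", "tp_3"],
        "representatives": ["tp_2"],
    },
]


def representative_timepoints(annotations: List[Dict]) -> List[int]:
    """Collect the time values of each TimeFrame's representative TimePoints."""
    by_id = {ann["id"]: ann for ann in annotations}
    times: List[int] = []
    for ann in annotations:
        if ann["type"] != "TimeFrame":
            continue
        # Only the representatives are kept; other targets are skipped.
        for rep_id in ann.get("representatives", []):
            times.append(by_id[rep_id]["timePoint"])
    return times


frame_times = representative_timepoints(annotations)
print(frame_times)
```

Running OCR only on the representative time points, rather than on every TimePoint a TimeFrame covers, keeps the number of frames passed to the model small.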
From the docTR documentation
The docTR model returns a Document object. Here is the typical Document layout:
Document(
  (pages): [Page(
    dimensions=(340, 600)
    (blocks): [Block(
      (lines): [Line(
        (words): [
          Word(value='No.', confidence=0.91),
          Word(value='RECEIPT', confidence=0.99),
          Word(value='DATE', confidence=0.96),
        ]
      )]
      (artefacts): []
    )]
  )]
)
The docTR wrapper preserves this structured information in the output MMIF by creating lapps Paragraph, Sentence, and Token annotations corresponding to the Block, Line, and Word from the docTR output.
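
The Block/Line/Word to Paragraph/Sentence/Token correspondence can be sketched as a simple nested traversal. The example below is a minimal sketch, not the app's actual code: the dataclasses are stand-ins that mirror only the nesting shown in the layout above (so the snippet runs without docTR installed), and `to_annotations` and the `(type, text)` tuple format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Stand-in classes that mirror only the nesting of the docTR layout.
@dataclass
class Word:
    value: str
    confidence: float


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


def to_annotations(blocks: List[Block]) -> List[Tuple[str, str]]:
    """Flatten Block/Line/Word nesting into (annotation type, text) pairs."""
    out: List[Tuple[str, str]] = []
    for block in blocks:
        # One Paragraph per Block, holding the text of all its lines.
        block_text = " ".join(w.value for ln in block.lines for w in ln.words)
        out.append(("Paragraph", block_text))
        for line in block.lines:
            # One Sentence per Line.
            out.append(("Sentence", " ".join(w.value for w in line.words)))
            for word in line.words:
                # One Token per Word.
                out.append(("Token", word.value))
    return out


block = Block(lines=[Line(words=[
    Word("No.", 0.91), Word("RECEIPT", 0.99), Word("DATE", 0.96),
])])
records = to_annotations([block])
for kind, text in records:
    print(kind, text)
```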
General user instructions for CLAMS apps are available in the CLAMS Apps documentation.
- Requires mmif-python[cv] for the VideoDocument helper functions
- Requires GPU to run at a reasonable speed
For the full list of parameters, please refer to the app metadata from the CLAMS App Directory or the metadata.py file in this repository.