This directory contains the gold-standard annotation environment for "role-filler binding" (RFB) data. The most recent iteration of the RFB model uses a different "BIO" annotation structure based in NER, and uses training data generated from "silver standard" LLM outputs.
- To use/view that environment, see the
llm-silver-anno
directory. - To use/view the original gold-standard annotation environment, see this (top-level) directory
- Create a virtual environment
conda create -n annotate python=3.8
- Activate the virtual environment
conda activate annotate
- Install the app requirements
pip install -r requirements.txt
python ocr.py <directory_of_images>
streamlit run main.py <directory_of_images>:<annotation_output_directory>
<directory_of_images>
is the directory containing the images to be annotated and is the only volume that can be mounted to the container image<annotation_output_directory>
is the directory where the annotations will be saved. This directory will always be a subdirectory of the mounted volume<directory_of_images>
Note: Containerization does not currently support the OCR preprocessing step. Running in a container will require the user to run the OCR preprocessing step first
docker build -t annotation_env .
docker run -p 8501:8501 -v <directory_of_images>:/app/images annotation_env <directory_of_images>:<annotation_output_directory>
- For each image, annotate each Key-Value pair in the Annotation container
- Click on the
KEY
orVALUE
buttons with OCR output to copy the OCR output to the corresponding text box. Will append to existing text in the text box with a space in between - After annotating the whole image, there are four options:
- Click
Next Frame
to save your annotations and move to the next image - Click
Continuing Credits
to append your annotations to the previous image and move to the next image. (This will continue to append to the first image in this continuing chain untilNext Frame
is clicked) - Click
Duplicate Frame
to mark the current image as a duplicate of the previous image - Click
Skip Frame
to skip the current image after indicating the reason for skipping- Could be due to an unreadable image or an image with data not applicable to the current task
- Click
- If there is an error in the annotations for a given image, set the index in the
Delete Annotation
container to the index of the annotation to be deleted and clickDelete Key-Value Pair
- OCR output is color coded to indicate the confidence of the OCR output.
- Green indicates high confidence
- Orange indicates medium confidence
- Red indicates low confidence