Commit c5bd580

Show EAST annotations in OCR view & updated documentation
1 parent 418c204 commit c5bd580

6 files changed, +42 -32 lines changed

Dockerfile → Containerfile

File renamed without changes.

README.md

+20 -11
@@ -1,19 +1,28 @@
 # The MMIF Visualization Server

-This application creates an HTML server that visualizes annotation components in a [MMIF](https://mmif.clams.ai) file. Supported annotations are:
+This application creates an HTML server that visualizes annotation components in a [MMIF](https://mmif.clams.ai) file. It contains the following visualizations for any valid MMIF:

-- Video or Audio file player with HTML5.
-- [WebVTT](https://www.w3.org/TR/webvtt1/) for showing alignments.
+- Video or Audio file player with HTML5 (assuming file refers to video and/or audio document).
 - Pretty-printed MMIF contents.
-- Javascript for bounding boxes.
-- Named entity annotations with [displaCy.](https://explosion.ai/demos/displacy-ent)
+- Interactive, searchable MMIF tree view with [JSTree](https://www.jstree.com/).
+- Embedded [Universal Viewer](https://universalviewer.io/) (assuming file refers to video and/or image document).
+
+
+The application also includes tailored visualizations depending on the annotations present in the input MMIF:
+| Visualization | Supported CLAMS apps |
+|---|---|
+| [WebVTT](https://www.w3.org/TR/webvtt1/) for showing alignments of video captions. | [Whisper](https://github.com/clamsproject/app-whisper-wrapper), [Kaldi](https://github.com/clamsproject/app-aapb-pua-kaldi-wrapper) |
+| Javascript bounding boxes for image and OCR annotations. | [Tesseract](https://github.com/clamsproject/app-tesseractocr-wrapper), [EAST](https://github.com/clamsproject/app-east-textdetection) |
+| Named entity annotations with [displaCy.](https://explosion.ai/demos/displacy-ent) | [SPACY](https://github.com/clamsproject/app-spacy-wrapper) | |
+
+

 Requirements:

 - A command line interface.
 - Git (to get the code).
-- [Docker](https://www.docker.com/) (if you run the visualizer using Docker).
-- Python 3.6 or later (if you want to run the server without Docker).
+- [Docker](https://www.docker.com/) or [Podman](https://podman.io/) (if you run the visualizer in a container).
+- Python 3.6 or later (if you want to run the server containerless).

 To get this code if you don't already have it:

@@ -23,12 +32,12 @@ $ git clone https://github.com/clamsproject/mmif-visualizer



-## Running the server in a Docker container
+## Running the server in a container

-Download or clone this repository and build an image using the `Dockerfile` (you may use another name for the -t parameter, for this example we use `clams-mmif-visualizer` throughout).
+Download or clone this repository and build an image using the `Dockerfile` (you may use another name for the -t parameter, for this example we use `clams-mmif-visualizer` throughout). **NOTE**: if using podman, just substitute `docker` for `podman` in the following commands.

 ```bash
-$ docker build -t clams-mmif-visualizer .
+$ docker build . -f Containerfile -t clams-mmif-visualizer
 ```

 In these notes we assume that the data are in a local directory named `/Users/Shared/archive` with sub directories `audio`, `image`, `text` and `video` (those subdirectories are standard in CLAMS, but the parent directory could be any directory depending on your local set up). We can now run a Docker container with
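
The NOTE in this hunk amounts to using `podman` wherever the commands say `docker`. As a rough illustration only, the sketch below builds the same `build` invocation for either engine. The helper names `build_command` and `pick_engine` are hypothetical and not part of the repository; the tag and `Containerfile` name are taken from the README text above.

```python
import shutil


def build_command(engine, tag="clams-mmif-visualizer", containerfile="Containerfile"):
    """Return the argument list for building the image with the given engine."""
    return [engine, "build", ".", "-f", containerfile, "-t", tag]


def pick_engine(candidates=("docker", "podman")):
    """Return the first container engine found on PATH, or None if neither is installed."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


if __name__ == "__main__":
    engine = pick_engine()
    if engine is None:
        print("Neither docker nor podman was found on PATH")
    else:
        # Print the command instead of running it, so the sketch has no side effects.
        print(" ".join(build_command(engine)))
```

Only the first element of the argument list differs between the two engines, which is why the README can document a single set of commands.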
@@ -56,7 +65,7 @@ With this, the mounted directory `/data` in the container is accessable from ins



-## Running the server without Docker
+## Running the server without Docker/Podman

 First install the python dependencies listed in `requirements.txt`:

app.py

-2
@@ -19,13 +19,11 @@ def index():
 def ocrpage():
     data = request.form
     try:
-        # print(html.unescape(data['frames_pages']))
         frames_pages = eval(html.unescape(data['frames_pages']))
         page_number = int(data['page_number'])

         return (render_ocr(data['vid_path'], frames_pages, page_number))
     except Exception as e:
-        print(html.unescape(data['frames_pages']))
         return f'<p class="error">Unexpected error of type {type(e)}: {e}</h1>'
         pass
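
The unchanged context lines in this hunk show `ocrpage` rebuilding `frames_pages` by calling `eval` on a form field, which executes whatever expression the client submits. A minimal sketch of a stricter parse follows, assuming the field only ever carries plain Python literals (lists, tuples, dicts, strings, numbers, `None`, booleans). `parse_frames_pages` is a hypothetical helper, not something this commit adds, and the sample payload is invented for the demonstration.

```python
import ast
import html


def parse_frames_pages(raw):
    """Parse an HTML-escaped Python literal without executing arbitrary code."""
    return ast.literal_eval(html.unescape(raw))


# Invented sample in the same general shape as a paginated frame list.
sample = [[(10, {"boxes": [], "text": ["hi"], "secs": None, "repeat": False})]]
escaped = html.escape(repr(sample))
pages = parse_frames_pages(escaped)

# An expression that is not a literal is rejected rather than evaluated.
try:
    parse_frames_pages("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

`ast.literal_eval` accepts only literal syntax, so a function call in the payload raises `ValueError` instead of running.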

ocr.py

+10 -10
@@ -6,8 +6,8 @@
 from flask import render_template


-def add_bounding_box(anno, frames):
-    frame_num = anno.properties["frame"]
+def add_bounding_box(anno, frames, fps):
+    frame_num = anno.properties.get("frame") or anno.properties.get("timePoint")
     box_id = anno.properties["id"]
     boxType = anno.properties["boxType"]
     coordinates = anno.properties["coordinates"]
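
One detail of the new `frame_num` lookup: because `0` is falsy in Python, the `or` fallback also fires when `"frame"` is present with the value 0. The sketch below shows an explicit `None` check that keeps a legitimate frame 0. `get_frame_number` is a hypothetical helper and the property dictionaries are made up for illustration.

```python
def get_frame_number(properties):
    """Prefer "frame", fall back to "timePoint", and keep a legitimate 0."""
    frame = properties.get("frame")
    return frame if frame is not None else properties.get("timePoint")


# Made-up annotation properties for demonstration.
first_frame = {"frame": 0, "timePoint": 99}
time_point_only = {"timePoint": 42}

with_or = first_frame.get("frame") or first_frame.get("timePoint")
with_none_check = get_frame_number(first_frame)
```

With these inputs the `or` form yields the `timePoint` value for the first frame, while the `None` check returns 0.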
@@ -21,17 +21,18 @@ def add_bounding_box(anno, frames):
         frames[frame_num]["bb_ids"].append(box_id)
     else:
         frames[frame_num] = {"boxes": [box], "text": [], "bb_ids": [box_id], "timestamp": None, "secs": None, "repeat": False}
+        if fps:
+            secs = int(frame_num/fps)
+            frames[frame_num]["timestamp"] = str(datetime.timedelta(seconds=secs))
+            frames[frame_num]["secs"] = secs
+
     return frames


-def align_annotations(frames_list, alignments, text_docs, fps):
+def align_annotations(frames_list, alignments, text_docs):
     """Link alignments with frames"""
     prev_frame = None
     for frame_num, frame in frames_list:
-        if fps:
-            secs = int(frame_num/fps)
-            frame["timestamp"] = str(datetime.timedelta(seconds=secs))
-            frame["secs"] = secs
         for box_id in frame["bb_ids"]:
             text_id = alignments[box_id]
             frame["text"].append(text_docs[text_id])
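
This hunk moves the timestamp computation from `align_annotations` into `add_bounding_box`, so frames receive a timestamp even when no alignment step runs. The arithmetic can be sketched in isolation as follows. `frame_to_timestamp` is a hypothetical helper, and treating a falsy `fps` (such as `0.0`) as "unknown" is an assumption that mirrors the `if fps:` guard in the diff.

```python
import datetime


def frame_to_timestamp(frame_num, fps):
    """Return (secs, "H:MM:SS") for a frame index, or (None, None) when fps is unusable."""
    if not fps:
        return None, None
    secs = int(frame_num / fps)  # truncate to whole seconds, as the diff does
    return secs, str(datetime.timedelta(seconds=secs))


secs, stamp = frame_to_timestamp(450, 30.0)
```

For frame 450 at 30 fps this gives 15 seconds and the string `0:00:15`, which is the form the OCR template displays next to each frame.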
@@ -98,9 +99,8 @@ def round_boxes(boxes):
 def get_ocr_views(mmif):
     """Return OCR views, which have TextDocument, BoundingBox, and Alignment annotations"""
     views = []
-    needed_types = ["TextDocument", "BoundingBox", "Alignment"]
+    ocr_apps = ["east-textdetection", "tesseract"]
     for view in mmif.views:
-        annotation_types = [str(url).split("/")[-1] for url in view.metadata.contains.keys()]
-        if needed_types == annotation_types:
+        if any([view.metadata.app.find(ocr_app) for ocr_app in ocr_apps]):
             views.append(view)
     return views
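
A note on the new view filter: `str.find` returns `-1` when the substring is absent and `0` when it matches at the very start. Since `-1` is truthy, `any([... .find(...) ...])` is true for essentially every app identifier, including ones that name no OCR app. The sketch below contrasts that with a plain `in` test. `is_ocr_app` is a hypothetical helper and the two identifiers are made up for the demonstration.

```python
OCR_APPS = ("east-textdetection", "tesseract")

# Made-up app identifiers, used only to exercise the check.
SPACY_URI = "http://apps.example.org/spacy-wrapper/v1"
EAST_URI = "http://apps.example.org/east-textdetection/v1"


def is_ocr_app(app_uri, ocr_apps=OCR_APPS):
    """True when any known OCR app name occurs in the view's app identifier."""
    return any(name in app_uri for name in ocr_apps)


# The find-based form from the diff, applied to a non-OCR identifier.
find_based = any([SPACY_URI.find(name) for name in OCR_APPS])
in_based = is_ocr_app(SPACY_URI)
```

Here the `find`-based check accepts the non-OCR identifier, while the `in`-based check rejects it and still accepts the EAST one.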

templates/ocr.html

+6 -4
@@ -13,10 +13,12 @@
 <h4>
 frame: {{frame_num}}<br>
 timestamp: <a class="timestamp" onclick="SetCurTime('{{secs}}')">{{frame["timestamp"]}}</a><br>
-text detected:<br>
-{% for text in frame["text"] %}
-&emsp;{{text}}<br>
-{% endfor %}
+{% if frame["text"] %}
+text detected:<br>
+{% for text in frame["text"] %}
+&emsp;{{text}}<br>
+{% endfor %}
+{% endif %}
 </h4>
 </div>
 </div>

utils.py

+6 -5
@@ -357,10 +357,13 @@ def get_properties(annotation):
 def prepare_ocr_visualization(mmif, view):
     """ Visualize OCR by extracting image frames with BoundingBoxes from video"""
     frames, text_docs, alignments = {}, {}, {}
+    vid_path = get_video_path(mmif)
+    cv2_vid = cv2.VideoCapture(vid_path)
+    fps = cv2_vid.get(cv2.CAP_PROP_FPS)
     for anno in view.annotations:
         try:
             if anno.at_type.shortname == "BoundingBox":
-                frames = add_bounding_box(anno, frames)
+                frames = add_bounding_box(anno, frames, fps)

             elif anno.at_type.shortname == "TextDocument":
                 t = anno.properties["text_value"]
@@ -379,10 +382,8 @@ def prepare_ocr_visualization(mmif, view):
             pass

     # Generate pages (necessary to reduce IO cost) and render
-    vid_path = get_video_path(mmif)
-    cv2_vid = cv2.VideoCapture(vid_path)
-    fps = cv2_vid.get(cv2.CAP_PROP_FPS)
     frames_list = [(k, v) for k, v in frames.items()]
-    frames_list = align_annotations(frames_list, alignments, text_docs, fps)
+    if any(at_type.shortname == "Alignment" for at_type in view.metadata.contains):
+        frames_list = align_annotations(frames_list, alignments, text_docs)
     frames_pages = paginate(frames_list)
     return render_ocr(vid_path, frames_pages, 0)
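
The second hunk makes alignment conditional: `align_annotations` only runs when the view declares an `Alignment` type, so a view with bounding boxes alone can still be paginated and rendered. A minimal sketch of that guard follows, using stand-in objects rather than the real `mmif` classes. `has_annotation_type` is a hypothetical helper, and the two example views are assumptions about what a boxes-only view and a full OCR view might declare.

```python
from types import SimpleNamespace


def has_annotation_type(contains, shortname):
    """True when any declared annotation type has the given short name."""
    return any(at_type.shortname == shortname for at_type in contains)


# Stand-ins for the keys of view.metadata.contains (not the mmif API).
boxes_only = [SimpleNamespace(shortname="BoundingBox")]
full_ocr = [
    SimpleNamespace(shortname=name)
    for name in ("TextDocument", "BoundingBox", "Alignment")
]

run_alignment_for_boxes_only = has_annotation_type(boxes_only, "Alignment")
run_alignment_for_full_ocr = has_annotation_type(full_ocr, "Alignment")
```

Skipping alignment for the boxes-only case avoids looking up box ids in an empty `alignments` dictionary.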
