Commit 10eb0e2

Merge pull request #32 from clamsproject/31-symlink-getrid
dynamic symlinking
2 parents bbacf1a + 2015e6d commit 10eb0e2

36 files changed: +140 -154 lines changed

.dockerignore

-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,3 @@
1-
static/tmp*
21
*~
32
__pycache__
43
.git

.gitignore

-3
@@ -71,9 +71,6 @@ gdrive_shared*/
7171
tags
7272
.tags
7373

74-
# static archival files
75-
static/tmp*
76-
7774
# VSCode
7875
.devcontainer
7976
devcontainer.json

README.md

+49-72
@@ -7,7 +7,6 @@ This application creates an HTML server that visualizes annotation components in
77
- Interactive, searchable MMIF tree view with [JSTree](https://www.jstree.com/).
88
- Embedded [Universal Viewer](https://universalviewer.io/) (assuming file refers to video and/or image document).
99

10-
1110
The application also includes tailored visualizations depending on the annotations present in the input MMIF:
1211
| Visualization | Supported CLAMS apps |
1312
|---|---|
@@ -16,9 +15,7 @@ The application also includes tailored visualizations depending on the annotatio
1615
| Named entity annotations with [displaCy.](https://explosion.ai/demos/displacy-ent) | [SPACY](https://github.com/clamsproject/app-spacy-wrapper) | |
1716
| Screenshots & HTML5 video navigation of TimeFrames | [Chyron text recognition](https://github.com/clamsproject/app-chyron-text-recognition), [Slate detection](https://github.com/clamsproject/app-slatedetection), [Bars detection](https://github.com/clamsproject/app-barsdetection) |
1817

19-
20-
21-
Requirements:
18+
## Requirements:
2219

2320
- A command line interface.
2421
- Git (to get the code).
@@ -31,7 +28,9 @@ To get this code if you don't already have it:
3128
$ git clone https://github.com/clamsproject/mmif-visualizer
3229
```
3330

34-
## Quick start
31+
## Startup
32+
33+
### Quick start
3534

3635
If you just want to get the server up and running quickly, the repository contains a shell script `start_visualizer.sh` to immediately launch the visualizer in a container. You can invoke it with the following command:
3736

@@ -42,72 +41,78 @@ If you just want to get the server up and running quickly, the repository contai
4241
* The **required** `data_directory` argument should be the absolute or relative path of the media files on your machine which the MMIF files reference.
4342
* The **optional** `mount_directory` argument should be specified if your MMIF files point to a different directory than where your media files are stored on the host machine. For example, if your video, audio, and text data is stored locally at `/home/archive` but your MMIF files refer to `/data/...`, you should set this variable to `/data`. (If this variable is not set, the mount directory will default to the data directory)
4443

45-
For example, if your media files are stored at `/llc_data` and your MMIF files specify the document location as `"location": "file:///data/...`, you can start the visualizer with the following command:
44+
For example, if your media files are stored at `/my_data` and your MMIF files specify the document location as `"location": "file:///data/...`, you can start the visualizer with the following command:
4645
```
47-
./start_visualizer.sh /llc_data /data
46+
./start_visualizer.sh /my_data /data
4847
```
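The relationship between the two script arguments can be illustrated with a small Python sketch. It is not part of `start_visualizer.sh`: the function name `host_path_for` and its parameters are hypothetical, and it only models the path arithmetic described above (a `location` under the mount directory corresponds to the same relative path under the data directory).

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def host_path_for(location: str, data_directory: str,
                  mount_directory: Optional[str] = None) -> str:
    """Map a MMIF document location to the matching path on the host.

    Hypothetical helper for illustration only.
    """
    # If no mount directory is given, it defaults to the data directory.
    mount = PurePosixPath(mount_directory or data_directory)
    # "file:///data/video/a.mp4" -> "/data/video/a.mp4"
    doc_path = PurePosixPath(urlparse(location).path)
    # Raises ValueError when the location is not under the mount directory.
    relative = doc_path.relative_to(mount)
    return str(PurePosixPath(data_directory) / relative)
```

With the values from the example command and a made-up file name, `host_path_for("file:///data/video/a.mp4", "/my_data", "/data")` gives `/my_data/video/a.mp4`.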
4948

50-
The server can then be accessed at `http://localhost:5000/upload`
51-
52-
## Running the server in a container
53-
54-
Download or clone this repository and build an image using the `Dockerfile` (you may use another name for the -t parameter, for this example we use `clams-mmif-visualizer` throughout). **NOTE**: if using podman, just substitute `docker` for `podman` in the following commands.
49+
The server can then be accessed at `http://localhost:5001/upload`
5550

56-
```bash
57-
$ docker build . -f Containerfile -t clams-mmif-visualizer
58-
```
51+
The following is a breakdown of the script's functionality:
5952

60-
In these notes we assume that the data are in a local directory named `/Users/Shared/archive` with sub directories `audio`, `image`, `text` and `video` (those subdirectories are standard in CLAMS, but the parent directory could be any directory depending on your local set up). We can now run a Docker container with
61-
62-
```bash
63-
$ docker run --rm -d -p 5000:5000 -v /Users/Shared/archive:/data clams-mmif-visualizer
64-
```
53+
### Running the server natively
6554

66-
See the *Data source repository and input MMIF file* section below for a description of the MMIF file. Assuming you have not made any changes to the directory structure you can use the example MMIF files in the `input` folder.
67-
68-
**Some background**
69-
70-
With the docker command above we do two things of note:
55+
First install the python dependencies listed in `requirements.txt`:
7156

72-
1. The container port 5000 (the default for a Flask server) is exposed to the same port on your Docker host (your local computer) with the `-p` option.
73-
2. The local data repository `/Users/Shared/archive` is mounted to `/data` on the container with the `-v` option.
57+
````bash
58+
$ pip install -r requirements.txt
59+
````
7460

75-
Another useful piece of information is that the Flask server on the Docker container has no direct access to `/data` since it can only see data in the `static` directory of this repository. Therefore we have created a symbolic link `static/data` that links to `/data`:
61+
You will also need to install opencv-python if you are not running within a container (`pip install opencv-python`).
62+
Then, to run the server do:
7663

7764
```bash
78-
$ ln -s /data static/data
65+
$ python app.py
7966
```
8067

81-
With this, the mounted directory `/data` in the container is accessable from inside the `/app/static` directory of the container. You do not need to use this command unless you change your set up because the symbolic link is part of this repository.
82-
68+
Running the server natively means that the source media file paths in the target MMIF file are all accessible in the local file system, under the same directory paths.
69+
If that's not the case, and the paths in the MMIF file are beyond your file system permissions, using a container is recommended. See the next section for an example.
8370

71+
#### Data source repository and example MMIF file
72+
This repository contains an example MMIF file in `example/whisper-spacy.json`. This file refers to three media files:
8473

85-
## Running the server locally
74+
1. service-mbrs-ntscrm-01181182.mp4
75+
2. service-mbrs-ntscrm-01181182.wav
76+
3. service-mbrs-ntscrm-01181182.txt
77+
78+
> [!NOTE]
79+
> Note on source/copyright: these documents are sourced from [the National Screening Room collection in the Library of Congress Online Catalog](https://hdl.loc.gov/loc.mbrsmi/ntscrm.01181182). The collection provides the following copyright information:
80+
> > The Library of Congress is not aware of any U.S. copyright or other restrictions in the vast majority of motion pictures in these collections. Absent any such restrictions, these materials are free to use and reuse.
8681
87-
First install the python dependencies listed in `requirements.txt`:
82+
These files can be found in the directory `example/example-documents`. But according to the `whisper-spacy.json` MMIF file, those three files should be found in their respective subdirectories in `/data`.
83+
An easy way to align these paths is to create a symbolic link to the `example-documents` directory in the `/data` directory.
84+
However, since `/data` is located at the root directory, you might not have permission to write a new symlink to the FS root.
85+
In this case you can more easily re-map the `example/example-documents` directory to `/data` by using the `-v` option in the docker-run command. See below.
8886

89-
````bash
90-
$ pip install -r requirements.txt
91-
````
87+
### Running the server in a container
9288

93-
You will also need to install opencv-python if you are not running within a container (`pip install opencv-python`).
89+
Download or clone this repository and build an image using the `Containerfile` (you may use another name for the -t parameter,
90+
for this example we use `clams-mmif-visualizer` throughout).
9491

95-
Let's again assume that the data are in a local directory `/Users/Shared/archive` with sub directories `audio`, `image`, `text` and`video`. You need to copy, symlink, or mount that local directory into the `static` directory. Note that the `static/data` symbolic link that is in the repository is set up to work with the docker containers, if you keep it in that form your data need to be in `/data`, otherwise you need to change the link to fit your needs, for example, you could remove the symbolic link and replace it with one that uses your local directory:
92+
> [!NOTE]
93+
> If using podman, just substitute `podman` for `docker` in the following commands.
9694
9795
```bash
98-
$ rm static/data
99-
$ ln -s /Users/Shared/archive static/data
96+
$ docker build . -f Containerfile -t clams-mmif-visualizer
10097
```
10198

102-
To run the server do:
99+
In these notes we assume that the data are in a local directory named `/home/myuser/public` with subdirectories `audio`, `image`, `text` and `video`. We can now run a container with
103100

104101
```bash
105-
$ python app.py
102+
$ docker run --rm -d -p 5001:5000 -v /home/myuser/public:/data clams-mmif-visualizer
106103
```
107104

105+
> [!NOTE]
106+
> With the docker command above we do two things of note:
107+
> 1. The container port 5000 (the default for a Flask server) is published to port 5001 on your host (your local computer) with the `-p` option.
108+
> 2. The local data repository `/home/myuser/public` is mounted to `/data` on the container with the `-v` option.
109+
110+
Now, when you use the `example/example-documents` directory as the data source to visualize the `example/whisper-spacy.json` MMIF file, you need to mount the example directory into the container three times, as `audio`, `video`, and `text` respectively.
108111

109-
## Uploading Files
110-
MMIF files can be uploaded to the visualization server one of two ways:
112+
$ docker run --rm -d -p 5001:5000 -v $(pwd)/example/example-documents:/data/audio -v $(pwd)/example/example-documents:/data/video -v $(pwd)/example/example-documents:/data/text clams-mmif-visualizer
113+
114+
## Usage
115+
Use the visualizer by uploading files. MMIF files can be uploaded to the visualization server in one of two ways:
111116
* Point your browser to http://0.0.0.0:5000/upload, click "Choose File" and then click "Visualize". This will generate a static URL containing the visualization of the input file (e.g. `http://localhost:5000/display/HaTxbhDfwakewakmzdXu5e`). Once the file is uploaded, the page will automatically redirect to the file's visualization.
112117
* Using a command line, enter:
113118
```
@@ -117,31 +122,3 @@ MMIF files can be uploaded to the visualization server one of two ways:
117122

118123
The server will maintain a cache of up to 50MB for these temporary files, so the visualizations can be repeatedly accessed without needing to re-upload any files. Once this limit is reached, the server will delete stored visualizations until enough space is reclaimed, drawing from oldest/least recently accessed pages first. If you attempt to access the /display URL of a deleted file, you will be redirected back to the upload page instead.
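The eviction policy described above can be sketched roughly as follows. This is an illustration, not the code in `cache.py`: the helper names (`dir_size`, `last_access`, `evict_until_under`) are hypothetical, the exact byte value of the 50MB limit is assumed, and the sketch assumes each visualization directory holds a `last_access.txt` file containing a numeric timestamp, which this diff does not confirm.

```python
import shutil
from pathlib import Path
from typing import List

MAX_CACHE_BYTES = 50 * 1024 * 1024  # assumed byte value for "50MB"


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.glob("**/*") if f.is_file())


def last_access(viz_dir: Path) -> float:
    """Timestamp stored in last_access.txt; 0.0 if missing or unreadable."""
    try:
        return float((viz_dir / "last_access.txt").read_text())
    except (FileNotFoundError, ValueError):
        return 0.0


def evict_until_under(cache_root: Path, max_bytes: int = MAX_CACHE_BYTES) -> List[str]:
    """Delete least recently accessed directories until the cache fits."""
    removed = []
    total = dir_size(cache_root)
    candidates = sorted((p for p in cache_root.iterdir() if p.is_dir()),
                        key=last_access)
    for viz_dir in candidates:
        if total <= max_bytes:
            break
        total -= dir_size(viz_dir)
        shutil.rmtree(viz_dir)
        removed.append(viz_dir.name)
    return removed
```

Directories without a readable timestamp sort first here, so they are evicted before anything with a recorded access time.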
119124

120-
121-
## Data source repository and input MMIF file
122-
The data source includes video, audio, and text (transcript) files that are subjects for the CLAMS analysis tools. As mentioned above, to make this visualizer work with those files and be able to display the contents on the web browser, those source files need to be accessible from inside the `static` directory.
123-
124-
This repository contains an example MMIF file in `input/whisper-spacy.json`. This file refers to three media files:
125-
126-
1. service-mbrs-ntscrm-01181182.mp4
127-
2. service-mbrs-ntscrm-01181182.wav
128-
3. service-mbrs-ntscrm-01181182.txt
129-
130-
These files can be found in the directory `input/example-documents`. They can be moved anywhere on the host machine, as long as they are placed in the subdirectories `video`, `audio`, and `text` respectively. (e.g. `/Users/Shared/archive/video`, etc.)
131-
132-
According to the MMIF file, those three files should be found in their respective subdirectories in `/data`. The Flask server will look for these files in `static/data/video`, `static/data/audio` and `static/data/text`, amd those directories should point at the appropriate location:
133-
134-
- If you run the visualizer in a Docker container, then the `-v` option in the docker-run command is used to mount the local data directory `/Users/shared/archive` to the `/data` directory on the container and the `static/data` symlink already points to that.
135-
- If you run the visualizer on your local machine without using a container, then you have a couple of options (where you may need to remove the current link first):
136-
- Make sure that the `static/data` symlink points at the local data directory
137-
`$ ln -s /Users/Shared/archive/ static/data`
138-
- Copy the contents of `/Users/Shared/archive` into `static/data`.
139-
- You could choose to copy the data to any spot in the `static` folder but then you would have to edit the MMIF input file.
140-
141-
142-
---
143-
Note on source/copyright: these documents are sourced from [the National Screening Room collection in the Library of Congress Online Catalog](https://hdl.loc.gov/loc.mbrsmi/ntscrm.01181182). The collection provides the following copyright information:
144-
145-
> The Library of Congress is not aware of any U.S. copyright or other restrictions in the vast majority of motion pictures in these collections. Absent any such restrictions, these materials are free to use and reuse.
146-
147-
---

app.py

+24-13
@@ -21,7 +21,7 @@ def index():
2121
def ocr():
2222
try:
2323
data = dict(request.json)
24-
mmif_str = open(cache.get_cache_path() / data["mmif_id"] / "file.mmif").read()
24+
mmif_str = open(cache.get_cache_root() / data["mmif_id"] / "file.mmif").read()
2525
mmif = Mmif(mmif_str)
2626
ocr_view = mmif.get_view_by_id(data["view_id"])
2727
return prepare_ocr_visualization(mmif, ocr_view, data["mmif_id"])
@@ -67,23 +67,29 @@ def upload():
6767
def invalidate_cache():
6868
app.logger.debug(f"Request to invalidate cache on {request.args}")
6969
if not request.args.get('viz_id'):
70+
app.logger.debug("Invalidating entire cache.")
7071
cache.invalidate_cache()
7172
return redirect("/upload")
7273
viz_id = request.args.get('viz_id')
73-
in_mmif = open(cache.get_cache_path() / viz_id / 'file.mmif', 'rb').read()
74+
in_mmif = open(cache.get_cache_root() / viz_id / 'file.mmif', 'rb').read()
75+
app.logger.debug(f"Invalidating {viz_id} from cache.")
7476
cache.invalidate_cache([viz_id])
7577
return upload_file(in_mmif)
7678

7779

7880
@app.route('/display/<viz_id>')
7981
def display(viz_id):
80-
try:
81-
path = cache.get_cache_path() / viz_id
82+
path = cache.get_cache_root() / viz_id
83+
app.logger.debug(f"Displaying visualization {viz_id} from {path}")
84+
if os.path.exists(path / "index.html"):
85+
app.logger.debug(f"Visualization {viz_id} found in cache.")
8286
set_last_access(path)
8387
with open(os.path.join(path, "index.html")) as f:
8488
html_file = f.read()
8589
return html_file
86-
except FileNotFoundError:
90+
else:
91+
app.logger.debug(f"Visualization {viz_id} not found in cache.")
92+
os.remove(path)
8793
flash("File not found -- please upload again (it may have been deleted to clear up cache space).")
8894
return redirect("/upload")
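One detail of the `else` branch above deserves a note: `path` names a directory, and `os.remove` raises an `OSError` when given a directory (and `FileNotFoundError` when the path does not exist). A minimal sketch of the same lookup using `shutil.rmtree` with `ignore_errors=True` follows. The function `read_cached_html` is hypothetical and leaves out the Flask parts (`flash`, `redirect`) and `set_last_access`.

```python
import shutil
from pathlib import Path
from typing import Optional


def read_cached_html(cache_root: Path, viz_id: str) -> Optional[str]:
    """Return the cached index.html for viz_id, or None on a cache miss.

    Hypothetical helper; the caller would flash a message and redirect
    to the upload page when None is returned.
    """
    path = cache_root / viz_id
    index = path / "index.html"
    if index.exists():
        return index.read_text()
    # Stale or missing entry: remove whatever is left of the directory.
    # ignore_errors=True keeps this from raising when path does not exist.
    shutil.rmtree(path, ignore_errors=True)
    return None
```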
8995

@@ -95,12 +101,12 @@ def send_js(path):
95101

96102
def render_mmif(mmif_str, viz_id):
97103
mmif = Mmif(mmif_str)
98-
media = documents_to_htmls(mmif, viz_id)
99-
app.logger.debug(f"Prepared Media: {[m[0] for m in media]}")
104+
htmlized_docs = documents_to_htmls(mmif, viz_id)
105+
app.logger.debug(f"Prepared document: {[d[0] for d in htmlized_docs]}")
100106
annotations = prep_annotations(mmif, viz_id)
101107
app.logger.debug(f"Prepared Annotations: {[annotation[0] for annotation in annotations]}")
102108
return render_template('player.html',
103-
media=media, viz_id=viz_id, annotations=annotations)
109+
docs=htmlized_docs, viz_id=viz_id, annotations=annotations)
104110

105111

106112
def upload_file(in_mmif):
@@ -109,7 +115,7 @@ def upload_file(in_mmif):
109115
in_mmif_str = in_mmif_bytes.decode('utf-8')
110116
viz_id = hashlib.sha1(in_mmif_bytes).hexdigest()
111117
app.logger.debug(f"Visualization ID: {viz_id}")
112-
path = cache.get_cache_path() / viz_id
118+
path = cache.get_cache_root() / viz_id
113119
app.logger.debug(f"Visualization Directory: {path}")
114120
try:
115121
os.makedirs(path)
@@ -136,9 +142,14 @@ def upload_file(in_mmif):
136142

137143
if __name__ == '__main__':
138144
# Make path for temp files
139-
cache_path = cache.get_cache_path()
140-
if not os.path.exists(cache_path):
141-
os.makedirs(cache_path)
145+
cache_path = cache.get_cache_root()
146+
cache_symlink_path = os.path.join(app.static_folder, cache._CACHE_DIR_SUFFIX)
147+
if os.path.islink(cache_symlink_path):
148+
os.unlink(cache_symlink_path)
149+
elif os.path.exists(cache_symlink_path):
150+
raise RuntimeError(f"Expected {cache_symlink_path} to be a symlink (for re-linking to a new cache dir, "
151+
f"but it is a real path.")
152+
os.symlink(cache_path, cache_symlink_path)
142153

143154
# to avoid runtime errors for missing keys when using flash()
144155
alphabet = 'abcdefghijklmnopqrstuvwxyz1234567890'
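The re-linking logic in this hunk can be tried in isolation with the sketch below. The function name `relink` is made up for the example; the checks mirror the ones in the diff. `os.path.islink` is tested first because `os.path.exists` returns `False` for a dangling symlink, such as one left behind after an earlier temporary cache directory was deleted.

```python
import os


def relink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing an existing symlink if present.

    Hypothetical helper mirroring the startup logic shown in the diff.
    """
    if os.path.islink(link_path):
        # Covers both live and dangling symlinks.
        os.unlink(link_path)
    elif os.path.exists(link_path):
        # A real file or directory is in the way; refuse to delete it.
        raise RuntimeError(
            f"Expected {link_path} to be a symlink, but it is a real path.")
    os.symlink(target, link_path)
```

Calling `relink` repeatedly with different targets always leaves a single symlink pointing at the most recent one.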
@@ -148,4 +159,4 @@ def upload_file(in_mmif):
148159
if len(sys.argv) > 2 and sys.argv[1] == '-p':
149160
port = int(sys.argv[2])
150161

151-
app.run(port=port, host='0.0.0.0', debug=True, use_reloader=False)
162+
app.run(port=port, host='0.0.0.0', debug=True, use_reloader=True)

cache.py

+13-16
@@ -1,31 +1,28 @@
11
import os
2-
import time
2+
import pathlib
33
import shutil
4+
import tempfile
45
import threading
5-
import pathlib
6-
7-
from utils import app
6+
import time
87

98
lock = threading.Lock()
109

11-
12-
def get_cache_path():
13-
return pathlib.Path(app.static_folder) / "tmp"
10+
# module constants are unchanged throughout multiple "imports"
11+
_CACHE_DIR_SUFFIX = "mmif-viz-cache"
12+
_CACHE_DIR_ROOT = tempfile.TemporaryDirectory(suffix=_CACHE_DIR_SUFFIX)
1413

1514

16-
def get_cache_relpath(full_path):
17-
return str(full_path)[len(app.static_folder):]
15+
def get_cache_root():
16+
return pathlib.Path(_CACHE_DIR_ROOT.name)
1817

1918

2019
def invalidate_cache(viz_ids):
2120
if not viz_ids:
22-
app.logger.debug("Invalidating entire cache.")
23-
shutil.rmtree(get_cache_path())
24-
os.makedirs(get_cache_path())
21+
shutil.rmtree(get_cache_root())
22+
os.makedirs(get_cache_root())
2523
else:
2624
for v in viz_ids:
27-
app.logger.debug(f"Invalidating {v} from cache.")
28-
shutil.rmtree(get_cache_path() / v)
25+
shutil.rmtree(get_cache_root() / v)
2926

3027

3128
def set_last_access(path):
@@ -35,9 +32,9 @@ def set_last_access(path):
3532

3633
def scan_tmp_directory():
3734
oldest_accessed_dir = {"dir": None, "access_time": None}
38-
total_size = sum(f.stat().st_size for f in get_cache_path().glob('**/*') if f.is_file())
35+
total_size = sum(f.stat().st_size for f in get_cache_root().glob('**/*') if f.is_file())
3936
# this will be some visualization IDs
40-
for p in get_cache_path().glob('*'):
37+
for p in get_cache_root().glob('*'):
4138
if not (p / 'last_access.txt').exists():
4239
oldest_accessed_dir = {"dir": p, "access_time": 0}
4340
elif oldest_accessed_dir["dir"] is None:
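For readers less familiar with `tempfile.TemporaryDirectory`, the sketch below shows the behavior the new module constants rely on: the directory exists on disk while the object is alive, its path is available as `.name`, the `suffix` ends the directory name, and `cleanup()` removes it along with its contents. The variable names are illustrative only.

```python
import pathlib
import tempfile

# Keep a reference to the object; the directory is removed on cleanup().
cache_dir = tempfile.TemporaryDirectory(suffix="mmif-viz-cache")
cache_root = pathlib.Path(cache_dir.name)

print(cache_root.name.endswith("mmif-viz-cache"))  # True
print(cache_root.is_dir())                         # True

# Entries created under the root are removed together with it.
(cache_root / "some-viz-id").mkdir()
cache_dir.cleanup()
print(cache_root.exists())                         # False
```

Since the cache root is a fresh path on every start, the symlink under the static folder has to be re-created each time, which appears to be what "dynamic symlinking" in the commit title refers to.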

displacy/__init__.py

+2-9
@@ -1,15 +1,8 @@
11
import os
22

3-
from spacy import displacy
4-
5-
from mmif.serialize import Mmif, View, Annotation
6-
from mmif.vocabulary import AnnotationTypes
7-
from mmif.vocabulary import DocumentTypes
83
from lapps.discriminators import Uri
9-
10-
11-
def get_displacy(mmif: Mmif):
12-
return displacy_dict_to_ent_html(mmif_to_displacy_dict(mmif))
4+
from mmif.serialize import Mmif, View, Annotation
5+
from spacy import displacy
136

147

158
def visualize_ner(mmif: Mmif, view: View, document_id: str, app_root: str) -> str:
File renamed without changes. (Repeated for 17 files.)
