The MMIF Visualization Server

This application starts a web server that visualizes the annotation components of a MMIF file. It provides the following visualizations for any valid MMIF:

  • Video or Audio file player with HTML5 (assuming file refers to video and/or audio document).
  • Pretty-printed MMIF contents.
  • Interactive, searchable MMIF tree view with JSTree.
  • Embedded Universal Viewer (assuming file refers to video and/or image document).

The application also includes tailored visualizations depending on the annotations present in the input MMIF:

Visualization | Supported CLAMS apps
WebVTT for showing alignments of video captions. | Whisper, Kaldi
JavaScript bounding boxes for image and OCR annotations. | Tesseract, EAST
Named entity annotations with displaCy. | spaCy
Screenshots & HTML5 video navigation of TimeFrames. | Chyron text recognition, Slate detection, Bars detection

Requirements:

  • A command line interface.
  • Git (to get the code).
  • Docker or Podman (if you run the visualizer in a container).
  • Python 3.6 or later (if you want to run the server containerless).

To get this code if you don't already have it:

$ git clone https://github.com/clamsproject/mmif-visualizer

Quick start

If you just want to get the server up and running quickly, the repository contains a shell script start_visualizer.sh to immediately launch the visualizer in a container. You can invoke it with the following command:

./start_visualizer.sh <data_directory> <mount_directory>
  • The required data_directory argument should be the absolute or relative path of the media files on your machine which the MMIF files reference.
  • The optional mount_directory argument should be specified if your MMIF files point to a different directory than the one where your media files are stored on the host machine. For example, if your video, audio, and text data is stored locally at /home/archive but your MMIF files refer to /data/..., you should set this argument to /data. (If this argument is not set, the mount directory defaults to the data directory.)

For example, if your media files are stored at /llc_data and your MMIF files specify the document location as "location": "file:///data/...", you can start the visualizer with the following command:

./start_visualizer.sh /llc_data /data

The server can then be accessed at http://localhost:5000/upload
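
The relationship between the two arguments can be illustrated with a short Python sketch. It is not part of the repository: the function name host_path_for is hypothetical, and the sketch only assumes that MMIF locations are file:// URIs under the mount directory, as in the example above. It does not reproduce what start_visualizer.sh does internally.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def host_path_for(location: str, data_directory: str, mount_directory: str = "") -> str:
    """Map a MMIF document location to the file's path on the host.

    Hypothetical helper for illustration only; it mirrors the argument
    semantics described above, not the script's actual implementation.
    """
    # The mount directory defaults to the data directory when it is not given.
    mount = PurePosixPath(mount_directory or data_directory)
    doc_path = PurePosixPath(urlparse(location).path)
    # relative_to raises ValueError if the location lies outside the mount directory.
    relative = doc_path.relative_to(mount)
    return str(PurePosixPath(data_directory) / relative)


print(host_path_for("file:///data/video/example.mp4", "/llc_data", "/data"))
```

With the arguments from the example above, a location under /data resolves to the matching file under /llc_data on the host.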

Running the server in a container

Download or clone this repository and build an image using the Containerfile (you may use another name for the -t parameter; for this example we use clams-mmif-visualizer throughout). NOTE: if using Podman, just substitute podman for docker in the following commands.

$ docker build . -f Containerfile -t clams-mmif-visualizer

In these notes we assume that the data are in a local directory named /Users/Shared/archive with subdirectories audio, image, text and video (those subdirectories are standard in CLAMS, but the parent directory could be any directory depending on your local setup). We can now run a Docker container with

$ docker run --rm -d -p 5000:5000 -v /Users/Shared/archive:/data clams-mmif-visualizer

See the Data source repository and input MMIF file section below for a description of the MMIF file. Assuming you have not made any changes to the directory structure you can use the example MMIF files in the input folder.

Some background

With the docker command above we do two things of note:

  1. The container port 5000 (the default for a Flask server) is exposed to the same port on your Docker host (your local computer) with the -p option.
  2. The local data repository /Users/Shared/archive is mounted to /data on the container with the -v option.

Another useful piece of information is that the Flask server on the Docker container has no direct access to /data since it can only see data in the static directory of this repository. Therefore we have created a symbolic link static/data that links to /data:

$ ln -s /data static/data

With this, the mounted directory /data in the container is accessible from inside the /app/static directory of the container. You do not need to run this command unless you change your setup, because the symbolic link is part of this repository.
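
If you want to confirm where the link points before starting the server, a small check like the following can help. This is a sketch and not part of the repository; the function name data_link_target is hypothetical, and it assumes you run it from the repository root so that the static directory is found.

```python
import os
from typing import Optional


def data_link_target(static_dir: str = "static") -> Optional[str]:
    """Return the target of static/data if it is a symbolic link, else None."""
    link = os.path.join(static_dir, "data")
    if os.path.islink(link):
        return os.readlink(link)
    return None


if __name__ == "__main__":
    target = data_link_target()
    if target is None:
        print("static/data is not a symbolic link")
    else:
        print("static/data ->", target)
```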

Running the server locally

First install the python dependencies listed in requirements.txt:

$ pip install -r requirements.txt

You will also need to install opencv-python if you are not running within a container (pip install opencv-python).

Let's again assume that the data are in a local directory /Users/Shared/archive with subdirectories audio, image, text and video. You need to copy, symlink, or mount that local directory into the static directory. Note that the static/data symbolic link in the repository is set up to work with the Docker containers. If you keep it in that form, your data need to be in /data; otherwise you need to change the link to fit your needs. For example, you could remove the symbolic link and replace it with one that uses your local directory:

$ rm static/data
$ ln -s /Users/Shared/archive static/data

To run the server do:

$ python app.py

Uploading Files

MMIF files can be uploaded to the visualization server in one of two ways:

  • Point your browser to http://localhost:5000/upload, click "Choose File" and then click "Visualize". This will generate a static URL containing the visualization of the input file (e.g. http://localhost:5000/display/HaTxbhDfwakewakmzdXu5e). Once the file is uploaded, the page will automatically redirect to the file's visualization.
  • Using a command line, enter:
    curl -X POST -F "file=@<filename>" -s http://localhost:5000/upload
    
    This will upload the file and print the unique identifier for the file visualization. The visualization can be accessed at http://localhost:5000/display/<id>
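
The same upload can be scripted. The sketch below builds the multipart request with the Python standard library only; the function names are hypothetical, and it assumes (as the curl output described above suggests) that the response body is the identifier itself.

```python
import os
import urllib.request
import uuid


def build_upload_request(path: str, url: str = "http://localhost:5000/upload") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to curl -F "file=@<path>"."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as handle:
        payload = handle.read()
    filename = os.path.basename(path)
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def upload(path: str, url: str = "http://localhost:5000/upload") -> str:
    """Send the file and return the response body (assumed to be the identifier)."""
    with urllib.request.urlopen(build_upload_request(path, url)) as response:
        return response.read().decode().strip()
```

With the server running, calling upload("input/whisper-spacy.json") should return an identifier that can be appended to http://localhost:5000/display/.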

The server maintains a cache of up to 50MB for these temporary files, so visualizations can be accessed repeatedly without re-uploading any files. Once this limit is reached, the server deletes stored visualizations until enough space is reclaimed, starting with the oldest or least recently accessed pages. If you attempt to access the /display URL of a deleted file, you will be redirected back to the upload page instead.
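
The eviction policy can be pictured with a toy model. The class below is a hypothetical illustration of least-recently-accessed eviction under a size limit; it is not the server's actual code, and the exact byte value used for 50MB is an assumption.

```python
from collections import OrderedDict

# 50MB limit as stated above; the exact byte count is an assumption.
MAX_CACHE_BYTES = 50 * 1024 * 1024


class VisualizationCache:
    """Toy model of a size-limited cache with least-recently-accessed eviction."""

    def __init__(self, max_bytes: int = MAX_CACHE_BYTES) -> None:
        self.max_bytes = max_bytes
        self.sizes = OrderedDict()  # identifier -> size in bytes, least recent first

    def add(self, identifier: str, size: int) -> list:
        """Store an entry and return the identifiers evicted to make room."""
        self.sizes[identifier] = size
        self.sizes.move_to_end(identifier)
        evicted = []
        # Drop least recently accessed entries until the total fits again,
        # but never evict the entry that was just added.
        while sum(self.sizes.values()) > self.max_bytes and len(self.sizes) > 1:
            oldest, _ = self.sizes.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def access(self, identifier: str) -> bool:
        """Mark an entry as recently used; False means it is no longer cached."""
        if identifier not in self.sizes:
            return False
        self.sizes.move_to_end(identifier)
        return True
```

In this model, a False result from access corresponds to the case where the server redirects a /display request back to the upload page.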

Data source repository and input MMIF file

The data source includes video, audio, and text (transcript) files that are subjects for the CLAMS analysis tools. As mentioned above, to make this visualizer work with those files and be able to display the contents on the web browser, those source files need to be accessible from inside the static directory.

This repository contains an example MMIF file in input/whisper-spacy.json. This file refers to three media files:

  1. service-mbrs-ntscrm-01181182.mp4
  2. service-mbrs-ntscrm-01181182.wav
  3. service-mbrs-ntscrm-01181182.txt

These files can be found in the directory input/example-documents. They can be moved anywhere on the host machine, as long as they are placed in the subdirectories video, audio, and text respectively. (e.g. /Users/Shared/archive/video, etc.)

According to the MMIF file, those three files should be found in their respective subdirectories in /data. The Flask server will look for these files in static/data/video, static/data/audio and static/data/text, and those directories should point at the appropriate location:

  • If you run the visualizer in a Docker container, then the -v option in the docker run command is used to mount the local data directory /Users/Shared/archive to the /data directory on the container, and the static/data symlink already points to that.
  • If you run the visualizer on your local machine without using a container, then you have a couple of options (where you may need to remove the current link first):
    • Make sure that the static/data symlink points at the local data directory:
      $ ln -s /Users/Shared/archive/ static/data
    • Copy the contents of /Users/Shared/archive into static/data.
    • You could choose to copy the data to any spot in the static folder but then you would have to edit the MMIF input file.
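
To check that the setup matches a given MMIF file, you can list the document locations it references and test whether each one resolves under the static directory. The sketch below is not part of the repository; the function names are hypothetical, and it assumes the usual MMIF layout in which each entry of the top-level "documents" list carries its location under "properties".

```python
import json
import os
from urllib.parse import urlparse


def static_paths(mmif_path: str, static_dir: str = "static") -> dict:
    """Map each document location in a MMIF file to its expected path under static/."""
    with open(mmif_path, encoding="utf-8") as handle:
        mmif = json.load(handle)
    paths = {}
    for document in mmif.get("documents", []):
        location = document.get("properties", {}).get("location")
        if location:  # documents that embed their text inline have no location
            # file:///data/video/x.mp4 -> static/data/video/x.mp4
            relative = urlparse(location).path.lstrip("/")
            paths[location] = os.path.join(static_dir, relative)
    return paths


def missing_files(mmif_path: str, static_dir: str = "static") -> list:
    """Return the locations whose files cannot be found under static/."""
    found = static_paths(mmif_path, static_dir)
    return [loc for loc, path in found.items() if not os.path.exists(path)]
```

Run from the repository root, missing_files("input/whisper-spacy.json") returning an empty list would indicate that all referenced media files are reachable through static/data.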

Note on source/copyright: these documents are sourced from the National Screening Room collection in the Library of Congress Online Catalog. The collection provides the following copyright information:

The Library of Congress is not aware of any U.S. copyright or other restrictions in the vast majority of motion pictures in these collections. Absent any such restrictions, these materials are free to use and reuse.

