automatic document-level `Annotation` recorder #226

keighrim · 2023-06-28T00:09:18Z

New Feature Summary

We have added a place to store document-level information by re-purposing the Annotation at_type (clamsproject/mmif#134), but AFAIK none of the apps have taken advantage of the feature so far. That said, I'd like to propose a way to automate some of the recording process, implemented as part of the SDK.

Implementation plan:

on `Document` side

The Document class keeps a secondary properties2 attribute as temporary storage for document-level information.
Document.add_property() should add information to properties2, so that the original properties of the source document object in the top-level documents list stay intact.
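
A minimal sketch of the two points above, not the actual SDK code. `DocSketch` is a hypothetical stand-in for `Document`; only the `properties`, `properties2`, and `add_property` names come from this proposal.

```python
class DocSketch:
    """Hypothetical stand-in for `Document`; illustrative only."""

    def __init__(self, properties: dict):
        # properties as they appear in the top-level `documents` list
        self.properties = dict(properties)
        # temporary storage for document-level information added at runtime
        self.properties2 = {}

    def add_property(self, name, value):
        # write only to the temporary dict, leaving the original untouched
        self.properties2[name] = value


doc = DocSketch({"id": "d1", "mime": "video", "location": "file:///dogs.mp4"})
doc.add_property("fps", 29.97)
print(doc.properties)   # original properties, unchanged
print(doc.properties2)  # {'fps': 29.97}
```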

automatic generation of `Annotation` objects

The Annotation objects should be generated from the properties2 attribute.
We can use the "sanitize" step (adding sanitized serialize #212) in serialize() to generate those objects.
When a property from the properties2 dict is "converted" to an Annotation annotation, it should be popped out to prevent duplicate recording.
- (updated) Ideally, in a CLAMS app, serialization happens only once all the data processing is done and all Annotations have been added to the corresponding views. In that case there is no "duplicate" recording/serialization, so this point is irrelevant. On the other hand, during development and testing, devs may serialize for debugging purposes, and in that case popping properties out of the temporary storage is not a good idea. So I decided not to pop them out.
So, which view should all these Annotation objects go into? I'm thinking whichever comes first during serialization.
- (updated) It turned out that using the last view is much easier implementation-wise and doesn't seem to hurt any functionality.
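
The updated plan could look roughly like the sketch below. It works on plain dicts rather than real MMIF objects, and everything in it is an assumption made for illustration: the function name, the `pending` mapping (document id to its properties2 dict), the generated ids, and the placeholder `@type` string.

```python
import json


def serialize_with_doc_annotations(documents, views, pending):
    """Hypothetical sketch: `pending` maps a document id to its `properties2` dict."""
    # copy the views so that serialization does not modify the in-memory objects
    out_views = [dict(v, annotations=list(v["annotations"])) for v in views]
    if out_views:
        target = out_views[-1]  # the last view, as in the updated plan
        for n, (doc_id, props) in enumerate(pending.items(), start=1):
            if not props:
                continue
            target["annotations"].append({
                "@type": "Annotation",  # placeholder, not a real vocabulary URI
                "properties": {"id": f"a{n}", "document": doc_id, **props},
            })
    # `pending` is only read, never popped, so repeated calls give the same output
    return json.dumps({"documents": documents, "views": out_views})


views = [{"id": "v1", "annotations": []}]
pending = {"d1": {"fps": 29.97}}
first = serialize_with_doc_annotations([], views, pending)
second = serialize_with_doc_annotations([], views, pending)
print(first == second)  # serializing twice for debugging does not lose or duplicate anything
```

Because nothing is popped and the views are copied, calling this for debugging in the middle of processing leaves both the temporary storage and the views as they were.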

automatic loading of `Annotation` objects

Now we are talking about a third properties3 attribute for loading document properties from existing Annotation objects during de-serialization.
It should be read-only and shouldn't be serialized back into the Document instance when serializing.
As a result, we will have three properties dictionaries, so Document should provide get_property(prop_name) that looks through all three dicts and returns the value found.
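
A rough sketch of such a lookup, written as a free function over three plain dicts. The search order (values added by the current app first, then values loaded from Annotation objects, then the original properties) is an assumption of this sketch; the proposal does not fix an order.

```python
def get_property(prop_name, properties, properties2, properties3):
    """Hypothetical lookup over the three dicts; the search order is an assumption."""
    for source in (properties2, properties3, properties):
        if prop_name in source:
            return source[prop_name]
    raise KeyError(prop_name)


original = {"mime": "video", "id": "d1", "location": "file:///dogs.mp4"}
added_now = {}                 # nothing added by the current app
from_input = {"fps": "29.97"}  # loaded from an Annotation in the input MMIF

print(get_property("fps", original, added_now, from_input))
print(get_property("id", original, added_now, from_input))
```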

Example

Suppose a slate detection app with the following code:

class SlateApp(ClamsApp):
    ...  # set attributes, __init__, etc

    def _annotate(self, mmif):
        for vd in mmif.get_documents_by_type(VideoDocument):
            v = mmif.new_view()
            self.sign_view(v)
            video = mmif_utils_videodocument.read(vd)
            for slate in self.detect_slates(video):
                v.new_annotation(TimeFrame, **slate.__dict__)
        return mmif  # returned so that the caller can serialize the result

    def detect_slates(self, video: cv2.VideoCapture):
        ...  # code continues

Then in the mmif_utils_videodocument:

import cv2
from mmif.serialize.annotation import Document

def read(vd: Document):
    v = cv2.VideoCapture(vd.location_path())
    vd.add_property('fps', v.get(cv2.CAP_PROP_FPS))  # this will save `fps=29.97` in the `properties2` dictionary
    return v

Then, you'll write something like this

in_mmif = Mmif("""{
  "metadata": {
    "mmif": "http://mmif.clams.ai/1.0.0"
  },
  "documents": [
    {
      "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
      "properties": {
        "mime": "video",
        "id": "d1",
        "location": "file:///dogs.mp4"
      }
    }
  ],
  "views": []
}""")
app = SlateApp()    # note that the _annotate() method had no line that manually adds `Annotation` annotations to the view
out_mmif = app._annotate(in_mmif)
out_json = out_mmif.serialize(pretty=True)

to see

{
  "metadata": {
    "mmif": "http://mmif.clams.ai/1.0.0"
  },
  "documents": [
    {
      "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
      "properties": {
        "mime": "video",
        "id": "d1",
        "location": "file:///dogs.mp4"
      }
    }
  ],
  "views": [
    {
      "id": "v1",
      "metadata": {
        "timestamp": "2020-05-27T12:23:45",
        "app": "http://apps.clams.ai/slate-detector/versionX",
        "contains": {
          "http://vocab.lappsgrid.org/Annotation": {
            "document": "d1"
          },
          "http://vocab.lappsgrid.org/TimeFrame": {
            "document": "d1",
            "timeUnit": "frames"
          }
        }
      },
      "annotations": [
        {
          "@type": "http://vocab.lappsgrid.org/TimeFrame",
          "properties": {
            "document": "d1",
            "start": 400,
            "end": 2400,
            "id": "tf1"
          }
        },
        {
          "@type": "http://vocab.lappsgrid.org/Annotation",    # ta-da! This one is generated from `properties2` by the MMIF serializer code
          "properties": {
            "document": "d1",
            "id": "a1",
            "fps": "29.97"
          }
        }
      ]
    }
  ]
}

And finally, if there is downstream processing:

downstream_input_mmif = Mmif(out_json)  # the same json from the above
vd = downstream_input_mmif.get_document_by_id('d1')
print(vd.properties)
print(vd.properties2)
print(vd.properties3)
print(vd.get_property('fps'))

will show

{
  "mime": "video",
  "id": "d1",
  "location": "file:///dogs.mp4"
}   # original property staying intact

{  }   # nothing has been added by the currently running app

{
  "fps": "29.97"
}   # read from the `Annotation` objects that existed in the input MMIF (`out_json`), completely volatile

"29.97"  # `get_property` should be smart enough to look for the key in all three dicts

Alternatives

No response

Additional context

I propose this feature because I realized that every video-related MMIF must carry an Annotation recording the FPS of the video, but none of the current video apps implements that. Without FPS information, we cannot evaluate video apps based solely on the MMIF files, because time unit conversion is impossible without opening the source video files.
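
As a rough illustration of the conversion that a recorded FPS value makes possible, here is a frame-to-millisecond helper. The function name is made up for this example; the frame numbers are those of the TimeFrame in the example above.

```python
def frames_to_milliseconds(frame: int, fps: float) -> float:
    """Convert a frame number to milliseconds, given the frame rate."""
    return frame / fps * 1000


fps = 29.97  # would come from the document-level Annotation
start_ms = frames_to_milliseconds(400, fps)
end_ms = frames_to_milliseconds(2400, fps)
print(round(start_ms), round(end_ms))
```

Without the FPS value in the MMIF, the only way to obtain `fps` here is to open the video file itself.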

It's true that devs still need to update the existing apps to use this new feature (once it's ready) to have FPS properly "recorded", but having this feature will make future video app development much easier, in combination with #221.

@kelleyl @MrSqually @snewman-aa I know you guys have been working on some video processing apps, and I'd like to hear your feedback on this proposal.


marcverhagen · 2023-06-29T15:19:53Z

I think I am missing something on what exactly the "recording process" is. I was thinking of this "capital A" annotation process as something that proceeds in the same way as regular view creation, that is, the app adds a view with the document-level annotation, and no extra machinery is needed.

keighrim · 2023-06-29T16:18:58Z

You are correct regarding the current status of the Annotation situation (which is not being used at all in any existing apps I've seen). I'm just trying to provide some automated assistance to read and write Annotation objects for app developers.

The "recording process" is the lines of code in the serialize() (or _serialize()) method of MMIF objects.

snewman-aa · 2023-07-03T17:03:43Z

I've definitely been frustrated with how inelegant my handling of video document properties has been. I've been extracting the location, fps, etc and then passing those values around "loose" or making them app instance attributes which wouldn't work for handling multiple video documents.

I think this will make it easier to make apps more flexible.

It feels like something we'd want to have in the documents property, but I get that those are read-only after they're initialized, so I see why we're not using that.

I am pretty sure that having a "capital" Annotation annotation type will bring quite a bit of communicative confusion in the future

This was my first thought reading through this. vd.add_property() and .get_property are definitely intuitive though so I think this will be good for readability too. I just think seeing an Annotation type annotation of document properties in the MMIF might be a little confusing. We would just want to make it clear in the MMIF documentation I guess.

keighrim · 2023-07-05T14:28:19Z

Adding Annotation objects to the last view can be problematic. For example, when an NLP app generates a separate view for each TextDocument previously produced by an OCR app and adds a doc-level Annotation for each document, all of those Annotations end up in the last view, which otherwise holds annotations (NLP results) only about the last text document.

* see #226 (comment)

keighrim · 2023-07-11T00:09:40Z

4600e65 solves the problem raised in the above comment.

* `view.metadata.contains.some_at_type` dictionary
* the basic idea is the same as proposed in #226
* mmif.utils.vdhelper.get_annotation_property is now deprecated in favor of Annotation.get_property

keighrim added the ✨N New feature or request label Jun 28, 2023

clams-bot added this to infra Jun 28, 2023

github-project-automation bot moved this to Todo in infra Jun 28, 2023

clams-bot assigned keighrim Jul 3, 2023

keighrim mentioned this issue Jul 4, 2023

automatic generation of Annotation annotation #227

Merged

keighrim added a commit that referenced this issue Jul 11, 2023

automatic Annotation generation now finds a more proper view to add

4600e65

* see #226 (comment)

keighrim closed this as completed in #227 Jul 11, 2023

github-project-automation bot moved this from Todo to Done in infra Jul 11, 2023

clams-bot unassigned keighrim Jul 11, 2023

keighrim mentioned this issue Jul 11, 2023

releasing 1.0.2 #228

Merged

keighrim mentioned this issue May 1, 2024

One view or two views? clamsproject/app-swt-detection#96

Closed

keighrim mentioned this issue Jul 3, 2024

List of issues with the current specifications clamsproject/mmif#230

Open

2 tasks

keighrim mentioned this issue Jul 12, 2024

properties of "derived" TextDocument must be recorded within the document object, not a separate capital Annotation object #290

Closed
