Representation of video annotations #9

GermanHydrogen · 2025-02-11T14:05:52Z

If I understand correctly, BIIGLE represents (time-dependent) video annotations using the image-annotations:frames attribute, for example:

{
  "image-set-header": {
    "image-annotation-creators": [
      {
        "id": "https://orcid.org/0009-0001-1994-0399",
        "name": "Christopher Krämmer"
      }
    ],
    "image-annotation-labels": [
      {
        "id": "urn:lsid:marinespecies.org:taxname:281088",
        "name": "Hoplosebastes armatus Schmidt,1929"
      }
    ]
  },
  "image-set-items": {
    "video.mp4": [
      {
        "image-datetime": "2024-07-17T00:32:18.000000",
        "image-annotations": [
          {
            "shape": "rectangle",
            "coordinates": [
              [
                2274.28,
                1766.4,
                2299.44,
                1767.76,
                2297.96,
                1795.1,
                2272.8,
                1793.74
              ],
              [
                2274.28,
                1766.4,
                2299.44,
                1767.76,
                2297.96,
                1795.1,
                2272.8,
                1793.74
              ]
            ],
            "labels": [
              {
                "label": "urn:lsid:marinespecies.org:taxname:281088",
                "annotator": "https://orcid.org/0009-0001-1994-0399",
                "created-at": "2024-11-22T06:03:56.655940+01:00"
              }
            ],
            "frames": [
              1,
              10
            ]
          }
        ]
      }
    ]
  }
}

This contrasts with how other time-dependent information about a video is represented: as multiple list entries for one video file, each with a different image-datetime value. Video annotations could also be expressed this way:

{
  "image-set-header": {
    "image-annotation-creators": [
      {
        "id": "https://orcid.org/0009-0001-1994-0399",
        "name": "Christopher Krämmer"
      }
    ],
    "image-annotation-labels": [
      {
        "id": "urn:lsid:marinespecies.org:taxname:281088",
        "name": "Hoplosebastes armatus Schmidt,1929"
      }
    ]
  },
  "image-set-items": {
    "video.mp4": [
      {
        "image-datetime": "2024-07-17T00:32:18.000000"
      },
      {
        "image-datetime": "2024-07-17T00:32:19.000000",
        "image-annotations": [
          {
            "shape": "rectangle",
            "coordinates": [
              [
                2274.28,
                1766.4,
                2299.44,
                1767.76,
                2297.96,
                1795.1,
                2272.8,
                1793.74
              ]
            ],
            "labels": [
              {
                "label": "urn:lsid:marinespecies.org:taxname:281088",
                "annotator": "https://orcid.org/0009-0001-1994-0399",
                "created-at": "2024-11-22T06:03:56.655940+01:00"
              }
            ]
          }
        ]
      },
      {
        "image-datetime": "2024-07-17T00:32:28.000000",
        "image-annotations": [
          {
            "shape": "rectangle",
            "coordinates": [
              [
                2274.28,
                1766.4,
                2299.44,
                1767.76,
                2297.96,
                1795.1,
                2272.8,
                1793.74
              ]
            ],
            "labels": [
              {
                "label": "urn:lsid:marinespecies.org:taxname:281088",
                "annotator": "https://orcid.org/0009-0001-1994-0399",
                "created-at": "2024-11-22T06:03:56.655940+01:00"
              }
            ]
          }
        ]
      }
    ]
  }
}

From a data-consistency perspective, I would prefer that iFDOs offer only one way to represent time-dependent data.
Is my assumption correct that both representations encode the same information for video annotations? Could BIIGLE use the second representation?

This is related to https://codebase.helmholtz.cloud/datahub/marehub/ag-videosimages/fair-marine-images/-/issues/69


GermanHydrogen · 2025-02-11T14:06:27Z

@ckraemme

mzur · 2025-02-11T14:14:21Z

I see what you mean now. The two representations do not encode the same information. In the second variant, you don't know if there are two (single-frame) rectangle annotations or one (moving) annotation.

In addition, BIIGLE could only provide the second variant if it knows the timestamp of the video. I'm not sure whether this is a mandatory field in an iFDO, but you can also annotate videos in BIIGLE without this metadata.
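
Both points can be illustrated with a minimal Python sketch. It is not BIIGLE's or the iFDO tooling's actual code: `flatten_video_annotation` is a hypothetical helper, and it assumes that the values in `frames` are offsets in seconds from the start of the video (which matches the 00:32:19 and 00:32:28 timestamps in the second example). The conversion needs the video start time as input, and the result no longer records that the two rectangles belonged to one annotation.

```python
from datetime import datetime, timedelta

# Hypothetical helper, for illustration only.
def flatten_video_annotation(start, annotation):
    """Split one frames-based annotation into per-timestamp entries.

    Assumes each value in "frames" is an offset in seconds from `start`
    and is paired with the entry at the same index in "coordinates".
    """
    entries = []
    for offset, coords in zip(annotation["frames"], annotation["coordinates"]):
        timestamp = start + timedelta(seconds=offset)
        entries.append({
            "image-datetime": timestamp.strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "image-annotations": [{
                "shape": annotation["shape"],
                "coordinates": [coords],  # one key frame per entry
                "labels": annotation["labels"],
            }],
        })
    return entries

# Data taken from the first example in this issue (labels shortened).
video_start = datetime(2024, 7, 17, 0, 32, 18)
rectangle = [2274.28, 1766.4, 2299.44, 1767.76,
             2297.96, 1795.1, 2272.8, 1793.74]
annotation = {
    "shape": "rectangle",
    "coordinates": [rectangle, rectangle],
    "labels": [{"label": "urn:lsid:marinespecies.org:taxname:281088"}],
    "frames": [1, 10],
}

entries = flatten_video_annotation(video_start, annotation)
```

Two independent single-frame annotations at the same offsets would produce exactly the same `entries`, so the original grouping cannot be recovered from the second representation. Without a known `video_start`, the conversion cannot be done at all.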

GermanHydrogen · 2025-02-11T14:44:48Z

So you use the last entry in image-annotations:frames to determine the last frame of an annotation?

mzur · 2025-02-12T12:11:05Z

I don't see how this relates to the question above, but yes, the frames determine when the annotation is visible in the video.
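
As a rough sketch of that reading (not BIIGLE's implementation), visibility can be checked against the first and last entry of `frames`. The function name `is_visible` is hypothetical, and the sketch assumes that `frames` is sorted in ascending order, holds offsets in seconds from the start of the video, and contains no gaps.

```python
# Hypothetical helper, for illustration only.
def is_visible(annotation, t):
    """Return True if time offset `t` (seconds) lies within the annotation.

    Assumes "frames" is sorted ascending and has no gaps, so the first
    entry marks the start and the last entry marks the end.
    """
    frames = annotation["frames"]
    return frames[0] <= t <= frames[-1]

# The annotation from the first example spans offsets 1 to 10.
moving = {"frames": [1, 10]}
# A single-frame annotation is visible only at its one offset.
single = {"frames": [4]}
```

Under these assumptions, `is_visible(moving, 5)` is true and `is_visible(moving, 11)` is false, while `single` is visible only at offset 4.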
