BEP for audio/video capture of behaving subjects #1771

Open · bendichter opened this issue Apr 11, 2024 · 68 comments
Labels: BEP · opinions wanted · raw

@bendichter (Contributor)

I would like to create a BEP to store the audio and/or video recordings of behaving subjects.

While sharing would obviously be problematic for human data, this would be useful for internal human data and for both internal and shared data of non-human subjects.

Following the structure of Task Events, we will define the types of files that can be placed in various data_type directories.

sub-<label>/[ses-<label>]
    <data_type>/
        <matches>_behcapture.mp3|.wav|.mp4|.mkv|.avi
        <matches>_behcapture.json

This schema will follow the standard principles of BIDS, listed here for clarity:

  • If no relevant <data_type> exists, use beh/.
  • Continuous video or audio recordings that are split across multiple files will use the _split- entity.
  • Video or audio files that are recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate them. We will need to generalize the definition of this entity a bit to accommodate this usage. The same entity would also be used to differentiate a video and an audio recording made simultaneously on different devices. Note that simply using the file extension to differentiate would not work, because it would not be clear which file the .json maps to.
  • The start time of each audio or video recording should be noted in the scans.tsv file.
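For illustration, a hypothetical session combining these conventions might look like this (all labels invented):

sub-01/ses-01/beh/
    sub-01_ses-01_task-foraging_recording-overhead_behcapture.mp4
    sub-01_ses-01_task-foraging_recording-overhead_behcapture.json
    sub-01_ses-01_task-foraging_recording-mic_behcapture.wav
    sub-01_ses-01_task-foraging_recording-mic_behcapture.json
    sub-01_ses-01_task-roam_split-01_behcapture.mkv
    sub-01_ses-01_task-roam_split-02_behcapture.mkv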

The JSON sidecar would define "streams", describing each stream in the file.

The *_behcapture.json would look like this:

{
  "device": "Field Recorder X200",
  "streams": [
    {
      "type": "audio",
      "sampling_rate": 44100.0,
      "description": "High-quality stereo audio stream."
    },
    {
      "type": "video",
      "sampling_rate": 30.0,
      "description": "Standard 1080p video stream."
    }
  ]
}

To be specific, it would follow this JSON Schema structure:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "device": {
      "type": "string"
    },
    "streams": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": {
            "type": "string",
            "enum": ["audio", "video"]
          },
          "sampling_rate": {
            "type": "number",
            "format": "float"
          },
          "description": {
            "type": "string"
          }
        },
        "required": ["type", "sampling_rate"],
        "additionalProperties": false
      }
    }
  },
  "required": ["device", "streams"],
  "additionalProperties": false
}
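As a sanity check, here is a minimal sketch of validating a sidecar against this schema with the third-party Python jsonschema package (an assumption; any draft-07 validator would do, and the inline schema below abbreviates the one above):

import jsonschema  # third-party: pip install jsonschema

# The draft-07 schema proposed above, inlined for a self-contained example.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "device": {"type": "string"},
        "streams": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string", "enum": ["audio", "video"]},
                    "sampling_rate": {"type": "number"},
                    "description": {"type": "string"},
                },
                "required": ["type", "sampling_rate"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["device", "streams"],
    "additionalProperties": False,
}

# The example *_behcapture.json sidecar from above.
sidecar = {
    "device": "Field Recorder X200",
    "streams": [
        {"type": "audio", "sampling_rate": 44100.0},
        {"type": "video", "sampling_rate": 30.0},
    ],
}

# Raises jsonschema.exceptions.ValidationError if the sidecar does not conform.
jsonschema.validate(instance=sidecar, schema=schema)
print("sidecar is valid")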

This BEP would be specifically for audio and/or video, and would not include related data like eye tracking, point tracking, pose estimation, or behavioral segmentation. All of these would be considered derived and are reserved for another BEP.

@bendichter (Contributor, Author)

cc @yarikoptic who is providing guidance on this concept.

@bendichter (Contributor, Author)

An alternative idea is to name the files "_video.mp4|avi|mkv|..." and "_audio.mp3|wav|...". The advantage of this is that it may be clearer what these files are. The disadvantages are that it does not make clear that the file is a recording of the subject as opposed to a stimulus, and that it is not clear what you should do if you have a combined audio/video recording.

@bendichter (Contributor, Author) commented Apr 11, 2024

Another alternative idea is to name the files "_beh.mp3|.wav|.mp4|.mkv|.avi|...", though this conflicts with the current beh modality: if there is a beh.tsv file in the beh/ directory, it will have an accompanying beh.json file, which would conflict with the .json file that corresponds to the data file (e.g. beh.mp3).

@Remi-Gau (Collaborator)

This BEP would be specifically for audio and/or video, and would not include related data like eye tracking, point tracking, pose estimation, or behavioral segmentation. All of these would be considered derived and are reserved for another BEP.

Some of this may already be covered by the BIDS support for motion data; also look at the eye-tracking BEP (PR and HTML).

@Remi-Gau (Collaborator)

tagging @gdevenyi who I think mentioned wanting to work on something like this last time I saw him.

@VisLab (Member) commented Apr 12, 2024

The ideas for allowing annotations of movies and audio as expressed in issue #153 could be expanded to allow annotation of participant video/audio, but in the imaging directories themselves, with appropriate file structure to distinguish them.
@neuromechanist @Remi-Gau @yarikoptic @adelavega @dungscout96 @dorahermes @arnodelorme

@Remi-Gau (Collaborator)

I like how those different initiatives are synching up.

Wouldn't those annotations of videos using HED, when experimenters "code" their video, be more appropriate as a derivative, though?

@VisLab (Member) commented Apr 12, 2024

Wouldn't those annotations of videos using HED, when experimenters "code" their video, be more appropriate as a derivative, though?

Not necessarily... In one group I worked with on experiments on stuttering, the speech pathologist's annotations were definitely considered part of the original data. Most markers that you see in typical event files didn't come from the imaging equipment but are extracted from the control software or external devices. Eye trackers have algorithms to mark saccades and blinks, and these are written as original data.

In my mind, if the annotations pertain to data that has been "calculated" from the original experimental data it should go into the derivatives folder. Annotations pertaining to data acquired during the experiment itself should probably go in the main folder.

@Remi-Gau (Collaborator)

I see. I was thinking more of cases where videos of animal behavior have to be annotated to code when certain behaviors happened. Given that this is not automated and can happen a long time after data acquisition, I would have seen this as more of a derivative. But your examples show that the answer, as in many cases, will be "it depends".

@gdevenyi

We have potential animal applications in both domains:

  1. Video with annotation timestreams coming from automated touchscreen-based animal behaviour systems.
  2. Videos of animals in classic "open field test" and similar setups, where postprocessing analysis collects a variety of annotations of the video determined by behaviour.

also I guess a:
3. Manual human annotation of videos of animals in naturalistic environments, like maternal care events

@Remi-Gau Remi-Gau added the opinions wanted Please read and offer your opinion on this matter label Apr 16, 2024
@Remi-Gau Remi-Gau changed the title RFC: BEP for audio/video capture of behaving subjects BEP for audio/video capture of behaving subjects Apr 16, 2024
@Remi-Gau Remi-Gau added the BEP label Apr 16, 2024
@DimitriPapadopoulos (Collaborator)

Would non-contiguous recordings (using the same setup) end up in the same or distinct files?

As an example, there could be cases where video recording has been stopped while taking care of a crying baby and resumed later on. Should BIDS try to enforce anything here, or leave it to end users (and data providers)?

What about other types of "time-series" data? Not sure about MEG; for EEG, I know the EDF+ format allows discontinuous recordings:

EDF+ allows storage of several NON-CONTIGUOUS recordings into one file. This is the only incompatibility with EDF. All other features are EDF compatible. In fact, old EDF viewers still work and display EDF+ recordings as if they were continuous. Therefore, we recommend EDF+ files of EEG or PSG studies to be continuous if there are no good reasons for the opposite.

@bendichter (Contributor, Author)

@DimitriPapadopoulos I believe this would be different runs. You would specify the start time of each run in the scans file.

@yarikoptic (Collaborator)

I think there might be multiple scenarios (entities) for how it could be handled:

  • runs -- if e.g. this also corresponds to separate runs of neural data, if any were acquired along with it; so primarily as "this is how we intended this all to be".
  • But I wonder if we should look into adopting/extending (currently they are too narrowly focused) any of the other entities whose meanings relate somehow to having "pieces of" (using a term which is not yet an entity): split, part, chunk.

@neuromechanist (Member)

We have potential animal applications in both domains

From the annotation perspective in #153, an annot- entity enables multiple annotations per _media file. It might be useful here as well.

But I wonder if we should look into adopting/extending (currently they are too narrowly focused) any of the other entities whose meanings relate somehow to having "pieces of" (using a term which is not yet an entity): split, part, chunk.

Any of them seems great; I currently suggested part- as the entity to use, but I can see any of the three working.

Video or audio files recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate.

This is similar to having a stimulus with multiple tracks (left or right video streams, multiple audio channels, or separate video and audio), but they are not recording-s per se. So we might be able to look for a common entity that potentially covers both. We have two suggestions in #153 for now: (1) stream- and (2) track-. Would be happy to hear any additional suggestions.

@neuromechanist (Member)

Also, as @bendichter mentioned, this proposal will very soon find its audience in human neuroscience, especially with DeepLabCut adding subject-masking capabilities and newer modalities such as LiDAR and WiFi motion capture coming into play.

It might be useful to have Motion-BIDS maintainers' (@sjeung and @JuliusWelzel) opinions as well.

@bendichter (Contributor, Author)

How do we feel about this naming convention?

sub-<label>/[ses-<label>]
    <data_type>/
        <matches>_behcapture.mp3|.wav|.mp4|.mkv|.avi
        <matches>_behcapture.json

I'm not 100% on it myself but I can't think of anything better. Other options:

  • "_video.mp4|avi|mkv|..." and "_audio.mp3|wav|...".
  • "_behvideo.mp4|avi|mkv|..." and "_behaudio.mp3|wav|...".
  • "_behmedia.mp4|avi|mkv|mp3|wav|..."

Is there any precedent from other standards we could use here?

@gdevenyi

Is there any precedent from other standards we could use here?

Technically, mkv is a container format; it could hold different kinds of video/audio streams.

Should we specify non-patent-encumbered video compression formats?

@dorahermes (Member) commented Aug 6, 2024

@bendichter would be good to have your input on the proposed entities here.

A specific point of discussion is how open the description of the proposed annot- entity would be: for stimuli only, or also for other types of annotations as discussed above?

@bendichter (Contributor, Author)

@dorahermes I like the idea of a general text annotations file that annotates a media file, and I think that could certainly be relevant downstream of these behavioral capture files.

I think the needs of stimuli storage and behavioral capture storage are different. With stimuli, you often have a single file that you play many times across different subjects, sessions, and trials, so it makes sense to have a root folder where they can be referenced repeatedly. For behavioral captures, every capture is unique, so it makes more sense to store them alongside other types of recordings. So I like what is going on with stimuli, but I don't want that to engulf these ideas about how to represent behavioral capture.

I am also trying to keep this to an MVP, so I'd like to push off the discussion of annotations, though I will say that the general approach you link to will probably work for behcapture as well, with minimal adjustments.

@bendichter (Contributor, Author)

Should we specify non-patent-encumbered video compression formats?

The most likely culprit here would be H.264, which is used in MPEG files; however, it seems that would be a non-issue, since this use would be covered under the "Free Internet Broadcasting" consideration (source).

@yarikoptic (Collaborator)

  • "_behvideo.mp4|avi|mkv|..." and "_behaudio.mp3|wav|...".

FWIW, I also think that we should have "audio"/"video" in the suffix (ref elsewhere), but I do not think we should collapse an "intent" (beh) into it, especially since we do have the beh datatype and even modality, so in principle it could be depicted as _mod-beh -- but AFAIK we have so far never used that way to associate (e.g. for _events.tsv).

@yarikoptic (Collaborator) commented Oct 2, 2024

I think there is a good amount of overlap (datatypes, extensions) with the "stimuli" BEP044. @bendichter, when you get a chance, have a look at that BEP's google doc.

@niksirbi

I'm not so sure that it would. I know I said that earlier in the thread but I think I was confused. The filename would be something like: sub-001/ses-002/sub-001_ses-002_recording-right_audiocapture.wav

That raises a different question (apologies in advance for derailing this conversation on suffixes, which is closing in on some consensus):

How would we store video/audio files capturing multiple subjects at once? It's quite common to acquire videos with multiple subjects, but it will be tricky to do given BIDS' subject-first approach. Perhaps this is out of scope, in which case ignore my question.

@satra (Collaborator) commented Jan 17, 2025

some of those derivatives files are an artifact of the past. those are no longer really there.

in terms of the broader space for audio-video capture: in the BBQS consortium, where some of this is highly relevant, there are projects on dyadic interaction, navigation in the wild and in specific settings (e.g. hospital rooms and houses), multiple cameras/devices, etc., so capturing context will be as important as storing the streams.

@bendichter (Contributor, Author) commented Jan 17, 2025

@niksirbi, this is tricky. BIDS' file hierarchy assumes a subject -> session structure where a session has a single subject. My first thought would be to create two different session folders, one for each subject, and use a softlink to link them together. Fortunately, in DANDI (and maybe OpenNeuro?), files that are exactly identical are de-duplicated, so it would be no problem to have multiple sessions with the same video capture file, even if that file takes different names.
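A hypothetical sketch of that layout (labels invented), with the same capture duplicated under each subject:

sub-001/ses-A/beh/sub-001_ses-A_task-dyad_behcapture.mp4
sub-002/ses-A/beh/sub-002_ses-A_task-dyad_behcapture.mp4

The two files would be byte-identical, so an archive like DANDI would store a single blob for both.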

@niksirbi

My first thought would be to create two different session folders, one for each subject, and use a softlink to link them together.

Something like that seems like a good compromise.

@yarikoptic (Collaborator)

first thought would be to create two different session folders, one for each subject, and use a softlink to link them together.

softlink/symlink is a filesystem-specific solution, thus not encouraged/used anywhere in BIDS.

BIDS' file hierarchy assumes a subject -> session structure where a session has a single subject.

  • with a solution for "Make it possible to specify folders layout to be other than sub-{label}/[ses-{label}/]" (bids-2-devel#54; "BIDS 1.0: flex BIDS layout", #1809) we should be able to get away from such a "hardcoded" assumption
  • BEP044 (stimuli), referenced above, already somewhat provides an example where "BIDS naming" is used under stimuli/ without the sub- prefix. For multi-subject recordings, it might have made sense, if we had that extra recording entity or alike, to have a recordings/ folder to collect them all and then, similarly to stimuli, somehow point to those recordings from within the "per subject" folders. (Just trying to come up with some generic principle.)
  • a workaround already could be to come up with "metasubjects" like sub-s1s2/, which would have behavioral recordings of multiple subjects (s1 and s2), providing further details on which particular subjects in e.g. participants.tsv or alike. It could even have ses- subfolders if multiple sessions are expected/collected per each grouping of subjects. It would not be 100% clearly formalized, hence a "workaround". (A sketch follows below.)
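A hypothetical sketch of that "metasubject" workaround (the labels and the group_members column are invented for illustration):

sub-s1s2/ses-01/beh/sub-s1s2_ses-01_task-social_behcapture.mp4

participants.tsv:
participant_id	group_members
sub-s1	n/a
sub-s2	n/a
sub-s1s2	sub-s1,sub-s2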

In reality, there could be no difference between a video file presented as a stimulus vs one captured to track behaviour.

That is exactly why I would also prefer to stay away from using the suffix to depict the intention/purpose of the file. So far we have mostly avoided that in BIDS, as suffixes describe content, not intention per se. E.g. someone could potentially use _T1w anatomicals to research the "behavior" of participants during anatomical sequences.

Also, I dislike "capture" since it is too generic -- all data is "captured". Even movie videos from Hollywood are "captured" by video cameras.

But I do confirm that we do have a potential conflict/ambiguity ATM! E.g., how do we organize and name files for a session where a subject was presented with movie stimuli particular to that subject/session [*], and the subject's behavior was captured on camera (audio and video as well)?

Hence -- we have two sub-X/ses-Y/beh/sub-X_ses-Y_..._audiovideo.mp4 files (let's assume it is a purely behavioral experiment). I think that is where we get into one of the principles of BIDS on filename construction: come up with a minimal (usually a single-entity) addition to the filename which would tell the two files apart. @bendichter suggests _recording-, but IMHO, similarly to "capture", it is too generic and does not clearly "capture" the difference between such files (also, we do not get into the domain of the "inheritance principle" if we add an entity to only one of them). Given that we do have a stimuli/ folder, and with BEP044 we would get a stimulus entity (plural stimuli), which pretty much aligns with the "modality" folder -- which, at least in the original MRI domain, describes "intent" (anatomy vs functional, behavioral, ...) -- we could "(ab)use" the _mod- entity here [**]? (related: bids-standard/bids-2-devel#55). Then they would gain _mod-stim and _mod-beh (beh taken as the name of the directory we have, and stim as the short version of the stimulus entity from BEP044; but we might want to think this through better if we decide to go this route). WDYT?

[*] hence does not make sense for placing into the top-level stimuli/ folder
[**] the original and only use of mod- ATM is to contain the original suffix for disambiguation of different _defacemasks...

@yarikoptic (Collaborator)

softlink/symlink is a filesystem-specific solution, thus not encouraged/used anywhere in BIDS.

I have said that, but forgot about BIDS URIs, https://bids-specification.readthedocs.io/en/stable/common-principles.html#bids-uri, so those could potentially be used, I guess (ATM we have those + ad-hoc pointers like stimuli_relative).
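For example (the metadata key here is invented for illustration), a sidecar in one subject's folder could point at a recording stored under another subject with a same-dataset BIDS URI instead of a symlink:

"AssociatedCapture": "bids::sub-001/ses-A/beh/sub-001_ses-A_task-dyad_behcapture.mp4"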

@neuromechanist (Member)

[*] hence does not make sense for placing into top-level stimuli/ folder

I believe neither this BEP nor BEP044 proposes a solution for this case (stimulus files per subject in the subject/session directories).

I agree that, with some of the use cases this BEP will accommodate, it is only natural to have an entity to determine the scope of the recording.

An example that comes to my mind is the STRUM task/dataset, in which two subjects collaborate in a first-person game environment with recordings from (among many datastreams) EEG, eye-tracking, and videos of the participants' faces (behavior), screens (stimulus), and eye-gaze (both behavior and stimulus). Sample videos are here (I removed the face camera video because they are not anonymized).

@bendichter (Contributor, Author)

I think I may be a bit confused. My reading of BEP044 is that all stimuli go in a stimuli directory at the root of the dataset, even if they are subject- or session-specific stimuli. If we wanted to modify it, we could allow for a stimuli directory at the subject or session level. Then we wouldn't ever have a naming collision with these captured videos.

@bendichter (Contributor, Author)

@neuromechanist

If I understand correctly, you are talking about differentiating between multiple simultaneous video recordings, right? I proposed this in the initial comment

Video or audio files that are recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate

would that handle this or am I missing something about your example?

@neuromechanist (Member)

AFAIK, the stimuli directory is intended to store stimuli used across the dataset (that is, presented to multiple subjects), hence the directory is at the root of the dataset. However, the current spec does not restrict this. For now at least, subject-specific stimuli have not been addressed (or discussed) in BEP044.

Video or audio files that are recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate

Yes, but as in the example, and in the case @yarikoptic raised, some videos may not have any behavior in them (being stimulus presentations). It might be beneficial to have a way to differentiate them, either with the _recording entity or with _mod. Having both would probably also work, as _recording would indicate multi-camera or multi-angle support, while _mod or some other entity would indicate intent.

@yarikoptic (Collaborator)

FTR, here are some of the openneuro datasets with videos under per-subject folders, likely with beh recordings:

$> for ds in ds*; do find $ds/sub-* -iname *.avi -o -iname *.mp4 -o -iname *.mkv | head; done
find: ‘ds001107/sub-*’: No such file or directory
find: ‘ds003676/sub-*’: No such file or directory
ds004505/sub-06/video/sub-06_trial-01.mp4
ds004505/sub-06/video/sub-06_trial-02.mp4
ds004505/sub-06/video/sub-06_trial-03.mp4
ds004505/sub-06/video/sub-06_trial-04.mp4
ds004505/sub-06/video/sub-06_trial-05.mp4
ds004505/sub-06/video/sub-06_trial-06.mp4
ds004505/sub-06/video/sub-06_trial-07.mp4
ds004505/sub-06/video/sub-06_trial-08.mp4
ds004505/sub-06/video/sub-06_trial-09.mp4
ds004505/sub-06/video/sub-06_trial-10.mp4
ds004598/sub-01/ses-1/eeg/sub-01_ses-1_task-LinearTrack_video.avi
ds004598/sub-02/ses-1/eeg/sub-02_ses-1_task-LinearTrack_video.avi
ds004598/sub-02/ses-2/eeg/sub-02_ses-2_task-LinearTrack_video.avi
ds004598/sub-02/ses-3/eeg/sub-02_ses-3_task-LinearTrack_video.avi
ds004598/sub-03/ses-1/eeg/sub-03_ses-1_task-LinearTrack_video.avi
ds004598/sub-03/ses-2/eeg/sub-03_ses-2_task-LinearTrack_video.avi
ds004598/sub-04/ses-1/eeg/sub-04_ses-1_task-LinearTrack_video.avi
ds004598/sub-04/ses-2/eeg/sub-04_ses-2_task-LinearTrack_video.avi
ds004598/sub-05/ses-1/eeg/sub-05_ses-1_task-LinearTrack_video.avi
ds004598/sub-05/ses-2/eeg/sub-05_ses-2_task-LinearTrack_video.avi
find: ‘ds004643/sub-*’: No such file or directory
ds005127/sub-00002/ses-1/video/sub-00002_ses-1_task-sleep_run-20160531_2257.avi
ds005127/sub-00002/ses-1/video/sub-00002_ses-1_task-sleep_run-20160531_2330.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160712_2255.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160712_2350.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0013.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0142.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0222.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0310.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0338.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0459.avi
find: ‘ds005443/sub-*’: No such file or directory
find: ‘ds005590/sub-*’: No such file or directory

Example(s) in dandi (not yet BIDS):

@bendichter (Contributor, Author) commented Feb 10, 2025

I had a conversation just now with @yarikoptic that helped me understand his perspective on the lack of a concrete differentiation between video stimuli and video recordings/capture. The idea is that even when you display images, it can be hard to fully reproduce that exact stimulus with all of the display settings, and you could resolve this with a screen recording. In this case you are in fact recording/capturing a stimulus, and the differentiation between the two breaks down. This also explains why Yarik wants to store stimuli in a session folder as opposed to a root stimuli folder: if the images are shown in a random order or with random timing, you would have a different video for every single session. The same would apply to auditory stimuli.

I guess the questions are: do we want this BEP to handle this use-case, or is this sufficiently different from the original motivation of capturing behavior? Perhaps handling that use-case would be better suited for the stimuli PR here?

@gdevenyi

Perhaps handling that use-case would be better suited for the stimuli PR here?

I think that would firmly fall under "stimuli", not a measurement of behaving subjects.

@VisLab (Member) commented Feb 10, 2025

I guess the questions are: do we want this BEP to handle this use-case, or is this sufficiently different from the original motivation of capturing behavior?

I vote for not putting this in the stimuli BEP. I suspect that filming the stimulus as it is being presented is an edge use case. On the other hand, experiments using standardized, well-annotated stimulus corpora (examples: the Forrest movie, the Present movie, the emotion database of images, the database of standardized language stimuli) appear extensively in the literature and in shared data.

@neuromechanist (Member)

I didn't think that discussing individual stimulus files present in the individual subject/session directories was particularly relevant to BEP044 (#2022), but I guess we have to address it in BEP044 anyway.

The reason is #750 (which I was ignorant of), and also the changes made to the task_event description ~10 months ago (#1750); see the spec:

If the same continuous recording has been used for all subjects (for example in the case where they all watched the same movie), one file placed in the root directory (for example, /task-movie_stim.<tsv.gz|json>) MAY be used and will apply to all task-movie_* files.

(IMO, this description conflicts with BEP044, but that is another discussion probably for BEP044).

Then, the implications for this issue might be:

  1. There is already a _stim suffix, so adding mod-stim would likely confuse people, given the existing stim suffix and the stim- entity (according to BEP044).
  2. BEP044 envisions a part- entity for a multi-part or multi-stream stimulus; so it might make sense to still use mod or part here for multi-view recordings, etc., but not for stimuli.
  3. It seems "[ENH] allow for _stim.{mp[34],mkv,avi} to provide stimuli files for func data" (#750) would resolve our current issue here (that is, sharing stimulus captures would be under _stim.ext), although there is the cost of using a datatype based on the intention of the file, not the actual data type. BEP044 should probably clarify this use-case with some examples ([WIP] examples for BEP044, Stim-BIDS, bids-examples#433).
  4. The guidelines for sharing stimuli would then become: a. a stimulus should be placed in the stimuli directory (with the file name starting with the stim- entity) if it is used across subjects; b. if the stimulus is specific to a subject/session/modality, the stimulus file should be put in the respective directory, with the stim suffix: sub-<label>[_ses-<label>][_task-<label>][_run-<label>]_stim.<ext> (see the sketch below).
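Under guideline 4b, a session-specific stimulus capture would then sit next to the behavioral capture it accompanies, for example (labels invented, and the behcapture suffix still under discussion in this thread):

sub-01/ses-1/beh/
    sub-01_ses-1_task-game_stim.mp4
    sub-01_ses-1_task-game_recording-face_behcapture.mp4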

@yarikoptic (Collaborator) commented Feb 12, 2025

A fresh related aspect worth exercising for this BEP (capture of behaving subjects) in particular, inspired by what was demonstrated originally by @talmo from SLEAP, and by observing thousands of video recordings, many of which are splits based on capture-hardware restrictions and the desire to avoid broken recordings/data loss: e.g. a day-long recording with a separate file for each 10 minutes, so that if the hardware crashes/reboots, at most 10 minutes of video are lost. That results in 6*24 = 144 videos per subject/day. Now think of a week or a month of recording, and multiple subjects: it scales up quickly. In DANDI we already place supplemental (to .nwb files) videos into a subfolder, so e.g. for sub-X/sub-X_ses-Y_..._behavior+image+ogen.nwb there is sub-X/sub-X_ses-Y_..._behavior+image+ogen/ with video/image files under it, and we have some dandisets with hundreds of videos (https://dandiarchive.org/dandiset/000559/0.240502.0456/files?location=). I wonder if here we should seek a similar "feature" and have the naming under those folders also follow some convention, i.e. having a sub-X/ses-Y/beh/sub-X_ses-Y_..._audiovideo/ folder with files named e.g. part-{index}.avi or some acq-{datestamp}.avi or alike?
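A sketch of the suggested folder convention (all names hypothetical):

sub-X/ses-Y/beh/sub-X_ses-Y_task-freeroam_audiovideo/
    part-001.avi
    part-002.avi
    ...
    part-144.avi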

@bendichter (Contributor, Author)

What is the specific worry about having so many videos? This seems to be a clear application of the split entity: https://bids-specification.readthedocs.io/en/stable/appendices/entities.html#split

@talmo commented Feb 13, 2025

Hey there,

Not sure if this matters for the split entity usage (or if I'm getting the context right), but MP4s and other video containers tend to be pretty picky about integrity (e.g., some have lookup tables stored in the tail bytes, others have different degrees of tolerance for missing packets), so it's not very easy to merge/split them at the byte level without re-encoding.

And to be clear Ben: we're talking about billions of frames across tens of thousands of files. Doing any kind of manipulation (honestly even moving them) is painful at that scale.

Disregard if not relevant to this discussion though :)

@bendichter (Contributor, Author)

@talmo

The split entity is designed exactly for this scenario - I'm not suggesting we merge or manipulate the video files at all, just organize the naming of your existing files to show that they're segments of a single continuous recording from one camera.

For example:

sub-X_ses-Y_task-Z_run-01_split-001_video.mp4
sub-X_ses-Y_task-Z_run-01_split-002_video.mp4
...
sub-X_ses-Y_task-Z_run-01_split-144_video.mp4

While renaming will be needed for BIDS compliance, this naming scheme makes the relationship between files clear without requiring any manipulation of the video data itself.
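Each split's start time could then be recorded in scans.tsv, as proposed at the top of this thread; a hypothetical excerpt (timestamps invented):

filename	acq_time
beh/sub-X_ses-Y_task-Z_run-01_split-001_video.mp4	2025-01-01T00:00:00
beh/sub-X_ses-Y_task-Z_run-01_split-002_video.mp4	2025-01-01T00:10:00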

@niksirbi

I also know of other projects that store long 24h recordings in 1h chunks, for the reasons Talmo mentioned.
I agree with Ben that split would be the most appropriate entity to use here. As far as I can tell it's currently only used for MEG data (correct me if I'm wrong), but conceptually it's exactly what we need.

@yarikoptic (Collaborator)

@bendichter I would still like to exercise more the idea of having some dedicated sub-folder, e.g. per each run or alike.

  • in principle, if we ever finalize "BIDS 1.0: flex BIDS layout" (bids-2-devel/issues/54, #1809), it could provide the "flexibility desired", but
  • meanwhile, especially with DANDI/EMBER hat on, we should avoid creating an "asset" record for each tiny separate recording.
    • I think ideally we should handle them as ".zarr/" folders -- a single "data asset", potentially with multiple files/built-in hierarchy structure under.
  • And then, if we are to use the split entity, logically we should allow for splits.{tsv,json}, which would provide a summary extract for those files in the folder -- analogous to stimuli.{tsv,json} of BEP044, etc.

@yarikoptic (Collaborator)

I also know of other projects that store long 24h recordings in 1h chunks, for the reasons Talmo mentioned.

@niksirbi do you have some publicly available datasets with such data?

@niksirbi

I also know of other projects that store long 24h recordings in 1h chunks, for the reasons Talmo mentioned.

@niksirbi do you have some publicly available datasets with such data?

The ones I had in mind are not public, at least not yet...

@bendichter (Contributor, Author)

meanwhile, especially with DANDI/EMBER hat on, we should avoid creating an "asset" record for each tiny separate recording. I think ideally we should handle them as ".zarr/" folders -- a single "data asset", potentially with multiple files/built-in hierarchy structure under.

Can you explain why you want to avoid having many assets? What problems would arise? How does a zarr structure address these problems? My impression is that cloud storage is perfectly happy to have many small files, and zarr actually wouldn't change that anyway. Are there reasons why that would become an issue at the database or front-end layer? Handling compressed video data in Zarr would be really tricky; it sounds like it would require either quite a bit of bending of the Zarr standard or conversion of the raw data to comply with Zarr.

And then, if we are to use the split entity, logically we should allow for splits.{tsv,json}, which would provide a summary extract for those files in the folder -- analogous to stimuli.{tsv,json} of BEP044, etc.

OK, this is a concern. We certainly do not need 144 separate json files describing the same camera and recording setup for 144 splits of a video. But is this really the way things are currently done? For fif data, do you have a json for each fif file, or is there one that applies to all splits? Would it be possible for us to designate that a single annotation file can apply to all splits?
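One possibility, relying on the BIDS inheritance principle: a sidecar that omits the split entity applies to every file matching its remaining entities, so a single JSON could cover all splits (names hypothetical):

sub-X_ses-Y_task-Z_run-01_video.json        <- shared metadata for all splits below
sub-X_ses-Y_task-Z_run-01_split-001_video.mp4
sub-X_ses-Y_task-Z_run-01_split-002_video.mp4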

@yarikoptic (Collaborator)

Sorry -- too many ?s at once ;)
Just to clarify: I did not propose to stick them into a ".zarr/" format, but rather to handle them as we handle ".zarr/" folders in DANDI -- we have a single asset (DB record) for the entire zarr, which potentially holds hundreds of thousands of files. So here I would see a single asset for a sub-X/ses-Y/beh/sub-X_ses-Y_..._audiovideo/ folder with individual video files underneath. No "zarr format" imposed, but we might need to come up with a "metadata extractor/aggregator" across those multitudes of files so that the metadata record makes the most sense. I would be happy to discuss/brainstorm over zoom or alike.

@bendichter (Contributor, Author)

@yarikoptic thanks for the clarification. I'm still not clear on what problem is being addressed here. What's wrong with having lots of assets? Also, I think a common use case might be for users to want to download specific video files and not the entire block. How could this be accomplished under your single-asset solution without some sort of zarr-like indexing?

@yarikoptic (Collaborator)

Similarly to how we are working it out for zarr components, which are easily navigable (e.g. on webdav and github, s3 clients); e.g. in dandi/dandi-cli#1572. That is just a tech aspect.

At the level of BIDS, dumping thousands of audio/video recordings per _run, for e.g. multiple runs, also makes that folder no longer "perceivable" by a human. Hence my suggestion to have them grouped into a folder.

@bendichter (Contributor, Author)

I still don't really see the advantage of grouping them into a single asset. I think if I were a user and a dataset had a lot of mp4 videos, then that's what I'd like to see: it makes the structure of the data obvious, as well as how I can access, stream, and/or download an individual video file. With DANDI/EMBER, we could easily implement pagination.

I suppose it would not be ideal to have some of the non-video data streams end up on the last page, so we could put the videos in a dedicated folder. We do have a beh/ directory in the spec, but it currently says it is only to be used in the case where there is no func/ directory, and I'm not really sure why that is. It seems like it would be a good place for these videos. Alternatively, we could create a sub-folder within a session, though is there precedent for this in the current schema?

@talmo commented Feb 14, 2025

Subdirectory sounds like a good call 🤙
