-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inheritance principle: clarify the procedure of which files would be considered #102
Comments
and only now realized that we do not "officially/clearly" have a way to define the most common |
Thinking about it even more, I still like my algorithmic approach (with a single caveat outlined in the bottom of the original post, still thinking about it) to the definition and would like to propose to remove that restriction
In above example, all 3 specific types of .json files (or at least two, if we exclude |
Continuing talking to myself, here is a list of non $> ls ds00*/*json | grep -v -e dataset_description -e participants -e task- | xargs -n 1 basename | sort | uniq -c
1 acq-epi_T1w.json
1 acq-flipangle05_run-01_MEFLASH.json
1 acq-flipangle30_run-01_MEFLASH.json
1 acq-moldOFF_T1w.json
1 acq-moldON_T1w.json
1 acq-mprage_T1w.json
1 bold.json
1 dir-0_epi.json
1 dir-1_epi.json
1 dir-AP_epi.json
1 dir-PA_epi.json
3 dwi.json
1 inplaneT2.json
3 phasediff.json
1 run-1_echo-1_FLASH.json
1 run-1_echo-2_FLASH.json
1 run-1_echo-3_FLASH.json
1 run-1_echo-4_FLASH.json
1 run-1_echo-5_FLASH.json
1 run-1_echo-6_FLASH.json
1 run-1_echo-7_FLASH.json
1 run-2_echo-1_FLASH.json
1 run-2_echo-2_FLASH.json
1 run-2_echo-3_FLASH.json
1 run-2_echo-4_FLASH.json
1 run-2_echo-5_FLASH.json
1 run-2_echo-6_FLASH.json
1 run-2_echo-7_FLASH.json
24 T1w.json
2 T2w.json attn @tyarkoni (happen you didn't spot my whining here before) |
wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT? |
This is the can of worm issue. We know it's there and that we are going to have to open it at some points but nobody likes to eat worm and not just because some of us are vegetarians. |
TBH I'm not sure I understand the issue. It's just that we should be able to have multiple applicable JSON files at the same level? I remember when we made that rule, and it was because the precedence within a level is ambiguous, so overrides could be implementation- or even RNG-dependent. Given that there is some appetite for removing inheritance altogether in BIDS 2.0, I can't imagine the people who feel that way being pleased with it getting even more byzantine. |
HED (Hierarchical Event Descriptors) relies heavily on inheritance. Specifically the Having a single HED uses the rule that HED really needs to have inheritance, but we only need one applicable sidecar at a given level. @sappelhoff @dungscout96 @smakeig |
@yarikoptic if you have time, would you mind updating the top message so we can know what aspects of this issues have NOT yet been addressed or clarified sufficiently by #946 |
I will @Remi-Gau ... I guess feel welcome to close if I don't get back to it let's say by Sep 1 (hopefully before ;)). |
I would just add:
|
I would like to argue STRONGLY against getting rid of the inheritance
principle for usability. It allows users to include the metadata common
across recordings in a single file, while allowing specifics to occur at
the recording level. You may not need inheritance for JSON files that are
strongly tied to each image file, but there is important metadata that does
not fall into that category.
For example, a very common situation is for all recordings in a dataset to
share the same kinds of events except for a single type of unique setup
event that occurs at the beginning of the file.
Inheritance allows a single JSON sidecar placed at the top level to include
most of the metadata about events. This makes it easier for the researchers
to maintain and update their metadata.
It also immediately tells the user of the data that all of the files in
this dataset share this metadata.
Without inheritance, any small variation in recording setup would require
users to put copies of that event sidecar at the lowest level for every
recording in the dataset. Now if a change needs to be made, you don't know
what is shared in common and what isn't. It really is a mess.
The same comments allow not only to events, but also to channel and other
metadata. Tools can be easily written to summarize information if needed.
…On Tue, Aug 2, 2022 at 3:39 PM Horea Christian ***@***.***> wrote:
I would just add:
- If we are to keep inheritance, encoding it in the schema somehow
might be a good idea, as currently any validator implementation would need
to hard-code both *when* and *how* it applies.
- Perhaps it is worth considering whether we could drop the
inheritance principle in the future. The amount of text duplication it
prevents is trivial in terms of size on disk, and it makes datasets both
more monolithic and less transparent --- particularly for manual inspection.
—
Reply to this email directly, view it on GitHub
<#102 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJCJOUM5N2YAB3YM2OJ2E3VXGBPNANCNFSM4GIOB2VA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Dear @VisLab , I am 100% with you on this! Please (attn @TheChymera as well) follow up on the dedicated issue for that at bids-standard/bids-2-devel#36. Here we are not getting rid of Inheritance principle but working toward "fixing it"! ;) Although I see the desire expressed by @TheChymera as "encoding it in the schema somehow", I would argue that "there is no need" (if not impossible). Rationale: The entire src/schema can be loaded only following an "algorithm" which is coded up in tools/schemacode. schema/README provides description of the related data structures and some parts of the algorithm with that. Using that description in README.md and implementation in dandischematools as "canonical" other tools could provide implementations to load the schema. So we already have a duality of "schema" and "algorithm". Situation is likely to stay the same for Inheritance principle. And IMHO it is ok to just express it as an algorithm (original desire) or a clear set of rules (current formulation) which gets coded up in But as current IP was cleared up in #946 I think it did address most of my original questions -- I think we can let this issue RiP, and instead file more specific issues or PRs to make "inheritance principle better" (hopefully without introducing backward incompatible changes). FTR here is a summary of issues/questions I had in mind and which might have been "addressed" or not (separate issues):
|
@yarikoptic thanks for pointing out the main issue. @VisLab I agree that it is more convenient when taking a monolithic approach to the dataset, though I disagree that:
I would argue the reverse, that changes (infrequent) can easily be made across low-level files with ubiquitous tools such as In any case for the time being inheritance will not be dropped due to backwards-compatibility, and perhaps better discussed in the issue linked by @yarikoptic above. |
Much of the earlier text posted in this issue relates to my follow-up proposal to #946, being #1003. Sorry I didn't find this thread and link to it. IMO it should be both possible and recommended to define at higher levels of the filesystem hierarchy any metadata that are common to many files at lower levels of the hierarchy based on a set of common entities. This is a natural recapitulation of the hierarchical nature of the metadata. Duplicating such data, whether electively or through obligation due to removal of the inheritance principle, would mean that any application that depends on such information may need to check for consistency of those data across sidecar files, which contra-indicates such a design. I spent a fair bit of time devising the language of #1003 as a way of systematizing the definition of how sidecar files can be present both within and across filesystem levels, and their relative ordering is wholly unambiguous, without a need to access the contents of those files. I tried to do it in a way that was both comrehensible for users, and sufficiently precise to guide software implementations. This was a natural extension of the existing IP, and was adequate for my own use case (though see bids-standard/bids-bep016#50 for the latest on that topic). @yarikoptic may actually be looking for something even more general & powerful here. As shown in Example 2 in the current iteration of #1003, there are some circumstances in which having multiple sidecar files in a single filesystem hierarchy level is still impermissible: even though both possess a subset of the entities in the data file of interest, there is no objective way to decide the order in which they should be loaded, and therefore, if there is a metadata field that is present in both files, the content of that metadata field as applied to the data file of interest is ambiguous. |
;-) "great mind think alike" (using it 2nd time this Monday, a good sign!!) -- I have left https://github.com/bids-standard/bids-specification/pull/1003/files#r940247739 before reading this reply (due to the ordering of my mailbox ;-) ). Yes, it would add more complexity to validator and somewhat to implementation of inheritances principle, but IMHO worthwhile. |
It might be just me, but from current wording it is not 100% clear if it is only
_run
and_rec
are of special treatment to be "generalized over" or any other possible key/value pair (_acq
etc). I.e. for a filewould
task-<label>_<contrast_label>.json
at the top level be considered, disregarding any possibly present OPTIONAL key (like_acq
,_ce
, etc)?What about anatomy, e.g. for a file
would
<suffix>.json
be considered regardless any of the possibleacq
,ce
etc present in the target subject/session specific file?To make it totally clear, it might be worth to formulate explicitly a generic rule to generate "higher level" filename considered for inheritance for any given "leaf" file (might be already as implemented in pybids -- didn't check yet):
sub-<label>_
prefix while providing generalization across subject(s)_ses-<label>
while providing generalization across session(s)_key-<value>
if generalizing across various types of acquisition_
if left only with the suffix, such as_<suffix>.json
in anat examplewhat do you think? Or it is just me, and current wording is sufficient (I would be ok with that)
FWIW, here is a list (click to expand) of all interesting "corner cases" across openneuro datasets
A list (click to expand) of all interesting task- "corner cases" where it is not just task-_bold.json across openneuro datasets
edit 1: "The constraint"
another not entirely clear aspect to me, which is the not spelled out, is the requirement to have only a single "applicable" file at any given level, which is demonstrated in Example 1: Two JSON files at same level that are applicable for NIfTI file.:
"violating the constraint that no more than one file may be defined at a given level of the directory structure"
(wording around soon to be tuned up a bit in a https://github.com/bids-standard/bids-specification/pull/98/files#diff-ba564f153b960d803d493fe37fbbb34eL148). I do not see a clear definition of such constraint in the actual text describing inheritance principle. While working on fixing auto aggregation of common fields into the top level files to be inherited within heudiconv I placed myself into a corner with an example of having e.g.
sub-1_task-task1_run-1_bold.json
andsub-1_task-task1_acq-X_run-1_bold.json
per subject (should be ok), and then trying to aggregate over them while retaining also
_acq-
if defined. Then I would end up withtask-task1_bold.json
task-task1_acq-X_bold.json
at the top level. Is this legit???
Should then
_acq-X_bold
leaf files inherit also fromtask-task1_bold.json
? It shouldn't be so I guess.And it is not just a matter of having that constraint "no more than one file may be defined at a given level of the directory structure", because I could potentially place the
_acq-X
in per-subject directory, thus avoiding it. It is a matter of clear definition on how we "expand" the common filename to match any leaf one. If it is a matter of the fact that we could expand into a leaf file with arbitrary additional_key-value
pairs, then the situation above could be "Ok" so thattask-task1_acq-X_bold.json
could extend (or overwrite, but not delete) fields defined intask-task1_bold.json
.The text was updated successfully, but these errors were encountered: