Inheritance principle: clarify the procedure of which files would be considered #102

Closed
yarikoptic opened this issue Dec 5, 2018 · 15 comments
Assignees
Labels
inheritance question Further information is requested

Comments

@yarikoptic
Collaborator

yarikoptic commented Dec 5, 2018

It might be just me, but from the current wording it is not 100% clear whether only _run and _rec get special treatment and are "generalized over", or whether the same holds for any other possible key/value pair (_acq etc.). I.e., for a file

sub-<label>/[ses-<label>/]
    func/
        sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_ce-<label>][_dir-<label>][_rec-<label>][_run-<index>][_echo-<index>]_<contrast_label>.json

would task-<label>_<contrast_label>.json at the top level be considered, disregarding any possibly present OPTIONAL key (like _acq, _ce, etc)?

What about anatomy, e.g. for a file

sub-<label>/[ses-<label>/]
    anat/
        sub-<label>[_ses-<label>][_acq-<label>][_ce-<label>][_rec-<label>][_run-<index>]_<suffix>.json

would <suffix>.json be considered regardless of any of the possible acq, ce, etc. present in the target subject/session-specific file?

To make it totally clear, it might be worth explicitly formulating a generic rule for generating the "higher level" filenames considered for inheritance for any given "leaf" file (it might already be implemented this way in pybids -- I didn't check yet):

  • strip sub-<label>_ prefix while providing generalization across subject(s)
  • remove _ses-<label> while providing generalization across session(s)
  • remove any _key-<value> if generalizing across various types of acquisition
  • strip possibly present leading _ if left only with the suffix, such as _<suffix>.json in anat example
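
The four steps above can be sketched in Python roughly as follows. This is only an illustration of the proposed rule, not the actual logic of pybids or the validator. The function name `candidate_names` is hypothetical, and emitting every ordered subset of the leaf file's entities is an assumption made for this sketch.

```python
from itertools import combinations


def candidate_names(leaf_name):
    """Return higher-level sidecar names that could apply to leaf_name.

    Hypothetical helper: assumes the name is a sequence of `_`-separated
    key-value entities followed by `<suffix>.<extension>`.
    """
    *entities, suffix = leaf_name.split("_")
    names = set()
    # Drop any subset of entities (sub-, ses-, acq-, run-, ...) while
    # keeping the remaining ones in their original order.
    for n_kept in range(len(entities) + 1):
        for kept in combinations(entities, n_kept):
            # With no entities kept, only the suffix remains, so no
            # leading `_` is produced (the last step of the list above).
            names.add("_".join([*kept, suffix]))
    return names
```

For example, `candidate_names("sub-01_T1w.json")` yields `T1w.json` and `sub-01_T1w.json` (the leaf name itself is included in the result).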

What do you think? Or is it just me, and the current wording is sufficient (I would be OK with that)?

FWIW, here is a list of all interesting "corner cases" across OpenNeuro datasets:
$> ls -l */*json | grep -v -e dataset_de -e participa -e task-  
-rw------- 1 yoh yoh   240 Dec  4 15:36 ds000101/T1w.json
-rw------- 1 yoh yoh   247 Dec  4 15:37 ds000102/T1w.json
-rw------- 1 yoh yoh  1372 Dec  4 16:08 ds000117/acq-mprage_T1w.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-1_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-2_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-3_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-4_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-5_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-6_FLASH.json
-rw------- 1 yoh yoh    82 Dec  4 16:08 ds000117/run-1_echo-7_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-1_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-2_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-3_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-4_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-5_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-6_FLASH.json
-rw------- 1 yoh yoh    83 Dec  4 16:08 ds000117/run-2_echo-7_FLASH.json
-rw------- 1 yoh yoh   146 Dec  4 15:46 ds000144/T1w.json
-rw------- 1 yoh yoh   128 Dec  4 15:47 ds000164/T1w.json
-rw------- 1 yoh yoh   204 Dec  4 15:46 ds000168/T1w.json
-rw------- 1 yoh yoh   288 Dec  4 15:42 ds000174/T1w.json
-rw------- 1 yoh yoh  1184 Dec  4 15:58 ds000201/T1w.json
-rw------- 1 yoh yoh  1127 Dec  4 15:58 ds000201/T2w.json
-rw------- 1 yoh yoh   901 Dec  4 15:59 ds000201/dwi.json
-rw------- 1 yoh yoh   209 Dec  4 15:47 ds000205/T1w.json
-rw------- 1 yoh yoh   207 Dec  4 15:41 ds000208/T1w.json
-rw------- 1 yoh yoh   141 Dec  4 15:49 ds000213/T1w.json
-rw------- 1 yoh yoh   169 Dec  4 15:48 ds000214/T1w.json
-rw------- 1 yoh yoh   226 Dec  4 15:48 ds000222/T1w.json
-rw------- 1 yoh yoh    62 Dec  4 15:48 ds000229/T1w.json
-rw------- 1 yoh yoh   285 Dec  4 15:44 ds000231/T1w.json
-rw------- 1 yoh yoh   408 Dec  4 15:49 ds000239/T1w.json
-rw------- 1 yoh yoh   217 Dec  4 15:44 ds000240/T1w.json
-rw------- 1 yoh yoh    73 Dec  4 16:05 ds000244/dir-0_epi.json
-rw------- 1 yoh yoh    74 Dec  4 16:05 ds000244/dir-1_epi.json
-rw------- 1 yoh yoh   197 Dec  4 16:05 ds000244/dwi.json
-rw------- 1 yoh yoh   252 Dec  4 15:47 ds000245/T1w.json
-rw------- 1 yoh yoh   195 Dec  4 15:42 ds000248/acq-epi_T1w.json
-rw------- 1 yoh yoh    77 Dec  4 15:42 ds000248/acq-flipangle05_run-01_MEFLASH.json
-rw------- 1 yoh yoh    78 Dec  4 15:42 ds000248/acq-flipangle30_run-01_MEFLASH.json
-rw------- 1 yoh yoh   144 Dec  4 15:48 ds000254/T1w.json
-rw------- 1 yoh yoh   176 Dec  4 15:43 ds000255/T1w.json
-rw------- 1 yoh yoh   215 Dec  4 15:55 ds001021/T1w.json
-rw------- 1 yoh yoh   157 Dec  4 15:55 ds001021/dwi.json
-rw------- 1 yoh yoh    54 Dec  4 15:55 ds001021/phasediff.json
-rw------- 1 yoh yoh    69 Dec  4 15:39 ds001105/dir-AP_epi.json
-rw------- 1 yoh yoh    68 Dec  4 15:39 ds001105/dir-PA_epi.json
-rw------- 1 yoh yoh   259 Dec  4 16:01 ds001246/T1w.json
-rw------- 1 yoh yoh   253 Dec  4 16:01 ds001246/inplaneT2.json
-rw------- 1 yoh yoh   977 Dec  4 15:52 ds001386/bold.json
-rw------- 1 yoh yoh    75 Dec  4 15:57 ds001454/phasediff.json
-rw------- 1 yoh yoh   376 Dec  4 16:04 ds001486/T1w.json
-rw------- 1 yoh yoh   247 Dec  4 16:07 ds001525/T1w.json
-rw------- 1 yoh yoh    75 Dec  4 16:09 ds001545/phasediff.json
-rw------- 1 yoh yoh   517 Dec  4 16:16 ds001597/T1w.json
-rw------- 1 yoh yoh   517 Dec  4 16:16 ds001597/T2w.json
A list of all interesting task- "corner cases" where it is not just task-<label>_bold.json across OpenNeuro datasets:
$> ls -l */*json | grep -v -e dataset_de -e participa | grep task-.*_[^b]
-rw------- 1 yoh yoh   284 Dec  4 15:47 ds000164/task-stroop_events.json
-rw------- 1 yoh yoh    76 Dec  4 15:59 ds000201/task-hands_physio.json
-rw------- 1 yoh yoh   596 Dec  4 15:48 ds000214/task-Cyberball_events.json
-rw------- 1 yoh yoh   884 Dec  4 15:46 ds000234/task-motorphotic_asl.json
-rw------- 1 yoh yoh   786 Dec  4 15:44 ds000235/task-rest_asl.json
-rw------- 1 yoh yoh   786 Dec  4 15:42 ds000236/task-rest_asl.json
-rw------- 1 yoh yoh   224 Dec  4 15:47 ds000237/task-MemorySpan_acq-multiband_bold.json
-rw------- 1 yoh yoh   934 Dec  4 15:44 ds000240/task-restEyesOpen_asl.json
-rw------- 1 yoh yoh  1628 Dec  4 16:05 ds000244/task-ArchiEmotional_acq-ap_bold.json
-rw------- 1 yoh yoh  1628 Dec  4 16:05 ds000244/task-ArchiEmotional_acq-ap_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ArchiEmotional_acq-pa_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ArchiEmotional_acq-pa_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-ArchiSocial_acq-ap_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-ArchiSocial_acq-ap_sbref.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-ArchiSocial_acq-pa_bold.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-ArchiSocial_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ArchiSpatial_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ArchiSpatial_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-ArchiSpatial_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-ArchiSpatial_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ArchiStandard_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ArchiStandard_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ArchiStandard_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ArchiStandard_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn01_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn01_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn02_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn02_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn03_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn03_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn04_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn04_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn05_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn05_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn06_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn06_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn07_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn07_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn08_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn08_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn09_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn09_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn10_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn10_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn11_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsTrn11_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn12_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsTrn12_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal01_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal01_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal02_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal02_acq-ap_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal03_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal03_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal04_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal04_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal05_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal05_acq-ap_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal06_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal06_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal07_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-ClipsVal07_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal08_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal08_acq-ap_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal09_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-ClipsVal09_acq-ap_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-ContRing_acq-ap_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-ContRing_acq-ap_sbref.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-ExpRing_acq-pa_bold.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-ExpRing_acq-pa_sbref.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpEmotion_acq-ap_bold.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpEmotion_acq-ap_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpEmotion_acq-pa_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpEmotion_acq-pa_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-HcpGambling_acq-ap_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-HcpGambling_acq-ap_sbref.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpGambling_acq-pa_bold.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpGambling_acq-pa_sbref.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpLanguage_acq-ap_bold.json
-rw------- 1 yoh yoh  1624 Dec  4 16:05 ds000244/task-HcpLanguage_acq-ap_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpLanguage_acq-pa_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpLanguage_acq-pa_sbref.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-HcpMotor_acq-ap_bold.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-HcpMotor_acq-ap_sbref.json
-rw------- 1 yoh yoh  1621 Dec  4 16:05 ds000244/task-HcpMotor_acq-pa_bold.json
-rw------- 1 yoh yoh  1621 Dec  4 16:05 ds000244/task-HcpMotor_acq-pa_sbref.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-HcpRelational_acq-ap_bold.json
-rw------- 1 yoh yoh  1627 Dec  4 16:05 ds000244/task-HcpRelational_acq-ap_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-HcpRelational_acq-pa_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-HcpRelational_acq-pa_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpSocial_acq-ap_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-HcpSocial_acq-ap_sbref.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-HcpSocial_acq-pa_bold.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-HcpSocial_acq-pa_sbref.json
-rw------- 1 yoh yoh  1619 Dec  4 16:05 ds000244/task-HcpWm_acq-ap_bold.json
-rw------- 1 yoh yoh  1619 Dec  4 16:05 ds000244/task-HcpWm_acq-ap_sbref.json
-rw------- 1 yoh yoh  1618 Dec  4 16:05 ds000244/task-HcpWm_acq-pa_bold.json
-rw------- 1 yoh yoh  1618 Dec  4 16:05 ds000244/task-HcpWm_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage00_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage00_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage00_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage00_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage01_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage01_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage01_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage01_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage02_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage02_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage02_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage02_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage03_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage03_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage03_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage03_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage04_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage04_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage04_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage04_acq-pa_sbref.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage05_acq-ap_bold.json
-rw------- 1 yoh yoh  1626 Dec  4 16:05 ds000244/task-RSVPLanguage05_acq-ap_sbref.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage05_acq-pa_bold.json
-rw------- 1 yoh yoh  1625 Dec  4 16:05 ds000244/task-RSVPLanguage05_acq-pa_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-WedgeAnti_acq-ap_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-WedgeAnti_acq-ap_sbref.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-WedgeAnti_acq-pa_bold.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-WedgeAnti_acq-pa_sbref.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-WedgeClock_acq-ap_bold.json
-rw------- 1 yoh yoh  1623 Dec  4 16:05 ds000244/task-WedgeClock_acq-ap_sbref.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-WedgeClock_acq-pa_bold.json
-rw------- 1 yoh yoh  1622 Dec  4 16:05 ds000244/task-WedgeClock_acq-pa_sbref.json
-rw------- 1 yoh yoh   843 Dec  4 15:48 ds000254/task-bilateralfingertapping_echo-1_bold.json
-rw------- 1 yoh yoh   842 Dec  4 15:48 ds000254/task-bilateralfingertapping_echo-2_bold.json
-rw------- 1 yoh yoh   843 Dec  4 15:48 ds000254/task-bilateralfingertapping_echo-3_bold.json
-rw------- 1 yoh yoh   843 Dec  4 15:48 ds000254/task-bilateralfingertapping_echo-4_bold.json
-rw------- 1 yoh yoh  2345 Dec  4 15:55 ds001021/task-BREATHHOLD_acq-1400_bold.json
-rw------- 1 yoh yoh  1916 Dec  4 15:55 ds001021/task-CHECKERBOARD_acq-1400_bold.json
-rw------- 1 yoh yoh  1605 Dec  4 15:55 ds001021/task-CHECKERBOARD_acq-645_bold.json
-rw------- 1 yoh yoh  1593 Dec  4 15:55 ds001021/task-rest_acq-1400_bold.json
-rw------- 1 yoh yoh  1314 Dec  4 15:55 ds001021/task-rest_acq-645_bold.json
-rw------- 1 yoh yoh  1134 Dec  4 15:55 ds001021/task-rest_acq-CAP_bold.json
-rw------- 1 yoh yoh    76 Dec  4 16:11 ds001553/task-checkerboard_events.json
-rw------- 1 yoh yoh   869 Dec  4 16:16 ds001597/task-cuedMFM_events.json
-rw------- 1 yoh yoh    25 Dec  4 16:18 ds001600/task-rest_acq-AP_bold.json
-rw------- 1 yoh yoh    25 Dec  4 16:18 ds001600/task-rest_acq-PA_bold.json
-rw------- 1 yoh yoh    25 Dec  4 16:18 ds001600/task-rest_acq-v1_bold.json
-rw------- 1 yoh yoh    25 Dec  4 16:18 ds001600/task-rest_acq-v2_bold.json
-rw------- 1 yoh yoh    25 Dec  4 16:18 ds001600/task-rest_acq-v4_bold.json

edit 1: "The constraint"
Another aspect that is not entirely clear to me, and which is not spelled out, is the requirement to have only a single "applicable" file at any given level, which is demonstrated in "Example 1: Two JSON files at same level that are applicable for NIfTI file":

"violating the constraint that no more than one file may be defined at a given level of the directory structure"
(the wording around it is soon to be tuned up a bit in https://github.com/bids-standard/bids-specification/pull/98/files#diff-ba564f153b960d803d493fe37fbbb34eL148). I do not see a clear definition of such a constraint in the actual text describing the inheritance principle. While working on fixing auto-aggregation of common fields into the top-level files to be inherited within heudiconv, I painted myself into a corner with an example of having e.g.

  • sub-1_task-task1_run-1_bold.json and
  • sub-1_task-task1_acq-X_run-1_bold.json

per subject (should be ok), and then trying to aggregate over them while retaining also _acq- if defined. Then I would end up with

  • task-task1_bold.json
  • task-task1_acq-X_bold.json

at the top level. Is this legit???
Should the _acq-X_bold leaf files then also inherit from task-task1_bold.json? I guess they shouldn't.
And it is not just a matter of the constraint "no more than one file may be defined at a given level of the directory structure", because I could potentially place the _acq-X file in the per-subject directory, thus avoiding it. It is a matter of clearly defining how we "expand" the common filename to match any leaf one. If we are allowed to expand into a leaf filename with arbitrary additional _key-value pairs, then the situation above could be "OK", so that task-task1_acq-X_bold.json could extend (or overwrite, but not delete) fields defined in task-task1_bold.json.
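
To make the "expansion" question concrete, here is a minimal Python sketch under one possible reading: a higher-level file applies to a leaf file when the suffix matches and all of its entities are also present in the leaf. That reading is an assumption for illustration, not something the specification states, and `parse` and `is_applicable` are hypothetical helpers.

```python
def parse(name):
    """Split a BIDS-like filename into (set of key-value entities, suffix.ext)."""
    *entities, suffix = name.split("_")
    return set(entities), suffix


def is_applicable(sidecar, leaf):
    """True if sidecar has the same suffix and only entities present in leaf."""
    side_entities, side_suffix = parse(sidecar)
    leaf_entities, leaf_suffix = parse(leaf)
    # Subset test: the sidecar may omit entities, but may not add new ones.
    return side_suffix == leaf_suffix and side_entities <= leaf_entities
```

Under this reading, both task-task1_bold.json and task-task1_acq-X_bold.json are applicable to sub-1_task-task1_acq-X_run-1_bold.json, while only task-task1_bold.json is applicable to sub-1_task-task1_run-1_bold.json, which is exactly the situation asked about above.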

@yarikoptic yarikoptic added the question Further information is requested label Dec 5, 2018
@yarikoptic
Collaborator Author

And only now I realized that we do not "officially/clearly" have a way to define the most common .json for inheritance, one which would generalize to ALL other files.
Well -- dataset_description.json is kinda like that file -- since I guess all information pertinent to the dataset as a whole is pertinent to individual files as well. But then should it be considered while constructing the full metadata record for a particular leaf file?

@yarikoptic
Collaborator Author

Thinking about it even more, I still like my algorithmic approach to the definition (with a single caveat outlined at the bottom of the original post, which I am still thinking about) and would like to propose removing the restriction to have only a single "applicable" file at any given level.

  1. as stated above, the most common file to absorb all "common" fields is actually already there and it is dataset_description.json. Whatever applies to the entire dataset does apply to each file by default IMHO.
  2. I might want to have ses-X.json on top level to provide specific common fields for any subject/modality for ses-X in the study (e.g. that a specific scanner/software was used - a real use case from today!), which was different from another scanner in ses-Y.
  3. And then I would like to have task-A_bold.json and task-B_bold.json which would have other common fields specific to each task so I could have a quick overview of differences between tasks.
    I could furthermore provide specifications for different acq- etc, pretty much may be even per each _key- if I am that good. That makes it possible to scale nicely for large(r) studies and provide a nice summary of differences on top of the dataset.

In the above example, all 3 specific types of .json files (or at least two, if we exclude dataset_description.json) at the same top level should then be used while "inheriting".
Instead of the rule for "a single applicable file" there could be a rule (with a validator check) that files at any given level must not be in conflict, in that they must not redefine the same field (i.e. a field F must not be defined in both task-<label>_bold.json and ses-<label>.json files); otherwise the result would be ambiguous, since it would depend on the order in which "inheritance" is done across those files at the same level. If there is no overlap in fields, there is no ambiguity and no need for a restriction.
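
Such a validator check could look roughly like the Python sketch below. It is a simplification under stated assumptions: `find_conflicts` is a hypothetical name, the sidecars are assumed to be already loaded into dictionaries, and the sketch does not check whether two files can actually apply to the same leaf file.

```python
from itertools import combinations


def find_conflicts(sidecars):
    """Return {(file_a, file_b): shared_fields} for same-level files that overlap.

    `sidecars` maps a filename to the metadata dict loaded from that JSON file.
    """
    conflicts = {}
    # Compare every pair of files once, in a deterministic (sorted) order.
    for name_a, name_b in combinations(sorted(sidecars), 2):
        shared = set(sidecars[name_a]) & set(sidecars[name_b])
        if shared:
            conflicts[(name_a, name_b)] = shared
    return conflicts
```

An empty result means no field is defined twice at that level, so the order in which the files are loaded cannot matter.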

@yarikoptic
Collaborator Author

Continuing talking to myself, here is a list of non task- files in the top level to be inherited:

$> ls ds00*/*json | grep -v -e dataset_description -e participants -e task- | xargs -n 1 basename | sort | uniq -c                                                                
      1 acq-epi_T1w.json
      1 acq-flipangle05_run-01_MEFLASH.json
      1 acq-flipangle30_run-01_MEFLASH.json
      1 acq-moldOFF_T1w.json
      1 acq-moldON_T1w.json
      1 acq-mprage_T1w.json
      1 bold.json
      1 dir-0_epi.json
      1 dir-1_epi.json
      1 dir-AP_epi.json
      1 dir-PA_epi.json
      3 dwi.json
      1 inplaneT2.json
      3 phasediff.json
      1 run-1_echo-1_FLASH.json
      1 run-1_echo-2_FLASH.json
      1 run-1_echo-3_FLASH.json
      1 run-1_echo-4_FLASH.json
      1 run-1_echo-5_FLASH.json
      1 run-1_echo-6_FLASH.json
      1 run-1_echo-7_FLASH.json
      1 run-2_echo-1_FLASH.json
      1 run-2_echo-2_FLASH.json
      1 run-2_echo-3_FLASH.json
      1 run-2_echo-4_FLASH.json
      1 run-2_echo-5_FLASH.json
      1 run-2_echo-6_FLASH.json
      1 run-2_echo-7_FLASH.json
     24 T1w.json
      2 T2w.json

attn @tyarkoni (in case you didn't spot my whining here before)

@yarikoptic
Collaborator Author

wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?

@Remi-Gau
Collaborator

wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?

This is the can-of-worms issue. We know it's there and that we are going to have to open it at some point, but nobody likes to eat worms, and not just because some of us are vegetarians.

@effigies
Collaborator

TBH I'm not sure I understand the issue. It's just that we should be able to have multiple applicable JSON files at the same level?

I remember when we made that rule, and it was because the precedence within a level is ambiguous, so overrides could be implementation- or even RNG-dependent.

Given that there is some appetite for removing inheritance altogether in BIDS 2.0, I can't imagine the people who feel that way being pleased with it getting even more byzantine.

@VisLab
Member

VisLab commented Nov 16, 2021

HED (Hierarchical Event Descriptors) relies heavily on inheritance. Specifically the _events.json files can be inherited. This is crucial for annotation, as it allows a user to provide a single file at the top level containing the event annotations for the entire dataset.

Having a single _events.json sidecar at the top level is the recommended process for annotation. Forcing users to make copies of the annotation at lower portions of the dataset means that users of the data would have no idea whether the same annotations were being used across datasets.

HED uses the rule that _events.json files can be placed at any level. A duplicated key at a lower level overrides the contents of that key at a higher level. We haven't considered that there might be multiple applicable _events.json sidecars at the same level.
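
The override rule described here can be sketched in Python as below. This is an assumption-laden illustration, not the HED tools' implementation: `merge_sidecars` is a hypothetical name, and replacing the whole value of a duplicated top-level key (rather than merging nested content) is one reading of the rule.

```python
def merge_sidecars(levels):
    """Merge sidecar dicts ordered from the dataset root down to the leaf.

    Assumes at most one applicable sidecar per level, as described above.
    """
    merged = {}
    for metadata in levels:
        # A key repeated at a lower level replaces the higher-level value.
        merged.update(metadata)
    return merged
```

With a top-level sidecar defining `trial_type` and `response_time` and a subject-level sidecar redefining only `trial_type`, the merged result takes `trial_type` from the subject level and `response_time` from the top level.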

HED really needs to have inheritance, but we only need one applicable sidecar at a given level. @sappelhoff @dungscout96 @smakeig

@Remi-Gau
Collaborator

Remi-Gau commented Jun 9, 2022

@yarikoptic if you have time, would you mind updating the top message so we can know which aspects of this issue have NOT yet been addressed or clarified sufficiently by #946?

@yarikoptic yarikoptic self-assigned this Jul 25, 2022
@yarikoptic
Collaborator Author

I will @Remi-Gau ... I guess feel welcome to close if I don't get back to it let's say by Sep 1 (hopefully before ;)).

@TheChymera
Collaborator

I would just add:

  • If we are to keep inheritance, encoding it in the schema somehow might be a good idea, as currently any validator implementation would need to hard-code both when and how it applies.
  • Perhaps it is worth considering whether we could drop the inheritance principle in the future. The amount of text duplication it prevents is trivial in terms of size on disk, and it makes datasets both more monolithic and less transparent --- particularly for manual inspection.

@VisLab
Member

VisLab commented Aug 3, 2022 via email

@yarikoptic
Collaborator Author

Dear @VisLab , I am 100% with you on this! Please (attn @TheChymera as well) follow up on the dedicated issue for that at bids-standard/bids-2-devel#36. Here we are not getting rid of Inheritance principle but working toward "fixing it"! ;)


Although I see the desire expressed by @TheChymera as "encoding it in the schema somehow", I would argue that "there is no need" (if not impossible).

Rationale: The entire src/schema can be loaded only following an "algorithm" which is coded up in tools/schemacode. schema/README provides description of the related data structures and some parts of the algorithm with that. Using that description in README.md and implementation in dandischematools as "canonical" other tools could provide implementations to load the schema. So we already have a duality of "schema" and "algorithm".

The situation is likely to stay the same for the Inheritance Principle. And IMHO it is OK to just express it as an algorithm (original desire) or a clear set of rules (current formulation) which gets coded up in dandischematools. I think the current formulation in common principles: The Inheritance Principle (let's call it IP here) got much better through the work by @Lestropie in #946. There the IP is expressed as a set of rules, and they can be expressed as an algorithm (multiple versions of it really, let's not get there here ;) ) and coded up. In that PR we touched on an aspect that the IP does not yet cover and which I mentioned in my original description. We agreed with @Lestropie in PR #946 that it needs to be done after that PR, but AFAIK it has not yet been addressed. That should not prevent us though from implementing the current version of the IP (I filed #1181) and then seeing where it is not yet complete or fails empirically.

But as current IP was cleared up in #946 I think it did address most of my original questions -- I think we can let this issue RiP, and instead file more specific issues or PRs to make "inheritance principle better" (hopefully without introducing backward incompatible changes). FTR here is a summary of issues/questions I had in mind and which might have been "addressed" or not (separate issues):

  • comment above: having both ses-X_bold.json (for metadata common to ses-X across all subjects) and task-Y_bold.json at the same top level. Following the 80/20 principle, let's not bother about them. The current rule IP.4 "partially" forbids it. "Partially" because I could have ses-1_bold.json and sub-{1,2,...}/task-Y_bold.json to get around it in some ugly way. But I do consider such combinatorics to be well within those 20% not to bother about;
  • the original description had a case of having both task-task1_acq-X_bold.json and task-task1_bold.json -- I filed a dedicated issue, 'inheritance: "false positive" conflict among metadata files if some entities are missing' (#1182), to address it one way or another;
  • the ideas I was throwing around about inheriting from dataset_description.json (and probably others like participants.tsv etc.) are addressed by the restriction that we must retain the same _suffix. Higher-level tools would "bind" participant etc. metadata with the metadata of specific data files.

@TheChymera
Collaborator

@yarikoptic thanks for pointing out the main issue.

@VisLab I agree that it is more convenient when taking a monolithic approach to the dataset, though I disagree that:

"Now if a change needs to be made [...] It really is a mess."
"Tools can be easily written to summarize information if needed."

I would argue the reverse, that changes (infrequent) can easily be made across low-level files with ubiquitous tools such as find, grep, and sed, and information integration across inheritance levels (frequent) is non-trivial (e.g. #1182 ) and constitutes another nonstandard extraneous library, which may be handled differently in different packages.

In any case, for the time being inheritance will not be dropped due to backwards compatibility, and this is perhaps better discussed in the issue linked by @yarikoptic above.

@Lestropie
Collaborator

Much of the earlier text posted in this issue relates to my follow-up proposal to #946, being #1003. Sorry I didn't find this thread and link to it.

IMO it should be both possible and recommended to define at higher levels of the filesystem hierarchy any metadata that are common to many files at lower levels of the hierarchy based on a set of common entities. This is a natural recapitulation of the hierarchical nature of the metadata. Duplicating such data, whether electively or through obligation due to removal of the inheritance principle, would mean that any application that depends on such information may need to check for consistency of those data across sidecar files, which contra-indicates such a design.

I spent a fair bit of time devising the language of #1003 as a way of systematizing the definition of how sidecar files can be present both within and across filesystem levels such that their relative ordering is wholly unambiguous, without a need to access the contents of those files. I tried to do it in a way that was both comprehensible for users and sufficiently precise to guide software implementations. This was a natural extension of the existing IP, and was adequate for my own use case (though see bids-standard/bids-bep016#50 for the latest on that topic).

@yarikoptic may actually be looking for something even more general & powerful here. As shown in Example 2 in the current iteration of #1003, there are some circumstances in which having multiple sidecar files in a single filesystem hierarchy level is still impermissible: even though both possess a subset of the entities in the data file of interest, there is no objective way to decide the order in which they should be loaded, and therefore, if there is a metadata field that is present in both files, the content of that metadata field as applied to the data file of interest is ambiguous.
Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.
Given the amount of resistance I've had to #1003, I highly doubt the chances of such a proposal getting through; but I wanted to elucidate here fully given the relevance. If anything I think this would actually be more intuitive for users willing to exploit this level of complexity, as they could simply define metadata fields in sidecar files at the correct level with the correct set of entities that define their relevance. But complexity of both understanding and implementation is not something to commit to without very careful thought.

@yarikoptic
Collaborator Author

Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.

;-) "great minds think alike" (using it for the 2nd time this Monday, a good sign!!) -- I left https://github.com/bids-standard/bids-specification/pull/1003/files#r940247739 before reading this reply (due to the ordering of my mailbox ;-) ). Yes, it would add more complexity to the validator and somewhat to the implementation of the inheritance principle, but IMHO it is worthwhile.


6 participants