Description
It is a next step to the discussion which happened in
On a recent road trip, @effigies and I briefly discussed it and so far did not see a showstopper, but it would require more minds to analyze.
ATM, one of the problems with the inheritance principle is its unclear semantics when a value is to be modified down the hierarchy: the order can be unclear when there are multiple "candidate" files, it is unclear how to "remove" a value, etc.
And overall, it is cumbersome for a human to "gather" the final value: for a file down the hierarchy, one needs to go through all possibly inherited files to arrive at it. But what if we take my suggestion in the aforementioned issue further:
- retain ability to "chain" candidates for metadata from higher to lower levels as in current inheritance principle
- completely disallow overloading the value at lower (deeper in hierarchy) levels

Corollaries:
- if a value is present at different levels (e.g. the entire dataset and then a specific sidecar .json), it must be identical/consistent across all levels of inheritance, or otherwise not be given at any higher level
- if a particular subject/session has a value different from the one defined for the others at a higher (dataset) level, we need to remove that value from the higher level and define it at the lower (e.g. subject/session) level
It will be a (now doable) job for a validator to ensure that all duplicated (across levels, if any) metadata is consistent.
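To make the validator's job concrete, here is a minimal sketch of such a consistency check. It assumes the chain of inherited files has already been resolved and loaded into plain dictionaries, ordered from the top level down; the function name `find_overloads` and the example file names are hypothetical, not part of any existing validator.

```python
from typing import Any


def find_overloads(
    chain: list[tuple[str, dict[str, Any]]],
) -> list[tuple[str, str, str]]:
    """Report keys whose value differs between files of one inheritance chain.

    `chain` is ordered from the highest level (dataset root) to the lowest.
    Each conflict is returned as (key, higher_file, lower_file).
    """
    first_seen: dict[str, tuple[str, Any]] = {}
    conflicts = []
    for path, metadata in chain:
        for key, value in metadata.items():
            if key not in first_seen:
                # First (highest) level at which this key is defined.
                first_seen[key] = (path, value)
            elif first_seen[key][1] != value:
                # Same key redefined with a different value: an "overload".
                conflicts.append((key, first_seen[key][0], path))
    return conflicts


chain = [
    ("task-rest_bold.json", {"RepetitionTime": 2.0, "Manufacturer": "Siemens"}),
    (
        "sub-01/func/sub-01_task-rest_bold.json",
        {"RepetitionTime": 1.5, "Manufacturer": "Siemens"},
    ),
]
print(find_overloads(chain))
```

Under the proposed rule, a non-empty result would be a validation error, whereas a duplicated but identical value (`Manufacturer` above) would pass.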
As a result, users would gain the convenience that a top-level metadata file provides "guaranteed" correct metadata across all subjects/sessions, which is not the case currently, since a value can be changed further down the order of inheritance.
- FWIW, we already do something like that in heudiconv, where top-level `task-*_bold.json` files collate all identical values across subjects/sessions, which makes it easy to see what is common (e.g. scanner ID etc.)
- Conceptually, this is what we already have in BIDS ATM: e.g. `participants.tsv` summarizes metadata across participants, and we expect it to be consistent with possible other phenotypic information to be found in subjects/sessions.
- It would somewhat ease composition of datasets, as in "Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level" (#59). Again, metadata present at a higher level would remain consistent with the lower levels, which would be easier to achieve (copy) and ensure (validator).
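The "collate all identical values" idea from the first bullet can be sketched as follows. This is only an illustration of the concept, not heudiconv's actual implementation; the function name `common_metadata` and the example values are made up.

```python
from typing import Any


def common_metadata(sidecars: list[dict[str, Any]]) -> dict[str, Any]:
    """Keep only the key/value pairs that are identical in every sidecar."""
    if not sidecars:
        return {}
    common = dict(sidecars[0])
    for metadata in sidecars[1:]:
        # Drop any key that is missing or has a different value in this sidecar.
        common = {
            key: value
            for key, value in common.items()
            if key in metadata and metadata[key] == value
        }
    return common


per_subject = [
    {"Manufacturer": "Siemens", "RepetitionTime": 2.0},
    {"Manufacturer": "Siemens", "RepetitionTime": 1.5},
]
print(common_metadata(per_subject))
```

The result is what could safely live in a top-level file under the proposed rule; anything that varies (`RepetitionTime` here) would have to stay at the lower levels only.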
Attn @Lestropie, as he has spent the most time improving the Inheritance principle definition, and @dorahermes, who is an active proponent and user of it: do you think such a "simplification" (removal of "value overload") of inheritance would simplify things while remaining usable? Or maybe I am missing some common use case that such an additional "restriction" would disallow?
I think it might be worth writing a checker and applying it across all OpenNeuro datasets to see whether we run into such data "overloads". Which tool/functionality already implements the inheritance principle "closest to the bible", e.g. one that would pretty much return a list of lists of .json/.tsv files in their "inherited" bundles? (specific code examples would be welcome)
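As a starting point for discussion, here is a rough sketch of how the "inherited bundle" of `.json` sidecars for one data file could be gathered. It is deliberately simplified and is not a faithful implementation of the inheritance principle: it assumes a sidecar applies when its suffix matches and its entities are a subset of the data file's entities, it ignores `.tsv` files, and it does not check that at most one file applies per directory level. The names `parse_name` and `sidecar_chain` are hypothetical.

```python
from pathlib import Path


def parse_name(name: str) -> tuple[set[str], str]:
    """Split 'sub-01_task-rest_bold.json' into ({'sub-01', 'task-rest'}, 'bold')."""
    stem = name.split(".", 1)[0]  # drop all extensions, e.g. '.nii.gz'
    *entities, suffix = stem.split("_")
    return set(entities), suffix


def sidecar_chain(dataset_root: Path, data_file: Path) -> list[Path]:
    """List the .json sidecars applicable to data_file, top level first."""
    entities, suffix = parse_name(data_file.name)
    # Directories from the dataset root down to the one holding data_file.
    levels = [dataset_root]
    for part in data_file.parent.relative_to(dataset_root).parts:
        levels.append(levels[-1] / part)
    chain = []
    for level in levels:
        for candidate in sorted(level.glob("*.json")):
            cand_entities, cand_suffix = parse_name(candidate.name)
            # Simplified applicability rule (an assumption, see above).
            if cand_suffix == suffix and cand_entities <= entities:
                chain.append(candidate)
    return chain
```

Calling `sidecar_chain` for every data file in a dataset would yield the "list of lists" asked about above; loading each chain with `json.load` and comparing values key by key would then reveal any overloads.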
Edit:
- might cause trouble with "Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level" (#59), since we do need to overload a value. But this file isn't really subject to the inheritance principle in its current formulation, although it is information pertinent to all files