
Replace "inheritance" with "summarization" principle #65

Open

Description

This is a follow-up to the discussion in

On a recent road trip, @effigies and I briefly discussed this and have not yet found a showstopper, but it needs more minds to analyze.

At the moment, one problem with the inheritance principle is its unclear semantics when a value is modified down the hierarchy: the order can be unclear when there are multiple "candidate" files, it is unclear how to "remove" a value, etc.
Overall, it is also cumbersome for a human to "gather" the final value: for a file down the hierarchy, one needs to go through all possibly inherited files to arrive at the final value. But what if we take my suggestion in the aforementioned issue further:

  • retain the ability to "chain" candidate metadata files from higher to lower levels, as in the current inheritance principle
  • completely disallow overloading a value at lower (deeper in the hierarchy) levels. Corollaries:
    • if a value is present at different levels (e.g. for the entire dataset and then in a specific sidecar .json), it must be identical/consistent across all levels of inheritance; otherwise it must not be given at any higher level
    • if a particular subject/session has a value different from the one defined for the others at a higher (dataset) level, we need to remove that value from the higher level and define it at the lower (e.g. subject/session) level

Ensuring that all metadata duplicated across levels (if any) is consistent would then be a (now doable) job for the validator.
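A minimal sketch of such a consistency check, assuming the inheritance chain for one data file has already been collected as (filename, metadata) pairs. `find_conflicts` and the example file contents are hypothetical, not part of any existing validator:

```python
from typing import Any


def find_conflicts(chain: list[tuple[str, dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Return {key: {filename: value}} for every key whose value is not identical
    across all files of one inheritance chain (disallowed under the proposal)."""
    values_by_key: dict[str, dict[str, Any]] = {}
    for filename, metadata in chain:
        for key, value in metadata.items():
            values_by_key.setdefault(key, {})[filename] = value
    conflicts: dict[str, dict[str, Any]] = {}
    for key, by_file in values_by_key.items():
        values = list(by_file.values())
        # Compare with == rather than via a set, since JSON values may be lists/dicts.
        if any(value != values[0] for value in values[1:]):
            conflicts[key] = by_file
    return conflicts


# Hypothetical chain: dataset-level file first, then the run's own sidecar.
chain = [
    ("task-rest_bold.json", {"RepetitionTime": 2.0, "Manufacturer": "Siemens"}),
    ("sub-01/func/sub-01_task-rest_bold.json", {"RepetitionTime": 1.5, "Manufacturer": "Siemens"}),
]
print(find_conflicts(chain))
# {'RepetitionTime': {'task-rest_bold.json': 2.0, 'sub-01/func/sub-01_task-rest_bold.json': 1.5}}
```

Here `Manufacturer` is duplicated but consistent, so it is allowed; `RepetitionTime` would be flagged, and the fix would be to drop it from the dataset-level file and specify it per subject/session.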

As a result, we would give users a convenience: looking at a top-level metadata file provides "guaranteed" correct metadata across all subjects/sessions. That is not the case currently, since a value can be changed further down the order of inheritance.
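For contrast, a minimal sketch of the current override semantics, assuming the applicable files have already been found and ordered from the top level down (that ordering step being exactly what can be ambiguous). The function name and values are hypothetical:

```python
from typing import Any


def resolve_with_overrides(chain: list[dict[str, Any]]) -> dict[str, Any]:
    """Apply metadata files from the top level (least specific) down to the
    sidecar next to the data file; a key defined lower down overrides the
    same key defined higher up."""
    effective: dict[str, Any] = {}
    for level in chain:  # ordered from top level to most specific
        effective.update(level)
    return effective


dataset_level = {"RepetitionTime": 2.0, "Manufacturer": "Siemens"}
run_sidecar = {"RepetitionTime": 1.5}
print(resolve_with_overrides([dataset_level, run_sidecar]))
# {'RepetitionTime': 1.5, 'Manufacturer': 'Siemens'}
```

Reading only the dataset-level file would suggest a RepetitionTime of 2.0, although 1.5 is what applies to this run; under the proposed rule, such a chain would instead be rejected by the validator.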

  • FWIW, we already do something like this in heudiconv, where top-level task-*_bold.json files collate all values that are identical across subjects/sessions; this makes it easy to see what is common (e.g. scanner ID).
  • Conceptually, this is what we already have in BIDS: e.g. participants.tsv summarizes metadata across participants, and we expect it to be consistent with any other phenotypic information found in subject/session folders.
    • Hence I think it also relates to BEP036 (Phenotypic Data Guidelines), attn @surchs @ericearl (I just created the @bids-standard/bep036 team), where the discussion circles around being able to "segregate" metadata into the subject/session level while keeping it consistent at the top level (under the phenotype/ folder).
  • It would also make composition, as in #59 (Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level), somewhat easier. Again, metadata present at a higher level would remain consistent with the lower levels, which would be easier to achieve (copy) and to ensure (validator).
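A minimal sketch of such summarization, in the spirit of what heudiconv does but not its actual code; `summarize` and the example values are hypothetical:

```python
from typing import Any


def summarize(sidecars: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the key/value pairs that are present and identical in every
    sidecar; only these can be stated once at a higher level without
    contradicting any lower level."""
    if not sidecars:
        return {}
    common: dict[str, Any] = {}
    for key, value in sidecars[0].items():
        if all(key in other and other[key] == value for other in sidecars[1:]):
            common[key] = value
    return common


per_subject = [
    {"Manufacturer": "Siemens", "RepetitionTime": 2.0, "EchoTime": 0.03},
    {"Manufacturer": "Siemens", "RepetitionTime": 2.0, "EchoTime": 0.035},
]
print(summarize(per_subject))
# {'Manufacturer': 'Siemens', 'RepetitionTime': 2.0}
```

EchoTime differs between the two sidecars, so it stays at the subject/session level, as required by the second corollary above.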

Attn @Lestropie, as he has spent the most time improving the definition of the Inheritance principle, and @dorahermes, who is an active proponent and user of it: do you think such a "simplification" (removal of "value overload") of inheritance would simplify things and remain usable? Or maybe I am missing some common use case that this additional "restriction" would disallow?

I think it might be worth writing a checker and applying it across all OpenNeuro datasets to see whether we run into such data "overloads". Which tool/functionality already implements the inheritance principle "closest to the bible", e.g. one that would return a list of lists of .json/.tsv files grouped into their "inherited" bundles? (Specific code examples would be welcome.)
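As a rough starting point, here is a standalone sketch of such a checker (not an existing tool). It assumes a simplified applicability rule: a .json file applies to a data file if it sits in the same folder or an ancestor folder up to the dataset root, has the same suffix, and its entities are a subset of the data file's entities. It ignores other details of the specification, and an existing implementation such as PyBIDS may track the specification more closely. All function names are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Optional


def parse_name(name: str) -> Optional[tuple[dict[str, str], str]]:
    """'sub-01_task-rest_bold.nii.gz' -> ({'sub': '01', 'task': 'rest'}, 'bold').
    Returns None for names that are not entity-based (e.g. dataset_description.json)."""
    stem = name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    if not all("-" in pair for pair in pairs):
        return None
    return dict(pair.split("-", 1) for pair in pairs), suffix


def applies_to(sidecar_name: str, data_name: str) -> bool:
    """Simplified rule: same suffix, sidecar entities are a subset of the data file's."""
    sidecar, data = parse_name(sidecar_name), parse_name(data_name)
    if sidecar is None or data is None:
        return False
    return sidecar[1] == data[1] and sidecar[0].items() <= data[0].items()


def inheritance_bundle(data_file: Path, root: Path) -> list[Path]:
    """Applicable .json files, ordered from the dataset root down to the data file's folder."""
    bundle: list[Path] = []
    directory = data_file.parent
    while True:
        # More than one hit within a single folder is itself an ambiguity worth
        # reporting; this sketch simply keeps all of them.
        level = [p for p in sorted(directory.glob("*.json")) if applies_to(p.name, data_file.name)]
        bundle = level + bundle  # prepend, so higher levels come first
        if directory == root or directory == directory.parent:
            break
        directory = directory.parent
    return bundle


def conflicting_keys(chain: list[dict[str, Any]]) -> set[str]:
    """Keys whose values differ between files of one bundle (an "overload")."""
    first_seen: dict[str, Any] = {}
    conflicts: set[str] = set()
    for metadata in chain:
        for key, value in metadata.items():
            if key in first_seen and first_seen[key] != value:
                conflicts.add(key)
            first_seen.setdefault(key, value)
    return conflicts


def find_overloads(root: Path) -> dict[str, set[str]]:
    """Map each data file (relative path) to the keys overloaded in its bundle."""
    report: dict[str, set[str]] = {}
    for data_file in sorted(root.glob("sub-*/**/*")):
        if data_file.is_dir() or data_file.suffix == ".json":
            continue
        chain = [json.loads(p.read_text()) for p in inheritance_bundle(data_file, root)]
        keys = conflicting_keys(chain)
        if keys:
            report[str(data_file.relative_to(root))] = keys
    return report
```

Running `find_overloads(Path("/path/to/dataset"))` (path hypothetical) over each OpenNeuro dataset would give a first estimate of how often values are currently overloaded.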



Metadata

Assignees

No one assigned

    Labels

    inheritance, modularity (Issues affecting modularity and composition of BIDS datasets)

    Type

    No type

    Projects

    • Status

      Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
