Formalize concept/specification of the "BIDS Extensions" #74

Open

Description

I have always mixed up the "Extension" currently used in BEP with "Enhancement". But quite often "Extension" means an additional part that is not per se an integral part of the whole, so it is more of a "plug-in".

Other standards

Similar concepts are present in

  • NWB: https://nwb-extensions.github.io -- components which provide additional elements (schema) on top of the core NWB schema. The main NWB functionality can validate extensions as well, since their schema is also embedded within NWB files.
  • Zarr ZEP0004: added "namespaces" just to list which additional "namespaces" are present in the metadata. I proposed formalizing it further, which some agreed was a good idea, but overall it seemed "too much to swallow": https://github.com/zarr-developers/zarr-specs/pull/262/files#r1540042136
  • DICOM: ... I am not very familiar with it, but @neurolabusc could hopefully help here... it allows for manufacturers' ad-hoc metadata fields, which in turn should conform to basic data types so they could also be somewhat validated

BIDS

In BIDS we already have some

  • HED - already in BIDS, but tangled up. We have
    • HEDVersion in dataset_description.json
    • HED tag columns in _events.{tsv,json}.
    • and the validator can validate against the HED standard/schema (@dorahermes mentioned that, I didn't check)
  • to a degree, the current "supported data formats" are extensions to BIDS, since in principle we can offload validation of their internal structure/data/metadata to external tools (.json, .nii.gz, .zarr, .ome.zarr, .nwb to name a few). AFAIK bids-validator might already be doing some "built-in" validation, e.g. for JSON and .nii.gz.
  • we allow for arbitrary other metadata fields in .json as long as they do not conflict
    • but in principle they could be formalized too, e.g. in heudiconv we use pydicom IIRC to extract more metadata fields from DICOMs and dump them into the sidecar .json. In theory we could provide a schema for those, and announce that the dataset has such fields
  • BIDS models also already fit somewhat as an "extension" if a dataset includes models for its analysis somewhere (I think this is not formalized, although it easily could be with "Make it possible to specify folders layout to be other than sub-{label}/[ses-{label}/]" #54 on the model entity)
  • ... I bet there is more ...
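
To make the "tangled up" HED case concrete, here is a minimal sketch of how a tool could summarize where HED currently shows up in a dataset. It works on already-parsed JSON, the function name `summarize_hed` is hypothetical, and the example version string and tags are only illustrative values, not taken from any real dataset.

```python
def summarize_hed(description, events_sidecar):
    """Report where HED shows up in already-parsed BIDS JSON documents.

    description    -- parsed content of dataset_description.json
    events_sidecar -- parsed content of an *_events.json sidecar
    """
    # A sidecar column carries HED annotations under a "HED" key
    # inside that column's definition.
    tagged = sorted(
        name
        for name, spec in events_sidecar.items()
        if isinstance(spec, dict) and "HED" in spec
    )
    return {"version": description.get("HEDVersion"), "columns": tagged}


# Illustrative inputs (assumed values, for demonstration only).
description = {"Name": "demo", "BIDSVersion": "1.8.0", "HEDVersion": "8.2.0"}
sidecar = {
    "trial_type": {"HED": {"go": "Sensory-event", "stop": "Agent-action"}},
    "response_time": {"Units": "s"},
}
print(summarize_hed(description, sidecar))
```

The point is that HED is discovered from two unrelated places today; a single formal extension record would let tools find it in one.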

BEPs

And some ongoing BEPs are likely to be well represented as "Extensions", e.g.

Projects building atop

And some ongoing projects which could have benefited from formalization:

Proposal elements

We should formalize within the BIDS schema how to

  • specify extensions within dataset_description.json. E.g. migrate the currently used HEDVersion into
  "BIDSExtensions": {
      "HED": {
          "version": "..."
      }
  }
  • ideally: allow extensions to carry their own extensions to the BIDS schema (which files, which metadata fields, and where) so they are "overlaid" on top of the schema. That would be difficult to do right (it pretty much needs a "patch mechanism"; it might be reasonable if limited to "adding", not removing/editing), but it is something to think about.
    • there they should add hooks into external
      • schemas (jsonschema, linkml.io) to validate metadata if present
      • formats -- tools to validate data or metadata
    • allow for embedding of the extension specification (as extension to schema) within e.g. schemas/ and have entries like
  "BIDSExtensions": {
      "MyLab": {
          "version": "...",
          "schema": "schemas/mylab/"
      }
  }

so that bids-validator could take that schema overlay, add it to core BIDS, and validate the dataset while accounting for the custom additions (attn @effigies @nellh @rwblair)

  • within BIDS have a registry of "approved" extensions (e.g. we have HED now), so records above could point to those, but they shouldn't be "melded" into BIDS but rather be those additional components.

I even wonder if we should list existing BEP elements there, i.e. to provide a quick way to identify components of the dataset without navigating it fully. I.e. to annotate what data types etc that dataset has...
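
As a rough sketch of the dataset_description.json side of this proposal: the helper below moves the existing HEDVersion under the proposed "BIDSExtensions" key, and a second helper lists the extensions that declare an embedded schema directory. "BIDSExtensions" and its "version"/"schema" fields are only what is proposed above, not part of any released BIDS version, and both function names are hypothetical.

```python
import copy


def migrate_hed_version(description):
    """Return a copy of parsed dataset_description.json content with
    HEDVersion moved under the proposed "BIDSExtensions" key."""
    migrated = copy.deepcopy(description)  # never mutate the caller's data
    hed_version = migrated.pop("HEDVersion", None)
    if hed_version is not None:
        extensions = migrated.setdefault("BIDSExtensions", {})
        extensions.setdefault("HED", {})["version"] = hed_version
    return migrated


def extension_schema_dirs(description):
    """Map extension name -> dataset-relative schema directory, if declared."""
    return {
        name: record["schema"]
        for name, record in description.get("BIDSExtensions", {}).items()
        if isinstance(record, dict) and "schema" in record
    }


# Illustrative input (assumed values).
old = {
    "Name": "demo",
    "HEDVersion": "8.2.0",
    "BIDSExtensions": {"MyLab": {"version": "0.1", "schema": "schemas/mylab/"}},
}
new = migrate_hed_version(old)
print(new["BIDSExtensions"])
print(extension_schema_dirs(new))
```

A validator could then walk the directories returned by `extension_schema_dirs` to pick up the overlays, while registry-approved extensions such as HED would need only the version record.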

Pros / Cons

Some might argue that such extensions might "dilute" BIDS and make it less formalized. On the other hand, such an extension mechanism could allow BIDS datasets to be formalized so that they are "valid" beyond the BIDS specification, and avoid people simply hiding such components within their .bidsignore files.

Similarly to how it works in the NWB world (attn @oruebel @rly @bendichter), some BEPs could potentially be developed as "BIDSExtensions" (patches to the schema), and thus datasets using them could potentially be prepared/validated, even mixing different active BEPs (which now would require some custom BIDSVersion and some custom schema merged between BEPs... not really formalized), until a BEP is accepted and its "patches" to the schema become part of the BIDSVersion itself.
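
To illustrate what an "adding only" patch mechanism could look like when mixing several BEPs, here is a minimal sketch of an overlay merge that accepts new entries but refuses to edit existing ones. The schema layout used in the example (a nested dict with "metadata" and "suffixes" keys) is a simplification invented for illustration, not the actual BIDS schema structure, and `overlay` is a hypothetical helper.

```python
from functools import reduce


def overlay(base, patch, path=""):
    """Return a new dict: base plus the additions from patch.

    Only additions are allowed. A patch that would change an existing
    non-dict value raises ValueError; neither input is mutated.
    """
    merged = dict(base)
    for key, value in patch.items():
        where = f"{path}.{key}" if path else key
        if key not in merged:
            merged[key] = value  # brand-new entry: accept
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = overlay(merged[key], value, where)  # recurse
        elif merged[key] != value:
            raise ValueError(f"patch would edit existing entry: {where}")
    return merged


# Hypothetical, simplified schema fragments.
core = {"metadata": {"RepetitionTime": {"type": "number"}}}
bep_a = {"metadata": {"MyLabField": {"type": "string"}}}
bep_b = {"suffixes": {"mylab": {}}}

combined = reduce(overlay, [bep_a, bep_b], core)
print(sorted(combined["metadata"]))
```

Because each overlay only adds, the order in which active BEP patches are applied does not matter as long as they do not define the same entry differently, and such a conflict is reported instead of being silently resolved.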

Related Existing Issues

BIDS 2.0 (what is feasible)

I think that for BIDS 2.0 we would not be able to solve this fully, but it is feasible to at least partially address it, e.g. by formalizing BIDSExtensions within dataset_description.json, to prepare the BIDS 2.x namespace to annotate/accept the aforementioned extensions in a formalized way.

Labels: modularity (Issues affecting modularity and composition of BIDS datasets)

Project status: BIDS 3.0