Capturing metadata about the file itself #758

@ypriverol

Since the release of SDRF version 1.0.0, we have discussed how to capture metadata about the file itself, for example:

  • SDRF version: This information is now quite relevant because, with the new version, we need to differentiate between versions of the format.
  • SDRF Template: This will define which template was used to annotate the dataset.
  • SDRF Template Version: This will define the version of the template.

Additional information could include the annotation software, its version, etc.

There are several possible approaches to serialising this information into SDRF:

1 - Currently, lesSDRF, for example, uses extra comment columns to annotate this information, but the same value is repeated in every row, with mostly a single value per column.

source name characteristics[organism] ... source name technology type ... comment[sdrf version] comment[sdrf template]

2 - Use the notation of some genomics formats such as VCF and put this information in comment lines at the top of the file, something like:

#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF

source name characteristics[organism] ... source name technology type ... 

3 - Use the same notation as in option 2, but at the bottom of the file, like:

source name characteristics[organism] ... source name technology type ... 

#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF
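To make options 2 and 3 concrete, here is a minimal sketch of how a reader could separate the `#key: value` lines from the table, whether they sit at the top or the bottom of the file. It uses only the Python standard library. The function name `split_sdrf`, the sample row, and the choice to split on the first `:` are assumptions for illustration, not part of any SDRF specification or existing tool.

```python
import csv
import io


def split_sdrf(text, prefix="#"):
    """Split SDRF text into file-level metadata and the tab-delimited table.

    Hypothetical helper: lines starting with `prefix` are read as
    'key: value' pairs, wherever they appear in the file.
    """
    metadata = {}
    table_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank separator lines
        if stripped.startswith(prefix):
            # Split on the first ':' only, so values may contain colons.
            key, sep, value = stripped[len(prefix):].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
            continue
        table_lines.append(line)
    # csv.reader (not DictReader) so repeated column names are preserved.
    rows = list(csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t"))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    return metadata, header, body


# Illustrative content only; the row values are made up for the example.
HEADER = "source name\tcharacteristics[organism]\ttechnology type"
ROW = "sample 1\tHomo sapiens\tproteomic profiling by mass spectrometry"
META = "#sdrf version: 1.1.0\n#sdrf template: human"

top = split_sdrf(META + "\n\n" + HEADER + "\n" + ROW + "\n")      # option 2
bottom = split_sdrf(HEADER + "\n" + ROW + "\n\n" + META + "\n")   # option 3
```

With this kind of reader, both placements yield the same metadata and the same table, so the choice between options 2 and 3 is mostly about convention and about tools that expect the column header on the first line.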

Options 2 and 3 are compatible with the Python ecosystem, including pandas (for example, `read_csv` can skip lines that start with `#` through its `comment` parameter). Are there other solutions?
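For comparison between option 1 and option 2, the sketch below lifts file-level comment columns that hold a single repeated value into `#key: value` header lines. It is only an illustration under stated assumptions: the function name `lift_file_level_comments` is hypothetical, the default keys are the two column names from the option 1 example, and the sample rows are made up.

```python
import csv
import io


def lift_file_level_comments(text, keys=("sdrf version", "sdrf template")):
    """Move constant 'comment[<key>]' columns into '#key: value' lines.

    Hypothetical helper: only columns whose key is listed in `keys` and
    whose value is identical in every row are lifted; all other columns
    are kept untouched.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    lifted = {}
    keep = []
    for i, name in enumerate(header):
        key = name[len("comment["):-1] if name.startswith("comment[") else None
        values = {row[i] for row in body}
        if key in keys and name.endswith("]") and len(values) == 1:
            lifted[key] = values.pop()
        else:
            keep.append(i)
    out = io.StringIO()
    for key, value in lifted.items():
        out.write(f"#{key}: {value}\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in body:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


# Illustrative option 1 style input; values are made up for the example.
option1 = (
    "source name\tcharacteristics[organism]\tcomment[sdrf version]\tcomment[sdrf template]\n"
    "sample 1\tHomo sapiens\t1.1.0\thuman\n"
    "sample 2\tHomo sapiens\t1.1.0\thuman\n"
)
option2 = lift_file_level_comments(option1)
```

Restricting the lift to an explicit list of keys is deliberate: other comment columns can be constant by coincidence while still describing the samples rather than the file.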

Labels

PSI-Discussion; Specification (specification issues related to PRIDE formats, API, etc.); enhancement (new feature or request)
