Labels: PSI-Discussion, Specification, enhancement
Since the release of SDRF version 1.0.0, we have had discussions about how to capture metadata about the file itself. For example:
- SDRF version: This information is now quite relevant because, with this new version, we need to differentiate better between versions of the format.
- SDRF Template: This will define which template was used to annotate the dataset.
- SDRF Template Version: This will define the version of the template.
Additional information could include the annotation software, its version, etc.
To serialise this information into SDRF, there are multiple possible approaches:
1 - Currently, for example, lesSDRF uses extra comment columns to annotate this information, but this repeats the same information on every row, since each of these columns mostly holds a single value.
source name characteristics[organism] ... source name technology type ... comment[sdrf version] comment[sdrf template]
2 - Use the notation of some genomics formats, such as VCF, and put this information in comment lines, something like:
#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF
source name characteristics[organism] ... source name technology type ...
3 - Use the same notation as in option 2, but at the bottom of the file, like:
source name characteristics[organism] ... source name technology type ...
#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF
Solutions 2 and 3 are compatible with the Python ecosystem, including pandas. Other solutions?
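To make the comparison concrete, here is a minimal sketch of how a reader could handle the `#key: value` lines from options 2 and 3. It uses only the Python standard library. The function name `split_sdrf` and the sample row values are hypothetical, and the `#key: value` syntax is just the notation proposed in this issue, not part of any released SDRF specification.

```python
import csv
import io


def split_sdrf(text):
    """Separate '#key: value' metadata lines from the tab-separated table.

    Works whether the comment lines sit at the top (option 2) or at the
    bottom (option 3) of the file. Hypothetical helper, not an existing API.
    """
    metadata = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if sep:  # skip comment lines that are not "key: value"
                metadata[key.strip()] = value.strip()
        elif line.strip():
            table_lines.append(line)
    # csv.reader keeps duplicate column names, which SDRF files can contain.
    parsed = list(csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t"))
    header, rows = (parsed[0], parsed[1:]) if parsed else ([], [])
    return metadata, header, rows


# Illustrative content only; the row values are made up for this example.
table = (
    "source name\tcharacteristics[organism]\ttechnology type\n"
    "sample 1\thomo sapiens\tproteomic profiling by mass spectrometry\n"
)
comments = "#sdrf version: 1.1.0\n#sdrf template: human\n"

top_metadata, top_header, top_rows = split_sdrf(comments + table)      # option 2
bottom_metadata, bottom_header, bottom_rows = split_sdrf(table + comments)  # option 3
```

With pandas, `pandas.read_csv(path, sep="\t", comment="#")` skips such lines without any extra code, but it also cuts off any data line at a `#` that appears inside a value, so a pre-pass like the one above may be safer if cell values can contain `#`.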