Capturing metadata about the file itself #758

Since the release of version 1.0.0 of SDRF, we have had discussions about how to capture metadata about the file itself. For example:

- SDRF version: This information is now quite relevant because, with this new version, we need to differentiate between versions of the format.
- SDRF Template: This will define which template was used to annotate the dataset. 
- SDRF Template Version: This will define the version of the template. 

Additional information could include the annotation software, its version, etc.

To serialise this information into SDRF, there are several possible approaches:

1 - Currently, for example, lesSDRF uses extra `comment` columns to annotate this information, but this is quite repetitive, since each column mostly holds a single value repeated on every row.

```
source name characteristics[organism] ... source name technology type ... comment[sdrf version] comment[sdrf template]
```

2 - Use the notation of some genomics formats such as VCF and put this information in comment lines, something like:

```
#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF

source name characteristics[organism] ... source name technology type ... 
```

3 - Use the same notation as in option 2, but at the bottom of the file, like:

```
source name characteristics[organism] ... source name technology type ... 

#sdrf version: 1.1.0
#sdrf template: human
#sdrf template version: 1.0.0
#annotation software: lesSDRF
```

Options 2 and 3 are compatible with the Python ecosystem, including pandas etc. Other solutions?
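
To make options 2 and 3 concrete, here is a minimal sketch of how a reader could separate `#key: value` lines from the table using only the Python standard library. The function name `read_sdrf_with_metadata` is hypothetical, and the `#key: value` syntax and tab delimiter are assumptions taken from the examples above, not an agreed specification.

```python
import csv
import io


def read_sdrf_with_metadata(text):
    """Split SDRF text into file-level metadata and the table.

    Assumption: metadata lines start with '#' and look like '#key: value'.
    They are collected wherever they appear, so the same reader handles
    both the header placement (option 2) and the footer placement (option 3).
    """
    metadata = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if sep:  # ignore '#' lines that have no 'key: value' shape
                metadata[key.strip()] = value.strip()
        elif line.strip():  # skip blank separator lines
            table_lines.append(line)

    # Plain lists are used instead of dicts because SDRF files may repeat
    # column names (for example several comment[...] columns).
    reader = csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    header = next(reader, [])
    rows = list(reader)
    return metadata, header, rows


example_top = (
    "#sdrf version: 1.1.0\n"
    "#sdrf template: human\n"
    "\n"
    "source name\tcharacteristics[organism]\n"
    "sample 1\thomo sapiens\n"
)
metadata, header, rows = read_sdrf_with_metadata(example_top)
print(metadata)
print(header, rows)
```

With pandas, `pandas.read_csv(path, sep="\t", comment="#")` should skip such lines entirely, but the `comment` option also cuts off a data line at any `#` that appears inside a value, so that is worth checking against real SDRF files before relying on it.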
