Major PR to update some inconsistencies in the specification. #726

Merged: 12 commits, Oct 11, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -57,3 +57,4 @@ release.properties

new_path/
/new_path/
.vscode/
21 changes: 20 additions & 1 deletion sdrf-proteomics/README.adoc
@@ -115,6 +115,7 @@ The list of ontologies/controlled vocabularies (CV) supported are:
- NCBI organismal classification
- PATO - the Phenotype and Trait Ontology
- PRIDE Controlled Vocabulary (CV)
- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources.

[[sdrf-file-format]]
== SDRF-Proteomics file format
@@ -190,7 +191,7 @@ NOTE: Additional characteristics can be added depending on the type of the exper

Some important notes:

- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism.
- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism. However, the values can come from EFO or from other ontologies. For example, for diseases we RECOMMEND using MONDO because it has better coverage than EFO.

- Multiple values (columns) for the same characteristics term are allowed in SDRF-Proteomics. However, it is RECOMMENDED not to use the same column in the same file. If you have multiple phenotypes, you can specify what it refers to or use another more specific term, e.g., "immunophenotype".

@@ -242,6 +243,24 @@ The model of the mass spectrometer SHOULD be specified as _comment[instrument]_.

Additionally, it is strongly RECOMMENDED to include comment[MS2 analyzer type]. This is important, e.g., for Orbitrap models where MS2 scans can be acquired either in the Orbitrap or in the ion trap. Setting this value allows differentiating high-resolution MS/MS data. Possible values of _comment[MS2 analyzer type]_ are mass analyzer types.

[[technology-type]]
=== Technology type

Technology type is used in SDRF and MAGE-TAB formats to specify the technology applied in the study to capture the data. For transcriptomics, common values include technologies such as microarray, RNA-seq, and ChIP-seq (as seen in https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-13567[ArrayExpress Example]). In SDRF-Proteomics, the technology type field is REQUIRED to describe the experimental approach used to generate the data. We RECOMMEND including the technology type column immediately after the `assay name` column in the SDRF file, clearly indicating which technology was used to produce the data files.

|===
| | assay name | technology type
|sample 1| run 1 | proteomic profiling by mass spectrometry
|===

NOTE: In some original templates, the technology type column was placed before the assay name. The technology type column is therefore allowed to appear either directly before or directly after the assay name column, but we RECOMMEND placing it after the assay name for consistency.

For proteomics experiments, the possible values for technology type can be obtained from the PRIDE Ontology term https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663[technology type].

The valid values are:

- proteomic profiling by mass spectrometry
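The placement and value rules described in this section can be sketched as a small check. This is a minimal illustration, not part of the specification or of any existing validator: the function name `check_technology_type` and the `VALID_TECHNOLOGY_TYPES` set are hypothetical, and the single allowed value is taken from the example table above.

```python
# Hypothetical helper; the names below are assumptions made for illustration.
# The allowed value is copied from the example table in this section.
VALID_TECHNOLOGY_TYPES = {"proteomic profiling by mass spectrometry"}


def check_technology_type(header, rows):
    """Return a list of problems found with the technology type column.

    header: list of column names from the first line of an SDRF file.
    rows:   list of rows, each a list of cell values aligned with header.
    """
    names = [h.strip().lower() for h in header]
    if "technology type" not in names:
        return ["missing REQUIRED column: technology type"]
    if "assay name" not in names:
        return ["missing column: assay name"]

    problems = []
    tech = names.index("technology type")
    assay = names.index("assay name")
    if abs(tech - assay) != 1:
        # Only directly before or directly after assay name is allowed.
        problems.append("technology type is not adjacent to assay name")
    elif tech < assay:
        # Allowed for older templates, but not the recommended position.
        problems.append(
            "technology type precedes assay name "
            "(allowed, but placing it after is RECOMMENDED)"
        )

    for i, row in enumerate(rows, start=1):
        value = row[tech].strip()
        if value not in VALID_TECHNOLOGY_TYPES:
            problems.append(f"row {i}: unexpected technology type {value!r}")
    return problems
```

An empty list means the header follows the recommended layout and every row carries an accepted value. The "precedes" message is a warning rather than an error, reflecting that both positions are tolerated.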

[[additional-data-files]]
=== Additional Data files technical properties

3 changes: 2 additions & 1 deletion templates/README.adoc
@@ -17,7 +17,7 @@ NOTE: Each of the templates is a tsv file with the minimum columns to describe t
*Sample attributes*: Minimum sample attributes for primary cells from different species and cell lines

|===
| | Default |Human | Vertebrates | Invertebrates | Plants | Cell lines
| | Default |Human | Vertebrates | Invertebrates | Plants | Cell lines
|source name | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|characteristics[organism] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|characteristics[strain/breed] | | | |:zero: | |:zero:
@@ -35,6 +35,7 @@ NOTE: Each of the templates is a tsv file with the minimum columns to describe t
|characteristics[biological replicate] |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
| | | | | | |
|assay name | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|technology type | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|comment[data file] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|comment[technical replicate] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
|comment[fraction identifier] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
2 changes: 1 addition & 1 deletion templates/sdrf-cell-line.sdrf.tsv
@@ -1 +1 @@
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[cell line] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[cell line] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]
2 changes: 1 addition & 1 deletion templates/sdrf-default.sdrf.tsv
@@ -1,2 +1,2 @@
source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]
source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]

2 changes: 1 addition & 1 deletion templates/sdrf-human.sdrf.tsv
@@ -1 +1 @@
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[ancestry category] characteristics[age] characteristics[sex] characteristics[disease] characteristics[individual] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[ancestry category] characteristics[age] characteristics[sex] characteristics[disease] characteristics[individual] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
2 changes: 1 addition & 1 deletion templates/sdrf-invertebrates.sdrf.tsv
@@ -1 +1 @@
source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[cell type] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[cell type] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
1 change: 0 additions & 1 deletion templates/sdrf-nonvertebrates.sdrf.tsv

This file was deleted.

2 changes: 1 addition & 1 deletion templates/sdrf-plants.sdrf.tsv
@@ -1 +1 @@
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details]
2 changes: 1 addition & 1 deletion templates/sdrf-vertebrates.sdrf.tsv
@@ -1 +1 @@
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[developmental stage] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]
source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[developmental stage] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument]
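The header change applied to each template above (moving `technology type` from directly before `assay name` to directly after it) could be reproduced with a helper along these lines. This is a minimal sketch under stated assumptions: the PR does not say how the edit was made, the function name `place_technology_type_after_assay` is hypothetical, and the header is assumed to be a single tab-separated line.

```python
def place_technology_type_after_assay(header_line):
    """Return the header with 'technology type' directly after 'assay name'.

    header_line is assumed to be one tab-separated SDRF header line.
    Headers lacking either column are returned unchanged.
    """
    columns = header_line.rstrip("\n").split("\t")
    if "technology type" not in columns or "assay name" not in columns:
        return "\t".join(columns)

    # Remove the column first so the assay index is computed on the
    # remaining columns, then re-insert it right after assay name.
    columns.remove("technology type")
    assay = columns.index("assay name")
    columns.insert(assay + 1, "technology type")
    return "\t".join(columns)
```

The function is idempotent, so running it on an already updated template leaves the header as is. It only handles the header line; a file with data rows would need the same column move applied to every row.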