small fixes + data size estimations #1

Open
wants to merge 25 commits into
base: master
6 changes: 6 additions & 0 deletions datatypes/microscopy.md
@@ -1,3 +1,9 @@
# Confocal

Confocal image raw data will be stored and submitted to the BioImage Archive repository (https://www.ebi.ac.uk/bioimage-archive/) following their guidelines and metadata requirements.

# Electron microscopy

Electron microscopy images (grayscale electron micrographs; file sizes average 100 to 160 KB).

Electron microscopy raw data will be submitted to EMPIAR (https://www.ebi.ac.uk/pdbe/emdb/empiar/), including the metadata as instructed by this repository.
11 changes: 10 additions & 1 deletion datatypes/phenotyping.md
@@ -1,11 +1,20 @@
# Phenotyping

Plant phenotyping data will be stored and made available upon publication in PIPPA (https://pippa.psb.ugent.be), maintained by the VIB-UGent Center for Plant Systems Biology.
Plant phenotyping data will be stored and made available upon publication in PIPPA (https://pippa.psb.ugent.be), maintained by the VIB-UGent Center for Plant Systems Biology. PIPPA uses and is compliant with MIAPPE (https://github.com/MIAPPE/MIAPPE), the Minimum Information About a Plant Phenotyping Experiment standard.

## Measurements

## RGB Images
Phenovision (& WIWAM): 4 MB / image

For Phenovision, typically 7 RGB images per pot/timepoint (3 side views from each of 2 cameras (60 degree angles), and 1 from the top).

## Thermal images
Infrared (LWIR): 1 MB / image

## Hyperspectral images
(grouped in directories with a 'capture' subdirectory containing .raw and .hdr files: 3 for the white reference ('witreference'), 3 for the black reference ('zwartreference'), and the image itself)

* SWIR: 99 MB
* VNIR: 85 MB

Totalling approximately 185-200 MB per hyperspectral image.
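
As a rough illustration of how the per-image figures above combine into a volume estimate for one experiment, here is a minimal sketch. The function name, the number of pots and timepoints, and the number of thermal and hyperspectral images taken per pot per timepoint are all hypothetical; only the per-image sizes and the 7 RGB images per pot/timepoint come from the estimates above, and the sizes are read as megabytes.

```python
# Per-image sizes in MB, copied from the estimates above.
RGB_MB_PER_IMAGE = 4
RGB_IMAGES_PER_POT_TIMEPOINT = 7         # Phenovision: 3 side views x 2 cameras + 1 top view
THERMAL_MB_PER_IMAGE = 1
HYPERSPECTRAL_MB_PER_IMAGE = (185, 200)  # (low, high) per hyperspectral image


def phenovision_volume_mb(pots, timepoints, thermal_per_unit=0, hyperspectral_per_unit=0):
    """Return a (low, high) volume estimate in MB.

    thermal_per_unit and hyperspectral_per_unit are the number of thermal and
    hyperspectral images taken per pot per timepoint. They are assumptions:
    the text above gives no such counts.
    """
    units = pots * timepoints
    fixed = units * (
        RGB_IMAGES_PER_POT_TIMEPOINT * RGB_MB_PER_IMAGE
        + thermal_per_unit * THERMAL_MB_PER_IMAGE
    )
    low, high = (fixed + units * hyperspectral_per_unit * size
                 for size in HYPERSPECTRAL_MB_PER_IMAGE)
    return low, high


# Hypothetical experiment: 100 pots imaged at 10 timepoints, RGB only.
print(phenovision_volume_mb(100, 10))  # (28000, 28000)
```

Returning a low/high pair keeps the uncertainty of the hyperspectral estimate visible instead of collapsing it into a single number.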
3 changes: 3 additions & 0 deletions datatypes/protein_interactions.md
@@ -0,0 +1,3 @@
# Protein interaction

Protein interactions will be submitted to the IntAct Molecular Interaction Database (https://www.ebi.ac.uk/intact/) in PSI-MI TAB (miTab) format (http://www.psidev.info/groups/molecular-interactions).
10 changes: 9 additions & 1 deletion datatypes/proteomics.md
@@ -1,9 +1,17 @@
## Proteomics

Raw MS spectra data will be stored. Upon publication, data will be submitted to PRIDE. Metadata will be captured conform the standards of this repository.
Raw MS spectra data will be stored. Upon publication, proteomics data will be submitted to PRIDE. Metadata will be captured in conformance with the MIAPE standards (http://www.psidev.info/miape) of this repository.

MS-spectra

## Affinity-purification mass spectrometry (AP-MS)

_While waiting for (more) accurate volume estimates, we will use 500 MB per sample for Arabidopsis
(this somewhat reflects what is already in PRIDE, e.g. this entry https://www.ebi.ac.uk/pride/archive/projects/PXD002606/files).
Data file sizes seem to be related to the organism?_

According to Dominique (PSB, plant AP-MS service), one AP-MS run produces approx. 1 GB (for a 1h LC-MS/MS). For a 3h run it is likely closer to 1.2-1.5 GB per run.

## Tandem Affinity Purification

## Protein binding Microarray
15 changes: 14 additions & 1 deletion datatypes/rnaseq.md
@@ -5,7 +5,7 @@

Raw RNA sequencing data will be stored as compressed fastq files. Processed data are tab-delimited text files containing read counts per gene.
Upon publication, data will be submitted to ArrayExpress.
Details of the analysis methodology will be provided. Metadata will be captured conform the standards of this repository
Details of the analysis methodology will be provided. Metadata will be captured in conformance with the standards of this repository. The minimum set of data and metadata for a given NGS study/experiment is defined in MINSEQE (http://fged.org/projects/minseqe/).

### RNA-seq full version

@@ -20,10 +20,23 @@ Section 2 Question 2: Describe the origin, type and format of the data (per data

* Raw digital data, compressed FastQ data, ~1-5GB per dataset

Example:

Digital data:

RNA-seq data: next-generation sequencing raw data, compressed fastq format.
16 genotypes, 1 tissue from maize; 20M 75 bp PE Illumina reads per sample; 3 replicates; totalling 96 GB.

#### Volume

Project dependent

Size estimates:
* 20M, 75 bp, SE Illumina reads: 1 GB / sample
* 20M, 75 bp, PE Illumina reads: 2 GB / sample
* 20M, 100 bp, PE Illumina reads: 3 GB / sample
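
The maize example earlier in this section (16 genotypes, 1 tissue, 3 replicates, 75 bp PE reads) can be reproduced from the per-sample figures listed above. The sketch below is only an illustration: the function and table names are hypothetical, the sizes are read as gigabytes, and it covers only the three read configurations listed, all at 20M reads per sample.

```python
# Per-sample sizes in GB for 20M reads, copied from the size estimates above.
# Keys are (layout, read length in bp); other configurations are not covered.
SIZE_PER_SAMPLE_GB = {
    ("SE", 75): 1,
    ("PE", 75): 2,
    ("PE", 100): 3,
}


def estimate_rnaseq_volume_gb(genotypes, tissues, replicates, layout, read_length):
    """Estimate the raw data volume in GB for one RNA-seq experiment.

    The number of samples is genotypes x tissues x replicates; a KeyError is
    raised for a read configuration that is not in the table.
    """
    samples = genotypes * tissues * replicates
    return samples * SIZE_PER_SAMPLE_GB[(layout, read_length)]


# The maize example: 16 genotypes x 1 tissue x 3 replicates = 48 samples at 2 GB each.
print(estimate_rnaseq_volume_gb(16, 1, 3, "PE", 75))  # 96
```

Keeping the per-sample sizes in one lookup table means a revised estimate only has to be changed in one place.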

Section 4

### What documentation will be provided to enable understanding and reuse of the data collected/generated in this project?
4 changes: 2 additions & 2 deletions datatypes/sequencing_based.md
@@ -59,11 +59,11 @@ Public Repository: ArrayExpress

Raw sequencing data will be stored as compressed fastq files. Upon publication, data will be submitted to ArrayExpress. Metadata will be captured in conformance with the standards of this repository.

### Genome Sequencing: EMS
### Genome Sequencing: EMS

Raw sequencing data will be stored as compressed fastq files. Upon publication, data will be submitted to ENA. Metadata will be captured in conformance with the standards of this repository.


### Genome Assembly
### Genome Assembly / genome re-sequencing

Raw sequencing data will be stored as compressed fastq files. Upon publication, data will be submitted to ENA. Metadata will be captured in conformance with the standards of this repository.
8 changes: 4 additions & 4 deletions dmp_guidelines_fwo_full.md
@@ -164,7 +164,7 @@ These costs will be covered by the FWO project.

### 5.5 Data security: how will you ensure that the data are securely stored and not accessed or modified by unauthorized persons?

Access is controlled through standard IT practices, restricting access to relevant personnel.
Access is controlled through standard IT practices, restricting access to relevant personnel only.

## 6. Data preservation after the end of the FWO project

@@ -174,13 +174,13 @@ Access is controlled through standard IT practices, restricting access to releva

<!--*In case only a selection of the data can/will be preserved, clearly state the reasons for this (legal or contractual restrictions, physical preservation issues...).*-->

All relevant data for reuse will be preserved. Details are available in the "Sharing and reuse" section of this DMP.
All relevant data for reuse and experiment replicability will be preserved. Details are available in the "Sharing and reuse" section of this DMP.

*Guideline: here we only list exceptions.*

### 6.2 Where will these data be archived (= stored for the long term)?

Data will be default archived at the VIB-UGent Center for Plant Systems Biology. Upon publication, relevant data will be made available either in appropriate repositories or e.g. using generic services such as zenodo.org, dryad, etc. These services will ensure long term storage.
Data will by default be archived at the VIB-UGent Center for Plant Systems Biology. Upon publication, relevant data will be made available either in appropriate repositories or through generic services such as zenodo.org or Dryad. These services will ensure long-term storage.

### 6.3 What are the expected costs for data preservation during these 5 years? How will the costs be covered?

@@ -221,7 +221,7 @@ Data will be made available Open Access

### 7.6 What are the expected costs for data sharing? How will these costs be covered?

Data will be default using public repositories or services such as zenodo.org, which are free for use. In case institutional repositories are used, the institute will cover these costs.
Data will be made available using public repositories or services such as zenodo.org, which are free to use. If institutional repositories are used, the institute will cover these costs.

## 8. Responsibilities
