Commit 77316f4

Merge pull request #1669 from antgonza/cleaning-docs
Cleaning docs
2 parents 39387ec + faa011a commit 77316f4

11 files changed

+222
-178
lines changed

README.rst

Lines changed: 9 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -46,6 +46,14 @@ Current features
4646
* Search over existing studies (see known issues).
4747
* Generate basic visualizations with the available studies and datasets.
4848

49+
Accepted raw files
50+
------------------
51+
52+
* Multiplexed SFF
53+
* Multiplexed FASTQ: forward, reverse (optional), and barcodes
54+
* Per sample FASTQ: forward
55+
* Multiplexed FASTA/qual files
56+
4957
Known issues
5058
------------
5159

@@ -65,7 +73,7 @@ future.
6573
external sources. For example, metabolomics processing in
6674
`GNPS <http://gnps.ucsd.edu>`__ and data visualization in Qiita.
6775
* Creation of a REST API to query and access the data hosted by Qiita.
68-
* Improved analysis pipeline for 16S datasets.
76+
* Improved analysis pipeline for target gene datasets.
6977
* Crowd-sourced metadata curation of existing studies: improve the metadata of
7078
existing studies by submitting “fix proposals” to the authors of the study.
7179

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
11
.navbar-brand > img {
22
display: inline-block;
33
}
4+
5+
.red {
6+
color:red;
7+
}

qiita_pet/support_files/doc/source/faq.rst

Lines changed: 46 additions & 45 deletions
@@ -4,26 +4,24 @@ Frequently Asked Questions
44
What kind of data can I upload to Qiita for processing?
55
-------------------------------------------------------
66

7-
We need 3 things: raw data, sample template, and prep template. At this
8-
moment, raw data is fastq files without demultiplexing with forward,
9-
reverse (optional) and barcode reads. We should have before the end of
10-
the week SFF processing so it's OK to upload. Note that we are accepting
11-
any kind of target gene (16S, 18S, ITS, whatever) as long as they have
12-
some kind of demultiplexing strategy and that you can also upload WGS.
13-
However, WGS processing is not ready.
7+
Processing in Qiita requires 3 things: raw data, sample and prep information
8+
files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
9+
you can find a list of currently supported raw files. Note that we are
10+
accepting any kind of target gene (16S, 18S, ITS, whatever). You can also upload
11+
WGS; however, WGS processing is not ready.
1412

15-
What's the difference between a sample and a prep template?
16-
-----------------------------------------------------------
13+
What's the difference between a sample and a prep information file?
14+
-------------------------------------------------------------------
1715

18-
Sample template is the information about your samples, including
19-
environmental and other important information about them. The prep
20-
template is basically what kind of wet lab work all or a subset of the
21-
samples had. If you collected 100 samples, you are going to need 100
22-
rows in your sample template describing each of them, this includes
23-
blanks, etc. Then you prepared 95 of them for 16S and 50 of them for
24-
18S. Thus, you are going to need 2 prep templates: one with 95 rows
25-
describing the preparation for 16S, and another one with 50 to
26-
describing the 18S. For a more complex example go
16+
A sample information file describes the samples in a study, including
17+
environmental factors relating to the associated host. The prep information
18+
file has information on how the sample was processed in the wet lab. If you
19+
collected 100 samples for your study, you will need 100 rows in your sample
20+
information file describing each of them, and additional rows for blanks and other
21+
control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
22+
you will need 2 prep information files: one with 95 rows describing the preparation
23+
for 16S, and another one with 50 describing the 18S. For a more complex
24+
example go
2725
`here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
2826
the "Upload instructions"
2927
`here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__.
@@ -33,32 +31,35 @@ Example study processing workflow
3331

3432
A few more instructions: for the example above the workflow should be:
3533

36-
#. Create a new study
37-
#. Add a sample template, you can add 1, try to process it and the
34+
#. **Create a new study.**
35+
#. **Add a sample information file.** You can add 1, try to process it and the
3836
system will let you know if you have errors or missing columns. The
3937
most common errors are: the sample name column should be named
40-
sample\_name, duplicated sample names are not permitted, and the prep
41-
template should contain all the samples in the sample template or a
42-
subset. Finally, if you haven't processed your sample templates and
43-
can add a column to your template named sloan\_status with this info:
44-
SLOAN (funded by Sloan), SLOAN\_COMPATIBLE (not Sloan funded but with
45-
compatible metadata, usually public), NOT\_SLOAN (not included i.e.
46-
private study), that will be great!
47-
#. Add a raw data. Depending on your barcoding/sequencing strategy you
48-
might need 1 or 2 raw datas for the example above. If you have two
49-
different fastq file sets (forward, reverse (optional) and barcodes)
50-
you will need two raw datas but if you only have one set, you only
51-
need one.
52-
#. You can link your raw data to your files
53-
#. You can add a prep template to your raw data. If you have the case
54-
with only one fastq set (forward, reverse (optional) and barcodes),
55-
you can add 2 different prep templates. Common missing fields here
56-
are: emp\_status, center\_name, run\_prefix, platform,
57-
library\_construction\_protocol, experiment\_design\_description,
58-
center\_project\_name. Note that if you get a 500 error at this stage
59-
is highly probable because emp\_status only accepts 3 values: 'EMP',
60-
'EMP\_Processed', 'NOT\_EMP', if errors persist please do not
61-
hesitate to contact us.
62-
#. You can preprocess your files. For target gene, this means
63-
demultiplexing and QC.
64-
38+
sample\_name, duplicated sample names are not permitted. For a full list of
39+
required fields, visit :doc:`tutorials/prepare-information-files`.
40+
#. **Add a prep information file to your study for each data type.** The prep
41+
information file should contain all the samples in the sample information
42+
file or a subset. If you have more than one FASTQ file set (forward,
43+
reverse (optional) and barcodes) you will need to add a
44+
:ref:`run_prefix <required-fields-for-preprocessing-target-gene-data>`
45+
column.
46+
A prep information file and a QIIME compatible mapping file will
47+
be available for download after the prep information file is added
48+
successfully.
49+
#. **Upload and link your raw data to each of your prep information files.**
50+
Depending on your barcoding/sequencing strategy you might need 1 or more
51+
raw data file sets. If you have 2 raw data sets you may have to rename one
52+
set so that each set has a different name. If they have the same name they
53+
will over-write on upload. Note that you can have one FASTQ file set linked
54+
to more than one prep information file.
55+
#. **Preprocess your files.** For target gene amplicon sequencing, this will demux
56+
and QC. There are multiple options for preprocessing depending on the
57+
barcode format and the data output from the sequencing center - this may
58+
require a series of trial and error to establish the correct option for
59+
your data files. After demultiplexing a log file is generated with
60+
statistics about the files demultiplexed including the number of sequences
61+
assigned per sample.
62+
#. **Process each of your preprocessed data types.** For target gene, this will
63+
perform closed OTU picking against the latest version of Greengenes and can
64+
be quite time consuming depending on the number of samples and the depth
65+
of sequencing.
-36.8 KB

qiita_pet/support_files/doc/source/qiita-philosophy/index.rst

Lines changed: 24 additions & 21 deletions
@@ -35,44 +35,47 @@ can request an administrator to validate their study information and make it
3535
private and possibly submit to a permanent repository, where it can also be
3636
kept private until the user wants to make it public. At this stage in Qiita
3737
the whole study (including all processed data) is private. This process is
38-
completely automatic via the GUI. Currently sequence data is being deposited
39-
for permanent storage to the European Nucleotide Archive (ENA), part of the
40-
European Bioinformatics Institute (EBI). Finally, when the user is ready,
41-
usually when the main manuscript of the study is ready for publication, the
42-
user can request for the artifact to become public, both in Qiita and the
43-
permanent repository, Figure 2.
38+
completely automatic via the Graphical User Interface (GUI). Currently sequence
39+
data is being deposited for permanent storage to the European Nucleotide
40+
Archive (ENA), part of the European Bioinformatics Institute (EBI). Finally,
41+
when the user is ready, usually when the main manuscript of the study is ready
42+
for publication, the user can request for the artifact to be made
43+
public, both in Qiita and the permanent repository, Figure 2.
4444

4545

4646
.. figure:: images/figure1.png
4747
:align: center
4848

49-
**Figure 1. Qiita’s main structure: from single to multiple studies.** More
50-
and more a simple study is composed by a multiple samples which have been
51-
prepared chemically to identify diverse microbial composition parts of them.
52-
For example, 16S to see which kind of bacteria lives on them, Metabolomics
53-
to see the substance formed by the community, and/or ITS for the fungi.
54-
Additionally, Qiita allows users to compare their studies with other public
55-
ones already available in the system.
49+
**Figure 1. Qiita’s main structure: from single to multiple studies.**
50+
Increasingly, a simple study is composed of multiple samples which have
51+
been prepared using different protocols to identify different microbial
52+
features. For example, 16S rRNA amplification to identify the bacteria in
53+
or on the sample, metabolomics to identify chemical components formed by
54+
the microbial community or within the sample, and/or ITS amplification for
55+
identification of fungal organisms that may also be present. Additionally,
56+
Qiita allows users to compare their studies with other public ones already
57+
available in the system.
5658

5759

5860
.. figure:: images/figure2.png
5961
:align: center
6062

6163
**Figure 2. Possible Qiita artifact states.** Artifacts are any file,
6264
either uploaded by users or generated by the system. There are 3 possible
63-
states: sanboxed, private and public. In the sandboxed and private states
64-
no other user has access to the artifacts, except if the owner invites a
65-
guest. In the public state, the artifact is open to all users in the
66-
system, and the study can be searched from the study listing page.
65+
states: sandboxed, private and public. In the sandboxed and private states
66+
no other user has access to the artifacts, unless the owner grants access
67+
by sharing the study. In the public state, the artifact is open to all
68+
users in the system, and the study can be found by searching from the
69+
study listing page.
6770

6871

6972
Portals
7073
-------
7174

72-
Qiita allows to host multiple portals within the same infrastructure. This
73-
allows each portal to have a subset of studies in a different URL but sharing
74-
the same resources. Sharing the same backend resources avoids having multiple
75-
sites and data getting out of sync.
75+
Qiita allows the hosting of multiple portals within the same infrastructure.
76+
This allows each portal to have a subset of studies (often with a similar
77+
theme) in a different URL but sharing the same resources. Sharing the same
78+
backend resources avoids having multiple sites and data getting out of sync.
7679

7780
The current available portals are:
7881

qiita_pet/support_files/doc/source/tutorials/ebi-submission.rst

Lines changed: 15 additions & 10 deletions
@@ -2,25 +2,30 @@
22

33
.. index:: ebi-submission
44

5+
.. role:: red
6+
57
EBI submission via Qiita
68
========================
79

810
Qiita allows users to deposit their study, sample, experiment and sequence data to the
9-
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is a permanent repository
10-
part of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
11+
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is the permanent data
12+
repository of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
1113
this repository will provide you with a unique identifier for your study, which is generally a
12-
requirement for publication.
14+
requirement for publication. Your study will be housed with all other Qiita submissions
15+
and so we require adherence to the MiXs standard.
1316

1417
EBI/ENA requires a given set of column fields to describe your samples and experiments, for more
15-
information visit :doc:`prepare-templates` and pay most attention to EBI required fields,
18+
information visit :doc:`prepare-information-files` and pay most attention to EBI required fields,
1619
without these **Qiita Admins** will not be able to submit. If you want to submit your data or need
17-
help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__.
20+
help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__. Help will include
21+
advice on additional fields to add to ensure MiXs compliance.
1822

19-
Note that this kind of submissions are time consuming and need full collaboration from the user.
20-
Thus, do not wait until the last minute to request help. In general, the best time to request a submission
21-
is when you are writing your paper. Remember that the data can be submitted to EBI and can be
22-
kept private and simply make public when the paper is accepted. Note that EBI/ENA takes up to 15 days to
23-
change the status from private to public, so consider this when submitting data and your manuscript.
23+
Note that submissions are time consuming and need full collaboration from the user.
24+
:red:`Do not wait until the last minute to request help.` In general, the best
25+
time to request a submission is when you are writing your paper. Remember that the
26+
data can be submitted to EBI and can be kept private and simply make public when
27+
the paper is accepted. Note that EBI/ENA takes up to 15 days to change the status
28+
from private to public, so consider this when submitting data and your manuscript.
2429

2530
.. note::
2631
For convenience Qiita allows you to upload a QIIME mapping file to process your data. However,

qiita_pet/support_files/doc/source/tutorials/getting-started.rst

Lines changed: 8 additions & 7 deletions
@@ -155,7 +155,7 @@ from another page.
155155
Once your file(s) have been uploaded, you can process them in Qiita.
156156
From the upload tool, click on “Go to study description” and, once
157157
there, click on the “Sample template” tab.  Select your sample template
158-
from the dropdown menu and, lastly, click “Process sample template”.
158+
from the dropdown menu and, lastly, click “Process sample template”.
159159

160160
.. figure:: images/process-sample-template.png
161161
:align: center
@@ -230,9 +230,8 @@ Preprocessing data
230230

231231
Once you have linked files to your raw data and your prep template has
232232
been processed, you can then proceed to preprocessing your data.
233-
Currently we only support fastq files for target gene preprocessing
234-
(including reverse complementing the prep template barcodes). We are
235-
working on adding more options and preprocessing pipelines.
233+
`Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
234+
is a list of currently supported raw files.
236235

237236
.. figure:: images/image08.png
238237
:align: center
@@ -309,7 +308,8 @@ Study status
309308

310309
- Sandbox. When a study is in this status, all the required metadata
311310
columns must be present in the metadata files (sample and prep), but
312-
the values don't have to be filled in or finalized yet. The purpose
311+
the values don't have to be filled in or finalized yet. We suggest adding
312+
TBD as the temporal values of these fields. The purpose
313313
of this status is so that users can quickly upload their sequence
314314
files and some (possibly incomplete) metadata in order to have a
315315
preliminary look at their data.
@@ -323,5 +323,6 @@ Study status
323323
- Public. Once a study is made administrator-approved and becomes
324324
private, the user can choose when to make it public. Making a study
325325
public means that it will be available to anyone with a Qiita user
326-
account (e.g., for data downloads and meta-analyses).
327-
326+
account (e.g., for data downloads and meta-analyses). When a study
327+
is public it cannot be changed. All associated templates will be public
328+
as well.

qiita_pet/support_files/doc/source/tutorials/index.rst

Lines changed: 2 additions & 2 deletions
@@ -7,12 +7,12 @@ The following is a full list of the available tutorials:
77
:maxdepth: 2
88

99
account-creation
10-
prepare-templates
10+
prepare-information-files
1111
ebi-submission
1212
getting-started
1313
analyze-data
1414
no-raw-sequences
15-
join-pair-ends
15+
join-paired-end-reads
1616

1717
To request documentation on any administration use-cases not addressed here,
1818
please add an issue `here <https://github.com/biocore/qiita/issues>`__.

qiita_pet/support_files/doc/source/tutorials/join-pair-ends.rst renamed to qiita_pet/support_files/doc/source/tutorials/join-paired-end-reads.rst

Lines changed: 8 additions & 7 deletions
@@ -1,9 +1,9 @@
1-
.. _join-pair-ends:
1+
.. _join-paired-end-reads:
22

3-
.. index:: join-pair-ends
3+
.. index:: join-paired-end-reads
44

5-
Join pair ends
6-
==============
5+
Join paired end reads
6+
=====================
77

88
Having high quality, longer reads helps with taxonomy assignment and classification.
99
Thus, if your forward and reverse reads overlap you should join them. Note that this
@@ -20,7 +20,7 @@ Joining forward and reverse reads for raw files
2020

2121
You could use `join_paired_ends.py <http://qiime.org/scripts/join_paired_ends.html>`__
2222
and then upload your joined sequence and barcode files for processing. Then you
23-
will upload your resulted joined file to Qiita.
23+
will upload the resulting joined file to Qiita.
2424

2525
.. _join_forward_and_reverse_reads_for_per_sample_fastq_files_without_barcodes_and_primers:
2626

@@ -29,7 +29,7 @@ Joining forward and reverse reads for per sample FASTQ files without barcodes an
2929

3030
You could use `multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__
3131
and then upload your joined sequence and barcode files for processing. Then you
32-
will upload your resulted joined per sample files to Qiita.
32+
will upload the resulting joined per sample files to Qiita.
3333

3434

3535
.. _per_sample_fastq_files_without_barcodes_but_with_primer_information_with_overlapping_regions:
@@ -41,7 +41,8 @@ To process this kind of files you will need to run two steps:
4141

4242
#. Run multiple_join_paired_ends.py to stitch the reads. See
4343
`multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__.
44-
#. Run multiple_extract_barcodes.py to strip out the primers. You need to use a parameter file with:
44+
#. Run multiple_extract_barcodes.py to strip out the primers. You will need to use a
45+
parameter file with:
4546

4647
.. code:: bash
4748
