10 changes: 9 additions & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,14 @@ Current features
* Search over existing studies (see known issues).
* Generate basic visualizations with the available studies and datasets.

Accepted raw files
------------------

* Multiplexed SFF
* Multiplexed FASTQ: forward, reverse (optional), and barcodes
* Per sample FASTQ: forward
* Multiplexed FASTA/qual files

Known issues
------------

Expand All @@ -65,7 +73,7 @@ future.
external sources. For example, metabolomics processing in
`GNPS <http://gnps.ucsd.edu>`__ and data visualization in Qiita.
* Creation of a REST API to query and access the data hosted by Qiita.
* Improved analysis pipeline for 16S datasets.
* Improved analysis pipeline for target gene datasets.
* Crowd-sourced metadata curation of existing studies: improve the metadata of
existing studies by submitting “fix proposals” to the authors of the study.

4 changes: 4 additions & 0 deletions qiita_pet/support_files/doc/source/_static/my-styles.css
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
.navbar-brand > img {
display: inline-block;
}

.red {
color:red;
}
91 changes: 46 additions & 45 deletions qiita_pet/support_files/doc/source/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,26 +4,24 @@ Frequently Asked Questions
What kind of data can I upload to Qiita for processing?
-------------------------------------------------------

We need 3 things: raw data, sample template, and prep template. At this
moment, raw data is fastq files without demultiplexing with forward,
reverse (optional) and barcode reads. We should have before the end of
the week SFF processing so it's OK to upload. Note that we are accepting
any kind of target gene (16S, 18S, ITS, whatever) as long as they have
some kind of demultiplexing strategy and that you can also upload WGS.
However, WGS processing is not ready.
Processing in Qiita requires 3 things: raw data, a sample information file, and a prep
information file. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
Review comment (Contributor):

This reads "Here a list" ... maybe it was intended to be "Here you can find" or "Here is".

you can find a list of currently supported raw files. Note that we
accept any kind of target gene (16S, 18S, ITS, etc.). You can also upload
WGS data; however, WGS processing is not ready.

What's the difference between a sample and a prep template?
-----------------------------------------------------------
What's the difference between a sample and a prep information file?
-------------------------------------------------------------------

Sample template is the information about your samples, including
environmental and other important information about them. The prep
template is basically what kind of wet lab work all or a subset of the
samples had. If you collected 100 samples, you are going to need 100
rows in your sample template describing each of them, this includes
blanks, etc. Then you prepared 95 of them for 16S and 50 of them for
18S. Thus, you are going to need 2 prep templates: one with 95 rows
describing the preparation for 16S, and another one with 50 to
describing the 18S. For a more complex example go
A sample information file describes the samples in a study, including
environmental factors relating to the associated host. The prep information
file has information on how the sample was processed in the wet lab. If you
collected 100 samples for your study, you will need 100 rows in your sample
information file describing each of them, and additional rows for blanks and other
control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
you will need 2 prep information files: one with 95 rows describing the preparation
for 16S, and another one with 50 describing the 18S. For a more complex
example go
`here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
the "Upload instructions"
`here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__.
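
The 100-sample example above can be sketched in a few lines of Python. This is only an illustration of the rule that each prep information file must list a subset of the samples in the sample information file. The helper name ``missing_from_sample_info`` and the sample names are hypothetical and are not part of Qiita.

```python
def missing_from_sample_info(sample_names, prep_names):
    """Return prep sample names that are absent from the sample information file."""
    return sorted(set(prep_names) - set(sample_names))


# Hypothetical study: 100 collected samples, 95 prepared for 16S, 50 for 18S.
samples = [f"sample.{i}" for i in range(1, 101)]
prep_16s = samples[:95]   # 95 rows in the 16S prep information file
prep_18s = samples[50:]   # 50 rows in the 18S prep information file

# Both prep files only reference known samples, so nothing is missing.
ok_16s = missing_from_sample_info(samples, prep_16s)
ok_18s = missing_from_sample_info(samples, prep_18s)

# A prep file that names an unknown sample would be flagged.
bad = missing_from_sample_info(samples, prep_18s + ["sample.999"])
```

In this sketch the two prep files overlap on some samples, which is expected when the same sample is prepared for more than one data type.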
Expand All @@ -33,32 +31,35 @@ Example study processing workflow

A few more instructions: for the example above the workflow should be:

#. Create a new study
#. Add a sample template, you can add 1, try to process it and the
#. **Create a new study.**
#. **Add a sample information file.** You can add 1, try to process it and the
system will let you know if you have errors or missing columns. The
most common errors are: the sample name column should be named
sample\_name, duplicated sample names are not permitted, and the prep
template should contain all the samples in the sample template or a
subset. Finally, if you haven't processed your sample templates and
can add a column to your template named sloan\_status with this info:
SLOAN (funded by Sloan), SLOAN\_COMPATIBLE (not Sloan funded but with
compatible metadata, usually public), NOT\_SLOAN (not included i.e.
private study), that will be great!
#. Add a raw data. Depending on your barcoding/sequencing strategy you
might need 1 or 2 raw datas for the example above. If you have two
different fastq file sets (forward, reverse (optional) and barcodes)
you will need two raw datas but if you only have one set, you only
need one.
#. You can link your raw data to your files
#. You can add a prep template to your raw data. If you have the case
with only one fastq set (forward, reverse (optional) and barcodes),
you can add 2 different prep templates. Common missing fields here
are: emp\_status, center\_name, run\_prefix, platform,
library\_construction\_protocol, experiment\_design\_description,
center\_project\_name. Note that if you get a 500 error at this stage
is highly probable because emp\_status only accepts 3 values: 'EMP',
'EMP\_Processed', 'NOT\_EMP', if errors persist please do not
hesitate to contact us.
#. You can preprocess your files. For target gene, this means
demultiplexing and QC.

sample\_name, duplicated sample names are not permitted. For a full list of
required fields, visit :doc:`tutorials/prepare-information-files`.
#. **Add a prep information file to your study for each data type.** The prep
information file should contain all the samples in the sample information
file or a subset. If you have more than one FASTQ file set (forward,
reverse (optional) and barcodes) you will need to add a
:ref:`run_prefix <required-fields-for-preprocessing-target-gene-data>`
column.
A prep information file and a QIIME compatible mapping file will
be available for download after the prep information file is added
successfully.
#. **Upload and link your raw data to each of your prep information files.**
Depending on your barcoding/sequencing strategy you might need 1 or more
raw data file sets. If you have 2 raw data sets you may have to rename one
set so that each set has a different name. If they have the same name, one
will overwrite the other on upload. Note that you can have one FASTQ file set linked
to more than one prep information file.
#. **Preprocess your files.** For target gene amplicon sequencing, this will
demultiplex the reads and apply quality control (QC). There are multiple
options for preprocessing depending on the barcode format and the data
output from the sequencing center; finding the correct option for your
data files may take some trial and error. After demultiplexing, a log file
is generated with statistics about the demultiplexed files, including the
number of sequences assigned per sample.
#. **Process each of your preprocessed data types.** For target gene, this will
perform closed OTU picking against the latest version of Greengenes and can
be quite time consuming depending on the number of samples and the depth
of sequencing.
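
The two most common sample information errors named in the workflow above (a missing ``sample_name`` column and duplicated sample names) can be checked locally before uploading. The following is a minimal sketch, assuming the file is tab-separated text. The function name ``find_sample_name_problems`` and the example rows are hypothetical, and this is not the validation code Qiita itself runs.

```python
import csv
import io
from collections import Counter


def find_sample_name_problems(tsv_text):
    """Return a list of human-readable problems found in a tab-separated file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "sample_name" not in (reader.fieldnames or []):
        return ["missing required column: sample_name"]
    # Count how many rows use each sample name.
    counts = Counter(row["sample_name"] for row in reader)
    return [
        f"duplicated sample name: {name} ({n} rows)"
        for name, n in sorted(counts.items())
        if n > 1
    ]


# Hypothetical file contents used only for illustration.
with_duplicate = "sample_name\tenv\nS1\tsoil\nS2\tsoil\nS1\twater\n"
wrong_header = "SampleID\tenv\nS1\tsoil\n"
clean = "sample_name\tenv\nS1\tsoil\nS2\twater\n"

dup_problems = find_sample_name_problems(with_duplicate)
header_problems = find_sample_name_problems(wrong_header)
clean_problems = find_sample_name_problems(clean)
```

An empty list means neither of these two problems was found. It does not mean the file satisfies every required field.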
45 changes: 24 additions & 21 deletions qiita_pet/support_files/doc/source/qiita-philosophy/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,44 +35,47 @@ can request an administrator to validate their study information and make it
private and possibly submit to a permanent repository, where it can also be
kept private until the user wants to make it public. At this stage in Qiita
the whole study (including all processed data) is private. This process is
completely automatic via the GUI. Currently sequence data is being deposited
for permanent storage to the European Nucleotide Archive (ENA), part of the
European Bioinformatics Institute (EBI). Finally, when the user is ready,
usually when the main manuscript of the study is ready for publication, the
user can request for the artifact to become public, both in Qiita and the
permanent repository, Figure 2.
completely automatic via the Graphical User Interface (GUI). Currently sequence
data is being deposited for permanent storage to the European Nucleotide
Archive (ENA), part of the European Bioinformatics Institute (EBI). Finally,
when the user is ready, usually when the main manuscript of the study is ready
for publication, the user can request for the artifact to be made public,
both in Qiita and the permanent repository, Figure 2.


.. figure:: images/figure1.png
:align: center

**Figure 1. Qiita’s main structure: from single to multiple studies.** More
and more a simple study is composed by a multiple samples which have been
prepared chemically to identify diverse microbial composition parts of them.
For example, 16S to see which kind of bacteria lives on them, Metabolomics
to see the substance formed by the community, and/or ITS for the fungi.
Additionally, Qiita allows users to compare their studies with other public
ones already available in the system.
**Figure 1. Qiita’s main structure: from single to multiple studies.**
Increasingly, a simple study is composed of multiple samples which have
been prepared using different protocols to identify different microbial
features. For example, 16S rRNA amplification to identify the bacteria in
or on the sample, metabolomics to identify chemical components formed by
the microbial community or within the sample, and/or ITS amplification for
identification of fungal organisms that may also be present. Additionally,
Qiita allows users to compare their studies with other public ones already
available in the system.


.. figure:: images/figure2.png
:align: center

**Figure 2. Possible Qiita artifact states.** Artifacts are any file,
either uploaded by users or generated by the system. There are 3 possible
states: sanboxed, private and public. In the sandboxed and private states
no other user has access to the artifacts, except if the owner invites a
guest. In the public state, the artifact is open to all users in the
system, and the study can be searched from the study listing page.
states: sandboxed, private and public. In the sandboxed and private states
no other user has access to the artifacts, unless the owner grants access
by sharing the study. In the public state, the artifact is open to all
users in the system, and the study can be found by searching from the
study listing page.


Portals
-------

Qiita allows to host multiple portals within the same infrastructure. This
allows each portal to have a subset of studies in a different URL but sharing
the same resources. Sharing the same backend resources avoids having multiple
sites and data getting out of sync.
Qiita allows the hosting of multiple portals within the same infrastructure.
This allows each portal to have a subset of studies (often with a similar
theme) in a different URL but sharing the same resources. Sharing the same
backend resources avoids having multiple sites and data getting out of sync.

The currently available portals are:

25 changes: 15 additions & 10 deletions qiita_pet/support_files/doc/source/tutorials/ebi-submission.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,25 +2,30 @@

.. index:: ebi-submission

.. role:: red

EBI submission via Qiita
========================

Qiita allows users to deposit their study, sample, experiment and sequence data to the
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is a permanent repository
part of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is the permanent data
repository of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
this repository will provide you with a unique identifier for your study, which is generally a
requirement for publication.
requirement for publication. Your study will be housed with all other Qiita submissions,
so we require adherence to the MIxS standard.

EBI/ENA requires a given set of column fields to describe your samples and experiments, for more
information visit :doc:`prepare-templates` and pay most attention to EBI required fields,
information visit :doc:`prepare-information-files` and pay most attention to EBI required fields,
without these **Qiita Admins** will not be able to submit. If you want to submit your data or need
help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__.
help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__. Help will include
advice on additional fields to add to ensure MIxS compliance.

Note that this kind of submissions are time consuming and need full collaboration from the user.
Thus, do not wait until the last minute to request help. In general, the best time to request a submission
is when you are writing your paper. Remember that the data can be submitted to EBI and can be
kept private and simply make public when the paper is accepted. Note that EBI/ENA takes up to 15 days to
change the status from private to public, so consider this when submitting data and your manuscript.
Note that submissions are time consuming and need full collaboration from the user.
:red:`Do not wait until the last minute to request help.` In general, the best
time to request a submission is when you are writing your paper. Remember that the
data can be submitted to EBI and kept private, then simply made public when
the paper is accepted. Note that EBI/ENA takes up to 15 days to change the status
from private to public, so consider this when submitting data and your manuscript.

.. note::
For convenience Qiita allows you to upload a QIIME mapping file to process your data. However,
15 changes: 8 additions & 7 deletions qiita_pet/support_files/doc/source/tutorials/getting-started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,7 @@ from another page.
Once your file(s) have been uploaded, you can process them in Qiita.
From the upload tool, click on “Go to study description” and, once
there, click on the “Sample template” tab.  Select your sample template
from the dropdown menu and, lastly, click “Process sample template”.
from the dropdown menu and, lastly, click “Process sample template”.

.. figure:: images/process-sample-template.png
:align: center
Expand Down Expand Up @@ -230,9 +230,8 @@ Preprocessing data

Once you have linked files to your raw data and your prep template has
been processed, you can then proceed to preprocessing your data.
Currently we only support fastq files for target gene preprocessing
(including reverse complementing the prep template barcodes). We are
working on adding more options and preprocessing pipelines.
`Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
is a list of currently supported raw files.

.. figure:: images/image08.png
:align: center
Expand Down Expand Up @@ -309,7 +308,8 @@ Study status

- Sandbox. When a study is in this status, all the required metadata
columns must be present in the metadata files (sample and prep), but
the values don't have to be filled in or finalized yet. The purpose
the values don't have to be filled in or finalized yet. We suggest adding
TBD as the temporary value for these fields. The purpose
of this status is so that users can quickly upload their sequence
files and some (possibly incomplete) metadata in order to have a
preliminary look at their data.
Expand All @@ -323,5 +323,6 @@ Study status
- Public. Once a study is made administrator-approved and becomes
private, the user can choose when to make it public. Making a study
public means that it will be available to anyone with a Qiita user
account (e.g., for data downloads and meta-analyses).

account (e.g., for data downloads and meta-analyses). When a study
is public it cannot be changed. All associated templates will be public
as well.
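
For the Sandbox status described above, required columns must exist but their values may still be placeholders. The sketch below shows one way to fill empty required cells with ``TBD`` before upload. The helper ``fill_placeholders`` is hypothetical, and the column names are taken from the prep fields mentioned in this documentation purely as examples.

```python
def fill_placeholders(rows, required_columns, placeholder="TBD"):
    """Return copies of rows with empty or missing required cells set to placeholder."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        for column in required_columns:
            if not str(new_row.get(column, "")).strip():
                new_row[column] = placeholder
        filled.append(new_row)
    return filled


# Hypothetical metadata rows with some values not yet known.
rows = [
    {"sample_name": "S1", "center_name": "UCSD", "platform": ""},
    {"sample_name": "S2", "center_name": "  "},
]
result = fill_placeholders(rows, ["center_name", "platform"])
```

Placeholders of this kind are only suitable while the study is sandboxed. They would need real values before an administrator can approve the study.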
4 changes: 2 additions & 2 deletions qiita_pet/support_files/doc/source/tutorials/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,12 @@ The following is a full list of the available tutorials:
:maxdepth: 2

account-creation
prepare-templates
prepare-information-files
ebi-submission
getting-started
analyze-data
no-raw-sequences
join-pair-ends
join-paired-end-reads

To request documentation on any administration use-cases not addressed here,
please add an issue `here <https://github.com/biocore/qiita/issues>`__.
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
.. _join-pair-ends:
.. _join-paired-end-reads:

.. index:: join-pair-ends
.. index:: join-paired-end-reads

Join pair ends
==============
Join paired end reads
=====================

Having high quality, longer reads helps with taxonomy assignment and classification.
Thus, if your forward and reverse reads overlap you should join them. Note that this
Expand All @@ -20,7 +20,7 @@ Joining forward and reverse reads for raw files

You could use `join_paired_ends.py <http://qiime.org/scripts/join_paired_ends.html>`__
and then upload your joined sequence and barcode files for processing. Then you
will upload your resulted joined file to Qiita.
will upload the resulting joined file to Qiita.

.. _join_forward_and_reverse_reads_for_per_sample_fastq_files_without_barcodes_and_primers:

Expand All @@ -29,7 +29,7 @@ Joining forward and reverse reads for per sample FASTQ files without barcodes an

You could use `multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__
and then upload your joined sequence and barcode files for processing. Then you
will upload your resulted joined per sample files to Qiita.
will upload the resulting joined per sample files to Qiita.


.. _per_sample_fastq_files_without_barcodes_but_with_primer_information_with_overlapping_regions:
Expand All @@ -41,7 +41,8 @@ To process this kind of files you will need to run two steps:

#. Run multiple_join_paired_ends.py to stitch the reads. See
`multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__.
#. Run multiple_extract_barcodes.py to strip out the primers. You need to use a parameter file with:
#. Run multiple_extract_barcodes.py to strip out the primers. You will need to use a
parameter file with:

.. code:: bash
