10 changes: 9 additions & 1 deletion README.rst
@@ -46,6 +46,14 @@ Current features
* Search over existing studies (see known issues).
* Generate basic visualizations with the available studies and datasets.

Accepted raw files
------------------

* Multiplexed SFF
* Multiplexed FASTQ: forward, reverse (optional), and barcodes
* Per sample FASTQ: forward
* Multiplexed FASTA/qual files

Known issues
------------

@@ -65,7 +73,7 @@ future.
external sources. For example, metabolomics processing in
`GNPS <http://gnps.ucsd.edu>`__ and data visualization in Qiita.
* Creation of a REST API to query and access the data hosted by Qiita.
* Improved analysis pipeline for 16S datasets.
* Improved analysis pipeline for target gene datasets.
* Crowd-sourced metadata curation of existing studies: improve the metadata of
existing studies by submitting “fix proposals” to the authors of the study.

2 changes: 1 addition & 1 deletion qiita_pet/handlers/study_handlers/ebi_handlers.py
@@ -45,7 +45,7 @@ def display_template(self, preprocessed_data_id, msg, msg_level):
# If allow_submission is already false, we technically don't need to
# do the following work. However, there is no clean way to fix this
# using the current structure, so we perform the work as we
# did not fail.
# did so it doesn't fail.
# We currently support only one prep template for submission, so
# grabbing the first one
prep_template = prep_templates[0]
64 changes: 40 additions & 24 deletions qiita_pet/handlers/study_handlers/vamps_handlers.py
@@ -12,9 +12,6 @@
from qiita_ware.context import submit
from qiita_ware.demux import stats as demux_stats
from qiita_ware.dispatchable import submit_to_VAMPS
from qiita_db.metadata_template.prep_template import PrepTemplate
from qiita_db.metadata_template.sample_template import SampleTemplate
from qiita_db.study import Study
from qiita_db.exceptions import QiitaDBUnknownIDError
from qiita_db.artifact import Artifact
from qiita_pet.handlers.base_handlers import BaseHandler
@@ -36,15 +33,26 @@ def display_template(self, preprocessed_data_id, msg, msg_level):
if user.level != 'admin':
raise HTTPError(403, "No permissions of admin, "
"get/VAMPSSubmitHandler: %s!" % user.id)

prep_template = PrepTemplate(preprocessed_data.prep_template)
sample_template = SampleTemplate(preprocessed_data.study)
study = Study(preprocessed_data.study)
prep_templates = preprocessed_data.prep_templates
allow_submission = len(prep_templates) == 1
msg_list = ["Submission to EBI disabled:"]
if not allow_submission:
msg_list.append(
"Only artifacts with a single prep template can be submitted")
# If allow_submission is already false, we technically don't need to
# do the following work. However, there is no clean way to fix this
# using the current structure, so we perform the work as we
# did so it doesn't fail.
# We currently support only one prep template for submission, so
# grabbing the first one
prep_template = prep_templates[0]
study = preprocessed_data.study
sample_template = study.sample_template
stats = [('Number of samples', len(prep_template)),
('Number of metadata headers',
len(sample_template.categories()))]

demux = [path for _, path, ftype in preprocessed_data.get_filepaths()
demux = [path for _, path, ftype in preprocessed_data.filepaths
if ftype == 'preprocessed_demux']
demux_length = len(demux)

@@ -61,10 +69,21 @@ def display_template(self, preprocessed_data_id, msg, msg_level):
stats.append(('Number of sequences', demux_file_stats.n))
msg_level = 'success'

# For EBI we check here that we have the required fields for submission;
# however, for VAMPS we don't need that check

if not allow_submission:
disabled_msg = "<br/>".join(msg_list)
else:
disabled_msg = None

self.render('vamps_submission.html',
study_title=study.title, stats=stats, message=msg,
study_id=study.id, level=msg_level,
preprocessed_data_id=preprocessed_data_id)
preprocessed_data_id=preprocessed_data_id,
investigation_type=prep_template.investigation_type,
allow_submission=allow_submission,
disabled_msg=disabled_msg)

@authenticated
def get(self, preprocessed_data_id):
@@ -73,33 +92,30 @@ def get(self, preprocessed_data_id):
@authenticated
@execute_as_transaction
def post(self, preprocessed_data_id):
user = self.current_user
# make sure user is admin and can therefore actually submit to VAMPS
if self.current_user.level != 'admin':
if user.level != 'admin':
raise HTTPError(403, "User %s cannot submit to VAMPS!" %
self.current_user.id)
user.id)
msg = ''
msg_level = 'success'
preprocessed_data = Artifact(preprocessed_data_id)
state = preprocessed_data.submitted_to_vamps_status()

demux = [path for _, path, ftype in preprocessed_data.get_filepaths()
if ftype == 'preprocessed_demux']
demux_length = len(demux)

if state in ('submitting', 'success'):
study = Artifact(preprocessed_data_id).study
study_id = study.id
state = study.ebi_submission_status
if state == 'submitting':
msg = "Cannot resubmit! Current state is: %s" % state
msg_level = 'danger'
elif demux_length != 1:
msg = "The study doesn't have demux files or have too many" % state
msg_level = 'danger'
else:
channel = self.current_user.id
channel = user.id
job_id = submit(channel, submit_to_VAMPS,
int(preprocessed_data_id))

self.render('compute_wait.html',
job_id=job_id, title='VAMPS Submission',
completion_redirect='/compute_complete/%s' % job_id)
completion_redirect=('/study/description/%s?top_tab='
'preprocessed_data_tab&sub_tab=%s'
% (study_id,
preprocessed_data_id)))
return

self.display_template(preprocessed_data_id, msg, msg_level)
4 changes: 4 additions & 0 deletions qiita_pet/support_files/doc/source/_static/my-styles.css
@@ -1,3 +1,7 @@
.navbar-brand > img {
display: inline-block;
}

.red {
color:red;
}
91 changes: 46 additions & 45 deletions qiita_pet/support_files/doc/source/faq.rst
@@ -4,26 +4,24 @@ Frequently Asked Questions
What kind of data can I upload to Qiita for processing?
-------------------------------------------------------

We need 3 things: raw data, sample template, and prep template. At this
moment, raw data is fastq files without demultiplexing with forward,
reverse (optional) and barcode reads. We should have before the end of
the week SFF processing so it's OK to upload. Note that we are accepting
any kind of target gene (16S, 18S, ITS, whatever) as long as they have
some kind of demultiplexing strategy and that you can also upload WGS.
However, WGS processing is not ready.
Processing in Qiita requires 3 things: raw data, sample and prep information
files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
you can find a list of the currently supported raw files. Note that we are
accepting any kind of target gene (16S, 18S, ITS, etc.). You can also upload
WGS; however, WGS processing is not ready yet.
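
For illustration, here is a minimal sketch (plain Python, not Qiita code; the
file name patterns are hypothetical) of how a multiplexed FASTQ set of forward,
reverse (optional) and barcode reads might be grouped before upload:

.. code-block:: python

    # Minimal sketch (not Qiita code): group a multiplexed FASTQ upload into
    # forward / reverse (optional) / barcode reads. The *_R1_*, *_R2_* and
    # *_I1_* patterns are assumptions; adjust them to your own file names.
    import os
    from glob import glob

    def find_fastq_set(upload_dir):
        """Return (forward, reverse, barcodes) FASTQ paths found in upload_dir."""
        def first(pattern):
            hits = sorted(glob(os.path.join(upload_dir, pattern)))
            return hits[0] if hits else None

        forward = first('*_R1_*.fastq.gz')   # forward reads (required)
        reverse = first('*_R2_*.fastq.gz')   # reverse reads (optional)
        barcodes = first('*_I1_*.fastq.gz')  # barcode reads (required)
        if forward is None or barcodes is None:
            raise ValueError('a multiplexed FASTQ set needs forward and barcode reads')
        return forward, reverse, barcodes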

What's the difference between a sample and a prep template?
-----------------------------------------------------------
What's the difference between a sample and a prep information file?
-------------------------------------------------------------------

Sample template is the information about your samples, including
environmental and other important information about them. The prep
template is basically what kind of wet lab work all or a subset of the
samples had. If you collected 100 samples, you are going to need 100
rows in your sample template describing each of them, this includes
blanks, etc. Then you prepared 95 of them for 16S and 50 of them for
18S. Thus, you are going to need 2 prep templates: one with 95 rows
describing the preparation for 16S, and another one with 50 to
describing the 18S. For a more complex example go
A sample information file describes the samples in a study, including
environmental factors relating to the associated host. The prep information
file has information on how the sample was processed in the wet lab. If you
collected 100 samples for your study, you will need 100 rows in your sample
information file describing each of them, and additional rows for blanks and other
control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
you will need 2 prep information files: one with 95 rows describing the preparation
for 16S, and another one with 50 describing the 18S. For a more complex
example go
`here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
the "Upload instructions"
`here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__.
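
As a minimal sketch of the layout described above (illustrative only; the
column names other than ``sample_name`` are made-up examples, not the full set
of required fields):

.. code-block:: python

    # Minimal sketch (not Qiita code): one sample information file for all 100
    # samples, plus one prep information file per preparation (16S and 18S).
    # Columns other than sample_name are illustrative only.
    import pandas as pd

    samples = ['sample.%03d' % i for i in range(1, 101)]

    # Sample information: one row per collected sample (including blanks/controls)
    sample_info = pd.DataFrame({
        'sample_name': samples,
        'host_subject_id': ['subject.%03d' % i for i in range(1, 101)],
        'collection_timestamp': ['2015-06-01'] * 100,
    })

    # Prep information: one file per wet-lab preparation, each a subset of samples
    prep_16s = pd.DataFrame({'sample_name': samples[:95],
                             'target_gene': ['16S rRNA'] * 95})
    prep_18s = pd.DataFrame({'sample_name': samples[:50],
                             'target_gene': ['18S rRNA'] * 50})

    # Qiita accepts these as tab-separated text files
    sample_info.to_csv('sample_information.txt', sep='\t', index=False)
    prep_16s.to_csv('prep_information_16S.txt', sep='\t', index=False)
    prep_18s.to_csv('prep_information_18S.txt', sep='\t', index=False)
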
@@ -33,32 +31,35 @@ Example study processing workflow

A few more instructions: for the example above the workflow should be:

#. Create a new study
#. Add a sample template, you can add 1, try to process it and the
#. **Create a new study.**
#. **Add a sample information file.** You can add 1, try to process it and the
system will let you know if you have errors or missing columns. The
most common errors are: the sample name column should be named
sample\_name, duplicated sample names are not permitted, and the prep
template should contain all the samples in the sample template or a
subset. Finally, if you haven't processed your sample templates and
can add a column to your template named sloan\_status with this info:
SLOAN (funded by Sloan), SLOAN\_COMPATIBLE (not Sloan funded but with
compatible metadata, usually public), NOT\_SLOAN (not included i.e.
private study), that will be great!
#. Add a raw data. Depending on your barcoding/sequencing strategy you
might need 1 or 2 raw datas for the example above. If you have two
different fastq file sets (forward, reverse (optional) and barcodes)
you will need two raw datas but if you only have one set, you only
need one.
#. You can link your raw data to your files
#. You can add a prep template to your raw data. If you have the case
with only one fastq set (forward, reverse (optional) and barcodes),
you can add 2 different prep templates. Common missing fields here
are: emp\_status, center\_name, run\_prefix, platform,
library\_construction\_protocol, experiment\_design\_description,
center\_project\_name. Note that if you get a 500 error at this stage
is highly probable because emp\_status only accepts 3 values: 'EMP',
'EMP\_Processed', 'NOT\_EMP', if errors persist please do not
hesitate to contact us.
#. You can preprocess your files. For target gene, this means
demultiplexing and QC.

sample\_name, duplicated sample names are not permitted. For a full list of
required fields, visit :doc:`tutorials/prepare-information-files`.
#. **Add a prep information file to your study for each data type.** The prep
information file should contain all the samples in the sample information
file or a subset. If you have more than one FASTQ file set (forward,
reverse (optional) and barcodes) you will need to add a
:ref:`run_prefix <required-fields-for-preprocessing-target-gene-data>`
column.
A prep information file and a QIIME compatible mapping file will
be available for download after the prep information file is added
successfully.
#. **Upload and link your raw data to each of your prep information files.**
Depending on your barcoding/sequencing strategy you might need 1 or more
raw data file sets. If you have 2 raw data sets, you may have to rename one
set so that each set has a different name; files with the same name will
overwrite each other on upload. Note that you can have one FASTQ file set linked
to more than one prep information file.
#. **Preprocess your files.** For target gene amplicon sequencing, this will
demultiplex and QC your sequences. There are multiple options for preprocessing
depending on the barcode format and the data output from the sequencing
center; it may take some trial and error to establish the correct option for
your data files. After demultiplexing, a log file is generated with statistics
about the demultiplexed files, including the number of sequences assigned per
sample (see the sketch after this list).
#. **Process each of your preprocessed data types.** For target gene, this will
perform closed-reference OTU picking against the latest version of Greengenes
and can be quite time-consuming depending on the number of samples and the
depth of sequencing.
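
Below is a rough, illustrative sketch of the per-sample count reported in the
demultiplexing log. It is not Qiita's actual implementation, and the
demultiplexed label format (e.g. ``sample.001_12``) is an assumption:

.. code-block:: python

    # Illustrative only: count sequences assigned to each sample in a
    # demultiplexed FASTQ file whose labels start with the sample name.
    from collections import Counter
    import gzip

    def per_sample_counts(demux_fastq_gz):
        """Count how many sequences were assigned to each sample."""
        counts = Counter()
        with gzip.open(demux_fastq_gz, 'rt') as handle:
            for i, line in enumerate(handle):
                if i % 4 == 0:                   # FASTQ header line
                    label = line[1:].split()[0]  # e.g. "sample.001_12"
                    counts[label.rsplit('_', 1)[0]] += 1
        return counts

    # Example: print the number of sequences assigned per sample
    # for sample, n in sorted(per_sample_counts('seqs.fastq.gz').items()):
    #     print(sample, n)
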
45 changes: 24 additions & 21 deletions qiita_pet/support_files/doc/source/qiita-philosophy/index.rst
@@ -35,44 +35,47 @@ can request an administrator to validate their study information and make it
private and possibly submit to a permanent repository, where it can also be
kept private until the user wants to make it public. At this stage in Qiita
the whole study (including all processed data) is private. This process is
completely automatic via the GUI. Currently sequence data is being deposited
for permanent storage to the European Nucleotide Archive (ENA), part of the
European Bioinformatics Institute (EBI). Finally, when the user is ready,
usually when the main manuscript of the study is ready for publication, the
user can request for the artifact to become public, both in Qiita and the
permanent repository, Figure 2.
completely automatic via the Graphical User Interface (GUI). Currently, sequence
data is deposited for permanent storage in the European Nucleotide
Archive (ENA), part of the European Bioinformatics Institute (EBI). Finally,
when the user is ready, usually when the main manuscript of the study is ready
for publication, the user can request that the artifact be made public, both in
Qiita and in the permanent repository (Figure 2).


.. figure:: images/figure1.png
:align: center

**Figure 1. Qiita’s main structure: from single to multiple studies.** More
and more a simple study is composed by a multiple samples which have been
prepared chemically to identify diverse microbial composition parts of them.
For example, 16S to see which kind of bacteria lives on them, Metabolomics
to see the substance formed by the community, and/or ITS for the fungi.
Additionally, Qiita allows users to compare their studies with other public
ones already available in the system.
**Figure 1. Qiita’s main structure: from single to multiple studies.**
Increasingly, a single study is composed of multiple samples that have
been prepared using different protocols to identify different microbial
features. For example, 16S rRNA amplification to identify the bacteria in
or on the sample, metabolomics to identify chemical components formed by
the microbial community or within the sample, and/or ITS amplification for
identification of fungal organisms that may also be present. Additionally,
Qiita allows users to compare their studies with other public ones already
available in the system.


.. figure:: images/figure2.png
:align: center

**Figure 2. Possible Qiita artifact states.** Artifacts are any file,
either uploaded by users or generated by the system. There are 3 possible
states: sanboxed, private and public. In the sandboxed and private states
no other user has access to the artifacts, except if the owner invites a
guest. In the public state, the artifact is open to all users in the
system, and the study can be searched from the study listing page.
states: sandboxed, private and public. In the sandboxed and private states
no other user has access to the artifacts, unless the owner grants access
by sharing the study. In the public state, the artifact is open to all
users in the system, and the study can be found by searching from the
study listing page.
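
As an illustrative sketch of the state flow shown in Figure 2 (this is not
Qiita's actual data model; it only encodes the transitions described above):

.. code-block:: python

    # Illustrative sketch of artifact visibility states and the transitions
    # described above (sandboxed -> private via admin validation, private ->
    # public at the owner's request). Not the actual Qiita implementation.
    ALLOWED_TRANSITIONS = {
        'sandboxed': {'private'},   # an administrator validates the study
        'private': {'public'},      # the owner requests to make it public
        'public': set(),            # public artifacts stay public
    }

    def change_visibility(current, requested):
        if requested not in ALLOWED_TRANSITIONS[current]:
            raise ValueError('Cannot move artifact from %s to %s'
                             % (current, requested))
        return requested

    # Example: sandboxed -> private -> public
    state = change_visibility('sandboxed', 'private')
    state = change_visibility(state, 'public')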


Portals
-------

Qiita allows to host multiple portals within the same infrastructure. This
allows each portal to have a subset of studies in a different URL but sharing
the same resources. Sharing the same backend resources avoids having multiple
sites and data getting out of sync.
Qiita allows the hosting of multiple portals within the same infrastructure.
This allows each portal to present a subset of studies (often with a similar
theme) at a different URL while sharing the same resources. Sharing the same
backend resources prevents multiple sites and their data from getting out of sync.

The current available portals are:

25 changes: 15 additions & 10 deletions qiita_pet/support_files/doc/source/tutorials/ebi-submission.rst
@@ -2,25 +2,30 @@

.. index:: ebi-submission

.. role:: red

EBI submission via Qiita
========================

Qiita allows users to deposit their study, sample, experiment and sequence data to the
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is a permanent repository
part of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is the permanent data
repository of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
this repository will provide you with a unique identifier for your study, which is generally a
requirement for publication.
requirement for publication. Your study will be housed with all other Qiita submissions
and so we require adherence to the MIxS standard.

EBI/ENA requires a given set of column fields to describe your samples and experiments, for more
information visit :doc:`prepare-templates` and pay most attention to EBI required fields,
information, visit :doc:`prepare-information-files` and pay close attention to the EBI-required fields;
without these **Qiita Admins** will not be able to submit. If you want to submit your data or need
help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__.
help, send an email to `qiita.help@gmail.com <mailto:qiita.help@gmail.com>`__. Help will include
advice on additional fields to add to ensure MIxS compliance.
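
As an illustrative sketch of such a pre-submission check (the ``REQUIRED``
field list below is a hypothetical example; consult
:doc:`prepare-information-files` for the authoritative EBI-required fields):

.. code-block:: python

    # Illustrative only: check an information file for a set of EBI-oriented
    # columns before requesting a submission. The REQUIRED set is a
    # hypothetical example, not the authoritative list.
    import pandas as pd

    REQUIRED = {'sample_name', 'platform', 'library_construction_protocol',
                'experiment_design_description'}

    def missing_ebi_fields(info_file):
        info = pd.read_csv(info_file, sep='\t', dtype=str)
        return sorted(REQUIRED - set(info.columns))

    # Example
    # print(missing_ebi_fields('prep_information_16S.txt'))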

Note that this kind of submissions are time consuming and need full collaboration from the user.
Thus, do not wait until the last minute to request help. In general, the best time to request a submission
is when you are writing your paper. Remember that the data can be submitted to EBI and can be
kept private and simply make public when the paper is accepted. Note that EBI/ENA takes up to 15 days to
change the status from private to public, so consider this when submitting data and your manuscript.
Note that submissions are time consuming and need full collaboration from the user.
:red:`Do not wait until the last minute to request help.` In general, the best
time to request a submission is when you are writing your paper. Remember that the
data can be submitted to EBI and kept private, and simply made public when
the paper is accepted. Note that EBI/ENA takes up to 15 days to change the status
from private to public, so consider this when submitting data and your manuscript.

.. note::
For convenience Qiita allows you to upload a QIIME mapping file to process your data. However,