initial commit

antgonza · antgonza · commit 6a0ff1c04cb4 · 2016-02-28T12:50:28.000-07:00
diff --git a/README.rst b/README.rst
@@ -46,6 +46,14 @@ Current features
 * Search over existing studies (see known issues).
 * Generate basic visualizations with the available studies and datasets.
 
+Accepted raw files
+------------------
+
+* Multiplexed SFF
+* Multiplexed FASTQ: forward, reverse (optional), and barcodes
+* Per sample FASTQ: forward
+* Multiplexed FASTA/qual files
+
 Known issues
 ------------
 
@@ -65,7 +73,7 @@ future.
   external sources. For example, metabolomics processing in
   `GNPS <http://gnps.ucsd.edu>`__ and data visualization in Qiita.
 * Creation of a REST API to query and access the data hosted by Qiita.
-* Improved analysis pipeline for 16S datasets.
+* Improved analysis pipeline for target gene datasets.
 * Crowd-sourced metadata curation of existing studies: improve the metadata of
   existing studies by submitting “fix proposals” to the authors of the study.
 
diff --git a/qiita_pet/support_files/doc/source/faq.rst b/qiita_pet/support_files/doc/source/faq.rst
@@ -4,26 +4,25 @@ Frequently Asked Questions
 What kind of data can I upload to Qiita for processing?
 -------------------------------------------------------
 
-We need 3 things: raw data, sample template, and prep template. At this
-moment, raw data is fastq files without demultiplexing with forward,
-reverse (optional) and barcode reads. We should have before the end of
-the week SFF processing so it's OK to upload. Note that we are accepting
-any kind of target gene (16S, 18S, ITS, whatever) as long as they have
-some kind of demultiplexing strategy and that you can also upload WGS.
-However, WGS processing is not ready.
+Processing in Qiita requires 3 things: raw data, a sample information file,
+and a prep information file. See the list of currently supported raw files
+`here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__.
+Note that we accept any kind of target gene (16S, 18S, ITS, etc.) as long as
+there is some kind of demultiplexing strategy. You can also upload WGS data;
+however, WGS processing is not ready yet.
 
-What's the difference between a sample and a prep template?
------------------------------------------------------------
+What's the difference between a sample and a prep information file?
+-------------------------------------------------------------------
 
-Sample template is the information about your samples, including
-environmental and other important information about them. The prep
-template is basically what kind of wet lab work all or a subset of the
-samples had. If you collected 100 samples, you are going to need 100
-rows in your sample template describing each of them, this includes
-blanks, etc. Then you prepared 95 of them for 16S and 50 of them for
-18S. Thus, you are going to need 2 prep templates: one with 95 rows
-describing the preparation for 16S, and another one with 50 to
-describing the 18S. For a more complex example go
+The sample information file contains information about the samples, including
+environmental factors relating to the associated host. The prep information
+file has information on how the samples were processed in the wet lab. If you
+collected 100 samples for your study, you will need 100 rows in your sample
+information file describing each of them, and additional rows for blanks and other
+control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
+you will need 2 prep information files: one with 95 rows describing the preparation
+for 16S, and another one with 50 describing the 18S. For a more complex
+example go
 `here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
 the "Upload instructions"
 `here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__.
@@ -34,31 +33,32 @@ Example study processing workflow
 A few more instructions: for the example above the workflow should be:
 
 #. Create a new study
-#. Add a sample template, you can add 1, try to process it and the
+#. Add a sample information file. You can add 1, try to process it, and the
    system will let you know if you have errors or missing columns. The
    most common errors are: the sample name column should be named
-   sample\_name, duplicated sample names are not permitted, and the prep
-   template should contain all the samples in the sample template or a
-   subset. Finally, if you haven't processed your sample templates and
-   can add a column to your template named sloan\_status with this info:
-   SLOAN (funded by Sloan), SLOAN\_COMPATIBLE (not Sloan funded but with
-   compatible metadata, usually public), NOT\_SLOAN (not included i.e.
-   private study), that will be great!
-#. Add a raw data. Depending on your barcoding/sequencing strategy you
-   might need 1 or 2 raw datas for the example above. If you have two
-   different fastq file sets (forward, reverse (optional) and barcodes)
-   you will need two raw datas but if you only have one set, you only
-   need one.
-#. You can link your raw data to your files
-#. You can add a prep template to your raw data. If you have the case
-   with only one fastq set (forward, reverse (optional) and barcodes),
-   you can add 2 different prep templates. Common missing fields here
-   are: emp\_status, center\_name, run\_prefix, platform,
-   library\_construction\_protocol, experiment\_design\_description,
-   center\_project\_name. Note that if you get a 500 error at this stage
-   is highly probable because emp\_status only accepts 3 values: 'EMP',
-   'EMP\_Processed', 'NOT\_EMP', if errors persist please do not
-   hesitate to contact us.
-#. You can preprocess your files. For target gene, this means
-   demultiplexing and QC.
-
+   sample\_name, and duplicated sample names are not permitted. For a full list of
+   required fields, visit :doc:`tutorials/prepare-information-files`.
+#. Add a prep information file to your study for each data type. The prep
+   information file should contain all the samples in the sample information
+   file or a subset. If you have more than one FASTQ file set (forward,
+   reverse (optional) and barcodes) you will need to add a run_prefix
+   column. A prep information file and a QIIME compatible mapping file will
+   be available for download after the prep information file is added
+   successfully.
+#. Upload and link your raw data to each of your prep information files.
+   Depending on your barcoding/sequencing strategy you might need 1 or more
+   raw data file sets. If you have 2 raw data sets you may have to rename one
+   set so that each set has a different name; files with the same name are
+   overwritten on upload. Note that you can have one FASTQ file set linked
+   to more than one prep information file.
+#. Preprocess your files. For target gene amplicon sequencing, this will
+   demultiplex and quality control (QC) your sequences. There are multiple
+   options for preprocessing depending on the barcode format and the data
+   output from the sequencing center, so finding the correct option for your
+   data files may require some trial and error. After demultiplexing, a log
+   file is generated with statistics about the demultiplexed files, including
+   the number of sequences assigned per sample.
+#. Process each of your preprocessed data types. For target gene, this will
+   perform closed-reference OTU picking against the latest version of
+   Greengenes and can be quite time consuming depending on the number of
+   samples and the depth of sequencing.
diff --git a/qiita_pet/support_files/doc/source/qiita-philosophy/images/figure1.png b/qiita_pet/support_files/doc/source/qiita-philosophy/images/figure1.png
diff --git a/qiita_pet/support_files/doc/source/qiita-philosophy/index.rst b/qiita_pet/support_files/doc/source/qiita-philosophy/index.rst
@@ -13,7 +13,7 @@ environments for each pipeline gives the freedom of adding any pipeline with
 any software dependencies to Qiita. Artifacts, basically any file in the
 system, from raw sequence to contingency tables or even data visualizations,
 permits the system to store any kind of data but also define within each
-pipelines which commands and parameters can applied to them.
+pipeline which commands and parameters can be applied to them.
 
 The current plugins available are:
 
@@ -35,44 +35,47 @@ can request an administrator to validate their study information and make it
 private and possibly submit to a permanent repository, where it can also be
 kept private until the user wants to make it public. At this stage in Qiita
 the whole study (including all processed data) is private. This process is
-completely automatic via the GUI. Currently sequence data is being deposited
-for permanent storage to the European Nucleotide Archive (ENA), part of the
-European Bioinformatics Institute (EBI). Finally, when the user is ready,
-usually when the main manuscript of the study is ready for publication, the
-user can request for the artifact to become public, both in Qiita and the
-permanent repository, Figure 2.
+completely automatic via the Graphical User Interface (GUI). Currently sequence
+data is being deposited for permanent storage to the European Nucleotide
+Archive (ENA), part of the European Bioinformatics Institute (EBI). Finally,
+when the user is ready, usually when the main manuscript of the study is ready
+for publication, the user can request for the artifact to be made
+public, both in Qiita and the permanent repository (Figure 2).
 
 
 .. figure::  images/figure1.png
    :align:   center
 
-   **Figure 1. Qiita’s main structure: from single to multiple studies.** More
-   and more a simple study is composed by a multiple samples which have been
-   prepared chemically to identify diverse microbial composition parts of them.
-   For example, 16S to see which kind of bacteria lives on them, Metabolomics
-   to see the substance formed by the community, and/or ITS for the fungi.
-   Additionally, Qiita allows users to compare their studies with other public
-   ones already available in the system.
+   **Figure 1. Qiita’s main structure: from single to multiple studies.**
+   Increasingly, a single study is composed of multiple samples which have
+   been prepared using different protocols to identify different microbial
+   features. For example, 16S rRNA amplification to identify the bacteria in
+   or on the sample, metabolomics to identify chemical components formed by
+   the microbial community or within the sample, and/or ITS amplification for
+   identification of fungal organisms that may also be present. Additionally,
+   Qiita allows users to compare their studies with other public ones already
+   available in the system.
 
 
 .. figure::  images/figure2.png
    :align:   center
 
    **Figure 2. Possible Qiita artifact states.** Artifacts are any file,
    either uploaded by users or generated by the system. There are 3 possible
-   states: sanboxed, private and public. In the sandboxed and private states
-   no other user has access to the artifacts, except if the owner invites a
-   guest. In the public state, the artifact is open to all users in the
-   system, and the study can be searched from the study listing page.
+   states: sandboxed, private and public. In the sandboxed and private states
+   no other user has access to the artifacts, unless the owner grants access
+   by sharing the study. In the public state, the artifact is open to all
+   users in the system, and the study can be found by searching from the
+   study listing page.
 
 
 Portals
 -------
 
-Qiita allows to host multiple portals within the same infrastructure. This
-allows each portal to have a subset of studies in a different URL but sharing
-the same resources. Sharing the same backend resources avoids having multiple
-sites and data getting out of sync.
+Qiita allows the hosting of multiple portals within the same infrastructure.
+This allows each portal to have a subset of studies (often with a similar
+theme) at a different URL but sharing the same resources. Sharing the same
+backend resources avoids having multiple sites and data getting out of sync.
 
 The current available portals are:
 
diff --git a/qiita_pet/support_files/doc/source/tutorials/ebi-submission.rst b/qiita_pet/support_files/doc/source/tutorials/ebi-submission.rst
@@ -6,18 +6,20 @@ EBI submission via Qiita
 ========================
 
 Qiita allows users to deposit their study, sample, experiment and sequence data to the
-`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is a permanent repository
-part of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
+`European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is the permanent data
+repository of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to
 this repository will provide you with a unique identifier for your study, which is generally a
-requirement for publication.
+requirement for publication. Your study will be housed with all other Qiita submissions
+and so we require adherence to the MIxS standard.
 
 EBI/ENA requires a given set of column fields to describe your samples and experiments, for more
-information visit :doc:`prepare-templates` and pay most attention to EBI required fields,
+information visit :doc:`prepare-information-files` and pay particular attention to the EBI required fields;
 without these **Qiita Admins** will not be able to submit. If you want to submit your data or need
-help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__.
+help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__. Help will include
+advice on additional fields to add to ensure MIxS compliance.
 
-Note that this kind of submissions are time consuming and need full collaboration from the user.
-Thus, do not wait until the last minute to request help. In general, the best time to request a submission
+Note that submissions are time consuming and need full collaboration from the user.
+Do not wait until the last minute to request help. In general, the best time to request a submission
 is when you are writing your paper. Remember that the data can be submitted to EBI and can be
 kept private and simply make public when the paper is accepted. Note that EBI/ENA takes up to 15 days to
 change the status from private to public, so consider this when submitting data and your manuscript.
diff --git a/qiita_pet/support_files/doc/source/tutorials/getting-started.rst b/qiita_pet/support_files/doc/source/tutorials/getting-started.rst
@@ -155,7 +155,7 @@ from another page.
 Once your file(s) have been uploaded, you can process them in Qiita.
 From the upload tool, click on “Go to study description” and, once
 there, click on the “Sample template” tab.  Select your sample template
-from the dropdown menu and, lastly, click “Process sample template”. 
+from the dropdown menu and, lastly, click “Process sample template”.
 
 .. figure::  images/process-sample-template.png
    :align:   center
@@ -230,9 +230,8 @@ Preprocessing data
 
 Once you have linked files to your raw data and your prep template has
 been processed, you can then proceed to preprocessing your data.
-Currently we only support fastq files for target gene preprocessing
-(including reverse complementing the prep template barcodes). We are
-working on adding more options and preprocessing pipelines.
+See the list of currently supported raw files
+`here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__.
 
 .. figure::  images/image08.png
    :align:   center
@@ -309,7 +308,8 @@ Study status
 
 -  Sandbox. When a study is in this status, all the required metadata
    columns must be present in the metadata files (sample and prep), but
-   the values don't have to be filled in or finalized yet. The purpose
+   the values don't have to be filled in or finalized yet. We suggest adding
+   TBD as the temporary value of these fields. The purpose
    of this status is so that users can quickly upload their sequence
    files and some (possibly incomplete) metadata in order to have a
    preliminary look at their data.
@@ -323,5 +323,6 @@ Study status
 -  Public. Once a study is made administrator-approved and becomes
    private, the user can choose when to make it public. Making a study
    public means that it will be available to anyone with a Qiita user
-   account (e.g., for data downloads and meta-analyses).
-
+   account (e.g., for data downloads and meta-analyses). When a study
+   is public it cannot be changed. All associated templates will be public
+   as well.
diff --git a/qiita_pet/support_files/doc/source/tutorials/index.rst b/qiita_pet/support_files/doc/source/tutorials/index.rst
@@ -7,12 +7,12 @@ The following is a full list of the available tutorials:
    :maxdepth: 2
 
    account-creation
-   prepare-templates
+   prepare-information-files
    ebi-submission
    getting-started
    analyze-data
    no-raw-sequences
-   join-pair-ends
+   join-paired-end-reads
 
 To request documentation on any administration use-cases not addressed here,
 please add an issue `here <https://github.com/biocore/qiita/issues>`__.
diff --git a/qiita_pet/support_files/doc/source/tutorials/join-paired-end-reads.rst b/qiita_pet/support_files/doc/source/tutorials/join-paired-end-reads.rst
@@ -1,9 +1,9 @@
-.. _join-pair-ends:
+.. _join-paired-end-reads:
 
-.. index:: join-pair-ends
+.. index:: join-paired-end-reads
 
-Join pair ends
-==============
+Join paired end reads
+=====================
 
 Having high quality, longer reads helps with taxonomy assignment and classification.
 Thus, if your forward and reverse reads overlap you should join them. Note that this
@@ -20,7 +20,7 @@ Joining forward and reverse reads for raw files
 
 You could use `join_paired_ends.py <http://qiime.org/scripts/join_paired_ends.html>`__
 and then upload your joined sequence and barcode files for processing. Then you
-will upload your resulted joined file to Qiita.
+will upload the resulting joined file to Qiita.
 
 .. _join_forward_and_reverse_reads_for_per_sample_fastq_files_without_barcodes_and_primers:
 
@@ -29,7 +29,7 @@ Joining forward and reverse reads for per sample FASTQ files without barcodes an
 
 You could use `multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__
 and then upload your joined sequence and barcode files for processing. Then you
-will upload your resulted joined per sample files to Qiita.
+will upload the resulting joined per sample files to Qiita.
 
 
 .. _per_sample_fastq_files_without_barcodes_but_with_primer_information_with_overlapping_regions:
@@ -41,7 +41,8 @@ To process this kind of files you will need to run two steps:
 
 #. Run multiple_join_paired_ends.py to stitch the reads. See
    `multiple_join_paired_ends.py <http://qiime.org/scripts/multiple_join_paired_ends.html>`__.
-#. Run multiple_extract_barcodes.py to strip out the primers. You need to use a parameter file with:
+#. Run multiple_extract_barcodes.py to strip out the primers. You will need to use a
+   parameter file with:
 
    .. code:: bash
 
diff --git a/qiita_pet/support_files/doc/source/tutorials/no-raw-sequences.rst b/qiita_pet/support_files/doc/source/tutorials/no-raw-sequences.rst
diff --git a/qiita_pet/support_files/doc/source/tutorials/prepare-information-files.rst b/qiita_pet/support_files/doc/source/tutorials/prepare-information-files.rst