@@ -4,26 +4,24 @@ Frequently Asked Questions
44What kind of data can I upload to Qiita for processing?
55------------------------------------------------------- 
66
7- We need 3 things: raw data, sample template, and prep template. At this
8- moment, raw data is fastq files without demultiplexing with forward,
9- reverse (optional) and barcode reads. We should have before the end of
10- the week SFF processing so it's OK to upload. Note that we are accepting
11- any kind of target gene (16S, 18S, ITS, whatever) as long as they have
12- some kind of demultiplexing strategy and that you can also upload WGS.
13- However, WGS processing is not ready.
7+ Processing in Qiita requires 3 things: raw data, a sample information file, and a prep
8+ information file. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
9+ you can find a list of currently supported raw files. Note that we are
10+ accepting any kind of target gene (16S, 18S, ITS, whatever). You can also upload
11+ WGS data; however, WGS processing is not ready.
1412
15- What's the difference between a sample and a prep template ?
16- ----------------------------------------------------------- 
13+ What's the difference between a sample and a prep information file?
14+ -------------------------------------------------------------------  
1715
18- Sample template is  the information about your samples , including
19- environmental and other important information about them . The prep
20- template is basically what kind of wet lab work all or a subset of the 
21- samples had. If you  collected 100 samples, you are going to  need 100
22- rows in your sample template  describing each of them, this includes 
23- blanks, etc. Then  you prepared 95 of them for 16S and 50 of them for
24- 18S. Thus,  you are going to  need 2 prep templates : one with 95 rows
25- describing the preparation  for 16S, and another one with 50 to 
26- describing the 18S. For a more complex  example go
16+ A sample information file describes the samples in a study, including
17+ environmental factors relating to the associated host. The prep information
18+ file has information on how the samples were processed in the wet lab. If you
19+ collected 100 samples for your study, you will need 100 rows in your sample
20+ information file describing each of them, and additional rows for blanks and other
21+ control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
22+ you will need 2 prep information files: one with 95 rows describing the preparation
23+ for 16S, and another one with 50 rows describing the preparation for 18S. For a more complex
24+ example go
2725`here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
2826the "Upload instructions"
2927`here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__.
@@ -33,32 +31,35 @@ Example study processing workflow
3331
3432A few more instructions: for the example above the workflow should be:
3533
36- #. Create a new study
37- #. Add a sample template, you  can add 1, try to process it and the
34+ #. **Create a new study.**
35+ #. **Add a sample information file.** You can add 1, try to process it and the
3836   system will let you know if you have errors or missing columns. The
3937   most common errors are: the sample name column should be named
40-    sample\_ name, duplicated sample names are not permitted, and the prep
41-    template should contain all the samples in the sample template or a
42-    subset. Finally, if you haven't processed your sample templates and
43-    can add a column to your template named sloan\_ status with this info:
44-    SLOAN (funded by Sloan), SLOAN\_ COMPATIBLE (not Sloan funded but with
45-    compatible metadata, usually public), NOT\_ SLOAN (not included i.e.
46-    private study), that will be great!
47- #. Add a raw data. Depending on your barcoding/sequencing strategy you
48-    might need 1 or 2 raw datas for the example above. If you have two
49-    different fastq file sets (forward, reverse (optional) and barcodes)
50-    you will need two raw datas but if you only have one set, you only
51-    need one.
52- #. You can link your raw data to your files
53- #. You can add a prep template to your raw data. If you have the case
54-    with only one fastq set (forward, reverse (optional) and barcodes),
55-    you can add 2 different prep templates. Common missing fields here
56-    are: emp\_ status, center\_ name, run\_ prefix, platform,
57-    library\_ construction\_ protocol, experiment\_ design\_ description,
58-    center\_ project\_ name. Note that if you get a 500 error at this stage
59-    is highly probable because emp\_ status only accepts 3 values: 'EMP',
60-    'EMP\_ Processed', 'NOT\_ EMP', if errors persist please do not
61-    hesitate to contact us.
62- #. You can preprocess your files. For target gene, this means
63-    demultiplexing and QC.
64- 
38+    sample\_name, and duplicated sample names are not permitted. For a full list of
39+    required fields, visit :doc:`tutorials/prepare-information-files`.
40+ #. **Add a prep information file to your study for each data type.** The prep
41+    information file should contain all the samples in the sample information
42+    file or a subset. If you have more than one FASTQ file set (forward,
43+    reverse (optional) and barcodes) you will need to add a
44+    :ref:`run_prefix <required-fields-for-preprocessing-target-gene-data>`
45+    column.
46+    A prep information file and a QIIME compatible mapping file will
47+    be available for download after the prep information file is added
48+    successfully.
49+ #. **Upload and link your raw data to each of your prep information files.**
50+    Depending on your barcoding/sequencing strategy you might need 1 or more
51+    raw data file sets. If you have 2 raw data sets you may have to rename one
52+    set so that each set has a different name. Sets with the same name
53+    will overwrite each other on upload. Note that you can have one FASTQ file set linked
54+    to more than one prep information file.
55+ #. **Preprocess your files.** For target gene amplicon sequencing, this step performs
56+    demultiplexing and QC. There are multiple options for preprocessing depending on the
57+    barcode format and the data output from the sequencing center; this may
58+    require some trial and error to establish the correct option for
59+    your data files. After demultiplexing, a log file is generated with
60+    statistics about the files demultiplexed, including the number of sequences
61+    assigned per sample.
62+ #. **Process each of your preprocessed data types.** For target gene, this will
63+    perform closed OTU picking against the latest version of Greengenes and can
64+    be quite time-consuming depending on the number of samples and the depth
65+    of sequencing.
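
The most common information-file errors named in the workflow above (a missing ``sample_name`` column, duplicated sample names, and prep samples that are not in the sample information file) can be checked locally before uploading. The following is only a minimal sketch and not part of Qiita: the function names are hypothetical, and it assumes the information files are tab-separated text with a header row.

```python
import csv
import io


def read_sample_names(tsv_text):
    """Return the sample_name column of a tab-separated information file.

    Assumption: the file is tab-separated with a header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or "sample_name" not in reader.fieldnames:
        raise ValueError("missing required column: sample_name")
    return [row["sample_name"] for row in reader]


def check_information_files(sample_tsv, prep_tsv):
    """Hypothetical helper: list the common problems described above."""
    errors = []
    sample_names = read_sample_names(sample_tsv)
    prep_names = read_sample_names(prep_tsv)

    # Duplicated sample names are not permitted.
    duplicates = sorted({n for n in sample_names if sample_names.count(n) > 1})
    if duplicates:
        errors.append("duplicated sample names: " + ", ".join(duplicates))

    # The prep file must list all samples in the sample file, or a subset.
    unknown = sorted(set(prep_names) - set(sample_names))
    if unknown:
        errors.append("prep samples missing from sample file: " + ", ".join(unknown))
    return errors


# Made-up example data, for illustration only.
sample_tsv = "sample_name\thost\nS1\thuman\nS2\thuman\nS2\tmouse\n"
prep_tsv = "sample_name\trun_prefix\nS1\trun1\nS3\trun1\n"
problems = check_information_files(sample_tsv, prep_tsv)
```

An empty list means none of these three problems was found. Qiita itself checks further required fields that this sketch does not cover.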