@@ -4,26 +4,25 @@ Frequently Asked Questions
44What kind of data can I upload to Qiita for processing?
55-------------------------------------------------------
66
7- We need 3 things: raw data, sample template, and prep template. At this
8- moment, raw data is fastq files without demultiplexing with forward,
9- reverse (optional) and barcode reads. We should have before the end of
10- the week SFF processing so it's OK to upload. Note that we are accepting
11- any kind of target gene (16S, 18S, ITS, whatever) as long as they have
12- some kind of demultiplexing strategy and that you can also upload WGS.
13- However, WGS processing is not ready.
7+ Processing in Qiita requires 3 things: raw data, a sample information file, and
8+ a prep information file. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
9+ is a list of currently supported raw files. Note that we accept
10+ any kind of target gene (16S, 18S, ITS, etc.) as long as there is
11+ some kind of demultiplexing strategy. You can also upload WGS; however, WGS
12+ processing is not ready.
1413
15- What's the difference between a sample and a prep template ?
16- -----------------------------------------------------------
14+ What's the difference between a sample and a prep information file?
15+ -------------------------------------------------------------------
1716
18- Sample template is the information about your samples, including
19- environmental and other important information about them . The prep
20- template is basically what kind of wet lab work all or a subset of the
21- samples had. If you collected 100 samples, you are going to need 100
22- rows in your sample template describing each of them, this includes
23- blanks, etc. Then you prepared 95 of them for 16S and 50 of them for
24- 18S. Thus, you are going to need 2 prep templates : one with 95 rows
25- describing the preparation for 16S, and another one with 50 to
26- describing the 18S. For a more complex example go
17+ The sample information file describes the samples, including
18+ environmental factors relating to the associated host. The prep information
19+ file describes how the samples were processed in the wet lab. If you
20+ collected 100 samples for your study, you will need 100 rows in your sample
21+ information file describing each of them, plus additional rows for blanks and other
22+ control samples. If you prepared 95 of them for 16S and 50 of them for 18S,
23+ you will need 2 prep information files: one with 95 rows describing the preparation
24+ for 16S, and another with 50 rows describing the preparation for 18S. For a more complex
25+ example go
2726`here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to
2827the "Upload instructions"
2928`here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A >`__.
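The relationship between the two files can be checked before upload. The following is a minimal sketch, not Qiita's own validation code: it assumes both files are tab-separated text with a ``sample_name`` column, and the function names, the ``env`` and ``target_gene`` columns, and the miniature file contents are all hypothetical. It reports any sample listed in a prep information file that is absent from the sample information file.

```python
import csv
import io


def read_sample_names(text, delimiter="\t"):
    """Return the sample_name column values from delimited text.

    Assumption: the file has a header row containing a sample_name column.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "sample_name" not in reader.fieldnames:
        raise ValueError("missing required column: sample_name")
    return [row["sample_name"] for row in reader]


def samples_missing_from_sample_info(sample_info_text, prep_info_text):
    """Return prep sample names that are not in the sample information file."""
    known = set(read_sample_names(sample_info_text))
    return sorted(set(read_sample_names(prep_info_text)) - known)


# Hypothetical miniature files: 3 rows in the sample information file
# (2 samples plus a blank) and 2 of them prepared for 16S.
sample_info = "sample_name\tenv\nS1\tsoil\nS2\tsoil\nBLANK1\tcontrol\n"
prep_16s = "sample_name\ttarget_gene\nS1\t16S\nS2\t16S\n"
prep_bad = "sample_name\ttarget_gene\nS1\t16S\nS9\t16S\n"

print(samples_missing_from_sample_info(sample_info, prep_16s))  # []
print(samples_missing_from_sample_info(sample_info, prep_bad))  # ['S9']
```

An empty list means the prep information file covers the full set of samples or a subset of it, which is what the workflow expects.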
@@ -34,31 +33,32 @@ Example study processing workflow
3433A few more instructions: for the example above the workflow should be:
3534
3635#. Create a new study
37- #. Add a sample template , you can add 1, try to process it and the
36+ #. Add a sample information file. You can add one, try to process it, and the
3837 system will let you know if you have errors or missing columns. The
3938 most common errors are: the sample name column should be named
40- sample\_ name, duplicated sample names are not permitted, and the prep
41- template should contain all the samples in the sample template or a
42- subset. Finally, if you haven't processed your sample templates and
43- can add a column to your template named sloan\_ status with this info:
44- SLOAN (funded by Sloan), SLOAN\_ COMPATIBLE (not Sloan funded but with
45- compatible metadata, usually public), NOT\_ SLOAN (not included i.e.
46- private study), that will be great!
47- #. Add a raw data. Depending on your barcoding/sequencing strategy you
48- might need 1 or 2 raw datas for the example above. If you have two
49- different fastq file sets (forward, reverse (optional) and barcodes)
50- you will need two raw datas but if you only have one set, you only
51- need one.
52- #. You can link your raw data to your files
53- #. You can add a prep template to your raw data. If you have the case
54- with only one fastq set (forward, reverse (optional) and barcodes),
55- you can add 2 different prep templates. Common missing fields here
56- are: emp\_ status, center\_ name, run\_ prefix, platform,
57- library\_ construction\_ protocol, experiment\_ design\_ description,
58- center\_ project\_ name. Note that if you get a 500 error at this stage
59- is highly probable because emp\_ status only accepts 3 values: 'EMP',
60- 'EMP\_ Processed', 'NOT\_ EMP', if errors persist please do not
61- hesitate to contact us.
62- #. You can preprocess your files. For target gene, this means
63- demultiplexing and QC.
64-
39+ sample\_name, and duplicated sample names are not permitted. For a full list of
40+ required fields, visit :doc:`tutorials/prepare-information-files`.
41+ #. Add a prep information file to your study for each data type. The prep
42+ information file should contain all the samples in the sample information
43+ file or a subset. If you have more than one FASTQ file set (forward,
44+ reverse (optional) and barcodes) you will need to add a run_prefix
45+ column. A prep information file and a QIIME compatible mapping file will
46+ be available for download after the prep information file is added
47+ successfully.
48+ #. Upload and link your raw data to each of your prep information files.
49+ Depending on your barcoding/sequencing strategy you might need 1 or more
50+ raw data file sets. If you have 2 raw data sets you may have to rename one
51+ set so that each set has a different name. If they have the same name, one
52+ will overwrite the other on upload. Note that you can have one FASTQ file set linked
53+ to more than one prep information file.
54+ #. Preprocess your files. For target gene amplicon sequencing, this means
55+ demultiplexing and QC. There are multiple options for preprocessing depending on the
56+ barcode format and the data output from the sequencing center, so it may
57+ take some trial and error to establish the correct option for
58+ your data files. After demultiplexing, a log file is generated with
59+ statistics about the demultiplexed files, including the number of sequences
60+ assigned per sample.
61+ #. Process each of your preprocessed data types. For target gene, this will
62+ perform closed-reference OTU picking against the latest version of Greengenes and can
63+ be quite time-consuming depending on the number of samples and the depth
64+ of sequencing.
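Two of the pitfalls in the workflow above, duplicated sample names and raw data file sets that share file names, can be caught locally before uploading. This is a rough sketch under stated assumptions, not part of Qiita: the helper names, sample names, and file paths are hypothetical, and it only compares base file names, which is one plausible reading of "the same name".

```python
from collections import Counter
import os


def duplicated_sample_names(sample_names):
    """Return sample names that occur more than once, sorted."""
    counts = Counter(sample_names)
    return sorted(name for name, n in counts.items() if n > 1)


def colliding_file_names(paths):
    """Return base file names shared by more than one path, sorted.

    Assumption: files with the same base name would overwrite each other
    on upload, regardless of the local directory they come from.
    """
    counts = Counter(os.path.basename(p) for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample names taken from a sample information file.
names = ["S1", "S2", "S2", "BLANK1"]
print(duplicated_sample_names(names))  # ['S2']

# Hypothetical local paths for two FASTQ file sets.
paths = [
    "run1/forward.fastq.gz",
    "run1/barcodes.fastq.gz",
    "run2/forward.fastq.gz",
    "run2/barcodes.fastq.gz",
]
print(colliding_file_names(paths))  # ['barcodes.fastq.gz', 'forward.fastq.gz']
```

If the second check returns any names, renaming one set (for example by prefixing the run name) before upload avoids the overwrite.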