@@ -6,15 +6,14 @@ What kind of data can I upload to Qiita for processing?
 
 Processing in Qiita requires 3 things: raw data, sample and prep information
 files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
-a list of currently supported raw files files. Note that we are accepting
-any kind of target gene (16S, 18S, ITS, whatever) as long as there is
-some kind of demultiplexing strategy. You can also upload WGS however, WGS
-processing is not ready.
+you can find a list of currently supported raw files files. Note that we are
+accepting any kind of target gene (16S, 18S, ITS, whatever). You can also upload
+WGS however, WGS processing is not ready.
 
 What's the difference between a sample and a prep information file?
 -------------------------------------------------------------------
 
-Sample information file is the information about the samples, including
+A sample information file describes the samples in a study, including
 environmental factors relating to the associated host. The prep information
 file has information on how the sample was processed in the wet lab. If you
 collected 100 samples for your study, you will need 100 rows in your sample
@@ -32,33 +31,35 @@ Example study processing workflow
 
 A few more instructions: for the example above the workflow should be:
 
-#. Create a new study
-#. Add a sample information file, you can add 1, try to process it and the
+#. **Create a new study.**
+#. **Add a sample information file.** You can add 1, try to process it and the
    system will let you know if you have errors or missing columns. The
    most common errors are: the sample name column should be named
    sample\_name, duplicated sample names are not permitted. For a full list of
    required fields, visit :doc:`tutorials/prepare-information-files`.
-#. Add a prep information file to your study for each data type. The prep
+#. **Add a prep information file to your study for each data type.** The prep
    information file should contain all the samples in the sample information
    file or a subset. If you have more than one FASTQ file set (forward,
-   reverse (optional) and barcodes) you will need to add a run_prefix
-   column. A prep information file and a QIIME compatible mapping file will
+   reverse (optional) and barcodes) you will need to add a
+   :ref:`run_prefix <required-fields-for-preprocessing-target-gene-data>`
+   column.
+   A prep information file and a QIIME compatible mapping file will
    be available for download after the prep information file is added
    successfully.
-#. Upload and link your raw data to each of your prep information files.
+#. **Upload and link your raw data to each of your prep information files.**
    Depending on your barcoding/sequencing strategy you might need 1 or more
-   raw datas file sets. If you have 2 raw data sets you may have to rename one
+   raw data file sets. If you have 2 raw data sets you may have to rename one
    set so that each set has a different name. If they have the same name they
    will over-write on upload. Note that you can have one FASTQ file set linked
    to more than one prep information file.
-#. Preprocess your files. For target gene amplicon sequencing, this will demux
+#. **Preprocess your files.** For target gene amplicon sequencing, this will demux
    and QC. There are multiple options for preprocessing depending on the
    barcode format and the data output from the sequencing center - this may
    require a series of trial and error to establish the correct option for
    your data files. After demultiplexing a log file is generated with
    statistics about the files demultiplexed including the number of sequences
    assigned per sample.
-#. Process each of your preprocessed data types. For target gene, this will
-   perform close OTU picking against the latest version of Greengenes and can
+#. **Process each of your preprocessed data types.** For target gene, this will
+   perform closed OTU picking against the latest version of Greengenes and can
    be quite time consuming depending on the number of samples and the depth
    of sequencing.
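
The checks described in the workflow above (a column named sample_name, no duplicated sample names, and prep samples being all or a subset of the sample information file) can be tried locally before uploading. The following is a minimal sketch, not Qiita's own validation code: the function names are hypothetical, and it assumes both files are tab-separated text with a header row and that the prep information file also identifies samples in a sample_name column.

```python
import csv
import io
from collections import Counter


def read_tsv(text):
    """Parse tab-separated text into (header, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return reader.fieldnames or [], rows


def check_sample_info(text):
    """Return a list of problems found in a sample information file."""
    header, rows = read_tsv(text)
    if "sample_name" not in header:
        return ["missing required column: sample_name"]
    counts = Counter(row["sample_name"] for row in rows)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    if duplicated:
        return ["duplicated sample names: " + ", ".join(duplicated)]
    return []


def check_prep_info(prep_text, sample_text):
    """Return a list of problems found when comparing prep to sample info."""
    sample_header, sample_rows = read_tsv(sample_text)
    prep_header, prep_rows = read_tsv(prep_text)
    # Assumption: both files key their rows on a sample_name column.
    if "sample_name" not in sample_header or "sample_name" not in prep_header:
        return ["missing required column: sample_name"]
    known = {row["sample_name"] for row in sample_rows}
    unknown = sorted({row["sample_name"] for row in prep_rows} - known)
    if unknown:
        return ["samples not in sample information file: " + ", ".join(unknown)]
    return []
```

An empty list means none of these particular problems were found; Qiita itself may still report other errors or missing required fields on upload.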