Commit c33c9de

Greg Caporaso authored and jairideout committed
ENH: update atacama tutorial (qiime2#87)
Update tutorial to cover only paired-end relevant commands and use smaller data set.
1 parent e4bafae commit c33c9de

1 file changed

+62
-116
lines changed

source/tutorials/atacama-soils.rst

Lines changed: 62 additions & 116 deletions
@@ -1,11 +1,11 @@
11
"Atacama soil microbiome" tutorial
22
==================================
33

4-
.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
4+
.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>` and completed the :doc:`moving pictures tutorial <moving-pictures>`.
55

6-
This tutorial is designed to be a self-guided exercise that could be run after :doc:`the moving pictures tutorial <moving-pictures>` to gain more experience with QIIME 2. The data in this tutorial is paired-end Illumina MiSeq data, so this tutorial is also useful for learning how to work with paired-end data in QIIME 2.
6+
This tutorial is designed to serve two purposes. First, it illustrates the initial processing steps of paired-end read analysis, up to the point where the analysis steps are identical to single-end read analysis. This includes the importing, demultiplexing, and denoising steps, and results in a feature table and the associated feature sequences. Second, this is intended to be a self-guided exercise that could be run after :doc:`the moving pictures tutorial <moving-pictures>` to gain more experience with QIIME 2. For this exercise, we provide some questions that can be used to guide your analysis, but do not provide commands that will allow you to address each. Instead, you should apply the commands that you learned in :doc:`the moving pictures tutorial <moving-pictures>`.
77

8-
In this tutorial you'll use QIIME 2 to perform an analysis of soil samples from the Atacama Desert in northern Chile...
8+
In this tutorial you'll use QIIME 2 to perform an analysis of soil samples from the Atacama Desert in northern Chile. The Atacama Desert is one of the most arid locations on Earth, with some areas receiving less than a millimeter of rain per decade. Despite this extreme aridity, there are microbes living in the soil. The soil microbiomes profiled in this study follow two east-west transects, *Baquedano* and *Yungay*, across which average soil relative humidity is positively correlated with elevation (higher elevations are less arid and thus have higher average soil relative humidity). Along these transects, pits were dug at each site and soil samples were collected from three depths in each pit.
99

1010
Download data files
1111
-------------------
@@ -16,28 +16,54 @@ Before starting the analysis, explore the sample metadata to familiarize yoursel
1616
:url: https://docs.google.com/spreadsheets/d/1xMP1EjKZDrzdKLnQr7LGVAY35ongxrreT28k0EACtfg/export?gid=0&format=tsv
1717
:saveas: sample-metadata.tsv
1818

19+
20+
Next, you'll download the multiplexed reads. You will download three ``fastq.gz`` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience with the first steps of paired-end read analysis, you can use the 1% subsample data set so that the commands will run quickly. If you're planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data.
21+
22+
1% subsample data
23+
~~~~~~~~~~~~~~~~~
24+
25+
.. command-block::
26+
27+
mkdir emp-paired-end-sequences
28+
29+
.. download::
30+
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-1p/forward.fastq.gz
31+
:saveas: emp-paired-end-sequences/forward.fastq.gz
32+
33+
.. download::
34+
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-1p/reverse.fastq.gz
35+
:saveas: emp-paired-end-sequences/reverse.fastq.gz
36+
37+
.. download::
38+
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-1p/barcodes.fastq.gz
39+
:saveas: emp-paired-end-sequences/barcodes.fastq.gz
40+
41+
10% subsample data
42+
~~~~~~~~~~~~~~~~~~
43+
1944
.. command-block::
2045

2146
mkdir emp-paired-end-sequences
2247

2348
.. download::
49+
:no-exec:
2450
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-10p/forward.fastq.gz
2551
:saveas: emp-paired-end-sequences/forward.fastq.gz
2652

2753
.. download::
54+
:no-exec:
2855
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-10p/reverse.fastq.gz
2956
:saveas: emp-paired-end-sequences/reverse.fastq.gz
3057

3158
.. download::
59+
:no-exec:
3260
:url: https://dl.dropboxusercontent.com/u/2868868/data/qiime2/tutorials/importing-sequence-data/2017.2/emp-paired-end-sequences/atacama-10p/barcodes.fastq.gz
3361
:saveas: emp-paired-end-sequences/barcodes.fastq.gz
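Because the forward, reverse, and barcode files describe the same set of reads, one quick sanity check after downloading is to confirm that all three contain the same number of records. The following is a minimal sketch, not part of QIIME 2; the helper names ``count_fastq_records`` and ``check_consistent`` are hypothetical, and it assumes standard four-line FASTQ records.

```python
import gzip
import os


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def check_consistent(
    directory,
    names=("forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz"),
):
    """Return per-file record counts and whether all counts agree."""
    counts = {
        name: count_fastq_records(os.path.join(directory, name))
        for name in names
    }
    return counts, len(set(counts.values())) == 1
```

Calling ``check_consistent("emp-paired-end-sequences")`` returns the counts for each file together with a boolean that is true only when the three counts match.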
3462

35-
.. download::
36-
:url: https://data.qiime2.org/2.0.6/common/silva-119-99-full-length-nb-classifier.qza
37-
:saveas: silva-119-99-full-length-nb-classifier.qza
63+
Paired-end read analysis commands
64+
---------------------------------
3865

39-
Analysis commands
40-
-----------------
66+
To analyze these data, the sequences that you just downloaded must first be imported into an artifact of type ``EMPPairedEndSequences``.
4167

4268
.. command-block::
4369

@@ -46,6 +72,10 @@ Analysis commands
4672
--input-path emp-paired-end-sequences \
4773
--output-path emp-paired-end-sequences.qza
4874

75+
Next, you can demultiplex the sequence reads. This requires the sample metadata file, and you must indicate which column in that file contains the per-sample barcodes. In this case, that column name is ``BarcodeSequence``. In this data set, the barcode reads are the reverse complement of those included in the sample metadata file, so we additionally include the ``--p-rev-comp-mapping-barcodes`` parameter. After demultiplexing, we can generate and view a summary of how many sequences were obtained per sample.
76+
77+
.. command-block::
78+
4979
qiime demux emp-paired \
5080
--m-barcodes-file sample-metadata.tsv \
5181
--m-barcodes-category BarcodeSequence \
@@ -55,7 +85,17 @@ Analysis commands
5585

5686
qiime demux summarize \
5787
--i-data demux.qza \
58-
--o-visualization demux.qzv \
88+
--o-visualization demux.qzv
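To make the reverse-complement relationship between the barcode reads and the metadata barcodes concrete, here is a small illustrative sketch. It is not how ``qiime demux emp-paired`` is implemented; the function names and the example barcode are hypothetical and chosen only for illustration.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcode_matches(read_barcode, metadata_barcode, rev_comp_mapping=False):
    """Compare a barcode read against a metadata barcode.

    When rev_comp_mapping is True, the metadata (mapping) barcode is
    reverse complemented before the comparison.
    """
    if rev_comp_mapping:
        metadata_barcode = reverse_complement(metadata_barcode)
    return read_barcode == metadata_barcode


# Hypothetical barcode as it might appear in a metadata file.
metadata_barcode = "AGCCTTCGTCGC"
# The corresponding barcode read, if reads are the reverse complement.
read_barcode = reverse_complement(metadata_barcode)
```

In this toy example, ``barcode_matches(read_barcode, metadata_barcode)`` is false, while passing ``rev_comp_mapping=True`` makes the two agree, which mirrors why the extra parameter is needed for this data set.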
89+
90+
After demultiplexing reads, we'll look at the sequence quality based on ten randomly selected samples, and then denoise the data. When you view the quality plots, note that in contrast to the corresponding plots in :doc:`the moving pictures tutorial <moving-pictures>`, there are now two plots per sample. The plot on the left presents the quality scores for the forward reads, and the plot on the right presents the quality scores for the reverse reads. We'll use these plots to determine what trimming parameters we want to use for denoising with DADA2, and then denoise the reads using ``dada2 denoise-paired``.
91+
92+
In this example we have 150-base forward and reverse reads. Since we need the reads to be long enough to overlap when joining paired ends, the first ten bases of the forward and reverse reads are trimmed, but no trimming is applied to the ends of the sequences, to avoid reducing the read length by too much. In this example, the same values are provided for ``--p-trim-left-f`` and ``--p-trim-left-r`` and for ``--p-trunc-len-f`` and ``--p-trunc-len-r``, but that is not a requirement.
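The effect of these parameters on read overlap can be estimated with some simple arithmetic. The sketch below uses a simplified geometric model and is only an illustration: it assumes the truncation values are positions in the original read, that reads begin at the ends of the amplicon, and a hypothetical amplicon length of 253 bases, which is not stated in this tutorial. The function name ``paired_end_overlap`` is also hypothetical.

```python
def paired_end_overlap(amplicon_len, trim_left_f, trim_left_r,
                       trunc_len_f, trunc_len_r):
    """Estimate the overlap and merged length for a read pair.

    Coordinates are positions along the amplicon, measured from the
    start of the forward read.
    """
    # Forward read covers [trim_left_f, trunc_len_f).
    fwd_start, fwd_end = trim_left_f, trunc_len_f
    # Reverse read starts at the far end of the amplicon and reads inward.
    rev_start = amplicon_len - trunc_len_r
    rev_end = amplicon_len - trim_left_r
    overlap = min(fwd_end, rev_end) - max(fwd_start, rev_start)
    merged_len = rev_end - fwd_start
    return max(overlap, 0), merged_len


# Values from the command in this tutorial, with an ASSUMED amplicon length.
overlap, merged_len = paired_end_overlap(
    amplicon_len=253,  # hypothetical, for illustration only
    trim_left_f=10, trim_left_r=10,
    trunc_len_f=150, trunc_len_r=150,
)
```

Under these assumptions, trimming from the start of each read shortens the merged sequence but leaves the overlap unchanged, while truncating the ends of the reads reduces the overlap directly. For example, truncating both reads at position 120 in this model would leave no overlap at all.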
93+
94+
.. command-block::
95+
96+
qiime dada2 plot-qualities \
97+
--i-demultiplexed-seqs demux.qza \
98+
--o-visualization demux-qualities.qzv \
5999
--p-n 10
60100

61101
qiime dada2 denoise-paired \
@@ -65,111 +105,14 @@ Analysis commands
65105
--p-trim-left-f 10 \
66106
--p-trim-left-r 10 \
67107
--p-trunc-len-f 150 \
68-
--p-trunc-len-r 150 \
69-
--p-n-threads 0 \
70-
--p-n-reads-learn 100000
71-
72-
qiime feature-table summarize \
73-
--i-table table.qza \
74-
--o-visualization table.qzv
75-
76-
qiime feature-table tabulate-seqs \
77-
--i-data rep-seqs.qza \
78-
--o-visualization rep-seqs.qzv
79-
80-
qiime alignment mafft \
81-
--i-sequences rep-seqs.qza \
82-
--o-alignment aligned-rep-seqs.qza
83-
84-
qiime alignment mask \
85-
--i-alignment aligned-rep-seqs.qza \
86-
--o-masked-alignment masked-aligned-rep-seqs.qza
87-
88-
qiime phylogeny fasttree \
89-
--i-alignment masked-aligned-rep-seqs.qza \
90-
--o-tree unrooted-tree.qza
91-
92-
qiime phylogeny midpoint-root \
93-
--i-tree unrooted-tree.qza \
94-
--o-rooted-tree rooted-tree.qza
95-
96-
qiime diversity core-metrics \
97-
--i-phylogeny rooted-tree.qza \
98-
--i-table table.qza \
99-
--p-sampling-depth 2026 \
100-
--output-dir cm2026
101-
102-
qiime diversity alpha-group-significance \
103-
--i-alpha-diversity cm2026/faith_pd_vector.qza \
104-
--m-metadata-file sample-metadata.tsv \
105-
--o-visualization cm2026/faith-pd-group-significance.qzv
106-
107-
qiime diversity alpha-group-significance \
108-
--i-alpha-diversity cm2026/observed_otus_vector.qza \
109-
--m-metadata-file sample-metadata.tsv \
110-
--o-visualization cm2026/observed-otus-group-significance.qzv
111-
112-
qiime diversity alpha-group-significance \
113-
--i-alpha-diversity cm2026/evenness_vector.qza \
114-
--m-metadata-file sample-metadata.tsv \
115-
--o-visualization cm2026/evenness-group-significance.qzv
116-
117-
qiime diversity alpha-correlation \
118-
--i-alpha-diversity cm2026/faith_pd_vector.qza \
119-
--m-metadata-file sample-metadata.tsv \
120-
--o-visualization cm2026/faith-pd-correlation.qzv
121-
122-
qiime diversity alpha-correlation \
123-
--i-alpha-diversity cm2026/evenness_vector.qza \
124-
--m-metadata-file sample-metadata.tsv \
125-
--o-visualization cm2026/evenness-correlation.qzv
126-
127-
qiime emperor plot \
128-
--i-pcoa cm2026/unweighted_unifrac_pcoa_results.qza \
129-
--m-metadata-file sample-metadata.tsv \
130-
--o-visualization cm2026/unweighted-unifrac-emperor.qzv
131-
132-
qiime diversity bioenv \
133-
--i-distance-matrix cm2026/unweighted_unifrac_distance_matrix.qza \
134-
--m-metadata-file sample-metadata.tsv \
135-
--o-visualization cm2026/unweighted-unifrac-bioenv.qzv
136-
137-
qiime feature-classifier classify \
138-
--i-classifier silva-119-99-full-length-nb-classifier.qza \
139-
--i-reads rep-seqs.qza \
140-
--o-classification taxonomy.qza
141-
142-
qiime taxa tabulate \
143-
--i-data taxonomy.qza \
144-
--o-visualization taxonomy.qzv
145-
146-
qiime taxa barplot \
147-
--i-table table.qza \
148-
--i-taxonomy taxonomy.qza \
149-
--m-metadata-file sample-metadata.tsv \
150-
--o-visualization taxa-bar-plots.qzv
151-
152-
qiime taxa collapse \
153-
--i-table table.qza \
154-
--i-taxonomy taxonomy.qza \
155-
--p-level 2 \
156-
--o-collapsed-table table-l2.qza
157-
158-
qiime composition add-pseudocount \
159-
--i-table table-l2.qza \
160-
--o-composition-table comp-table-l2.qza
161-
162-
qiime composition ancom \
163-
--i-table comp-table-l2.qza \
164-
--m-metadata-file sample-metadata.tsv \
165-
--m-metadata-file sample-metadata.tsv \
166-
--m-metadata-category Vegetation \
167-
--o-visualization l2-ancom-Vegetation.qzv
168-
169-
Sequence processing and diversity analyses
170-
------------------------------------------
171-
172-
Use the following questions to guide your analyses of the data.
108+
--p-trunc-len-r 150
109+
110+
At this stage, you will have artifacts containing the feature table and corresponding feature sequences. From this point, analysis of paired-end read data progresses in the same way as analysis of single-end read data. You can therefore continue your analyses of these data following the steps that you ran in :doc:`the moving pictures tutorial <moving-pictures>`.
111+
112+
Questions to guide data analysis
113+
--------------------------------
114+
115+
Use the following questions to guide your further analyses of these data.
173116

174117
#. What value would you choose to pass for ``--p-sampling-depth``? How many samples will be excluded from your analysis based on this choice? Approximately how many total sequences will you be analyzing in the ``core-metrics`` command?
175118

@@ -179,10 +122,13 @@ Use the following questions to guide your analyses of the data.
179122

180123
#. What discrete sample metadata categories are most strongly associated with the differences in microbial community richness or evenness? Are these differences statistically significant?
181124

182-
#. What differences do you observe between the unweighted UniFrac and Bray-Curtis PCoA plots?
183-
184125
#. In taxonomic composition bar plots, sort the samples by their average soil relative humidity, and visualize them at the phylum level. What are the dominant phyla in these samples? Which phyla increase and which decrease with increasing average soil relative humidity?
185126

186127
#. What phyla differ in abundance across vegetated and unvegetated sites?
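When working on the first question, it can help to see how a candidate sampling depth translates into retained samples and total sequences. The following sketch assumes that samples with fewer sequences than the chosen depth are excluded and that every retained sample contributes exactly that many sequences. The per-sample counts are invented for illustration; in practice you would read them from your own feature table summary. The function name ``summarize_sampling_depth`` is hypothetical.

```python
def summarize_sampling_depth(counts, depth):
    """Summarize the effect of an even sampling depth.

    counts maps sample IDs to their total sequence counts.
    """
    retained = {s: n for s, n in counts.items() if n >= depth}
    excluded = sorted(s for s in counts if s not in retained)
    return {
        "retained": len(retained),
        "excluded": excluded,
        # Each retained sample is subsampled to exactly `depth` sequences.
        "total_sequences": len(retained) * depth,
    }


# Hypothetical per-sample sequence counts, not taken from this data set.
example_counts = {"S1": 5000, "S2": 1200, "S3": 2500, "S4": 800}
summary = summarize_sampling_depth(example_counts, depth=2000)
```

With these made-up counts, a depth of 2000 keeps two samples, excludes ``S2`` and ``S4``, and analyzes 4000 sequences in total. Trying several depths this way illustrates the trade-off between keeping more samples and keeping more sequences per sample.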
187128

129+
Acknowledgements
130+
----------------
131+
132+
The data used in this tutorial is presented in: *Arid Soil Microbiome: Significant Impacts of Increasing Aridity. Neilson, Califf, Cardona, Copeland, van Treuren, Josephson, Knight, Gilbert, Quade, Caporaso, and Maier. mSystems (under review).*
133+
188134
.. _sample metadata: https://docs.google.com/spreadsheets/d/1xMP1EjKZDrzdKLnQr7LGVAY35ongxrreT28k0EACtfg/edit?usp=sharing
