Commit c57dd99

jairideout authored and Greg Caporaso committed
ENH: general importing tutorial; BIOM v2.1.0 import example (qiime2#76)
Merged https://docs.qiime2.org/2.0.6/tutorials/import/ and https://docs.qiime2.org/2.0.6/tutorials/import-sequence-data/ into a general "importing data" tutorial, which will be accessible at https://docs.qiime2.org/2017.2/tutorials/importing/

Added example of importing BIOM v2.1.0. Fixes qiime2#64.
1 parent f7bcaa4 commit c57dd99

8 files changed, with 145 additions and 116 deletions

source/concepts.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Data files: artifacts

 Data produced by QIIME 2 are stored as *artifacts*. An artifact is a file containing data and metadata. The metadata describes things about the data, such as its type, format, and how it was generated (provenance). An artifact typically has the ``.qza`` file extension.

-Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), you can create an artifact by importing data. You can import data at any step in an analysis, though typically you will start by importing raw sequence data. QIIME 2 also has tools to export data from an artifact. See the :doc:`importing guide <tutorials/import>` for details.
+Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), you can create an artifact by importing data. You can import data at any step in an analysis, though typically you will start by importing raw sequence data. QIIME 2 also has tools to export data from an artifact. See the :doc:`importing guide <tutorials/importing>` for details.

 By using artifacts instead of simple data files, QIIME 2 can automatically track the type, format, and provenance of data for researchers. Using artifacts instead of data files enables researchers to focus on the analyses they want to perform, instead of the particular format the data needs to be in for an analysis.

source/data-resources.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Naive Bayes classifiers trained on:
 Marker gene reference databases
 -------------------------------

-These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you're using these databases with QIIME 2, you'll need to :doc:`import them into artifacts <./tutorials/import>` before using them.
+These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you're using these databases with QIIME 2, you'll need to :doc:`import them into artifacts <./tutorials/importing>` before using them.

 Greengenes (16S rRNA)
 `````````````````````

source/semantic-types.rst

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ The following semantic types are defined by, and importable from, the `q2-types`

 ``FeatureData[PairedEndSequence]``: Paired-end sequences (forward and reverse) associated with a feature identifier.

-``RawSequences``: Raw, multiplexed sequence data. See :doc:`tutorials/import-sequence-data` for details about importing this data type. **Note:** this semantic type is currently defined in `q2-demux`_, but may be moved to `q2-types`_ in the future, possibly with a different name.
+``RawSequences``: Raw, multiplexed sequence data. See :doc:`tutorials/importing` for details about importing this data type. **Note:** this semantic type is currently defined in `q2-demux`_, but may be moved to `q2-types`_ in the future, possibly with a different name.

 .. _q2-types: https://github.com/qiime2/q2-types

source/tutorials/fmt.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 Fecal microbiota transplant (FMT) study: an exercise
 ====================================================

-.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install guides <../install/index>`.

 This document is intended to be run after :doc:`the moving pictures tutorial <moving-pictures>`. It is designed to introduce a few new ideas, and to be an exercise in applying the tools that were explored in that document.

@@ -33,7 +33,7 @@ Alternatively, the following command will download the sample metadata as tab-se
    :url: https://docs.google.com/spreadsheets/d/15kqZlUrIp9FV4U7OSzeCzteuWMtbkaXgYvD_hTZZ9pw/export?gid=0&format=tsv
    :saveas: sample-metadata.tsv

-Next, download the *demultiplexed sequences* that we'll use in this analysis. In this tutorial we'll work with a small subset (10%) of the complete sequence data so that the commands will run quickly. To learn how to start a QIIME 2 analysis from raw sequence data, see the :doc:`importing data documentation <import>`. We'll need to download two sets of demultiplexed sequences, each corresponding to one of the sequencing runs.
+Next, download the *demultiplexed sequences* that we'll use in this analysis. In this tutorial we'll work with a small subset (10%) of the complete sequence data so that the commands will run quickly. To learn how to start a QIIME 2 analysis from raw sequence data, see the :doc:`importing data tutorial <importing>`. We'll need to download two sets of demultiplexed sequences, each corresponding to one of the sequencing runs.

 .. download::
    :url: https://data.qiime2.org/2.0.6/tutorials/fmt/fmt-tutorial-demux-1-10p.qza

source/tutorials/import-sequence-data.rst

Lines changed: 0 additions & 72 deletions
This file was deleted.

source/tutorials/import.rst

Lines changed: 0 additions & 36 deletions
This file was deleted.

source/tutorials/importing.rst

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+Importing data
+==============
+
+.. note:: This tutorial assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+
+In order to use QIIME 2, we require input data to be stored in *artifacts* (i.e. ``.qza`` files). This is what enables distributed and automatic provenance tracking, as well as semantic type validation and transformations between data formats (see the :doc:`core concepts <../concepts>` page for more details about artifacts). This tutorial demonstrates how to import various data formats into artifacts for use with QIIME 2.
+
+.. note:: This tutorial does not describe all data formats that are currently supported in QIIME 2. It is a work-in-progress that describes some of the most commonly used data formats available in QIIME 2. We are also actively working on supporting additional data formats. If you need to import data in a format that is not covered here, please post to the `QIIME 2 Forum`_ for help.
+
+Importing will typically happen with your initial data (e.g. raw sequences obtained from a sequencing facility), but importing can be performed at any step in your analysis pipeline. For example, if a collaborator provides you with a ``.biom`` file, you can import it into an artifact to perform "downstream" statistical analyses that operate on a feature table.
+
+Importing can be accomplished using any of the QIIME 2 :doc:`interfaces <../interfaces/index>`. This tutorial will focus on using the QIIME 2 command-line interface (``q2cli``) to import data. Each section below briefly describes a data format, provides commands to download example data, and illustrates how to import the data into an artifact.
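
To make the idea of an artifact as "data plus metadata" more concrete, here is a minimal Python sketch that peeks inside a ``.qza`` file after importing. It is not part of this commit. It assumes (from general knowledge of QIIME 2, not from the text above) that an artifact is a zip archive with a single UUID-named root directory containing a ``metadata.yaml`` file of simple ``key: value`` lines. The function name ``read_artifact_metadata`` is hypothetical.

```python
import zipfile


def read_artifact_metadata(qza_path):
    """Return the key/value pairs found in an artifact's metadata.yaml.

    Assumption: the archive holds exactly one <uuid>/metadata.yaml entry
    made of flat "key: value" lines (no nested YAML).
    """
    with zipfile.ZipFile(qza_path) as archive:
        # Look for metadata.yaml directly under the single root directory.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one <uuid>/metadata.yaml entry")
        text = archive.read(candidates[0]).decode("utf-8")

    metadata = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    return metadata
```

Calling ``read_artifact_metadata("feature-table.qza")`` on an imported artifact would, under these assumptions, return a small dictionary describing the artifact's identifier, semantic type, and format.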
+
+Sequence data
+-------------
+
+"EMP protocol" multiplexed fastq
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Format description
+******************
+
+In the "Earth Microbiome Project (EMP) protocol" format, there are two ``fastq.gz`` files, one containing sequence reads and one containing the associated barcode reads, with the sequence data still multiplexed. The order of the records in the two ``fastq.gz`` files defines the association between a sequence read and its barcode read.
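
The record-order association described above can be sketched in Python as follows. This is an illustration only, not how QIIME 2 itself reads these files. It assumes standard four-line FASTQ records, and the function names ``read_fastq`` and ``pair_reads_with_barcodes`` are hypothetical.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a gzipped FASTQ file.

    Assumption: every record spans exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                return
            if len(record) != 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            header, sequence, _plus, quality = (
                line.rstrip("\n") for line in record
            )
            yield header, sequence, quality


def pair_reads_with_barcodes(sequences_path, barcodes_path):
    """Yield (barcode, header, sequence), matching records purely by position."""
    barcodes = read_fastq(barcodes_path)
    for header, sequence, _quality in read_fastq(sequences_path):
        barcode_record = next(barcodes, None)
        if barcode_record is None:
            raise ValueError("fewer barcode reads than sequence reads")
        # The n-th barcode read belongs to the n-th sequence read.
        yield barcode_record[1], header, sequence
    if next(barcodes, None) is not None:
        raise ValueError("more barcode reads than sequence reads")
```

Because the pairing relies only on position, reordering or filtering one file without the other would silently break the association, which is why the sketch checks that both files contain the same number of records.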
+
+Obtaining example data
+**********************
+
+.. command-block::
+
+   mkdir raw-sequences
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/barcodes.fastq.gz
+   :saveas: raw-sequences/barcodes.fastq.gz
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/sequences.fastq.gz
+   :saveas: raw-sequences/sequences.fastq.gz
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --type EMPSingleEndSequences \
+     --input-path raw-sequences \
+     --output-path raw-sequences.qza
+
+Casava 1.8 single-end demultiplexed fastq
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Format description
+******************
+
+In this format, there is one ``fastq.gz`` file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like ``L2S357_15_L001_R1_001.fastq.gz``. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.
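
As a small illustration of the file name layout described above, the following Python sketch splits a Casava 1.8 file name into its five fields. The names ``CasavaName`` and ``parse_casava_filename`` are hypothetical. Splitting from the right is a choice made for this sketch so that a sample identifier containing underscores would stay intact; it is not taken from the commit.

```python
from typing import NamedTuple


class CasavaName(NamedTuple):
    sample_id: str
    barcode: str      # barcode sequence or barcode identifier
    lane: str
    read: str
    set_number: str


def parse_casava_filename(filename):
    """Split a name like L2S357_15_L001_R1_001.fastq.gz into its fields."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a fastq.gz file name: {filename}")
    stem = filename[: -len(suffix)]
    # Take the last four fields from the right; everything before them
    # is treated as the sample identifier.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"expected five underscore-separated fields: {filename}")
    return CasavaName(*parts)
```

For the example name in the text, ``parse_casava_filename("L2S357_15_L001_R1_001.fastq.gz")`` yields the sample identifier ``L2S357``, barcode identifier ``15``, lane ``L001``, read ``R1``, and set number ``001``.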
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/importing-sequence-data/casava-18-single-end-demultiplexed.zip
+   :saveas: casava-18-single-end-demultiplexed.zip
+
+.. command-block::
+
+   unzip -q casava-18-single-end-demultiplexed.zip
+
+Importing data
+**************
+
+.. command-block::
+   qiime tools import \
+     --type 'SampleData[SequencesWithQuality]' \
+     --input-path casava-18-single-end-demultiplexed \
+     --source-format CasavaOneEightSingleLanePerSampleDirFmt \
+     --output-path demux.qza
+
+Feature table data
+------------------
+
+BIOM v1.0.0
+~~~~~~~~~~~
+
+Format description
+******************
+
+See the `BIOM v1.0.0 format specification`_ for details.
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/examples/feature-table.biom
+   :saveas: feature-table.biom
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --input-path feature-table.biom \
+     --type "FeatureTable[Frequency]" \
+     --source-format BIOMV100Format \
+     --output-path feature-table.qza
+
+BIOM v2.1.0
+~~~~~~~~~~~
+
+Format description
+******************
+
+See the `BIOM v2.1.0 format specification`_ for details.
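
Since both BIOM versions commonly use the ``.biom`` extension, it can be unclear which ``--source-format`` value applies to a given file. The Python sketch below is one rough way to guess, relying on the fact that BIOM v1.0.0 is JSON-based while BIOM v2.1.0 is stored as HDF5. The function name ``guess_biom_source_format`` is hypothetical, and the check is a heuristic under stated assumptions: it only inspects the first bytes of the file, assumes the HDF5 signature sits at offset 0, and cannot distinguish v2.1.0 from other HDF5-based BIOM versions.

```python
# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_source_format(path):
    """Guess a --source-format value from the first bytes of a .biom file.

    Heuristic only: HDF5 signature -> BIOMV210Format,
    leading "{" (a JSON object) -> BIOMV100Format.
    """
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "BIOMV210Format"
    if head.lstrip().startswith(b"{"):
        return "BIOMV100Format"
    raise ValueError(f"{path} looks like neither HDF5 nor JSON")
```

The returned string matches the ``--source-format`` values used in the import commands of this tutorial; validating the file's actual contents is left to the import step.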
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2017.2/tutorials/importing/feature-table-v210.biom
+   :saveas: feature-table-v210.biom
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --input-path feature-table-v210.biom \
+     --type "FeatureTable[Frequency]" \
+     --source-format BIOMV210Format \
+     --output-path feature-table-v210.qza
+
+.. _QIIME 2 Forum: https://forum.qiime2.org
+
+.. _BIOM v1.0.0 format specification: http://biom-format.org/documentation/format_versions/biom-1.0.html
+
+.. _BIOM v2.1.0 format specification: http://biom-format.org/documentation/format_versions/biom-2.1.html

source/tutorials/index.rst

Lines changed: 2 additions & 3 deletions
@@ -7,7 +7,6 @@ Tutorials
    moving-pictures
    fmt
    88soils
-   import
-   feature-classifier
-   import-sequence-data
+   importing
    filtering
+   feature-classifier
