ENH: general importing tutorial; BIOM v2.1.0 import example (qiime2#76)

jairideout · Greg Caporaso · commit c57dd99756ce · 2017-02-01T08:55:55.000-07:00
Merged https://docs.qiime2.org/2.0.6/tutorials/import/ and https://docs.qiime2.org/2.0.6/tutorials/import-sequence-data/ into a general "importing data" tutorial (which will be accessible at https://docs.qiime2.org/2017.2/tutorials/importing/). Added an example of importing BIOM v2.1.0. Fixes qiime2#64.
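
Reviewer note: the tutorial now has separate import commands for BIOM v1.0.0 and BIOM v2.1.0, so a reader has to know which one a given ``.biom`` file is. The sketch below shows one rough way to tell them apart. It relies on BIOM 1.0 files being JSON and BIOM 2.x files being HDF5, and it assumes the HDF5 signature sits at byte offset 0. The helper name ``guess_biom_source_format`` is hypothetical and is not part of QIIME 2 or ``biom-format``. It cannot distinguish BIOM 2.0.0 from 2.1.0, so treat the result as a hint only.

```python
import json

# The fixed 8-byte signature that starts an HDF5 file (assumed to be at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_source_format(path):
    """Suggest a --source-format value for a .biom file, or None if unsure.

    Hypothetical helper: it only sniffs the container (HDF5 vs. JSON) and
    does not validate the table against either BIOM specification.
    """
    with open(path, "rb") as fh:
        if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            # HDF5 container: BIOM 2.x (2.0.0 and 2.1.0 look the same here).
            return "BIOMV210Format"
        fh.seek(0)
        try:
            document = json.loads(fh.read().decode("utf-8"))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON.
            return None
    # BIOM 1.0 tables are a single JSON object at the top level.
    return "BIOMV100Format" if isinstance(document, dict) else None
```

The returned string can be passed to ``--source-format`` in the ``qiime tools import`` commands shown in the tutorial. A ``None`` result means the file should be inspected by hand.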
diff --git a/source/concepts.rst b/source/concepts.rst
@@ -8,7 +8,7 @@ Data files: artifacts
 
 Data produced by QIIME 2 are stored as *artifacts*. An artifact is a file containing data and metadata. The metadata describes things about the data, such as its type, format, and how it was generated (provenance). An artifact typically has the ``.qza`` file extension.
 
-Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), you can create an artifact by importing data. You can import data at any step in an analysis, though typically you will start by importing raw sequence data. QIIME 2 also has tools to export data from an artifact. See the :doc:`importing guide <tutorials/import>` for details.
+Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), you can create an artifact by importing data. You can import data at any step in an analysis, though typically you will start by importing raw sequence data. QIIME 2 also has tools to export data from an artifact. See the :doc:`importing guide <tutorials/importing>` for details.
 
 By using artifacts instead of simple data files, QIIME 2 can automatically track the type, format, and provenance of data for researchers. Using artifacts instead of data files enables researchers to focus on the analyses they want to perform, instead of the particular format the data needs to be in for an analysis.
 
diff --git a/source/data-resources.rst b/source/data-resources.rst
@@ -24,7 +24,7 @@ Naive Bayes classifiers trained on:
 Marker gene reference databases
 -------------------------------
 
-These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you're using these databases with QIIME 2, you'll need to :doc:`import them into artifacts <./tutorials/import>` before using them.
+These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you're using these databases with QIIME 2, you'll need to :doc:`import them into artifacts <./tutorials/importing>` before using them.
 
 Greengenes (16S rRNA)
 `````````````````````
diff --git a/source/semantic-types.rst b/source/semantic-types.rst
@@ -43,7 +43,7 @@ The following semantic types are defined by, and importable from, the `q2-types`
 
 ``FeatureData[PairedEndSequence]``: Paired-end sequences (forward and reverse) associated with a feature identifier.
 
-``RawSequences``: Raw, multiplexed sequence data. See :doc:`tutorials/import-sequence-data` for details about importing this data type. **Note:** this semantic type is currently defined in `q2-demux`_, but may be moved to `q2-types`_ in the future, possibly with a different name.
+``RawSequences``: Raw, multiplexed sequence data. See :doc:`tutorials/importing` for details about importing this data type. **Note:** this semantic type is currently defined in `q2-demux`_, but may be moved to `q2-types`_ in the future, possibly with a different name.
 
 .. _q2-types: https://github.com/qiime2/q2-types
 
diff --git a/source/tutorials/fmt.rst b/source/tutorials/fmt.rst
@@ -1,7 +1,7 @@
 Fecal microbiota transplant (FMT) study: an exercise
 ====================================================
 
-.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install guides <../install/index>`.
 
 This document is intended to be run after :doc:`the moving pictures tutorial <moving-pictures>`. It is designed to introduce a few new ideas, and to be an exercise in applying the tools that were explored in that document.
 
@@ -33,7 +33,7 @@ Alternatively, the following command will download the sample metadata as tab-se
    :url: https://docs.google.com/spreadsheets/d/15kqZlUrIp9FV4U7OSzeCzteuWMtbkaXgYvD_hTZZ9pw/export?gid=0&format=tsv
    :saveas: sample-metadata.tsv
 
-Next, download the *demultiplexed sequences* that we'll use in this analysis. In this tutorial we'll work with a small subset (10%) of the complete sequence data so that the commands will run quickly. To learn how to start a QIIME 2 analysis from raw sequence data, see the :doc:`importing data documentation <import>`. We'll need to download two sets of demultiplexed sequences, each corresponding to one of the sequencing runs.
+Next, download the *demultiplexed sequences* that we'll use in this analysis. In this tutorial we'll work with a small subset (10%) of the complete sequence data so that the commands will run quickly. To learn how to start a QIIME 2 analysis from raw sequence data, see the :doc:`importing data tutorial <importing>`. We'll need to download two sets of demultiplexed sequences, each corresponding to one of the sequencing runs.
 
 .. download::
    :url: https://data.qiime2.org/2.0.6/tutorials/fmt/fmt-tutorial-demux-1-10p.qza
diff --git a/source/tutorials/import-sequence-data.rst b/source/tutorials/import-sequence-data.rst
diff --git a/source/tutorials/import.rst b/source/tutorials/import.rst
diff --git a/source/tutorials/importing.rst b/source/tutorials/importing.rst
@@ -0,0 +1,138 @@
+Importing data
+==============
+
+.. note:: This tutorial assumes you have installed QIIME 2 using one of the procedures in the :doc:`install guides <../install/index>`.
+
+To use QIIME 2, your input data must be stored in *artifacts* (i.e. ``.qza`` files). Artifacts enable distributed and automatic provenance tracking, as well as semantic type validation and transformations between data formats (see the :doc:`core concepts <../concepts>` page for more details about artifacts). This tutorial demonstrates how to import various data formats into artifacts for use with QIIME 2.
+
+.. note:: This tutorial does not describe all data formats that are currently supported in QIIME 2. It is a work in progress that covers some of the most commonly used formats, and we are actively working on supporting additional ones. If you need to import data in a format that is not covered here, please post to the `QIIME 2 Forum`_ for help.
+
+You will typically import your initial data (e.g. raw sequences obtained from a sequencing facility), but you can import data at any step in your analysis pipeline. For example, if a collaborator provides you with a ``.biom`` file, you can import it into an artifact to perform "downstream" statistical analyses that operate on a feature table.
+
+Importing can be accomplished using any of the QIIME 2 :doc:`interfaces <../interfaces/index>`. This tutorial will focus on using the QIIME 2 command-line interface (``q2cli``) to import data. Each section below briefly describes a data format, provides commands to download example data, and illustrates how to import the data into an artifact.
+
+Sequence data
+-------------
+
+"EMP protocol" multiplexed fastq
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Format description
+******************
+
+In the "Earth Microbiome Project (EMP) protocol" format, there are two ``fastq.gz`` files, one containing sequence reads and one containing the associated barcode reads, with the sequence data still multiplexed. The order of the records in the two ``fastq.gz`` files defines the association between a sequence read and its barcode read.
+
+Obtaining example data
+**********************
+
+.. command-block::
+
+   mkdir raw-sequences
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/barcodes.fastq.gz
+   :saveas: raw-sequences/barcodes.fastq.gz
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/sequences.fastq.gz
+   :saveas: raw-sequences/sequences.fastq.gz
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --type EMPSingleEndSequences \
+     --input-path raw-sequences \
+     --output-path raw-sequences.qza
+
+Casava 1.8 single-end demultiplexed fastq
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Format description
+******************
+
+In this format, there is one ``fastq.gz`` file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like ``L2S357_15_L001_R1_001.fastq.gz``. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/importing-sequence-data/casava-18-single-end-demultiplexed.zip
+   :saveas: casava-18-single-end-demultiplexed.zip
+
+.. command-block::
+
+   unzip -q casava-18-single-end-demultiplexed.zip
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --type 'SampleData[SequencesWithQuality]' \
+     --input-path casava-18-single-end-demultiplexed \
+     --source-format CasavaOneEightSingleLanePerSampleDirFmt \
+     --output-path demux.qza
+
+Feature table data
+------------------
+
+BIOM v1.0.0
+~~~~~~~~~~~
+
+Format description
+******************
+
+BIOM v1.0.0 is a JSON-based format. See the `BIOM v1.0.0 format specification`_ for details.
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2.0.6/tutorials/examples/feature-table.biom
+   :saveas: feature-table.biom
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --input-path feature-table.biom \
+     --type "FeatureTable[Frequency]" \
+     --source-format BIOMV100Format \
+     --output-path feature-table.qza
+
+BIOM v2.1.0
+~~~~~~~~~~~
+
+Format description
+******************
+
+BIOM v2.1.0 is an HDF5-based format. See the `BIOM v2.1.0 format specification`_ for details.
+
+Obtaining example data
+**********************
+
+.. download::
+   :url: https://data.qiime2.org/2017.2/tutorials/importing/feature-table-v210.biom
+   :saveas: feature-table-v210.biom
+
+Importing data
+**************
+
+.. command-block::
+
+   qiime tools import \
+     --input-path feature-table-v210.biom \
+     --type "FeatureTable[Frequency]" \
+     --source-format BIOMV210Format \
+     --output-path feature-table-v210.qza
+
+.. _QIIME 2 Forum: https://forum.qiime2.org
+
+.. _BIOM v1.0.0 format specification: http://biom-format.org/documentation/format_versions/biom-1.0.html
+
+.. _BIOM v2.1.0 format specification: http://biom-format.org/documentation/format_versions/biom-2.1.html
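
Reviewer note: the Casava 1.8 section of ``importing.rst`` lists the underscore-separated fields in a file name such as ``L2S357_15_L001_R1_001.fastq.gz``. The sketch below shows how those five fields could be pulled apart. The names ``CasavaName`` and ``parse_casava_filename`` are hypothetical and are not part of QIIME 2. Splitting from the right is an assumption made here so that a sample identifier containing underscores would stay intact. QIIME 2's own file name validation may be stricter.

```python
from typing import NamedTuple


class CasavaName(NamedTuple):
    """The five fields described in the tutorial, in file name order."""
    sample_id: str
    barcode: str      # barcode sequence or barcode identifier
    lane: str
    read: str
    set_number: str


def parse_casava_filename(filename):
    """Split a Casava 1.8 style file name into its five fields.

    Hypothetical helper for illustration; it checks only the extension
    and the number of underscore-separated fields.
    """
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError("expected a %s file, got %r" % (suffix, filename))
    stem = filename[:-len(suffix)]
    # Split from the right so the last four fields are always taken first.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError("expected 5 underscore-separated fields in %r" % filename)
    return CasavaName(*parts)
```

For the tutorial's example file name, ``parse_casava_filename("L2S357_15_L001_R1_001.fastq.gz").sample_id`` evaluates to ``"L2S357"``.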
diff --git a/source/tutorials/index.rst b/source/tutorials/index.rst
@@ -7,7 +7,6 @@ Tutorials
    moving-pictures
    fmt
    88soils
-   import
-   feature-classifier
-   import-sequence-data
+   importing
    filtering
+   feature-classifier