| 1 | +Importing data |
| 2 | +============== |
| 3 | + |
| 4 | +.. note:: This tutorial assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`. |
| 5 | + |
| 6 | +QIIME 2 requires input data to be stored in *artifacts* (i.e. ``.qza`` files). Storing data in artifacts is what enables distributed and automatic provenance tracking, as well as semantic type validation and transformations between data formats (see the :doc:`core concepts <../concepts>` page for more details about artifacts). This tutorial demonstrates how to import various data formats into artifacts for use with QIIME 2. |
| 7 | + |
| 8 | +.. note:: This tutorial is a work in progress and does not describe every data format that QIIME 2 currently supports; it covers some of the most commonly used ones. We are actively working on supporting additional data formats. If you need to import data in a format that is not covered here, please post to the `QIIME 2 Forum`_ for help. |
| 9 | + |
| 10 | +Importing typically happens with your initial data (e.g. raw sequences obtained from a sequencing facility), but it can be performed at any step in your analysis pipeline. For example, if a collaborator provides you with a ``.biom`` file, you can import it into an artifact to perform "downstream" statistical analyses that operate on a feature table. |
| 11 | + |
| 12 | +Importing can be accomplished using any of the QIIME 2 :doc:`interfaces <../interfaces/index>`. This tutorial will focus on using the QIIME 2 command-line interface (``q2cli``) to import data. Each section below briefly describes a data format, provides commands to download example data, and illustrates how to import the data into an artifact. |
| 13 | + |
| 14 | +Sequence data |
| 15 | +------------- |
| 16 | + |
| 17 | +"EMP protocol" multiplexed fastq |
| 18 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 19 | + |
| 20 | +Format description |
| 21 | +****************** |
| 22 | + |
| 23 | +In the "Earth Microbiome Project (EMP) protocol" format, there are two ``fastq.gz`` files, one containing sequence reads and one containing the associated barcode reads, with the sequence data still multiplexed. The order of the records in the two ``fastq.gz`` files defines the association between a sequence read and its barcode read. |
| 24 | + |
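The pairing by record order described above can be sketched in Python. This is only an illustration of the idea, not how QIIME 2 reads these files: the helper names (``read_fastq``, ``pair_reads_with_barcodes``, ``write_fastq``) are hypothetical, the two tiny files are made up for the demonstration, and every fastq record is assumed to span exactly four lines.

```python
import gzip
import itertools
import os
import tempfile


def read_fastq(path):
    """Yield (header, sequence, quality) from a gzipped fastq file.

    Assumption: each record spans exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
            if not record:
                return
            if len(record) != 4:
                raise ValueError(f"truncated fastq record in {path}")
            header, sequence, _separator, quality = record
            yield header, sequence, quality


def pair_reads_with_barcodes(sequences_path, barcodes_path):
    """Pair the i-th sequence read with the i-th barcode read."""
    sentinel = object()
    pairs = itertools.zip_longest(
        read_fastq(sequences_path), read_fastq(barcodes_path), fillvalue=sentinel
    )
    for sequence_record, barcode_record in pairs:
        if sequence_record is sentinel or barcode_record is sentinel:
            raise ValueError("the two files contain different numbers of records")
        yield sequence_record[1], barcode_record[1]


def write_fastq(path, records):
    """Write (header, sequence) pairs as gzipped fastq with dummy qualities."""
    with gzip.open(path, "wt") as handle:
        for header, sequence in records:
            handle.write(f"@{header}\n{sequence}\n+\n{'I' * len(sequence)}\n")


# Tiny made-up files standing in for sequences.fastq.gz and barcodes.fastq.gz.
with tempfile.TemporaryDirectory() as tmp:
    seq_path = os.path.join(tmp, "sequences.fastq.gz")
    bc_path = os.path.join(tmp, "barcodes.fastq.gz")
    write_fastq(seq_path, [("read1", "ACGTACGT"), ("read2", "TTGGCCAA")])
    write_fastq(bc_path, [("read1", "AAAC"), ("read2", "GGGT")])
    paired = list(pair_reads_with_barcodes(seq_path, bc_path))

print(paired)
```

Because the association relies purely on position, the sketch raises an error if the two files hold different numbers of records rather than silently mispairing reads.
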
| 25 | +Obtaining example data |
| 26 | +********************** |
| 27 | + |
| 28 | +.. command-block:: |
| 29 | + |
| 30 | + mkdir raw-sequences |
| 31 | + |
| 32 | +.. download:: |
| 33 | + :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/barcodes.fastq.gz |
| 34 | + :saveas: raw-sequences/barcodes.fastq.gz |
| 35 | + |
| 36 | +.. download:: |
| 37 | + :url: https://data.qiime2.org/2.0.6/tutorials/moving-pictures/raw-sequences/sequences.fastq.gz |
| 38 | + :saveas: raw-sequences/sequences.fastq.gz |
| 39 | + |
| 40 | +Importing data |
| 41 | +************** |
| 42 | + |
| 43 | +.. command-block:: |
| 44 | + |
| 45 | + qiime tools import \ |
| 46 | + --type EMPSingleEndSequences \ |
| 47 | + --input-path raw-sequences \ |
| 48 | + --output-path raw-sequences.qza |
| 49 | + |
| 50 | +Casava 1.8 single-end demultiplexed fastq |
| 51 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 52 | + |
| 53 | +Format description |
| 54 | +****************** |
| 55 | + |
| 56 | +In this format, there is one ``fastq.gz`` file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like ``L2S357_15_L001_R1_001.fastq.gz``. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number. |
| 57 | + |
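As a rough illustration of the file name layout described above, the following Python sketch splits a file name into its five underscore-separated fields. The function name ``parse_casava_filename`` and the dictionary keys are hypothetical, and splitting from the right (so that a sample identifier could itself contain underscores) is an assumption made for this example rather than a statement about how QIIME 2 parses these names.

```python
def parse_casava_filename(filename):
    """Split a Casava 1.8 style file name into its five fields."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix}: {filename}")
    stem = filename[: -len(suffix)]
    # Assumption: split from the right so the last four fields are fixed
    # and anything before them is treated as the sample identifier.
    fields = stem.rsplit("_", maxsplit=4)
    if len(fields) != 5:
        raise ValueError(f"expected five underscore-separated fields: {filename}")
    keys = ("sample_id", "barcode", "lane", "read", "set")
    return dict(zip(keys, fields))


fields = parse_casava_filename("L2S357_15_L001_R1_001.fastq.gz")
print(fields)
```

For the example file name above, the sample identifier is ``L2S357`` and the read number field is ``R1``.
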
| 58 | +Obtaining example data |
| 59 | +********************** |
| 60 | + |
| 61 | +.. download:: |
| 62 | + :url: https://data.qiime2.org/2.0.6/tutorials/importing-sequence-data/casava-18-single-end-demultiplexed.zip |
| 63 | + :saveas: casava-18-single-end-demultiplexed.zip |
| 64 | + |
| 65 | +.. command-block:: |
| 66 | + |
| 67 | + unzip -q casava-18-single-end-demultiplexed.zip |
| 68 | + |
| 69 | +Importing data |
| 70 | +************** |
| 71 | + |
| 72 | +.. command-block:: |
| | + |
| 73 | + qiime tools import \ |
| 74 | + --type 'SampleData[SequencesWithQuality]' \ |
| 75 | + --input-path casava-18-single-end-demultiplexed \ |
| 76 | + --source-format CasavaOneEightSingleLanePerSampleDirFmt \ |
| 77 | + --output-path demux.qza |
| 78 | + |
| 79 | +Feature table data |
| 80 | +------------------ |
| 81 | + |
| 82 | +BIOM v1.0.0 |
| 83 | +~~~~~~~~~~~ |
| 84 | + |
| 85 | +Format description |
| 86 | +****************** |
| 87 | + |
| 88 | +See the `BIOM v1.0.0 format specification`_ for details. |
| 89 | + |
| 90 | +Obtaining example data |
| 91 | +********************** |
| 92 | + |
| 93 | +.. download:: |
| 94 | + :url: https://data.qiime2.org/2.0.6/tutorials/examples/feature-table.biom |
| 95 | + :saveas: feature-table.biom |
| 96 | + |
| 97 | +Importing data |
| 98 | +************** |
| 99 | + |
| 100 | +.. command-block:: |
| 101 | + |
| 102 | + qiime tools import \ |
| 103 | + --input-path feature-table.biom \ |
| 104 | + --type "FeatureTable[Frequency]" \ |
| 105 | + --source-format BIOMV100Format \ |
| 106 | + --output-path feature-table.qza |
| 107 | + |
| 108 | +BIOM v2.1.0 |
| 109 | +~~~~~~~~~~~ |
| 110 | + |
| 111 | +Format description |
| 112 | +****************** |
| 113 | + |
| 114 | +See the `BIOM v2.1.0 format specification`_ for details. |
| 115 | + |
| 116 | +Obtaining example data |
| 117 | +********************** |
| 118 | + |
| 119 | +.. download:: |
| 120 | + :url: https://data.qiime2.org/2017.2/tutorials/importing/feature-table-v210.biom |
| 121 | + :saveas: feature-table-v210.biom |
| 122 | + |
| 123 | +Importing data |
| 124 | +************** |
| 125 | + |
| 126 | +.. command-block:: |
| 127 | + |
| 128 | + qiime tools import \ |
| 129 | + --input-path feature-table-v210.biom \ |
| 130 | + --type "FeatureTable[Frequency]" \ |
| 131 | + --source-format BIOMV210Format \ |
| 132 | + --output-path feature-table-v210.qza |
| 133 | + |
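If it is unclear which of the two ``--source-format`` values above applies to a given ``.biom`` file, the file's first bytes are a reasonable hint: BIOM v1.0.0 tables are JSON text, while BIOM v2.1.0 tables are HDF5 files. The Python sketch below is a heuristic under those assumptions, not part of QIIME 2; the function name ``guess_biom_source_format`` is hypothetical, the demo files are placeholders rather than valid tables, and the check assumes the HDF5 signature sits at the very start of the file.

```python
import os
import tempfile

# The 8-byte signature that begins an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_source_format(path):
    """Guess the --source-format value from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(len(HDF5_SIGNATURE))
    if head == HDF5_SIGNATURE:
        return "BIOMV210Format"
    # JSON documents start with an opening brace, possibly after whitespace.
    if head.lstrip().startswith(b"{"):
        return "BIOMV100Format"
    raise ValueError(f"{path} looks like neither JSON nor HDF5")


# Placeholder files: they mimic only the leading bytes of each format.
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "table-v100.biom")
    hdf5_path = os.path.join(tmp, "table-v210.biom")
    with open(json_path, "wb") as handle:
        handle.write(b'{"rows": []}')
    with open(hdf5_path, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 8)
    guess_v100 = guess_biom_source_format(json_path)
    guess_v210 = guess_biom_source_format(hdf5_path)

print(guess_v100, guess_v210)
```

The guess only distinguishes the container type; it does not validate that the file is a well-formed BIOM table.
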
| 134 | +.. _QIIME 2 Forum: https://forum.qiime2.org |
| 135 | + |
| 136 | +.. _BIOM v1.0.0 format specification: http://biom-format.org/documentation/format_versions/biom-1.0.html |
| 137 | + |
| 138 | +.. _BIOM v2.1.0 format specification: http://biom-format.org/documentation/format_versions/biom-2.1.html |