Cleaning docs #1669
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,26 +4,25 @@ Frequently Asked Questions | |
| What kind of data can I upload to Qiita for processing? | ||
| ------------------------------------------------------- | ||
|
|
||
| We need 3 things: raw data, sample template, and prep template. At this | ||
| moment, raw data is fastq files without demultiplexing with forward, | ||
| reverse (optional) and barcode reads. We should have before the end of | ||
| the week SFF processing so it's OK to upload. Note that we are accepting | ||
| any kind of target gene (16S, 18S, ITS, whatever) as long as they have | ||
| some kind of demultiplexing strategy and that you can also upload WGS. | ||
| However, WGS processing is not ready. | ||
| Processing in Qiita requires 3 things: raw data, sample and prep information | ||
| files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>__` | ||
|
This reads "Here a list" ... maybe it was intended to be "Here you can find" or "Here is".
||
| a list of currently supported raw files. Note that we are accepting | ||
| any kind of target gene (16S, 18S, ITS, whatever) as long as there is | ||
| some kind of demultiplexing strategy. You can also upload WGS; however, WGS | ||
|
||
| processing is not ready. | ||
|
|
||
| What's the difference between a sample and a prep template? | ||
| ----------------------------------------------------------- | ||
| What's the difference between a sample and a prep information file? | ||
| ------------------------------------------------------------------- | ||
|
|
||
| Sample template is the information about your samples, including | ||
| environmental and other important information about them. The prep | ||
| template is basically what kind of wet lab work all or a subset of the | ||
| samples had. If you collected 100 samples, you are going to need 100 | ||
| rows in your sample template describing each of them, this includes | ||
| blanks, etc. Then you prepared 95 of them for 16S and 50 of them for | ||
| 18S. Thus, you are going to need 2 prep templates: one with 95 rows | ||
| describing the preparation for 16S, and another one with 50 to | ||
| describing the 18S. For a more complex example go | ||
| Sample information file is the information about the samples, including | ||
|
||
| environmental factors relating to the associated host. The prep information | ||
| file has information on how the sample was processed in the wet lab. If you | ||
| collected 100 samples for your study, you will need 100 rows in your sample | ||
| information file describing each of them, and additional rows for blanks and other | ||
| control samples. If you prepared 95 of them for 16S and 50 of them for 18S, | ||
| you will need 2 prep information files: one with 95 rows describing the preparation | ||
| for 16S, and another one with 50 describing the 18S. For a more complex | ||
| example go | ||
| `here <#h.eddzjlm5e6l6>`__ and for examples of these files you can go to | ||
| the "Upload instructions" | ||
| `here <https://www.google.com/url?q=https%3A%2F%2Fvamps.mbl.edu%2Fmobe_workshop%2Fwiki%2Findex.php%2FMain_Page&sa=D&sntz=1&usg=AFQjCNE4PTOKIvFNlWtHmJyLLy11mfzF8A>`__. | ||
|
|
@@ -34,31 +33,32 @@ Example study processing workflow | |
| A few more instructions: for the example above the workflow should be: | ||
|
|
||
| #. Create a new study | ||
| #. Add a sample template, you can add 1, try to process it and the | ||
| #. Add a sample information file, you can add 1, try to process it and the | ||
| system will let you know if you have errors or missing columns. The | ||
| most common errors are: the sample name column should be named | ||
| sample\_name, duplicated sample names are not permitted, and the prep | ||
| template should contain all the samples in the sample template or a | ||
| subset. Finally, if you haven't processed your sample templates and | ||
| can add a column to your template named sloan\_status with this info: | ||
| SLOAN (funded by Sloan), SLOAN\_COMPATIBLE (not Sloan funded but with | ||
| compatible metadata, usually public), NOT\_SLOAN (not included i.e. | ||
| private study), that will be great! | ||
| #. Add a raw data. Depending on your barcoding/sequencing strategy you | ||
| might need 1 or 2 raw datas for the example above. If you have two | ||
| different fastq file sets (forward, reverse (optional) and barcodes) | ||
| you will need two raw datas but if you only have one set, you only | ||
| need one. | ||
| #. You can link your raw data to your files | ||
| #. You can add a prep template to your raw data. If you have the case | ||
| with only one fastq set (forward, reverse (optional) and barcodes), | ||
| you can add 2 different prep templates. Common missing fields here | ||
| are: emp\_status, center\_name, run\_prefix, platform, | ||
| library\_construction\_protocol, experiment\_design\_description, | ||
| center\_project\_name. Note that if you get a 500 error at this stage | ||
| is highly probable because emp\_status only accepts 3 values: 'EMP', | ||
| 'EMP\_Processed', 'NOT\_EMP', if errors persist please do not | ||
| hesitate to contact us. | ||
| #. You can preprocess your files. For target gene, this means | ||
| demultiplexing and QC. | ||
|
|
||
| sample\_name, duplicated sample names are not permitted. For a full list of | ||
| required fields, visit :doc:`tutorials/prepare-information-files`. | ||
| #. Add a prep information file to your study for each data type. The prep | ||
| information file should contain all the samples in the sample information | ||
| file or a subset. If you have more than one FASTQ file set (forward, | ||
| reverse (optional) and barcodes) you will need to add a run_prefix | ||
|
||
| column. A prep information file and a QIIME compatible mapping file will | ||
| be available for download after the prep information file is added | ||
| successfully. | ||
| #. Upload and link your raw data to each of your prep information files. | ||
| Depending on your barcoding/sequencing strategy you might need 1 or more | ||
| raw data file sets. If you have 2 raw data sets you may have to rename one | ||
|
||
| set so that each set has a different name. If they have the same name they | ||
| will over-write on upload. Note that you can have one FASTQ file set linked | ||
| to more than one prep information file. | ||
| #. Preprocess your files. For target gene amplicon sequencing, this will demux | ||
| and QC. There are multiple options for preprocessing depending on the | ||
| barcode format and the data output from the sequencing center - this may | ||
| require a series of trial and error to establish the correct option for | ||
| your data files. After demultiplexing a log file is generated with | ||
| statistics about the files demultiplexed including the number of sequences | ||
| assigned per sample. | ||
| #. Process each of your preprocessed data types. For target gene, this will | ||
|
||
| perform closed-reference OTU picking against the latest version of Greengenes and can | ||
|
||
| be quite time consuming depending on the number of samples and the depth | ||
| of sequencing. | ||
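
The common errors named in the workflow above (a missing `sample_name` column, duplicated sample names, and prep samples that do not appear in the sample information file) can be checked locally before uploading. The following is a minimal sketch, not Qiita's own validator: it assumes both files are tab-separated text, and the function names and the extra columns (`host`, `target_gene`) are hypothetical names chosen for illustration.

```python
import csv
import io


def read_sample_names(tsv_text):
    """Return the sample_name values from tab-separated text, in file order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Accessing fieldnames reads the header row; it is None for empty input.
    if reader.fieldnames is None or "sample_name" not in reader.fieldnames:
        raise ValueError("missing required column: sample_name")
    return [row["sample_name"] for row in reader]


def check_information_files(sample_tsv, prep_tsv):
    """Return a list of messages for the common problems described above."""
    problems = []
    sample_names = read_sample_names(sample_tsv)
    prep_names = read_sample_names(prep_tsv)

    # Duplicated sample names are not permitted in the sample information file.
    duplicates = sorted({n for n in sample_names if sample_names.count(n) > 1})
    if duplicates:
        problems.append("duplicated sample names: " + ", ".join(duplicates))

    # The prep information file must list all samples or a subset of them.
    unknown = sorted(set(prep_names) - set(sample_names))
    if unknown:
        problems.append(
            "prep samples not in sample information: " + ", ".join(unknown)
        )
    return problems


if __name__ == "__main__":
    # Hypothetical example data: S2 is duplicated and S3 is unknown.
    sample_tsv = "sample_name\thost\nS1\thuman\nS2\tmouse\nS2\tmouse\n"
    prep_tsv = "sample_name\ttarget_gene\nS1\t16S\nS3\t16S\n"
    for problem in check_information_files(sample_tsv, prep_tsv):
        print(problem)
```

A check like this only covers the three errors mentioned in the text; the full list of required fields lives in the information-file tutorial the docs link to.
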
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,7 +13,7 @@ environments for each pipeline gives the freedom of adding any pipeline with | |
| any software dependencies to Qiita. Artifacts, basically any file in the | ||
| system, from raw sequence to contingency tables or even data visualizations, | ||
| permits the system to store any kind of data but also define within each | ||
| pipelines which commands and parameters can applied to them. | ||
| pipeline which commands and parameters can be applied to them. | ||
|
||
|
|
||
| The current plugins available are: | ||
|
|
||
|
|
@@ -35,44 +35,47 @@ can request an administrator to validate their study information and make it | |
| private and possibly submit to a permanent repository, where it can also be | ||
| kept private until the user wants to make it public. At this stage in Qiita | ||
| the whole study (including all processed data) is private. This process is | ||
| completely automatic via the GUI. Currently sequence data is being deposited | ||
| for permanent storage to the European Nucleotide Archive (ENA), part of the | ||
| European Bioinformatics Institute (EBI). Finally, when the user is ready, | ||
| usually when the main manuscript of the study is ready for publication, the | ||
| user can request for the artifact to become public, both in Qiita and the | ||
| permanent repository, Figure 2. | ||
| completely automatic via the Graphical User Interface (GUI). Currently sequence | ||
| data is being deposited for permanent storage to the European Nucleotide | ||
| Archive (ENA), part of the European Bioinformatics Institute (EBI). Finally, | ||
| when the user is ready, usually when the main manuscript of the study is ready | ||
| for publication, the user can request for the artifact to be made public, | ||
| both in Qiita and the permanent repository (Figure 2). | ||
|
|
||
|
|
||
| .. figure:: images/figure1.png | ||
| :align: center | ||
|
|
||
| **Figure 1. Qiita’s main structure: from single to multiple studies.** More | ||
| and more a simple study is composed by a multiple samples which have been | ||
| prepared chemically to identify diverse microbial composition parts of them. | ||
| For example, 16S to see which kind of bacteria lives on them, Metabolomics | ||
| to see the substance formed by the community, and/or ITS for the fungi. | ||
| Additionally, Qiita allows users to compare their studies with other public | ||
| ones already available in the system. | ||
| **Figure 1. Qiita’s main structure: from single to multiple studies.** | ||
| Increasingly, a simple study is composed of multiple samples which have | ||
| been prepared using different protocols to identify different microbial | ||
| features. For example, 16S rRNA amplification to identify the bacteria in | ||
| or on the sample, metabolomics to identify chemical components formed by | ||
| the microbial community or within the sample, and/or ITS amplification for | ||
| identification of fungal organisms that may also be present. Additionally, | ||
| Qiita allows users to compare their studies with other public ones already | ||
| available in the system. | ||
|
|
||
|
|
||
| .. figure:: images/figure2.png | ||
| :align: center | ||
|
|
||
| **Figure 2. Possible Qiita artifact states.** Artifacts are any file, | ||
| either uploaded by users or generated by the system. There are 3 possible | ||
| states: sanboxed, private and public. In the sandboxed and private states | ||
| no other user has access to the artifacts, except if the owner invites a | ||
| guest. In the public state, the artifact is open to all users in the | ||
| system, and the study can be searched from the study listing page. | ||
| states: sandboxed, private and public. In the sandboxed and private states | ||
| no other user has access to the artifacts, unless the owner grants access | ||
| by sharing the study. In the public state, the artifact is open to all | ||
| users in the system, and the study can be found by searching from the | ||
| study listing page. | ||
|
|
||
|
|
||
| Portals | ||
| ------- | ||
|
|
||
| Qiita allows to host multiple portals within the same infrastructure. This | ||
| allows each portal to have a subset of studies in a different URL but sharing | ||
| the same resources. Sharing the same backend resources avoids having multiple | ||
| sites and data getting out of sync. | ||
| Qiita allows the hosting of multiple portals within the same infrastructure. | ||
| This allows each portal to have a subset of studies (often with a similar | ||
| theme) in a different URL but sharing the same resources. Sharing the same | ||
| backend resources avoids having multiple sites and data getting out of sync. | ||
|
|
||
| The current available portals are: | ||
|
|
||
|
|
||
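
The access rules in the Figure 2 caption (sandboxed and private artifacts are visible only to the owner and to users the study has been shared with; public artifacts are open to everyone) can be summarized in a few lines. This is an illustrative sketch only: `Visibility` and `can_view` are hypothetical names, not part of Qiita's code base, and users are represented as plain strings.

```python
from enum import Enum


class Visibility(Enum):
    """The three artifact states named in the Figure 2 caption."""
    SANDBOXED = "sandboxed"
    PRIVATE = "private"
    PUBLIC = "public"


def can_view(user, owner, shared_with, visibility):
    """Return True if `user` may see an artifact in the given state.

    `shared_with` is the set of users the owner has shared the study with.
    """
    if visibility is Visibility.PUBLIC:
        # Public artifacts are open to all users in the system.
        return True
    # Sandboxed and private artifacts: owner, or users granted access.
    return user == owner or user in shared_with


if __name__ == "__main__":
    print(can_view("bob", "alice", {"bob"}, Visibility.PRIVATE))
    print(can_view("carol", "alice", {"bob"}, Visibility.SANDBOXED))
```

In this sketch sandboxed and private are treated identically for viewing; the difference between them in the text above is about administrator validation and submission, not about who can see the data.
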
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,18 +6,20 @@ EBI submission via Qiita | |
| ======================== | ||
|
|
||
| Qiita allows users to deposit their study, sample, experiment and sequence data to the | ||
| `European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is a permanent repository | ||
| part of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to | ||
| `European Nucleotide Archive (ENA) <https://www.ebi.ac.uk/ena>`__, which is the permanent data | ||
| repository of the `European Bioinformatics Institute (EBI) <https://www.ebi.ac.uk/>`__. Submitting to | ||
| this repository will provide you with a unique identifier for your study, which is generally a | ||
| requirement for publication. | ||
| requirement for publication. Your study will be housed with all other Qiita submissions | ||
| and so we require adherence to the MIxS standard. | ||
|
|
||
| EBI/ENA requires a given set of column fields to describe your samples and experiments, for more | ||
| information visit :doc:`prepare-templates` and pay most attention to EBI required fields, | ||
| information visit :doc:`prepare-information-files` and pay most attention to EBI required fields, | ||
| without these **Qiita Admins** will not be able to submit. If you want to submit your data or need | ||
| help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__. | ||
| help send an email to `qiita.help@gmail.com <qiita.help@gmail.com>`__. Help will include | ||
| advice on additional fields to add to ensure MIxS compliance. | ||
|
|
||
| Note that this kind of submissions are time consuming and need full collaboration from the user. | ||
| Thus, do not wait until the last minute to request help. In general, the best time to request a submission | ||
| Note that submissions are time consuming and need full collaboration from the user. | ||
| Do not wait until the last minute to request help. In general, the best time to request a submission | ||
|
||
| is when you are writing your paper. Remember that the data can be submitted to EBI and can be | ||
| kept private and simply made public when the paper is accepted. Note that EBI/ENA takes up to 15 days to | ||
| change the status from private to public, so consider this when submitting data and your manuscript. | ||
|
|
||
fasta -> FASTA and perhaps remove the period at the end of this sentence or add it to the other points? ... that is of course a minor suggestion, but yeah I noticed that