<p>Many common bioinformatic tools work on a per-sample basis. If you have hundreds of samples to process, running each tool sample by sample quickly becomes painstaking. Computational pipelines address this by letting you process many samples jointly, which makes the whole workflow more user-friendly. These pipelines also help to produce a consistent, documented, and therefore reproducible workflow, allowing you to process your target capture data even if you are not a trained bioinformatician. If you are completely new to Illumina data or target capture data in particular, make yourself cozy and read through <a href="https://www.frontiersin.org/articles/10.3389/fgene.2019.01407/full">this review article</a>, which explains the whole process from project planning to producing, processing, and analyzing your data. Not now though, because we are here to get our hands dirty with some real data processing.</p>
<p>We are going to use the <a href="https://github.com/AntonelliLab/seqcap_processor">SECAPR pipeline</a> (see SECAPR's original publication <a href="https://peerj.com/articles/5175/">here</a>) on a dataset of <strong>Ultraconserved Elements (UCEs)</strong> that were sampled for the hummingbird genus <em>Topaza</em> in South America. This dataset consists of sequence data for about 2500 individual loci, which were enriched prior to sequencing using target capture.</p>
-
<p>It should be mentioned that there also is the <a href="https://github.com/faircloth-lab/phyluce">Phyluce</a> pipeline, which is made and fine-tuned specifcially for UCE data. Usually you would use Phyluce for any UCE dataset. However in this case we will use SECAPR, since it is a more generalized pipeline that may come in handy for your non-UCE target capture dataset and therefore might be more useful for you to learn.</p>
+
<p>It should be mentioned that there also is the <a href="https://github.com/faircloth-lab/phyluce">Phyluce</a> pipeline, which is made and fine-tuned specifically for UCE data. Usually you would use Phyluce for any UCE dataset. However in this case we will use SECAPR, since it is a more generalized pipeline that may come in handy for your non-UCE target capture dataset and therefore might be more useful for you to learn.</p>
<p>For this genus it is not clear if the existing morphological species assignments are justified and if there might be cryptic species within these morphospecies. We want to use this UCE dataset to generate multiple sequecnce alignments (MSAs), to eventually estimate a phylogeny (species tree) of these samples and define coalescent species.</p>
+
<p>For this genus it is not clear if the existing morphological species assignments are justified and if there might be cryptic species within these morphospecies. We want to use this UCE dataset to generate multiple sequence alignments (MSAs), to eventually estimate a phylogeny (species tree) of these samples and define coalescent species.</p>
<p>Below is an outline of the main functions and workflow of the SECAPR pipeline. This tutorial will cover all these processing steps, producing different sets of MSAs with different properties.</p>
<p>Now you have produced different sets of MSAs which you can use for phylogenetic inference. With these tools at hand you will be able to process almost any target capture dataset. You could now produce individual gene trees from the alignments and then estimate the most likely species tree, using summary coalescent methods. Alternatively you could also co-estimate gene trees and species trees using *BEAST. <a href="https://htmlpreview.github.io/?https://raw.githubusercontent.com/AntonelliLab/seqcap_processor/master/docs/documentation/subdocs/phylogeny_msc.html">Here</a> you can find a tutorial for running allele MSAs like the one we produced here in *BEAST.</p>
+
<p>Now you have produced different sets of MSAs which you can use for phylogenetic inference. With these tools at hand you will be able to process almost any target capture dataset. You could now produce individual gene trees from the alignments and then estimate the most likely species tree, using summary coalescent methods. Alternatively you could also co-estimate gene trees and species trees using *BEAST. <a href="https://htmlpreview.github.io/?https://github.com/AntonelliLab/seqcap_processor/blob/1bd840efe68fbe5e520f0166e1ba6524a78c4512/docs/documentation/subdocs/phylogeny_msc.html">Here</a> you can find a tutorial for running allele MSAs like the one we produced here in *BEAST.</p>