Commit bb56470

Get the microhaplot sticker in there! (#19)
* updated roxygen note
* Update README to Rmd and give it a sticker and CRAN badge
* Rerun pkgdown stuff
1 parent d7f767f commit bb56470

22 files changed: +574 -347 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Imports:
 ggiraph (>= 0.6.0)
 URL: https://github.com/ngthomas/microhaplot
 BugReports: https://github.com/ngthomas/microhaplot/issues
-RoxygenNote: 7.1.0
+RoxygenNote: 7.1.1
 Suggests:
 knitr,
 rmarkdown

README.Rmd

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+---
+output: github_document
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+# microhaplot <img src="man/figures/microhaplot-sticker.png" align="right" width="200"/>
+
+<!-- badges: start -->
+[![CRAN status](https://www.r-pkg.org/badges/version/microhaplot)](https://CRAN.R-project.org/package=microhaplot)
+<!-- badges: end -->
+
+`microhaplot` generates visual summaries of microhaplotypes found in short read alignments. All you need are alignment SAM
+files and a variant call VCF file. (The latter tells `microhaplot` which SNPs to include into microhaplotypes). It was
+designed for extracting and visualized haplotypes from high-quality amplicon sequencing data. We have used it extensively
+to process amplicon sequencing data (with 100 to 500 amplicons) from rockfish and Chinook salmon, generated on an Illumina
+MiSeq sequencer. It should be extensible to sequences from capture arrays, like RAPTURE data.
+
+This software exists as an R package `microhaplot` that includes within it the code to set up and
+establish an Rstudio/Shiny server to visualize and manipulate the data. There are two key steps in
+the `microhaplot` workflow:
+
+1. The first step is to summarize alignment and variant (SNP) data into a single data frame that is
+easily operated upon. This is done using the function `microhaplot::prepHaplotFiles`. You must supply a
+VCF file that includes variants that you are interested in extracting, and as many SAM files
+(one for each individual) that you want to extract read information from at each of the variants.
+The function `microhaplot::prepHaplotFiles` makes a call
+to PERL to parse the CIGAR strings in the SAM files to extract the variant information at each read
+and store this information into a data frame which gets saved with the installed Shiny app (see below)
+for later use. Depending on the size of the data set, this can take a few minutes.
+
+2. The second step is to run the microhaplot Shiny app to visualize the sequence information, call genotypes using
+simple read-depth based filtering criteria, and curate the loci. microhaplot is suitable for quick assessment
+and quality control of haplotypes generated from library runs. Plot summaries include read depth, fraction of callable haplotypes, Hardy-Weinberg
+equilibrium plots, and more.
+
+
+See the **Example Data** section to learn about how to run each of these steps on the example data that are provided
+with the package.
+
+
+## Installation and Quick Start
+
+### required Perl dependencies:
+You need to have Perl (version >5.014) installed in your OS in order to run Microhaplot.
+For Window users, we recommend install it via http://strawberryperl.com/.
+For Mac and Linux users, Perl can be downloaded from https://www.perl.org/get.html
+
+You can either clone the repository and build the `microhaplot` package yourself, or, more easily, you can
+install it using [devtools](https://github.com/hadley/devtools). You can get `devtools` by `install.packages("devtools")`.
+
+**To mac user: remember to install [XQuartz](https://www.xquartz.org/), when upgrading your macOS to a new major version.**
+
+Once you have `devtools` available in R, you can get `microhaplot` this way:
+```r
+devtools::install_github("ngthomas/microhaplot", build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))
+```
+
+Once you have installed the `microhaplot` R package with devtools there you need to use the `microhaplot::mvHaplotype`
+to establish the microhaplot Shiny App in a convenient location on your system. The following line
+creates the directory `Shiny` in my home directory and then within that it creates the
+directory `microhaplot` and fills it with the Shiny app as well as the example data that go
+along with that.
+
+```r
+microhaplot::mvShinyHaplot("~/Shiny") # provide a directory path to host the microhaplot app
+```
+To start familiarizing yourself with microhaplot using the provided example data. We recommend
+going through our first vignette. Call it up with:
+```r
+browseVignettes("microhaplot")
+```
+and check out `microhaplot-walkthrough`.
+
+Now, having done that, we can launch Shiny microhaplot on the example data:
+```r
+library(microhaplot)
+app.path <- "~/Shiny/microhaplot"
+runShinyHaplot(app.path)
+```
+
+## Quick Guide to use microhaplot to parse out SAM and VCF files
+
+This microhaplot package comes with a small customized sample data drawn from an actual run
+of short read sequencing run on Rockfish species. The sample data
+contains sequences of eight genomic loci for four populations of five individuals each,
+with a total of twenty individuals.
+
+First you need to create a tab-separate **label** file with 3 info columns: path to SAM file name, individual ID, and group label (in this particular order). If you do not want assign any group label for the individuals, you can just leave it as "NA". It is recommended that you have all of the SAM files under one directory to make this labeling task easier.
+
+The `label` file looks like this:
+```txt
+s6.sam s6 copper
+s11.sam s11 copper
+s13.sam s13 gold
+s14.sam s14 kelp
+s18.sam s18 gold
+```
+
+Once you have the label file in place, you can run `prepHaplotFiles`, a R function that generates tables of microhaplotype, by providing the following:
+* a label to display in haPLOType
+* path to the directory with all SAM files
+* path to the `label` file you just created
+* path to the VCF file
+* optional number of threads (for non-Windows user); recommend 2 * # of processors
+
+```R
+library(microhaplot)
+
+# to access package sample case study dataset of rockfish
+run.label <- "sebastes"
+
+sam.path <- tempdir()
+untar(system.file("extdata",
+"sebastes_sam.tar.gz",
+package="microhaplot"),
+exdir = sam.path)
+
+label.path <- file.path(sam.path, "label.txt")
+vcf.path <- file.path(sam.path, "sebastes.vcf")
+out.path <- tempdir()
+app.path <- "~/Shiny/microhaplot"
+
+# for your dataset: customize the following paths
+# sam.path <- "~/microhaplot/extdata/"
+# label.path <- "~/microhaplot/extdata/label.txt"
+# vcf.path <- "~/microhaplot/extdata/sebastes.vcf"
+# app.path <- "~/Shiny/microhaplot"
+
+haplo.read.tbl <- prepHaplotFiles(run.label = run.label,
+sam.path = sam.path,
+out.path = out.path,
+label.path = label.path,
+vcf.path = vcf.path,
+app.path = app.path,
+n.jobs = 4) # assume running on dual core
+
+
+runShinyHaplot(app.path)
+```
+
+
+## Suggestions
+- SAM files: For pair-ended experiment, both directional reads should be flashed into one.
+
+
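
Reviewer note on the label-file step in README.Rmd: the README describes a tab-separated file with three columns (SAM file name, individual ID, group label) but does not show how to produce one. The following is a minimal sketch in base R, not part of this commit. The data frame name `samples`, its column names, and the choice to write to `tempdir()` are illustrative assumptions; the rows mirror the README's example, and the absence of a header row is inferred from that example.

```r
# Hypothetical sample sheet; the rows mirror the README's example label file.
samples <- data.frame(
  sam_file = c("s6.sam", "s11.sam", "s13.sam", "s14.sam", "s18.sam"),
  id       = c("s6", "s11", "s13", "s14", "s18"),
  group    = c("copper", "copper", "gold", "kelp", "gold"),
  stringsAsFactors = FALSE
)

# Illustrative output location; in practice this would sit next to the SAM files.
label.path <- file.path(tempdir(), "label.txt")

# Tab-separated, unquoted, no header or row names (assumed from the README example).
# Column order: SAM file name, individual ID, group label.
# Missing group labels are written as the literal string "NA".
write.table(samples, file = label.path, sep = "\t", na = "NA",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it has three tab-separated columns.
label.check <- read.delim(label.path, header = FALSE, stringsAsFactors = FALSE)
```

The resulting `label.path` can then be passed as the `label.path` argument shown in the README's `prepHaplotFiles` call.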

README.md

Lines changed: 115 additions & 77 deletions
@@ -1,100 +1,139 @@
-# microhaplot
-
-`microhaplot` generates visual summaries of microhaplotypes found in short read alignments. All you need are alignment SAM
-files and a variant call VCF file. (The latter tells `microhaplot` which SNPs to include into microhaplotypes). It was
-designed for extracting and visualized haplotypes from high-quality amplicon sequencing data. We have used it extensively
-to process amplicon sequencing data (with 100 to 500 amplicons) from rockfish and Chinook salmon, generated on an Illumina
-MiSeq sequencer. It should be extensible to sequences from capture arrays, like RAPTURE data.
-
-This software exists as an R package `microhaplot` that includes within it the code to set up and
-establish an Rstudio/Shiny server to visualize and manipulate the data. There are two key steps in
-the `microhaplot` workflow:
-
-1. The first step is to summarize alignment and variant (SNP) data into a single data frame that is
-easily operated upon. This is done using the function `microhaplot::prepHaplotFiles`. You must supply a
-VCF file that includes variants that you are interested in extracting, and as many SAM files
-(one for each individual) that you want to extract read information from at each of the variants.
-The function `microhaplot::prepHaplotFiles` makes a call
-to PERL to parse the CIGAR strings in the SAM files to extract the variant information at each read
-and store this information into a data frame which gets saved with the installed Shiny app (see below)
-for later use. Depending on the size of the data set, this can take a few minutes.
-
-2. The second step is to run the microhaplot Shiny app to visualize the sequence information, call genotypes using
-simple read-depth based filtering criteria, and curate the loci. microhaplot is suitable for quick assessment
-and quality control of haplotypes generated from library runs. Plot summaries include read depth, fraction of callable haplotypes, Hardy-Weinberg
-equilibrium plots, and more.
-
-
-See the **Example Data** section to learn about how to run each of these steps on the example data that are provided
-with the package.
-
-
-### Installation and Quick Start
-
-#### required Perl dependencies:
-You need to have Perl (version >5.014) installed in your OS in order to run Microhaplot.
-For Window users, we recommend install it via http://strawberryperl.com/.
-For Mac and Linux users, Perl can be downloaded from https://www.perl.org/get.html
-
-You can either clone the repository and build the `microhaplot` package yourself, or, more easily, you can
-install it using [devtools](https://github.com/hadley/devtools). You can get `devtools` by `install.packages("devtools")`.
-
-**To mac user: remember to install [XQuartz](https://www.xquartz.org/), when upgrading your macOS to a new major version.**
-
-Once you have `devtools` available in R, you can get `microhaplot` this way:
-```r
+
+# microhaplot <img src="man/figures/microhaplot-sticker.png" align="right" width="200"/>
+
+<!-- badges: start -->
+
+[![CRAN
+status](https://www.r-pkg.org/badges/version/microhaplot)](https://CRAN.R-project.org/package=microhaplot)
+<!-- badges: end -->
+
+`microhaplot` generates visual summaries of microhaplotypes found in
+short read alignments. All you need are alignment SAM files and a
+variant call VCF file. (The latter tells `microhaplot` which SNPs to
+include into microhaplotypes). It was designed for extracting and
+visualized haplotypes from high-quality amplicon sequencing data. We
+have used it extensively to process amplicon sequencing data (with 100
+to 500 amplicons) from rockfish and Chinook salmon, generated on an
+Illumina MiSeq sequencer. It should be extensible to sequences from
+capture arrays, like RAPTURE data.
+
+This software exists as an R package `microhaplot` that includes within
+it the code to set up and establish an Rstudio/Shiny server to visualize
+and manipulate the data. There are two key steps in the `microhaplot`
+workflow:
+
+1. The first step is to summarize alignment and variant (SNP) data into
+a single data frame that is easily operated upon. This is done using
+the function `microhaplot::prepHaplotFiles`. You must supply a VCF
+file that includes variants that you are interested in extracting,
+and as many SAM files (one for each individual) that you want to
+extract read information from at each of the variants. The function
+`microhaplot::prepHaplotFiles` makes a call to PERL to parse the
+CIGAR strings in the SAM files to extract the variant information at
+each read and store this information into a data frame which gets
+saved with the installed Shiny app (see below) for later use.
+Depending on the size of the data set, this can take a few minutes.
+
+2. The second step is to run the microhaplot Shiny app to visualize the
+sequence information, call genotypes using simple read-depth based
+filtering criteria, and curate the loci. microhaplot is suitable for
+quick assessment and quality control of haplotypes generated from
+library runs. Plot summaries include read depth, fraction of
+callable haplotypes, Hardy-Weinberg equilibrium plots, and more.
+
+See the **Example Data** section to learn about how to run each of these
+steps on the example data that are provided with the package.
+
+## Installation and Quick Start
+
+### required Perl dependencies:
+
+You need to have Perl (version \>5.014) installed in your OS in order to
+run Microhaplot.
+For Window users, we recommend install it via
+<http://strawberryperl.com/>.
+For Mac and Linux users, Perl can be downloaded from
+<https://www.perl.org/get.html>
+
+You can either clone the repository and build the `microhaplot` package
+yourself, or, more easily, you can install it using
+[devtools](https://github.com/hadley/devtools). You can get `devtools`
+by `install.packages("devtools")`.
+
+**To mac user: remember to install [XQuartz](https://www.xquartz.org/),
+when upgrading your macOS to a new major version.**
+
+Once you have `devtools` available in R, you can get `microhaplot` this
+way:
+
+``` r
 devtools::install_github("ngthomas/microhaplot", build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))
 ```
 
-Once you have installed the `microhaplot` R package with devtools there you need to use the `microhaplot::mvHaplotype`
-to establish the microhaplot Shiny App in a convenient location on your system. The following line
-creates the directory `Shiny` in my home directory and then within that it creates the
-directory `microhaplot` and fills it with the Shiny app as well as the example data that go
-along with that.
+Once you have installed the `microhaplot` R package with devtools there
+you need to use the `microhaplot::mvHaplotype` to establish the
+microhaplot Shiny App in a convenient location on your system. The
+following line creates the directory `Shiny` in my home directory and
+then within that it creates the directory `microhaplot` and fills it
+with the Shiny app as well as the example data that go along with that.
 
-```r
+``` r
 microhaplot::mvShinyHaplot("~/Shiny") # provide a directory path to host the microhaplot app
 ```
-To start familiarizing yourself with microhaplot using the provided example data. We recommend
-going through our first vignette. Call it up with:
-```r
+
+To start familiarizing yourself with microhaplot using the provided
+example data. We recommend going through our first vignette. Call it up
+with:
+
+``` r
 browseVignettes("microhaplot")
 ```
+
 and check out `microhaplot-walkthrough`.
 
-Now, having done that, we can launch Shiny microhaplot on the example data:
-```r
+Now, having done that, we can launch Shiny microhaplot on the example
+data:
+
+``` r
 library(microhaplot)
 app.path <- "~/Shiny/microhaplot"
 runShinyHaplot(app.path)
 ```
 
-### Quick Guide to use microhaplot to parse out SAM and VCF files
+## Quick Guide to use microhaplot to parse out SAM and VCF files
 
-This microhaplot package comes with a small customized sample data drawn from an actual run
-of short read sequencing run on Rockfish species. The sample data
-contains sequences of eight genomic loci for four populations of five individuals each,
-with a total of twenty individuals.
+This microhaplot package comes with a small customized sample data drawn
+from an actual run of short read sequencing run on Rockfish species. The
+sample data contains sequences of eight genomic loci for four
+populations of five individuals each, with a total of twenty
+individuals.
 
-First you need to create a tab-separate **label** file with 3 info columns: path to SAM file name, individual ID, and group label (in this particular order). If you do not want assign any group label for the individuals, you can just leave it as "NA". It is recommended that you have all of the SAM files under one directory to make this labeling task easier.
+First you need to create a tab-separate **label** file with 3 info
+columns: path to SAM file name, individual ID, and group label (in this
+particular order). If you do not want assign any group label for the
+individuals, you can just leave it as “NA”. It is recommended that you
+have all of the SAM files under one directory to make this labeling task
+easier.
 
 The `label` file looks like this:
-```txt
+
+``` txt
 s6.sam s6 copper
 s11.sam s11 copper
 s13.sam s13 gold
 s14.sam s14 kelp
 s18.sam s18 gold
-```
-
-Once you have the label file in place, you can run `prepHaplotFiles`, a R function that generates tables of microhaplotype, by providing the following:
-* a label to display in haPLOType
-* path to the directory with all SAM files
-* path to the `label` file you just created
-* path to the VCF file
-* optional number of threads (for non-Windows user); recommend 2 * # of processors
-
-```R
+```
+
+Once you have the label file in place, you can run `prepHaplotFiles`, a
+R function that generates tables of microhaplotype, by providing the
+following: \* a label to display in haPLOType \* path to the directory
+with all SAM files \* path to the `label` file you just created \* path
+to the VCF file
+\* optional number of threads (for non-Windows user); recommend 2 \* \#
+of processors
+
+``` r
 library(microhaplot)
 
 # to access package sample case study dataset of rockfish
@@ -129,8 +168,7 @@ haplo.read.tbl <- prepHaplotFiles(run.label = run.label,
 runShinyHaplot(app.path)
 ```
 
+## Suggestions
 
-### Suggestions
-- SAM files: For pair-ended experiment, both directional reads should be flashed into one.
-
-
+- SAM files: For pair-ended experiment, both directional reads should
+be flashed into one.
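
Reviewer note on the README's prerequisites: it asks for Perl newer than 5.014 and recommends a thread count of 2 × the number of processors for non-Windows users. A pre-flight check along the following lines could be run before calling `prepHaplotFiles`. This is only a sketch in base R and the `parallel` package, not something this commit adds. The variable names are illustrative, and falling back to a single job on Windows is an assumption rather than documented package behaviour.

```r
# Locate perl on the PATH; Sys.which() returns "" when the executable is absent.
perl.bin <- unname(Sys.which("perl"))
has.perl <- nzchar(perl.bin)

perl.ok <- FALSE
if (has.perl) {
  # Perl's $] variable holds the numeric version, e.g. 5.030000 for Perl 5.30.0.
  perl.version <- suppressWarnings(as.numeric(
    system2(perl.bin, c("-e", shQuote("print $]")), stdout = TRUE)
  ))
  # The README asks for a version greater than 5.014.
  perl.ok <- length(perl.version) == 1 && !is.na(perl.version) && perl.version > 5.014
}

# detectCores() can return NA on some platforms, so fall back to one core.
n.cores <- parallel::detectCores()
if (is.na(n.cores)) n.cores <- 1L

# README recommendation for non-Windows users: 2 * number of processors.
# Using a single job on Windows is an assumption made for this sketch.
n.jobs <- if (.Platform$OS.type == "windows") 1L else 2L * n.cores

if (!perl.ok) {
  message("Perl > 5.014 was not found on the PATH; prepHaplotFiles needs it.")
}
```

The resulting `n.jobs` value could replace the hard-coded `n.jobs = 4` in the README's example call.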
