When analyzing Epigenome sequencing data, a common task is to visualize the signal intensity detected at different position of the genome, the distribution and structure of various cis-elements/genes in the corresponding genome region, and flexibly combine them together. UCSC Genome Browser and WashU Epigenome Browser are the greatest and most famous online tools that help to do the visualization task. However, exporting track plots generated by these online tools in a lossless and personalized manner can often be exhausting. Based on ggplot2, a powerful data visualization toolkit in R language, TrackPlotR is designed to generate attractive and high-quality genomic track plots using the most commonly employed data formats in a concise, elegant and highly customizable manner.
TrackPlotR can be installed from GitHub:
devtools::install_github('yimingsun12138/TrackPlotR')
TrackPlotR was designed with the aim of retaining high scalability while minimizing dependency on other packages as much as possible. The packages listed below are necessary for the installation of TrackPlotR:
TrackPlotR currently provides functions for visualizing signal coverage, genome features and transcripts. Meanwhile, TrackPlotR also provides build-in data to help get started quickly.
library(ggplot2)
library(rtracklayer)
library(TrackPlotR)
library(patchwork)
#build-in example data
example_path <- system.file('extdata',package = 'TrackPlotR')
list.files(example_path)
## [1] "Homo_sapiens.GRCh38.105.subset.gtf" "S1_ATAC_coverage.bw"
## [3] "S1_ATAC_peak.bed" "S2_ATAC_coverage.bw"
## [5] "S2_ATAC_peak.bed" "S3_ATAC_coverage.bw"
## [7] "S3_ATAC_peak.bed" "S4_ATAC_coverage.bw"
The build-in data includes Tn5 insertion site coverage for 4 samples (ATAC_coverage.bw), ATAC-seq peak region for 3 samples, and a human gene annotation file (Ensembl release 105).
First load the BigWig file into R using load_BigWig
and assign a sample name for each file:
coverage_info <- paste(example_path,c('S1_ATAC_coverage.bw','S2_ATAC_coverage.bw','S3_ATAC_coverage.bw','S4_ATAC_coverage.bw'),sep = '/')
coverage_info <- load_BigWig(file_path = coverage_info,file_name = c('S1','S2','S3','S4'))
Plot the coverage signal in the specified genome region:
region <- as('chr3:58510000-58690000','GRanges')
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"))
print(p1 + theme(aspect.ratio = 0.1))
Sometimes we are interested in the chromatin accessibility landscape around a certain gene, and TrackPlotR provides a quick method for the visualization task:
human_anno <- paste(example_path,'Homo_sapiens.GRCh38.105.subset.gtf',sep = '/')
human_anno <- rtracklayer::import(con = human_anno,format = 'gtf')
p1 <- coverage_vis_gene(coverage_table = coverage_info,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"))
print(p1 + theme(aspect.ratio = 0.1))
TrackPlotR also provides methods for visualizing cis-elements (features) distribution in the specified genome region or around a certain gene. These cis-elements can be peak list, enhancers, human accelerated regions (HARs) and so on, and should be provided in the bed or GRanges/GRangesList format.
#load peak files and assign sample name
peak_info <- paste(example_path,c('S1_ATAC_peak.bed','S2_ATAC_peak.bed','S3_ATAC_peak.bed'),sep = '/')
names(peak_info) <- c('S1','S2','S3')
#feature plot in the specified genome region
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"))
print(p2 + theme(aspect.ratio = 0.2))
#feature plot around a certain gene
p2 <- feature_vis_gene(features = peak_info,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,col_pal = c("#D51F26","#272E6A","#208A42"))
print(p2 + theme(aspect.ratio = 0.2))
TrackPlotR also provides methods for visualizing transcripts/genes structure and distribution in the specified genome region or around a certain gene. Gene annotation file can be provided in the GTF or GRanges format and must contain 3 columns: type, gene_id and transcript_id.
#gene plot in the specified genome region
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_id')
print(p3 + theme(aspect.ratio = 0.2))
#plot a certain gene only
p3 <- gene_track_vis(gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,show_only = TRUE)
print(p3 + theme(aspect.ratio = 0.2))
In addition to be displayed by gene, the gene annotation file can also be displayed on a transcript basis.
#display by transcript
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'transcript',show_name = 'transcript_name')
print(p3 + theme(aspect.ratio = 0.5))
What are the interactive relationships between these cis-elements on genome? For example, the activation of enhancer to promoter, co-accessible networks among cis-regulatory elements, etc. TrackPlotR also provides methods for linkage track visualization in the specified genome region or around a certain gene.
#load linkage table and convert to GRanges object
linkage <- paste(example_path,c('peak2gene_linkage.csv'),sep = '/')
linkage <- read.csv(file = linkage,header = TRUE,row.names = 1)
linkage <- linkage_table_to_GRanges(linkage = linkage)
#linkage track plot in the specified genome region
p4 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation')
print(p4 + theme(aspect.ratio = 0.15))
#linkage plot around a certain gene
p4 <- linkage_vis_gene(linkage = linkage,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,color_by = 'Correlation')
print(p4 + theme(aspect.ratio = 0.15))
Usually, we would like to observe different track plots together, so it is particularly important to set a uniform style for different track plots. Currently, TrackPlotR has only two build-in plotting styles, and more styles will be added in the future.
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_1')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_1')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_1')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_1')
Thanks to the high flexibility and scalability of ggplot2, we can make more personalized modifications to the images returned by TrackPlotR and stitch them together easily.
p1 <- p1 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
p2 <- p2 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank(),legend.position = 'none')
p3 <- p3 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))
All borders are removed in 'style_2', making it suitable for visualization with greater width.
#visualize with style_2
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_2')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_2')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_2')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_2')
p1 <- p1 + theme(axis.title.x = element_blank())
p2 <- p2 + theme(axis.line.x = element_blank(),axis.title.x = element_blank())
p3 <- p3 + theme(axis.line.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))
Sometimes, we want to emphasize our discoveries by highlighting specific genome regions, which can be conveniently achieved using the highlight_region
function. By simply providing the highlight_region
function with the track plot (as a ggplot object) and the genome region we are interested in, we can obtain a ggplot object with added background highlight.
#highlight region
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_1')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_1')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_1')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_1')
p1 <- highlight_region(gg_object = p1,start = 58622610,end = 58632610)
p2 <- highlight_region(gg_object = p2,start = 58622610,end = 58632610)
p3 <- highlight_region(gg_object = p3,start = 58622610,end = 58632610)
p4 <- highlight_region(gg_object = p4,start = 58622610,end = 58632610)
p1 <- p1 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
p2 <- p2 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank(),legend.position = 'none')
p3 <- p3 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))
Feature track plot and transcript track plot can be displayed with different row compression densities.
# full transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',display_mode = 'full')
print(p3 + theme(aspect.ratio = 0.3))
In the full mode, each transcript/gene will occupy a single row.
# squish transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',display_mode = 'squish')
print(p3 + theme(aspect.ratio = 0.2))
In the squish mode, transcripts/genes that do not overlap with each other can occupy the same row.
# collapse transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = NULL,display_mode = 'collapse')
print(p3 + theme(aspect.ratio = 0.1))
# collapse features
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),collapse_features = TRUE,overlap_col = 'darkgrey')
print(p2 + theme(aspect.ratio = 0.1))
In the collapse mode, all transcripts/genes and features will be plotted in the same row.
- Pile-Up plot using sequencing fragments
- Hi-C support
- Chromosome plot