Skip to content

CompEpigen/methrix

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


Fast and efficient summarization of generic bedGraph files from Bisufite sequencing

Travis build status Lifecycle: maturing BioC status

Introduction

Bedgraph files generated by BS pipelines often come in various flavors. Critical downstream step requires aggregation of these files into methylation/coverage matrices. This step of data aggregation is done by Methrix, including many other useful downstream functions.

Package overview and usage functions

Summary:

  • Faster summarization of generic bedGraph files with data.table back-end
  • Fills missing CpGs from reference genome
  • Vectorized code (faster, memory expensive) and non-vectorized code (slower, minimal memory)
  • Built upon SummarizedExperiment with custom methods for CpG extraction, sub-setting, and filtering
  • Easy conversion to bsseq object for downstream analysis
  • Extensive one click interactive html report generation
  • Supports serialized arrays with HDF5Array and saveHDF5SummarizedExperiment

Updates:

see here

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CompEpigen/methrix")

Usage:

Step-1: Extract all CpG loci from the reference genome

> hg19_cpgs = methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
# Extracting CpGs..
# Here is a Chuck Norris joke while you wait..
# Chuck Norris doesn't chew gum. Chuck Norris chews tin foil.
# Done.
# Extracted 28787054 from 86 contigs.

Step-2: Read in bedgraphs and generate a methrix object

Test data is 3 stranded WGBS bedgraphs from MethylcTools

#Example bedgraph files
> bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = "*bdg\\.gz$", full.names = TRUE)

> meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
  stranded = TRUE, collapse_strands = TRUE)
----------------------------
-Preset:        Custom 
--Missing beta and coverage info. Estimating them from M and U values
-CpGs raw:      28,700,086
-CpGs filtered: 28,217,448
-CpGs stranded: 56,434,896
----------------------------
-Batch:         1/1 
-Processing:    C1.bdg.gz
--CpGs missing: 56,434,254
-Processing:    C2.bdg.gz
--CpGs missing: 56,434,231
-Processing:    N1.bdg.gz
--CpGs missing: 56,434,233
-Processing:    N2.bdg.gz
--CpGs missing: 56,434,233
-Finished in:   00:01:19 elapsed (00:01:52 cpu) 

> meth
An object of class methrix
   n_CpGs: 28,217,448
n_samples: 4
    is_h5: FALSE
Reference: Genome Reference Consortium GRCh37

Methrix operations

What can be done on methrix object? Following are the key functions

#reading and writing:
read_bedgraphs()    #Reads in bedgraph files into methrix
write_bedgraphs()   #Writes bedGraphs from methrix object
#operations
order_by_sd()       #Orders methrix object by SD
region_filter()	    #Filters matrices by region
mask_methrix()      #Masks lowly covered CpGs
coverage_filter()   #Filters methrix object based on coverage
subset_methrix()	  #Subsets methrix object based on given conditions.
remove_uncovered()	#Removes loci that are uncovered across all samples
remove_snps()       #Removes loci overlapping with possible SNPs
#Visualization and QC
methrix_report()    #Creates a detailed interative html summary report from methrix object
methrix_pca()	      #Principal Component Analysis
plot_pca()          #Plots the result of PCA
plot_coverage()     #Plots coverage statistics
plot_density()      #Plots the density distribution of the beta values 
plot_violin()       #Plots the distribution of the beta values on a violin plot
plot_stats()        #Plot descriptive statistics
get_stats()	        #Estimate descriptive statistics of the object
#Other
methrix2bsseq()     #Convert methrix to bsseq object

Speed benchmark

To be added

To do

  • Write methods
  • Add vignette
  • Add examples
  • Clear BiocCheck warnings and notes
  • Submit to BioConductor for Fall (3.10) release

Note

This repository is under active development