A Quality control analysis tool for NGS technologies

gkumar09/iSeqQC


iSeqQC: An Expression-Based Quality Control Tool

iSeqQC is an expression-based quality control tool that detects outliers produced either by batch effects or simply by dissimilarity within a phenotypic group. It can be used in three ways:

Webpage (No Installation required)

iSeqQC is readily available at:

   http://cancerwebpa.jefferson.edu/iSeqQC/

Command line (Local R and Bioconductor libraries installation required)

Running iSeqQC locally requires:

  • Local installation of R or RStudio (version 3.5 or later). If R is not available, download it from https://cran.r-project.org/.
  • Installation of the required Bioconductor packages using the following commands:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
BiocManager::install(c("shiny", "FactoMineR", "factoextra", "som", "psych", "data.table", "ape", "corrplot", "limma", "DESeq2"))
  • After R/RStudio and the related packages are installed, iSeqQC can be run with the following command. The command takes five positional arguments in this order: {sample_phenotype_file} {count_matrix} {type_of_reads} {type_of_gene_identifier} {Organism}. The braces only label the arguments and are not typed.
Rscript --vanilla iSeqQC_cli/iSeqQC.R exampleData/samplemanifestfile.txt exampleData/genesymbol_rawcounts.txt R SYMBOL H

where,
type_of_reads: R for raw reads and N for normalized reads
type_of_gene_identifier: SYMBOL if count matrix has gene_symbols in first column and ID if it has gene_ids
Organism: H for Human, M for Mouse and O for others
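
The allowed values above can be checked before the script is launched. The following is a minimal sketch in base R and is not part of iSeqQC: the helper name build_iseqqc_args is hypothetical, and it assumes the five arguments are passed positionally in the order shown in the command, with the brace labels treated as annotations only.

```r
# Hypothetical helper (not part of iSeqQC): validates the coded arguments
# and returns all five in the order the command above passes them.
build_iseqqc_args <- function(manifest, counts,
                              reads = c("R", "N"),
                              id_type = c("SYMBOL", "ID"),
                              organism = c("H", "M", "O")) {
  # match.arg() stops with an error on a value outside the allowed set
  reads    <- match.arg(reads)
  id_type  <- match.arg(id_type)
  organism <- match.arg(organism)
  c(manifest, counts, reads, id_type, organism)
}

cli_args <- build_iseqqc_args("exampleData/samplemanifestfile.txt",
                              "exampleData/genesymbol_rawcounts.txt",
                              reads = "R", id_type = "SYMBOL", organism = "H")

# To launch the run (assumes Rscript is on the PATH and the working
# directory contains iSeqQC_cli/):
# system2("Rscript", c("--vanilla", "iSeqQC_cli/iSeqQC.R", cli_args))
```

Keeping the validation separate from the system2() call makes it possible to catch a mistyped code such as an unsupported organism letter before any analysis starts.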

Local shiny installation (Local R and Bioconductor libraries installation required)

Prerequisites

Running iSeqQC locally requires:

  • Local installation of R or RStudio (version 3.5 or later). If R is not available, download it from https://cran.r-project.org/.
  • Installation of the required Bioconductor packages using the following commands:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
BiocManager::install(c("shiny", "FactoMineR", "factoextra", "som", "psych", "data.table", "ape", "corrplot", "limma", "DESeq2"))
  • After R/RStudio and the related packages are installed, iSeqQC can be run using the following commands in the R console:
setwd("path_to_local_iSeqQC_installation_directory")
library("shiny")
runApp("iSeqQC")

Please note: local Shiny installations of iSeqQC have been successfully tested on Safari v-12.1, Chrome v-79.0, and Firefox v-72.2. However, we recommend Google Chrome for optimal use.
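
Before launching the app, it can help to confirm that the prerequisites are in place. The sketch below uses only base R and is not part of iSeqQC; the package names are copied from the installation command above, and the variable names are illustrative.

```r
# Packages named in the installation command above
required <- c("shiny", "FactoMineR", "factoextra", "som", "psych",
              "data.table", "ape", "corrplot", "limma", "DESeq2")

# The README asks for R version 3.5 or later
r_ok <- getRversion() >= "3.5.0"

# requireNamespace() returns FALSE (instead of raising an error) when a
# package cannot be loaded, so this collects the names still to install
is_available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_available]

if (!r_ok) {
  message("R ", getRversion(), " is older than the required 3.5")
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any names reported as missing can be passed to BiocManager::install() as in the installation step.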

Input file requirements

iSeqQC requires two files for the analysis. Both must be ASCII-formatted, tab-delimited text files.

  • File 1- Sample phenotype data: The first 4 columns must strictly match the names and order given below (names are case-sensitive).
    Sample names in the first column 'samples' must match the sample names in the counts matrix file.

column 1: samples
column 2: shortnames
column 3: groups
column 4: include
column 5-11: any other factors, such as library method, protocol, etc.

Example:

samples shortnames groups include
Control_1 C_1 control TRUE
Control_2 C_2 control TRUE
Control_3 C_3 control TRUE
Treated_1 T_1 treated TRUE
Treated_2 T_2 treated TRUE
Treated_3 T_3 treated TRUE
  • File 2- counts matrix file: The first column of this file should contain official gene symbols or gene IDs under the name "gene" (case-sensitive)

Example:

gene_symbol Control_1 Control_2 Control_3 Treated_1 Treated_2 Treated_3
TSPAN6 642 329 704 507 524 629
DPM1 1443 734 1502 1175 1543 1111

or

gene_id Control_1 Control_2 Control_3 Treated_1 Treated_2 Treated_3
ENSG00000000003 642 329 704 507 524 629
ENSG00000000005 1443 734 1502 1175 1543 1111
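
The requirements above can be checked programmatically before an analysis is started. The following base R sketch is not part of iSeqQC: the helper name check_iseqqc_inputs is hypothetical, and it only tests the two rules stated above (the names and order of the first four manifest columns, and that every manifest sample appears as a column in the counts matrix). The temporary files reproduce the example rows shown above.

```r
# Hypothetical helper (not part of iSeqQC): checks the two input files
# against the stated requirements and stops on the first violation.
check_iseqqc_inputs <- function(manifest_file, counts_file) {
  manifest <- read.delim(manifest_file, stringsAsFactors = FALSE,
                         check.names = FALSE)
  counts   <- read.delim(counts_file, stringsAsFactors = FALSE,
                         check.names = FALSE)

  # First four manifest columns must match these names, in this order
  expected <- c("samples", "shortnames", "groups", "include")
  stopifnot(ncol(manifest) >= 4,
            identical(names(manifest)[1:4], expected))

  # Every manifest sample must be a column of the counts matrix
  # (column 1 of the counts file holds the gene identifiers)
  unmatched <- setdiff(manifest$samples, names(counts)[-1])
  if (length(unmatched) > 0) {
    stop("Samples missing from counts matrix: ",
         paste(unmatched, collapse = ", "))
  }
  invisible(TRUE)
}

# Write the example rows from this section to temporary tab-delimited files
manifest_file <- tempfile(fileext = ".txt")
counts_file   <- tempfile(fileext = ".txt")
writeLines(c("samples\tshortnames\tgroups\tinclude",
             "Control_1\tC_1\tcontrol\tTRUE",
             "Control_2\tC_2\tcontrol\tTRUE",
             "Control_3\tC_3\tcontrol\tTRUE",
             "Treated_1\tT_1\ttreated\tTRUE",
             "Treated_2\tT_2\ttreated\tTRUE",
             "Treated_3\tT_3\ttreated\tTRUE"), manifest_file)
writeLines(c(paste(c("gene_symbol", "Control_1", "Control_2", "Control_3",
                     "Treated_1", "Treated_2", "Treated_3"), collapse = "\t"),
             "TSPAN6\t642\t329\t704\t507\t524\t629",
             "DPM1\t1443\t734\t1502\t1175\t1543\t1111"), counts_file)

ok <- check_iseqqc_inputs(manifest_file, counts_file)
```

The check deliberately does not test the header of the first counts column, since that name is a requirement of iSeqQC itself rather than something this sketch can confirm.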

Results Output

iSeqQC displays the results as a summary table and several plots: summary statistics, counts distribution, mapped read density, housekeeping gene expression, principal component variances (z-score normalized), principal component variances (un-normalized), hierarchical relationship between samples, Pearson correlation, Spearman correlation, and GC bias.
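
To give a sense of what several of these outputs measure, the following base R sketch computes sample-to-sample correlations, principal components, and a hierarchical clustering on a counts matrix. It is an illustration only and not iSeqQC's implementation: the counts are simulated, and the log2(x + 1) transform and the correlation-based distance are assumptions made for this example.

```r
set.seed(1)  # simulated counts for illustration only; not real data
samples <- c("Control_1", "Control_2", "Control_3",
             "Treated_1", "Treated_2", "Treated_3")
counts <- matrix(rpois(50 * 6, lambda = 500), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), samples))

# Assumed transform for this sketch: log2 with a pseudocount of 1
log_counts <- log2(counts + 1)

# Sample-by-sample correlation matrices (columns are samples)
pearson  <- cor(log_counts, method = "pearson")
spearman <- cor(log_counts, method = "spearman")

# PCA on samples: transpose so that rows are samples. scale. = TRUE
# z-scores each gene; scale. = FALSE leaves the values un-normalized.
pca_scaled   <- prcomp(t(log_counts), center = TRUE, scale. = TRUE)
pca_unscaled <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)

# Hierarchical clustering of samples using 1 - Pearson correlation
hc <- hclust(as.dist(1 - pearson))
```

In a real data set, a sample that correlates poorly with the rest of its group, or that sits apart from it in the first principal components or the dendrogram, is the kind of outlier these plots are meant to expose.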

Workflow

Citation

Kumar G, Ertel A, Feldman G, Kupper J, Fortina P (2020). iSeqQC: A Tool for Expression-Based Quality Control in RNA Sequencing. BMC Bioinformatics. Feb 13;21(1):56. doi: 10.1186/s12859-020-3399-8. PMID: 32054449; PMCID: PMC7020508
