Skip to content

MathijsSanders/AdditionalBAMStatistics

Repository files navigation

Additional BAM statistics

This repository contains code for further annotating ANNOVAR files with statistics helpful for variant filtering. Especially in the context of laser capture microdissection these statistics are useful for detecting false positive variants as a consequence of hairpins.

How do I run it?

AdditionalBAMStatistics is a precompiled multi-threaded JAVA package that amends ANNOVAR files with useful statistics for filtering variants. There are three simple ways of installing AdditionalBAMStatistics.

Recommendation - SNP database

AdditionalBAMStatistics is able to make use of SNP databases to mark reads with too many mismatches not reported as known SNPs. This statistics is especially informative in the context of cross-species contamination or false positive variants that derive from extremely homologous regions.

The following command downloads a database for common SNPs for hg19/GRCh37:

wget -q ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz

Necessary - Indexed reference genome

Please make sure that the reference genome is indexed by SAMtools. If not, please run the following command:

samtools faidx <reference_FASTA_file>

1. The easiest way - Singularity container

Requirements

  • Singularity
  • Indexed reference FASTA file

Run information

The JAVA package has been incorporated into a Singularity container available from Singularity hub. In case singularity is installed, simply run:

singularity pull shub://MathijsSanders/AdditionalBAMStatisticsSingularity

Still requires that the reference genome is indexed!

This includes:

  • AdditionalBAMStatistics
  • SAMtools

The following parameters are available:

Parameter Description
-a/--annovarfile* The ANNOVAR file to be further annotated.
-b/--bamfile* The corresponding BAM files of the sample of interest.
-r/--reference* The indexed reference FASTA file used for alignment.
-o/--output-file Output file for writing the results (Default: standard out).
-s/--snp-database SNP database for annotating reads with too many mismatches not reported as SNPs (Either vcf or vcf.gz).
-m/--max-non-snp The maximum number of mismatched not reported as SNP before a read is marked as having too many mutations (Default: 2).
-d/--diff-alignment-score The difference between the current and alternative alignment score before a read is considered multi-mappable (Default: 5).
-t/--threads Number of threads to use (Default: 1).
-c/--current-heapsize The maximum heap size JAVA can use (Default: 10G). This threshold should be increased in case a larger SNP database is used.
-h/--help Help information.
* Required.

2. The easy way - Use precompiled JAR file

Requirements

  • JAVA JDK 11+
  • SAMtools
  • Indexed reference FASTA file

Run information

Simply run the following command to download the repository:

git clone https://github.com/MathijsSanders/AdditionalBAMStatistics.git

Run the following command to start annotating the ANNOVAR file:

java -Xmx10G -jar additionalBamStatistics.jar --input-annovar-file <annovar_file> --input-bam-file <bam_file> --reference <reference_file> --output-file <output_file> --snp-database <snp_database> --max-non-snp <max_non_snp> --difference-alignment-scores <diff_scores> --threads <threads> --help --version
Parameter Description
--input-annovar-file* The ANNOVAR file to be further annotated.
--input-bam-file* The corresponding BAM files of the sample of interest.
--reference* The indexed reference FASTA file used for alignment.
--output-file Output file for writing the results (Default: standard out).
--snp-database SNP database for annotating reads with too many mismatches not reported as SNPs (Either vcf or vcf.gz)
--max-non-snp The maximum number of mismatched not reported as SNP before a read is marked as having too many mutations (Default: 2).
--difference-alignment-score The difference between the current and alternative alignment score before a read is considered multi-mappable (Default: 5).
--threads Number of threads to use (Default: 1).
--help Help information.
--version Version information.
* Required.

3. The difficult way - Compile package

Requirements

  • Maven version 3+ (For compiling only).
  • Java JDK 11+

Run information

The precompiled JAR file is included with the repository, but in case the package needs to be recompiled, please run:

mvn package clean

Once the JAR file is compiled please follow the JAR-specific instructions listed under point 2.

About

Source code for AdditionalBAMStatistics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published