Additional BAM statistics

This repository contains code that annotates ANNOVAR files with additional statistics for variant filtering. These statistics are particularly useful in the context of laser capture microdissection, where they help detect false-positive variants caused by hairpins.

How do I run it?

AdditionalBAMStatistics is a precompiled, multi-threaded Java package that adds statistics useful for variant filtering to ANNOVAR files. There are three ways of installing AdditionalBAMStatistics.

Recommendation - SNP database

AdditionalBAMStatistics can use a SNP database to mark reads with too many mismatches that are not reported as known SNPs. This statistic is especially informative in the context of cross-species contamination or false-positive variants that derive from highly homologous regions.

The following command downloads a database for common SNPs for hg19/GRCh37:

wget -q ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz
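
Because wget runs quietly with -q, an interrupted download can go unnoticed. The following is a minimal sketch of an integrity check, assuming gzip is installed; gzip_ok is a hypothetical helper name and not part of this package.

```shell
# gzip_ok is a hypothetical helper: it returns 0 only if the file exists
# and passes gzip's built-in integrity test (gzip -t).
gzip_ok() {
    [ -f "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage (file name taken from the wget command above):
# gzip_ok common_all_20180423.vcf.gz || echo "download incomplete or corrupt" >&2
```

The check only confirms that the archive decompresses cleanly; it does not validate the VCF content.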

Necessary - Indexed reference genome

Please make sure that the reference genome is indexed by SAMtools. If not, please run the following command:

samtools faidx <reference_FASTA_file>
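
samtools faidx writes its index next to the FASTA file as <reference_FASTA_file>.fai. The sketch below indexes the reference only when that file is missing or empty; ensure_faidx is a hypothetical helper name, and the path in the usage line is a placeholder.

```shell
# ensure_faidx is a hypothetical helper: it runs "samtools faidx" only when
# no non-empty .fai file sits next to the reference FASTA.
ensure_faidx() {
    ref=$1
    if [ ! -f "$ref" ]; then
        echo "reference not found: $ref" >&2
        return 1
    fi
    if [ -s "$ref.fai" ]; then
        echo "index present: $ref.fai"
    else
        samtools faidx "$ref"
    fi
}

# Usage (substitute your own path):
# ensure_faidx /path/to/reference.fa
```

The helper does not detect a stale index, for example one built before the FASTA file was last modified.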

1. The easiest way - Singularity container

Requirements

Singularity
Indexed reference FASTA file

Run information

The Java package has been incorporated into a Singularity container available from Singularity Hub. If Singularity is installed, simply run:

singularity pull shub://MathijsSanders/AdditionalBAMStatisticsSingularity

Note that the reference genome must still be indexed!

The container includes:

AdditionalBAMStatistics
SAMtools

The following parameters are available:

Parameter	Description
-a/--annovarfile*	The ANNOVAR file to be further annotated.
-b/--bamfile*	The corresponding BAM file of the sample of interest.
-r/--reference*	The indexed reference FASTA file used for alignment.
-o/--output-file	Output file for writing the results (Default: standard out).
-s/--snp-database	SNP database for annotating reads with too many mismatches not reported as SNPs (Either vcf or vcf.gz).
-m/--max-non-snp	The maximum number of mismatches not reported as SNPs before a read is marked as having too many mutations (Default: 2).
-d/--diff-alignment-score	The difference between the current and alternative alignment score before a read is considered multi-mappable (Default: 5).
-t/--threads	Number of threads to use (Default: 1).
-c/--current-heapsize	The maximum heap size Java can use (Default: 10G). Increase this threshold if a larger SNP database is used.
-h/--help	Help information.
*	Required.
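
As an illustration of how the required parameters fit together, the sketch below prints (it does not run) one possible container invocation. It rests on assumptions the text above does not confirm: abs.sif is a hypothetical name for the pulled image, the image's runscript is assumed to accept the parameters from the table, and all file names are placeholders.

```shell
# Hypothetical helper: prints a possible container invocation without
# running it.
# ASSUMPTIONS: the image name and the use of "singularity run" with these
# short options are not confirmed by this README; only the option letters
# themselves come from the parameter table.
abs_container_command() {
    image=$1 annovar=$2 bam=$3 reference=$4 output=$5
    printf '%s\n' "singularity run $image -a $annovar -b $bam -r $reference -o $output"
}

abs_container_command abs.sif sample.annovar.txt sample.bam reference.fa sample.annotated.txt
```

Printing the command first makes it easy to inspect before running it. Paths containing spaces would need quoting.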

2. The easy way - Use precompiled JAR file

Requirements

JAVA JDK 11+
SAMtools
Indexed reference FASTA file
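
A quick way to confirm that the required tools are on the PATH is sketched below. require_tools is a hypothetical helper name; it only checks that the commands exist and does not verify that the Java version is 11 or later.

```shell
# require_tools is a hypothetical helper: it reports every named command
# that is not on PATH and returns non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# require_tools java samtools && java -version
```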

Run information

Simply run the following command to download the repository:

git clone https://github.com/MathijsSanders/AdditionalBAMStatistics.git

Run the following command to start annotating the ANNOVAR file:

java -Xmx10G -jar additionalBamStatistics.jar --input-annovar-file <annovar_file> --input-bam-file <bam_file> --reference <reference_file> --output-file <output_file> --snp-database <snp_database> --max-non-snp <max_non_snp> --difference-alignment-scores <diff_scores> --threads <threads> --help --version

Parameter	Description
--input-annovar-file*	The ANNOVAR file to be further annotated.
--input-bam-file*	The corresponding BAM file of the sample of interest.
--reference*	The indexed reference FASTA file used for alignment.
--output-file	Output file for writing the results (Default: standard out).
--snp-database	SNP database for annotating reads with too many mismatches not reported as SNPs (Either vcf or vcf.gz)
--max-non-snp	The maximum number of mismatches not reported as SNPs before a read is marked as having too many mutations (Default: 2).
--difference-alignment-score	The difference between the current and alternative alignment score before a read is considered multi-mappable (Default: 5).
--threads	Number of threads to use (Default: 1).
--help	Help information.
--version	Version information.
*	Required.
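
For a first run, the required parameters plus an output file and a thread count are enough. The sketch below prints (it does not run) such a command; abs_jar_command is a hypothetical helper name and all file names are placeholders, while the option names are taken from the table above.

```shell
# Hypothetical helper: prints a JAR invocation with the three required
# parameters, an output file, and a thread count (default 1).
abs_jar_command() {
    annovar=$1 bam=$2 reference=$3 output=$4 threads=${5:-1}
    printf '%s\n' "java -Xmx10G -jar additionalBamStatistics.jar --input-annovar-file $annovar --input-bam-file $bam --reference $reference --output-file $output --threads $threads"
}

abs_jar_command sample.annovar.txt sample.bam reference.fa sample.annotated.txt 4
```

The -Xmx10G value matches the heap size used in the full command above; a larger SNP database may need more. Paths containing spaces would need quoting.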

3. The difficult way - Compile package

Requirements

Maven version 3+ (For compiling only).
Java JDK 11+

Run information

The precompiled JAR file is included with the repository, but in case the package needs to be recompiled, please run:

mvn package clean

Once the JAR file is compiled, please follow the JAR-specific instructions listed under point 2.
