Skip to content

andersen-lab/covar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

94 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

coVar

coVar is a tool for detecting physically-linked mutations in genomic data. Given a sorted, indexed BAM file, reference genome and gene annotation, coVar identifies and counts sequencing reads with unique physically linked mutations.

Installation

Currently, to install coVar, you need to have cargo installed.

Install from crates.io (recommended)

cargo install covar
covar --version

Local build from source (experimental)

git clone https://github.com/andersen-lab/covar.git
cd covar
cargo install --path .
covar --version

Usage

covar --input <INPUT_BAM> --reference <REFERENCE_FASTA> --annotation <ANNOTATION_GFF>

Required arguments

Flag Description
-i, --input <INPUT_BAM> Input BAM file (must be primer trimmed, sorted, and indexed).
-r, --reference <REFERENCE_FASTA> Reference genome in FASTA format.
-a, --annotation <ANNOTATION_GFF> Annotation GFF3 file for translating nucleotide to amino acid mutations.

Optional arguments

Flag Default Description
-o, --output <OUTPUT> stdout Output file path. If not provided, results will be printed to stdout.
-s, --start_site <START> 0 Genomic start position for variant calling.
-e, --end_site <END> reference length Genomic end position for variant calling. Defaults to the length of the reference genome.
-d, --min_depth <DEPTH> 1 Minimum coverage depth for a mutation cluster to be considered.
-f, --min_frequency <FREQ> 0.001 Minimum mutation frequency (cluster depth / total depth).
-q, --min_quality <QUAL> 20 Minimum base quality score for variant calling.
-t, --threads <THREADS> 1 Number of threads to use for processing.

Example Commands

Basic run

covar \
  -i sample.bam \
  -r reference.fasta \
  -a annotation.gff3

Specify genomic region and output file

covar \
  -i sample.bam \
  -r reference.fasta \
  -a annotation.gff3 \
  -s 1000 \
  -e 5000 \
  -o output.tsv

Multi-threaded run with custom depth, quality and frequency thresholds

covar \
  -i sample.bam \
  -r reference.fasta \
  -a annotation.gff3 \
  -d 5 \
  -q 30 \
  -f 0.01 \
  -t 4

Output

The output is a tab-delimited file (.tsv) with the following columns:

Column Description
nt_mutations Nucleotide mutations for this cluster
aa_mutations Corresponding amino acid translations (where possible*)
cluster_depth Total number of read pairs with this cluster of mutations
total_depth Total number of reads spanning this cluster
frequency Mutation frequency (cluster depth / total depth)
coverage_start Maximum read start site for which this cluster was detected
coverage_end Minimum read end site for which this cluster was detected

*Note: Not all nucleotide mutations will have a corresponding amino acid mutations. For example, SNPs in codons that span reads or frameshift indels will be translated as 'Unknown' and 'NA', respectively.

About

No description, website, or topics provided.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages