TT-Mars: Structural Variants Assessment Based on Haplotype-resolved Assemblies.
- Clone TT-Mars from GitHub and `cd TT-Mars`.
- Create an environment and activate it: `conda create -n ttmars` and `conda activate ttmars`.
- Run `dowaload_files.sh` to download the required files to `./ttmars_files`.
- Run `download_asm.sh` to download assembly files of 10 samples from HGSVC.
- Install packages: `conda install -c bioconda pysam`, `conda install -c anaconda numpy`, `conda install -c bioconda mappy`, `conda install -c conda-forge biopython`, `conda install -c bioconda pybedtools`.
- Run TT-Mars with the following steps. `run_ttmars.sh` includes more instructions; after setup, users can run it to run TT-Mars.
The main program:
`python ttmars.py output_dir centro_file files_dir/assem1_non_cov_regions.bed files_dir/assem2_non_cov_regions.bed vcf_file reference asm_h1 asm_h2 files_dir/lo_pos_assem1_result_compressed.bed files_dir/lo_pos_assem2_result_compressed.bed files_dir/lo_pos_assem1_0_result_compressed.bed files_dir/lo_pos_assem2_0_result_compressed.bed tr_file`
Script to combine results and output:
`python combine.py output_dir num_X_chr`
- `output_dir`: output directory.
- `centro_file`: provided centromere file.
- `tr_file`: provided tandem repeats file.
- `vcf_file`: callset file `callset.vcf(.gz)`.
- `reference`: reference file `reference_genome.fasta`.
- `asm_h1/2`: assembly files `assembly1/2.fa`, can be downloaded by `download_asm.sh`.
- `assem1_non_cov_regions.bed`, `assem2_non_cov_regions.bed`, `lo_pos_assem1_result_compressed.bed`, `lo_pos_assem2_result_compressed.bed`, `lo_pos_assem1_0_result_compressed.bed`, `lo_pos_assem2_0_result_compressed.bed`: required files, downloaded to `./ttmars_files`.
- `num_X_chr`: 1 for a male sample, 2 for a female sample.
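The two commands above can also be assembled programmatically, which makes the positional order harder to get wrong. The sketch below is not part of TT-Mars: `build_ttmars_cmd`, `build_combine_cmd`, `missing_inputs`, and `run_ttmars` are hypothetical helper names, and it assumes the scripts are invoked as `python ttmars.py` and `python combine.py` from the repository root, with the required BED files sitting directly under `files_dir`.

```python
import subprocess
from pathlib import Path


def build_ttmars_cmd(output_dir, files_dir, centro_file, tr_file,
                     vcf_file, reference, asm_h1, asm_h2):
    """Return the ttmars.py argument list in the positional order shown above."""
    files = Path(files_dir)  # assumption: BED files live directly in files_dir
    return [
        "python", "ttmars.py",
        str(output_dir),
        str(centro_file),
        str(files / "assem1_non_cov_regions.bed"),
        str(files / "assem2_non_cov_regions.bed"),
        str(vcf_file),
        str(reference),
        str(asm_h1),
        str(asm_h2),
        str(files / "lo_pos_assem1_result_compressed.bed"),
        str(files / "lo_pos_assem2_result_compressed.bed"),
        str(files / "lo_pos_assem1_0_result_compressed.bed"),
        str(files / "lo_pos_assem2_0_result_compressed.bed"),
        str(tr_file),
    ]


def build_combine_cmd(output_dir, num_x_chr):
    """Return the combine.py argument list; num_x_chr is 1 (male) or 2 (female)."""
    if num_x_chr not in (1, 2):
        raise ValueError("num_X_chr must be 1 (male) or 2 (female)")
    return ["python", "combine.py", str(output_dir), str(num_x_chr)]


def missing_inputs(cmd):
    """Return the input paths in a ttmars.py command that are not existing files.

    cmd[0:2] is the interpreter and script and cmd[2] is output_dir,
    so only cmd[3:] is checked.
    """
    return [arg for arg in cmd[3:] if not Path(arg).is_file()]


def run_ttmars(num_x_chr, **paths):
    """Run ttmars.py and then combine.py, refusing to start if inputs are missing."""
    cmd = build_ttmars_cmd(**paths)
    missing = missing_inputs(cmd)
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    subprocess.run(cmd, check=True)
    subprocess.run(build_combine_cmd(paths["output_dir"], num_x_chr), check=True)
```

Checking the inputs first is a design choice for this sketch only: it reports an incomplete download before any work starts, rather than relying on however `ttmars.py` itself reacts to a missing file.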
- `ttmars.py`:
  - `-n/--not_hg38`: use if the reference is NOT hg38 (i.e., hg19).
  - `-p/--passonly`: consider PASS calls only.
  - `-s/--seq_resolved`: consider sequence-resolved calls (INS).
  - `-w/--wrong_len`: count wrong-length calls as True.
  - `-g/--gt_vali`: conduct genotype validation.
- `combine.py`:
  - `-v/--vcf_out`: output results as VCF files (tp (true positive), fp (false positive) and na); must be used together with `-f/--vcf_file`.
  - `-f VCF_FILE/--vcf_file VCF_FILE`: input VCF file, used as template.
  - `-g/--gt_vali`: conduct genotype validation.
  - `-n/--false_neg`: output recall; must be used together with `-t/--truth_file` and `-f/--vcf_file`.
  - `-t/--truth_file`: input truth VCF file; must be used together with `-n/--false_neg`.
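The `combine.py` options depend on each other, so it can be useful to check a combination before launching a run. The following is a minimal sketch that only mirrors the rules stated above; `check_combine_flags` is a hypothetical helper, and it makes no claim about how `combine.py` itself parses or validates its arguments.

```python
def check_combine_flags(vcf_out=False, vcf_file=None,
                        false_neg=False, truth_file=None):
    """Return a list of problems with a combine.py option combination.

    An empty list means the combination satisfies the rules in this README.
    """
    problems = []
    if vcf_out and vcf_file is None:
        problems.append("-v/--vcf_out requires -f/--vcf_file")
    if false_neg and truth_file is None:
        problems.append("-n/--false_neg requires -t/--truth_file")
    if false_neg and vcf_file is None:
        problems.append("-n/--false_neg requires -f/--vcf_file")
    if truth_file is not None and not false_neg:
        problems.append("-t/--truth_file requires -n/--false_neg")
    return problems


if __name__ == "__main__":
    # Recall output with a truth set but no template VCF: one rule is violated.
    for problem in check_combine_flags(false_neg=True, truth_file="truth.vcf"):
        print(problem)
```

The file name `truth.vcf` in the usage lines is a placeholder.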
`ttmars_combined_res.txt`:
chr | start | end | type | relative length | relative score | validation result | genotype match |
---|---|---|---|---|---|---|---|
chr1 | 893792 | 893827 | DEL | 1.03 | 3.18 | True | True |
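A results file with these columns can be summarized in a few lines of Python. This is a sketch under assumptions that the README does not confirm: that `ttmars_combined_res.txt` is tab-separated, that its columns appear in the order of the table above, and that the genotype match column may be absent when genotype validation is not requested. `SVResult`, `parse_results`, and `fraction_validated` are hypothetical names.

```python
import csv
from collections import namedtuple

SVResult = namedtuple(
    "SVResult",
    "chrom start end sv_type rel_len rel_score valid gt_match",
)


def parse_results(lines, delimiter="\t"):
    """Parse rows laid out like the table above into SVResult tuples.

    Assumes tab-separated fields in table order (not confirmed by the README).
    """
    results = []
    for row in csv.reader(lines, delimiter=delimiter):
        row = [field.strip() for field in row]
        # Skip blank lines, short rows, and a header row if one is present.
        if len(row) < 7 or not row[1].isdigit():
            continue
        results.append(SVResult(
            row[0], int(row[1]), int(row[2]), row[3],
            float(row[4]), float(row[5]),
            row[6] == "True",
            # Genotype match is treated as optional (assumption).
            (row[7] == "True") if len(row) > 7 else None,
        ))
    return results


def fraction_validated(results):
    """Fraction of calls whose validation result is True."""
    if not results:
        return 0.0
    return sum(r.valid for r in results) / len(results)
```

For example, `fraction_validated(parse_results(open(path)))` would give the share of validated calls, where `path` points to the combined results file; its exact location is not stated in this README.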