Releases: twolinin/longphase
v1.7.3
v1.7.2
Summary
`haplotag` now adds the `@PG` tag to the header and adds the `--cram` option to output CRAM format.
Currently, `phase` combines different alignments of the same read into a single alignment.
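A minimal sketch of a `haplotag` call with CRAM output. Only `--cram` comes from the note above; the other options and all file names are assumptions that mirror the `phase` examples further down, so check the tool's usage message before relying on them.

```shell
# Assemble the haplotag call; file names and the output prefix are placeholders,
# and every option except --cram is assumed rather than taken from this release note.
haplotag_cmd='longphase haplotag -r reference.fasta -s phased_snp.vcf -b alignment.bam -t 8 -o tagged_prefix --cram'

if command -v longphase >/dev/null 2>&1 && [ -f alignment.bam ]; then
    $haplotag_cmd            # run for real when the tool and input are available
else
    echo "$haplotag_cmd"     # otherwise only show what would be run
fi
```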
v1.7.1
Summary
If the INFO field of an SV variant does not contain RNAMES, phasing will not be performed for this variant.
Haplotag now includes documentation on tagging specific regions.
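To see in advance how many SV records would be skipped under the RNAMES rule, the INFO column of the SV VCF can be inspected directly. The sketch below is not part of longphase; it builds a tiny made-up VCF for illustration and counts records whose INFO field has no `RNAMES=` key.

```shell
# Count non-header records whose INFO column (field 8) lacks an RNAMES key.
count_missing_rnames() {
    awk -F'\t' '!/^#/ && $8 !~ /(^|;)RNAMES=/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical two-record SV VCF, created only for this example.
tmp_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmp_vcf"
printf 'chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;RNAMES=read1,read2\n' >> "$tmp_vcf"
printf 'chr1\t5000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS\n' >> "$tmp_vcf"

missing=$(count_missing_rnames "$tmp_vcf")
echo "SV records without RNAMES: $missing"
rm -f "$tmp_vcf"
```

On a real file, call `count_missing_rnames` with the path of the SV VCF that will be passed to phasing.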
v1.7
Summary
Merge different alignments of a read to improve phasing integrity, and adjust parameter weights to enhance phasing accuracy. Allow the use of a phased modification VCF to increase the proportion of tagged reads. Address some known issues.
phase (-t 24) | v1.6 SW | v1.6 #Block | v1.6 Block N50 | v1.7 SW | v1.7 #Block | v1.7 Block N50 |
---|---|---|---|---|---|---|
HG002 ONT R10.4.1 10x | 1,137 | 7,212 | 774,928 | 1,117 | 7,100 | 807,877 |
HG002 ONT R10.4.1 20x | 1,225 | 4,257 | 1,560,226 | 1,218 | 4,132 | 1,654,808 |
HG002 ONT R10.4.1 30x | 1,194 | 3,499 | 1,903,900 | 1,180 | 3,372 | 2,042,500 |
HG002 ONT R10.4.1 40x | 1,211 | 3,045 | 2,177,620 | 1,200 | 2,915 | 2,332,195 |
HG002 ONT R10.4.1 50x | 1,216 | 2,797 | 2,470,461 | 1,213 | 2,679 | 2,606,645 |
HG002 ONT R10.4.1 60x | 1,197 | 2,627 | 2,587,166 | 1,195 | 2,513 | 2,830,210 |
SW: Switch Error
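For readers unfamiliar with the Block N50 column, the sketch below computes an N50 from a list of block lengths using the usual definition: the length at which blocks of that size or larger cover at least half of the total. Whether the benchmark used exactly this definition is an assumption, and the input lengths are made up.

```shell
# N50 of a set of block lengths (one length per line on stdin).
block_n50() {
    sort -rn | awk '{ len[NR] = $1; total += $1 }
        END {
            # Walk from the longest block down until half the total is covered.
            for (i = 1; i <= NR; i++) {
                acc += len[i]
                if (acc * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Made-up block lengths, for illustration only.
n50=$(printf '%s\n' 100 400 300 200 | block_n50)
echo "Block N50: $n50"
```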
v1.6
Summary
- Implement chromosome-level parallelization for the `modcall` and `phase` commands. The overall execution time is reduced by 71% to 88%.
- Replace `malloc` with `jemalloc`.
- Remove and simplify unused parameters to improve memory usage.
- Adjust the weighting of low-quality variants in phasing.
- The VCF generated by `modcall` can be directly imported into IGV. Additionally, `modcall` can output all detected coordinates by using the `--all` parameter.
phase (-t 24) | v1.5.2 (Time) | v1.5.2 (Memory) | v1.6 (Time) | v1.6 (Memory) |
---|---|---|---|---|
HG002 ONT R10.4.1 10x | 153s | 7.7G | 39s | 15.1G |
HG002 ONT R10.4.1 20x | 444s | 8.2G | 53s | 15.6G |
HG002 ONT R10.4.1 30x | 355s | 8.5G | 68s | 24.4G |
HG002 ONT R10.4.1 40x | 908s | 8.8G | 217s | 26.6G |
HG002 ONT R10.4.1 50x | 1043s | 9.2G | 262s | 22.2G |
HG002 ONT R10.4.1 60x | 640s | 9.5G | 113s | 33.4G |
modcall (-t 24) | v1.5.2 (Time) | v1.5.2 (Memory) | v1.6 (Time) | v1.6 (Memory) |
---|---|---|---|---|
HG002 ONT R10.4.1 10x | 322s | 11.0G | 93s | 22.2G |
HG002 ONT R10.4.1 20x | 635s | 14.6G | 199s | 31.6G |
HG002 ONT R10.4.1 30x | 746s | 18.2G | 125s | 48.1G |
HG002 ONT R10.4.1 40x | 1308s | 21.5G | 292s | 55.8G |
HG002 ONT R10.4.1 50x | 1570s | 25.0G | 317s | 68.8G |
HG002 ONT R10.4.1 60x | 1454s | 28.4G | 248s | 84.0G |
*If the device is running low on memory, you can control memory usage by reducing the number of threads (-t).
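The speedup in the tables above can be recomputed for any row as (old - new) / old. A small sketch with awk, using the 20x `phase` row (444s in v1.5.2, 53s in v1.6); the function name is only for illustration.

```shell
# Percentage reduction in run time: (old - new) / old * 100.
time_reduction() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# phase, HG002 ONT R10.4.1 20x: 444s (v1.5.2) versus 53s (v1.6)
reduction_20x=$(time_reduction 444 53)
echo "phase 20x: run time reduced by ${reduction_20x}%"
```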
v1.5.2
v1.5.1
Release note of v1.5.1
There are two major updates in this release.
- Multi-BAM input is now supported. Multiple BAM files can be specified by repeating the `-b` option:
longphase phase \
-s SNP.vcf \
-b alignment1.bam \
-b alignment2.bam \
-r reference.fasta \
-t 8 \
-o phased_prefix \
--ont # or --pb for PacBio HiFi
- A beta version of SNP and modification co-phasing is now released. Feedback is very welcome. Allele-specific modifications (5mC only at the moment) can be detected with the new `modcall` command, assuming the MM/ML tags are carried over to the aligned BAM:
longphase modcall \
-b alignment.bam \
-r reference.fasta \
-t 8 \
-o modcall
SNP and modification co-phasing can then be invoked by passing the modcall-generated VCF via `--mod-file`. Co-phasing SNPs and modifications can further improve phasing contiguity.
longphase phase \
-s SNP.vcf \
--mod-file modcall.vcf \
-b alignment.bam \
-r reference.fasta \
-t 8 \
-o phased_prefix \
--ont # or --pb for PacBio HiFi
v1.5
v1.4
Major change
This version improves phasing accuracy, measured by both switch error and Hamming distance. The underlying graph now improves phasing integrity by using multiple edges instead of second-stage block phasing. Haplotagged reads are also used in the implementation to improve phasing accuracy.
The phasing accuracy is improved on ONT R9.4.1 data at different sequencing depths.
v1.3
Major change
This version mainly improves phasing accuracy, especially at low coverage (10-20x). The underlying graph model now creates and considers multiple edges from local/flanking SNPs during phasing (see below), which increases phasing accuracy and robustness in regions dense with ONT errors.
As a result, the running time increases by ~20% on average (2.5-8 minutes for 10-60x with 8 cores on SSD). The block N50 becomes larger at 10-30x and slightly smaller at 50-60x compared with the previous version. The phasing accuracy (SW: switch error rate) improves at all coverage levels.
Minor change
- `haplotag` now writes the Phred-scaled phasing quality of each read in the tagged BAM (e.g., PQ:i:40), as discussed in #19.
- The program now prompts the user if the reference genome is missing (#18).
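If the PQ tag follows the usual Phred convention, a value Q corresponds to an error probability of 10^(-Q/10). The sketch below extracts PQ from a single SAM record and converts it. The record is made up, and the tag layout is assumed from the `PQ:i:40` example above.

```shell
# Convert a Phred-scaled quality Q to a probability: P = 10^(-Q/10).
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# One made-up, tab-separated SAM record carrying a PQ tag.
sam_line=$(printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tHP:i:1\tPQ:i:40')

# Split the record into fields and keep the value of the PQ:i: tag.
pq=$(printf '%s\n' "$sam_line" | tr '\t' '\n' | sed -n 's/^PQ:i://p')
prob=$(phred_to_prob "$pq")
echo "PQ=$pq -> probability $prob"
```

For a real tagged BAM, the same `tr`/`sed` step can be applied to each line of the SAM text produced by a BAM viewer.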