LongPhase 1.6 Release Notes #46

Merged
merged 30 commits into from
Jan 5, 2024

Conversation

twolinin
Owner

@twolinin twolinin commented Jan 5, 2024

Summary

  1. Implement chromosome-level parallelization for the modcall and phase commands. Overall execution time drops by roughly 70% to 88% in the benchmarks below.
  2. Replace malloc with jemalloc.
  3. Remove and simplify unused parameters to improve memory usage.
  4. Adjust the weighting of low-quality variants in phasing.
  5. The VCF generated by modcall can be directly imported into IGV. Additionally, modcall can output all detected coordinates by using the --all parameter.

| phase (-t 24) | v1.5.2 (Time) | v1.5.2 (Memory) | v1.6 (Time) | v1.6 (Memory) |
|---|---|---|---|---|
| HG002 ONT R10.4.1 10x | 153s | 7.7G | 39s | 15.1G |
| HG002 ONT R10.4.1 20x | 444s | 8.2G | 53s | 15.6G |
| HG002 ONT R10.4.1 30x | 355s | 8.5G | 68s | 24.4G |
| HG002 ONT R10.4.1 40x | 908s | 8.8G | 217s | 26.6G |
| HG002 ONT R10.4.1 50x | 1043s | 9.2G | 262s | 22.2G |
| HG002 ONT R10.4.1 60x | 640s | 9.5G | 113s | 33.4G |

| modcall (-t 24) | v1.5.2 (Time) | v1.5.2 (Memory) | v1.6 (Time) | v1.6 (Memory) |
|---|---|---|---|---|
| HG002 ONT R10.4.1 10x | 322s | 11.0G | 93s | 22.2G |
| HG002 ONT R10.4.1 20x | 635s | 14.6G | 199s | 31.6G |
| HG002 ONT R10.4.1 30x | 746s | 18.2G | 125s | 48.1G |
| HG002 ONT R10.4.1 40x | 1308s | 21.5G | 292s | 55.8G |
| HG002 ONT R10.4.1 50x | 1570s | 25.0G | 317s | 68.8G |
| HG002 ONT R10.4.1 60x | 1454s | 28.4G | 248s | 84.0G |
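
The time reductions can be rechecked from the tables with a few lines of C++. This is only an illustrative sketch: `reductionPercent` and `printExamples` are hypothetical helpers, not part of LongPhase, and the inputs are the wall-clock seconds copied from the rows above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of LongPhase): percentage of run time saved
// when a job that took `before` seconds now takes `after` seconds.
double reductionPercent(double before, double after) {
    assert(before > 0.0);  // a zero or negative baseline makes no sense here
    return (before - after) / before * 100.0;
}

// Prints the reduction for three rows copied from the benchmark tables.
void printExamples() {
    std::printf("phase   20x: %.1f%%\n", reductionPercent(444, 53));
    std::printf("modcall 10x: %.1f%%\n", reductionPercent(322, 93));
    std::printf("modcall 20x: %.1f%%\n", reductionPercent(635, 199));
}
```

Calling `printExamples()` from `main` reports the phase 20x row at about 88% and the modcall 10x and 20x rows at about 71% and 69%.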

Changes

  1. Makefile Adjustments

    • Added the -fopenmp flag to CPPFLAGS to enable OpenMP support, which allows multi-threading in the C++ components.
  2. Modifications in ParsingBam.cpp, ParsingBam.h

    • Introduced an additional parameter int &numThreads in the function direct_detect_alleles. This change allows for dynamic allocation of threads based on the processing requirements, improving the handling of multi-threaded operations.
  3. Updates in Phasing.cpp

    • Changed the default value of the threads argument to 0, which means the program uses all available threads by default.
  4. Major Refactoring in PhasingProcess.cpp

    • Implemented a new function setNumThreads for intelligent distribution of threads between chromosome processing and BAM parsing, enhancing parallel processing efficiency.
    • Established a ChrPhasingResult map to handle phasing results in a thread-safe manner.
    • Merged individual chromosome phasing results into a single mergedPhasingResult, streamlining the result aggregation process.
  5. Adjustments to Modcall Output

    • Added the --all parameter to output all detected modifications in reads. Default: false.
    • For homozygous variants, only the MD, UD, and DP counts are recorded; the names of the reads covering the variant are not.
    • High-confidence heterozygous modifications will be recorded as PASS in the FILTER field.
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chr1    11027   .       C       N       .       .       RS=P;   GT:MD:UD:DP     1/1:5:5:16
chr1    11028   .       G       N       .       .       RS=N;   GT:MD:UD:DP     0/0:4:8:12
chr1    11083   .       G       N       .       .       RS=N;   GT:MD:UD:DP     0/1:4:6:12
chr1    11434   .       C       N       .       PASS    RS=P;MR=eb459876-8c81-4714-a496-a90ea8be94d2,6ca3a71f-62fd-416e-8c6e-8c4a9c054e1a,0d8b7c68-d98d-4045-a572-82fedac62da5,71b5dfe9-7cd1-4959-b634-b9d162468edb,5db6fcd3-5780-494b-bfdd-4d9d7282d012,6e205f04-560a-4975-932d-dbd60ead695d,ae2238f8-b622-4cb5-8f02-4c3f54ab8ca3,d7ef87d0-bffe-404d-9f10-62eceb0c5121;NR=8249c7c7-fd04-4a4c-985d-2bbcb2030bc4,e3b03dc0-8399-4e1e-bd0d-e5e4e3e2911a,b8dc7af8-9f78-4dac-b8cc-35e157c51621,0b2638c1-8380-48b4-b08c-a6161495ad9d,7b1e5c16-f0a6-47f6-9726-247c762a10ca,ac7ae685-082e-413e-b5a6-b4c12d49a1c2,c0e0c526-e193-4ee2-81d6-1fbe0c970dc1;  GT:MD:UD:DP     0/1:8:7:16
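
For downstream scripts, the per-sample values in records like those above can be read by pairing the FORMAT keys with the sample column. The following is a minimal sketch under that assumption; `splitOn` and `parseSample` are hypothetical names and are not part of LongPhase or htslib.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on `delim`. Hypothetical helper for this sketch only.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delim)) parts.push_back(item);
    return parts;
}

// Pairs each key of the FORMAT column (e.g. "GT:MD:UD:DP") with the value at
// the same position in the sample column (e.g. "0/1:8:7:16").
// Keys without a matching value are skipped.
std::map<std::string, std::string> parseSample(const std::string& format,
                                               const std::string& sample) {
    const std::vector<std::string> keys = splitOn(format, ':');
    const std::vector<std::string> values = splitOn(sample, ':');
    std::map<std::string, std::string> fields;
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
        fields[keys[i]] = values[i];
    }
    return fields;
}
```

For the last record above, `parseSample("GT:MD:UD:DP", "0/1:8:7:16")` yields GT = "0/1", MD = "8", UD = "7", and DP = "16" as strings; convert the counts with `std::stoi` if integers are needed.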
  1. Add a weight to edges that connect to low-quality bases

    • Edges that connect to a low-quality base (base-quality threshold: 12) now have a weight of 0.1 instead of 1. Accordingly, the data structures that count reads changed from int to float.
  2. MethFastaParser Utilizing New Structure:

    • Revised the storage structure for the reference FASTA to include chromosome length information, so that chromosomes are processed in numerical order (chr1, chr2, chr3) rather than lexicographical order (chr1, chr11, chr12).
    • This change not only eliminates the need to recalculate chromosome lengths but also enhances execution efficiency in a multithreaded environment.
  3. Modifications in MethBamParser:

    • Introduced an additional parameter int numThreads in the function detectMeth. This change allows for dynamic allocation of threads based on the processing requirements, improving the handling of multi-threaded operations.
  4. Thread Safety Measures:

    • Split the writeResultVCF function into two parts: exportResult and writeResultVCF.
    • exportResult: Handles the processing results for each chromosome, preparing data for VCF file writing.
    • writeResultVCF: Tasked with the actual writing of data into the VCF file, ensuring the integrity and sequentiality of output.
  5. Changes in ModCallProcess:

    • New function setModcallNumThreads: allocates threads between chromosome processing and BAM parsing tasks.
  6. Included jemalloc as a dependency in the build configuration.

  7. The phase command now displays the value of each parameter, and its output messages have been adjusted.

  8. In the phasingProcess, the storeResultPath() step has been removed, and phasing results are now directly recorded in edgeConnectResult().
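
The chromosome-level pattern described in these notes (process each chromosome independently, keep one result per chromosome, then merge or write in order from a single thread) can be sketched with OpenMP as follows. This is an assumption-laden illustration, not LongPhase's code: `processChromosome` and `runAllChromosomes` are hypothetical names, and a vector indexed by chromosome stands in for the per-chromosome result map.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the real per-chromosome work (phasing or
// modification calling); here it just returns two placeholder records.
std::vector<std::string> processChromosome(const std::string& chrName) {
    return {chrName + ":record1", chrName + ":record2"};
}

// Processes every chromosome, then merges the results in input order.
std::vector<std::string> runAllChromosomes(const std::vector<std::string>& chrNames) {
    // One result slot per chromosome. Each loop iteration writes only to its
    // own slot, so no lock is needed inside the parallel loop.
    std::vector<std::vector<std::string>> perChr(chrNames.size());

    // Built with -fopenmp, iterations are spread across threads; without the
    // flag the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < static_cast<int>(chrNames.size()); ++i) {
        perChr[i] = processChromosome(chrNames[i]);
    }

    // Serial merge: a single thread appends results in chromosome order, so
    // the output order does not depend on which thread finished first.
    std::vector<std::string> merged;
    for (const auto& records : perChr) {
        merged.insert(merged.end(), records.begin(), records.end());
    }
    return merged;
}
```

Keeping the write step outside the parallel region is what makes the output deterministic; the trade-off is that all per-chromosome results are held in memory until the merge.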
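
The difference between numerical and lexicographical chromosome order mentioned for MethFastaParser can be illustrated with a small comparator. This is one possible way to obtain numerical order and not necessarily how MethFastaParser does it; `chromosomeNumber`, `chromosomeLess`, and `sortChromosomes` are hypothetical names, and the "chr" prefix is an assumption about the reference naming.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the chromosome number for names like "chr11", or -1 when the part
// after "chr" is missing or not purely numeric (e.g. "chrX").
int chromosomeNumber(const std::string& name) {
    if (name.size() <= 3 || name.compare(0, 3, "chr") != 0) return -1;
    int value = 0;
    for (std::size_t i = 3; i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) return -1;
        value = value * 10 + (name[i] - '0');
    }
    return value;
}

// Orders numbered chromosomes by number; all other names follow, sorted as text.
bool chromosomeLess(const std::string& a, const std::string& b) {
    const int na = chromosomeNumber(a);
    const int nb = chromosomeNumber(b);
    if (na >= 0 && nb >= 0) return na < nb;
    if (na >= 0) return true;   // numbered names come before unnumbered ones
    if (nb >= 0) return false;
    return a < b;               // e.g. chrM < chrX < chrY
}

// Returns a copy of `names` sorted with chromosomeLess.
std::vector<std::string> sortChromosomes(std::vector<std::string> names) {
    std::sort(names.begin(), names.end(), chromosomeLess);
    return names;
}
```

With this comparator, the input chr11, chr2, chrX, chr1 sorts to chr1, chr2, chr11, chrX, whereas a plain string sort would place chr11 before chr2.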

twolinin and others added 29 commits November 29, 2023 09:15
Add .gitignore and Remove Temporary Build Files from htslib
Adjustments to multithread settings and additional comments.
Modcall output all modification
Add the weight to the edge which connect to the low quality base
Remove redundant parameters. Remove storeResultPath(). Output phasing…
@twolinin twolinin changed the title Develop LongPhase 1.6 Release Notes Jan 5, 2024
4 participants