Description
Hi, I am using WHAMG to perform CNV analysis on individuals of a population.
I ran the following command for each individual:
whamg -f BAM -a GRCh38.fa -e ${list} -x 10 2> err_file > vcf
After calling each individual, I applied the following filters:
- Remove CNVs shorter than 50 bp or longer than 1 Mb.
- Remove DELs whose "TF" is not greater than 3.
- Remove DUPs whose "U" is not greater than 3.
- Remove SVs with max(CW) < 0.2.
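For reference, the filters above can be sketched roughly as follows. This is only an illustration, not my exact script: it assumes the WHAMG INFO column carries SVTYPE, SVLEN, TF, U and a comma-separated CW list, and the helper names `parse_info` and `passes_filters` are made up for this example.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> raw string value mapping."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filters(info: str) -> bool:
    """Return True if a record survives the four filters listed above."""
    f = parse_info(info)
    svtype = f.get("SVTYPE", "")

    # Assumption: SVLEN is an integer and may be negative for deletions,
    # so the size filter compares the absolute length.
    svlen = abs(int(f.get("SVLEN", "0").split(",")[0]))
    if svlen < 50 or svlen > 1_000_000:
        return False

    # DELs need TF > 3, DUPs need U > 3.
    if svtype == "DEL" and int(f.get("TF", "0")) <= 3:
        return False
    if svtype == "DUP" and int(f.get("U", "0")) <= 3:
        return False

    # Drop records whose largest CW weight is below 0.2.
    cw = [float(x) for x in f.get("CW", "0").split(",")]
    if max(cw) < 0.2:
        return False
    return True
```

A record such as `SVTYPE=DUP;SVLEN=500;U=5;CW=0.1,0.6,0.2,0.0,0.1` is kept, while the same record with `U=3` is dropped.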
After filtering, I found that about 60 individuals (~3%) have far more DUPs than the rest. The other individuals average 280 DUPs each, but these ~60 samples have between 1,000 and 410,000 DUP calls each. Further inspection of the DUPs in these 60 individuals revealed that:
- In the VCF, for the DUPs that these individuals do not share with the others, the DUP and INV values in CW are almost identical, and both are less than or equal to 0.5.
- These unshared DUPs are distributed across almost every region of the genome.
- Viewing the BAMs of these individuals in IGV shows that they contain more split reads and discordant reads.
- With LUMPY, the number of DUPs called in these individuals is not abnormal.
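The CW observation in the first point can be checked with something like the sketch below. It assumes the CW values are ordered DEL, DUP, INV, INS, BND (please correct me if that is wrong), and the tolerance `tol` and the helper names are arbitrary choices for illustration.

```python
from typing import Dict

# Assumed order of the CW values in the WHAMG INFO field.
CW_ORDER = ("DEL", "DUP", "INV", "INS", "BND")


def cw_weights(info: str) -> Dict[str, float]:
    """Map each SV type to its CW weight; empty dict if CW is absent."""
    for item in info.split(";"):
        if item.startswith("CW="):
            values = [float(x) for x in item[3:].split(",")]
            return dict(zip(CW_ORDER, values))
    return {}


def is_ambiguous_dup(info: str, tol: float = 0.05, cap: float = 0.5) -> bool:
    """True if the DUP and INV weights are within tol and both <= cap."""
    w = cw_weights(info)
    if "DUP" not in w or "INV" not in w:
        return False
    close = abs(w["DUP"] - w["INV"]) <= tol
    return close and max(w["DUP"], w["INV"]) <= cap
```

Counting how many DUP records per sample satisfy `is_ambiguous_dup` gives a quick way to see whether the excess calls in the ~60 samples are mostly of this near-tie kind.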
Many previous articles that used WHAMG do not report this in their call sets, especially an order-of-magnitude difference in the number of SVs. Do you think this is more likely caused by a problem with how I ran the calling, or by characteristics of these BAMs themselves?