Description
Hi there,
I'm currently testing out metaDMG-cpp to replace my earlier workflows using metaDMG core. When I ran metaDMG core, I got the expected results — the damage vs. significance plots closely resembled those in the original bioRxiv publication, which was great.
I'm now running metaDMG-cpp to get both damage estimates and LCA taxonomic information. Here's the loop I'm using (based on what I understood from the GitHub instructions):
for sample in $samples; do
  bname=$(basename $sample .merged.sam.gz)
  ~/software/metaDMG-cpp/metaDMG-cpp lca --threads $threads --bam $sample --names $nam --nodes $nod --acc2tax $acc --sim_score_low 0.95 --sim_score_high 1.0 --weight_type 1 --fix_ncbi 0 --out_prefix "$run"_"$bname"_lca
  ~/software/metaDMG-cpp/metaDMG-cpp dfit "$run"_"$bname"_lca.bdamage.gz --names $nam --nodes $nod --showfits 0
  ~/software/metaDMG-cpp/metaDMG-cpp aggregate "$run"_"$bname"_lca.bdamage.gz --lcastat "$run"_"$bname"_lca.stat.gz --names $nam --nodes $nod --dfit "$run"_"$bname"_lca.bdamage.gz.dfit.gz --out "$run"_"$bname"
done
Issue 1: Confusion between getdamage and lca .bdamage.gz outputs
I also tried running:
~/software/metaDMG-cpp/metaDMG-cpp getdamage --threads $threads --bam $sample --out_prefix "$run"_"$bname"_damage
This also produces a .bdamage.gz file, but it's very different from the one generated by lca. The getdamage output contains only a single line of data, whereas the lca-based .bdamage.gz has 100,000+ entries.
Could you clarify the intended difference between these two outputs? Should they both be used together, or is getdamage a legacy alternative to lca?
Issue 2: No Dfit or damage values appearing / nonsensical values
I'm trying to extract and visualize damage and Dfit values from the output. Since metaDMG-cpp doesn’t seem to explicitly return Dfit, I calculated it myself as Dfit = Zfit * sigmaD. That approximation aligns well with values in the metaDMG core output (small rounding errors of ≤ 3e-8). However, in the metaDMG-cpp output:
Most taxa have Dfit = 0
Some have astronomically large Zfit values (e.g., Zfit = 45,000), and Dfit = 1
These values don’t look correct, and they don't resemble the cleaner distributions from metaDMG core.
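The Dfit = Zfit * sigmaD calculation described above can be sketched in shell as follows. This is only an illustration, not part of metaDMG-cpp: the helper name `add_dfit` is hypothetical, and it assumes a decompressed, tab-separated table with a header row whose column names are literally `Zfit` and `sigmaD` (adjust these to whatever the real output uses).

```shell
# Hypothetical helper: append a Dfit column computed as Zfit * sigmaD.
# Assumes a tab-separated text file with a header row containing
# columns named "Zfit" and "sigmaD" (assumed names, not confirmed).
add_dfit() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Locate the two input columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "Zfit")   z = i
        if ($i == "sigmaD") s = i
      }
      if (!z || !s) {
        print "missing Zfit or sigmaD column" > "/dev/stderr"
        exit 1
      }
      print $0, "Dfit"
      next
    }
    # Data rows: keep the row and append the product with 10 significant digits.
    { print $0, sprintf("%.10g", $z * $s) }
  ' "$1"
}
```

It would be called as `add_dfit table.tsv > table_with_dfit.tsv`, where `table.tsv` is a placeholder file name. Looking the columns up by header name rather than by position keeps the sketch independent of the exact column order.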
Questions:
- What’s the functional difference between .bdamage.gz from getdamage and from lca? Which should I input into the dfit and aggregate functions?
- Should the .bdamage.gz from lca already contain the correct fields to calculate Dfit, or am I missing a key step (e.g., different parameters or postprocessing)?
- Why are the Dfit and Zfit values so different from my metaDMG-core results, and so extreme?
I'd be very grateful if someone could show me what I'm missing here. This tool is super promising and I’d love to get it working as expected!
Best,
Libby
Here's an example of the damage vs. significance output from metaDMG core that we were very happy with:

And a subset of those run with metaDMG-cpp that doesn't look right:
