Final assembly too small #80

NickJeff13 · 2023-12-07T13:13:16Z

Hello,
I ran WenganM using ~60X Illumina coverage and ~10X Nanopore coverage (I know higher depth than this is best, but it is what I have) for an expected 2Gb mollusc genome. Wengan finished in <24 hours with no errors that I can see, but the final assembly file, which I think is the SPolished.asm.wengan.fasta file, is only about 500Mb, so only 25% of the expected genome size.

Are there any parts of the log files I can look at to see why the assembly might be smaller than expected? The last few lines of the liger.log are pasted below if they are helpful.

Hits at edge level:
A total of 2 ctgs were selected for polishing from 112 canditates
HIT: cid=199594 eid=1213074 strand=0 qs=32 qe=684 rs=1920 re=2915 cnt=30 mlen=471 blen=1176 min_iden=0.800000
TOTALW=23 GOODW=17 BADW=6 CIGARW=MMMMMSMMMSSSMMMMMMMSSMM SW=63
TOTALW=23 GOODW=17 BADW=6 CIGARC=MMMMMSMMMSSSMMMMMMMSSMM SW=63 B=0 END=22
qs=0 qe=653 qc=100 ts=1919 te=2554 iden=0.870053
HIT: cid=1101547 eid=1213074 strand=1 qs=69 qe=364 rs=1655 re=2555 cnt=28 mlen=834 blen=6771 min_iden=0.800000
TOTALW=11 GOODW=11 BADW=0 CIGARW=MMMMMMMMMMM SW=76
TOTALW=11 GOODW=11 BADW=0 CIGARC=MMMMMMMMMMM SW=76 B=0 END=10
qs=0 qe=296 qc=100 ts=2290 te=2587 iden=0.929277
Time spent in polishing edges :52.4836 secs
Number of CC 1161897
HM_wengan.SPolished.asm.wengan.fasta file created

NickJeff13 · 2023-12-08T17:25:14Z

@adigenova, I meant to mention you in this, if you have time.

adigenova · 2023-12-08T21:17:27Z

Hi Nick,

Can you share the N50 of the short-read assembly? Did you run WenganM? Can you try WenganA or WenganD? A similar issue, #54, discusses other ideas, but in general this can happen when the short-read assembly is extremely fragmented (N50 < 500 bp).

Best
Alex

NickJeff13 · 2023-12-11T15:43:41Z

Hi @adigenova,

Yes I ran WenganM. I just tried WenganD but received the error make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1 - I will look into what this error means.

I am trying WenganA now and will report back.

NickJeff13 · 2023-12-14T13:22:53Z

Hello again,
WenganA performed better with my data, giving an assembly size of ~1.2Gb, but that is still quite small compared with my expected ~2Gb genome size, which was estimated with k-mers and flow cytometry. Is the assembly N50 in any of the log files from Wengan?

WenganD gives the error make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1 but I could not figure out how to solve this.
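
On the N50 question above: whether or not Wengan logs it, the assembly size and N50 can be computed directly from any FASTA file, including the intermediate short-read contigs. The following is a minimal sketch, not part of Wengan; the function names are hypothetical and the file path is whatever FASTA you want to inspect.

```python
import sys


def read_fasta_lengths(path):
    """Yield the length of each sequence in a FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50: the length of the sequence at which the running
    total of lengths, sorted longest first, reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python n50.py assembly.fasta
    lengths = list(read_fasta_lengths(sys.argv[1]))
    print("sequences:", len(lengths))
    print("total bp: ", sum(lengths))
    print("N50:      ", n50(lengths))
```

Running this on both the short-read contigs and the final SPolished FASTA would show whether the short-read assembly falls below the ~500 bp N50 mentioned earlier in the thread.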

NickJeff13 · 2023-12-29T19:53:35Z

@adigenova do you know if I can fix the WenganD issue? I cannot figure out how to rebuild so that wenganD works. Apologies if you are on holiday, and please do not rush if you are. WenganA worked better than any other assembly software I tried, so just want to see if D will work even better!

Thank you and happy new year.

Artifice120 · 2024-10-10T19:45:58Z

I'm not a developer, and this is very late, but:

Have you tried using jellyfish with genomescope to predict the genome size from the read k-mer histogram?

If the size prediction is low for the Nanopore reads specifically, the small assembly could be due to fragmentation in the Nanopore data. From what I understand, the long reads are used like scaffolds that verify the short-read contigs, so even if the short-read contigs are complete, the final assembly ends up limited by the contiguity of the long reads.

Again, not a developer.
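
As a rough cross-check of the k-mer approach suggested above, genome size can be approximated from a k-mer histogram as (total non-error k-mers) / (depth of the main coverage peak). The sketch below assumes a two-column text histogram of depth and count, as written by `jellyfish histo`; the function names are hypothetical. It is a naive single-peak estimate and does not model heterozygosity or repeats the way genomescope does, so in a heterozygous genome it may pick the heterozygous peak and overestimate the size.

```python
def parse_histogram(lines):
    """Parse 'depth count' lines into a list of (depth, count) sorted by depth."""
    hist = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        hist.append((int(parts[0]), int(parts[1])))
    return sorted(hist)


def estimate_genome_size(hist):
    """Return (estimated_size, peak_depth) from a k-mer histogram.

    Low-depth k-mers are treated as sequencing errors: everything before
    the first point where the counts start rising again is discarded.
    """
    for i in range(1, len(hist)):
        if hist[i][1] > hist[i - 1][1]:
            trough_idx = i - 1  # last point of the initial error decline
            break
    else:
        raise ValueError("no coverage peak found after the error k-mers")

    solid = hist[trough_idx:]
    peak_depth = max(solid, key=lambda point: point[1])[0]
    total_kmers = sum(depth * count for depth, count in solid)
    return total_kmers // peak_depth, peak_depth


# Toy histogram for illustration only (not real data).
toy = ["1 1000", "2 200", "3 50", "4 100", "5 300", "6 500", "7 300", "8 100"]
size, peak = estimate_genome_size(parse_histogram(toy))
print(size, peak)
```

Comparing this estimate between the Illumina and Nanopore read sets is only meaningful if the histogram shows a clear coverage peak, which is unlikely at ~10X with uncorrected Nanopore reads.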
