Final assembly too small #80

Open
NickJeff13 opened this issue Dec 7, 2023 · 6 comments

Comments

@NickJeff13

Hello,
I ran WenganM with ~60X Illumina coverage and ~10X Nanopore coverage (I know higher depth is better, but it is what I have) for a mollusc genome with an expected size of 2 Gb. Wengan finished in <24 hours with no errors that I can see, but the final assembly file, which I think is SPolished.asm.wengan.fasta, is only about 500 Mb, i.e. about 25% of the expected genome size.

Are there any parts of the log files I can look at to see why the assembly might be smaller than expected? The last few lines of the liger.log are pasted below if they are helpful.

Hits at edge level:
A total of 2 ctgs were selected for polishing from 112 canditates
HIT: cid=199594 eid=1213074 strand=0 qs=32 qe=684 rs=1920 re=2915 cnt=30 mlen=471 blen=1176 min_iden=0.800000
TOTALW=23 GOODW=17 BADW=6 CIGARW=MMMMMSMMMSSSMMMMMMMSSMM SW=63
TOTALW=23 GOODW=17 BADW=6 CIGARC=MMMMMSMMMSSSMMMMMMMSSMM SW=63 B=0 END=22
qs=0 qe=653 qc=100 ts=1919 te=2554 iden=0.870053
HIT: cid=1101547 eid=1213074 strand=1 qs=69 qe=364 rs=1655 re=2555 cnt=28 mlen=834 blen=6771 min_iden=0.800000
TOTALW=11 GOODW=11 BADW=0 CIGARW=MMMMMMMMMMM SW=76
TOTALW=11 GOODW=11 BADW=0 CIGARC=MMMMMMMMMMM SW=76 B=0 END=10
qs=0 qe=296 qc=100 ts=2290 te=2587 iden=0.929277
Time spent in polishing edges :52.4836 secs
Number of CC 1161897
HM_wengan.SPolished.asm.wengan.fasta file created
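
For comparing runs, here is a minimal Python sketch that pulls a few numbers out of a log shaped like the excerpt above. The regular expressions are based only on the lines pasted here, so treat them as assumptions about the liger.log format; what the fields mean (for example "Number of CC") is not documented in this thread, and `summarize_log` is a hypothetical helper, not part of Wengan.

```python
import re

# Patterns inferred from the pasted excerpt only (assumption: other
# liger.log files use the same wording, including the "canditates" spelling).
IDEN_RE = re.compile(r"\biden=([0-9.]+)")  # does not match "min_iden="
SELECTED_RE = re.compile(
    r"A total of (\d+) ctgs were selected for polishing from (\d+)"
)
CC_RE = re.compile(r"Number of CC (\d+)")


def summarize_log(lines):
    """Collect alignment identities, polishing selections and the CC count."""
    identities = []
    selections = []  # (selected, candidates) pairs
    cc_count = None
    for line in lines:
        match = IDEN_RE.search(line)
        if match:
            identities.append(float(match.group(1)))
        match = SELECTED_RE.search(line)
        if match:
            selections.append((int(match.group(1)), int(match.group(2))))
        match = CC_RE.search(line)
        if match:
            cc_count = int(match.group(1))
    mean_identity = sum(identities) / len(identities) if identities else None
    return {
        "identities": identities,
        "mean_identity": mean_identity,
        "selections": selections,
        "cc_count": cc_count,
    }
```

Usage would be something like `summarize_log(open("liger.log"))`; the path is whatever your run wrote.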

@NickJeff13
Author

@adigenova meant to mention you in this, if you have time

@adigenova
Owner

Hi Nick,

Can you share the N50 of the short-read assembly? Did you run WenganM? Can you try WenganA or WenganD? A similar issue, #54, discusses other ideas, but in general this can happen when the short-read assembly is extremely fragmented (N50 < 500 bp).
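
In case it is useful, here is a minimal Python sketch for getting the contig count, total size and N50 straight from a FASTA file. It uses only the standard library; the function names are made up for illustration and are not part of Wengan, and the path you pass in is whichever contig FASTA your run produced.

```python
import gzip
from pathlib import Path


def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file (plain or .gz)."""
    path = Path(fasta_path)
    opener = gzip.open if path.suffix == ".gz" else open
    lengths = []
    current = None  # length of the record being read, None before the first header
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(fasta_path):
    """Summarize a FASTA file as contig count, total bases and N50."""
    lengths = read_contig_lengths(fasta_path)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

For example, `assembly_stats("HM_wengan.SPolished.asm.wengan.fasta")` would report the final assembly; running the same function on the short-read contigs gives the N50 asked about here.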

Best
Alex

@NickJeff13
Author

Hi @adigenova,

Yes, I ran WenganM. I just tried WenganD but received the following error, which I will look into: make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1

I am trying WenganA now and will report back.

@NickJeff13
Author

Hello again,
WenganA performed better with my data, giving an assembly size of ~1.2 Gb, but that is still well short of the ~2 Gb genome size I estimated from k-mers and flow cytometry. Is the assembly N50 reported in any of the Wengan log files?

WenganD gives the error make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1 but I could not figure out how to solve this.

@NickJeff13
Author

@adigenova do you know if I can fix the WenganD issue? I cannot figure out how to rebuild so that WenganD works. Apologies if you are on holiday, and please do not rush if you are. WenganA worked better than any other assembly software I tried, so I just want to see if WenganD will work even better!

Thank you and happy new year.

@Artifice120

I'm not a developer and this is very late, but:

Have you tried using jellyfish with genomescope to predict the genome size from the read k-mer histogram?

If the size predicted from the Nanopore reads specifically is low, the small assembly could come from fragmentation on the long-read side. From what I understand, the long reads are used like scaffolds that verify the short-read contigs, so even if the short-read contigs are complete, the final assembly will end up limited by the continuity of the long-read contigs.
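
As a rough cross-check on the k-mer idea, here is a minimal Python sketch that estimates genome size from a two-column k-mer histogram (multiplicity, then number of distinct k-mers), which is the layout `jellyfish histo` writes. It uses the simple "total k-mers divided by peak depth" heuristic, not the model GenomeScope fits, and it assumes the histogram has one clear main peak after the low-count error tail. On a heterozygous genome the main peak may not be the one you want, so treat the number as a sanity check only. The function names are hypothetical.

```python
def parse_histogram(lines):
    """Parse 'multiplicity frequency' pairs from a k-mer histogram."""
    hist = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            hist.append((int(fields[0]), int(fields[1])))
    return sorted(hist)


def estimate_genome_size(hist):
    """Estimate genome size as (k-mers past the error valley) / (peak depth)."""
    freqs = [freq for _, freq in hist]
    # The error tail ends at the first point where the frequency starts rising.
    valley = None
    for i in range(len(freqs) - 1):
        if freqs[i + 1] > freqs[i]:
            valley = i
            break
    if valley is None:
        raise ValueError("no peak found after the low-count error tail")
    solid = hist[valley:]
    # Depth of the main peak: the multiplicity with the most distinct k-mers.
    peak_depth = max(solid, key=lambda pair: pair[1])[0]
    total_kmers = sum(mult * freq for mult, freq in solid)
    return {"peak_depth": peak_depth, "genome_size": round(total_kmers / peak_depth)}
```

Usage would look like `estimate_genome_size(parse_histogram(open("reads.histo")))`, where `reads.histo` is a placeholder name for your histogram file.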

Again, not a developer.
