Final assembly too small #80
Comments
@adigenova, I meant to mention you in this. Please take a look if you have time.
Hi Nick, can you share the N50 of the short-read assembly? Did you run WenganM? Can you try WenganA or WenganD? A similar issue, #54, discusses other ideas, but in general this can happen when the short-read assembly is extremely fragmented (N50 < 500 bp). Best
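For anyone who wants to check the N50 asked about above, here is a minimal Python sketch that reports contig count, total size, and N50 for a FASTA file. It is not part of Wengan; the function names and the script name `n50.py` are hypothetical, and it assumes a plain, uncompressed FASTA.

```python
import sys


def read_contig_lengths(fasta_lines):
    """Yield the length of each sequence in a FASTA stream (an iterable of lines)."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for n in ordered:
        running += n
        if 2 * running >= total:
            return n
    return 0  # empty input


# Only runs when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        lengths = list(read_contig_lengths(handle))
    print(f"contigs={len(lengths)} total_bp={sum(lengths)} N50={n50(lengths)}")
```

Run it as `python n50.py <contigs.fasta>` on the short-read contigs, and again on the final assembly to confirm its total size.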
Hi @adigenova, yes, I ran WenganM. I just tried WenganD but received this error: make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1. I will look into what it means. I am trying WenganA now and will report back.
Hello again, WenganD gives the error make: *** [HM_wenganD.mk:5: HMwenganD.contigs.disco.fa] Error 1 but I could not figure out how to solve this.
@adigenova, do you know if I can fix the WenganD issue? I cannot figure out how to rebuild so that WenganD works. Apologies if you are on holiday, and please do not rush if you are. WenganA worked better than any other assembly software I tried, so I just want to see if WenganD will work even better! Thank you and happy new year.
I'm not a developer, and this is very late, but have you tried using Jellyfish with GenomeScope to predict the genome size from the read k-mer histogram? If the predicted size is low for the Nanopore reads specifically, the small assembly may be the result of fragmentation in the Nanopore data, since, as I understand it, the long reads act as scaffolds that verify the short-read contigs. So even if the short-read contigs are complete, the assembly will end up limited by the continuity of the long reads. Again, not a developer.
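As a rough illustration of the k-mer idea above, here is a Python sketch that estimates genome size from a two-column k-mer histogram (multiplicity, number of distinct k-mers), the layout that `jellyfish histo` writes. This is a back-of-the-envelope calculation, not what GenomeScope does (GenomeScope fits a model to the histogram). The function names are hypothetical, and the sketch assumes rows sorted by multiplicity and a single clear coverage peak.

```python
def parse_histogram(lines):
    """Parse 'multiplicity count' rows into a list of (multiplicity, count) tuples."""
    hist = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            hist.append((int(parts[0]), int(parts[1])))
    return hist


def estimate_genome_size(hist):
    """Estimate genome size as (k-mers past the error valley) / (peak multiplicity).

    Returns None when the counts never rise again after the initial drop,
    i.e. when no coverage peak can be separated from the error k-mers.
    """
    # The first local minimum is taken as the valley between low-multiplicity
    # error k-mers and the main genomic peak.
    for i in range(1, len(hist)):
        if hist[i][1] > hist[i - 1][1]:
            valley_idx = i - 1
            break
    else:
        return None

    solid = hist[valley_idx:]
    # The multiplicity with the most distinct k-mers approximates k-mer coverage.
    peak_multiplicity = max(solid, key=lambda row: row[1])[0]
    total_kmers = sum(mult * count for mult, count in solid)
    return total_kmers // peak_multiplicity
```

With a histogram file on disk, usage would be along the lines of `estimate_genome_size(parse_histogram(open(path)))`. If the estimate from the Illumina reads is close to 2 Gb while the assembly is 500 Mb, the missing sequence is more likely an assembly problem than a genome-size misjudgement.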
Hello,
I ran WenganM using ~60X Illumina coverage and ~10X Nanopore coverage (I know higher depth is best, but it is what I have) for an expected 2 Gb mollusc genome. Wengan finished in <24 hours with no errors that I can see, but the final assembly file, which I think is the SPolished.asm.wengan.fasta file, is only about 500 Mb, so only 25% of the expected genome size.
Are there any parts of the log files I can look at to see why the assembly might be smaller than expected? The last few lines of the liger.log are pasted below if they are helpful.
Hits at edge level:
A total of 2 ctgs were selected for polishing from 112 canditates
HIT: cid=199594 eid=1213074 strand=0 qs=32 qe=684 rs=1920 re=2915 cnt=30 mlen=471 blen=1176 min_iden=0.800000
TOTALW=23 GOODW=17 BADW=6 CIGARW=MMMMMSMMMSSSMMMMMMMSSMM SW=63
TOTALW=23 GOODW=17 BADW=6 CIGARC=MMMMMSMMMSSSMMMMMMMSSMM SW=63 B=0 END=22
qs=0 qe=653 qc=100 ts=1919 te=2554 iden=0.870053
HIT: cid=1101547 eid=1213074 strand=1 qs=69 qe=364 rs=1655 re=2555 cnt=28 mlen=834 blen=6771 min_iden=0.800000
TOTALW=11 GOODW=11 BADW=0 CIGARW=MMMMMMMMMMM SW=76
TOTALW=11 GOODW=11 BADW=0 CIGARC=MMMMMMMMMMM SW=76 B=0 END=10
qs=0 qe=296 qc=100 ts=2290 te=2587 iden=0.929277
Time spent in polishing edges :52.4836 secs
Number of CC 1161897
HM_wengan.SPolished.asm.wengan.fasta file created