Incredibly small assembly #54

wbrewer5 · 2021-09-30T14:51:32Z

I am trying to assemble a fungal genome (possibly with some bacterial hitchhikers) from 12.16 GB of gzipped Nanopore reads and 12.5 GB of gzipped Illumina reads. The problem is that my first assembly with Wengan is only 651 kb long. My command and output are below. Which log files would help in finding the issue?

wengan.pl -x ontraw \
  -a M \
  -s pand2_fwd.fastq.gz,pand2_rev.fastq.gz \
  -l pandora_clean_nanopore.fastq.gz \
  -p pandora \
  -t 8 \
  -g 110

pandora.liger.log

adigenova · 2021-10-05T10:33:30Z

Hi,
Since the assembly finished, one possible explanation for the small assembly is contamination in the short-read data. Have you checked whether the assembled contigs come from a contaminant with a smaller genome? Can you also estimate the genome size with GenomeScope 2.0 to see whether the Illumina reads are contaminated?

Best,
Alex

wbrewer5 · 2021-10-05T14:04:58Z

The largest contig is 94 kb and the next largest is in the 20 kb range. There are bacteria inside the fungal sample, but I would expect at least 3 Mb for those genomes. GenomeScope 2.0 estimates 24 Mb, which matches my HASLR assembly. I have not used this genome size estimator before, so I am including the output for the Illumina forward reads to see what you think.

wbrewer5 · 2021-10-05T14:05:06Z

property                 min              max
Homozygous (aa)          94.7525%         95.8394%
Heterozygous (ab)        4.16056%         5.24754%
Genome Haploid Length    24,171,583 bp    25,342,343 bp
Genome Repeat Length     20,465,289 bp    21,456,534 bp
Genome Unique Length     3,706,294 bp     3,885,809 bp
Model Fit                26.256%          85.5258%
Read Error Rate          8.30334%         8.30334%
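
A quick arithmetic check on these numbers, as a minimal sketch: it uses only the "min" column of the table and the 651 kb figure from the first post, and it makes no claim about how GenomeScope derives its values. The variable names are illustrative. It confirms that the repeat and unique lengths add up to the haploid length, and shows what share of the estimate is reported as repeat and what share the Wengan assembly covers.

```python
# Values copied from the "min" column of the GenomeScope table in this thread.
haploid_min = 24_171_583   # Genome Haploid Length (bp)
repeat_min = 20_465_289    # Genome Repeat Length (bp)
unique_min = 3_706_294     # Genome Unique Length (bp)

# Assembly size reported in the first post ("651Kb"), taken as 651,000 bp.
assembly_bp = 651_000

# Repeat + unique should account for the whole haploid estimate.
lengths_consistent = (repeat_min + unique_min == haploid_min)

# Share of the estimated genome that is reported as repeat.
repeat_fraction = repeat_min / haploid_min

# Share of the estimated genome that the Wengan assembly covers.
assembly_fraction = assembly_bp / haploid_min

print(f"repeat + unique == haploid: {lengths_consistent}")
print(f"repeat fraction of estimate: {repeat_fraction:.1%}")
print(f"assembly / estimated size:   {assembly_fraction:.1%}")
```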

adigenova · 2021-10-06T11:40:50Z

Hi,
The genome size estimate from the short reads looks right. Can you share the stats of the short-read assembly?
I recommend running WenganD, which usually gives better assemblies than WenganA and WenganM.
Best,
Alex

wbrewer5 · 2021-10-07T17:28:58Z

pandora.minia.41.log
pandora.minia.81.log
pandora.minia.121.log

I do not have access to a computer with enough memory to run WenganD at the moment. Our compute cluster is moving to a new scheduler soon and we are being advised to wait until after the transition before requesting new software.

adigenova · 2021-10-26T23:52:24Z

From the logs, minia generated 4.8 million contigs with a total assembly length of 1.8 Gb, so the minia assembly is extremely fragmented (average contig length of about 400 bp). By default, Wengan discards contigs shorter than 500 bp, and only contigs larger than 2 kb are used to build the assembly backbone. You can lower the 2 kb threshold (-M 2000); the minimum recommended value is 1 kb (-M 1000), but I do not think that will be enough if you assemble the short reads with minia. Because most contigs are discarded by these length constraints, you end up with a much shorter assembly. My recommendation is to try WenganA or WenganD. Since your genome is not large, WenganD might be able to finish on a machine with 50-60 GB of RAM.
Best,

Alex
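
The length filtering described above can be checked on a contig file with a few lines of Python. This is a minimal sketch, not Wengan's own code: the 500 bp and 2 kb cutoffs come from the comment above, the use of `>=` at each boundary is an assumption, and `read_fasta_lengths` and `summarize` are hypothetical helper names. The last lines redo the average-length arithmetic from the numbers quoted from the minia logs.

```python
from typing import Dict, Iterable, Iterator


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths: Iterable[int],
              min_keep: int = 500,
              min_backbone: int = 2000) -> Dict[str, float]:
    """Count contigs and bases that pass each length cutoff.

    min_keep and min_backbone mirror the 500 bp and 2 kb thresholds
    mentioned in the thread; the inclusive comparison is an assumption.
    """
    lengths = list(lengths)
    total = sum(lengths)
    kept = [n for n in lengths if n >= min_keep]
    backbone = [n for n in lengths if n >= min_backbone]
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths) if lengths else 0.0,
        "kept_contigs": len(kept),
        "kept_bp": sum(kept),
        "backbone_contigs": len(backbone),
        "backbone_bp": sum(backbone),
    }


# Tiny in-memory example with three contigs of 300, 700 and 2500 bp.
demo = [">a", "A" * 300, ">b", "A" * 600, "C" * 100, ">c", "G" * 2500]
stats = summarize(read_fasta_lengths(demo))
print(stats)

# Average contig length implied by the figures quoted from the minia logs:
# 1.8 Gb of sequence spread over 4.8 million contigs.
reported_mean = 1_800_000_000 / 4_800_000
print(f"mean contig length from the logs: {reported_mean:.0f} bp")
```

To run it on a real file, pass an open file handle instead of the list, for example `summarize(read_fasta_lengths(open("contigs.fa")))`, where `contigs.fa` is a placeholder for the short-read contig file. If `backbone_bp` is a small fraction of `total_bp`, most of the short-read assembly never reaches the backbone step.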

wbrewer5 · 2021-11-03T18:00:43Z

I switched to my institution's compute cluster to use WenganD. Are you familiar with this error message?

gzip: stdout: Broken pipe

Below is my submission script.

/lustre/haven/user/wbrewer5/wengan/wengan.pl -x ontraw -a D \
  -s /lustre/haven/user/wbrewer5/pandora/assembly/fastq/zipped/Pand2_fwd.fastq.gz,/lustre/haven/user/wbrewer5/pandora/assembly/fastq/zipped/Pand2_rev.fastq.gz \
  -l /lustre/haven/user/wbrewer5/pandora/assembly/fastq/zipped/pandora_clean_nanopore.fastq.gz \
  -p pandora \
  -t 24 \
  -g 20

adigenova · 2021-11-09T23:56:10Z

Well, the message is not very informative. Perhaps the job was killed?
Best,
Alex

adigenova closed this as completed Nov 26, 2021

adigenova mentioned this issue Dec 8, 2023

Final assembly too small #80

Open
