How to adjust parameters to improve the assembly result #218

Open
linshengnan2020 opened this issue Oct 21, 2020 · 8 comments

Comments

@linshengnan2020

Hi, I have assembled a 1 Gb diploid genome in which 70-80% of the sequence is repetitive. The coverage of my PacBio data is approximately 40x. The final assembly has an N50 of only 32 kb. How should I adjust the parameters to improve the assembly result? Could you please give me some advice? Thank you very much!
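
For reference, N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly size. A minimal Python sketch of the calculation follows; the function name and the toy contig lengths are illustrative assumptions, not part of wtdbg2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total past half is the N50.
        if running >= half:
            return length
    return 0

# Toy contig lengths in bp (made up for illustration).
contigs = [80_000, 50_000, 32_000, 20_000, 10_000, 8_000]
print(n50(contigs))
```

Comparing this number between runs is only meaningful alongside the total assembly length, since an assembly that collapses repeats can shrink while its N50 rises.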

@shanesturrock

I'm dealing with a much larger genome (26 Gbp) but with similar levels of repeats. It may seem counterintuitive, but increasing the required overlap from the default of 2 kbp to 5 kbp (using the -l flag) has helped my assemblies. When I mapped the raw reads back onto the raw.fa, I noticed a number of locations where the assembly was collapsing around repeats. By increasing the minimum overlap I was able to get rid of a lot of these and improve the overall length of the assembly, at the cost of increasing the number of contigs. However, I'm going to scaffold at a later stage, once I'm done with error correction and polishing, so I should be able to improve things again then. Better to have more contigs without repeat regions being collapsed.
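
One way to look for collapsed repeats like those described above is to compare read depth in windows along each contig against the typical depth, since reads from several repeat copies pile up on a single collapsed copy. A rough Python sketch follows; the function name, the 2x threshold and the depth values are illustrative assumptions, and the per-window depths would come from whatever mapping and depth tools you already use.

```python
from statistics import median

def flag_collapsed_windows(window_depths, fold=2.0):
    """Return indices of windows whose depth is at least `fold` x the median."""
    typical = median(window_depths)
    return [i for i, d in enumerate(window_depths) if d >= fold * typical]

# Made-up mean read depth per window along one contig.
depths = [38, 41, 40, 95, 120, 39, 42, 37]
print(flag_collapsed_windows(depths))
```

If fewer windows are flagged after raising -l, that is some evidence the longer overlap requirement is separating repeat copies rather than merging them.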

@ruanjue
Owner

ruanjue commented Nov 27, 2020

The solution may include -l, '-R -s' and '--aln-dovetail -1'.

@shanesturrock

I've been using -p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000 -l 5000 but I'm still tweaking and testing. The good thing is the turnaround time is really short due to how fast the program is so I can try different settings and investigate the effects.

@cement-head

Is there a specific parameter that needs to be adjusted and/or input to wtdbg2 to specify the coverage depth? Or is that irrelevant for the programme to run correctly?

@ruanjue
Owner

ruanjue commented Dec 9, 2020

Have a look at wtdbg2 --help; there are two relevant options, --limit-input and -X.
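
As far as I understand the help text, -X (default 50) selects the best X-fold depth of reads and only takes effect when a genome size is given with -g, while --limit-input caps the input at a number of Mbp and is mainly meant for testing. The arithmetic can be sketched in Python as below; the function names are hypothetical and the cap is a simplification of what wtdbg2 does when choosing reads.

```python
def coverage(total_read_bases, genome_size):
    """Average depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size

def bases_kept(total_read_bases, genome_size, max_depth=50.0):
    """Bases remaining if input is capped at genome_size * max_depth."""
    return min(total_read_bases, genome_size * max_depth)

genome = 1_000_000_000   # 1 Gb genome, as in the original post
reads = 40 * genome      # about 40x of PacBio data

print(coverage(reads, genome))
print(bases_kept(reads, genome) == reads)
```

Under these assumptions a 40x data set sits below the default -X of 50, so no reads would be dropped for depth reasons and the coverage does not need to be declared separately.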

@lifan18

lifan18 commented Mar 15, 2021

Hi Prof. Ruan,

I have the same question, with a similar level of repeats. Following your advice, I added "-l -R -s --aln-dovetail -1" in this first run.

I am running the assembly again. I hope it works.

Thank you!

@lifan18

lifan18 commented Mar 22, 2021

Hi Prof. Ruan,

I tried using the four parameters together, but the result was worse than the one I got without -l -R -s --aln-dovetail -1. Is there a problem with adding the four parameters at the same time?

-t 96 -fo Species -l -R -s --tidy-reads 5000 --edge-min 3 --rescue-low-cov-edges --aln-dovetail -1

I look forward to your reply.

Thank you very much!

Li Fan

@ruanjue
Owner

ruanjue commented Mar 23, 2021

-R works at the step of generating alignments, --aln-dovetail works at the step of filtering alignments, and -s works at both steps. So you can use a loose -s together with -R in the first run, then use --load-alignments and tune for a better result with different parameters.
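
The two-stage approach described above could be scripted roughly as follows. This Python sketch only builds and prints the command lines and does not run anything. The input file name, the output prefixes, the -s values and the name of the saved alignments file (`run1.alignments.gz`) are all assumptions for illustration; check what your first run actually writes before reusing it.

```python
import shlex

def wtdbg2_cmd(reads, prefix, extra):
    """Build a wtdbg2 argument list; nothing is executed here."""
    return ["wtdbg2", "-i", reads, "-fo", prefix, *extra]

# First run: realignment mode (-R) with a loose similarity threshold (-s).
# The value 0.03 is a placeholder, not a recommendation.
first = wtdbg2_cmd("reads.fa.gz", "run1", ["-R", "-s", "0.03"])

# Later runs: reuse the saved alignments and vary only the filtering options.
second = wtdbg2_cmd(
    "reads.fa.gz",
    "run2",
    ["--load-alignments", "run1.alignments.gz",  # assumed file name
     "-s", "0.05", "--aln-dovetail", "-1"],
)

print(shlex.join(first))
print(shlex.join(second))
```

Because the expensive alignment step is done once, each later run only repeats the filtering and graph stages, which makes it cheap to compare several -s and --aln-dovetail settings.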
