with TIR/Sola2 TE_Sorter chokes? #178

Closed
oliviamr opened this issue Apr 6, 2021 · 8 comments
Labels
enhancement New feature or request

Comments


oliviamr commented Apr 6, 2021

Hi Shujun,

Thank you for EDTA; it is a nice tool.

I have run EDTA on a large genome split into chunks. One of the chunks ran EDTA_raw.pl without an issue. Now, while doing the homology-based annotation of TEs, I ran into this:

Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Use of uninitialized value in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 88.
Use of uninitialized value $chr_pre in hash element at /EDTA_p/EDTA/util/call_seq_by_list.pl line 90.
Use of uninitialized value $pos in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 100.
Use of uninitialized value $pos in concatenation (.) or string at /EDTA_p/EDTA/util/call_seq_by_list.pl line 103.
ERROR: Can not recognize this MSU position in the list!
ERROR: TE annotation stats results not found in genome.2.fasta.mod.EDTA.TE.fa.stat!

Any suggestions on how to overcome this?
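
For triage, it can help to collapse the repeated warnings into one count per classification. The following is a minimal Python sketch, assuming the log has been saved as text and that each warning uses the exact wording shown above. The function name `count_unknown_classes` is made up for illustration and is not part of EDTA.

```python
import re
from collections import Counter

# Matches lines such as:
# "Warning: TIR/Sola2 not found in the TE_SO database, will use ..."
WARNING_RE = re.compile(r"^Warning: (\S+) not found in the TE_SO database")


def count_unknown_classes(log_lines):
    """Count how often each classification is reported as missing from TE_SO."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Example with two lines copied from the log above.
sample_log = [
    "Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.",
    "ERROR: Can not recognize this MSU position in the list!",
]
for name, number in count_unknown_classes(sample_log).most_common():
    print(f"{name}\t{number}")
```

This only summarizes the log; it does not change how EDTA handles the unrecognized classification.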


oliviamr commented Apr 6, 2021

Should I do the same as in issue #151?


oushujun commented Apr 7, 2021 via email

oliviamr commented

OK, here is more on this warning, which leaves me slightly confused. When I ran the annotation (--anno 1 --step anno), apart from the warning above I get:

Use of uninitialized value $type in concatenation (.) or string at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84, line

This stops me from obtaining a *.TEanno.sum file, the same as in #171.

Any suggestions on what to do?
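
One way to narrow this down is to look for malformed records in the *.TEanno.gff3 file before it reaches gff2bed.pl. The sketch below is Python and rests on an assumption: the thread does not show what `$type` holds at line 84, so the check only covers generic GFF3 problems (a feature line without nine tab-separated columns, or an empty type in column 3). The function name `find_bad_gff3_lines` is hypothetical.

```python
def find_bad_gff3_lines(lines):
    """Return (line_number, reason) pairs for suspicious GFF3 feature lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((number, f"expected 9 columns, found {len(fields)}"))
        elif fields[2].strip() in ("", "."):
            # Assumption: an empty type column could leave a value undefined downstream.
            problems.append((number, "empty feature type in column 3"))
    return problems
```

To use it, open the GFF3 file and pass the file handle, for example `find_bad_gff3_lines(open(path))`, where `path` points to the *.TEanno.gff3 file. An empty result means only that these two generic problems were not found.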


oushujun commented Jun 18, 2021 via email

oushujun commented

@oliviamr can you provide reproducible sample data for me to test with? Thanks! - Shujun

oliviamr commented

Hi Shujun,

Thank you for following up. Let me put you in context:

Since I am working with a >10 Gb plant genome, I followed your recipe noted here: #61 (comment)

The final step runs, and I get the folder *.fasta.mod.EDTA.anno and the files *.fasta.mod.MAKER.masked and *.fasta.mod.EDTA.TEanno.gff3. However, the file *.mod.EDTA.TEanno.sum is empty, and running with --anno 1 --step anno throws the error at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84.

What is the best way to send you sample data, and what is a suitable sample size?
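
The thread does not say what sample size is suitable, so the following is only one possible way to cut a FASTA file down to a small test set. It is a minimal Python sketch that keeps the first few records; `take_first_records` and `max_records` are hypothetical names, and a subset chosen this way may or may not still reproduce the error.

```python
def take_first_records(fasta_lines, max_records):
    """Keep the lines of the first `max_records` FASTA records."""
    kept = []
    seen = 0
    for line in fasta_lines:
        if line.startswith(">"):
            seen += 1
            # Stop at the header of the first record past the limit.
            if seen > max_records:
                break
        # Ignore anything before the first header line.
        if seen >= 1:
            kept.append(line)
    return kept
```

The returned lines can be written to a new file with `writelines`, and EDTA can then be rerun on that file to check whether the problem persists.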

oliviamr commented

Another question: is it important that the CDS are high quality (no frameshifts included) for the annotation, or somewhere else in the pipeline?

oushujun commented

@oliviamr yes, otherwise you may include too many TEs in the CDS and thus remove too many TEs with these CDS sequences.
