Describe the bug
The A. thaliana CDS I am using cannot be dated.
I already succeeded using genEra v1.4.0 with a subset of the H. sapiens CDS.
Now, using the attached FASTA file, it does not work, even when providing 500 GB of RAM for 262 GB of results.
Note that I ran the same analysis (same command) with v1.2.0 and it went perfectly fine (except that it took longer, of course).
I would have bet that the problem is caused by the "|" character in the middle of the CDS names, but it worked with the previous version.
To Reproduce
Steps to reproduce the behaviour, e.g.
Expected behaviour
The ages are not assigned:
#gene phylostratum rank taxonomic_representativeness
lcl|NC_000932.1_cds_NP_051037.1_48181 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051038.1_48226 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051039.1_48182 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051040.2_48183 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051041.1_48184 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051042.1_48185 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051043.1_48186 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051044.1_48187 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051045.1_48188 Absent from the DIAMOND/MMseqs2 results NA NA
Screenshots or code
Here are the last lines of the err file (16 MB of similar 'No such file or directory' lines):
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_177334.1_10947.bout': No such file or directory
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_565027.1_10948.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003071.7_cds_NP_001323584.1_19320.bout': No such file or directory
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
...
[mclIO] writing </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
.......................................
[mclIO] wrote native interchange 48227x48227 matrix with 4144755 entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
[mclIO] wrote 48227 tab entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.tab>
[mcxload] tab has 48227 entries
[mclIO] reading </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mcl>
.......................................
[mclIO] read native interchange 48227x8569 matrix with 48227 entries
Thanks for reaching out! You are right, the new script for faster gene age assignment seems to mistake the "|" characters in the FASTA headers for column separators, leading to errors. We'll start working on a solution over the weekend, but I think it should be fairly easy to fix.
@RocesV just fixed the issue with FASTA headers containing | characters. Please download the newest version of FASTSTEP3R and let me know whether this fixes your problem.
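For anyone who cannot update right away, one possible workaround (not something the developers suggested in this thread) is to rename the sequences so that no header contains a "|" before running the analysis. Below is a minimal Python sketch of that idea. The function names and the choice of "_" as the replacement character are assumptions made for illustration, and the renamed IDs will appear in the output instead of the original ones.

```python
def sanitize_header(line: str, replacement: str = "_") -> str:
    """Replace '|' in a FASTA header line; sequence lines are returned unchanged.

    Hypothetical helper, not part of genEra.
    """
    if line.startswith(">"):
        return line.replace("|", replacement)
    return line


def sanitize_fasta(in_path: str, out_path: str, replacement: str = "_") -> int:
    """Copy a FASTA file, rewriting headers; return the number of headers changed."""
    changed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            new_line = sanitize_header(line, replacement)
            if new_line != line:
                changed += 1
            dst.write(new_line)
    return changed
```

Calling `sanitize_fasta("input.fasta", "renamed.fasta")` (both file names are placeholders) writes a copy with the rewritten headers. If two headers differ only by a "|" versus the replacement character, they would collide after renaming, so it is worth checking that the IDs are still unique.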
Dear genEra developers,
Session info:
cds_from_genomic.tar.gz
Paul