v1.4.0 : no tmp .bout files #18

Proginski · 2023-09-26T15:19:34Z

Dear genEra developers,

Describe the bug
The A. thaliana CDS I am using cannot be dated.
I already succeeded in dating a subset of H. sapiens CDS with genEra v1.4.0.
With the attached FASTA file, however, the run fails even when I provide 500 GB of RAM for 262 GB of results.
Note that I ran the same analysis (same command) with v1.2.0 and it went perfectly fine (apart from taking longer, of course).
I would have bet that the problem comes from the "|" character in the middle of the CDS names, but those names worked with the previous version.

To Reproduce
Steps to reproduce the behaviour:

genEra \
  -t 3702 \
  -q CDS/cds_from_genomic.faa \
  -b /diamonddb/NR_DB/nr \
  -n 75 \
  -r ncbi_lineages_2023-07-12.csv

Expected behaviour
The genes should be assigned ages, but none are. The output looks like this:
#gene phylostratum rank taxonomic_representativeness
lcl|NC_000932.1_cds_NP_051037.1_48181 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051038.1_48226 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051039.1_48182 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051040.2_48183 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051041.1_48184 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051042.1_48185 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051043.1_48186 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051044.1_48187 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051045.1_48188 Absent from the DIAMOND/MMseqs2 results NA NA

Screenshots or code
Here are the last lines of the .err file (16 MB of similar "No such file or directory" lines):

awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_177334.1_10947.bout': No such file or directory
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_565027.1_10948.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003071.7_cds_NP_001323584.1_19320.bout': No such file or directory
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
...
[mclIO] writing </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
.......................................
[mclIO] wrote native interchange 48227x48227 matrix with 4144755 entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
[mclIO] wrote 48227 tab entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.tab>
[mcxload] tab has 48227 entries
[mclIO] reading </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mcl>
.......................................
[mclIO] read native interchange 48227x8569 matrix with 48227 entries

Session info:

singularity build --fakeroot genEra_intraS.sif docker://josuebarrera/genera
cds_from_genomic.tar.gz

Paul


josuebarrera · 2023-09-27T15:17:39Z

Dear Paul,

Thanks for reaching out! You are right: the new script for faster gene age assignment seems to mistake the "|" characters in the FASTA headers for column separators, which leads to these errors. We'll start working on a solution over the weekend, but I think it should be fairly easy to fix.
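To make the failure mode concrete, here is a minimal sketch of what happens when one of these gene IDs is parsed by awk with "|" as the field separator. It assumes, without having checked genEra's code, that the parsing step uses "|" as a column delimiter; the ID is copied from the error log above.

```shell
#!/usr/bin/env bash
# Illustration only: assumes (not verified against genEra) that a parsing
# step splits lines on "|". The ID is taken from the .err log above.
id='lcl|NC_003070.9_cds_NP_001321941.1_644'

# With "|" as the field separator, one gene ID becomes two fields.
printf '%s\n' "$id" | awk -F'|' '{ print NF }'   # prints 2
printf '%s\n' "$id" | awk -F'|' '{ print $1 }'   # prints lcl
printf '%s\n' "$id" | awk -F'|' '{ print $2 }'   # prints NC_003070.9_cds_NP_001321941.1_644
```

Any later step that rebuilds a file name such as tmp_<gene>.bout from only part of the ID would then look for a file that was never written, which is consistent with the "No such file or directory" messages.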

Cheers,
Josué.

josuebarrera · 2023-10-04T07:47:45Z

Dear Paul,

@RocesV just fixed the issue with FASTA headers containing "|" characters. Please download the newest version of FASTSTEP3R and let me know whether this fixes your problem.
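For anyone who has to stay on a v1.4.0 build without the updated FASTSTEP3R, one possible workaround (an assumption on our side, not the fix that was merged) is to replace "|" with "_" in the FASTA header lines before running genEra. The helper name sanitize_headers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical workaround, not the official fix: rewrite only header lines
# (those starting with ">"), replacing every "|" with "_".
# Sequence lines are printed unchanged.
sanitize_headers() {
  local in_fasta=$1 out_fasta=$2
  awk '/^>/ { gsub(/[|]/, "_") } { print }' "$in_fasta" > "$out_fasta"
}
```

Usage: run `sanitize_headers CDS/cds_from_genomic.faa CDS/cds_from_genomic.nopipe.faa`, then pass the new file to genEra with -q. The gene IDs in the genEra output will then contain "_" instead of "|", so they may need to be mapped back to the original names.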

Best,
Josué.

Proginski · 2023-10-05T08:57:43Z

Thanks a lot !

Paul

Proginski added the "bug" (Something isn't working) label Sep 26, 2023

RocesV mentioned this issue Oct 2, 2023

JGI like ID symbols fix #21

Merged
