-trim-assemble option not working in 2.1 release (-untrimmed is OK workaround) #1781

Closed
skoren opened this issue Aug 28, 2020 · 3 comments
@skoren
Member

skoren commented Aug 28, 2020

Release 2.1 doesn't work with -trim-assemble and HiFi data on the grid: trimming is skipped prematurely. Using -untrimmed works as expected. Here are the commands:

canu -untrimmed -p asm -d test2 useGrid=true genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.gz
canu -trim-assemble -p asm -d test3 useGrid=true genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.gz

and the output of both:

% ls test2
asm.report  asm.seqStore  asm.seqStore.err  asm.seqStore.sh  canu-logs  canu.out  canu-scripts  trimming
% ls test2/trimming
0-mercounts  1-overlapper
% ls test3
asm.report  asm.seqStore  asm.seqStore.err  asm.seqStore.sh  canu-logs  canu.out  canu-scripts  trimming  unitigging
% ls test3/trimming
0-mercounts
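
The difference between the two listings (test3 has `unitigging` but no `trimming/1-overlapper`) can be checked mechanically. Below is a minimal Python sketch of such a check. It assumes only the directory names visible in the listings above; the function name is made up for illustration, and this is a heuristic, not a Canu interface.

```python
from pathlib import Path


def trimming_looks_skipped(run_dir):
    """Return True if assembly started without the trimming overlapper having run.

    Heuristic only: it relies on the directory names seen in the listings
    above (an assumption), not on any documented Canu interface.
    """
    run = Path(run_dir)
    # Assembly has started once the unitigging directory exists.
    started_assembly = (run / "unitigging").is_dir()
    # A run that actually trims creates trimming/1-overlapper first.
    ran_trim_overlaps = (run / "trimming" / "1-overlapper").is_dir()
    return started_assembly and not ran_trim_overlaps
```

With the two directories above, `trimming_looks_skipped("test2")` would be False and `trimming_looks_skipped("test3")` would be True.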

Note how test3 already has a unitigging folder, yet it never made trimmed reads or ran the overlapper in trimming. The issue seems to be that when it checks the store, it detects trimmed reads. Here is canu-scripts/canu.02.out from the working run:

-- In 'asm.seqStore', found PacBio HiFi reads:
--   PacBio HiFi:              1
--
--   Corrected:                1
--
-- Generating assembly 'asm' in '/vf/users/korens/test/regression/test2':
--    - trim corrected reads.
--    - assemble corrected and trimmed reads.
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0000 (  0.00%)
--    obtOvlErrorRate 0.0250 (  2.50%)
--    utgOvlErrorRate 0.0100 (  1.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0000 (  0.00%)
--    obtErrorRate    0.0250 (  2.50%)
--    utgErrorRate    0.0100 (  1.00%)
--    cnsErrorRate    0.0500 (  5.00%)
--
--
-- BEGIN TRIMMING
--
-- Found 1 Kmer counting (meryl) outputs.
-- Finished stage 'obt-merylCountCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
--
-- 'meryl-process.jobSubmit-01.sh' -> job 63808692 task 1.
--
----------------------------------------

and the failed run:

-- In 'asm.seqStore', found PacBio HiFi reads:
--   PacBio HiFi:              1
--
--   Corrected:                1
--   Corrected and Trimmed:    1
--
-- Generating assembly 'asm' in '/vf/users/korens/test/regression/test3':
--    - trim corrected reads.
--    - assemble corrected and trimmed reads.
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0000 (  0.00%)
--    obtOvlErrorRate 0.0250 (  2.50%)
--    utgOvlErrorRate 0.0100 (  1.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0000 (  0.00%)
--    obtErrorRate    0.0250 (  2.50%)
--    utgErrorRate    0.0100 (  1.00%)
--    cnsErrorRate    0.0500 (  5.00%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Fri Aug 28 15:14:27 2020 with 2290235.793 GB free disk space

    cd unitigging/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Fri Aug 28 15:14:27 2020 (furiously fast) with 2290235.793 GB free disk space
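
The two logs differ in two places: the failed run lists "Corrected and Trimmed" reads in the store, and its first stage banner is BEGIN ASSEMBLY rather than BEGIN TRIMMING. The sketch below pulls those two facts out of a log. It assumes the log lines look exactly like the excerpts above and is only meant for the early `canu.NN.out` files of a run; the function name and the returned keys are hypothetical.

```python
import re


def summarize_log(log_text):
    """Summarize store contents and stage banners from a canu.NN.out log (heuristic)."""
    # Read classes reported for the store, e.g. "--   Corrected:                1".
    classes = re.findall(
        r"^--[ \t]+(Corrected(?: and Trimmed)?):[ \t]+\d+[ \t]*$",
        log_text,
        re.M,
    )
    # Stage banners in order of appearance, e.g. "-- BEGIN TRIMMING".
    stages = re.findall(r"^-- BEGIN ([A-Z][A-Z ]*?)[ \t]*$", log_text, re.M)
    # The plan printed by canu includes this line when trimming is requested.
    wants_trim = "trim corrected reads." in log_text
    return {
        "store_has_trimmed": "Corrected and Trimmed" in classes,
        "stages": stages,
        "trim_requested_but_skipped": (
            wants_trim and "ASSEMBLY" in stages and "TRIMMING" not in stages
        ),
    }
```

Run on the failed log above, `trim_requested_but_skipped` would be True; on the working log it would be False.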
@brianwalenz
Member

-untrimmed, grid=true

[canu.01.out]
    82 -- Found untrimmed raw PacBio HiFi reads in the input files.

    99 --   Stages to run:
   100 --     trim corrected reads.
   101 --     assemble corrected and trimmed reads.
   102 --
   103 --
   104 -- BEGIN STORE CREATION
   105 ----------------------------------------
   106 -- Starting command on Wed Jul 17 15:05:27 2024 with 846.188 GB free disk space
   107 
   108     cd .
   109     ./test.seqStore.sh \
   110     > ./test.seqStore.err 2>&1
   111 
   112 -- Finished on Wed Jul 17 15:06:38 2024 (71 seconds) with 844.965 GB free disk space

   165 -- Correction skipped; not enabled.
   166 --
   167 -- BEGIN TRIMMING
   168 ----------------------------------------
   169 -- Starting command on Wed Jul 17 15:06:39 2024 with 844.965 GB free disk space
   170 
   171     cd trimming/0-mercounts
   172     ./meryl-configure.sh \
   173     > ./meryl-configure.err 2>&1
[and then it submits meryl count jobs]
[canu.02.out]
   104 --   Stages to run:
   105 --     trim corrected reads.
   106 --     assemble corrected and trimmed reads.
   107 --
   108 --
   109 -- Correction skipped; not enabled.
   110 --
   111 -- BEGIN TRIMMING
   112 -- Found 1 Kmer counting (meryl) outputs.
   113 -- Finished stage 'obt-merylCountCheck', reset canuIteration.
   114 --
   115 -- Running jobs.  First attempt out of 2.

-untrimmed, grid=false, does the same steps.

@brianwalenz
Member

-trim-assemble, grid=true

[canu.01.out]
    82 -- Found trimmed raw PacBio HiFi reads in the input files.

    99 --   Stages to run:
   100 --     trim corrected reads.
   101 --     assemble corrected and trimmed reads.
   102 --
   103 --
   104 -- BEGIN STORE CREATION
   105 ----------------------------------------
   106 -- Starting command on Wed Jul 17 15:05:32 2024 with 846.184 GB free disk space
   107 
   108     cd .
   109     ./test.seqStore.sh \
   110     > ./test.seqStore.err 2>&1
   111 
   112 -- Finished on Wed Jul 17 15:06:42 2024 (70 seconds) with 844.862 GB free disk space

   118 --    Histogram of corrected reads:
[omitted]
   168 --    Histogram of corrected-trimmed reads:
[omitted]

   215 -- Correction skipped; not enabled.
   216 --
   217 -- Trimming skipped; trimmed reads exist in test.seqStore.
   218 --
   219 -- BEGIN ASSEMBLY
   220 ----------------------------------------
   221 -- Starting command on Wed Jul 17 15:06:45 2024 with 844.862 GB free disk space
   222 
   223     cd unitigging/0-mercounts
   224     ./meryl-configure.sh \
   225     > ./meryl-configure.err 2>&1
[and count jobs submitted]
[canu.02.out]
   110 -- Correction skipped; not enabled.
   111 --
   112 -- Trimming skipped; trimmed reads exist in test.seqStore.
   113 --
   114 -- BEGIN ASSEMBLY
   115 -- Found 1 Kmer counting (meryl) outputs.
   116 -- Finished stage 'utg-merylCountCheck', reset canuIteration.
   117 --
   118 -- Running jobs.  First attempt out of 2.
   119 --
   120 -- 'meryl-process.jobSubmit-01.sh' -> job 318418 task 1.

-trim-assemble, grid=false, does the same steps.

@brianwalenz
Member

canu -untrimmed     -p test -d untrimmedgrid  useGrid=true  genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.xz
canu -trim-assemble -p test -d trimassemgrid  useGrid=true  genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.xz

time canu -untrimmed     -p test -d untrimmedlocal useGrid=false genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.xz >& untrimmedlocal.err
time canu -trim-assemble -p test -d trimassemlocal useGrid=false genomeSize=4.8m -pacbio-hifi m54316_180808_005743.fastq.xz >& trimassemlocal.err
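
The four commands above form a 2x2 matrix (read mode by useGrid). A small Python sketch that generates the same argument lists is shown below, in case the comparison needs to be repeated. Only flags that appear in the commands above are used; the helper name and the tag dictionaries are illustrative, and the `time` prefix and output redirection of the local runs are left out.

```python
from itertools import product

READS = "m54316_180808_005743.fastq.xz"
# Directory name fragments copied from the commands above.
MODE_TAG = {"-untrimmed": "untrimmed", "-trim-assemble": "trimassem"}
GRID_TAG = {True: "grid", False: "local"}


def canu_commands(reads=READS, prefix="test", genome_size="4.8m"):
    """Return one argv list per (mode, useGrid) combination."""
    commands = []
    for mode, use_grid in product(MODE_TAG, GRID_TAG):
        commands.append([
            "canu", mode,
            "-p", prefix,
            "-d", MODE_TAG[mode] + GRID_TAG[use_grid],
            "useGrid=" + ("true" if use_grid else "false"),
            "genomeSize=" + genome_size,
            "-pacbio-hifi", reads,
        ])
    return commands
```

Each list can be printed with `" ".join(argv)` for inspection or passed to `subprocess.run(argv)` on a machine where canu is installed.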

brianwalenz added a commit that referenced this issue Jul 17, 2024