Fail early if more than 4095 input files are supplied. Issue #1910.
brianwalenz committed Jul 21, 2024
1 parent 1449b1a commit 6eb6d2c
Showing 2 changed files with 14 additions and 2 deletions.
5 changes: 3 additions & 2 deletions documentation/source/quick-start.rst
@@ -15,7 +15,8 @@ however, between 30x and 60x coverage is the recommended minimum. More coverage
longer reads for assembly, which will result in better assemblies.

Input sequences can be FASTA or FASTQ format, uncompressed or compressed with gzip (.gz), bzip2
(.bz2) or xz (.xz). Note that zip files (.zip) are not supported.
(.bz2) or xz (.xz). Note that zip files (.zip) are not supported. Up to 4,095 input files are
allowed.

Canu can resume incomplete assemblies, allowing for recovery from system outages or other abnormal
terminations. On each restart of Canu, it will examine the files in the assembly directory to
@@ -152,7 +153,7 @@ Trio binning does not yet support inputting PacBio HiFi reads for binning as the
Assembling With Multiple Technologies and Multiple Files
--------------------------------------------------------

Canu can use reads from any number of input files, which can be a mix of formats and technologies. Note that currently combining PacBio HiFi data with other datatypes is not supported. We'll assemble a mix of 10X PacBio CLR reads in two FASTQ files and 10X of Nanopore reads in one FASTA
Canu can use reads from any number of input files (up to 4,095 in total), which can be a mix of formats and technologies. Note that currently combining PacBio HiFi data with other datatypes is not supported. We'll assemble a mix of 10X PacBio CLR reads in two FASTQ files and 10X of Nanopore reads in one FASTA
file::

curl -L -o mix.tar.gz http://gembox.cbcb.umd.edu/mhap/raw/ecoliP6Oxford.tar.gz
11 changes: 11 additions & 0 deletions src/pipelines/canu.pl
@@ -584,6 +584,17 @@

print STDERR "--\n";
print STDERR "-- Found $ct $rt reads in the input files.\n";

# If there are more than 4095 input files, sqStore fails because it must
# encode the read library id (equivalent to the input file number, and
# possibly no longer used) in 12 bits. The best we can do is fail now;
# increasing the limit would require an on-disk metadata change (see
# _libraryID in sqRead.H). Issue #1910.
#
if (scalar(@inputFiles) >= 4096) {
  my $nf = scalar(@inputFiles);
  caExit("ERROR: Too many input read files ($nf). Must be fewer than 4096", undef);
}
}

# Otherwise, no reads found in a store, and no input files.
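
The limit described in the comment above follows from the field width: 12 bits hold 2^12 = 4096 distinct values, and the check rejects 4096 or more files, so 4,095 is the largest accepted count. As a minimal sketch only, a wrapper script could count its input files before launching an assembly. The variable `MAX_INPUT_FILES` and the function `check_input_count` are hypothetical names chosen for illustration and are not part of Canu.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of Canu itself.
# A 12-bit field has 2^12 = 4096 distinct values. The commit rejects
# 4096 or more input files, so the largest accepted count is 4095.
MAX_INPUT_FILES=$(( (1 << 12) - 1 ))

# check_input_count FILE...
# Returns 0 when the number of arguments is within the limit, 1 otherwise.
check_input_count() {
  if [ "$#" -gt "$MAX_INPUT_FILES" ]; then
    echo "ERROR: too many input read files ($#); at most $MAX_INPUT_FILES are allowed" >&2
    return 1
  fi
  return 0
}
```

Calling `check_input_count` with the list of read files, and proceeding only when it returns success, reports the problem before any reads are loaded. Canu performs its own check regardless, so this would only be a convenience for scripted runs.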