Long reference sequences #1380

@SergejN

Dear developers,

I found a bug while working with very long sequences (Axolotl). FastaSequenceFile::readSequence grows its internal buffer whenever the number of bases read so far equals the array size (line 177):
if (sequenceLength == bases.length)
Although this approach is memory-efficient, it runs into problems if the sequence is even slightly longer than 2^30-1 bases: the method then tries to allocate an array with 2^31 elements, which is more than the maximum of 2^31-1, so the requested size overflows to a negative value. I would suggest checking whether the current array size has reached 2^30 and, if so, growing the internal array in smaller steps (say, final byte[] tmp = new byte[(int)(bases.length*1.1)] instead of final byte[] tmp = new byte[bases.length*2]), or switching to a different data structure, which I imagine would be quite tedious. For now I have worked around the problem in my project as described above, but I admit it is probably not the best solution.
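To illustrate the overflow and one possible guard, here is a minimal sketch. It is not the htsjdk code: the class BufferGrowth, the methods nextCapacity and grow, and the MAX_ARRAY_SIZE cap are hypothetical names chosen for this example, and the cap of Integer.MAX_VALUE - 8 is an assumption based on the limit commonly used for array allocation in the JDK.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of htsjdk.
final class BufferGrowth {
    // Assumed conservative cap: some JVMs cannot allocate arrays of
    // exactly Integer.MAX_VALUE elements.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Returns the next capacity: doubles where possible, clamps at the cap.
    static int nextCapacity(final int current) {
        if (current >= MAX_ARRAY_SIZE) {
            throw new IllegalStateException("Sequence too long for a single byte[]");
        }
        // Compute in long so that doubling cannot wrap around to a negative int.
        final long doubled = Math.max(1L, 2L * current);
        return (int) Math.min(doubled, MAX_ARRAY_SIZE);
    }

    // Grows the buffer while preserving the bases read so far.
    static byte[] grow(final byte[] bases) {
        return Arrays.copyOf(bases, nextCapacity(bases.length));
    }

    public static void main(final String[] args) {
        final int size = 1 << 30;
        System.out.println(size * 2);           // int overflow: -2147483648
        System.out.println(nextCapacity(size)); // clamped: 2147483639
    }
}
```

With a guard like this, the buffer keeps doubling for ordinary sequences and only falls back to a smaller final step near the array size limit. Sequences longer than the cap would still need a different data structure.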

Thanks!
Sergej

Labels: bug, long reference (Bugs and issues related to long reference sequences.)