Extra large kmer #22

apaytuvi · 2022-10-21T08:57:08Z

Dear cuttlefish authors,

Thank you for this useful tool. I have a large database of genomes, and I want to reduce its redundancy so that mapping against it is faster and uses less RAM. I tried cuttlefish, and it is useful, but I would like a much larger k-mer length, e.g. 1000. Why? Long-read mapping needs long reference sequences, but with k-mer lengths as low as 127, most output sequences stay about that length, which is clearly not enough for long-read mapping. Do you have a suggestion for that?

Thanks,

jamshed · 2022-10-24T18:26:27Z

Hi @apaytuvi,

Thanks for using cuttlefish! I'll incorporate support for extra-large k-mers into cuttlefish, but that might take a little time. In the meantime, I can post a hack in a separate branch for you to try out. Would that work for you?

Regards.

apaytuvi · 2022-10-25T06:14:43Z

That would be great. Thank you so much!

jamshed · 2022-10-30T20:08:46Z

Hi @apaytuvi: we've found some bug(s) in the initial k-mer enumeration phase of cuttlefish that occur only with huge k values (e.g. k >= 1000); hence the delay! I'll get back to this once we have addressed the issue.

apaytuvi · 2022-10-31T05:34:28Z

Thanks Jamshed for these efforts! No problem, I'll wait. Thanks again.


apaytuvi · 2022-11-28T10:55:27Z

It seems the bug has been fixed and this feature should be available. Could you please confirm, @jamshed? Thanks a lot!


jamshed · 2022-12-05T16:48:01Z

Hi @apaytuvi: sorry for the delay in response!

I've pushed a new branch, extra-large-k, with the required support. It needs to be compiled from source, as instructed here, but the cmake line needs to be replaced with the following:

cmake -DINSTANCE_COUNT=256 -DCMAKE_INSTALL_PREFIX=../ ..

Currently this supports k up to 1023. Let me know if you want to try even larger k; we can extend the range for that.
Note, though, that the installation takes quite some time with large k (you may use make -j install to speed it up with more threads). The execution is also quite time- and disk-heavy, specifically the initial (k+1)-mer and k-mer enumeration stages.
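To give a rough sense of why large k is heavier on time and disk, here is a small sketch that computes the raw storage for a single k-mer. It assumes the common layout of 2 bits per base packed into 64-bit words; that layout is an assumption for illustration, not a description of cuttlefish's exact internals. The distinct k-mer count is a hypothetical figure, not a measurement.

```python
# Assumption: each base (A/C/G/T) is packed into 2 bits and k-mers are
# stored in 64-bit words. This is a common layout, not a statement about
# cuttlefish's exact internal representation.
BITS_PER_BASE = 2
WORD_BITS = 64


def words_per_kmer(k: int) -> int:
    """64-bit words needed to hold one 2-bit-packed k-mer (ceiling division)."""
    return (k * BITS_PER_BASE + WORD_BITS - 1) // WORD_BITS


def bytes_per_kmer(k: int) -> int:
    """Raw bytes for one packed k-mer (ignores counts or other metadata)."""
    return words_per_kmer(k) * WORD_BITS // 8


# Hypothetical number of distinct k-mers, chosen only for illustration.
DISTINCT_KMERS = 1_000_000_000

for k in (63, 127, 1023):
    raw_gb = DISTINCT_KMERS * bytes_per_kmer(k) / 1e9
    print(f"k={k}: {words_per_kmer(k)} words, {bytes_per_kmer(k)} bytes/k-mer, "
          f"~{raw_gb:.0f} GB raw for {DISTINCT_KMERS:,} k-mers")
```

Under this assumption, a k-mer at k = 1023 takes 32 words (256 bytes), eight times the 4 words (32 bytes) needed at k = 127, so the enumeration stages have roughly eight times as much raw k-mer data to move through memory and disk.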

Let me know whether you can test it successfully!

jamshed added the enhancement label (New feature or request) on Oct 24, 2022

jamshed mentioned this issue on Oct 30, 2022 in refresh-bio/KMC#204, "Bug with extra large k-mers" (Closed)

jamshed added a commit that referenced this issue on Dec 5, 2022: 463fe8e, "Support k upto 1023" (support req. #22)

rob-p mentioned this issue on Mar 27, 2024 in jermp/sshash#39, "Support larger alphabets and k via generic kmer_t" (Merged)
