Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow references with up to 2**31 contigs #455

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Allow references with up to 2**31 contigs #455

wants to merge 7 commits into from

Conversation

marcelm
Copy link
Collaborator

@marcelm marcelm commented Oct 22, 2024

Note, this is not quite the layout suggested in #421 (comment). I implemented it this way before that comment, but will adjust it.

This changes the RefRandstrobe struct in the following way.

Previous layout:

  • 40 bits base hash
  • 24 bits auxiliary hash
  • 32 bits position
  • 23 bits reference/contig index
  • 8 bits strobe2 offset
  • 1 bit first_strobe_is_main flag

We take 8 bits away from the auxiliary hash and use those bits for the reference
index. New layout:

  • 40 bits base hash
  • 16 bits auxiliary hash
  • 8 bits strobe2 offset
  • 32 bits position
  • 31 bits reference/contig index
  • 1 bit first_strobe_is_main flag

Closes #421

This changes the RefRandstrobe struct in the following way.

Previous layout:
- 40 bits base hash
- 24 bits auxiliary hash
- 32 bits position
- 23 bits reference/contig index
- 8 bits strobe2 offset
- 1 bit first_strobe_is_main flag

We take 8 bits away from the auxiliary hash and use those bits for the reference
index. New layout:

- 40 bits base hash
- 16 bits auxiliary hash
- 8 bits strobe2 offset
- 32 bits position
- 31 bits reference/contig index
- 1 bit first_strobe_is_main flag

Closes #421
@ksahlin
Copy link
Owner

ksahlin commented Oct 23, 2024

great! Let's have a meeting at some point. I want to understand and learn from you some of the design choices in this commit. Like changing hash to a method.

@luispedro had a local version where he moved the offset bits into positions integer. Just tagging him here so that he can see we are now actively working on it with a different layout related to multi-context seeds. An idea we had for a while that we make public soon.

@marcelm
Copy link
Collaborator Author

marcelm commented Oct 23, 2024

Happy to talk about this further. To elaborate already now: Making all attributes private improves encapsulation. By forcing any access to the hash value to go through the hash() getter method, the RefRandstrobe class is in full control of how it wants to internally represent hash values. There’s only one place we need to modify if we want to change that representation. In this particular case: I now don’t have to write randstrobe.hash & RANDSTROBE_HASH_MASK everywhere a hash value is used, I can just write randstrobe.hash() and let the method deal with.

Also, as a consequence of making all attributes private and not exposing any methods that allow changing them, RefRandstrobe instances are now immutable. And immutable objects are nice because they make it easier to reason about the state of the program.

Layout now:
- 38 bits base hash
- 17 bits auxiliary hash
- 1 bit first_strobe_is_main flag
- 8 bits strobe2 offset
- 32 bits position
- 32 bits reference/contig index
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strobealign: Too many reference sequences.
2 participants