-
Notifications
You must be signed in to change notification settings - Fork 6
feat: dynamic alphabets; disallow gaps #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is to allow combinations of: - different sources - different alphabets - different number of sequences read without introducing a function for each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a configurable alphabet system for FASTA parsing, with the main change being that gaps are no longer allowed by default. The implementation refactors standalone functions into methods of FastaReader to support the new alphabet configuration.
Key changes:
- Introduces an
Alphabetenum with three variants:DnaWithoutGap(new default),DnaWithGap, andCustom - Converts standalone functions (
read_one_fasta,read_many_fasta, etc.) toFastaReadermethods (read_one(),read_many()) - Fixes a bug in
is_empty()where it was checkingseq_nametwice instead of checkingseq
Reviewed Changes
Copilot reviewed 6 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/pangraph/src/io/fasta.rs | Core implementation: adds Alphabet enum, converts standalone functions to methods, fixes is_empty() bug, improves error messages for invalid characters |
| packages/pangraph/tests/itest_export_core_genome.rs | Updates test to use new API with appropriate alphabet selection based on alignment status |
| packages/pangraph/tests/itest_export_block_sequences.rs | Updates test to use new API with appropriate alphabet selection based on alignment status |
| packages/pangraph/tests/itest_export_block_consensus.rs | Updates test to use new method-based API (no gaps expected in consensus) |
| packages/pangraph/src/commands/build/build_run.rs | Migrates from read_many_fasta() to FastaReader::from_paths().read_many() |
| packages/pangraph/src/bin/nextclade_example.rs | Updates both single and multi-file reading to use new method-based API |
| packages/pangraph/src/bin/mmseqs_example.rs | Updates to use new method-based API |
| packages/pangraph/src/bin/minimap2_lib_example.rs | Updates to use new method-based API |
| packages/pangraph/src/align/minimap2_lib/align_with_minimap2_lib.rs | Updates test to use new method-based API |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| pub fn is_empty(&self) -> bool { | ||
| self.seq_name.is_empty() && self.seq_name.is_empty() && self.desc.is_none() && self.index == 0 | ||
| self.seq_name.is_empty() && self.seq.is_empty() && self.desc.is_none() && self.index == 0 |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixed is_empty() check now correctly compares self.seq.is_empty() instead of the duplicate self.seq_name.is_empty() from the original code. This fix is correct and resolves the bug.
mmolari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Ivan! This looks perfect to me. I looked at all the changes and tested it and for me it's ready for merging!
|
Also loving the sequence names in the tests 😆 |
Alternative to #172
*read_one()and*read_many()into methods ofFastaReaderto avoid writing new functions for all combinations of input type, alphabet and output size (one/many).is_empty()check while we are here