Skip to content

Conversation

@ivan-aksamentov
Copy link
Member

@ivan-aksamentov ivan-aksamentov commented Nov 14, 2025

Alternative to #172

  • add dynamic alphabets: there are preconfigured alphabets: DNA without gaps, DNA with gaps, and a custom alphabet can be used
  • default is now dna without gaps (gaps are no longer allowed)
  • when using gap-less alphabet and a gap is encountered, there is an additional note in the error message which tells users to strip the gaps
  • convert *read_one() and *read_many() into methods of FastaReader to avoid writing new functions for all combinations of input type, alphabet and output size (one/many).
  • fix is_empty() check while we are here
  • adjust tests and introduce new ones

This is to allow combinations of:
- different sources
- different alphabets
- different number of sequences read
without introducing a function for each
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a configurable alphabet system for FASTA parsing, with the main change being that gaps are no longer allowed by default. The implementation refactors standalone functions into methods of FastaReader to support the new alphabet configuration.

Key changes:

  • Introduces an Alphabet enum with three variants: DnaWithoutGap (new default), DnaWithGap, and Custom
  • Converts standalone functions (read_one_fasta, read_many_fasta, etc.) to FastaReader methods (read_one(), read_many())
  • Fixes a bug in is_empty() where it was checking seq_name twice instead of checking seq

Reviewed Changes

Copilot reviewed 6 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/pangraph/src/io/fasta.rs Core implementation: adds Alphabet enum, converts standalone functions to methods, fixes is_empty() bug, improves error messages for invalid characters
packages/pangraph/tests/itest_export_core_genome.rs Updates test to use new API with appropriate alphabet selection based on alignment status
packages/pangraph/tests/itest_export_block_sequences.rs Updates test to use new API with appropriate alphabet selection based on alignment status
packages/pangraph/tests/itest_export_block_consensus.rs Updates test to use new method-based API (no gaps expected in consensus)
packages/pangraph/src/commands/build/build_run.rs Migrates from read_many_fasta() to FastaReader::from_paths().read_many()
packages/pangraph/src/bin/nextclade_example.rs Updates both single and multi-file reading to use new method-based API
packages/pangraph/src/bin/mmseqs_example.rs Updates to use new method-based API
packages/pangraph/src/bin/minimap2_lib_example.rs Updates to use new method-based API
packages/pangraph/src/align/minimap2_lib/align_with_minimap2_lib.rs Updates test to use new method-based API

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


pub fn is_empty(&self) -> bool {
self.seq_name.is_empty() && self.seq_name.is_empty() && self.desc.is_none() && self.index == 0
self.seq_name.is_empty() && self.seq.is_empty() && self.desc.is_none() && self.index == 0
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed is_empty() check now correctly compares self.seq.is_empty() instead of the duplicate self.seq_name.is_empty() from the original code. This fix is correct and resolves the bug.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@mmolari mmolari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Ivan! This looks perfect to me. I looked at all the changes and tested it and for me it's ready for merging!

@mmolari
Copy link
Collaborator

mmolari commented Nov 18, 2025

Also loving the sequence names in the tests 😆

@mmolari mmolari linked an issue Nov 18, 2025 that may be closed by this pull request
@ivan-aksamentov ivan-aksamentov merged commit d3760b8 into master Nov 18, 2025
24 checks passed
@ivan-aksamentov ivan-aksamentov deleted the feat/dynamic-alphabets branch November 18, 2025 11:58
@ivan-aksamentov ivan-aksamentov restored the feat/dynamic-alphabets branch November 20, 2025 09:05
@ivan-aksamentov ivan-aksamentov deleted the feat/dynamic-alphabets branch November 20, 2025 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add informative error message when input sequences contain gaps

2 participants