feat(inference): Add support for custom residue numbering (resolves #58) #69
This Pull Request implements full support for custom residue numbering in the inference output. This feature allows users to define specific residue numbering in the input JSON, resolving Issue #58.
Summary of Changes
The primary goal was to allow users to define specific residue numbering in the input JSON, rather than relying on the default numbering starting from 1. This includes support for non-sequential numbers and PDB-style insertion codes (e.g., '103A').
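As an illustration of the '103A' case, splitting a PDB-style ID into a residue number and an insertion code could look like the sketch below. It is a minimal example, not the actual `writer.py` code. The function name `parse_residue_id` and the accepted pattern (optional minus sign, digits, at most one trailing letter) are assumptions.

```python
import re
from typing import Tuple, Union

# Assumed pattern: optional minus sign, one or more digits, then at most one
# letter as the insertion code. Surrounding whitespace is tolerated.
_RESIDUE_ID_RE = re.compile(r"\s*(-?\d+)([A-Za-z]?)\s*")


def parse_residue_id(raw: Union[int, str]) -> Tuple[int, str]:
    """Split a residue ID such as '103A' into (103, 'A') (hypothetical helper).

    Integers and purely numeric strings get an empty insertion code.
    """
    if isinstance(raw, int):
        return raw, ""
    match = _RESIDUE_ID_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"Unrecognized residue ID: {raw!r}")
    number, ins_code = match.groups()
    return int(number), ins_code


if __name__ == "__main__":
    # Mixed input: plain ints, numeric strings, and one insertion code.
    for raw in [101, "102", "103A", "104"]:
        print(raw, "->", parse_residue_id(raw))
```

Rejecting anything outside the pattern (for example `'A103'`) surfaces malformed input early instead of writing an invalid residue number into the output file.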
The implementation required coordinated changes across three key modules:
- `inference_query_format.py`: the `Chain` class gains two fields, `starting_residue_number` (for a simple offset) and `residue_ids` (for an explicit list of IDs).
- `inference.py`: `residue_ids` takes precedence over `starting_residue_number`. If a valid explicit list is provided, it is used; otherwise, a sequential list is generated from the start number. The final list is stored in the data batch.
- `writer.py`: adds `OF3OutputWriter._renumber_atom_array`, which runs after model inference but before the PDB/mmCIF file is written. It:
  - uses regular expressions (`re`) to safely parse string IDs (e.g., splitting `'103A'` into the integer ID 103 and the insertion code `'A'`);
  - updates the `AtomArray`'s `res_id` and `ins_code` annotations, so the output structure reflects the requested numbering without affecting core model calculations.
Related Issues
Resolves: #58
Testing and Validation
Note on Testing: Due to local environment configuration issues (missing model checkpoints), an end-to-end test run could not be performed.
However, the logic has been validated manually to confirm that:
- the numbering logic prioritizes `residue_ids` and handles a sequence-length mismatch by falling back to standard numbering (1, 2, 3, ...);
- the parsing in `writer.py` correctly extracts insertion codes, which is critical for PDB compliance.
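The precedence and fallback behavior checked above can be summarized in a short sketch. The helper name `resolve_residue_ids`, its signature, and the default `starting_residue_number` of 1 are assumptions for illustration; in the PR this logic lives in `inference.py`. Following the description of `inference.py`, the sketch falls back to a sequential range from `starting_residue_number` when the explicit list is missing or has the wrong length, which yields 1, 2, 3, ... under the assumed default.

```python
from typing import List, Optional, Sequence, Union

ResidueId = Union[int, str]


def resolve_residue_ids(
    sequence_length: int,
    residue_ids: Optional[Sequence[ResidueId]] = None,
    starting_residue_number: int = 1,
) -> List[ResidueId]:
    """Choose the residue numbering for one chain (hypothetical helper).

    An explicit ``residue_ids`` list takes precedence, but only if it has one
    entry per residue. Otherwise (no list, or a length mismatch) a consecutive
    range starting at ``starting_residue_number`` is returned.
    """
    if residue_ids is not None and len(residue_ids) == sequence_length:
        return list(residue_ids)
    return list(
        range(starting_residue_number, starting_residue_number + sequence_length)
    )


if __name__ == "__main__":
    print(resolve_residue_ids(3, ["10", "10A", "11"]))  # explicit list is used
    print(resolve_residue_ids(3, ["10", "11"]))  # mismatch: falls back to 1, 2, 3
    print(resolve_residue_ids(3, starting_residue_number=100))  # offset only
```

Checking the length before accepting the list keeps a malformed `residue_ids` from silently misaligning IDs with residues partway through a chain.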