Improve IMGT numbering #76

Open
benjiemc wants to merge 5 commits into oxpig:main from benjiemc:improve-imgt-numbering

@benjiemc (Contributor) commented Jun 6, 2025

I have modified the annotation step to improve the IMGT numbering of molecules. Previously, MHC class I molecules were only half numbered, because anarci stopped at the first hit and only that hit was passed back and used for annotation. Now, anarci keeps searching until no more annotations can be found for a sequence, and all of the hits are used to number the whole molecule. These changes also update how scTCRs are identified.
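
To illustrate the "keep going until no more annotations can be found" idea, here is a minimal sketch. It is not the code in this PR and does not call anarci: `annotate_all_domains`, `toy_find_first_hit`, and `TOY_DOMAINS` are hypothetical names, and the toy finder simply looks for fixed motifs so the loop can be run on its own.

```python
from typing import Callable, List, Optional, Tuple

# (start, end_exclusive, domain_label) in the coordinates of the searched string
Hit = Tuple[int, int, str]


def annotate_all_domains(
    sequence: str,
    find_first_hit: Callable[[str], Optional[Hit]],
) -> List[Hit]:
    """Repeatedly search the unannotated remainder of `sequence`.

    `find_first_hit` is a stand-in (assumption) for a single annotation
    call that returns only the first domain it finds, or None.
    """
    hits: List[Hit] = []
    offset = 0
    while offset < len(sequence):
        hit = find_first_hit(sequence[offset:])
        if hit is None:
            break  # nothing left to annotate
        start, end, label = hit
        if end <= start:
            break  # guard against empty hits that would loop forever
        # Shift the hit back into the coordinates of the full sequence.
        hits.append((offset + start, offset + end, label))
        offset += end
    return hits


# Toy stand-in for a domain finder (assumption): fixed motifs, not real domains.
TOY_DOMAINS = {"AAAA": "domain_1", "CCCC": "domain_2"}


def toy_find_first_hit(subsequence: str) -> Optional[Hit]:
    best: Optional[Hit] = None
    for motif, label in TOY_DOMAINS.items():
        pos = subsequence.find(motif)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, pos + len(motif), label)
    return best


hits = annotate_all_domains("GGAAAAGGCCCCGG", toy_find_first_hit)
print(hits)
```

With a finder that stops at the first hit, only `domain_1` would be reported here. The loop picks up `domain_2` as well, which is the analogue of numbering the second half of an MHC class I molecule.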

These changes rely on a fix that is staged as a PR in anarci (https://github.com/npqst/anarci-mhc/pull/5), which addresses an issue with how anarci currently identifies CD1 molecules.

I also added a fix for interaction profiling that addresses #75, because the updated numbering uncovered potential issues in the conversion between PLIP and PDB numbering.
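
As a rough illustration of why this conversion is sensitive to numbering changes, the sketch below keeps an explicit two-way map between original residue identifiers (which, with IMGT numbering, can carry insertion codes such as 111A) and a plain sequential numbering. It assumes the structure is renumbered sequentially per chain before profiling, which may not match what STCRpy or PLIP actually do. `build_renumbering_maps` is a hypothetical helper, not part of either library.

```python
from typing import Dict, Iterable, Tuple

# (chain_id, residue_number, insertion_code); " " means no insertion code
ResidueId = Tuple[str, int, str]
# (chain_id, sequential_number), 1-based per chain (assumption)
SequentialId = Tuple[str, int]


def build_renumbering_maps(
    original_ids: Iterable[ResidueId],
) -> Tuple[Dict[ResidueId, SequentialId], Dict[SequentialId, ResidueId]]:
    """Build forward and reverse maps between original and sequential ids.

    Residues are numbered in the order given, so the caller must pass them
    in structural order rather than sorted by residue number.
    """
    to_sequential: Dict[ResidueId, SequentialId] = {}
    to_original: Dict[SequentialId, ResidueId] = {}
    counters: Dict[str, int] = {}
    for res_id in original_ids:
        chain = res_id[0]
        counters[chain] = counters.get(chain, 0) + 1
        seq_id = (chain, counters[chain])
        to_sequential[res_id] = seq_id
        to_original[seq_id] = res_id
    return to_sequential, to_original


# A CDR3-like stretch with insertion codes on chain "D" (made-up example).
ids = [("D", 110, " "), ("D", 111, " "), ("D", 111, "A"),
       ("D", 112, "A"), ("D", 112, " ")]
to_sequential, to_original = build_renumbering_maps(ids)
print(to_original[("D", 3)])
```

Keeping the reverse map around, rather than recomputing residue numbers arithmetically, avoids off-by-one errors when the numbering contains insertion codes or gaps.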

@benjiemc (Contributor, Author)

I ran into an issue caused by this change that needs to be resolved before the PR is ready to be merged.

File "/Users/bmcmaster/code/STCRpy/stcrpy/tcr_processing/TCRParser.py", line 288, in get_tcr_structure
    numbering, chain_type, germline_info, scTCR = annotate(chain)
                                                  ^^^^^^^^^^^^^^^
  File "/Users/bmcmaster/code/STCRpy/stcrpy/tcr_processing/annotate.py", line 108, in annotate
    aligned_numbering = align_numbering(numbering, sequence_list)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bmcmaster/code/STCRpy/stcrpy/tcr_processing/annotate.py", line 297, in align_numbering
    raise AlignmentError(numbered_sequence_ali, input_sequence_ali)
stcrpy.tcr_processing.annotate.AlignmentError: Could not align sequences: -----------------------------DSGVVQSPRHIIKEKGGRSVLTCIPISGHSNVVWYQQTLGKELKFLIQHYEKVERDKGFLPSRFSVQQFDDYHSEMNMSALELEDSAMYFCASSLTGDYAEQFFGPGTRLTVL----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- and MSNTVLADSAWGITLLSWVTVFLLGTSSADSGVVQSPRHIIKEKGGRSVLTCIPISGHSNVVWYQQTLGKELKFLIQHYEKVERDKGFLPSRFSVQQFDDYHSEMNMSALELEDSAMYFCASSLTGDYAEQFFGPGTRLTVLEDLRNVTPPKVSLFEPSKAEIANKQKATLVCLARGFFPDHVELSWWVNGKEVHSGVSTDPQAYKESNYSYCLSSRLRVSATFWHNPRNHFRCQVQFHGLSEEDKWPEGSPKPVTQNISAEAWGRADCGITSASYHQGVLSATILYEILLGKATLYAVLVSGLVLMAMVKKKNS

----------------------------------------------------------------------

@benjiemc benjiemc marked this pull request as draft June 19, 2025 15:50
@benjiemc (Contributor, Author)

Update: the alignment error reported above is fine, because it results from expected behaviour. The error message has now been improved to include more details.
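
The new message is not shown in this thread. As a hedged sketch of the kind of detail that could be attached to such an error, the example below summarises two aligned strings before raising. `summarise_alignment` and `DetailedAlignmentError` are hypothetical names and are not the `AlignmentError` class in `stcrpy.tcr_processing.annotate`.

```python
from typing import Dict, Optional


def summarise_alignment(numbered_ali: str, input_ali: str, gap: str = "-") -> Dict[str, Optional[int]]:
    """Collect simple diagnostics for two aligned, gap-padded sequences."""
    pairs = list(zip(numbered_ali, input_ali))
    # Columns where both strings have a residue but the residues differ.
    mismatches = [i for i, (a, b) in enumerate(pairs)
                  if a != gap and b != gap and a != b]
    return {
        "numbered_length": len(numbered_ali),
        "input_length": len(input_ali),
        "numbered_residues": sum(c != gap for c in numbered_ali),
        # Input residues that received no numbering at all.
        "unnumbered_residues": sum(a == gap and b != gap for a, b in pairs),
        "first_mismatch": mismatches[0] if mismatches else None,
    }


class DetailedAlignmentError(Exception):
    """Hypothetical error type that carries an alignment summary."""

    def __init__(self, numbered_ali: str, input_ali: str):
        self.summary = summarise_alignment(numbered_ali, input_ali)
        details = ", ".join(f"{key}={value}" for key, value in self.summary.items())
        super().__init__(f"Could not align sequences ({details})")


try:
    raise DetailedAlignmentError("--ACD-", "MKACDE")
except DetailedAlignmentError as err:
    print(err)
```

A summary like this makes it easier to tell a true mismatch from a case where the numbered region is just a gap-padded subsequence of the input, as in the traceback above.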

@benjiemc benjiemc marked this pull request as ready for review June 19, 2025 16:20
@benjiemc benjiemc force-pushed the improve-imgt-numbering branch from 844037b to b69525b Compare September 26, 2025 10:12
Now, anarci keeps going in the sequences to find more annotations and
the numbering pipeline handles gaps in the sequences.
@benjiemc benjiemc force-pushed the improve-imgt-numbering branch from b69525b to 567f6c9 Compare January 19, 2026 10:14