tmd_start and tmd_stop definition in the Position-based format for SequenceFeature.get_df_parts() #15

### Question: 
Does SequenceFeature.get_df_parts() use 0- or 1-based indexing for tmd_start / tmd_stop?

### Description:
I'm trying to determine whether the SequenceFeature.get_df_parts() function expects the tmd_start and tmd_stop values in the DataFrame to be 1-based or 0-based indexed. This matters because P1 annotations (e.g. from MEROPS) usually refer to residue positions starting from 1.

### Concrete example:
If the cleavage site P1 is at position 10 and the TMD should be 10 amino acids long:

- Should I write tmd_start = 6 (assuming 1-based indexing)?
- Or should I use tmd_start = 5 (assuming 0-based indexing)?

Similarly for tmd_stop:

- Does tmd_stop include the amino acid at that position?
- Or is it excluded, as is typical in Python slicing?

### Code to reproduce:

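import aaanalysis as aa
import pandas as pd
from Bio import SeqIO

input_file = "example.fasta"  # placeholder path; the original input file is not shown
record = next(SeqIO.parse(input_file, "fasta"))
id_sub = record.id  # assumption: the entry ID is taken from the FASTA record

p1 = 10  # P1 position, 1-based
p1_start_with_0 = p1 - 1  # the same position, 0-based

seq = str(record.seq)

p1_amino_acid = seq[p1_start_with_0]
actual_tmd = seq[p1_start_with_0 - 4:p1_start_with_0 + 6]  # 10 residues; Python slice, stop excluded

tmd_start_with_1 = p1 - 4
tmd_end_with_1 = p1 + 6  # same offset as the slice stop above

df_amino_acid = pd.DataFrame({"entry": [id_sub], "sequence": [seq], "tmd_start": [tmd_start_with_1], "tmd_stop": [tmd_end_with_1]})
print("df_amino_acid:")
print(df_amino_acid)

sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_amino_acid, jmd_c_len=5, jmd_n_len=5)
print("df_parts:")
print(df_parts)
tmd_from_get_parts = df_parts["tmd"].iloc[0]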



### Output:

df_amino_acid:
           entry              sequence  tmd_start  tmd_stop
0  A0A0A0VBX4_45  LDRYLQRGVRDVHRPCQSVR          6        16
df_parts:
                       tmd  jmd_n_tmd_n tmd_c_jmd_c
A0A0A0VBX4_45  QRGVRDVHRPC  LDRYLQRGVRD  VHRPCQSVR-

TMDs:
QRGVRDVHRP
QRGVRDVHRPC (from get_df_parts)
p1 in tmd from seq:  R
p1 in tmd from get_df_parts:  R
length optained tmd from seq:  10
length optained tmd from get_df_parts:  11
length of tmd from get_df_parts behind p1:  6
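
The output can be cross-checked against the four possible conventions in plain Python, without AAanalysis. This is only a sketch: `candidate_slices` is a hypothetical helper, and the sequence, positions, and observed TMD are copied from the output above.

```python
SEQ = "LDRYLQRGVRDVHRPCQSVR"   # sequence from the output above
TMD_START, TMD_STOP = 6, 16    # values passed to get_df_parts()
OBSERVED_TMD = "QRGVRDVHRPC"   # "tmd" column reported in df_parts


def candidate_slices(seq, start, stop):
    """Return the substring selected under each indexing convention.

    Hypothetical helper for illustration; not part of AAanalysis.
    """
    return {
        "0-based, stop exclusive": seq[start:stop],
        "0-based, stop inclusive": seq[start:stop + 1],
        "1-based, stop exclusive": seq[start - 1:stop - 1],
        "1-based, stop inclusive": seq[start - 1:stop],
    }


# Keep only the conventions that reproduce the observed TMD.
matches = [
    name
    for name, sub in candidate_slices(SEQ, TMD_START, TMD_STOP).items()
    if sub == OBSERVED_TMD
]
print(matches)
```

For this one example, only the 1-based, stop-inclusive reading reproduces the returned TMD.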

### Conclusion:
- Both TMDs start with the same residue (Q at position 6), so tmd_start = 6 is read as a 1-based position.
- get_df_parts() returns 11 residues instead of 10, so it seems to include the residue at tmd_stop (C at position 16). In other words, tmd_stop is inclusive.
- get_df_parts() therefore appears to interpret tmd_start and tmd_stop as 1-based positions, matching typical annotation formats like UniProt/MEROPS.

Suggestion: It would be helpful if this behavior were clarified in the documentation.
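
If the 1-based, inclusive reading holds (it is inferred from a single example here, not from the documentation), the intended 10-residue TMD would need tmd_stop = p1 + 5 rather than p1 + 6. A minimal sketch of that conversion follows; `tmd_bounds_from_p1` and `extract_inclusive` are hypothetical helpers, and the window of 4 residues before and 5 after P1 is taken from the example above.

```python
SEQ = "LDRYLQRGVRDVHRPCQSVR"  # sequence from the output above


def tmd_bounds_from_p1(p1, n_before=4, n_after=5):
    """Return 1-based, inclusive (tmd_start, tmd_stop) around a 1-based P1.

    Hypothetical helper; assumes the 1-based, inclusive convention
    observed in this issue.
    """
    return p1 - n_before, p1 + n_after


def extract_inclusive(seq, start, stop):
    """Extract seq[start..stop] for 1-based, inclusive positions."""
    return seq[start - 1:stop]


tmd_start, tmd_stop = tmd_bounds_from_p1(10)
tmd = extract_inclusive(SEQ, tmd_start, tmd_stop)
print(tmd_start, tmd_stop, tmd, len(tmd))
```

With these bounds, the extracted window has the same 10 residues as the TMD sliced directly from the sequence.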
