Open
Labels
C: convention (Relates to docstring format convention)
P: bug (PEP 257 violation or existing functionality that doesn't work as documented)
U: high
Description
I noticed the error in CI because I was using `master` (which was needed for a while to work with pre-commit version 4 and above).
Some multi-line sentences are split incorrectly. This is the current behavior of the `split_summary` function:
```python
>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 'and split in two. I even have a third sentence here.',
 'Second sentence is long',
 '',
 'And other text here.']
```

I think this comes from the `split_summary` function, which does this:
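```python
lines[0] = first_sentence
if rest_text:
    lines.insert(2, rest_text)
```

It inserts `rest_text` at the fixed index 2, right after the original second line. When the second sentence continues onto the next line, its beginning therefore ends up after its own continuation.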
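To isolate the ordering problem, the sketch below replays only the two statements quoted above on a copy of the example input. `mimic_current_insert` and its `index` parameter are hypothetical names for illustration, and the first-sentence and rest strings are passed in directly instead of being computed. With `index=2` it reproduces the output reported above. With `index=1` the start of the second sentence stays directly before its continuation line, although that alone still does not add a blank line after the summary.

```python
from typing import List


def mimic_current_insert(
    lines: List[str], first_sentence: str, rest_text: str, index: int = 2
) -> List[str]:
    """Hypothetical helper: replay the two quoted statements on a copy of ``lines``."""
    result = list(lines)  # copy so the caller's list is not mutated
    result[0] = first_sentence  # summary line keeps only the first sentence
    if rest_text:
        result.insert(index, rest_text)  # the current code hard-codes index 2
    return result


lines = [
    "First sentence is here. Second sentence is long",
    "and split in two. I even have a third sentence here.",
    "",
    "And other text here.",
]

# index=2: the start of the second sentence lands after its own continuation line.
print(mimic_current_insert(lines, "First sentence is here.", "Second sentence is long"))
# index=1: the two halves of the second sentence stay adjacent.
print(mimic_current_insert(lines, "First sentence is here.", "Second sentence is long", index=1))
```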
I am not familiar with the codebase, but maybe something along these lines could work:
```python
import re
from typing import List

# Assumes ABBREVIATIONS (the codebase's existing collection of abbreviations) is already defined.


def split_summary(lines) -> List[str]:
    """Split multi-sentence summary into the first sentence and the rest."""
    if not lines or not lines[0].strip():
        return lines

    text = lines[0].strip()
    tokens = re.split(r"(\s+)", text)  # Keep whitespace for accurate rejoining
    sentence = []
    rest = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        sentence.append(token)
        # A token ending in "." ends the sentence unless it closes an abbreviation.
        if token.endswith(".") and not any(
            "".join(sentence).strip().endswith(abbr) for abbr in ABBREVIATIONS
        ):
            i += 1
            break
        i += 1

    rest = tokens[i:]
    first_sentence = "".join(sentence).strip()
    rest_text = "".join(rest).strip()

    new_lines = [first_sentence, ""]
    if rest_text:
        new_lines.append(rest_text)
    new_lines.extend(line for line in lines[1:] if line)  # drops empty lines
    return new_lines
```

This gives:
```python
>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 '',
 'Second sentence is long',
 'and split in two. I even have a third sentence here.',
 'And other text here.']
```

I do not know whether the result should be processed further before returning, or whether this is already taken into account elsewhere in the codebase.
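One possible post-processing direction, sketched under assumptions: the proposal above filters out every empty line after the summary, so the paragraph break before "And other text here." is lost. The variant below uses the same whitespace-preserving tokenization but keeps the original blank lines, and emits exactly one blank line after the summary (PEP 257 puts a blank line between the summary and the rest of a multi-line docstring) without doubling one that is already there. The name `split_summary_keep_blanks` is hypothetical, and the `ABBREVIATIONS` tuple here is a stand-in, not the codebase's real list. Unlike the proposal, it returns just the summary when nothing follows it.

```python
import re
from typing import List

# Hypothetical stand-in: the real codebase defines its own ABBREVIATIONS.
ABBREVIATIONS = ("e.g.", "i.e.", "etc.")


def split_summary_keep_blanks(lines: List[str]) -> List[str]:
    """Hypothetical variant of the proposal that preserves blank lines."""
    if not lines or not lines[0].strip():
        return lines

    # Split on whitespace but keep the separators so the text rejoins exactly.
    tokens = re.split(r"(\s+)", lines[0].strip())
    end = len(tokens)  # default: the whole first line is one sentence
    sentence = ""
    for i, token in enumerate(tokens):
        sentence += token
        # A token ending in "." closes the sentence unless it ends an abbreviation.
        if token.endswith(".") and not sentence.endswith(ABBREVIATIONS):
            end = i + 1
            break

    first_sentence = "".join(tokens[:end]).strip()
    rest_text = "".join(tokens[end:]).strip()

    # The remainder of the first line goes directly before its continuation lines.
    body = lines[1:]
    if rest_text:
        body = [rest_text] + body
    # Drop leading blank lines; exactly one separator is added below.
    while body and not body[0].strip():
        body = body[1:]
    return [first_sentence, ""] + body if body else [first_sentence]
```

On the example from this issue it returns the same lines as the proposal, plus the blank line before "And other text here.".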