Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 19 additions & 5 deletions scielo_classic_website/spsxml/sps_xml_article_meta.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,10 +64,13 @@ def _create_date_element(element_name, attributes, date_text):
year, month, day = date_text[:4], date_text[4:6], date_text[6:]
labels = ("day", "month", "year")
for label, value in zip(labels, (day, month, year)):
if int(value) != 0:
e = ET.Element(label)
e.text = value
date_element.append(e)
if not value.isdigit():
Comment on lines 64 to +67
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code doesn't validate that date_text has the expected length (8 characters) before slicing. If date_text is shorter than 8 characters, the slicing operations will produce empty strings or truncated values. For example, if date_text is "2020", then year="2020", month="", day="". The empty string will fail the isdigit() check and break, but this could lead to silent failures. Consider adding a length check after the line 58 check, e.g., "if len(date_text) != 8: return" to make the validation more explicit and catch malformed input early.

Copilot uses AI. Check for mistakes.
break
if int(value) == 0:
break
Comment on lines +68 to +70
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation logic now uses "break" instead of the implicit "continue" from the previous code, which fundamentally changes the behavior. When a zero or non-digit is encountered, the break statement stops all further date components from being processed.

For example, with date "20200200" (valid year and month, but day is "00"):

  • Old behavior: would skip day but still add month and year elements
  • New behavior: breaks at day "00" and adds no elements at all

This means valid year and month values will be lost if the day is zero or invalid. Consider using "continue" instead of "break" to preserve the old behavior of skipping invalid components while still processing valid ones. Alternatively, if the intent is to process in order of significance, the iteration order should be changed to (year, month, day) so that more significant components are processed first.

Suggested change
break
if int(value) == 0:
break
continue
if int(value) == 0:
continue

Copilot uses AI. Check for mistakes.
e = ET.Element(label)
e.text = value
date_element.append(e)
Comment on lines +67 to +73
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces significant changes to date validation logic but no new tests are added to verify the behavior with malformed dates (e.g., "00000000", "20200200", "2023 12", or dates with non-numeric characters). The PR description specifically mentions these edge cases as the motivation for the changes. Add test cases to cover these scenarios and verify that the validation behaves as expected, especially given the critical logic changes identified in the _create_date_element function.

Copilot uses AI. Check for mistakes.
return date_element


Expand Down Expand Up @@ -523,7 +526,18 @@ def transform(self, data):
for date_type, date_ in dates.items():
if date_:
attributes = {"date-type": date_type}
elem = _create_date_element("date", attributes, date_)
try:
elem = _create_date_element("date", attributes, date_)
except Exception as e:
raw.exceptions.append(
{
"pipe": "XMLArticleMetaHistoryPipe",
"error_type": str(type(e)),
"error_message": str(e),
"exc_traceback": traceback.format_exc(),
}
)
continue
Comment on lines +529 to +540
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The traceback module is used here but is not imported at the top of the file. This will cause a NameError when an exception occurs. Add "import traceback" to the imports section at the top of the file (similar to how it's done in sps_xml_refs.py:2).

Copilot uses AI. Check for mistakes.
history.append(elem)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the try-except block, the code attempts to append elem to history without checking if elem is None. The _create_date_element function can return None in two cases: when date_text is falsy (line 58-59), or when the date_element has no child elements added due to validation failures. If _create_date_element returns None and no exception is raised, this will attempt to append None to the history element, which could cause issues. Add a check "if elem is not None:" before the append operation on line 541.

Suggested change
history.append(elem)
if elem is not None:
history.append(elem)

Copilot uses AI. Check for mistakes.

if len(history.findall("date")) > 0:
Expand Down
Loading