-
Notifications
You must be signed in to change notification settings - Fork 3
Add XMLSupToXrefPipe to convert numeric sup elements to xref #129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -86,6 +86,7 @@ def _process(document): | |||||||||||||||||||||
| XMLStylePipe(), | ||||||||||||||||||||||
| XMLArticleMetaCountsPipe(), | ||||||||||||||||||||||
| XMLNormalizeSpacePipe(), | ||||||||||||||||||||||
| XMLSupToXrefPipe(), | ||||||||||||||||||||||
| XMLDeleteRepeatedElementWithId(), | ||||||||||||||||||||||
| XMLDeleteRepeatedTranslations(), | ||||||||||||||||||||||
| XMLFontFaceSymbolPipe(), | ||||||||||||||||||||||
|
|
@@ -645,3 +646,100 @@ def transform(self, data): | |||||||||||||||||||||
|
|
||||||||||||||||||||||
| ET.strip_tags(xml, "REMOVEFONTFACESYMBOL") | ||||||||||||||||||||||
| return data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| class XMLSupToXrefPipe(plumber.Pipe): | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
| Transforma elementos <sup> com conteúdo numérico em <xref> quando eles | ||||||||||||||||||||||
| representam referências bibliográficas (bibr) ou notas de rodapé (fn). | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Critérios de análise: | ||||||||||||||||||||||
| - sup deve ter conteúdo numérico | ||||||||||||||||||||||
| - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação) | ||||||||||||||||||||||
|
||||||||||||||||||||||
| - Compara valores numéricos com ref/label e fn/label existentes | ||||||||||||||||||||||
| - Prioriza bibr sobre fn quando há ambiguidade | ||||||||||||||||||||||
| - Na dúvida, mantém como sup | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| def transform(self, data): | ||||||||||||||||||||||
| raw, xml = data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| sups = xml.xpath(".//sup") | ||||||||||||||||||||||
| if not sups: | ||||||||||||||||||||||
| return data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| sup_values = set() | ||||||||||||||||||||||
| for sup in list(sups): | ||||||||||||||||||||||
| if sup.find("xref") is not None: | ||||||||||||||||||||||
| continue | ||||||||||||||||||||||
| parent = sup.getparent() | ||||||||||||||||||||||
| if parent.tag == "xref": | ||||||||||||||||||||||
| continue | ||||||||||||||||||||||
| text = "".join(sup.itertext()).strip() | ||||||||||||||||||||||
| if text and text.isdigit(): | ||||||||||||||||||||||
| sup_values.add(text) | ||||||||||||||||||||||
| if not sup_values: | ||||||||||||||||||||||
| return data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ids, numeric_labels = self.get_ids_and_labels(xml, "ref", "mixed-citation") | ||||||||||||||||||||||
| done = self._convert_sup_to_xref(sups, sup_values, ids, numeric_labels, "bibr") | ||||||||||||||||||||||
| if done: | ||||||||||||||||||||||
| return data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ids, numeric_labels = self.get_ids_and_labels(xml, "fn") | ||||||||||||||||||||||
| done = self._convert_sup_to_xref(sups, sup_values, ids, numeric_labels, "fn") | ||||||||||||||||||||||
| if done: | ||||||||||||||||||||||
| return data | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
||||||||||||||||||||||
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage: The new XMLSupToXrefPipe class lacks test coverage. Given that the test file tests/test_sps_xml_pipes.py contains tests for other pipe classes like XMLFontFaceSymbolPipe, tests should be added for this new pipe to ensure:
- Correct conversion of numeric
<sup>elements to<xref ref-type="bibr"> - Correct conversion of numeric
<sup>elements to<xref ref-type="fn"> - Elements that should not be converted remain unchanged (non-numeric, already wrapped, ambiguous cases)
- Edge cases like missing labels, missing IDs, etc.
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential issue with missing IDs: If a ref or fn element lacks an id attribute, elem.get("id") returns None (line 706), which gets stored in the elem_ids dictionary. Later in _convert_sup_to_xref (line 714), this could result in setting rid to None on the xref element (line 720), which would be invalid XML. Consider adding a check:
elem_id = elem.get("id")
if label and elem_id:
numeric_labels.add(label)
elem_ids[label] = elem_id| if label: | |
| numeric_labels.add(label) | |
| elem_ids[label] = elem.get("id") | |
| elem_id = elem.get("id") | |
| if label and elem_id: | |
| numeric_labels.add(label) | |
| elem_ids[label] = elem_id |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic issue: The _convert_sup_to_xref method doesn't check if the sup element already contains a child xref or if its parent is an xref (as done in lines 673-677). This could lead to incorrect transformations of sup elements that should be skipped. Add the same validation checks here:
for sup in list(sups):
if sup.find("xref") is not None:
continue
parent = sup.getparent()
if parent.tag == "xref":
continue
sup_text = "".join(sup.itertext()).strip()
# ... rest of the logic
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring mentions returning "string vazia" (empty string) when not found, but the method actually returns None (line 709, 712). The documentation should be corrected to accurately reflect the return value: str or None: O label numérico extraído ou None se não encontrado.
| str: O label numérico extraído ou string vazia se não encontrado | |
| str or None: O label numérico extraído ou None se não encontrado |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear docstring: The docstring says this method extracts numeric labels from "mixed-citation" elements, but the method actually works with any XML node, not just mixed-citation elements. It's called for both mixed-citation nodes (line 681) and fn nodes (line 696).
Consider updating the docstring to be more generic:
"""
Extrai o label numérico inicial de um nó XML.
Returns:
str: O label numérico extraído ou None se não encontrado
"""| Extrai o label numérico inicial de um elemento mixed-citation. | |
| Returns: | |
| str: O label numérico extraído ou string vazia se não encontrado | |
| Extrai o label numérico inicial de um nó XML. | |
| Returns: | |
| str: O label numérico extraído ou None se não encontrado |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation inconsistency: The docstring states the method returns "string vazia se não encontrado" (empty string if not found), but the code actually returns None on lines 718 and 721 when no label is found.
Either update the docstring to say "None se não encontrado" or change the code to return empty string "" instead of None for consistency.
| return None | |
| text = "".join(node.itertext()).strip() | |
| if not text or not text[0].isdigit(): | |
| return None | |
| return "" | |
| text = "".join(node.itertext()).strip() | |
| if not text or not text[0].isdigit(): | |
| return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation issue: The docstring states "Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)" but there's no code that actually checks for existing xref elements. This check is mentioned in the PR description but not implemented. Consider adding this check before transforming sup elements, or remove this line from the documentation.