Add XMLSupToXrefPipe to convert numeric sup elements to xref #129

robertatakenaka · 2025-12-03T13:32:16Z

Adiciona XMLSupToXrefPipe para converter elementos sup numéricos em xref

Descrição

Esta PR implementa um novo pipe de transformação XML (XMLSupToXrefPipe) que converte automaticamente elementos  com conteúdo numérico em elementos <xref> apropriados quando representam referências bibliográficas ou notas de rodapé.

Motivação

Documentos XML frequentemente contêm elementos  que na verdade são referências bibliográficas ou notas de rodapé, mas não estão marcados corretamente como <xref>. Isso pode afetar o processamento e a renderização adequada desses elementos no sistema.

Alterações implementadas

Novo pipe XMLSupToXrefPipe: Transforma elementos  numéricos em <xref ref-type="bibr"> ou <xref ref-type="fn"> baseado na análise do conteúdo
Análise inteligente:
- Identifica elementos  com conteúdo numérico
- Extrai labels numéricos de elementos <ref> e <fn> existentes
- Compara valores para determinar o tipo correto de referência
- Prioriza referências bibliográficas (bibr) sobre notas de rodapé (fn) em casos ambíguos
Método auxiliar _extract_numeric_label: Extrai labels numéricos do início de elementos mixed-citation
Prevenção de duplicação: Verifica xref existentes antes de aplicar transformações
Integração ao pipeline: Pipe adicionado à sequência de processamento XML

Impacto

Melhora a qualidade semântica do XML processado
Garante marcação correta de referências e notas
Mantém compatibilidade com o processamento existente
Não altera elementos quando há incerteza sobre o tipo correto

Como testar

Processar documentos XML contendo elementos  numéricos
Verificar se elementos  que correspondem a refs ou fns são convertidos corretamente
Confirmar que elementos  ambíguos ou não-referências permanecem inalterados
Validar que não há duplicação de elementos <xref>

Copilot

Pull request overview

This PR attempts to add a new XMLSupToXrefPipe transformation pipe that converts  elements with numeric content into <xref> elements when they represent bibliographic references (bibr) or footnotes (fn). However, the implementation is incomplete and contains critical bugs that prevent it from functioning.

Key Issues

The transform method contains an incomplete if statement (line 698) causing a syntax error
Method naming mismatch: calls to _extract_numeric_label_from_mixed_citation reference a non-existent method
Multiple collected variables (sup_values, ref_numeric_labels, fn_numeric_labels, ref_labels, fn_labels, refs, fns) are never used in any transformation logic

Comments suppressed due to low confidence (1)

scielo_classic_website/spsxml/sps_xml_pipes.py:698

Syntax Error (in Python 3).

if

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T13:36:12Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        Extrai o label numérico inicial de um elemento mixed-citation.
+
+        Returns:
+            str: O label numérico extraído ou string vazia se não encontrado


The docstring mentions returning "string vazia" (empty string) when not found, but the method actually returns None (line 709, 712). The documentation should be corrected to accurately reflect the return value: str or None: O label numérico extraído ou None se não encontrado.

Suggested change

str: O label numérico extraído ou string vazia se não encontrado

str or None: O label numérico extraído ou None se não encontrado

Copilot · 2025-12-03T13:36:12Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+class XMLSupToXrefPipe(plumber.Pipe):
+    """
+    Transforma elementos <sup> com conteúdo numérico em <xref> quando eles
+    representam referências bibliográficas (bibr) ou notas de rodapé (fn).
+
+    Critérios de análise:
+    - sup deve ter conteúdo numérico
+    - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)
+    - Compara valores numéricos com ref/label e fn/label existentes
+    - Prioriza bibr sobre fn quando há ambiguidade
+    - Na dúvida, mantém como sup
+    """
+
+    def transform(self, data):
+        raw, xml = data
+
+        sups = xml.xpath(".//sup")
+        if not sups:
+            return data
+
+        sup_values = set()
+        for sup in sups:
+            if sup.text and sup.text.strip().isdigit():
+                sup_values.add(sup.text.strip())
+        if not sup_values:
+            return data
+
+        ref_numeric_labels = []
+        for item in xml.xpath(".//ref"):
+            label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)
+            if label:
+                ref_numeric_labels.append(label)
+
+        fn_numeric_labels = []
+        for item in xml.xpath(".//fn"):
+            label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)
+            if label:
+                fn_numeric_labels.append(label)
+
+        ref_labels = [
+            (ref.findtext("label"), ref) for ref in xml.xpath(".//ref") if ref.find("label") is not None
+        ]
+        fn_labels = [
+            (fn.findtext("label"), fn) for fn in xml.xpath(".//fn") if fn.find("label") is not None
+        ]
+        refs = xml.xpath(".//ref")
+        fns = xml.xpath(".//fn")
+        if 
+        return data
+
+    def _extract_numeric_label(self, node):
+        """
+        Extrai o label numérico inicial de um elemento mixed-citation.
+
+        Returns:
+            str: O label numérico extraído ou string vazia se não encontrado
+        """
+        if node is None:
+            return None
+        text = "".join(node.itertext()).strip()
+        if not text or not text[0].isdigit():
+            return None
+
+        label = ""
+        for char in text:
+            if char.isdigit():
+                label += char
+            else:
+                break
+        return label


The new XMLSupToXrefPipe class lacks test coverage. Based on the test patterns in the repository (see test_sps_xml_pipes.py), comprehensive tests should be added to verify the transformation logic, including cases for: numeric sup elements matching bibr/fn labels, non-numeric sup elements, missing labels, edge cases, and the prioritization of bibr over fn.

Copilot · 2025-12-03T13:36:12Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        sup_values = set()
+        for sup in sups:
+            if sup.text and sup.text.strip().isdigit():
+                sup_values.add(sup.text.strip())


The sup_values set is collected but never used in the subsequent logic. This variable should likely be used to filter or compare against the ref_numeric_labels and fn_numeric_labels, or it should be removed if not needed.

Copilot · 2025-12-03T13:36:13Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        ]
+        refs = xml.xpath(".//ref")
+        fns = xml.xpath(".//fn")
+        if 


Incomplete if statement with no condition. This is a syntax error that will prevent the code from running.

Suggested change

if

Copilot · 2025-12-03T13:36:13Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        if 
+        return data
+
+    def _extract_numeric_label(self, node):


The method _extract_numeric_label_from_mixed_citation is called on lines 680 and 686, but the actual method defined is named _extract_numeric_label. Either the method name should be changed to _extract_numeric_label_from_mixed_citation or the calls should use _extract_numeric_label.

Copilot · 2025-12-03T13:36:13Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        ref_labels = [
+            (ref.findtext("label"), ref) for ref in xml.xpath(".//ref") if ref.find("label") is not None
+        ]
+        fn_labels = [
+            (fn.findtext("label"), fn) for fn in xml.xpath(".//fn") if fn.find("label") is not None
+        ]
+        refs = xml.xpath(".//ref")
+        fns = xml.xpath(".//fn")
+        if 


Variables ref_labels, fn_labels, refs, and fns are defined but never used. The code appears incomplete as these variables are collected but not utilized in any logic before returning. Either complete the implementation or remove these unused variables.

Suggested change

ref_labels = [

(ref.findtext("label"), ref) for ref in xml.xpath(".//ref") if ref.find("label") is not None

]

fn_labels = [

(fn.findtext("label"), fn) for fn in xml.xpath(".//fn") if fn.find("label") is not None

]

refs = xml.xpath(".//ref")

fns = xml.xpath(".//fn")

if

# Removed unused variables and incomplete logic

Copilot · 2025-12-03T13:36:13Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        ref_numeric_labels = []
+        for item in xml.xpath(".//ref"):
+            label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)
+            if label:
+                ref_numeric_labels.append(label)
+
+        fn_numeric_labels = []
+        for item in xml.xpath(".//fn"):
+            label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)
+            if label:
+                fn_numeric_labels.append(label)
+


Variables ref_numeric_labels and fn_numeric_labels are collected but never used. Since the code is incomplete, these should likely be incorporated into the missing logic or removed if not needed.

Suggested change

ref_numeric_labels = []

for item in xml.xpath(".//ref"):

label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)

if label:

ref_numeric_labels.append(label)

fn_numeric_labels = []

for item in xml.xpath(".//fn"):

label = item.findtext("label") or self._extract_numeric_label_from_mixed_citation(item)

if label:

fn_numeric_labels.append(label)

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T13:46:37Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "bibr")
+                sup.set("rid", ref_ids[sup_text])
+            return data
+
+        fn_ids = {}
+        fn_numeric_labels = set()
+        for item in xml.xpath(".//fn"):
+            label = item.findtext("label") or self._extract_numeric_label_from_node(item)
+            if label:
+                fn_numeric_labels.add(label)
+                fn_ids[label] = item.get("id")
+        if sup_values.issubset(ref_numeric_labels):
+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "fn")
+                sup.set("rid", fn_ids[sup_text])
+            return data


Maintainability: The logic for converting sup elements to xref for bibr (lines 686-691) and fn (lines 701-706) is duplicated. This duplication makes the code harder to maintain and increases the risk of bugs. Consider extracting this into a helper method like _convert_sups_to_xrefs(sups, ref_type, ids_dict) to reduce code duplication.

Copilot · 2025-12-03T13:46:37Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "bibr")
+                sup.set("rid", ref_ids[sup_text])
+            return data


[nitpick] Best practices violation: The code modifies elements in the sups list while iterating over it. While this specific case may work because only attributes and tag names are being changed (not adding/removing elements), it's generally a risky pattern. Consider iterating over a copy of the list for better safety: for sup in list(sups): or for sup in sups[:]:.

Copilot · 2025-12-03T13:46:38Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "bibr")
+                sup.set("rid", ref_ids[sup_text])
+            return data


Bug: This loop processes ALL sup elements found on line 667, not just those with numeric text. Non-numeric sup elements (e.g., *) will cause an AttributeError on line 687 when calling .strip() on None text, or will be incorrectly converted if they have text. The loop should only process sup elements that are in sup_values. Consider: for sup in sups: if sup.text and sup.text.strip() in sup_values:.

Copilot · 2025-12-03T13:46:38Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "fn")
+                sup.set("rid", fn_ids[sup_text])
+            return data


Bug: Same issue as lines 686-691 - this loop processes ALL sup elements, not just those with numeric text. Non-numeric sup elements will cause errors or be incorrectly converted. The loop should only process sup elements that are in sup_values. Consider: for sup in sups: if sup.text and sup.text.strip() in sup_values:.

Copilot · 2025-12-03T13:46:38Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+                sup.tag = "xref"
+                sup.set("ref-type", "fn")
+                sup.set("rid", fn_ids[sup_text])


Bug: Same KeyError risk as line 690 - if sup_text is not in fn_ids dictionary, this will crash. Add validation: if sup_text in fn_ids: before accessing the dictionary.

Suggested change

sup.tag = "xref"

sup.set("ref-type", "fn")

sup.set("rid", fn_ids[sup_text])

if sup_text in fn_ids:

sup.tag = "xref"

sup.set("ref-type", "fn")

sup.set("rid", fn_ids[sup_text])

Copilot · 2025-12-03T13:46:38Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            label = item.findtext("label") or self._extract_numeric_label_from_node(item.find("mixed-citation"))
+            if label:
+                ref_numeric_labels.add(label)
+                ref_ids[label] = item.get("id")


Bug: The code adds label to ref_numeric_labels and ref_ids[label] only when label exists, but doesn't check if item.get("id") returns None. If a ref element doesn't have an 'id' attribute, ref_ids[label] will be None, which will cause issues later when setting the 'rid' attribute on line 690. Only add to the dictionaries if both label and id exist: if label and item.get("id"):.

Copilot · 2025-12-03T13:46:39Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            if label:
+                fn_numeric_labels.add(label)
+                fn_ids[label] = item.get("id")


Bug: Same issue as lines 681-684 - doesn't check if item.get("id") returns None. If a fn element doesn't have an 'id' attribute, fn_ids[label] will be None. Only add to the dictionaries if both label and id exist: if label and item.get("id"):.

Suggested change

if label:

fn_numeric_labels.add(label)

fn_ids[label] = item.get("id")

id_ = item.get("id")

if label and id_:

fn_numeric_labels.add(label)

fn_ids[label] = id_

Copilot · 2025-12-03T13:46:39Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+
+    Critérios de análise:
+    - sup deve ter conteúdo numérico
+    - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)


Documentation issue: The docstring states "Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)" but there's no code that actually checks for existing xref elements. This check is mentioned in the PR description but not implemented. Consider adding this check before transforming sup elements, or remove this line from the documentation.

Copilot · 2025-12-03T13:46:39Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+                sup.tag = "xref"
+                sup.set("ref-type", "bibr")
+                sup.set("rid", ref_ids[sup_text])


Bug: The code will raise a KeyError if sup_text is not in ref_ids dictionary. This can happen if a sup element was added to sup_values but doesn't have a corresponding entry in ref_ids (e.g., if the ref lacks an 'id' attribute). Add validation: if sup_text in ref_ids: before accessing the dictionary, or ensure all refs have IDs.

Suggested change

sup.tag = "xref"

sup.set("ref-type", "bibr")

sup.set("rid", ref_ids[sup_text])

if sup_text in ref_ids and ref_ids[sup_text] is not None:

sup.tag = "xref"

sup.set("ref-type", "bibr")

sup.set("rid", ref_ids[sup_text])

else:

logging.warning(f"Sup value '{sup_text}' not found in ref_ids or has no id; skipping.")

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T14:34:34Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        Extrai o label numérico inicial de um elemento mixed-citation.
+
+        Returns:
+            str: O label numérico extraído ou string vazia se não encontrado


Unclear docstring: The docstring says this method extracts numeric labels from "mixed-citation" elements, but the method actually works with any XML node, not just mixed-citation elements. It's called for both mixed-citation nodes (line 681) and fn nodes (line 696).

Consider updating the docstring to be more generic:

""" Extrai o label numérico inicial de um nó XML. Returns: str: O label numérico extraído ou None se não encontrado """

Suggested change

Extrai o label numérico inicial de um elemento mixed-citation.

Returns:

str: O label numérico extraído ou string vazia se não encontrado

Extrai o label numérico inicial de um nó XML.

Returns:

str: O label numérico extraído ou None se não encontrado

Copilot · 2025-12-03T14:34:34Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+class XMLSupToXrefPipe(plumber.Pipe):
+    """
+    Transforma elementos <sup> com conteúdo numérico em <xref> quando eles
+    representam referências bibliográficas (bibr) ou notas de rodapé (fn).
+
+    Critérios de análise:
+    - sup deve ter conteúdo numérico
+    - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)
+    - Compara valores numéricos com ref/label e fn/label existentes
+    - Prioriza bibr sobre fn quando há ambiguidade
+    - Na dúvida, mantém como sup
+    """
+
+    def transform(self, data):
+        raw, xml = data
+
+        sups = xml.xpath(".//sup")
+        if not sups:
+            return data
+
+        sup_values = set()
+        for sup in sups:
+            if sup.text and sup.text.strip().isdigit():
+                sup_values.add(sup.text.strip())
+        if not sup_values:
+            return data
+
+        ref_ids = {}
+        ref_numeric_labels = set()
+        for item in xml.xpath(".//ref"):
+            label = item.findtext("label") or self._extract_numeric_label_from_node(item.find("mixed-citation"))
+            if label:
+                ref_numeric_labels.add(label)
+                ref_ids[label] = item.get("id")
+        if sup_values.issubset(ref_numeric_labels):
+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "bibr")
+                sup.set("rid", ref_ids[sup_text])
+            return data
+
+        fn_ids = {}
+        fn_numeric_labels = set()
+        for item in xml.xpath(".//fn"):
+            label = item.findtext("label") or self._extract_numeric_label_from_node(item)
+            if label:
+                fn_numeric_labels.add(label)
+                fn_ids[label] = item.get("id")
+        if sup_values.issubset(ref_numeric_labels):
+            for sup in sups:
+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "fn")
+                sup.set("rid", fn_ids[sup_text])
+            return data
+
+        return data
+
+    def _extract_numeric_label_from_node(self, node):
+        """
+        Extrai o label numérico inicial de um elemento mixed-citation.
+
+        Returns:
+            str: O label numérico extraído ou string vazia se não encontrado
+        """
+        if node is None:
+            return None
+        text = "".join(node.itertext()).strip()
+        if not text or not text[0].isdigit():
+            return None
+
+        label = ""
+        for char in text:
+            if char.isdigit():
+                label += char
+            else:
+                break
+        return label


Missing test coverage: The new XMLSupToXrefPipe class lacks automated tests. Similar pipes in this module (like XMLFontFaceSymbolPipe) have comprehensive test coverage in tests/test_sps_xml_pipes.py.

Consider adding tests to cover:

Converting numeric  to <xref ref-type="bibr"> when matching ref labels

Converting numeric  to <xref ref-type="fn"> when matching fn labels

Not converting non-numeric  elements

Not converting when there's no match with ref or fn labels

Edge cases like empty  elements or  without text

Copilot · 2025-12-03T14:34:34Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            return None
+        text = "".join(node.itertext()).strip()
+        if not text or not text[0].isdigit():
+            return None


Documentation inconsistency: The docstring states the method returns "string vazia se não encontrado" (empty string if not found), but the code actually returns None on lines 718 and 721 when no label is found.

Either update the docstring to say "None se não encontrado" or change the code to return empty string "" instead of None for consistency.

Suggested change

return None

text = "".join(node.itertext()).strip()

if not text or not text[0].isdigit():

return None

return ""

text = "".join(node.itertext()).strip()

if not text or not text[0].isdigit():

return ""

Copilot · 2025-12-03T14:34:35Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+                sup_text = sup.text.strip()
+                sup.tag = "xref"
+                sup.set("ref-type", "fn")
+                sup.set("rid", fn_ids[sup_text])


Same issue as lines 686-690: This loop iterates over all sup elements without checking if they have numeric text content. If a sup element doesn't have numeric text, calling sup.text.strip() on line 702 could fail if sup.text is None, and fn_ids[sup_text] on line 705 will raise a KeyError.

Apply the same fix as suggested for the bibr case:

for sup in sups: if sup.text and sup.text.strip().isdigit(): sup_text = sup.text.strip() sup.tag = "xref" sup.set("ref-type", "fn") sup.set("rid", fn_ids[sup_text])

Suggested change

sup_text = sup.text.strip()

sup.tag = "xref"

sup.set("ref-type", "fn")

sup.set("rid", fn_ids[sup_text])

if sup.text and sup.text.strip().isdigit():

sup_text = sup.text.strip()

sup.tag = "xref"

sup.set("ref-type", "fn")

sup.set("rid", fn_ids[sup_text])

Copilot · 2025-12-03T14:34:35Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+
+    Critérios de análise:
+    - sup deve ter conteúdo numérico
+    - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)


Documentation inconsistency: The docstring states "Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)" (Checks if bibr xrefs don't exist to avoid duplication), but the code doesn't actually perform this check. There's no XPath query to verify existing xref elements before transformation.

If duplication prevention is intended, the code should include a check like:

existing_xrefs = xml.xpath(".//xref[@ref-type='bibr']") if existing_xrefs: return data # or handle differently

Otherwise, remove this claim from the documentation.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T17:36:26Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            for sup in list(sups):
+                sup_text = "".join(sup.itertext()).strip()
+                rid = ids.get(sup_text)
+                if not rid:
+                    continue
+                parent = sup.getparent()
+                xref = ET.Element("xref")
+                xref.set("ref-type", ref_type)
+                xref.set("rid", rid)
+                xref.append(deepcopy(sup))
+                parent.replace(sup, xref)
+                total += 1


Logic issue: The _convert_sup_to_xref method doesn't check if the sup element already contains a child xref or if its parent is an xref (as done in lines 673-677). This could lead to incorrect transformations of sup elements that should be skipped. Add the same validation checks here:

for sup in list(sups): if sup.find("xref") is not None: continue parent = sup.getparent() if parent.tag == "xref": continue sup_text = "".join(sup.itertext()).strip() # ... rest of the logic

Copilot · 2025-12-03T17:36:26Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+            if label:
+                numeric_labels.add(label)
+                elem_ids[label] = elem.get("id")


Potential issue with missing IDs: If a ref or fn element lacks an id attribute, elem.get("id") returns None (line 706), which gets stored in the elem_ids dictionary. Later in _convert_sup_to_xref (line 714), this could result in setting rid to None on the xref element (line 720), which would be invalid XML. Consider adding a check:

elem_id = elem.get("id") if label and elem_id: numeric_labels.add(label) elem_ids[label] = elem_id

Suggested change

if label:

numeric_labels.add(label)

elem_ids[label] = elem.get("id")

elem_id = elem.get("id")

if label and elem_id:

numeric_labels.add(label)

elem_ids[label] = elem_id

Copilot · 2025-12-03T17:36:26Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        return elem_ids, numeric_labels
+
+    def _convert_sup_to_xref(self, sups, sup_values, ids, numeric_labels, ref_type):
+       total = 0


Inconsistent indentation: This line uses only 7 spaces for indentation while the rest of the method uses 8 spaces (or a proper indentation level). This should be aligned with the rest of the method.

Suggested change

total = 0

total = 0

Copilot · 2025-12-03T17:36:27Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+        done = self._convert_sup_to_xref(sups, sup_values, ids, numeric_labels, "fn")
+        if done:
+            return data
+


Trailing whitespace detected on this line. Please remove the extra space(s) at the end of line 693.

Suggested change

Copilot · 2025-12-03T17:36:27Z

scielo_classic_website/spsxml/sps_xml_pipes.py

+class XMLSupToXrefPipe(plumber.Pipe):
+    """
+    Transforma elementos <sup> com conteúdo numérico em <xref> quando eles
+    representam referências bibliográficas (bibr) ou notas de rodapé (fn).
+
+    Critérios de análise:
+    - sup deve ter conteúdo numérico
+    - Verifica se não existem xref[@ref-type='bibr'] (evita duplicação)
+    - Compara valores numéricos com ref/label e fn/label existentes
+    - Prioriza bibr sobre fn quando há ambiguidade
+    - Na dúvida, mantém como sup
+    """
+
+    def transform(self, data):
+        raw, xml = data
+
+        sups = xml.xpath(".//sup")
+        if not sups:
+            return data
+
+        sup_values = set()
+        for sup in list(sups):
+            if sup.find("xref") is not None:
+                continue
+            parent = sup.getparent()
+            if parent.tag == "xref":
+                continue
+            text = "".join(sup.itertext()).strip()
+            if text and text.isdigit():
+                sup_values.add(text)
+        if not sup_values:
+            return data
+
+        ids, numeric_labels = self.get_ids_and_labels(xml, "ref", "mixed-citation")
+        done = self._convert_sup_to_xref(sups, sup_values, ids, numeric_labels, "bibr")
+        if done:
+            return data
+
+        ids, numeric_labels = self.get_ids_and_labels(xml, "fn")
+        done = self._convert_sup_to_xref(sups, sup_values, ids, numeric_labels, "fn")
+        if done:
+            return data
+
+        return data


Missing test coverage: The new XMLSupToXrefPipe class lacks test coverage. Given that the test file tests/test_sps_xml_pipes.py contains tests for other pipe classes like XMLFontFaceSymbolPipe, tests should be added for this new pipe to ensure:

Correct conversion of numeric  elements to <xref ref-type="bibr">

Correct conversion of numeric  elements to <xref ref-type="fn">

Elements that should not be converted remain unchanged (non-numeric, already wrapped, ambiguous cases)

Edge cases like missing labels, missing IDs, etc.

- Implements transformation of elements with numeric content to <xref> when they represent bibliographic references (bibr) or footnotes (fn) - Analyzes sup elements to identify numeric references - Extracts numeric labels from ref and fn elements for comparison - Includes helper method to extract numeric labels from mixed-citation - Prevents duplication by checking existing xref elements - Prioritizes bibr over fn when ambiguous, maintains sup when uncertain - Added pipe to XML processing pipeline

Copilot AI review requested due to automatic review settings December 3, 2025 13:32

Copilot started reviewing on behalf of robertatakenaka December 3, 2025 13:32 View session

Copilot finished reviewing on behalf of robertatakenaka December 3, 2025 13:34

Copilot AI reviewed Dec 3, 2025

View reviewed changes

robertatakenaka force-pushed the converte_sup_para_xref_bibr_ou_xref_fn branch from 5c6ae59 to f70da64 Compare December 3, 2025 13:40

robertatakenaka requested a review from Copilot December 3, 2025 13:40

Copilot started reviewing on behalf of robertatakenaka December 3, 2025 13:40 View session

Copilot finished reviewing on behalf of robertatakenaka December 3, 2025 13:44

Copilot AI reviewed Dec 3, 2025

View reviewed changes

robertatakenaka force-pushed the converte_sup_para_xref_bibr_ou_xref_fn branch from f70da64 to a04a81f Compare December 3, 2025 14:28

robertatakenaka requested a review from Copilot December 3, 2025 14:29

Copilot started reviewing on behalf of robertatakenaka December 3, 2025 14:30 View session

Copilot finished reviewing on behalf of robertatakenaka December 3, 2025 14:32

Copilot AI reviewed Dec 3, 2025

View reviewed changes

robertatakenaka force-pushed the converte_sup_para_xref_bibr_ou_xref_fn branch from a04a81f to c3d6f55 Compare December 3, 2025 17:30

robertatakenaka requested a review from Copilot December 3, 2025 17:31

Copilot started reviewing on behalf of robertatakenaka December 3, 2025 17:31 View session

Copilot finished reviewing on behalf of robertatakenaka December 3, 2025 17:35

Copilot AI reviewed Dec 3, 2025

View reviewed changes

robertatakenaka force-pushed the converte_sup_para_xref_bibr_ou_xref_fn branch from c3d6f55 to 6342888 Compare December 3, 2025 17:41

robertatakenaka merged commit e491871 into scieloorg:main Dec 3, 2025
1 check passed

	str: O label numérico extraído ou string vazia se não encontrado
	str or None: O label numérico extraído ou None se não encontrado

Add XMLSupToXrefPipe to convert numeric sup elements to xref #129

Add XMLSupToXrefPipe to convert numeric sup elements to xref #129

Uh oh!

Conversation

robertatakenaka commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Adiciona XMLSupToXrefPipe para converter elementos sup numéricos em xref

Descrição

Motivação

Alterações implementadas

Impacto

Como testar

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Issues

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

robertatakenaka commented Dec 3, 2025 •

edited

Loading