refactor: melhora validação de dígitos e valores nulos na criação de … #140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

robertatakenaka merged 1 commit into scieloorg:main from robertatakenaka:adiciona_tratamento_de_erro_para_dados_do_front_com_formato_incorreto

Jan 30, 2026

Member

robertatakenaka commented Jan 30, 2026 •

edited

Loading

…elementos de data

O que esse PR faz?

Este PR implementa melhorias na conversão de strings de data para elementos XML e aumenta a resiliência do pipeline de metadados de histórico. As principais alterações são:

Validação rigorosa de datas: A função _create_date_element agora valida se os valores são numéricos e interrompe o processamento caso encontre zeros ou caracteres não numéricos, evitando a criação de elementos XML inválidos.
Tratamento de exceções: Adiciona um bloco try-except no XMLArticleMetaHistoryPipe para garantir que falhas no processamento de uma data específica não interrompam todo o pipeline, registrando o erro detalhadamente no objeto raw.exceptions.

Onde a revisão poderia começar?

A revisão deve começar pelo arquivo:
scielo_classic_website/spsxml/sps_xml_article_meta.py

Inicie pela função _create_date_element (linha 64) para entender a nova lógica de validação e depois siga para a classe XMLArticleMetaHistoryPipe (linha 523).

Como este poderia ser testado manualmente?

Execute o processo de conversão SPS-XML utilizando uma base de dados que contenha datas malformadas (ex: datas com strings, espaços ou apenas zeros).
Verifique se o processo termina sem lançar exceções fatais.
Inspecione o objeto de saída e confirme se os erros foram capturados na lista raw.exceptions.
Garanta que as datas válidas continuem sendo geradas corretamente no XML final.

Algum cenário de contexto que queira dar?

Em bases de dados legadas, é comum encontrar campos de data com preenchimento inconsistente (ex: "00000000" ou "2023 12"). A lógica anterior tentava converter esses valores, o que podia causar erros de conversão de tipo ou gerar XMLs semanticamente incorretos. Esta alteração torna o conversor mais defensivo contra dados ruidosos.

fixes
scieloorg/scms-upload#811

Screenshots

N/A (Alteração de lógica de backend)

Quais são tickets relevantes?

scieloorg/scms-upload#811

Referências

Documentação de Pipes e Plumber (arquitetura interna).
Padrões de validação XML para o padrão SPS (SciELO Publishing Schema).


          refactor: melhora validação de dígitos e valores nulos na criação de …

4049e7f

…elementos de data

Copilot AI review requested due to automatic review settings

January 30, 2026 15:28

Copilot started reviewing on behalf of robertatakenaka

January 30, 2026 15:28

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the date validation logic in the XML article metadata processing pipeline to handle malformed dates more robustly. The changes aim to prevent invalid XML generation when encountering dates with zeros or non-numeric characters in legacy databases.

Changes:

Modified _create_date_element function to validate date components using isdigit() and stop processing when encountering zeros or non-numeric values
Added try-except block in XMLArticleMetaHistoryPipe to catch and log exceptions during date processing without halting the entire pipeline

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

scielo_classic_website/spsxml/sps_xml_article_meta.py

Comment on lines +67 to +73

+                      if not value.isdigit():
+                          break
+                      if int(value) == 0:
+                          break
+                      e = ET.Element(label)
+                      e.text = value
+                      date_element.append(e)

Copilot AI Jan 30, 2026

This PR introduces significant changes to date validation logic but no new tests are added to verify the behavior with malformed dates (e.g., "00000000", "20200200", "2023 12", or dates with non-numeric characters). The PR description specifically mentions these edge cases as the motivation for the changes. Add test cases to cover these scenarios and verify that the validation behaves as expected, especially given the critical logic changes identified in the _create_date_element function.

Copilot uses AI. Check for mistakes.

scielo_classic_website/spsxml/sps_xml_article_meta.py

Comment on lines +529 to +540

+                              try:
+                                  elem = _create_date_element("date", attributes, date_)
+                              except Exception as e:
+                                  raw.exceptions.append(
+                                      {
+                                          "pipe": "XMLArticleMetaHistoryPipe",
+                                          "error_type": str(type(e)),
+                                          "error_message": str(e),
+                                          "exc_traceback": traceback.format_exc(),
+                                      }
+                                  )
+                                  continue

Copilot AI Jan 30, 2026

The traceback module is used here but is not imported at the top of the file. This will cause a NameError when an exception occurs. Add "import traceback" to the imports section at the top of the file (similar to how it's done in sps_xml_refs.py:2).

Copilot uses AI. Check for mistakes.

scielo_classic_website/spsxml/sps_xml_article_meta.py

Comment on lines +68 to +70

+                          break
+                      if int(value) == 0:
+                          break

Copilot AI Jan 30, 2026

The validation logic now uses "break" instead of the implicit "continue" from the previous code, which fundamentally changes the behavior. When a zero or non-digit is encountered, the break statement stops all further date components from being processed.

For example, with date "20200200" (valid year and month, but day is "00"):

Old behavior: would skip day but still add month and year elements
New behavior: breaks at day "00" and adds no elements at all

This means valid year and month values will be lost if the day is zero or invalid. Consider using "continue" instead of "break" to preserve the old behavior of skipping invalid components while still processing valid ones. Alternatively, if the intent is to process in order of significance, the iteration order should be changed to (year, month, day) so that more significant components are processed first.

Suggested change

      
                        break
          
                    if int(value) == 0:
          
                        break
          
                        continue
          
                    if int(value) == 0:
          
                        continue

Copilot uses AI. Check for mistakes.

scielo_classic_website/spsxml/sps_xml_article_meta.py

Comment on lines 64 to +67

                   year, month, day = date_text[:4], date_text[4:6], date_text[6:]
                   labels = ("day", "month", "year")
                   for label, value in zip(labels, (day, month, year)):
-                      if int(value) != 0:
-                          e = ET.Element(label)
-                          e.text = value
-                          date_element.append(e)
+                      if not value.isdigit():

Copilot AI Jan 30, 2026

The code doesn't validate that date_text has the expected length (8 characters) before slicing. If date_text is shorter than 8 characters, the slicing operations will produce empty strings or truncated values. For example, if date_text is "2020", then year="2020", month="", day="". The empty string will fail the isdigit() check and break, but this could lead to silent failures. Consider adding a length check after the line 58 check, e.g., "if len(date_text) != 8: return" to make the validation more explicit and catch malformed input early.

Copilot uses AI. Check for mistakes.

scielo_classic_website/spsxml/sps_xml_article_meta.py

+                                      }
+                                  )
+                                  continue
                               history.append(elem)

Copilot AI Jan 30, 2026

After the try-except block, the code attempts to append elem to history without checking if elem is None. The _create_date_element function can return None in two cases: when date_text is falsy (line 58-59), or when the date_element has no child elements added due to validation failures. If _create_date_element returns None and no exception is raised, this will attempt to append None to the history element, which could cause issues. Add a check "if elem is not None:" before the append operation on line 541.

Suggested change

      
                            history.append(elem)
          
                            if elem is not None:
          
                                history.append(elem)

Copilot uses AI. Check for mistakes.

robertatakenaka merged commit 2fde8a7 into scieloorg:main

7 checks passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet