Fix graphic position in tablewrap #138
Changes from all commits
ef1790b
7da974a
a05e662
9cc5720
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -192,6 +192,15 @@ def analyze_xref(text: str = None, rid: str = None) -> Dict[str, Optional[str]]: | |||
| result["prefix"] = prefix | ||||
| result["number"] = number | ||||
| result["source"] = "text" | ||||
| else: | ||||
| ref_type_text, element_name_text, prefix, number = detect_from_text(text.split()[0]) | ||||
| if ref_type_text: | ||||
| result["ref_type"] = ref_type_text | ||||
| result["element_name"] = element_name_text | ||||
| result["prefix"] = prefix | ||||
| result["number"] = number | ||||
| result["source"] = "text" | ||||
|
|
||||
|
|
||||
|
||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |||||||||||||||
| detect_from_text, | ||||||||||||||||
| detect_element_type, | ||||||||||||||||
| detect_sec_type, | ||||||||||||||||
| detect_from_id, | ||||||||||||||||
| ) | ||||||||||||||||
| from scielo_classic_website.spsxml.detector_title_parent import identify_parent_by_title | ||||||||||||||||
| from scielo_classic_website.htmlbody.html_merger import ( | ||||||||||||||||
|
|
@@ -128,7 +129,7 @@ def convert_html_to_xml(document): | |||||||||||||||
| convert_html_to_xml_step_60_ahref_and_aname, | ||||||||||||||||
| convert_html_to_xml_step_70_complete_fig_and_tablewrap, | ||||||||||||||||
| convert_html_to_xml_step_80_fix_sec, | ||||||||||||||||
|
||||||||||||||||
| convert_html_to_xml_step_80_fix_sec, | |
| convert_html_to_xml_step_80_fix_sec, | |
| # NOTE: Step 90 (`convert_html_to_xml_step_90_complete_disp_formula`) is | |
| # temporarily disabled because it may incorrectly transform some <disp-formula> | |
| # elements and generate invalid SPS XML. Re-enable this step only after the | |
| # underlying issues are fixed and regression tests for complex formula markup | |
| # are in place and passing. |
Copilot
AI
Jan 16, 2026
The refactored ANamePipe logic now uses detect_from_id to determine element types, but the existing test only covers the case where the name doesn't match any pattern (expecting <div> as fallback). Add tests for cases where the name matches known patterns like 'f1' (should become <fig>), 't1' (should become <table-wrap>), and 'cuadro1' (should become <table-wrap>).
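The cases requested above could be laid out as a small table of inputs and expected element names. The sketch below is only illustrative: `element_name_for_anchor` and its regular expressions are hypothetical stand-ins, since the real signature and pattern table of detect_from_id are not shown in this excerpt, and the real tests would exercise ANamePipe itself.

```python
import re

# Hypothetical stand-in for the project's detect_from_id; the real function's
# signature and pattern table are not shown here, so these regexes are assumptions.
_PATTERNS = [
    (re.compile(r"^(f|fig|figura)\d+$", re.IGNORECASE), "fig"),
    (re.compile(r"^(t|tab|tabla|cuadro)\d+$", re.IGNORECASE), "table-wrap"),
]


def element_name_for_anchor(name):
    """Map an anchor name to an element name, falling back to "div"."""
    for pattern, element_name in _PATTERNS:
        if pattern.match(name or ""):
            return element_name
    return "div"


# Cases named in the review comment, plus the existing fallback case.
CASES = {
    "f1": "fig",
    "t1": "table-wrap",
    "cuadro1": "table-wrap",
    "unknown": "div",
}


def run_cases():
    """Return the names whose detected element differs from the expectation."""
    return [name for name, expected in CASES.items()
            if element_name_for_anchor(name) != expected]
```

An empty list from `run_cases()` means every listed name mapped to the expected element. In the project itself, each entry would more likely become one parametrized test case against the pipe's output XML.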
Copilot
AI
Jan 16, 2026
The addition of ET.strip_tags(root, 'STRIPTAG') in the rename_center method is not explained. Add a comment explaining why STRIPTAG elements need to be stripped at this point in the processing pipeline.
| center.tag = "title" | |
| center.tag = "title" | |
| # Remove the elements marked with STRIPTAG, which are used only as temporary | |
| # markers for empty <center> elements, so that these artificial tags do not | |
| # appear in the final XML. |
If `text` is an empty string or contains only whitespace, `text.split()` will return an empty list, causing an IndexError when accessing index `[0]`. Add a check to ensure the split result is not empty before accessing the first element.
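One way to apply this check is to isolate the token extraction in a small helper, as in the minimal sketch below. The name `first_token` is hypothetical and not part of the PR. It also handles `None`, on the assumption that `text` may be `None` because `analyze_xref` declares `text: str = None`.

```python
from typing import Optional


def first_token(text: Optional[str]) -> Optional[str]:
    """Return the first whitespace-separated token of text, or None.

    Hypothetical helper: it guards against None, the empty string, and
    whitespace-only input, all of which would make text.split()[0] fail.
    """
    if not text:
        return None
    tokens = text.split()
    return tokens[0] if tokens else None
```

The caller would then run detect_from_text only when the helper returns a token, and otherwise leave `result` unchanged.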