When translating XML documents, I have problems with numbers being dropped in the translated result, especially in tables.
Input:
<row>
<entry>
<para>22</para>
</entry>
<entry>
<para>Ventilations- och defrostermunstycken SB styrhytt</para>
</entry>
</row>
<row>
<entry>
<para>23</para>
</entry>
<entry>
<para>Luft från intagsaggregatet till ventilationsmunstycken i styrhytt</para>
</entry>
</row>
<row>
<entry>
<para>1100</para>
</entry>
<entry>
<para>Luft från intagsaggregatet till ventilationsmunstycken i styrhytt</para>
</entry>
</row>
In the translated output below, you can see digits being dropped from the content: 23 has become 2 and 1100 has become 11. This is serious because such errors are not always easy to spot.
<row>
<entry>
<para>22</para>
</entry>
<entry>
<para>Ventilation and defroster nozzles SB control cabin</para>
</entry>
</row>
<row>
<entry>
<para>2</para>
</entry>
<entry>
<para>Air from intake unit to ventilation nozzles in control cabin</para>
</entry>
</row>
<row>
<entry>
<para>11</para>
</entry>
<entry>
<para>Air from intake unit to ventilation nozzles in wheelhouse</para>
</entry>
</row>
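Since dropped digits are easy to miss by eye, an automated comparison of the numbers in the source and in the translation might help flag affected files. The following is only a rough sketch: the function name `number_mismatches` is made up for illustration, and it assumes that numbers should survive translation unchanged, which will not hold where the translation legitimately reformats a number.

```python
import re
from collections import Counter

# Any run of digits counts as one "number" for this heuristic.
_NUMBER = re.compile(r"\d+")


def number_mismatches(source_xml: str, translated_xml: str):
    """Return (missing, unexpected) digit runs as Counters.

    missing:    numbers present in the source but not in the translation
    unexpected: numbers present in the translation but not in the source
    """
    src = Counter(_NUMBER.findall(source_xml))
    dst = Counter(_NUMBER.findall(translated_xml))
    return src - dst, dst - src
```

A non-empty result on either side only means the file deserves a manual look; it does not say which element is wrong.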
The code I am using:
result = deepl_client.translate_text(
    text,
    tag_handling="xml",
    source_lang="SV",
    target_lang="EN-GB",
    model_type="prefer_quality_optimized",
    non_splitting_tags="div",
    split_sentences="nonewlines"
)
I have a cumbersome workaround for this particular issue, but it's not safe: I pre-process the input file as a string, using regular expressions to wrap particular numbers in a 'fake' element, and then pass that element to the 'ignore_tags' option.
However, this will not find all instances of the problem.
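For reference, the workaround looks roughly like the sketch below. The element name `keepnum` and the helper names are placeholders chosen for illustration, and the pattern only covers a `<para>` whose whole content is digits, so numbers embedded in running text are not protected.

```python
import re

# Placeholder element name (an assumption); any tag not used in the real documents works.
KEEP_TAG = "keepnum"

# A <para> whose entire content is a run of digits, optionally padded by whitespace.
_NUMERIC_PARA = re.compile(r"(<para>\s*)(\d+)(\s*</para>)")


def protect_numbers(xml: str) -> str:
    """Wrap digit-only <para> content in the placeholder element."""
    return _NUMERIC_PARA.sub(rf"\1<{KEEP_TAG}>\2</{KEEP_TAG}>\3", xml)


def unprotect_numbers(xml: str) -> str:
    """Strip the placeholder element again after translation."""
    return re.sub(rf"</?{KEEP_TAG}>", "", xml)
```

The protected string is then translated with `ignore_tags=KEEP_TAG` added to the `translate_text` call above, and `unprotect_numbers` is applied to `result.text`.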
Is there anything else that can be done?