Closed
Description
First of all, thank you very much for this very useful program.
Describe the bug
When exporting a document to HTML with the --encoding option, the output file is always encoded in windows-1252.
This looks similar to the following question (someone suggested an answer there, but I don't know whether it applies here): https://stackoverflow.com/q/34026716
To Reproduce
For example, with UTF-8 (code page 65001):
docto.exe -F input.rtf -T wdFormatHTML -O test.html -E 65001
The same behavior is encountered when running the command from Node.js (https://github.com/brrd/msoconvert).
Expected behavior
I would expect the HTML file to be encoded in UTF-8, and its header to contain this meta tag:
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
Instead, the file is encoded in windows-1252 and the header contains the following:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
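In case it helps with checking this, here is a minimal Python sketch that reads the charset declared in the generated file's meta tag and tests whether the raw bytes decode as UTF-8. The helper names (declared_charset, is_valid_utf8, inspect_html) are my own and are not part of DocTo. Note that a file containing only ASCII characters decodes as UTF-8 regardless of how it was saved, so the second check is only meaningful when the document contains accented or other non-ASCII characters.

```python
import re

# Hypothetical helpers for inspecting the converted file; not part of DocTo.
# The pattern matches the charset value inside the first <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def declared_charset(html_bytes):
    """Return the charset named in the first <meta ... charset=...> tag, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(html_bytes):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        html_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def inspect_html(path):
    """Read a file as bytes and return (declared charset, decodes as UTF-8)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return declared_charset(data), is_valid_utf8(data)
```

Calling inspect_html("test.html") on the output described above should report windows-1252 as the declared charset, where utf-8 is expected.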
Additional context
- Please run the command with -L 10 to provide verbose logging and paste that into your bug report.
docto.exe -F input.rtf -T wdFormatHTML -O test.html -E 65001 -L 10
[20210528 19:21:26 -]: [DEBUG] Log Level Set To:10
Loading ChooseConverter...
Parameter Count is 10
Converter:MS Word
[DEBUG] Log Level Set To:10
[INFO] Loading Configuration...
[DEBUG] Parameter Count is 10
[DEBUG] Input File is: C:\Users\Thomas\Desktop\input.rtf
[DEBUG] Type Integer is: 8
[INFO] Output file: C:\Users\Thomas\Desktop\test.html
[INFO] Log Level Set To:10
[DEBUG] Current Directory: C:\Users\Thomas\Desktop
[DEBUG] Ready to Execute
[DEBUG] Executing Conversion ...
[INFO] ExecuteConversion:C:\Users\Thomas\Desktop\input.rtf
[DEBUG] Version >= 14 Using Saveas2 Function
[INFO] File Converted: C:\Users\Thomas\Desktop\test.html
- Please also run docto.exe -v so I can see what version of Docto and Word you are running.
docto.exe -v
DocTo Version:1.03.30.54
OfficeApp Version:16
Source: https://github.com/tobya/DocTo/
- What OS: [e.g. Windows Server 2012]
Windows 10 Pro 20H2