CJK chars encoding error with HTML output #2301
Thanks for reporting. I'll wait for a small example of the problem and the workaround you mentioned, since you understand it better.
Sorry for the belated reply. Example journal file follows:
I also tried with hanja (kanji, Chinese characters) input and the same thing happened. Adding
On my system (macOS 15.1), all of the above display properly in Safari, Brave and Firefox. I probably have a system locale that supports UTF-8 decoding. Can you tell us more about your OS and system locale/language settings?
Ah, you said Fedora GNU/Linux. And what does
en_US.UTF-8
<SNIP>
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
en_ZA
<SNIP>
To be specific about browsers: I used Zen browser v1.0.1-a17 (Firefox 132.0) and Brave browser v1.73.97 (Chromium 131.0.6778.108), both on the same Fedora machine. I also tested this on a Windows 10 machine (set to use both English and Korean, with English as the display language), again with Zen and Brave, and exactly the same encoding error happens.
You seem to have everything set up correctly, at least for command-line use. All I can think of is that web browsers started from the GUI may not see the same system locale; in that case, starting the browser from your terminal might make a difference. But the Windows test makes this seem unlikely, so I can't reproduce it at the moment. Ideas, anyone?
@thielema, have you noticed this in any of your HTML reports?
@lewisleedev, when I trimmed your comment just now I noticed that the $LANG value and the installed locale have a different spelling. I remember that causing problems in my past testing. (Related: https://hledger.org/dev/hledger.html#troubleshooting) |
Wouldn't that (also) affect terminal output? I have no problem with terminal output (using WezTerm, if it matters). Besides, since this also happens on my Windows machine, I don't think locale setup is the issue here. I also tested on an Android device (en-US, Firefox 133.0.3), and the same thing happens. w3m works fine, so the system-wide encoding may not be the issue... It's really strange; perhaps it's only macOS that's working properly?
It works correctly for me in Firefox and Chrome on Windows, too. Nevertheless, I agree with @lewisleedev that hledger should add
That does make sense and sounds simple to fix. I’ll do it soon unless someone beats me to it. It must be a default on Mac or something.
I actually figured it out thanks to that note. All this time I was using
@lewisleedev, great! Does that mean we don't need to do anything in hledger?
I would still suggest adding
For general correctness of reports' HTML output.
I would personally have to disagree with @Aankhen: I think browsers having UTF-8 as their default is definitely something we can rely on. Besides, if for some reason a user needs an encoding other than UTF-8, a hardcoded meta tag will be an issue.
Well, now I'm glad I had all these local IT hassles, because I was just about to push the charset UTF-8 meta tag in printHtml. I see the point that our reports' HTML output is an HTML fragment, not a full HTML document. And "if it ain't broke, don't fix it". @thielema, I'm guessing you'll agree with @lewisleedev here? (Unrelated: I didn't find obvious users of Hledger.Write.Html.Blaze.printHtml; should it be removed?)
Example of our HTML output with the meta tag added, just to make this concrete:
On Sat, 28 Dec 2024, Simon Michael wrote:
(Unrelated: I didn't find obvious users of
Hledger.Write.Html.Blaze.printHtml, should it be removed ?)
I added it for the proposed HTML export in hledger-web, which is based on
blaze-html.
I understand what you’re saying, but hledger specifies that journal files are in UTF-8 and can only ever produce UTF-8 (modulo bugs or errors). Putting the HTML output in a non–UTF-8 document verbatim doesn’t make sense, which is why I’d say
On Sat, 28 Dec 2024, Lewis Lee wrote:
I would have to personally disagree with @Aankhen and say that a meta tag
in a snippet isn't the best idea. It's a snippet. It can either go in a
file unaltered or can be inside another HTML page. Either way, the user
should be able to make their own decisions regarding the encoding
without having to remove the meta tag.
I also encountered this question when doing FODS export. But since I
observed that HTML export actually exports only the table and no HTML
header, I did not add a charset header to printHtml.
Hledger documentation should state:
* exported HTML is the bare HTML table, no standalone HTML, no HTML
headers, thus no encoding information (btw. pandoc also requires
--standalone option for some output formats)
* HTML is exported in local encoding (not always UTF-8) (This is why FODS
export declares the local encoding as XML encoding.)
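A minimal Python sketch of the "declare the encoding you actually wrote" idea in the list above, assuming a policy of using the locale's preferred encoding when no encoding is given. `encode_fragment` and `meta_charset` are hypothetical helpers for illustration, not hledger's API. Not every Python codec name is a charset label browsers recognise (`utf-8` and `iso8859-1` are, `cp949` is not), so a real implementation would need a mapping.

```python
import codecs
import locale
from typing import Optional, Tuple


def encode_fragment(fragment: str, encoding: Optional[str] = None) -> Tuple[str, bytes]:
    """Hypothetical helper: encode an HTML fragment, return (codec name, bytes)."""
    # Fall back to the locale's preferred encoding ("export in local encoding").
    enc = encoding or locale.getpreferredencoding(False)
    # Normalise spellings such as "UTF8", "utf8" and "UTF-8" to one codec name.
    name = codecs.lookup(enc).name
    # Characters the encoding cannot represent become numeric character
    # references such as &#54620;, which browsers decode whatever the charset.
    data = fragment.encode(name, errors="xmlcharrefreplace")
    return name, data


def meta_charset(name: str) -> str:
    """Hypothetical helper: the declaration to put in <head> of a full page."""
    return f'<meta charset="{name}">'


# Usage: the same fragment written in two encodings, each with its own label.
name, data = encode_fragment("<td>한글</td>", "latin1")
print(name, data)  # iso8859-1 b'<td>&#54620;&#44544;</td>'
print(meta_charset(name))

name, data = encode_fragment("<td>한글</td>", "UTF8")
print(name, data.decode(name))  # utf-8 <td>한글</td>
```

The point of returning the name together with the bytes is that the declaration and the encoding can never drift apart.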
However, I added HTML view of balance reports experimentally to
hledger-web and there I have to generate standalone HTML.
I think browsers having UTF-8 as their default is definitely something
that we can rely on. Besides, if for some reason a user needs a different
encoding than UTF-8, this will be an issue.
I think the default encoding in HTTP was Latin-1 (?), and the WWW default
cannot simply be changed.
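To make the Latin-1 point concrete, here is a small Python sketch (nothing here is hledger code) of what happens when UTF-8 bytes are read with a legacy single-byte charset. Which fallback a given browser actually uses for an undeclared page depends on the browser and its settings, so treat this as an illustration of the symptom, not a diagnosis of this issue. The sample text is hypothetical, not the reporter's journal.

```python
# Hypothetical sample text (Korean and hanja), not the reporter's journal.
text = "잔액 한글 漢字"
utf8_bytes = text.encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 never raises an error: every byte maps
# to some character, so the result is silent mojibake instead of a failure.
garbled = utf8_bytes.decode("iso-8859-1")

# Each CJK character is 3 bytes in UTF-8 and turns into 3 Latin-1 characters.
print(len(text), len(utf8_bytes), len(garbled))  # 8 20 20

# Decoding with the encoding the bytes were written in restores the text.
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True
```

This is why an undeclared page can look fine in one browser or locale and broken in another: the bytes are identical, only the guessed decoding differs.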
HTML without a declared character encoding can display CJK characters incorrectly, and the simplest solution is to add a `<meta charset="UTF-8">` tag inside the `<head>` tag. hledger does not include this encoding tag by default, so CJK characters (Korean, in my case) appear broken when the output is rendered without adding that snippet.
The problem is that this snippet must be placed within the `<head>` tag, and the character encoding declaration should properly be set by the HTML document itself.
I'm still creating an issue for this problem since it affects the proper display of CJK characters in the generated HTML output, but I also think that adding HTML meta tags to hledger's output might not be the right approach, as a snippet of a document shouldn't really change the encoding for the whole document.
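For anyone hitting this without a change in hledger, the workaround described in this issue can be scripted outside hledger. A minimal Python sketch, assuming the report fragment was written as UTF-8 (e.g. under a UTF-8 locale); the script name `wrap_fragment.py` and the function `wrap_fragment` are hypothetical:

```python
import sys

# Minimal standalone page; the charset declaration sits inside <head>.
HEAD = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head>\n"
    '<meta charset="utf-8">\n'
    "</head>\n"
    "<body>\n"
)
TAIL = "</body>\n</html>\n"


def wrap_fragment(fragment: str) -> str:
    """Wrap an HTML fragment, such as a report table, in a standalone UTF-8 page."""
    return HEAD + fragment.rstrip("\n") + "\n" + TAIL


if __name__ == "__main__" and len(sys.argv) == 2:
    # Read the fragment as UTF-8 and write the page as UTF-8, so the bytes
    # match the <meta charset> declaration regardless of the current locale.
    with open(sys.argv[1], encoding="utf-8") as f:
        page = wrap_fragment(f.read())
    sys.stdout.buffer.write(page.encode("utf-8"))
```

Usage, assuming your hledger version supports HTML output for `balance`: `hledger balance -O html -o fragment.html`, then `python3 wrap_fragment.py fragment.html > report.html`. Keeping the wrapper outside hledger matches the concern above: the fragment itself carries no encoding declaration, and whoever builds the full page decides the charset.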