Encoding issues with Eclipse WTP HTML format special chars #545
Just to clarify, after formatting with mvn spotless:apply the HTML5 file is changed to
My untested suspicion is that this bug is specific to the Eclipse WTP formatter. e.g. you would not see this if you used the Lines 60 to 70 in 8aab108
Since we're only passing Strings back and forth, and Java Strings are always Unicode, it shouldn't matter, but it wouldn't shock me if there is Eclipse code that roundtrips through binary while assuming an old charset unless you explicitly set it. But it's easy for us to make a test case that confirms whether or not this is specific to Eclipse WTP, and if it is, then there are not that many places to look for a fix. @fvgh does this seem plausible to you?
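For anyone following along, here is a minimal sketch of the kind of test case meant above. It does not use the Spotless or Eclipse WTP APIs; `preservesNonAscii` and `lossyStep` are hypothetical helpers, and the "formatter" is just any `String -> String` function. The faulty step simulates code that roundtrips through bytes while assuming a different charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

final class NonAsciiRoundTripCheck {
    /** True if the formatter leaves the non-ASCII characters of the input intact and in order. */
    static boolean preservesNonAscii(UnaryOperator<String> formatter, String input) {
        return nonAscii(formatter.apply(input)).equals(nonAscii(input));
    }

    /** Collects all code points above 0x7F, in order of appearance. */
    private static String nonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().filter(cp -> cp > 0x7F).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Hypothetical faulty step: encodes as UTF-8 but decodes with the given charset. */
    static UnaryOperator<String> lossyStep(Charset decodeWith) {
        return s -> new String(s.getBytes(StandardCharsets.UTF_8), decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String html = "<p>\u00FC\u00F6\u00E4 \u20AC</p>"; // the text "üöä €" inside a paragraph
        System.out.println(preservesNonAscii(UnaryOperator.identity(), html));
        System.out.println(preservesNonAscii(lossyStep(StandardCharsets.UTF_8), html));
        System.out.println(preservesNonAscii(lossyStep(StandardCharsets.ISO_8859_1), html));
    }
}
```

Plugging a real formatter step in place of `lossyStep` would show whether that step alone changes the special characters.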
Actually, I currently use the replace step after the Eclipse WTP step to put the special chars back in as a workaround. So the replace step does not have an encoding problem, only Eclipse WTP does. Sadly, the workaround leads to problems with line length: because each special char temporarily takes up 2 chars, a line can exceed the max line length and gets wrapped when it should not be. I also tested Eclipse WTP with XML; that is fine and leaves the üöä as they are.
Java internally uses UTF-16 (originally it used UCS-2, but to my understanding, they switched). When opening the modified file, be aware that *NIX users tend not to care about the BOM. If any application sees an extension code, it looks up UTF extensions anyway. Could you provide a HEX dump of the output file? I would like to check whether a BOM got lost or (as I expect) the output is a valid translation of the input without a BOM. I expect that you switched all your IDEs to use UTF-8 by default, right? If not, I recommend it when you want to work with UTF. I had trouble in the past when a developer (using a JetBrains editor) messed up a UTF-8 file, since there was no BOM.
@source-knights Sorry, just found a mistake in my previous comment. I would like to see the HEX of both input and output.
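If no hex tool is at hand, a small sketch like the following could produce the requested dump and report whether the file starts with the UTF-8 BOM (bytes EF BB BF). The class name `HexDump` and the file names in the usage comment are placeholders, not something from this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class HexDump {
    /** Formats bytes as space-separated uppercase hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Usage (placeholder names): java HexDump.java input.html output.html
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + " BOM=" + hasUtf8Bom(data));
            System.out.println(toHex(data));
        }
    }
}
```

Running it on the file before and after formatting makes it easy to compare the bytes of the special characters directly, independent of any editor's charset setting.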
@nedtwigg I quickly added a test on the WTP side to deal with UTF-8 characters. There were no problems. But I must admit, I am not 100% sure that we handle a BOM correctly. Currently the reading/writing just passes the byte sequence on to the formatters. Not sure whether this is a good idea.
@source-knights I may have found the problem. Could you use in the meantime the Java system property
I can confirm all fine when I use Thx for looking into this so quickly. If there is anything I can do to help, just shout.
Took the liberty to delete a few of my previous comments regarding error analysis. Was in a hurry and lacking caffeine. The comments were not correct: the Spotless framework assures a conversion from the specified format to the internal format. However, @nedtwigg was right to suspect my WTP implementation. It always needs to use
Thxalot for the quick fix. Now I just need TypeScript checks as a Maven plugin... Will look into that later, maybe I can code it :)
Fixed in gradle |
Hi, I am using Maven Spotless version 1.28.0 and Eclipse WTP 4.13.0 (but tried previous versions as well). I'm on Windows 10. Tried 3 different developer machines, all showing the same issue.
Whenever I use Eclipse WTP / Spotless to format HTML5 files, the German special chars üöäÜÖÄß and the Euro sign € are changed to "üöäÜÖÄ߀". I understand that this is what the UTF-8 bytes of these chars look like when the file is wrongly read with a non-UTF-8 encoding. But as I use UTF-8 in all editors, in the HTML itself and in the Spotless config, I don't understand why the formatter changes the files like that.
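A minimal sketch that reproduces this kind of garbling outside of Spotless: it encodes the characters as UTF-8 and decodes the bytes with a legacy single-byte charset. The choice of windows-1252 is an assumption (the thread does not name the charset involved), and `MojibakeDemo`, `misdecode` and `repair` are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Assumption: the legacy charset is windows-1252; the thread does not say which one is involved.
    static final Charset LEGACY = Charset.forName("windows-1252");

    /** Encodes as UTF-8, then wrongly decodes the bytes with the legacy charset. */
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), LEGACY);
    }

    /** Reverses misdecode: re-encodes with the legacy charset and decodes as UTF-8. */
    static String repair(String s) {
        return new String(s.getBytes(LEGACY), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for the characters from the report: ü ö ä Ü Ö Ä ß €
        String original = "\u00FC\u00F6\u00E4\u00DC\u00D6\u00C4\u00DF\u20AC";
        String garbled = misdecode(original);
        // Each 2-byte character becomes 2 chars, the 3-byte Euro sign becomes 3.
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars");
        System.out.println(repair(garbled).equals(original));
    }
}
```

The growth from one char to two or three is also why the garbled text can push a line over a max line length.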
I managed to reproduce this in a simple Maven project with only the pom.xml below and the pasted HTML file.
Sample HTML5 file (which I save as UTF-8 in the IDE; Eclipse, IntelliJ and even Notepad++ all lead to the same problem).
My pom
Does anyone have an idea what I am doing wrong? All these special chars are proper UTF-8 chars and allowed in HTML5, so they should not be changed.
Thxalot and stay healthy