Replace accented character in unknown charset with unicode equivalent (#10251) #690
\u00F6 is also an option
As long as we're consistent everywhere.
Hmm, given how easy it seems to be to mess up encodings in source and how different viewing applications seem to cope differently (at least on the Mac), maybe we should be using escaped codepoints everywhere, yes. What do you think? I could file a ticket about fixing the other instances and fix the others I notice in the meantime. (I had already fixed mine in mtbc@7b77a74 -- I assume there's some convenient tool into which one can paste the strings to get their escaped equivalents.)
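In answer to the "convenient tool" question, a minimal sketch of such a converter follows. It is hypothetical (the class and method names are not part of the project). It assumes you only want to escape characters above the ASCII range and pass ASCII through untouched, and it emits one escape per UTF-16 code unit, which is the form javac reads. Older JDKs shipped a native2ascii tool for much the same job; it was removed in JDK 9.

```java
/**
 * Hypothetical helper (not part of the project): escapes every character
 * above the ASCII range so a string can be pasted into ASCII-only Java source.
 */
public final class UnicodeEscaper {

    private UnicodeEscaper() {}

    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                // ASCII passes through unchanged, including any escapes already typed in.
                // Control characters are deliberately left alone: a unicode escape for a
                // line feed or carriage return inside a string literal would break
                // compilation, because javac translates such escapes before tokenizing.
                out.append(c);
            } else {
                // One escape per UTF-16 code unit; characters outside the BMP therefore
                // become a surrogate pair of two escapes, which javac accepts.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build the sample from a code point so this file itself stays ASCII-only.
        String sample = "Sch" + (char) 0x00F6 + "n";
        System.out.println(escape(sample));
    }
}
```

Running main prints Sch\u00F6n. Building the sample from a numeric code point keeps the helper's own source free of the very encoding problem it is meant to avoid.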
That sounds like the most foolproof option, unless we add localisation for non-ASCII languages. I've changed
Great, good to merge, thanks.
Would there be a way to detect (e.g. by setting a property in Ant) that a file contains non-UTF-8 extended characters or encodings? If so, I'm personally in favour of allowing UTF-8 to be used directly. Java handles UTF-8 source safely, so having the characters in the files makes sense. But if we can't do it reliably, then I'll have to learn to live with escapes. /cc @jburel, thoughts?
Yes, if we could rely on our viewers and editors not to do something dumb, it'd be great to omit escapes. But I think we'd first have to figure out how the mistakes are happening and issue guidance to prevent further occurrences: for instance, changing the text file encoding in Eclipse's General -> Workspace preferences, which starts out as MacRoman.
You could always run this over every file during the build (which would help catch anything in non-Java files too): https://gist.github.com/4707338
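The linked gist isn't reproduced here. As a hypothetical stand-in, here is a minimal sketch of a build-time check that walks a source tree and flags files whose bytes are not valid UTF-8. It decodes strictly with a CharsetDecoder set to report errors rather than substitute. The class name and the suffix list are assumptions. The check catches bytes that cannot be UTF-8, such as a MacRoman (0x9A) or Latin-1 (0xF6) "o with diaeresis". Pure-ASCII files pass whatever their nominal encoding, which is harmless.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical build-time check (not the linked gist): flags files that are not valid UTF-8. */
public final class Utf8Check {

    // Assumed set of text file suffixes to scan; binary files would otherwise be flagged.
    static final List<String> SUFFIXES =
            Arrays.asList(".java", ".py", ".xml", ".properties", ".txt");

    private Utf8Check() {}

    /** True if the bytes decode as UTF-8 without any malformed or unmappable sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lists matching regular files under root whose contents are not valid UTF-8. */
    public static List<Path> findNonUtf8(Path root, List<String> suffixes) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> suffixes.stream().anyMatch(s -> p.toString().endsWith(s)))
                    .filter(p -> {
                        try {
                            return !isValidUtf8(Files.readAllBytes(p));
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> bad = findNonUtf8(root, SUFFIXES);
        for (Path p : bad) {
            System.err.println("not valid UTF-8: " + p);
        }
        // Non-zero exit status so a build step that runs this check fails.
        System.exit(bad.isEmpty() ? 0 : 1);
    }
}
```

The exit status is what would let a build target (Ant or otherwise) fail on a bad file. How to wire it into the existing build is left open here.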
I have used encoding like
The following could be used as a reference: http://www.fileformat.info/info/unicode/index.htm
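As an offline complement to that reference, the JDK can report a character's Unicode name itself via Character.getName (available since Java 7). The minimal sketch below is hypothetical (the class name is illustrative). It prints each code point of a string with its U+XXXX number and name, which is handy for checking which escape a pasted character needs. The names come from whatever Unicode version the running JDK bundles.

```java
/** Hypothetical offline lookup: prints the code point and Unicode name of each character. */
public final class CodePointInfo {

    private CodePointInfo() {}

    /** Formats one code point as "U+XXXX NAME"; the name comes from the JDK's Unicode data. */
    public static String describe(int codePoint) {
        String name = Character.getName(codePoint); // null for unassigned code points
        return String.format("U+%04X %s", codePoint, name == null ? "<unassigned>" : name);
    }

    public static void main(String[] args) {
        // Describe each code point of a sample built from numbers, keeping this file ASCII-only.
        String sample = "Sch" + (char) 0x00F6 + "n";
        sample.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

For the accented character in the sample, the output line is U+00F6 LATIN SMALL LETTER O WITH DIAERESIS, which corresponds to the \u00F6 escape suggested at the top of this thread.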
Documentation: absolutely. If we decide to use unescaped characters, other developers who see them may be more likely to attempt it themselves! (-:
This reverts commit 4af0f14.
Switched back to
Heh, as opinions are divided, still good to merge. (-:
@manics: ?
Hmm. I definitely didn't mean to close this... unless deleting a branch closes it.
It does. (Also, pushing extra commits to it makes them appear on the PR too, but you'd probably already noticed that.)
Yup, knew both of those. Thanks for reopening.
This will go into the rebased version of #729.
To test: find a system where ./build.py build-all-dev fails in the test-compile target due to a charset encoding error. This should fix it.