Replace accented character in unknown charset with unicode equivalent (#10251) #690

manics · 2013-02-03T20:16:05Z

To test: find a system where ./build.py build-all-dev fails in the test-compile target due to a charset encoding error. This should fix it.

… (#10251)

mtbc · 2013-02-04T09:18:08Z

components/blitz/test/ome/services/blitz/test/utests/ManagedRepositoryITest.java

\u00F6 is also an option

As long as we're consistent everywhere.

Hmm, given how easy it seems to be to mess up encodings in source and how different viewing applications seem to cope differently (at least on the Mac), maybe we should be using escaped codepoints everywhere, yes. What do you think? I could file a ticket about fixing the other instances and fix the others I notice in the meantime. (I had already fixed mine in mtbc@7b77a74 -- I assume there's some convenient tool into which one can paste the strings to get their escaped equivalents.)
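A minimal sketch of the kind of conversion tool mentioned above, written in Python. The function name `to_java_escapes` is hypothetical and not part of the OMERO build. It assumes that only non-ASCII characters need escaping, and that characters outside the Basic Multilingual Plane should become UTF-16 surrogate pairs, as Java source expects. (The JDK of that era also shipped a `native2ascii` command for the same purpose.)

```python
def to_java_escapes(text):
    """Return text with every non-ASCII character as a Java \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is left untouched.
            out.append(ch)
        elif cp <= 0xFFFF:
            # BMP characters map to a single escape.
            out.append("\\u%04X" % cp)
        else:
            # Supplementary characters become a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)


# "\u00f6" is the o-umlaut discussed in this PR.
print(to_java_escapes("Schr\u00f6dinger"))
```

Pasting a string literal through a helper like this gives text that survives any source-file encoding, at the cost of readability.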

manics · 2013-02-04T14:00:24Z

maybe we should be using escaped codepoints everywhere, yes. What do you think? I could file a ticket about fixing the other instances and fix the others I notice in the meantime.

That sounds like the most foolproof option, unless we add localisation for non-ASCII languages. I've changed ö to \u00F6.

mtbc · 2013-02-04T14:11:52Z

Great, good to merge, thanks.

mtbc · 2013-02-04T14:28:12Z

Filed http://trac.openmicroscopy.org.uk/ome/ticket/10288

joshmoore · 2013-02-04T14:43:03Z

Would there be a way to detect (by setting a property in Ant, etc.) that a file contains extended characters in a non-UTF-8 encoding? If so, I'm personally in favour of allowing UTF-8 to be used directly. Java is UTF-8 safe, so having the characters in the files makes sense. But if we can't do it carefully, then I'll have to learn to live with escapes.

/cc @jburel, thoughts?

mtbc · 2013-02-04T14:52:52Z

Yes, if we could rely on our viewers and editors not to do something dumb, it'd be great to omit escapes, but I think first we'd have to figure out how the mistakes are happening and issue guidance to prevent further occurrences: for instance, perhaps by changing the text encoding setting in Eclipse's General -> Workspace preferences setting, which starts out as MacRoman.
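To illustrate how such a mistake can arise, here is a small Python sketch that writes a character in one encoding and reads it back in another. The helper `misread` is hypothetical, and the choice of MacRoman versus UTF-8 only mirrors the Eclipse default mentioned above; it is not a diagnosis of what actually happened to the file in this PR.

```python
def misread(text, written_as, read_as):
    """Encode text in one charset, then decode the bytes as another."""
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for bytes that cannot be decoded.
    return raw.decode(read_as, errors="replace")


o_umlaut = "\u00f6"

# A UTF-8 file opened by an editor that assumes MacRoman:
# the two UTF-8 bytes show up as two unrelated characters.
utf8_seen_as_macroman = misread(o_umlaut, "utf-8", "mac_roman")

# A MacRoman file read by a tool that expects UTF-8:
# the single byte is not valid UTF-8 on its own.
macroman_seen_as_utf8 = misread(o_umlaut, "mac_roman", "utf-8")

print(repr(utf8_seen_as_macroman))
print(repr(macroman_seen_as_utf8))
```

Either direction corrupts the literal silently in the editor, which is why agreeing on one workspace encoding matters before dropping the escapes.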

manics · 2013-02-04T15:18:09Z

You could always run this over every file during the build (which would help catch anything in non-Java files too): https://gist.github.com/4707338
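A build-time check along these lines could look roughly like the Python sketch below. This is not the content of the linked gist, which is not reproduced here; the function name `find_non_utf8` and the default extension filter are assumptions made for illustration.

```python
import os


def find_non_utf8(root, extensions=(".java",)):
    """Return (path, byte_offset) for each file under root that is not valid UTF-8."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(tuple(extensions)):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so no platform default encoding is applied.
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte.
                problems.append((path, err.start))
    return problems
```

A build target could call `find_non_utf8("components")` and fail when the returned list is non-empty. Passing a wider `extensions` tuple would cover non-Java files too. Note that this only proves a file decodes as UTF-8; a file in another encoding that happens to be pure ASCII passes as well, which is harmless.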

jburel · 2013-02-04T21:24:57Z

I have used escapes like \u2103 (DEGREE CELSIUS). I am all for unifying and using UTF-8. We will need to add a note to our documentation stating what is expected.

jburel · 2013-02-04T21:26:13Z

The following could be used as a reference: http://www.fileformat.info/info/unicode/index.htm

mtbc · 2013-02-05T08:16:34Z

Documentation: absolutely. If we decide to use unescaped characters, then once other developers see them they may be more likely to attempt it themselves! (-:

manics · 2013-02-05T17:10:16Z

Switched back to ö

mtbc · 2013-02-06T08:36:03Z

Heh, as there is diverse opinion, still good to merge. (-:

joshmoore · 2013-02-08T14:34:56Z

@manics: ?

manics · 2013-02-08T14:39:17Z

Hmm. I definitely didn't mean to close this.... unless deleting a branch closes it.

mtbc · 2013-02-08T14:55:11Z

It does. (Also, pushing extra commits to it makes them appear on the PR too, but you'd probably already noticed that.)

joshmoore · 2013-02-08T14:59:11Z

Yup, knew both of those. Thanks for re-opening.

manics · 2013-02-13T09:09:13Z

This will go into the rebased version of #729.

Replace accented character in unknown charset with unicode equivalent…

08c0721

… (#10251)

mtbc reviewed Feb 4, 2013

Use \u escapes instead of utf8

4af0f14

Revert "Use \u escapes instead of utf8"

61f61b8

This reverts commit 4af0f14.

manics closed this Feb 8, 2013

manics reopened this Feb 8, 2013

mtbc added a commit to mtbc/openmicroscopy that referenced this pull request Feb 11, 2013

Minor comment and string fixes.

27d1403

Small tweaks, partly caused by comments on PRs ome#669, ome#690.

This was referenced Feb 11, 2013

Minor comment and string fixes. #721

Closed

Add note about UTF-8 Java source. ome/omero-documentation#244

Closed

Add note about Java UTF-8 source in Insight. ome/omero-documentation#245

Merged

manics closed this Feb 13, 2013
