Add note about Java UTF-8 source in Insight. #245

mtbc · 2013-02-11T13:41:14Z

ome/openmicroscopy#721 changes some Insight code to use UTF-8 literals in response to comments in ome/openmicroscopy#690

joshmoore · 2013-02-11T14:31:44Z

Haven't checked the output of the doc build, but I agree with the sentiment. @jburel, @manics?

sbesson · 2013-02-11T14:56:35Z

developers/Insight/Contributing.txt

Use :menuselection:`Preferences --> General --> Workspace --> Text file encoding`

manics · 2013-02-11T15:48:03Z

Should this apply across the whole codebase, rather than just Insight? Python and JavaScript too? There's an incomplete page on development standards: http://www.openmicroscopy.org/site/support/omero4/developers/standards.html

mtbc · 2013-02-11T15:53:57Z

Yes, I was wondering that too. The phrasing of the development standards page is very much "this is what we were thinking at a certain point, we'll get around to deciding in due course" rather than "up-to-date thinking". It does feel like this should go into that page, but maybe as part of a larger overhaul?

hflynn · 2013-02-11T16:13:15Z

The developer standards page is on the review & update list (if one of you wants to volunteer, be my guest ;)). There is also https://www.openmicroscopy.org/site/support/omero4/developers/policies.html which may be an appropriate place for this?

jburel · 2013-02-11T19:15:43Z

If we add it to the policies page, it will get lost. Adding it to the code template will probably be better.
@joshmoore, OMERO 5 is the breaking change that we could use to apply the template across the code base.

joshmoore · 2013-02-11T20:20:14Z

@jburel, I disagree. Anything that's across the whole code base should be on both, otherwise rebasing will be a nightmare.

joshmoore · 2013-02-11T20:37:28Z

Discussing where this should go: I agree about standards needing work, but that's likely a better place than policies. Another option might be splitting testing into a whole section about Eclipse and other tools.

jburel · 2013-02-12T07:05:50Z

@joshmoore, I think we were talking about 2 different things. This PR is not the place for that discussion.

joshmoore · 2013-02-12T08:06:01Z

@jburel, assuming you're talking about the breaking changes in OMERO 5, agreed.

So, focusing on the UTF-8 section the opinions that are mentioned here are:

policies - likely to get lost
standards - needs general work
testing --> tools (?) IDEs(?)

Did I miss anything?

manics · 2013-02-12T09:38:14Z

How about putting it in standards so we don't forget, but leaving the tidy-up for later?

joshmoore · 2013-02-12T10:03:49Z

That'd be my vote.

jburel · 2013-02-12T13:35:04Z

Sounds fine. Sorry, catching up with comments. (I did not check e-mails.)

manics · 2013-02-14T12:36:03Z

Did we decide on whether this should be Java only or across all languages?

mtbc · 2013-02-14T12:48:45Z

Not that I noticed. Perhaps I should add that question to http://trac.openmicroscopy.org.uk/ome/ticket/10288

joshmoore · 2013-02-14T12:52:31Z

@manics, I'd vote yes.

joshmoore · 2013-02-14T13:41:57Z

Sorry, to all languages. Python certainly makes sense. I don't know if there's any problem with doing UTF-8 in C++ land. @rleigh-dundee / @JesseCorrington ?

mtbc · 2013-02-14T13:46:34Z

Also, just as I tweaked the javac options in the ant build files in ome/openmicroscopy#729, it would be good to make sure that whatever analogous source-encoding options other compilers/interpreters need are indeed being set in the build scripts before we tell people to go ahead and assume UTF-8. (What those might be, I don't yet know.)
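
Independently of any compiler or interpreter option, a build could also check that the source files really do decode as UTF-8 before anyone assumes it. The following is only a minimal Python sketch of such a check: the function name `find_non_utf8_files` and the extension list are hypothetical and are not part of the existing build scripts.

```python
import os

# Hypothetical extension list for illustration; adjust to the code base.
SOURCE_EXTENSIONS = (".java", ".py", ".cpp", ".h", ".js")


def find_non_utf8_files(root, extensions=SOURCE_EXTENSIONS):
    """Return the sorted paths under `root` whose bytes are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so that no implicit platform encoding is applied.
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

A build step could fail when the returned list is non-empty. Note that pure-ASCII files always pass, since ASCII is a subset of UTF-8.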

ghost · 2013-02-14T14:18:20Z

On 14/02/13 13:41, Josh Moore wrote:

> Sorry, to all languages. Python certainly makes sense. I don't know if
> there's any problem with doing UTF-8 in C++ land.

Certainly works well in GCC-land. UTF-8 is its default input encoding,
and internal/execution charset for narrow strings (with UTF-32 for wide
strings). I've been using UTF-8 in string literals etc. for years. It
all Just Works.

Except on Windows... Other compilers might be more picky, and this
includes MSVC. Stack Overflow says that if MSVC2008 finds a Unicode BOM
it will process the UTF-8, converting it into UTF-16 internally. Yuck!
But GCC won't like that. And if you don't have a BOM, MSVC passes the
bytes through but is unaware of the encoding; it is apparently reliant
upon the locale you build in.

#pragma execution_character_set("utf-8")
exists in MSVC2008

http://www.utf8everywhere.org/
has some information about MSVC. Looks like the C++ implementation on
Windows is severely lacking. This will probably be a significant pain
point... you can't even open a file with unicode name using the C++
standard API on Windows; you have to use nonstandard extensions.

So looks like it's definitely possible. An alternative for Windows
might be transcoding to UTF-16 at compile time?

Roger

The University of Dundee is a registered Scottish Charity, No: SC015096
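
Since the message above reports that the two compilers react differently to a BOM, it may help to know whether a given source file carries one. This is a minimal Python sketch, assuming the file content has already been read as bytes; the helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

The UTF-8 BOM is the three bytes EF BB BF, which is what `codecs.BOM_UTF8` holds.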

ghost · 2013-02-14T15:44:01Z

One other note: C++11 introduces u8"", u"" and U"" for UTF-8, UTF-16 and
UTF-32 string literals:

// Unicode literals (C++11; from C++20, u8"" literals have type const char8_t[])
const char *utf8 = u8"UTF-8 string \u2500";
const char16_t *utf16 = u"UTF-16 string \u2500";
const char32_t *utf32 = U"UTF-32 string \u2500";

The [w]string classes have the appropriate ctors, etc. Not sure how
MSVC implement it, but it's at least a properly standardised way to
specify the input encoding of the strings, and represent them internally
as the appropriate type. Note these are independent of string/stream
width, so will work with both narrow and wide variants.

Roger
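
To make the difference between the three literal kinds concrete, this small Python sketch encodes the same U+2500 character used in the literals above and shows how many bytes each encoding needs. It is an illustration only and does not show how any particular C++ compiler stores the literals.

```python
text = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL, the character used above

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # explicit byte order, so no BOM is prepended
utf32 = text.encode("utf-32-le")

print(utf8.hex(), len(utf8))    # e29480 3
print(utf16.hex(), len(utf16))  # 0025 2
print(utf32.hex(), len(utf32))  # 00250000 4
```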

joshmoore · 2013-02-18T09:08:40Z

@mtbc, it seems if we're going to add this to the docs, let's go ahead and have it defined across the board. If you're not comfortable writing the individual sections, might be worth soliciting the text from @rleigh-dundee, @manics, et al.

mtbc · 2013-02-18T09:27:00Z

Yes, definitely will need to solicit text for C++, Python, whatever else. (I don't know to what extent the "contributing to Insight" page needs those, although the more general page does.)

I also don't know enough about our build-time file generation to be sure if generation steps will be happily preserving UTF-8, if that's an issue? I don't know if UTF-8 may creep into model definition or ICE files.

manics · 2013-02-18T09:29:49Z

Python requires a source header comment:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

which brings us back to @jburel's comment about templates and when to introduce them.
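
For checking which encoding a given script actually declares, the standard library's `tokenize.detect_encoding` reads the same header comment (and any BOM) that the interpreter does. The following is a minimal sketch; the wrapper name `declared_encoding` is hypothetical. Note that Python 3 falls back to UTF-8 when no header is present, whereas Python 2 falls back to ASCII, which is why the header matters there.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to read these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _first_lines = tokenize.detect_encoding(readline)
    return encoding


with_header = b"#!/usr/bin/python\n# -*- coding: utf-8 -*-\nx = 1\n"
print(declared_encoding(with_header))  # utf-8
```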

joshmoore · 2013-02-18T10:08:55Z

The Python header example definitely specifies that UTF-8 is the way to go, but we could add a comment to be even more explicit:

https://github.com/openmicroscopy/openmicroscopy/blob/develop/docs/headers.txt#L136

jburel · 2013-03-03T19:59:08Z

Unfortunately we did not take the time to discuss a plan last week. Maybe add @joshmoore's suggestion, i.e. be more explicit, and schedule a discussion this week.

mtbc · 2013-03-22T16:56:22Z

Following discussion in recent standup, here's a revised commit.

manics · 2013-03-25T11:28:32Z

Good to merge

joshmoore · 2013-03-25T12:17:55Z

👍

This was referenced Feb 11, 2013

Add note about UTF-8 Java source. #244

Closed

Minor comment and string fixes. ome/openmicroscopy#721

Closed

mtbc mentioned this pull request Mar 18, 2013

Add UTF-8 encoding header to Python scripts. ome/openmicroscopy#904

Merged

Add note about UTF-8 source code files.

d39d55b

joshmoore added a commit that referenced this pull request Mar 25, 2013

Merge pull request #245 from mtbc/encoding

e4c65d4

Add note about Java UTF-8 source in Insight.

joshmoore merged commit e4c65d4 into ome:dev_4_4 Mar 25, 2013

mtbc deleted the encoding branch March 25, 2013 13:10

mtbc mentioned this pull request Mar 25, 2013

Add note about UTF-8 source code files. (rebase) #300

Merged

6 participants