@mtbc
Member

@mtbc mtbc commented Feb 11, 2013

ome/openmicroscopy#721 changes some Insight code to use UTF-8 literals in response to comments in ome/openmicroscopy#690

@joshmoore
Member

Haven't checked the output of the doc build, but I agree with the sentiment. @jburel, @manics?

Member

Use :menuselection:`Preferences --> General --> Workspace --> Text file encoding`

@manics
Member

manics commented Feb 11, 2013

Should this apply across the whole codebase, rather than just for Insight? Python and Javascript too? There's an incomplete page on development standards: http://www.openmicroscopy.org/site/support/omero4/developers/standards.html

@mtbc
Member Author

mtbc commented Feb 11, 2013

Yes, I was wondering that too. The phrasing of the development standards page is very much "this is what we were thinking at a certain point, we'll get around to deciding in due course" rather than "up-to-date thinking". It does feel like this should go into that page, but maybe as part of a larger overhaul?

@hflynn
Contributor

hflynn commented Feb 11, 2013

The developer standards page is on the review & update list (if one of you wants to volunteer, be my guest ;)). There is also https://www.openmicroscopy.org/site/support/omero4/developers/policies.html which may be an appropriate place for this?

@jburel
Member

jburel commented Feb 11, 2013

If we add it to the policies page, it will get lost. Adding it to the code template will probably be better.
@joshmoore, OMERO 5 is the breaking change that we could use to apply the template across the code base.

@joshmoore
Member

@jburel, I disagree. Anything that's across the whole code base should be on both, otherwise rebasing will be a nightmare.

@joshmoore
Member

Discussing where this should go: I agree about standards needing work, but that's likely a better place than policies. Another option might be splitting testing into a whole section about Eclipse and other tools.

@jburel
Member

jburel commented Feb 12, 2013

@joshmoore I think we were talking about two different things. This PR is not the place for that discussion.

@joshmoore
Member

@jburel, assuming you're talking about the breaking changes in OMERO 5, agreed.

So, focusing on the UTF-8 section, the opinions mentioned here are:

  • policies - likely to get lost
  • standards - needs general work
  • testing --> tools (?) IDEs(?)

Did I miss anything?

@manics
Member

manics commented Feb 12, 2013

How about putting it in standards so we don't forget, but leaving the tidy-up for later?

@joshmoore
Member

That'd be my vote.

@jburel
Member

jburel commented Feb 12, 2013

Sounds fine. Sorry, catching up with comments. (I did not check e-mails.)

@manics
Member

manics commented Feb 14, 2013

Did we decide on whether this should be Java only or across all languages?

@mtbc
Member Author

mtbc commented Feb 14, 2013

Not that I noticed. Perhaps I should add that question to http://trac.openmicroscopy.org.uk/ome/ticket/10288

@joshmoore
Member

@manics, I'd vote yes.

@joshmoore
Member

Sorry, to all languages. Python certainly makes sense. I don't know if there's any problem with doing UTF-8 in C++ land. @rleigh-dundee / @JesseCorrington ?

@mtbc
Member Author

mtbc commented Feb 14, 2013

Also, just as I tweaked the javac options in the ant build files in ome/openmicroscopy#729, it would be good to make sure that whatever analogous source-file-encoding options other compilers/interpreters need are indeed being set in the build scripts before we tell people to go ahead and assume UTF-8. (What those might be, I don't yet know.)

@ghost

ghost commented Feb 14, 2013

On 14/02/13 13:41, Josh Moore wrote:

Sorry, to all languages. Python certainly makes sense. I don't know if
there's any problem with doing UTF-8 in C++ land.

Certainly works well in GCC-land. UTF-8 is its default input encoding,
and internal/execution charset for narrow strings (with UTF-32 for wide
strings). I've been using UTF-8 in string literals etc. for years. It
all Just Works.

Except on Windows... Other compilers might be more picky, and this
includes MSVC. Stackoverflow says that MSVC2008 will process UTF-8 if
it finds a Unicode BOM into UTF16 internally. Yuck! But GCC won't like
that. And if you don't have a BOM it passes it through, but is unaware
of it--it is apparently reliant upon the locale you build in.

#pragma execution_character_set("utf-8")
exists in MSVC2008

http://www.utf8everywhere.org/
has some information about MSVC. Looks like the C++ implementation on
Windows is severely lacking. This will probably be a significant pain
point... you can't even open a file with unicode name using the C++
standard API on Windows; you have to use nonstandard extensions.

So looks like it's definitely possible. An alternative for Windows
might be transcoding to UTF-16 at compile time?

Roger

The University of Dundee is a registered Scottish Charity, No: SC015096
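
The BOM dependence described above can be checked mechanically before a file ever reaches a compiler. The following is a minimal Python sketch, not part of the OMERO build; the function name `classify_source_bytes` is hypothetical. It reports whether a source file's raw bytes start with a UTF-8 BOM (the marker that, per the comment above, MSVC2008 reportedly keys on and GCC reportedly dislikes) and whether the remainder decodes as UTF-8.

```python
import codecs


def classify_source_bytes(data: bytes) -> str:
    """Classify raw source bytes as 'utf-8-bom', 'utf-8', or 'not-utf-8'.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Strip the three-byte BOM (EF BB BF) before validating the rest.
        body = data[len(codecs.BOM_UTF8):]
        kind = "utf-8-bom"
    else:
        body = data
        kind = "utf-8"
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return kind


# A BOM-prefixed snippet, a plain UTF-8 snippet, and a lone Latin-1 byte.
print(classify_source_bytes(codecs.BOM_UTF8 + b"int x;"))   # utf-8-bom
print(classify_source_bytes("\u2500".encode("utf-8")))      # utf-8
print(classify_source_bytes(b"\xe9"))                       # not-utf-8
```

A check like this says nothing about how any particular compiler treats the file; it only makes the with-BOM and without-BOM cases visible so that a policy can be applied consistently.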

@ghost

ghost commented Feb 14, 2013

On 14/02/13 14:17, Roger Leigh wrote:

On 14/02/13 13:41, Josh Moore wrote:

Sorry, to all languages. Python certainly makes sense. I don't know if
there's any problem with doing UTF-8 in C++ land.

Certainly works well in GCC-land. UTF-8 is its default input encoding,
and internal/execution charset for narrow strings (with UTF-32 for wide
strings). I've been using UTF-8 in string literals etc. for years. It
all Just Works.

Except on Windows... Other compilers might be more picky, and this
includes MSVC. Stackoverflow says that MSVC2008 will process UTF-8 if it
finds a Unicode BOM into UTF16 internally. Yuck! But GCC won't like
that. And if you don't have a BOM it passes it through, but is unaware
of it--it is apparently reliant upon the locale you build in.

One other note: C++11 introduces u8"", u"" and U"" for UTF-8, UTF-16 and
UTF-32 string literals:

// Unicode literals
const char *utf8 = u8"UTF-8 string \u2500";
const char16_t *utf16 = u"UTF-16 string \u2500";
const char32_t *utf32 = U"UTF-32 string \u2500";

The [w]string classes have the appropriate ctors, etc. Not sure how
MSVC implement it, but it's at least a properly standardised way to
specify the input encoding of the strings, and represent them internally
as the appropriate type. Note these are independent of string/stream
width, so will work with both narrow and wide variants.

Roger
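
To make the difference between the three literal forms concrete, here is a small Python sketch (Python rather than C++, purely for illustration; `code_units` is a hypothetical helper). It counts how many code units the same text occupies in each encoding form, using the U+2500 character from the literals above.

```python
def code_units(text: str) -> dict:
    """Count code units per encoding form (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 1-byte units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 2-byte units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 4-byte units
    }


# U+2500 (BOX DRAWINGS LIGHT HORIZONTAL) lies in the Basic Multilingual Plane.
print(code_units("\u2500"))      # {'utf-8': 3, 'utf-16': 1, 'utf-32': 1}

# A character outside the BMP needs a surrogate pair in UTF-16.
print(code_units("\U0001F44D"))  # {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}
```

The counts show why the element type matters: a narrow `char` string holding UTF-8 stores one character as several elements, whereas a `char32_t` string stores one element per code point.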

@joshmoore
Member

@mtbc, it seems if we're going to add this to the docs, let's go ahead and have it defined across the board. If you're not comfortable writing the individual sections, might be worth soliciting the text from @rleigh-dundee, @manics, et al.

@mtbc
Member Author

mtbc commented Feb 18, 2013

Yes, definitely will need to solicit text for C++, Python, whatever else. (I don't know to what extent the "contributing to Insight" page needs those, although the more general page does.)

I also don't know enough about our build-time file generation to be sure whether the generation steps will happily preserve UTF-8, if that's an issue. I don't know whether UTF-8 may creep into model definition or ICE files.

@manics
Member

manics commented Feb 18, 2013

Python requires a source header comment:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

which brings us back to @jburel's comment about templates and when to introduce them.
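
For background: Python 2 assumes ASCII source unless such a header is present (PEP 263), while Python 3 defaults to UTF-8 (PEP 3120). A minimal sketch of how the header could be checked, assuming a Python 3 interpreter and using the standard library's `tokenize.detect_encoding`; the wrapper name `declared_encoding` is hypothetical:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to read this source.

    Hypothetical wrapper: detect_encoding inspects a BOM and the
    coding comment on the first two lines, defaulting to UTF-8.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(
        io.BytesIO(source).readline
    )
    return encoding


with_header = b"#!/usr/bin/python\n# -*- coding: utf-8 -*-\nx = 1\n"
latin_header = b"# -*- coding: latin-1 -*-\nx = 1\n"

print(declared_encoding(with_header))   # utf-8
print(declared_encoding(latin_header))  # iso-8859-1
```

A check along these lines could flag files whose declared encoding is not UTF-8, whichever way the template question is settled.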

@joshmoore
Member

The Python header example definitely specifies that UTF-8 is the way to go, but we could add a comment to be even more explicit:

https://github.com/openmicroscopy/openmicroscopy/blob/develop/docs/headers.txt#L136

@jburel
Member

jburel commented Mar 3, 2013

Unfortunately we did not take the time to discuss a plan last week. Maybe add @joshmoore's suggestion, i.e. be more explicit, and schedule a discussion this week.

@mtbc
Member Author

mtbc commented Mar 22, 2013

Following discussion in recent standup, here's a revised commit.

@manics
Member

manics commented Mar 25, 2013

Good to merge

@joshmoore
Member

👍

joshmoore added a commit that referenced this pull request Mar 25, 2013
Add note about Java UTF-8 source in Insight.
@joshmoore joshmoore merged commit e4c65d4 into ome:dev_4_4 Mar 25, 2013
@mtbc mtbc deleted the encoding branch March 25, 2013 13:10

6 participants