Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Remove OSX Homebrew ICU dependency #1851

Merged
merged 19 commits into from
Oct 26, 2015

Conversation

ellismg
Copy link

@ellismg ellismg commented Oct 22, 2015

This change converts the System.Globalization.Native shim to use the ICU C API (instead of C++). Once that is done, we are able to link against the version of ICU that OSX uses for all of their locale APIs.

Please take a look @stephentoub @ericeil

The individual commits have some duplicated work, I did it this way so I could make progress incrementally, continue to test and slowly make progress driving towards C. I'm not sure if it is easier to review the 8 files or the 15 commits.

I ran all of the corefx tests locally on both Ubuntu and OSX.

We would like to be able to link against versions of ICU installed as a
"operating system level library" which means we can't take a dependency
on any C++ APIs.

This change moves away from icu::Calendar in favor of UCalendar. I also
introduce a small helper template to manage the lifetime of ICU
resources.
Part of the effort to remove our usage of C++ ICU APIs. The major issue
here was that the C++ API used char*'s for some things whereas the C API
used UChar*'s so we needed to define our own copies of some constants.

We also need to manage a buffer ourselves, instead of being able to use
the underlying buffer of a retured UnicodeString.
Remove uses of the C++ DateFormat and SimpleDateFormat classes in favor
of UDateFormat.

As part of this change, it was easier to move some of the code that
converts an ICU format string to a .NET Style format string from native
code up to managed code. This code used UnicodeString and we'll need to
move away from that as well as we remove all the C++ usage.
This change removes NumberFormat in favor of UNumberFormat. There is a
bit of work that needs to happen in order to keep the normalization code
we use to convert an ICU pattern so to something we can match against
working.

Instead of UnicodeStrings, the input to the normalization function is
now a UChar* and we build up a std::string during normalization.  This
allows us to also skip a conversion from UChar* back to char* so we can
find the correct pattern in our collection of patterns to examine.
Getting the regular eras is straight forward, we can do the thing we do
for other locale data and just ask ICU using a specific
UDateFormatSymbolType.  For abbreviated eras, there's no C API, but we
can try to just read the data from ICU resources and fall back to the
standard width eras if that doesn't work.
To prepare for removing icu::Locale in favor of just using the id
directly, remove all the uses of Locale methods except for .getName().
We now use GetLocale to create a Locale but then turn it into a char*
for all the helper methods.

After this change, we can update GetLocale to do locale parsing into a
char buffer and remove all the locale.getName() calls with just `locale'.
Remove all the uses of the icu::Locale type in favor of just using a
char* which is the raw locale id (which is exactly what all the ICU C
apis use for a locale).

The meat of this change si in locale.cpp to actually handle doing the
conversion from UChar* to char*.  The rest of the places are dealing
with the fallout (GetLocale now has a different signiture and the
.getName() dance is no longer needed as we have a raw locale name all
the time now).
Use "= delete" syntax to make it clear the IcuHolder copy constructor
and assignment opperators are removed.

Remove superfluous "public" modifier on the struct closers used by the
IcuHolders.
OSX ships with a copy of ICU (their globalization APIs are built on top
of it).  Since we only use stable c based APIs, we can link against it
using the methods described in using a System ICU in the ICU User's
Guide (basically we disable function renaming, don't use C++ and only
use stable APIs).

The ICU headers are not part of the SDK, so we continue to need ICU
installed via Homebrew as a build time dependency.

Fixes dotnet/corefx#3849
typedef IcuHolder<UDateFormat, UDateFormatCloser> UDateFormatHolder;
typedef IcuHolder<UNumberFormat, UNumberFormatCloser> UNumberFormatHolder;
typedef IcuHolder<ULocaleDisplayNames, ULocaleDisplayNamesCloser> ULocaleDisplayNamesHolder;
typedef IcuHolder<UResourceBundle, UResourceBundleCloser> UResourceBundleHolder;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you could cut down on the ceremony by just making the closers overloads of an IcuClose function. That would eliminate the extra template arg on IcuHolder, which you could then use directly as IcuHolder<T> without the typedefs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I will do this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am not sure if this will work. All of these Uxxx things are typedefs of void*, so I don't think we can overload.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuck (talking to ICU, not you). Ok.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code did look nicer, so maybe partial template specialization is the way to go in the future. I doubt I will make any changes for RC1 however, what we have works even if it is ugly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just be trading one set of ceremony for another.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how specialization would work any better than overloading if these are all really the same type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I was not thinking clearly.

This matches what we do in other places in calendarData.cpp, the RAII
pattern will make it easier to not leak memory.
@stephentoub
Copy link
Member

Looks great, Matt.

@stephentoub
Copy link
Member

cc: @eerhardt

@ellismg
Copy link
Author

ellismg commented Oct 23, 2015

Ugh, tagged the wrong Eric at first. Feel free to ignore this @ericeil.

{
void operator()(UNumberFormat* pNumberFormat) const
{
udat_close(pNumberFormat);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does NumberFormat have the same close as DateFormat or is this an all too easy mistake to make with the "everything is really void*" silliness?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Will fix.

message(FATAL_ERROR "Cannot find libicui18n, try installing libicu-dev (or the appropriate package for your platform)")
return()
if(NOT CLR_CMAKE_PLATFORM_DARWIN)
find_library(ICUUC NAMES icuuc PATHS ${ICU_HOMEBREW_LIB_PATH})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have "HOMEBREW_LIB_PATH" for non-Darwin platforms?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about switching this to use find_package(ICU REQUIRED), and then use the ${ICU_LIBRARIES} in target_link_libraries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the HOMEBREW_LIB_PATH thing. I will look to into find_package in a follow-up to this PR.

case AbbrevMonthNames:
return EnumMonths(
locale, calendarId, DateFormatSymbols::STANDALONE, DateFormatSymbols::ABBREVIATED, callback, context);
return EnumSymbols(locale, calendarId, UDAT_STANDALONE_SHORT_MONTHS, 0, callback, context);
case SuperShortDayNames:
#ifdef HAVE_DTWIDTHTYPE_SHORT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check in the cmake file is out of date with what is being used in code here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@eerhardt
Copy link
Member

LGTM - I just had a few minor questions.

There were a few problems that needed to be addressed:

 - Our detection logic around testing if ICU supported a feature was
   still checking for C++ stuff instead of the coresponding C
   code (which we ended up using).

 - There was some cleanup we could do now that the OSX and other Unix
   builds were split apart
@ellismg ellismg force-pushed the icu-remove-c-plus-plus branch from 3eac0d9 to 392f5f4 Compare October 23, 2015 22:43
ellismg added a commit that referenced this pull request Oct 26, 2015
@ellismg ellismg merged commit e93beb1 into dotnet:master Oct 26, 2015
@davidfowl
Copy link
Member

👏

joshfree added a commit that referenced this pull request Oct 26, 2015
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…-plus

Remove OSX Homebrew ICU dependency

Commit migrated from dotnet/coreclr@e93beb1
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants