Optimize UnicodeICU localeCompare (#1186)
Summary:
This change eliminates repeated calls to `ucol_open` and `ucol_close` in `hermes::platform_unicode::localeCompare` when Hermes is built with HERMES_PLATFORM_UNICODE_ICU. On a sample sort.js (which sorts 30k strings using localeCompare), the runtime drops from ~500 ms to ~120 ms.

**Before:**
In my trace, ucol_open and ucol_close were called on every localeCompare invocation. Each ucol_open took around 500-900 ns, while each ucol_close was small, around 100 ns.
Note that the first call to ucol_open still takes around 100 µs, even with this change applied.
![image](https://github.com/facebook/hermes/assets/6753926/739edffb-6c7b-4e93-babd-f51f9add4231)
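As a rough sanity check on these numbers, assume the sort performs on the order of n log₂ n comparisons (the exact count depends on Hermes's sort implementation and the input) and that each comparison previously paid for one `ucol_open` (500-900 ns) plus one `ucol_close` (~100 ns):

```math
\begin{aligned}
n &= 30\,000, \qquad n \log_2 n \approx 30\,000 \times 14.87 \approx 4.46 \times 10^{5}\ \text{comparisons},\\
t_{\text{per call}} &\approx (500 \text{ to } 900\ \text{ns}) + 100\ \text{ns} = 0.6 \text{ to } 1.0\ \mu\text{s},\\
t_{\text{saved}} &\approx 4.46 \times 10^{5} \times (0.6 \text{ to } 1.0\ \mu\text{s}) \approx 268 \text{ to } 446\ \text{ms}.
\end{aligned}
```

That range brackets the roughly 340 to 370 ms reduction measured in the Test Plan (466/486 ms before vs. 119/127 ms after), so the per-call open/close overhead plausibly accounts for most of the original runtime.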

Here is the localeCompare call stack before the patch:
![image](https://github.com/facebook/hermes/assets/6753926/9b9fbf31-aa5f-4cc6-a594-26c544c6ebcb)

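**After:**
With the UCollator now constructed once as a function-local static, localeCompare no longer calls ucol_open and ucol_close on every invocation: ucol_open runs once, during static initialization on first use, and ucol_close runs once, when the static is destroyed at program exit. This makes sorting in sort.js much faster.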
Here is the localeCompare stack after the patch:
![image](https://github.com/facebook/hermes/assets/6753926/7deeba56-ec64-480d-9cf2-4ddcec10fc8b)
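The change boils down to a common C++ pattern: hold an expensive C handle in a function-local `static std::unique_ptr` with a custom deleter, so it is created on first use and released once at exit. Below is a minimal, ICU-free sketch of that pattern. `FakeCollator`, `fake_open`, `fake_close`, `getInstance`, and `cachedCompare` are hypothetical stand-ins (not ICU APIs) that only count how often the handle is opened and closed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an expensive C handle such as ICU's UCollator.
struct FakeCollator {
  int strength;
};

// Call counters so the sketch can show how often the handle is opened/closed.
static int g_openCount = 0;
static int g_closeCount = 0;

// Hypothetical C-style "open"/"close" pair (NOT ICU functions).
FakeCollator *fake_open() {
  ++g_openCount;
  return new FakeCollator{3};
}

void fake_close(FakeCollator *coll) {
  ++g_closeCount;
  delete coll;
}

// Lets std::unique_ptr release the handle through the C-style close function,
// in the same way UCollatorDeleter wraps ucol_close in the diff.
struct FakeCollatorDeleter {
  void operator()(FakeCollator *coll) const {
    fake_close(coll);
  }
};

// The function-local static is initialized on the first call only (C++11
// guarantees this happens once, even with concurrent callers) and is
// destroyed, which calls fake_close, once at program exit.
const FakeCollator *getInstance() {
  using FakeCollatorPtr = std::unique_ptr<FakeCollator, FakeCollatorDeleter>;
  static FakeCollatorPtr coll = [] { return FakeCollatorPtr(fake_open()); }();
  return coll.get();
}

// Hypothetical comparison routine in the role of localeCompare: it borrows the
// cached handle on every call instead of opening and closing its own.
int cachedCompare(int a, int b) {
  const FakeCollator *coll = getInstance();
  assert(coll != nullptr && coll->strength == 3);
  return (a > b) - (a < b);
}
```

Calling `cachedCompare` any number of times leaves `g_openCount` at 1 and `g_closeCount` at 0 until the program exits, which is the behavior the patch relies on to take `ucol_open`/`ucol_close` out of the per-comparison path.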

Looking at the code, I found that HERMES_PLATFORM_UNICODE_ICU is not used on Android or iOS, but this change can make desktop usage faster (is that useful?).

Here is the sort.js script, which I took from one of the Hermes issues/posts:
```js
print("gen...");
const randInt = (end, start = 0) =>
  Math.floor(Math.random() * (end - start) + start);
const alphabet = 'abcdefghijklmnopqrstuvwxyz';
// Generate 30,000 random lowercase strings of length 8.
const names = Array.from({length: 30000}, () =>
  Array.from({length: 8}, () => alphabet[randInt(alphabet.length)]).join(''),
);

for (var i = 0; i < 5; ++i) {
  print(names[i]);
}
print('sorting...');
const s = Date.now();
names.sort(localeCompare);
print(`...done, took ${Date.now() - s}ms`);

// Comparator that times each localeCompare call and reports any call slower than 1 ms.
function localeCompare(a, b) {
  const s = Date.now();
  const result = a.localeCompare(b);
  const e = Date.now();
  if (e - s > 1) {
    print(`slow localeCompare: ${e - s}ms for '${a}' vs '${b}'`);
  }
  return result;
}

for (var i = 0; i < 5; ++i) {
  print(names[i]);
}
```

Pull Request resolved: #1186

Test Plan:
**Execute sort.js before applying this patch**
```
hermes (main)$ ../build_release/bin/hermesc -emit-binary -out sort.hbc samples/sort.js
hermes (main)$ ../build_release/bin/hvm ./sort.hbc
gen...
sorting...
...done, took 466ms
hermes (main)$ ../build_release/bin/hvm ./sort.hbc
gen...
sorting...
...done, took 486ms
```

**Execute sort.js after applying this patch**
```
$ ../build_release/bin/hermesc -emit-binary -out sort.hbc samples/sort.js
hermes (icu_localecompare_optimize)$ ../build_release/bin/hvm ./sort.hbc
gen...
sorting...
...done, took 119ms
hermes (icu_localecompare_optimize)$ ../build_release/bin/hvm ./sort.hbc
gen...
sorting...
...done, took 127ms
```

**Run check-hermes:**
```
hermes (icu_localecompare_optimize)$ cmake --build ../build_release/ --target check-hermes
[0/1] Running the Hermes regression tests
Testing Time: 15.91s
  Expected Passes    : 1773
  Unsupported Tests  : 66
hermes (icu_localecompare_optimize)$ echo $?
0
```

Reviewed By: avp

Differential Revision: D51429400

Pulled By: neildhar

fbshipit-source-id: f5a2f56952ec9bcbaad441df8f2e9c49c8a0b09a
sujankh authored and facebook-github-bot committed Dec 1, 2023
1 parent aae2c42 commit 16a6e2c
Showing 1 changed file with 41 additions and 16 deletions.
lib/Platform/Unicode/PlatformUnicodeICU.cpp
```diff
@@ -13,35 +13,60 @@
 
 #include <time.h>
 
+#include <memory>
+
 namespace hermes {
 namespace platform_unicode {
 
+namespace {
+struct UCollatorDeleter {
+  void operator()(UCollator *coll) {
+    ucol_close(coll);
+  }
+};
+
+/// Returns a pointer to the singleton UCollator. It is thread-safe because
+/// threads can only access const APIs using this handle(which are thread-safe
+/// as per the documentation). Please refer to
+/// https://unicode-org.github.io/icu/userguide/icu/design.html#thread-safe-const-apis,
+/// for more details. This
+/// function invokes ucol_open only once and keeps it as a singleton for better
+/// performance instead of ucol_open/ucol_close in each invocation.
+const UCollator *getUCollatorInstance() {
+  using UCollatorPtr = std::unique_ptr<UCollator, UCollatorDeleter>;
+  static UCollatorPtr coll = [] {
+    UErrorCode err{U_ZERO_ERROR};
+    UCollator *coll = ucol_open(uloc_getDefault(), &err);
+
+    if (U_FAILURE(err)) {
+      // Failover to root locale if we're unable to open in default locale.
+      err = U_ZERO_ERROR;
+      coll = ucol_open("", &err);
+    }
+    assert(U_SUCCESS(err) && "failed to open collator");
+
+    // Normalization mode allows for strings that can be represented
+    // in two different ways to compare as equal.
+    ucol_setAttribute(coll, UCOL_NORMALIZATION_MODE, UCOL_ON, &err);
+    assert(U_SUCCESS(err) && "failed to set collator attribute");
+    return UCollatorPtr(coll, UCollatorDeleter());
+  }();
+
+  return coll.get();
+}
+} // namespace
+
 int localeCompare(
     llvh::ArrayRef<char16_t> left,
     llvh::ArrayRef<char16_t> right) {
-  UErrorCode err{U_ZERO_ERROR};
-  UCollator *coll = ucol_open(uloc_getDefault(), &err);
-  if (U_FAILURE(err)) {
-    // Failover to root locale if we're unable to open in default locale.
-    err = U_ZERO_ERROR;
-    coll = ucol_open("", &err);
-  }
-  assert(U_SUCCESS(err) && "failed to open collator");
-
-  // Normalization mode allows for strings that can be represented
-  // in two different ways to compare as equal.
-  ucol_setAttribute(coll, UCOL_NORMALIZATION_MODE, UCOL_ON, &err);
-  assert(U_SUCCESS(err) && "failed to set collator attribute");
-
+  const UCollator *coll = getUCollatorInstance();
   auto result = ucol_strcoll(
       coll,
       (const UChar *)left.data(),
       left.size(),
       (const UChar *)right.data(),
       right.size());
 
-  ucol_close(coll);
-
   switch (result) {
     case UCOL_LESS:
       return -1;
```
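The thread-safety argument in the doc comment of `getUCollatorInstance` has two parts: C++11 guarantees that a function-local static is initialized exactly once even if several threads call the getter at the same time, and afterwards threads only use the handle through a const pointer. The sketch below exercises only the C++ half of that argument, using a hypothetical `Handle` type and hypothetical `openHandle`/`sharedHandle`/`readConcurrently` helpers in place of `UCollator`. Whether concurrent const calls on the real object are safe is a property of the library itself (for ICU, the design notes linked in the comment).

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical handle type; stands in for ICU's UCollator in this sketch.
struct Handle {
  int value;
};

// Counts how many times the (hypothetical) expensive open routine runs.
std::atomic<int> g_opens{0};

Handle *openHandle() {
  g_opens.fetch_add(1);
  return new Handle{42};
}

// C++11 guarantees a function-local static is initialized exactly once even
// when several threads reach this line concurrently; the others wait until
// initialization finishes.
const Handle *sharedHandle() {
  static std::unique_ptr<Handle> h(openHandle());
  return h.get();
}

// Launches `threads` workers that each read the shared handle `reads` times
// through a const pointer only, and returns the sum of the observed values.
long readConcurrently(int threads, int reads) {
  std::atomic<long> sum{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&sum, reads] {
      for (int i = 0; i < reads; ++i) {
        sum.fetch_add(sharedHandle()->value);
      }
    });
  }
  for (std::thread &w : workers) {
    w.join();
  }
  return sum.load();
}
```

Because the language serializes the initialization, `g_opens` ends at 1 however the worker threads interleave. Building this may require linking with `-pthread` on some toolchains.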
