Remove astral character assert in U16MatchCharICase32Insn
Summary:
We currently have an assert that characters encountered in a
`U16MatchCharICase32Insn` must be astral. This is not necessarily the
case, since unpaired low/high surrogates may also be encoded in
`U16MatchCharICase32Insn` instructions. We also don't have such an
assert for the case-sensitive variant `U16MatchChar32Insn`, which is
generated in almost the same way.
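
To illustrate the summary above: an unpaired surrogate is a code point in the range 0xD800-0xDFFF, which lies below 0x010000, so it would have tripped the removed assert. The sketch below is plain JavaScript and only demonstrates the code point ranges; `classify` is a hypothetical helper written for this example and is not part of Hermes.

```javascript
// Hypothetical helper (not Hermes code): classify a code point by range.
function classify(cp) {
  if (cp >= 0x10000) return "astral";
  if (cp >= 0xD800 && cp <= 0xDFFF) return "surrogate";
  return "bmp";
}

// An unpaired low surrogate decodes to itself as a single code point.
const lone = "\uDC00".codePointAt(0);
// A well-formed surrogate pair decodes to one astral code point.
const paired = "\uD83D\uDE42".codePointAt(0);

const results = {
  loneKind: classify(lone),
  pairedKind: classify(paired),
  // The removed assert required the character to be >= 0x010000.
  loneWouldPassOldAssert: lone >= 0x010000,
  matchesCaseSensitive: /\uDC00/u.test("\uDC00"),
  matchesCaseInsensitive: /\uDC00/iu.test("\uDC00"),
};

console.log(results);
```

Both regular expressions here mirror the test cases added to `test/hermes/regexp_unicode.js` in this commit.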

Reviewed By: avp

Differential Revision: D23927543

fbshipit-source-id: 01593e2c434676be22ee4333740b8c448b7276cf
neildhar authored and facebook-github-bot committed Sep 25, 2020
1 parent a3d57bb commit c04d69e
Showing 2 changed files with 7 additions and 1 deletion.
1 change: 0 additions & 1 deletion lib/Regex/Executor.cpp
@@ -1048,7 +1048,6 @@ auto Context<Traits>::match(State<Traits> *s, bool onlyAtStart)

case Opcode::U16MatchCharICase32: {
const auto *insn = llvh::cast<U16MatchCharICase32Insn>(base);
assert(insn->c >= 0x010000 && "Character should be astral");
bool matched = false;
if (!c.atEnd()) {
CodePoint cp = c.consumeUTF16();
7 changes: 7 additions & 0 deletions test/hermes/regexp_unicode.js
@@ -89,6 +89,13 @@ print(/.*/u.exec("\u0101bc\ndef")[0].length);
// We should not match a low surrogate in a Unicode regexp.
print(!! /\uDE42/u.exec("\uD83D\uDE42ZZZ"));
// CHECK-NEXT: false
// We should match an unpaired surrogate.
print(!! /\uDC00/u.exec("\uDC00"));
// CHECK-NEXT: true
// Test the case insensitive variant.
print(!! /\uDC00/iu.exec("\uDC00"));
// CHECK-NEXT: true
// We should match the low surrogate when Unicode is off.
print(!! /\uDE42/.exec("\uD83D\uDE42ZZZ"));
// CHECK-NEXT: true
