Commit c11e648

Regexp supports Unicode 9.0.0's \X

* meta character \X matches Unicode 9.0.0 characters with some workarounds for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences. [Feature ruby#12831] [ruby-core:77586]

The term "character" can have many meanings: bytes, codepoints, combined characters, and so on. "Grapheme cluster" is the highest-level of these terms; it means a user-perceived character. Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to handle grapheme clusters (extended grapheme clusters).

But some specs have not been updated to match the current situation, because Unicode Emoji is being extended rapidly without being well defined. This breaks the precondition of UTR#29 that "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters" (the sentence will be removed in the next version). Some of the details are described in Unicode Technical Report #51, UNICODE EMOJI, but they are not merged into UTR#29 yet.

http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
1 parent e680bfb commit c11e648
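
The distinction the commit message draws between bytes, codepoints, and grapheme clusters can be illustrated with a small sketch. This is not part of the commit; `character_counts` is a hypothetical helper name, and the sample strings are chosen here for illustration. It assumes a Ruby version that includes this change (2.4 or later), so that `\X` treats an emoji ZWJ sequence as one cluster.

```ruby
# Hypothetical helper (not from the commit): counts "characters"
# at the three levels named in the commit message.
def character_counts(str)
  {
    bytes: str.bytesize,                   # raw UTF-8 bytes
    codepoints: str.codepoints.size,       # Unicode codepoints
    grapheme_clusters: str.scan(/\X/).size # user-perceived characters
  }
end

# Illustrative inputs (assumptions, not taken from the commit):
combining = "e\u0301"                                 # "e" + COMBINING ACUTE ACCENT
crlf      = "\r\n"                                    # CR LF pair
family    = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}" # MAN, ZWJ, WOMAN, ZWJ, GIRL

[combining, crlf, family].each do |s|
  puts "#{s.inspect}: #{character_counts(s)}"
end
```

In each case the codepoint count is greater than one, while `\X` should report a single grapheme cluster, which is the "user-perceived character" the message refers to.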

8 files changed: +3916 -1960 lines changed
NEWS

Lines changed: 3 additions & 0 deletions
@@ -132,6 +132,9 @@ with all sufficient information, see the ChangeLog file or Redmine
 * Regexp#match? [Feature #8110]
 This returns bool and doesn't save backref.

+* meta character \X matches Unicode 9.0 characters with some workarounds
+for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
+
 * Regexp/String: Updated Unicode version from 8.0.0 to 9.0.0 [Feature #12513]

 * RubyVM::Env

common.mk

Lines changed: 2 additions & 1 deletion
@@ -1064,6 +1064,7 @@ UNICODE_PROPERTY_FILES = \
 	$(UNICODE_SRC_DATA_DIR)/PropertyAliases.txt \
 	$(UNICODE_SRC_DATA_DIR)/PropertyValueAliases.txt \
 	$(UNICODE_SRC_DATA_DIR)/Scripts.txt \
+	$(UNICODE_SRC_DATA_DIR)/auxiliary/GraphemeBreakProperty.txt \
 	$(empty)

 update-unicode: $(UNICODE_FILES)
@@ -1076,7 +1077,7 @@ UNICODE_DOWNLOAD = \

 $(UNICODE_PROPERTY_FILES):
 	$(ECHO) Downloading Unicode $(UNICODE_VERSION) property files...
-	$(Q) $(MAKEDIRS) "$(UNICODE_SRC_DATA_DIR)"
+	$(Q) $(MAKEDIRS) "$(UNICODE_SRC_DATA_DIR)/auxiliary"
 	$(Q) $(UNICODE_DOWNLOAD) $(UNICODE_PROPERTY_FILES)

 $(UNICODE_FILES):
