collation: add utf8_unicode_ci/utf8mb4_unicode_ci support (pingcap#4482)
* document for unicode_ci

* document for unicode_ci

* lint

* Apply suggestions from code review

* Update character-set-and-collation.md

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
xiongjiwei and TomShawn authored Sep 14, 2020
1 parent 6ee9cd5 commit 28f373d
Showing 2 changed files with 8 additions and 4 deletions.
2 changes: 1 addition & 1 deletion basic-features.md
@@ -26,7 +26,7 @@ aliases: ['/docs-cn/dev/basic-features/']

- Character sets: UTF8, UTF8MB4, BINARY, ASCII, LATIN1.

- Collations: UTF8MB4_GENERAL_CI, UTF8MB4_GENERAL_BIN, UTF8_GENERAL_CI, UTF8_GENERAL_BIN, BINARY.
- Collations: UTF8MB4_GENERAL_CI, UTF8MB4_UNICODE_CI, UTF8MB4_GENERAL_BIN, UTF8_GENERAL_CI, UTF8_UNICODE_CI, UTF8_GENERAL_BIN, BINARY.

## Functions

10 changes: 7 additions & 3 deletions character-set-and-collation.md
@@ -106,6 +106,7 @@ SHOW COLLATION WHERE Charset = 'utf8mb4';
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 |
+--------------------+---------+------+---------+----------+---------+
2 rows in set (0.00 sec)
```
@@ -444,7 +445,7 @@ ERROR 1062 (23000): Duplicate entry 'a ' for key 'PRIMARY' # TiDB corrected `PAD
When an expression involves several subexpressions with different collations, the collation used for the computation must be inferred. The rules are as follows:

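+ An explicit `COLLATE` clause has a coercibility value of `0`.
+ If the collations of two strings are incompatible, the result of `concat` on those two strings has a coercibility value of `1`. All currently implemented collations are mutually compatible.
+ If the collations of two strings are incompatible, the result of `concat` on those two strings has a coercibility value of `1`.
+ The collation of a column, or of `CAST()`, `CONVERT()`, or `BINARY()`, has a coercibility value of `2`.
+ A system constant (the string returned by `USER()` or `VERSION()`) has a coercibility value of `3`.
+ A constant has a coercibility value of `4`.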
@@ -453,9 +454,12 @@ ERROR 1062 (23000): Duplicate entry 'a ' for key 'PRIMARY' # TiDB corrected `PAD

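When inferring the collation, TiDB prefers the collation of the expression with the lower coercibility value. If the coercibility values are equal, the collation is chosen according to the following priority: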

binary > utf8mb4_bin > utf8mb4_general_ci > utf8_bin > utf8_general_ci > latin1_bin > ascii_bin
binary > utf8mb4_bin > (utf8mb4_general_ci = utf8mb4_unicode_ci) > utf8_bin > (utf8_general_ci = utf8_unicode_ci) > latin1_bin > ascii_bin

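If the collations of two subexpressions differ and both expressions have a coercibility value of `0`, TiDB cannot infer the collation and reports an error.
TiDB cannot infer the collation and reports an error in the following cases:

- The collations of two subexpressions differ, and both expressions have a coercibility value of `0`.
- The collations of two subexpressions are incompatible, and the return type of the expression is `String`.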
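
The inference rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not TiDB's implementation: the names `Operand`, `CollationError`, and `infer_collation` are hypothetical, the sketch covers only two operands of a string-returning expression, and it assumes that "incompatible" means two different collations that share a priority level (for example `utf8mb4_general_ci` and `utf8mb4_unicode_ci`), which the text does not define explicitly.

```python
from dataclasses import dataclass

# Priority groups quoted from the text, highest priority first.
# Collations in the same tuple share one priority level.
PRIORITY = [
    ("binary",),
    ("utf8mb4_bin",),
    ("utf8mb4_general_ci", "utf8mb4_unicode_ci"),
    ("utf8_bin",),
    ("utf8_general_ci", "utf8_unicode_ci"),
    ("latin1_bin",),
    ("ascii_bin",),
]
RANK = {name: level for level, group in enumerate(PRIORITY) for name in group}


class CollationError(Exception):
    """Raised when no collation can be inferred (hypothetical name)."""


@dataclass(frozen=True)
class Operand:
    collation: str
    coercibility: int  # 0 = explicit COLLATE, 2 = column, 4 = constant, ...


def infer_collation(a: Operand, b: Operand) -> str:
    # The operand with the lower coercibility value wins outright.
    if a.coercibility != b.coercibility:
        return a.collation if a.coercibility < b.coercibility else b.collation
    # Equal coercibility and identical collations: nothing to resolve.
    if a.collation == b.collation:
        return a.collation
    # Two different explicit COLLATE clauses cannot be reconciled.
    if a.coercibility == 0:
        raise CollationError("different collations, both with coercibility 0")
    # ASSUMPTION: different collations on the same priority level are
    # treated as incompatible, which is an error for a string result.
    if RANK[a.collation] == RANK[b.collation]:
        raise CollationError("incompatible collations in a string expression")
    # Otherwise the collation with the higher priority (lower rank) wins.
    return a.collation if RANK[a.collation] < RANK[b.collation] else b.collation


column = Operand("utf8mb4_general_ci", 2)
literal = Operand("utf8mb4_bin", 4)
print(infer_collation(column, literal))  # utf8mb4_general_ci
```

In the usage lines, the column wins even though `utf8mb4_bin` ranks higher in the priority list, because the priority list is consulted only when the coercibility values are equal.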

## `COLLATE` clause
