grapheme-splitter

背景

尽管Golang使用utf8作为默认编码，但是在处理某些特殊字符时仍然不能准确判断字符串的长度

例如，普通emoji字符 "🌷","🎁","💩","😜" 和 "👍" 可以确定长度为1，但是像国旗"🇨🇳"这样复杂且常用的emoji的长度就不准确了

len([]rune("🌷")) == 1

len([]rune("🇨🇳")) == 2

一些不常用的汉字，或者其他国家的特殊文字识别出的长度与肉眼所见会存在误差

two := "ñ"; // unnormalized two-char n+◌̃  , i.e. "\u006E\u0303";
one := "ñ"; // normalized single-char, i.e. "\u00F1"
len([]rune(two)) // 2
len([]rune(one)) // 1

本库的目的在于将任意的字符串切割成肉眼所见的单个字符

安装

通过如下命令进行安装:

$ go get github.com/SubLuLu/grapheme-splitter

测试

To run the tests on grapheme-splitter, use the command below:

$ go test -v

$ go test -bench=.

使用

import成功后即可使用:

import splitter "github.com/SubLuLu/grapheme-splitter"

// split the string to an array of grapheme clusters (one string each)
graphemes := splitter.Split(string)

// or do this if you just need their number
counter := splitter.Counter(string)

示例

import splitter "github.com/SubLuLu/grapheme-splitter"

// plain latin alphabet - nothing spectacular
splitter.Split("abcd"); // returns ["a", "b", "c", "d"]

// two-char emojis and six-char combined emoji
splitter.Split("🌷🎁💩😜👍🇨🇳"); // returns ["🌷","🎁","💩","😜","👍","🇨🇳"]

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
README.md		README.md
grapheme.go		grapheme.go
splitter.go		splitter.go
splitter_test.go		splitter_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

grapheme-splitter

背景

安装

测试

使用

示例

About

Uh oh!

Releases

Packages

Languages

SubLuLu/grapheme-splitter

Folders and files

Latest commit

History

Repository files navigation

grapheme-splitter

背景

安装

测试

使用

示例

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages