A multilingual, flexible, and fast string and regex matcher that supports 拼音匹配 (Chinese pinyin matching) and ローマ字検索 (Japanese romaji matching).
- Unicode support
  - Full UTF-8 support, with limited support for UTF-16 and UTF-32.
  - Unicode case insensitivity.
- Chinese pinyin matching (拼音匹配)
  - Support characters with multiple readings (i.e. heteronyms, 多音字).
  - Support multiple pinyin notations, including Quanpin (全拼), Jianpin (简拼) and many Shuangpin (双拼) notations.
  - Support mixing multiple notations during matching.
- Japanese romaji matching (ローマ字検索)
  - Support characters with multiple readings (i.e. heteronyms, 同形異音語).
  - Support only the Hepburn romanization system at the moment.
- glob()-style pattern matching (i.e. `?`, `*` and `**`)
  - Support treating surrounding wildcards as anchors when not matching the whole string.
  - Support two separators (`//`) or a complement separator (`\`) as a glob star (`*`/`**`).
- Regular expression
  - Support the same syntax as `regex`, including wildcards, repetitions, alternations, groups, etc.
  - Support custom matching callbacks, which can be used to implement ad hoc look-around, backreferences, balancing groups/recursion/subroutines, combining domain-specific parsers, etc.
- Relatively high performance
All of the above features are optional: you do not pay the performance or binary-size cost of features you don't use.
If you only need Chinese pinyin matching, you can use ib-pinyin instead, which is simpler and more stable.
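To make the glob-style wildcards from the feature list more concrete, here is a minimal, standard-library-only sketch of a whole-string wildcard matcher. It is not ib-matcher's implementation or API. The function names and the exact semantics are assumptions made for illustration: `?` matches one non-separator character, `*` matches any run of non-separator characters, `**` matches any run of characters including separators, and `/` is the only separator.

```rust
/// Separator used by this sketch (an assumption; real matchers may accept several).
const SEP: char = '/';

/// Toy whole-string glob matcher. Illustrative only, not ib-matcher's API.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    matches(&p, &t)
}

/// Recursive backtracking over the remaining pattern and text.
fn matches(p: &[char], t: &[char]) -> bool {
    match p {
        // Pattern exhausted: succeed only if the text is exhausted too.
        [] => t.is_empty(),
        // `**` may consume any number of characters, separators included.
        ['*', '*', rest @ ..] => (0..=t.len()).any(|i| matches(rest, &t[i..])),
        // `*` may consume characters only up to the next separator.
        ['*', rest @ ..] => {
            let limit = t.iter().position(|&c| c == SEP).unwrap_or(t.len());
            (0..=limit).any(|i| matches(rest, &t[i..]))
        }
        // `?` consumes exactly one non-separator character.
        ['?', rest @ ..] => match t {
            [c, tail @ ..] if *c != SEP => matches(rest, tail),
            _ => false,
        },
        // Any other pattern character must match literally.
        [c, rest @ ..] => match t {
            [d, tail @ ..] if c == d => matches(rest, tail),
            _ => false,
        },
    }
}
```

Under these assumed semantics, `glob_match("*.rs", "main.rs")` is true, `glob_match("*.rs", "src/main.rs")` is false because `*` stops at the separator, and `glob_match("**.rs", "src/main.rs")` is true.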
//! cargo add ib-matcher --features pinyin,romaji
use ib_matcher::{
    matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig},
    pinyin::PinyinNotation,
};

// Pinyin: full pinyin and first letters may be mixed in one pattern.
let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::notations(
        PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter,
    ))
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

// Romaji
let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));
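The pattern `pysousuoeve` above mixes first letters (`p`, `y`), full pinyin (`sou`, `suo`) and plain letters (`eve`). As a rough mental model, the following standard-library-only sketch shows one way such mixed matching could work with simple backtracking. It is not how ib-matcher is implemented. The `readings` table is a tiny hypothetical stand-in for a real pinyin dictionary, and `match_here` and `is_match` are names invented for this sketch.

```rust
/// Hypothetical reading table for this sketch; a real library ships a full dictionary.
fn readings(c: char) -> &'static [&'static str] {
    match c {
        '拼' => &["pin"],
        '音' => &["yin"],
        '搜' => &["sou"],
        '索' => &["suo"],
        '乐' => &["le", "yue"], // heteronym: more than one reading
        _ => &[],
    }
}

/// Returns true if the whole `pattern` can be consumed starting at the beginning of `text`.
fn match_here(pattern: &str, text: &str) -> bool {
    let Some(pc) = pattern.chars().next() else {
        return true; // pattern fully consumed
    };
    let mut chars = text.chars();
    let Some(tc) = chars.next() else {
        return false; // text ended before the pattern did
    };
    let rest_text = chars.as_str();
    let rest_pattern = &pattern[pc.len_utf8()..];

    // 1. Literal match, compared case-insensitively.
    if pc.to_lowercase().eq(tc.to_lowercase()) && match_here(rest_pattern, rest_text) {
        return true;
    }
    for &reading in readings(tc) {
        // 2. Full pinyin: the pattern continues with the complete reading.
        if let Some(p) = pattern.strip_prefix(reading) {
            if match_here(p, rest_text) {
                return true;
            }
        }
        // 3. First letter: one pattern character stands for the whole reading.
        if reading.starts_with(pc) && match_here(rest_pattern, rest_text) {
            return true;
        }
    }
    false
}

/// Returns true if `pattern` matches starting at any character of `text`.
fn is_match(pattern: &str, text: &str) -> bool {
    pattern.is_empty() || text.char_indices().any(|(i, _)| match_here(pattern, &text[i..]))
}
```

With this toy table, `is_match("pysousuoeve", "拼音搜索Everything")` is true, and both `is_match("le", "快乐")` and `is_match("yue", "快乐")` are true because every reading of a heteronym is tried.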
See the `regex` module for more details. For example:
// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{cp::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

// Romaji inside a regex
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

// Pinyin inside a regex
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

// Mixing pinyin and romaji in one pattern requires `mix_lang(true)`
let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
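The ranges in the assertions above (`3..18`, `0..22`, `0..24`) are consistent with UTF-8 byte offsets rather than character counts: each CJK character or kana in these strings occupies 3 bytes. If a caller needs character indices instead, for example to highlight a match in a UI that counts characters, a small helper along the following lines could convert them. `byte_to_char_range` is a hypothetical name for this sketch and is not part of ib-matcher.

```rust
use std::ops::Range;

/// Converts a UTF-8 byte range (as used by `&str` slicing) into a range of
/// character indices. Returns `None` if the range is reversed or if either
/// bound does not fall on a character boundary of `s`.
fn byte_to_char_range(s: &str, bytes: Range<usize>) -> Option<Range<usize>> {
    if bytes.start > bytes.end
        || !s.is_char_boundary(bytes.start)
        || !s.is_char_boundary(bytes.end)
    {
        return None;
    }
    // Count the characters before the match, then the characters inside it.
    let start = s[..bytes.start].chars().count();
    let len = s[bytes].chars().count();
    Some(start..start + len)
}
```

For instance, `byte_to_char_range("「らき☆すた」", 3..18)` yields `Some(1..6)`, and `&"「らき☆すた」"[3..18]` is `"らき☆すた"`.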
// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;

let re = Regex::builder()
    // A custom callback named "ascii": matches one byte if it is ASCII
    .callback("ascii", |input, at, push| {
        let haystack = &input.haystack()[at..];
        if !haystack.is_empty() && haystack[0].is_ascii() {
            push(1);
        }
    })
    .build(r"(ascii)+\d(ascii)+")
    .unwrap();
let hay = "that4U this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");
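ib-pinyin is a high-performance Rust library for pinyin lookup and matching.
- Supports the following pinyin notations:
  - Jianpin, i.e. first letters ("py")
  - Quanpin, i.e. full pinyin ("pinyin")
  - Full pinyin with tone numbers ("pin1yin1")
  - Unicode ("pīnyīn")
  - Zhineng ABC Shuangpin (智能 ABC 双拼)
  - Pinyin Jiajia Shuangpin (拼音加加双拼)
  - Microsoft Shuangpin (微软双拼)
  - Huayu Shuangpin, also known as Ziguang Shuangpin (华宇双拼/紫光双拼)
  - Xiaohe Shuangpin (小鹤双拼)
  - Ziranma Shuangpin (自然码双拼)
- Supports heteronyms (多音字).
- Supports mixing multiple pinyin notations during matching; by default, Jianpin and Quanpin are matched.
- By default, lowercase letters match either pinyin or letters, while uppercase letters match only letters.
- Supports Chinese characters in the Unicode supplementary planes.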
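The default case rule from the list above (lowercase pattern letters may stand for pinyin or for letters, uppercase pattern letters only for letters) can be pictured with the following standard-library-only sketch. It is not ib-pinyin's code. `char_matches` and its `initials` parameter (the pinyin first letters of the text character, supplied by the caller) are hypothetical, and comparing letters case-insensitively is an assumption of this sketch.

```rust
/// Decides whether one pattern character may match one text character.
/// `initials` holds the (hypothetical, caller-supplied) lowercase pinyin
/// first letters of `text_char`; it is empty for non-Chinese characters.
fn char_matches(pattern_char: char, text_char: char, initials: &[char]) -> bool {
    // Letters always match letters; compared case-insensitively here (an assumption).
    let letter_match = pattern_char.to_lowercase().eq(text_char.to_lowercase());
    // Only a lowercase pattern letter may additionally stand for a pinyin initial.
    let pinyin_match = pattern_char.is_lowercase() && initials.contains(&pattern_char);
    letter_match || pinyin_match
}
```

Here `char_matches('p', '拼', &['p'])` is true, while `char_matches('P', '拼', &['p'])` is false because an uppercase pattern letter is never interpreted as pinyin. Typing a pattern in uppercase is therefore a way to restrict a query to literal letters.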
C and AHK2 (AutoHotkey v2) are also supported.
use ib_pinyin::{matcher::PinyinMatcher, pinyin::PinyinNotation};

let matcher = PinyinMatcher::builder("pysousuoeve")
    .pinyin_notations(PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter)
    .build();
assert!(matcher.is_match("拼音搜索Everything"));
#include <ib_pinyin/ib_pinyin.h>
#include <ib_pinyin/notation.h>

// UTF-8
bool is_match_u8 = ib_pinyin_is_match_u8c(u8"pysousuoeve", u8"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);
// UTF-16
bool is_match_u16 = ib_pinyin_is_match_u16c(u"pysousuoeve", u"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);
// UTF-32
bool is_match_u32 = ib_pinyin_is_match_u32c(U"pysousuoeve", U"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);
Original implementation (no longer maintained)
#Include <IbPinyin>

IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii)
; Get the matched range
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii, &start, &end)

; Chinese API
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼)
; Get the matched range
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼, &开始位置, &结束位置)
A fast Japanese romanizer.
Unicode utils.
Language | Library | Pinyin | Shuangpin | Dictionary | Matching | Other |
---|---|---|---|---|---|---|
Rust (C, AHK2) | ib-matcher/ib-pinyin | ✔️ Unicode | ✔️ | ❌ | ✔️ | Supports Japanese; supports regular expressions; performance-first; supports Chinese characters in the Unicode supplementary planes |
Rust (Node.js) | rust-pinyin | ✔️ Unicode | ❌ | ❌ | ❌ | |
Rust | rust-pinyin | Jianpin | ❌ | ❌ | ❌ | |
C# | ToolGood.Words.Pinyin | ✔️ | ❌ | ❌ | Single notation? | |
C# | TinyPinyin.Net | ✔️ | ❌ | ❌ | ❌ | |
C# | Romanization.NET | Unicode | ❌ | ❌ | Supports Japanese, Korean, Russian, and Greek | |
Java | PinIn | ✔️ | ✔️ | ❌ | ✔️ | Supports Zhuyin (Bopomofo) input and fuzzy pinyin |
Java | TinyPinyin | ✔️ | ❌ | ✔️ | ❌ | |
Go | go-pinyin | ✔️ | ❌ | ✔️ | ❌ | |
Python | python-pinyin | ✔️ | ❌ | ✔️ | ❌ | |
TS | pinyin-pro | ✔️ | ❌ | ❌ | ✔️ | |
JS | pinyin-match | ✔️ | ❌ | ❌ | Single notation | Ignores whitespace when matching |
JS | pinyin-engine | ✔️ | ❌ | ❌ | Single notation | |
JS | pinyin | ✔️ | ❌ | ✔️ | ❌ | |
JS | pinyinjs | ✔️ Unicode | ❌ | ❌ | ❌ | |
Perl (Rust, Java, Python, Ruby, JS, PHP) | Text::Unidecode | ✔️ | ❌ | ❌ | ❌ | Supports a wide range of scripts |
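Databases:
File search / launchers:
- IbEverythingExt: an Everything extension for pinyin search, romaji search (ローマ字検索), and quick select (based on ib-matcher)
- Listary (Jianpin and Quanpin)
File managers:
- File Explorer
  - Pinyin search extension for File Explorer (based on ib-matcher)
- Directory Opus (Jianpin only)
- Total Commander: QuickSearch eXtended (Jianpin only)
Terminals:
Text editors: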