Chaoses-Ib/ib-matcher


A multilingual, flexible and fast string, glob and regex matcher. Supports 拼音匹配 (Chinese pinyin matching) and ローマ字検索 (Japanese romaji matching).

Features

All of the above features are optional: you pay no performance or binary-size cost for features you don't use.

See documentation for details.

If you only need Chinese pinyin matching, you can use ib-pinyin instead, which is simpler and more stable.

Bindings for other languages:

Usage

// cargo add ib-matcher --features pinyin,romaji
use ib_matcher::matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig};

let matcher = IbMatcher::builder("la vie est drôle").build();
assert!(matcher.is_match("LA VIE EST DRÔLE"));

let matcher = IbMatcher::builder("βίος").build();
assert!(matcher.is_match("Βίοσ"));
assert!(matcher.is_match("ΒΊΟΣ"));

let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::default())
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));
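
The Greek assertions above accept both "Βίοσ" and "ΒΊΟΣ" for the pattern "βίος". Plain lowercasing would not be enough for that, because final sigma 'ς' and 'σ' are two distinct lowercase letters. The following is a rough, std-only sketch of per-character folding, not the crate's implementation. `fold_char` and `caseless_eq` are hypothetical names, and the folding rule is only an approximation of Unicode simple case folding.

```rust
/// Folds one character for caseless comparison.
/// Approximation (assumption): take the first char of std's lowercase
/// mapping, then map final sigma 'ς' to 'σ' as simple case folding does.
fn fold_char(c: char) -> char {
    let lower = c.to_lowercase().next().unwrap_or(c);
    if lower == 'ς' { 'σ' } else { lower }
}

/// Hypothetical helper: caseless equality, compared one char at a time.
fn caseless_eq(a: &str, b: &str) -> bool {
    a.chars().map(fold_char).eq(b.chars().map(fold_char))
}

fn main() {
    assert!(caseless_eq("la vie est drôle", "LA VIE EST DRÔLE"));
    assert!(caseless_eq("βίος", "Βίοσ"));
    assert!(caseless_eq("βίος", "ΒΊΟΣ"));
    // Lowercasing alone keeps the non-final sigma, so the strings differ.
    assert_ne!("Βίοσ".to_lowercase(), "βίος");
}
```

Folding one character to one character keeps string lengths in characters unchanged, which makes it easier to map a match back to positions in the original haystack.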

glob()-style pattern matching

See glob module for more details. Here is a quick example:

// cargo add ib-matcher --features syntax-glob,regex,romaji
use ib_matcher::{
    matcher::MatchConfig,
    regex::lita::Regex,
    syntax::glob::{parse_wildcard_path, PathSeparator}
};

let re = Regex::builder()
    .ib(MatchConfig::builder().romaji(Default::default()).build())
    .build_from_hir(
        parse_wildcard_path()
            .separator(PathSeparator::Windows)
            .call("wifi**miku"),
    )
    .unwrap();
assert!(re.is_match(r"C:\Windows\System32\ja-jp\WiFiTask\ミク.exe"));
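
To give a feel for what a pattern like `wifi**miku` does, here is a toy, std-only wildcard matcher. It is a sketch under assumed, conventional semantics and not the crate's implementation: `?` matches one non-separator character, `*` matches any run of non-separator characters, `**` matches any run including separators, literals compare ASCII case-insensitively, and the search is unanchored. Romaji matching is left out, so the file name is written as `miku.exe`. `match_here` and `is_match` are hypothetical names.

```rust
const SEP: char = '\\'; // Windows path separator

/// Returns true if `pat` matches a prefix of `hay`.
fn match_here(pat: &[char], hay: &[char]) -> bool {
    match pat {
        // Pattern consumed; whatever is left of the haystack is ignored.
        [] => true,
        // `**`: try every split point, separators included.
        ['*', '*', rest @ ..] => (0..=hay.len()).any(|n| match_here(rest, &hay[n..])),
        // `*`: may only consume characters before the next separator.
        ['*', rest @ ..] => {
            let limit = hay.iter().position(|&c| c == SEP).unwrap_or(hay.len());
            (0..=limit).any(|n| match_here(rest, &hay[n..]))
        }
        // `?`: exactly one non-separator character.
        ['?', rest @ ..] => {
            matches!(hay.first(), Some(&c) if c != SEP) && match_here(rest, &hay[1..])
        }
        // Literal character, compared ASCII case-insensitively.
        [p, rest @ ..] => {
            matches!(hay.first(), Some(c) if c.eq_ignore_ascii_case(p))
                && match_here(rest, &hay[1..])
        }
    }
}

/// Unanchored search: the pattern may start at any character of `path`.
fn is_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<char> = pattern.chars().collect();
    let hay: Vec<char> = path.chars().collect();
    (0..=hay.len()).any(|start| match_here(&pat, &hay[start..]))
}

fn main() {
    let path = r"C:\Windows\System32\ja-jp\WiFiTask\miku.exe";
    assert!(is_match("wifi**miku", path)); // `**` crosses the separator
    assert!(!is_match("wifi*miku", path)); // `*` stops at the separator
    assert!(is_match("sys*32", path));
}
```

This backtracking version is exponential in the worst case. Compiling the pattern into a regex, as the example above does with `build_from_hir`, avoids that.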

Regular expression

See regex module for more details. Here is a quick example:

// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{cp::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
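
The ranges in the assertions above are consistent with UTF-8 byte offsets rather than character indices: each kana, CJK character and corner bracket takes 3 bytes, and each ASCII letter takes 1. A small std-only check of that arithmetic follows. `byte_span_of` is a hypothetical helper that finds a literal substring and is not part of the crate.

```rust
use std::ops::Range;

/// Hypothetical helper: byte range of the first literal occurrence of `needle`.
fn byte_span_of(hay: &str, needle: &str) -> Option<Range<usize>> {
    hay.find(needle).map(|start| start..start + needle.len())
}

fn main() {
    let hay = "「らき☆すた」";
    assert_eq!(hay.chars().count(), 7); // 7 characters...
    assert_eq!(hay.len(), 21); // ...but 21 bytes
    // 3..18 skips the 3-byte opening bracket and stops before the closing one.
    assert_eq!(&hay[3..18], "らき☆すた");
    assert_eq!(byte_span_of(hay, "らき☆すた"), Some(3..18));

    assert_eq!("拼音搜索Everything".len(), 22); // 4 * 3 + 10
    assert_eq!("葬送のフリーレン".len(), 24); // 8 * 3
}
```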

Custom matching callbacks:

// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;

let re = Regex::builder()
    .callback("ascii", |input, at, push| {
        let haystack = &input.haystack()[at..];
        if haystack.len() > 0 && haystack[0].is_ascii() {
            push(1);
        }
    })
    .build(r"(ascii)+\d(ascii)+")
    .unwrap();
let hay = "that4U this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");

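A high-performance Rust library for pinyin lookup and matching.

  • Supports the following pinyin notations:
    • First letters (“py”)
    • Full pinyin (“pinyin”)
    • Full pinyin with tone numbers (“pin1yin1”)
    • Unicode (“pīnyīn”)
    • Smart ABC double pinyin
    • Pinyin Jiajia double pinyin
    • Microsoft double pinyin
    • Huayu double pinyin (Ziguang double pinyin)
    • Xiaohe double pinyin
    • Ziranma double pinyin
  • Supports polyphonic characters.
  • Supports matching against a mix of pinyin notations; first letters and full pinyin are matched by default.
  • By default, lowercase letters match either pinyin or letters, while uppercase letters match letters only.
  • Supports Chinese characters in the Unicode supplementary planes.

Supports C and AHK2.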
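
The rules in the list above can be illustrated with a toy matcher. This is a minimal std-only sketch and not how ib-pinyin is implemented. The reading table is a hypothetical stand-in that covers only the characters used here, letter matching is assumed to be ASCII case-insensitive, and `readings`, `match_prefix` and `is_match` are made-up names.

```rust
/// Toy reading table (assumption: a real library ships a full dictionary).
fn readings(c: char) -> &'static [&'static str] {
    match c {
        '拼' => &["pin"],
        '音' => &["yin"],
        '搜' => &["sou"],
        '索' => &["suo"],
        '乐' => &["le", "yue"], // polyphonic character
        _ => &[],
    }
}

/// Returns true if the whole `pattern` can be consumed by a prefix of `hay`.
fn match_prefix(pattern: &str, hay: &str) -> bool {
    let Some(p) = pattern.chars().next() else {
        return true; // the whole pattern has been consumed
    };
    let mut hay_chars = hay.chars();
    let Some(c) = hay_chars.next() else {
        return false; // pattern left over, haystack exhausted
    };
    let rest = hay_chars.as_str();

    // 1. Letter match (assumed ASCII case-insensitive in this sketch).
    if c.eq_ignore_ascii_case(&p) && match_prefix(&pattern[p.len_utf8()..], rest) {
        return true;
    }
    // 2. Pinyin match. Readings are lowercase, so an uppercase pattern
    //    letter can never take this branch.
    readings(c).iter().any(|&r| {
        // Full pinyin, e.g. "sou" for 搜.
        (pattern.starts_with(r) && match_prefix(&pattern[r.len()..], rest))
            // First letter, e.g. "s" for 搜.
            || (r.starts_with(p) && match_prefix(&pattern[1..], rest))
    })
}

/// Unanchored search: try every character position of `hay` as a start.
fn is_match(pattern: &str, hay: &str) -> bool {
    pattern.is_empty() || hay.char_indices().any(|(i, _)| match_prefix(pattern, &hay[i..]))
}

fn main() {
    // First letters ("py"), full pinyin ("sousuo") and plain letters ("eve") mixed.
    assert!(is_match("pysousuoeve", "拼音搜索Everything"));
    // Uppercase letters match letters only, never pinyin.
    assert!(!is_match("PY", "拼音搜索"));
    // Either reading of a polyphonic character is accepted.
    assert!(is_match("yinyue", "音乐"));
    assert!(is_match("yl", "音乐"));
}
```

Trying every notation per character with backtracking is what allows notations to be mixed inside one pattern.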


use ib_pinyin::{matcher::PinyinMatcher, pinyin::PinyinNotation};

let matcher = PinyinMatcher::builder("pysousuoeve")
    .pinyin_notations(PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter)
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

C

#include <ib_pinyin/ib_pinyin.h>
#include <ib_pinyin/notation.h>

// UTF-8
bool is_match_u8 = ib_pinyin_is_match_u8c(u8"pysousuoeve", u8"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

// UTF-16
bool is_match_u16 = ib_pinyin_is_match_u16c(u"pysousuoeve", u"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

// UTF-32
bool is_match_u32 = ib_pinyin_is_match_u32c(U"pysousuoeve", U"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

C++

Original implementation (no longer maintained)

AHK2

#Include <IbPinyin>

IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii)
; Get the match range
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii, &start, &end)

; Chinese API
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼)
; Get the match range
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼, &开始位置, &结束位置)

Download

A fast Japanese romanizer.

Fast Unicode utils.

Features:

  • Simple case folding
  • Mono to_lowercase()
  • ASCII search utils
  • floor_char_boundary() and ceil_char_boundary() polyfill
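
As an illustration of the last item, a char-boundary polyfill can be written with nothing but `str::is_char_boundary`. This is a minimal sketch written as free functions and is not the crate's code. Clamping out-of-range indices to `s.len()` is an assumption of this sketch.

```rust
/// Largest char boundary that is <= `index` (clamped to `s.len()`).
fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest char boundary that is >= `index` (clamped to `s.len()`).
fn ceil_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // `s.len()` is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

fn main() {
    let s = "拼音abc"; // two 3-byte characters, then three 1-byte letters
    assert_eq!(floor_char_boundary(s, 4), 3);
    assert_eq!(ceil_char_boundary(s, 4), 6);
    // Slicing at the rounded index never panics on a partial character.
    assert_eq!(&s[..floor_char_boundary(s, 4)], "拼");
}
```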

See also

Projects using this library

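Other pinyin-related projects

Language Pinyin Double pinyin Dictionary Matching Other
Rust (C, C#, AHK2) ib-matcher/ib-pinyin ✔️ Unicode ✔️ ✔️ Supports Japanese; supports regular expressions; performance-first; supports Chinese characters in the Unicode supplementary planes
Rust (Node.js) rust-pinyin ✔️ Unicode
Rust rust-pinyin First letters
C# ToolGood.Words.Pinyin ✔️ Single notation?
C# TinyPinyin.Net ✔️
C# Romanization.NET Unicode Supports Japanese, Korean, Russian and Greek
Java PinIn ✔️ ✔️ ✔️ Supports Zhuyin input and fuzzy pinyin
Java TinyPinyin ✔️ ✔️
Go go-pinyin ✔️ ✔️
Python python-pinyin ✔️ ✔️
TS pinyin-pro ✔️ ✔️
JS pinyin-match ✔️ Single notation Ignores whitespace when matching
JS pinyin-engine ✔️ Single notation
JS pinyin ✔️ ✔️
JS pinyinjs ✔️ Unicode
Perl (Rust, Java, Python, Ruby, JS, PHP) Text::Unidecode ✔️ Supports a wide range of scripts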

Databases:

File search / launchers:

File management:

Terminals:

Text editing: