A multilingual, flexible and fast string and regex matcher, supports 拼音匹配 and ローマ字検索

Chaoses-Ib/ib-matcher

A multilingual, flexible and fast string and regex matcher that supports 拼音匹配 (Chinese pinyin matching) and ローマ字検索 (Japanese romaji matching).

Features

  • Unicode support
    • Full UTF-8 support, with limited support for UTF-16 and UTF-32.
    • Unicode case insensitivity.
  • Chinese pinyin matching (拼音匹配)
  • Japanese romaji matching (ローマ字検索)
    • Supports characters with multiple readings (i.e. heteronyms, 同形異音語).
    • Only the Hepburn romanization system is supported at the moment.
  • glob()-style pattern matching (i.e. ?, * and **)
  • Regular expression
    • Supports the same syntax as the regex crate, including wildcards, repetitions, alternations, groups, etc.
    • Supports custom matching callbacks, which can be used to implement ad hoc look-around, backreferences, balancing groups/recursion/subroutines, combining domain-specific parsers, etc.
  • Relatively high performance
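
To illustrate the ?, * and ** wildcards listed above, here is a minimal standalone sketch in plain Rust. It does not use ib-matcher: `glob_match` and `match_from` are hypothetical names, and the sketch assumes the common path-glob convention that ? and * stop at `/` while ** may cross it. The crate's actual rules may differ.

```rust
/// Hypothetical helper: matches `pattern` against the whole of `text`.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_from(&p, &t)
}

/// Simple backtracking matcher over character slices.
fn match_from(p: &[char], t: &[char]) -> bool {
    match p {
        // Pattern exhausted: succeed only if the text is exhausted too.
        [] => t.is_empty(),
        // `**` may consume any run of characters, separators included.
        ['*', '*', rest @ ..] => (0..=t.len()).any(|n| match_from(rest, &t[n..])),
        // `*` may consume any run that contains no `/` (assumed separator).
        ['*', rest @ ..] => {
            let max = t.iter().position(|&c| c == '/').unwrap_or(t.len());
            (0..=max).any(|n| match_from(rest, &t[n..]))
        }
        // `?` consumes exactly one character other than `/`.
        ['?', rest @ ..] => matches!(t, [c, ..] if *c != '/') && match_from(rest, &t[1..]),
        // Any other character must match literally.
        [c, rest @ ..] => matches!(t, [d, ..] if d == c) && match_from(rest, &t[1..]),
    }
}
```

Under these assumptions, `glob_match("*.txt", "a.txt")` is true, `glob_match("*.txt", "dir/a.txt")` is false, and `glob_match("**.txt", "dir/a.txt")` is true.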

All of the above features are optional, so you do not pay the performance or binary size cost of features you do not use.

If you only need Chinese pinyin matching, you can use ib-pinyin instead, which is simpler and more stable.

Usage

// cargo add ib-matcher --features pinyin,romaji
use ib_matcher::{
    matcher::{IbMatcher, PinyinMatchConfig, RomajiMatchConfig},
    pinyin::PinyinNotation,
};

let matcher = IbMatcher::builder("pysousuoeve")
    .pinyin(PinyinMatchConfig::notations(
        PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter,
    ))
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

let matcher = IbMatcher::builder("konosuba")
    .romaji(RomajiMatchConfig::default())
    .is_pattern_partial(true)
    .build();
assert!(matcher.is_match("この素晴らしい世界に祝福を"));
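
To show why a pattern such as "pysousuoeve" can match "拼音搜索Everything" when both the full-pinyin and first-letter notations are enabled, here is a minimal standalone sketch. It is not the crate's algorithm: `readings`, `match_here` and `is_match` are hypothetical names, the reading table is a tiny hand-written sample, and plain letters are assumed to compare ASCII case-insensitively.

```rust
/// Hypothetical toy table mapping a character to its full-pinyin readings.
/// A real matcher would use a complete pinyin database.
fn readings(c: char) -> &'static [&'static str] {
    match c {
        '拼' => &["pin"],
        '音' => &["yin"],
        '搜' => &["sou"],
        '索' => &["suo"],
        '乐' => &["le", "yue"], // heteronym: more than one reading
        _ => &[],
    }
}

/// Returns true if the whole `pattern` matches a prefix of `hay`.
fn match_here(pattern: &str, hay: &str) -> bool {
    let Some(p) = pattern.chars().next() else {
        return true; // pattern fully consumed
    };
    let mut hay_chars = hay.chars();
    let Some(c) = hay_chars.next() else {
        return false; // haystack ended before the pattern did
    };
    let hay_rest = hay_chars.as_str();
    let pattern_rest = &pattern[p.len_utf8()..];

    // Option 1: the haystack character itself (assumed ASCII case-insensitive).
    if c.eq_ignore_ascii_case(&p) && match_here(pattern_rest, hay_rest) {
        return true;
    }
    for &reading in readings(c) {
        // Option 2: a full-pinyin reading, e.g. "sou" for 搜.
        if let Some(tail) = pattern.strip_prefix(reading) {
            if match_here(tail, hay_rest) {
                return true;
            }
        }
        // Option 3: only the first letter of a reading, e.g. "p" for 拼.
        if reading.starts_with(p) && match_here(pattern_rest, hay_rest) {
            return true;
        }
    }
    false
}

/// Returns true if `pattern` matches starting at any character of `hay`.
fn is_match(pattern: &str, hay: &str) -> bool {
    pattern.is_empty() || hay.char_indices().any(|(i, _)| match_here(pattern, &hay[i..]))
}
```

With this table, `is_match("pysousuoeve", "拼音搜索Everything")` is true: 拼 and 音 are matched by first letter, 搜 and 索 by full pinyin, and "eve" by the Latin letters. A pattern that stops partway through a reading, such as "pinyi", does not match in this sketch.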

Regular expression

See regex module for more details. For example:

// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{cp::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
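
The spans in the assertions above are consistent with UTF-8 byte offsets rather than character counts: each kana or Chinese character in these haystacks takes 3 bytes, so "らき☆すた" inside 「…」 covers bytes 3..18. The following standard-library-only sketch converts a range of character indices into such a byte range; `char_range_to_byte_range` is a hypothetical helper, not part of ib-matcher.

```rust
use std::ops::Range;

/// Hypothetical helper: converts a range of character indices into the
/// equivalent UTF-8 byte range. Panics if the range is out of bounds.
fn char_range_to_byte_range(hay: &str, chars: Range<usize>) -> Range<usize> {
    // Byte offset of every character boundary, including the end of the string.
    let boundaries: Vec<usize> = hay
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(hay.len()))
        .collect();
    boundaries[chars.start]..boundaries[chars.end]
}
```

For example, characters 1..6 of "「らき☆すた」" map to bytes 3..18, and slicing the haystack with that byte range yields "らき☆すた".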

Custom matching callbacks:

// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;

let re = Regex::builder()
    .callback("ascii", |input, at, push| {
        // Report a match of length 1 if the byte at `at` is ASCII.
        let haystack = &input.haystack()[at..];
        if !haystack.is_empty() && haystack[0].is_ascii() {
            push(1);
        }
    })
    .build(r"(ascii)+\d(ascii)+")
    .unwrap();
let hay = "that4U this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");

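A high-performance Rust library for pinyin lookup and matching.

  • Supports the following pinyin notations:
    • First-letter abbreviation ("py")
    • Full pinyin ("pinyin")
    • Full pinyin with tone numbers ("pin1yin1")
    • Unicode ("pīnyīn")
    • Zhineng ABC double pinyin (智能 ABC 双拼)
    • Pinyin Jiajia double pinyin (拼音加加双拼)
    • Microsoft double pinyin (微软双拼)
    • Huayu double pinyin, also known as Ziguang double pinyin (华宇双拼/紫光双拼)
    • Xiaohe double pinyin (小鹤双拼)
    • Ziranma double pinyin (自然码双拼)
  • Supports heteronyms (characters with multiple readings).
  • Supports mixing multiple pinyin notations in a single match; by default, first-letter abbreviation and full pinyin are matched.
  • By default, lowercase letters match either pinyin or letters, while uppercase letters match letters only.
  • Supports Chinese characters in the Unicode supplementary planes.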
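
The default case rule in the list above (lowercase pattern letters may match pinyin or letters, uppercase pattern letters match letters only) can be sketched per character as follows. This is an illustration only: `first_letter` is a hypothetical two-entry table, `pattern_char_matches` is a hypothetical name, and comparing letters ASCII case-insensitively is an assumption rather than documented behavior.

```rust
/// Hypothetical toy table: first letter of a character's pinyin.
fn first_letter(c: char) -> Option<char> {
    match c {
        '拼' => Some('p'),
        '音' => Some('y'),
        _ => None,
    }
}

/// Returns true if pattern character `p` may match haystack character `h`.
fn pattern_char_matches(p: char, h: char) -> bool {
    // Letters match letters (assumption: ASCII case is ignored).
    if p.eq_ignore_ascii_case(&h) {
        return true;
    }
    // Only lowercase pattern letters are allowed to match through pinyin.
    p.is_ascii_lowercase() && first_letter(h) == Some(p)
}
```

Here `pattern_char_matches('p', '拼')` is true while `pattern_char_matches('P', '拼')` is false, so typing an uppercase letter restricts the match to literal letters.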

C and AHK2 are also supported.


use ib_pinyin::{matcher::PinyinMatcher, pinyin::PinyinNotation};

let matcher = PinyinMatcher::builder("pysousuoeve")
    .pinyin_notations(PinyinNotation::Ascii | PinyinNotation::AsciiFirstLetter)
    .build();
assert!(matcher.is_match("拼音搜索Everything"));

C

#include <ib_pinyin/ib_pinyin.h>
#include <ib_pinyin/notation.h>

// UTF-8
bool is_match_u8 = ib_pinyin_is_match_u8c(u8"pysousuoeve", u8"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

// UTF-16
bool is_match_u16 = ib_pinyin_is_match_u16c(u"pysousuoeve", u"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

// UTF-32
bool is_match_u32 = ib_pinyin_is_match_u32c(U"pysousuoeve", U"拼音搜索Everything", PINYIN_NOTATION_ASCII_FIRST_LETTER | PINYIN_NOTATION_ASCII);

C++

Original implementation (no longer maintained).

AHK2

#Include <IbPinyin>

IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii)
; Get the match range
IsMatch := IbPinyin_Match("pysousuoeve", "拼音搜索Everything", IbPinyin_AsciiFirstLetter | IbPinyin_Ascii, &start, &end)

; Chinese API
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything")
; Specify the pinyin notations
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼)
; Get the match range
是否匹配 := 拼音_匹配("pysousuoeve", "拼音搜索Everything", 拼音_简拼 | 拼音_全拼, &开始位置, &结束位置)

Download


A fast Japanese romanizer.


Unicode utils.

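Other pinyin-related projects

Language Pinyin Double pinyin Dictionary Matching Other
Rust (C, AHK2) ib-matcher/ib-pinyin ✔️ Unicode ✔️ ✔️ Supports Japanese; supports regular expressions; performance first; supports Chinese characters in the Unicode supplementary planes
Rust (Node.js) rust-pinyin ✔️ Unicode
Rust rust-pinyin First-letter abbreviation
C# ToolGood.Words.Pinyin ✔️ Single notation?
C# TinyPinyin.Net ✔️
C# Romanization.NET Unicode Supports Japanese, Korean, Russian and Greek
Java PinIn ✔️ ✔️ ✔️ Supports Zhuyin input and fuzzy pinyin
Java TinyPinyin ✔️ ✔️
Go go-pinyin ✔️ ✔️
Python python-pinyin ✔️ ✔️
TS pinyin-pro ✔️ ✔️
JS pinyin-match ✔️ Single notation Ignores whitespace when matching
JS pinyin-engine ✔️ Single notation
JS pinyin ✔️ ✔️
JS pinyinjs ✔️ Unicode
Perl (Rust, Java, Python, Ruby, JS, PHP) Text::Unidecode ✔️ Supports a wide range of scripts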

Databases:

File search/launchers:

File management:

Terminals:

Text editing:
