A PHP port of the Yosina Japanese text transliteration library.
Yosina is a library for Japanese text transliteration that provides various text normalization and conversion features commonly needed when processing Japanese text.
```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Yosina\TransliterationRecipe;
use Yosina\Yosina;

// Create a recipe with multiple transformations
$recipe = new TransliterationRecipe(
    replaceSpaces: true,
    replaceCircledOrSquaredCharacters: true,
    replaceCombinedCharacters: true,
    kanjiOldNew: true,
    toFullwidth: true
);

$transliterator = Yosina::makeTransliterator($recipe);

// Use it with various special characters
$input = "①②③　ⒶⒷⒸ　㍿㍑㌠㋿"; // circled numbers, circled letters, ideographic spaces, combined characters
$result = $transliterator($input);
echo $result; // "（１）（２）（３）　（Ａ）（Ｂ）（Ｃ）　株式会社リットルサンチーム令和"

// Convert old kanji to new
$oldKanji = "舊字體";
$result = $transliterator($oldKanji);
echo $result; // "旧字体"

// Convert half-width katakana to full-width
$halfWidth = "ﾃｽﾄﾓｼﾞﾚﾂ";
$result = $transliterator($halfWidth);
echo $result; // "テストモジレツ"
```

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Yosina\Yosina;

// Chain multiple transliterators
$transliterator = Yosina::makeTransliterator([
    ['kanji-old-new', []],
    ['spaces', []],
    ['radicals', []],
]);

$inputText = "舊字體　⾔⾨⾷"; // old-style kanji, an ideographic space, Kangxi radicals
$result = $transliterator($inputText);
echo $result;
```

- PHP 8.2 or higher

```bash
composer require yosina-lib/yosina
```

Converts circled or squared characters to their plain equivalents.
- Options: `templates` (custom rendering), `includeEmojis` (include emoji characters)
- Example: ①②③ → (1)(2)(3), ㊙㊗ → (秘)(祝)
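Here is a minimal sketch of the idea. It handles only the circled numbers ① (U+2460) to ⑳ (U+2473), which occupy a contiguous code point range. `replaceCircledNumbers` and `bmpToUtf8` are hypothetical helpers for illustration and are not part of Yosina's API. The real transliterator is generated from the shared data files and also covers letters, squared characters, and the `templates`/`includeEmojis` options, all of which this sketch ignores.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: map circled numbers
// ① (U+2460) .. ⑳ (U+2473) to "(1)" .. "(20)".

// Encode a code point in U+0800..U+FFFF as 3-byte UTF-8 (no mbstring needed).
function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function replaceCircledNumbers(string $text): string
{
    $map = [];
    for ($n = 1; $n <= 20; $n++) {
        // Circled n sits at U+2460 + (n - 1)
        $map[bmpToUtf8(0x2460 + $n - 1)] = '(' . $n . ')';
    }
    // strtr() with an array replaces each match once and never re-scans its output
    return strtr($text, $map);
}

echo replaceCircledNumbers("①②③"), "\n"; // (1)(2)(3)
```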
Expands combined characters into their individual character sequences.
- Example: ㍻ (Heisei era) → 平成, ㈱ → (株)
Combines decomposed hiragana and katakana (a base kana followed by a separate voiced or semi-voiced sound mark) into their precomposed equivalents.
- Options: `composeNonCombiningMarks` (compose non-combining marks)
- Example: か + ゙ → が, ヘ + ゜ → ペ
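A minimal sketch of the composition step follows. It assumes the Unicode layout in which a voiced kana is its base code point plus 1 and a semi-voiced kana is its base plus 2. It covers only the か, さ, た and は rows, う/ウ, and their katakana forms. `composeKana` is a hypothetical function. Its `$composeNonCombiningMarks` flag only imitates the option named above: it toggles whether the spacing marks ゛ (U+309B) and ゜ (U+309C) are composed in addition to the combining marks U+3099 and U+309A.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: compose a kana followed by
// a (semi-)voiced sound mark into the precomposed character.

function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function composeKana(string $text, bool $composeNonCombiningMarks = true): string
{
    // Hiragana whose voiced form is the next code point (か -> が, ...)
    $dakutenBases = [
        0x304B, 0x304D, 0x304F, 0x3051, 0x3053, // か き く け こ
        0x3055, 0x3057, 0x3059, 0x305B, 0x305D, // さ し す せ そ
        0x305F, 0x3061, 0x3064, 0x3066, 0x3068, // た ち つ て と
        0x306F, 0x3072, 0x3075, 0x3078, 0x307B, // は ひ ふ へ ほ
    ];
    // Hiragana whose semi-voiced form is two code points later (は -> ぱ, ...)
    $handakutenBases = [0x306F, 0x3072, 0x3075, 0x3078, 0x307B];

    // Combining marks U+3099/U+309A, optionally the spacing marks U+309B/U+309C
    $marks = [0x3099 => 1, 0x309A => 2];
    if ($composeNonCombiningMarks) {
        $marks[0x309B] = 1;
        $marks[0x309C] = 2;
    }

    $map = [];
    foreach ($marks as $mark => $offset) {
        $markUtf8 = bmpToUtf8($mark);
        $bases = $offset === 1 ? $dakutenBases : $handakutenBases;
        foreach ($bases as $hira) {
            // Katakana counterparts sit exactly 0x60 above the hiragana
            foreach ([$hira, $hira + 0x60] as $base) {
                $map[bmpToUtf8($base) . $markUtf8] = bmpToUtf8($base + $offset);
            }
        }
        if ($offset === 1) {
            // う/ウ break the "+1" pattern: ゔ is U+3094, ヴ is U+30F4
            $map[bmpToUtf8(0x3046) . $markUtf8] = bmpToUtf8(0x3094);
            $map[bmpToUtf8(0x30A6) . $markUtf8] = bmpToUtf8(0x30F4);
        }
    }
    return strtr($text, $map);
}

echo composeKana("\u{304B}\u{3099}\u{30D8}\u{309A}"), "\n"; // がペ
```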
Converts between hiragana and katakana scripts bidirectionally.
- Options: `mode` (`"hira-to-kata"` or `"kata-to-hira"`)
- Example: ひらがな → ヒラガナ (hira-to-kata)
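The two scripts sit in parallel Unicode ranges. A sketch of the conversion can therefore rely on a fixed offset of 0x60 between hiragana U+3041 to U+3096 and katakana U+30A1 to U+30F6. `convertHiraKata` is hypothetical and not Yosina's API. Katakana without a hiragana counterpart (ヷ to ヺ) and the prolonged sound mark ー pass through unchanged.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: hiragana U+3041..U+3096
// and katakana U+30A1..U+30F6 differ by a constant 0x60.

function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function convertHiraKata(string $text, string $mode): string
{
    if ($mode !== 'hira-to-kata' && $mode !== 'kata-to-hira') {
        throw new InvalidArgumentException("Unknown mode: $mode");
    }
    $map = [];
    for ($hira = 0x3041; $hira <= 0x3096; $hira++) {
        $kata = $hira + 0x60;
        if ($mode === 'hira-to-kata') {
            $map[bmpToUtf8($hira)] = bmpToUtf8($kata);
        } else {
            $map[bmpToUtf8($kata)] = bmpToUtf8($hira);
        }
    }
    return strtr($text, $map);
}

echo convertHiraKata("ひらがな", "hira-to-kata"), "\n"; // ヒラガナ
echo convertHiraKata("カタカナ", "kata-to-hira"), "\n"; // かたかな
```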
Replaces various dash/hyphen symbols with common ones used in Japanese.
- Options: `precedence` (mapping priority order)
- Available mappings: `"ascii"`, `"jisx0201"`, `"jisx0208_90"`, `"jisx0208_90_windows"`, `"jisx0208_verbatim"`
- Example: 2019—2020 (em dash) → 2019-2020
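One way to read the `precedence` option is as an ordered list of lookup tables, where the first table that knows a character decides its replacement. The sketch below assumes that reading. `replaceHyphens` and the table named `example_wide` are hypothetical. The table contents are placeholders chosen to reproduce the em dash example above, not Yosina's generated mapping data.

```php
<?php
// Hypothetical sketch of precedence-based dash replacement.
// Table contents are placeholders, not Yosina's generated mapping data.
function replaceHyphens(string $text, array $tables, array $precedence): string
{
    $out = '';
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $replacement = $ch; // unchanged unless some table knows this character
        foreach ($precedence as $name) {
            if (isset($tables[$name][$ch])) {
                $replacement = $tables[$name][$ch];
                break; // the first table in precedence order wins
            }
        }
        $out .= $replacement;
    }
    return $out;
}

$tables = [
    'ascii' => [
        "\u{2014}" => '-', // EM DASH
        "\u{2013}" => '-', // EN DASH
    ],
    // Hypothetical table used only to show the effect of ordering
    'example_wide' => [
        "\u{2014}" => "\u{FF0D}", // FULLWIDTH HYPHEN-MINUS
    ],
];

echo replaceHyphens("2019\u{2014}2020", $tables, ['ascii', 'example_wide']), "\n"; // 2019-2020
echo replaceHyphens("2019\u{2014}2020", $tables, ['example_wide', 'ascii']), "\n"; // 2019－2020
```

Swapping the order of the precedence list changes the output, even though the tables themselves stay the same.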
Replaces ideographic annotation marks (the kanbun marks used when reading classical Chinese as Japanese) with their corresponding ideographs.
- Example: ㆖㆘ → 上下
Handles Ideographic and Standardized Variation Selectors.
- Options: `charset`, `mode` (`"ivs-or-svs"` or `"base"`), `preferSVS`, `dropSelectorsAltogether`
- Example: 葛󠄀 (葛 + IVS) → 葛
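The simplest outcome listed here is reducing a variation sequence to its base character. It can be approximated by deleting the selector code points: VS1 to VS16 (U+FE00 to U+FE0F, used in standardized variation sequences) and VS17 to VS256 (U+E0100 to U+E01EF, used in ideographic variation sequences). The hypothetical `dropVariationSelectors` function ignores `charset`, `mode` and `preferSVS`. Treat it as an approximation of `dropSelectorsAltogether`, not of Yosina's actual behaviour.

```php
<?php
// Hypothetical sketch: strip variation selectors so only base characters remain.
// Ignores charset/mode/preferSVS; not Yosina's implementation.
function dropVariationSelectors(string $text): string
{
    // VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)
    return preg_replace('/[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}]/u', '', $text);
}

echo dropVariationSelectors("葛\u{E0100}"), "\n"; // 葛
```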
Expands iteration marks by repeating the preceding character.
- Example: 時々 → 時時, いすゞ → いすず
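A sketch of the expansion follows. It assumes only three marks: 々 (U+3005) repeats the previous character, ゝ (U+309D) repeats the previous hiragana, and ゞ (U+309E) repeats it with a dakuten. `expandIterationMarks` and its helpers are hypothetical. The sketch voices only the か, さ, た and は rows via the +1 code point rule. It does not handle the katakana marks (ヽ, ヾ) or the devoicing of ゝ after an already voiced kana.

```php
<?php
// Hypothetical sketch, not Yosina's implementation.

function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Decode a 3-byte UTF-8 character; returns null for any other length.
function utf8ToBmp(string $ch): ?int
{
    if (strlen($ch) !== 3) {
        return null;
    }
    return ((ord($ch[0]) & 0x0F) << 12) | ((ord($ch[1]) & 0x3F) << 6) | (ord($ch[2]) & 0x3F);
}

function expandIterationMarks(string $text): string
{
    // Hiragana whose voiced form is the next code point (か..こ, さ..そ, た..と, は..ほ)
    $voiceable = [
        0x304B, 0x304D, 0x304F, 0x3051, 0x3053, 0x3055, 0x3057, 0x3059, 0x305B, 0x305D,
        0x305F, 0x3061, 0x3064, 0x3066, 0x3068, 0x306F, 0x3072, 0x3075, 0x3078, 0x307B,
    ];
    $out = '';
    $prev = null; // last character copied to the output unchanged
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if ($prev !== null && ($ch === "\u{3005}" || $ch === "\u{309D}")) {
            $out .= $prev; // 々 or ゝ: plain repetition
        } elseif ($prev !== null && $ch === "\u{309E}") {
            $cp = utf8ToBmp($prev); // ゞ: repetition with a dakuten
            $out .= ($cp !== null && in_array($cp, $voiceable, true)) ? bmpToUtf8($cp + 1) : $prev;
        } else {
            $out .= $ch;
            $prev = $ch;
        }
    }
    return $out;
}

echo expandIterationMarks("時\u{3005} いす\u{309E}"), "\n"; // 時時 いすず
```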
Handles half-width/full-width character conversion.
- Options: `fullwidthToHalfwidth`, `convertGL` (alphanumerics/symbols), `convertGR` (katakana), `u005cAsYenSign`
- Example: ＡＢＣ１２３ → ABC123, ｶﾀｶﾅ → カタカナ
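For alphanumerics and symbols, the full-width and half-width forms differ by a constant. U+FF01 to U+FF5E map onto ASCII U+0021 to U+007E with an offset of 0xFEE0. The hypothetical `convertAsciiWidth` below sketches only that range. It ignores half-width katakana (the `convertGR` case), the ideographic space, and the yen-sign handling behind `u005cAsYenSign`.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: ASCII U+0021..U+007E
// <-> full-width U+FF01..U+FF5E via the constant offset 0xFEE0.

function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function convertAsciiWidth(string $text, bool $fullwidthToHalfwidth): string
{
    $map = [];
    for ($ascii = 0x21; $ascii <= 0x7E; $ascii++) {
        $wide = bmpToUtf8($ascii + 0xFEE0); // e.g. 'A' (U+0041) <-> U+FF21
        if ($fullwidthToHalfwidth) {
            $map[$wide] = chr($ascii);
        } else {
            $map[chr($ascii)] = $wide;
        }
    }
    return strtr($text, $map);
}

echo convertAsciiWidth("\u{FF21}\u{FF22}\u{FF23}\u{FF11}\u{FF12}\u{FF13}", true), "\n"; // ABC123
```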
Converts old-style kanji (旧字体) to modern forms (新字体).
- Example: 舊字體の變換 → 旧字体の変換
Normalizes mathematical alphanumeric symbols to plain ASCII.
- Example: 𝐀𝐁𝐂 (mathematical bold) → ABC
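The following sketch covers one style only. Mathematical bold letters (U+1D400 to U+1D433) and digits (U+1D7CE to U+1D7D7) are contiguous, so each maps back to ASCII by subtraction. Other styles have gaps. For example, italic small h is U+210E rather than part of the italic run, which is one reason a table-driven transliterator is more practical. `normalizeMathBold` is hypothetical.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: map MATHEMATICAL BOLD
// letters and digits back to ASCII.

// Encode any code point up to U+10FFFF as UTF-8 (no mbstring needed).
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

function normalizeMathBold(string $text): string
{
    $map = [];
    for ($i = 0; $i < 26; $i++) {
        $map[codepointToUtf8(0x1D400 + $i)] = chr(0x41 + $i); // bold A..Z
        $map[codepointToUtf8(0x1D41A + $i)] = chr(0x61 + $i); // bold a..z
    }
    for ($i = 0; $i < 10; $i++) {
        $map[codepointToUtf8(0x1D7CE + $i)] = chr(0x30 + $i); // bold 0..9
    }
    return strtr($text, $map);
}

echo normalizeMathBold("\u{1D400}\u{1D401}\u{1D402}"), "\n"; // ABC
```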
Handles contextual conversion between hyphens and prolonged sound marks.
- Options: `skipAlreadyTransliteratedChars`, `allowProlongedHatsuon`, `allowProlongedSokuon`, `replaceProlongedMarksFollowingAlnums`
- Example: イ−ハト−ヴォ (with hyphen) → イーハトーヴォ (prolonged mark)
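Here is a rough sketch of the contextual rule, assuming the simplest context: a hyphen-like character (U+002D, U+2010, U+2212 or U+FF0D) immediately after a katakana letter or ー becomes ー (U+30FC). `replaceHyphensAfterKatakana` is hypothetical and ignores all four options above. Because `preg_replace` evaluates the lookbehind against the original string, a run of consecutive hyphens is converted only at its first position.

```php
<?php
// Hypothetical sketch, not Yosina's implementation: a hyphen-like character
// directly after a katakana letter (U+30A1..U+30FA) or ー becomes ー (U+30FC).
function replaceHyphensAfterKatakana(string $text): string
{
    return preg_replace(
        '/(?<=[\x{30A1}-\x{30FA}\x{30FC}])[\-\x{2010}\x{2212}\x{FF0D}]/u',
        "\u{30FC}",
        $text
    );
}

echo replaceHyphensAfterKatakana("イ\u{2212}ハト\u{2212}ヴォ"), "\n"; // イーハトーヴォ
```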
Converts CJK radical characters to their corresponding ideographs.
- Example: ⾔⾨⾷ (Kangxi radicals) → 言門食
Normalizes various Unicode space characters to standard ASCII space.
- Example: A　B (ideographic space) → A B
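The short sketch below uses PCRE's `\p{Zs}` (Unicode space separator) class. That class covers the ideographic space U+3000, the no-break space U+00A0 and the other Zs characters. `normalizeSpaces` is hypothetical. The set of characters that Yosina's data-driven transliterator replaces may not match `\p{Zs}` exactly.

```php
<?php
// Hypothetical sketch, not Yosina's implementation.
function normalizeSpaces(string $text): string
{
    // \p{Zs} matches Unicode space separators (U+0020 itself, U+00A0, U+3000, ...)
    return preg_replace('/\p{Zs}/u', ' ', $text);
}

echo normalizeSpaces("A\u{3000}B"), "\n"; // A B
```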
Converts Unicode Roman numeral characters to their ASCII letter equivalents.
- Example: Ⅰ Ⅱ Ⅲ → I II III, ⅰ ⅱ ⅲ → i ii iii
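The Roman numeral characters form two parallel 16-character runs: uppercase U+2160 to U+216F and lowercase U+2170 to U+217F (Ⅰ to Ⅻ, then Ⅼ, Ⅽ, Ⅾ, Ⅿ). A sketch can therefore index a single list of ASCII spellings. `romanNumeralsToAscii` is hypothetical and does not cover the archaic forms from U+2180 onward.

```php
<?php
// Hypothetical sketch, not Yosina's implementation.

function bmpToUtf8(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function romanNumeralsToAscii(string $text): string
{
    // ASCII spellings in code point order, shared by U+2160.. (upper) and U+2170.. (lower)
    $spellings = ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII',
                  'IX', 'X', 'XI', 'XII', 'L', 'C', 'D', 'M'];
    $map = [];
    foreach ($spellings as $i => $ascii) {
        $map[bmpToUtf8(0x2160 + $i)] = $ascii;
        $map[bmpToUtf8(0x2170 + $i)] = strtolower($ascii);
    }
    return strtr($text, $map);
}

echo romanNumeralsToAscii("\u{2160} \u{2161} \u{2162}, \u{2170} \u{2171} \u{2172}"), "\n"; // I II III, i ii iii
```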
- PHP 8.2 or higher
- Composer (PHP dependency manager)

Install the development dependencies:

```bash
composer install
```

The transliterator implementations are generated from the shared data files:

```bash
php codegen/generate.php
```

This generates transliterator classes from the JSON data files in the `../data/` directory.

Run the basic tests:

```bash
php tests/BasicTest.php
```

- Make changes to the code or data files.
- If you modified data files, regenerate the transliterators:
  ```bash
  php codegen/generate.php
  ```
- Run the tests to ensure everything works:
  ```bash
  composer test
  ```
```
php/
├── src/
│   ├── Char.php                            # Character data structure
│   ├── Chars.php                           # Character array utilities
│   ├── TransliteratorInterface.php         # Transliterator interface
│   ├── TransliteratorFactoryInterface.php  # Factory interface
│   ├── ChainedTransliterator.php           # Chained transliterator
│   ├── TransliterationRecipe.php           # Recipe configuration
│   ├── TransliteratorRegistry.php          # Transliterator registry
│   ├── Yosina.php                          # Main API
│   └── Transliterators/                    # Generated transliterators
│       ├── SpacesTransliterator.php
│       ├── RadicalsTransliterator.php
│       └── ...
├── tests/
│   └── BasicTest.php                       # Basic functionality tests
├── codegen/
│   └── generate.php                        # Code generator
├── composer.json                           # Composer configuration
└── README.md                               # This file
```
MIT License. See the main project README for details.
This is part of the larger Yosina project. Please ensure changes maintain compatibility across all language implementations.