HolodexNet/elasticsearch-analysis-jikyo-romaji

elasticsearch-analysis-jikyo-romaji

elasticsearch-analysis-jikyo-romaji is a token filter that romanizes Japanese hiragana/katakana strings into both standard and IME typing-style romaji. See more about Romaji. Note that the jikyo_romaji token filter is designed to be used with the keyword tokenizer ("tokenizer": "keyword"), which emits the entire input as a single token.
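To illustrate what producing both standard and IME-style spellings can mean, the Java sketch below expands each kana into every spelling listed in a small lookup table and takes the cartesian product. The class `RomajiVariants`, its `expand` method, and the table contents are hypothetical and far smaller than the plugin's real mapping; this is not the plugin's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expand a kana string into every romaji spelling
// allowed by a tiny lookup table (standard spelling plus IME alternatives).
final class RomajiVariants {

    // Illustrative table only; the plugin's real mapping is far larger.
    private static final Map<Character, List<String>> TABLE = Map.of(
            'か', List.of("ka", "ca"),
            'し', List.of("shi", "si", "ci"),
            'ち', List.of("chi", "ti"));

    // Cartesian product of per-kana options; empty if a character is unmapped.
    static List<String> expand(String kana) {
        List<String> results = new ArrayList<>(List.of(""));
        for (char c : kana.toCharArray()) {
            List<String> options = TABLE.get(c);
            if (options == null) {
                return List.of();
            }
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String option : options) {
                    next.add(prefix + option);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(expand("しか")); // [shika, shica, sika, sica, cika, cica]
    }
}
```

A real filter also needs rules that span more than one kana, such as digraphs (きゃ) and small っ, which a per-character table cannot express.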

Build information

  • Built with JDK 17 against Elasticsearch 8.1.2

Supported Elasticsearch versions

Installation

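$ sudo bin/elasticsearch-plugin install https://github.com/jikyo/elasticsearch-analysis-jikyo-romaji/releases/download/v7.8.0/analysis-jikyo-romaji-7.8.0.zip

Elasticsearch only installs plugins built for its exact version, so use the release archive built for your Elasticsearch version. The URL above is for 7.8.0, while the current build targets 8.1.2.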

Usage

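Settings sample

The icu_normalizer char filter is provided by the analysis-icu plugin, which must also be installed.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "your_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "icu_normalizer"
          ],
          "filter": [
            "jikyo_romaji"
          ]
        }
      }
    }
  }
}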

_analyze sample

$ curl -H "Content-Type: application/json" -XGET "localhost:9200/_analyze?pretty" -d '
{
  "tokenizer" : "keyword",
  "filter" : ["jikyo_romaji"],
  "text" : "あっち"
}'

Response:

{
  "tokens" : [
    {
      "token" : "あっち",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "acchi",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "altsuchi",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "altuchi",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "altuti",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "atti",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "axtuti",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    }
  ]
}
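The response keeps the original token and stacks every romanized variant at position 0 with the same offsets, much like synonyms. The Java sketch below models that shape for strings containing small っ (sokuon), spelled either by doubling the first letter of the following syllable (cchi, tti) or by typing っ on its own with IME keystrokes such as ltu or xtsu. Everything here (`SokuonSketch`, its table, the `Token` record) is hypothetical. For あっち it generates ten variants, a superset of the six the plugin returns above, so it is not the plugin's actual rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of sokuon (small っ) expansion and token stacking.
// The kana table is illustrative only, not the plugin's mapping.
final class SokuonSketch {

    // Mirrors the fields shown in the _analyze response.
    record Token(String token, int startOffset, int endOffset, int position) {}

    // Standard spelling first, then an IME-style alternative.
    private static final Map<Character, List<String>> TABLE = Map.of(
            'あ', List.of("a"),
            'か', List.of("ka", "ca"),
            'ち', List.of("chi", "ti"));

    // Keystrokes commonly accepted by IMEs for a standalone small っ.
    private static final List<String> SMALL_TSU = List.of("ltu", "ltsu", "xtu", "xtsu");

    // Returns every spelling, or an empty list if a character is unmapped.
    static List<String> romanize(String kana) {
        List<String> out = new ArrayList<>(List.of(""));
        char[] cs = kana.toCharArray();
        for (int i = 0; i < cs.length; i++) {
            List<String> next = new ArrayList<>();
            if (cs[i] == 'っ') {
                // っ needs a following mapped kana to attach to.
                if (i + 1 >= cs.length || !TABLE.containsKey(cs[i + 1])) {
                    return List.of();
                }
                for (String prefix : out) {
                    for (String syllable : TABLE.get(cs[i + 1])) {
                        // Gemination: double the syllable's first letter.
                        next.add(prefix + syllable.charAt(0) + syllable);
                        // Or type っ separately, then the syllable.
                        for (String tsu : SMALL_TSU) {
                            next.add(prefix + tsu + syllable);
                        }
                    }
                }
                i++; // the following kana has been consumed
            } else {
                List<String> options = TABLE.get(cs[i]);
                if (options == null) {
                    return List.of();
                }
                for (String prefix : out) {
                    for (String option : options) {
                        next.add(prefix + option);
                    }
                }
            }
            out = next;
        }
        return out;
    }

    // Original token first, then sorted variants, all stacked at position 0
    // and spanning the whole input (offsets counted in Java chars).
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(text, 0, text.length(), 0));
        for (String variant : new TreeSet<>(romanize(text))) {
            tokens.add(new Token(variant, 0, text.length(), 0));
        }
        return tokens;
    }

    public static void main(String[] args) {
        analyze("あっち").forEach(System.out::println);
    }
}
```

Stacking every variant at the same position is what lets a query typed in any of these spellings match the same document; the trade-off is that the number of variants grows multiplicatively with the length of the input.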

Build

# Verification task
$ gradle check -Dtests.security.manager=false
# Build task
$ gradle assemble
# The plugin zip is written to build/distributions/
