Stempel Polish Analysis Plugin

The Stempel Analysis plugin integrates Lucene’s Stempel analysis module for Polish into Elasticsearch.

It provides high-quality stemming for Polish, based on the Egothor project.

`stempel` tokenizer and token filters

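The plugin provides the `polish` analyzer and the `polish_stem` and `polish_stop` token filters. The analyzer and the `polish_stem` filter are not configurable; the `polish_stop` filter accepts a `stopwords` setting, as shown in its own section.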

Reimplementing and extending the analyzers

The `polish` analyzer can be reimplemented as a custom analyzer, which can then be extended and configured differently, as follows:

PUT /stempel_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_stempel": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "polish_stop",
            "polish_stem"
          ]
        }
      }
    }
  }
}
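
As a sketch of what extending the rebuilt analyzer might look like, the request below adds a `keyword_marker` filter before `polish_stem` so that selected words are left unstemmed. The index name `stempel_extended_example`, the analyzer name `extended_stempel`, the filter name `polish_keywords`, and the chosen keyword are all hypothetical. It also assumes that `polish_stem` skips tokens marked as keywords, as Lucene stemming filters generally do, so check the result with the `_analyze` request before relying on it.

```console
PUT /stempel_extended_example
{
  "settings": {
    "analysis": {
      "filter": {
        "polish_keywords": {
          "type": "keyword_marker",
          "keywords": ["kucharek"]
        }
      },
      "analyzer": {
        "extended_stempel": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "polish_stop",
            "polish_keywords",
            "polish_stem"
          ]
        }
      }
    }
  }
}

GET /stempel_extended_example/_analyze
{
  "analyzer": "extended_stempel",
  "text": "Gdzie kucharek sześć, tam nie ma co jeść."
}
```

The marker filter is placed before the stemmer because a token has to be flagged as a keyword before the stemming filter sees it.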

`polish_stop` token filter

The `polish_stop` token filter removes the predefined Polish stopwords (`_polish_`) together with any custom stopwords specified by the user. The only predefined list it supports is `_polish_`. To use a different predefined list, use the {ref}/analysis-stop-tokenfilter.html[stop token filter] instead.

PUT /polish_stop_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "polish_stop"
            ]
          }
        },
        "filter": {
          "polish_stop": {
            "type": "polish_stop",
            "stopwords": [
              "_polish_",
              "jeść"
            ]
          }
        }
      }
    }
  }
}

GET polish_stop_example/_analyze
{
  "analyzer": "analyzer_with_stop",
  "text": "Gdzie kucharek sześć, tam nie ma co jeść."
}

The above request returns:

{
  "tokens" : [
    {
      "token" : "kucharek",
      "start_offset" : 6,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "sześć",
      "start_offset" : 15,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
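
For comparison, when a different predefined list is needed, the generic `stop` token filter can be configured in much the same way. The following is a minimal sketch, not part of the plugin; the index name `stop_filter_example`, the analyzer name `analyzer_with_english_stop`, and the filter name `english_stop` are hypothetical, and `_english_` is used only as an example of another predefined list.

```console
PUT /stop_filter_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_with_english_stop": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
```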
