Commit b957085

230_Stemming/50_Controlling_stemming.asciidoc (elasticsearch-cn#460)
* translate 50_Controlling_stemming.asciidoc
* improve
1 parent cac7362 · commit b957085


230_Stemming/50_Controlling_stemming.asciidoc

Lines changed: 32 additions & 45 deletions
@@ -1,30 +1,26 @@
 [[controlling-stemming]]
-=== Controlling Stemming
+=== Controlling Stemming

-Out-of-the-box stemming solutions are never perfect.((("stemming words", "controlling stemming"))) Algorithmic stemmers,
-especially, will blithely apply their rules to any words they encounter,
-perhaps conflating words that you would prefer to keep separate. Maybe, for
-your use case, it is important to keep `skies` and `skiing` as distinct words
-rather than stemming them both down to `ski` (as would happen with the
-`english` analyzer).
+Out-of-the-box stemming solutions can never be perfect.((("stemming words", "controlling stemming")))
+Algorithmic stemmers, in particular, will blithely apply their rules to any word they encounter, including words that you would prefer to keep separate.
+Perhaps, in your use case, it is important to keep `skies` and `skiing` as distinct words rather than stemming them both down to `ski` (as the `english` analyzer would).

-The {ref}/analysis-keyword-marker-tokenfilter.html[`keyword_marker`] and
-{ref}/analysis-stemmer-override-tokenfilter.html[`stemmer_override`] token filters((("stemmer_override token filter")))((("keyword_marker token filter")))
-allow us to customize the stemming process.
+The {ref}/analysis-keyword-marker-tokenfilter.html[`keyword_marker`] and
+{ref}/analysis-stemmer-override-tokenfilter.html[`stemmer_override`] token filters((("stemmer_override token filter")))((("keyword_marker token filter")))
+allow us to customize the stemming process.

 [[preventing-stemming]]
-==== Preventing Stemming
+==== Preventing Stemming

-The <<stem-exclusion,`stem_exclusion`>> parameter for language analyzers (see
-<<configuring-language-analyzers>>) allowed ((("stemming words", "controlling stemming", "preventing stemming")))us to specify a list of words that
-should not be stemmed. Internally, these language analyzers use the
-{ref}/analysis-keyword-marker-tokenfilter.html[`keyword_marker` token filter]
-to mark the listed words as _keywords_, which prevents subsequent stemming
-token filters from touching those words.((("keyword_marker token filter", "preventing stemming of certain words")))
+The <<stem-exclusion,`stem_exclusion`>> parameter for language analyzers (see <<configuring-language-analyzers>>)
+allows us to specify a list of words that should not be stemmed.((("stemming words", "controlling stemming", "preventing stemming")))

-For instance, we can create a simple custom analyzer that uses the
-{ref}/analysis-porterstem-tokenfilter.html[`porter_stem`] token filter,
-but prevents the word `skies` from((("porter_stem token filter"))) being stemmed:
+Internally, these language analyzers use the
+{ref}/analysis-keyword-marker-tokenfilter.html[`keyword_marker` token filter]
+to mark the listed words as _keywords_, which prevents subsequent stemming token filters from touching them.((("keyword_marker token filter", "preventing stemming of certain words")))
+
+For instance, we can create a simple custom analyzer that uses the
+{ref}/analysis-porterstem-tokenfilter.html[`porter_stem`] token filter, while preventing the word `skies` from being stemmed:((("porter_stem token filter")))

 [source,json]
 ------------------------------------------
@@ -52,41 +48,34 @@ PUT /my_index
 }
 }
 ------------------------------------------
-<1> They `keywords` parameter could accept multiple words.
+<1> The `keywords` parameter can accept multiple words.

-Testing it with the `analyze` API shows that just the word `skies` has
-been excluded from stemming:
+Testing it with the `analyze` API shows that just the word `skies` has been excluded from stemming:

 [source,json]
 ------------------------------------------
 GET /my_index/_analyze?analyzer=my_english
 sky skies skiing skis <1>
 ------------------------------------------
-<1> Returns: `sky`, `skies`, `ski`, `ski`
+<1> Returns: `sky`, `skies`, `ski`, `ski`

 [[keyword-path]]

 [TIP]
 ==========================================

-While the language analyzers allow ((("language analyzers", "stem_exclusion parameter")))us only to specify an array of words in the
-`stem_exclusion` parameter, the `keyword_marker` token filter also accepts a
-`keywords_path` parameter that allows us to store all of our keywords in a
-file. ((("keyword_marker token filter", "keywords_path parameter")))The file should contain one word per line, and must be present on every
-node in the cluster. See <<updating-stopwords>> for tips on how to update this
-file.
+While the language analyzers only allow us to specify a list of words to exclude from stemming via the `stem_exclusion` parameter,((("language analyzers", "stem_exclusion parameter")))
+the `keyword_marker` token filter also accepts a `keywords_path` parameter that lets us store all of our keywords in a file.
+The file should contain one word per line and must be present on every node in the cluster. See <<updating-stopwords>> for tips on how to update this file.

 ==========================================

 [[customizing-stemming]]
-==== Customizing Stemming
+==== Customizing Stemming

-In the preceding example, we prevented `skies` from being stemmed, but perhaps we
-would prefer it to be stemmed to `sky` instead.((("stemming words", "controlling stemming", "customizing stemming"))) The
-{ref}/analysis-stemmer-override-tokenfilter.html[`stemmer_override`] token
-filter allows us ((("stemmer_override token filter")))to specify our own custom stemming rules. At the same time,
-we can handle some irregular forms like stemming `mice` to `mouse` and `feet`
-to `foot`:
+In the preceding example, we prevented `skies` from being stemmed, but perhaps we would prefer it to be stemmed to `sky` instead.((("stemming words", "controlling stemming", "customizing stemming")))
+The {ref}/analysis-stemmer-override-tokenfilter.html[`stemmer_override`] token filter allows us to specify our own custom stemming rules.((("stemmer_override token filter")))
+At the same time, we can handle some irregular forms, such as stemming `mice` to `mouse` and `feet` to `foot`:

 [source,json]
 ------------------------------------------
@@ -121,11 +110,9 @@ PUT /my_index
 GET /my_index/_analyze?analyzer=my_english
 The mice came down from the skies and ran over my feet <3>
 ------------------------------------------
-<1> Rules take the form `original=>stem`.
-<2> The `stemmer_override` filter must be placed before the stemmer.
-<3> Returns `the`, `mouse`, `came`, `down`, `from`, `the`, `sky`,
-    `and`, `ran`, `over`, `my`, `foot`.
-
-TIP: Just as for the `keyword_marker` token filter, rules can be stored
-in a file whose location should be specified with the `rules_path`
-parameter.
+<1> Rules take the form `original=>stem`.
+<2> The `stemmer_override` filter must be placed before the stemmer.
+<3> Returns `the`, `mouse`, `came`, `down`, `from`, `the`, `sky`,
+    `and`, `ran`, `over`, `my`, `foot`.
+
+TIP: Just as with the `keyword_marker` token filter, rules can be stored in a file whose location is specified with the `rules_path` parameter.
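
The hunks above elide the body of the first `PUT /my_index` request, which defines the `my_english` analyzer exercised by the `_analyze` calls. For readers following the translated text, a minimal sketch of what such a definition could look like is shown below; the custom filter name `no_stem` is purely illustrative (the name actually used in the chapter is not visible in this diff), while `keyword_marker`, the `keywords` parameter, `porter_stem`, and `skies` all come from the surrounding prose and callouts.

[source,json]
------------------------------------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "no_stem": { <1>
          "type": "keyword_marker",
          "keywords": [ "skies" ] <2>
        }
      },
      "analyzer": {
        "my_english": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "no_stem",
            "porter_stem" <3>
          ]
        }
      }
    }
  }
}
------------------------------------------
<1> Illustrative filter name only; any name would do.
<2> The `keywords` parameter accepts an array, so multiple words can be listed.
<3> The `keyword_marker` filter is placed before `porter_stem`, so the marked words are never stemmed.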
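Similarly, the second elided `PUT /my_index` body configures the `stemmer_override` filter. A sketch under the same caveats (the filter name `custom_stem` is illustrative; the `rules` parameter and the `original=>stem` rule format are those described by the callouts) might look like this:

[source,json]
------------------------------------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_stem": {
          "type": "stemmer_override",
          "rules": [ <1>
            "skies=>sky",
            "mice=>mouse",
            "feet=>foot"
          ]
        }
      },
      "analyzer": {
        "my_english": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_stem", <2>
            "porter_stem"
          ]
        }
      }
    }
  }
}
------------------------------------------
<1> Each rule follows the `original=>stem` form.
<2> As the callouts above note, `stemmer_override` must appear before the stemmer (`porter_stem` here) in the filter chain.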
