
Commit 478b011

Remove examples from reindex API (#4542)
1 parent 1dd4972 · commit 478b011

14 files changed: 3 additions, 1,284 deletions

specification/_doc_ids/table.csv

Lines changed: 1 addition & 0 deletions
@@ -583,6 +583,7 @@ redact-processor,https://www.elastic.co/docs/reference/enrich-processor/redact-p
 regexp-syntax,https://www.elastic.co/docs/reference/query-languages/query-dsl/regexp-syntax
 register-repository,https://www.elastic.co/docs/deploy-manage/tools/snapshot-and-restore/self-managed
 registered-domain-processor,https://www.elastic.co/docs/reference/enrich-processor/registered-domain-processor
+reindex-indices,https://www.elastic.co/docs/reference/elasticsearch/rest-apis/reindex-indices
 relevance-scores,https://www.elastic.co/docs/explore-analyze/query-filter/languages/querydsl#relevance-scores
 remove-processor,https://www.elastic.co/docs/reference/enrich-processor/remove-processor
 remote-clusters-api-key,https://www.elastic.co/docs/deploy-manage/remote-clusters/remote-clusters-api-key

specification/_global/reindex/ReindexRequest.ts

Lines changed: 2 additions & 141 deletions
@@ -66,152 +66,13 @@ import { Destination, Source } from './types'
  * Note that the handling of other error types is unaffected by the `conflicts` property.
  * Additionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than `max_docs` until it has successfully indexed `max_docs` documents into the target or it has gone through every document in the source query.
  *
- * NOTE: The reindex API makes no effort to handle ID collisions.
- * The last document written will "win" but the order isn't usually predictable so it is not a good idea to rely on this behavior.
- * Instead, make sure that IDs are unique by using a script.
- *
- * **Running reindex asynchronously**
- *
- * If the request contains `wait_for_completion=false`, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task.
- * Elasticsearch creates a record of this task as a document at `_tasks/<task_id>`.
- *
- * **Reindex from multiple sources**
- *
- * If you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources.
- * That way you can resume the process if there are any errors by removing the partially completed source and starting over.
- * It also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.
- *
- * For example, you can use a bash script like this:
- *
- * ```
- * for index in i1 i2 i3 i4 i5; do
- *   curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
- *     "source": {
- *       "index": "'$index'"
- *     },
- *     "dest": {
- *       "index": "'$index'-reindexed"
- *     }
- *   }'
- * done
- * ```
- *
- * **Throttling**
- *
- * Set `requests_per_second` to any positive decimal number (`1.4`, `6`, `1000`, for example) to throttle the rate at which reindex issues batches of index operations.
- * Requests are throttled by padding each batch with a wait time.
- * To turn off throttling, set `requests_per_second` to `-1`.
- *
- * The throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding.
- * The padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.
- * By default the batch size is `1000`, so if `requests_per_second` is set to `500`:
- *
- * ```
- * target_time = 1000 / 500 per second = 2 seconds
- * wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
- * ```
- *
- * Since the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set.
- * This is "bursty" instead of "smooth".
- *
- * **Slicing**
- *
- * Reindex supports sliced scroll to parallelize the reindexing process.
- * This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.
- *
- * NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
- *
- * You can slice a reindex request manually by providing a slice ID and total number of slices to each request.
- * You can also let reindex automatically parallelize by using sliced scroll to slice on `_id`.
- * The `slices` parameter specifies the number of slices to use.
- *
- * Adding `slices` to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:
- *
- * * You can see these requests in the tasks API. These sub-requests are "child" tasks of the task for the request with slices.
- * * Fetching the status of the task for the request with `slices` only contains the status of completed slices.
- * * These sub-requests are individually addressable for things like cancellation and rethrottling.
- * * Rethrottling the request with `slices` will rethrottle the unfinished sub-request proportionally.
- * * Canceling the request with `slices` will cancel each sub-request.
- * * Due to the nature of `slices`, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- * * Parameters like `requests_per_second` and `max_docs` on a request with `slices` are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being reindexed.
- * * Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.
- *
- * If slicing automatically, setting `slices` to `auto` will choose a reasonable number for most indices.
- * If slicing manually or otherwise tuning automatic slicing, use the following guidelines.
- *
- * Query performance is most efficient when the number of slices is equal to the number of shards in the index.
- * If that number is large (for example, `500`), choose a lower number as too many slices will hurt performance.
- * Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- *
- * Indexing performance scales linearly across available resources with the number of slices.
- *
- * Whether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.
- *
- * **Modify documents during reindexing**
- *
- * Like `_update_by_query`, reindex operations support a script that modifies the document.
- * Unlike `_update_by_query`, the script is allowed to modify the document's metadata.
- *
- * Just as in `_update_by_query`, you can set `ctx.op` to change the operation that is run on the destination.
- * For example, set `ctx.op` to `noop` if your script decides that the document doesn’t have to be indexed in the destination. This "no operation" will be reported in the `noop` counter in the response body.
- * Set `ctx.op` to `delete` if your script decides that the document must be deleted from the destination.
- * The deletion will be reported in the `deleted` counter in the response body.
- * Setting `ctx.op` to anything else will return an error, as will setting any other field in `ctx`.
- *
- * Think of the possibilities! Just be careful; you are able to change:
- *
- * * `_id`
- * * `_index`
- * * `_version`
- * * `_routing`
- *
- * Setting `_version` to `null` or clearing it from the `ctx` map is just like not sending the version in an indexing request.
- * It will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.
- *
- * **Reindex from remote**
- *
- * Reindex supports reindexing from a remote Elasticsearch cluster.
- * The `host` parameter must contain a scheme, host, port, and optional path.
- * The `username` and `password` parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication.
- * Be sure to use HTTPS when using basic authentication or the password will be sent in plain text.
- * There are a range of settings available to configure the behavior of the HTTPS connection.
- *
- * When using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.
- * Remote hosts must be explicitly allowed with the `reindex.remote.whitelist` setting.
- * It can be set to a comma delimited list of allowed remote host and port combinations.
- * Scheme is ignored; only the host and port are used.
- * For example:
- *
- * ```
- * reindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"]
- * ```
- *
- * The list of allowed hosts must be configured on any nodes that will coordinate the reindex.
- * This feature should work with remote clusters of any version of Elasticsearch.
- * This should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.
- *
- * WARNING: Elasticsearch does not support forward compatibility across major versions.
- * For example, you cannot reindex from a 7.x cluster into a 6.x cluster.
- *
- * To enable queries sent to older versions of Elasticsearch, the `query` parameter is sent directly to the remote host without validation or modification.
- *
- * NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
- *
- * Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb.
- * If the remote index includes very large documents you'll need to use a smaller batch size.
- * It is also possible to set the socket read timeout on the remote connection with the `socket_timeout` field and the connection timeout with the `connect_timeout` field.
- * Both default to 30 seconds.
- *
- * **Configuring SSL parameters**
- *
- * Reindex from remote supports configurable SSL settings.
- * These must be specified in the `elasticsearch.yml` file, with the exception of the secure settings, which you add in the Elasticsearch keystore.
- * It is not possible to configure SSL in the body of the reindex request.
+ * Refer to the linked documentation for examples of how to reindex documents.
  * @rest_spec_name reindex
  * @availability stack since=2.3.0 stability=stable
  * @availability serverless stability=stable visibility=public
  * @index_privileges read, write
  * @doc_tag document
+ * @ext_doc_id reindex-indices
  * @doc_id docs-reindex
  */
 export interface Request extends RequestBase {

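For readers of this commit, the retained notes on `conflicts` and `max_docs` are easiest to follow with a concrete request. A minimal sketch in the same curl style as the removed examples, using hypothetical index names:

```
# Count version conflicts instead of aborting, and stop once 10 documents
# have been successfully indexed into the destination (hypothetical index names).
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_reindex?pretty' -d '{
  "max_docs": 10,
  "conflicts": "proceed",
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" }
}'
```
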
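The removed **Running reindex asynchronously** section describes launching the operation as a task. A hedged sketch of that flow; the task ID below is a placeholder for whatever the first response returns:

```
# Launch the reindex in the background; the response body contains a task ID.
curl -H 'Content-Type: application/json' \
  -XPOST 'localhost:9200/_reindex?wait_for_completion=false' -d '{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" }
}'

# Check the status of, or cancel, the task using that ID (placeholder shown).
curl -XGET 'localhost:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345?pretty'
curl -XPOST 'localhost:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel'
```
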
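The removed **Throttling** section explains `requests_per_second` and the wait-time arithmetic. A sketch of setting and later changing the throttle; the `_rethrottle` endpoint and the task ID are standard Elasticsearch usage rather than part of this diff:

```
# Throttle the reindex to roughly 500 documents per second.
curl -H 'Content-Type: application/json' \
  -XPOST 'localhost:9200/_reindex?requests_per_second=500' -d '{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" }
}'

# Later, turn throttling off on the running task (placeholder task ID).
curl -XPOST 'localhost:9200/_reindex/oTUltX4IQMOUUVeiohTt8A:12345/_rethrottle?requests_per_second=-1'
```
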
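The removed **Slicing** section covers both manual and automatic slicing. A sketch of each, again with hypothetical index names:

```
# Manual slicing: issue one request per slice, each identified by "id" out of "max" slices.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_reindex?pretty' -d '{
  "source": {
    "index": "my-source-index",
    "slice": { "id": 0, "max": 2 }
  },
  "dest": { "index": "my-dest-index" }
}'

# Automatic slicing: let Elasticsearch choose the number of sub-requests.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_reindex?slices=auto' -d '{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" }
}'
```
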
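The removed **Modify documents during reindexing** section describes scripts and `ctx.op`. A sketch using a hypothetical `flag` field; flagged documents are deleted from the destination and reported in the `deleted` counter:

```
# Delete flagged documents from the destination; copy everything else with an extra field.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_reindex?pretty' -d '{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.flag == \"delete\") { ctx.op = \"delete\" } else { ctx._source.copied = true }"
  }
}'
```
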
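The removed **Reindex from remote** section lists the `remote` connection parameters. A sketch with hypothetical host, credentials, and index names; the remote host must also be listed in `reindex.remote.whitelist` on the coordinating nodes, as in the removed snippet:

```
# Pull documents from a remote cluster over HTTPS with basic authentication.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_reindex?pretty' -d '{
  "source": {
    "remote": {
      "host": "https://otherhost:9200",
      "username": "user",
      "password": "pass",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "my-source-index"
  },
  "dest": { "index": "my-dest-index" }
}'
```
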
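The removed **Configuring SSL parameters** section notes that SSL is configured in `elasticsearch.yml` and the keystore, never in the request body. A sketch of the non-secure settings; the setting names below come from the Elasticsearch reindex SSL documentation rather than this diff, so verify them against your version:

```
# elasticsearch.yml on the coordinating nodes (paths are hypothetical).
reindex.ssl.certificate_authorities: ["/path/to/ca.crt"]
reindex.ssl.certificate: "/path/to/client.crt"
reindex.ssl.key: "/path/to/client.key"
reindex.ssl.verification_mode: full
# Secure settings such as reindex.ssl.secure_key_passphrase go in the Elasticsearch keystore.
```
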
specification/_global/reindex/examples/request/ReindexRequestExample10.yaml

Lines changed: 0 additions & 98 deletions
This file was deleted.

specification/_global/reindex/examples/request/ReindexRequestExample11.yaml

Lines changed: 0 additions & 113 deletions
This file was deleted.

0 commit comments
