
[DOCS] Fix broken images #126648

Merged
merged 5 commits on Apr 12, 2025

Binary file removed docs/images/create-index-template.png
Binary file removed docs/images/hybrid-architecture.png
Binary file removed docs/images/mongodb-connector-config.png
Binary file removed docs/images/mongodb-load-sample-data.png
Binary file removed docs/images/mongodb-sample-document.png
72 changes: 0 additions & 72 deletions docs/images/token-graph-dns-invalid-ex.svg

This file was deleted.

72 changes: 0 additions & 72 deletions docs/images/token-graph-dns-synonym-ex.svg

This file was deleted.

Binary file removed docs/images/use-a-connector-workflow.png
@@ -6,7 +6,7 @@ For a precision threshold of `c`, the implementation that we are using requires

The following chart shows how the error varies before and after the threshold:

![cardinality error](/images/cardinality_error.png "")
![cardinality error](/reference/query-languages/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
@@ -12,6 +12,6 @@ When using this metric, there are a few guidelines to keep in mind:

The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:

![percentiles error](/images/percentiles_error.png "")
![percentiles error](/reference/query-languages/images/percentiles_error.png "")

The chart shows that precision is better for extreme percentiles. The error diminishes for large numbers of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job of summarizing it. This would not be the case for more skewed distributions.
@@ -65,9 +65,23 @@ Computing exact counts requires loading values into a hash set and returning its

This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:

:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.

For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.

The following chart shows how the error varies before and after the threshold:

![cardinality error](/reference/aggregations/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
the error remains very low (1-6% as seen in the above graph) even when counting millions of items.

The HyperLogLog++ algorithm depends on the leading zeros of hashed values, so the exact distribution of
hashes in a dataset can affect the accuracy of the cardinality.
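
As a rough illustration of the memory/accuracy trade-off described above, a `precision_threshold` of 3000 costs about 3000 * 8 bytes (roughly 24 KB) per aggregation bucket and keeps counts close to exact below that many unique values. The following request is only a sketch; the index and field names are placeholders and are not part of this change:

```console
GET /my-index-000001/_search?size=0
{
  "aggs": {
    "unique_product_count": {
      "cardinality": {
        "field": "product_id",
        "precision_threshold": 3000
      }
    }
  }
}
```

Values of `precision_threshold` above 40000 have the same effect as 40000, so accuracy on very high-cardinality fields can only be bought up to that limit.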

## Pre-computed hashes [_pre_computed_hashes]

@@ -175,8 +175,23 @@ GET latency/_search

## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]

:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
:::
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.

Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.

The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).

When using this metric, there are a few guidelines to keep in mind:

* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated

The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:

![percentiles error](images/percentiles_error.png "")

The chart shows that precision is better for extreme percentiles. The error diminishes for large numbers of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job of summarizing it. This would not be the case for more skewed distributions.
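
If TDigest's default accuracy is not sufficient, its `compression` setting can be raised; more centroids are kept, which improves accuracy at the cost of memory and CPU. A hedged sketch using the `latency` index from the example above (the `load_time` field name is assumed for illustration):

```console
GET latency/_search?size=0
{
  "aggs": {
    "load_time_percentiles": {
      "percentiles": {
        "field": "load_time",
        "percents": [ 50, 95, 99, 99.9 ],
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}
```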

::::{warning}
Percentile aggregations are also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
6 changes: 3 additions & 3 deletions docs/reference/query-languages/eql/eql-syntax.md
@@ -788,7 +788,7 @@ You cannot use EQL to search the values of a [`nested`](/reference/elasticsearch
* If two pending sequences are in the same state at the same time, the most recent sequence overwrites the older one.
* If the query includes [`by` fields](#eql-by-keyword), the query uses a separate state machine for each unique `by` field value.

:::::{dropdown} **Example**
:::::{dropdown} Example
A data set contains the following `process` events in ascending chronological order:

```js
...
```
@@ -831,13 +831,13 @@ The query’s event items correspond to the following states:
* State B: `[process where process.name == "bash"]`
* Complete: `[process where process.name == "cat"]`

:::{image} /images/sequence-state-machine.svg
:::{image} ../images/sequence-state-machine.svg
:alt: sequence state machine
:::

To find matching sequences, the query uses separate state machines for each unique `user.name` value. Based on the data set, you can expect two state machines: one for the `root` user and one for `elkbee`.

:::{image} /images/separate-state-machines.svg
:::{image} ../images/separate-state-machines.svg
:alt: separate state machines
:::
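
For orientation, a `by`-keyed sequence of the shape discussed above can be submitted through the EQL search API. This is only a sketch: the data stream name is a placeholder, and because the real example's first event filter is collapsed in this diff, only two of its three stages are shown here.

```console
GET /my-data-stream/_eql/search
{
  "query": """
    sequence by user.name
      [ process where process.name == "bash" ]
      [ process where process.name == "cat" ]
  """
}
```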

File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
@@ -360,42 +360,42 @@ The `DECAY_FUNCTION` determines the shape of the decay:
`gauss`
: Normal decay, computed as:

![Gaussian](/images/Gaussian.png "")
![Gaussian](../images/Gaussian.png "")

where ![sigma](/images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
where ![sigma](../images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![sigma calc](/images/sigma_calc.png "")
![sigma calc](../images/sigma_calc.png "")

See [Normal decay, keyword `gauss`](#gauss-decay) for graphs demonstrating the curve generated by the `gauss` function.


`exp`
: Exponential decay, computed as:

![Exponential](/images/Exponential.png "")
![Exponential](../images/Exponential.png "")

where again the parameter ![lambda](/images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
where again the parameter ![lambda](../images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![lambda calc](/images/lambda_calc.png "")
![lambda calc](../images/lambda_calc.png "")

See [Exponential decay, keyword `exp`](#exp-decay) for graphs demonstrating the curve generated by the `exp` function.


`linear`
: Linear decay, computed as:

![Linear](/images/Linear.png "").
![Linear](../images/Linear.png "").

where again the parameter `s` is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![s calc](/images/s_calc.png "")
![s calc](../images/s_calc.png "")

In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds twice the user given scale value.
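
For readability, the formulas referenced by the images above can be written out as follows; this is a reconstruction for this page, and the rendered reference documentation remains authoritative. Writing `d = max(0, |fieldvalue - origin| - offset)` for the effective distance:

```latex
% Hedged reconstruction of the three decay formulas; d is the effective distance defined above.
S_{\text{gauss}}(d)  = \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right),
\qquad \sigma^{2} = -\frac{\text{scale}^{2}}{2\ln(\text{decay})}

S_{\text{exp}}(d)    = \exp\!\left(\lambda\, d\right),
\qquad \lambda = \frac{\ln(\text{decay})}{\text{scale}}

S_{\text{linear}}(d) = \max\!\left(\frac{s - d}{s},\; 0\right),
\qquad s = \frac{\text{scale}}{1 - \text{decay}}
```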


For single functions the three decay functions together with their parameters can be visualized like this (the field in this example called "age"):

![decay 2d](/images/decay_2d.png "")
![decay 2d](../images/decay_2d.png "")
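
As a usage sketch (not part of this change), a decay function on a numeric `age` field like the one plotted above is attached to a query through `function_score`; the origin, scale, offset, and decay values below are illustrative only:

```console
GET /my-index-000001/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "gauss": {
            "age": {
              "origin": 30,
              "scale": 10,
              "offset": 5,
              "decay": 0.5
            }
          }
        }
      ]
    }
  }
}
```

Swapping `gauss` for `exp` or `linear` changes only the shape of the multiplier, as the contour plots further down illustrate.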


### Multi-values fields [_multi_values_fields]
@@ -510,10 +510,10 @@ Next, we show how the computed score looks like for each of the three possible d

When choosing `gauss` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/normal-decay-keyword-gauss-1.png
:::{image} ../images/normal-decay-keyword-gauss-1.png
:::

:::{image} /images/normal-decay-keyword-gauss-2.png
:::{image} ../images/normal-decay-keyword-gauss-2.png
:::

Suppose your original search results matches three hotels :
@@ -529,20 +529,20 @@ Suppose your original search results matches three hotels :

When choosing `exp` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/exponential-decay-keyword-exp-1.png
:::{image} ../images/exponential-decay-keyword-exp-1.png
:::

:::{image} /images/exponential-decay-keyword-exp-2.png
:::{image} ../images/exponential-decay-keyword-exp-2.png
:::

### Linear decay, keyword `linear` [linear-decay]

When choosing `linear` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/linear-decay-keyword-linear-1.png
:::{image} ../images/linear-decay-keyword-linear-1.png
:::

:::{image} /images/linear-decay-keyword-linear-2.png
:::{image} ../images/linear-decay-keyword-linear-2.png
:::

## Supported fields for decay functions [_supported_fields_for_decay_functions]