
[DOCS] Fix broken images #126648

Merged
merged 5 commits on Apr 12, 2025

Binary file removed docs/images/create-index-template.png
Binary file removed docs/images/hybrid-architecture.png
Binary file removed docs/images/mongodb-connector-config.png
Binary file removed docs/images/mongodb-load-sample-data.png
Binary file removed docs/images/mongodb-sample-document.png
72 changes: 0 additions & 72 deletions docs/images/token-graph-dns-invalid-ex.svg

This file was deleted.

72 changes: 0 additions & 72 deletions docs/images/token-graph-dns-synonym-ex.svg

This file was deleted.

Binary file removed docs/images/use-a-connector-workflow.png
@@ -6,7 +6,7 @@ For a precision threshold of `c`, the implementation that we are using requires

The following chart shows how the error varies before and after the threshold:

![cardinality error](/images/cardinality_error.png "")
![cardinality error](/reference/query-languages/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
@@ -12,6 +12,6 @@ When using this metric, there are a few guidelines to keep in mind:

The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:

![percentiles error](/images/percentiles_error.png "")
![percentiles error](/reference/query-languages/images/percentiles_error.png "")

The chart shows that precision is better for extreme percentiles. The error diminishes for large numbers of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job of summarizing it. This would not be the case for more skewed distributions.
@@ -65,9 +65,23 @@ Computing exact counts requires loading values into a hash set and returning its

This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:

:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.

For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.

The following chart shows how the error varies before and after the threshold:

![cardinality error](/reference/aggregations/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
the error remains very low (1-6% as seen in the above graph) even when counting millions of items.

The HyperLogLog++ algorithm depends on the leading zeros of hashed values, so the exact distribution of
hashes in a dataset can affect the accuracy of the cardinality.
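
As a rough illustration of the memory/accuracy trade-off described above, a `precision_threshold` of 3000 costs about 3000 * 8 bytes (roughly 24 KB) per aggregation bucket and keeps counts close to exact below that many unique values. The following request is only a sketch; the index and field names are placeholders and are not part of this change:

```console
GET /my-index-000001/_search?size=0
{
  "aggs": {
    "unique_product_count": {
      "cardinality": {
        "field": "product_id",
        "precision_threshold": 3000
      }
    }
  }
}
```

Values of `precision_threshold` above 40000 have the same effect as 40000, so accuracy on very high-cardinality fields can only be bought up to that limit.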

## Pre-computed hashes [_pre_computed_hashes]

@@ -175,8 +175,23 @@ GET latency/_search

## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]

:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
:::
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.

Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.

The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).

When using this metric, there are a few guidelines to keep in mind:

* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated

The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:

![percentiles error](images/percentiles_error.png "")

The chart shows that precision is better for extreme percentiles. The error diminishes for large numbers of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job of summarizing it. This would not be the case for more skewed distributions.
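
If TDigest's default accuracy is not sufficient, its `compression` setting can be raised; more centroids are kept, which improves accuracy at the cost of memory and CPU. A hedged sketch using the `latency` index from the example above (the `load_time` field name is assumed for illustration):

```console
GET latency/_search?size=0
{
  "aggs": {
    "load_time_percentiles": {
      "percentiles": {
        "field": "load_time",
        "percents": [ 50, 95, 99, 99.9 ],
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}
```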

::::{warning}
Percentile aggregations are also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
6 changes: 3 additions & 3 deletions docs/reference/query-languages/eql/eql-syntax.md
@@ -788,7 +788,7 @@ You cannot use EQL to search the values of a [`nested`](/reference/elasticsearch
* If two pending sequences are in the same state at the same time, the most recent sequence overwrites the older one.
* If the query includes [`by` fields](#eql-by-keyword), the query uses a separate state machine for each unique `by` field value.

:::::{dropdown} **Example**
:::::{dropdown} Example
A data set contains the following `process` events in ascending chronological order:

```js
...
```
@@ -831,13 +831,13 @@ The query’s event items correspond to the following states:
* State B: `[process where process.name == "bash"]`
* Complete: `[process where process.name == "cat"]`

:::{image} /images/sequence-state-machine.svg
:::{image} ../images/sequence-state-machine.svg
:alt: sequence state machine
:::

To find matching sequences, the query uses separate state machines for each unique `user.name` value. Based on the data set, you can expect two state machines: one for the `root` user and one for `elkbee`.

:::{image} /images/separate-state-machines.svg
:::{image} ../images/separate-state-machines.svg
:alt: separate state machines
:::
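
For orientation, a `by`-keyed sequence of the shape discussed above can be submitted through the EQL search API. This is only a sketch: the data stream name is a placeholder, and because the real example's first event filter is collapsed in this diff, only two of its three stages are shown here.

```console
GET /my-data-stream/_eql/search
{
  "query": """
    sequence by user.name
      [ process where process.name == "bash" ]
      [ process where process.name == "cat" ]
  """
}
```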

File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
@@ -360,42 +360,42 @@ The `DECAY_FUNCTION` determines the shape of the decay:
`gauss`
: Normal decay, computed as:

![Gaussian](/images/Gaussian.png "")
![Gaussian](../images/Gaussian.png "")

where ![sigma](/images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
where ![sigma](../images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![sigma calc](/images/sigma_calc.png "")
![sigma calc](../images/sigma_calc.png "")

See [Normal decay, keyword `gauss`](#gauss-decay) for graphs demonstrating the curve generated by the `gauss` function.


`exp`
: Exponential decay, computed as:

![Exponential](/images/Exponential.png "")
![Exponential](../images/Exponential.png "")

where again the parameter ![lambda](/images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`
where again the parameter ![lambda](../images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![lambda calc](/images/lambda_calc.png "")
![lambda calc](../images/lambda_calc.png "")

See [Exponential decay, keyword `exp`](#exp-decay) for graphs demonstrating the curve generated by the `exp` function.


`linear`
: Linear decay, computed as:

![Linear](/images/Linear.png "").
![Linear](../images/Linear.png "").

where again the parameter `s` is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset`

![s calc](/images/s_calc.png "")
![s calc](../images/s_calc.png "")

In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds twice the user given scale value.
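
For readability, the formulas referenced by the images above can be written out as follows; this is a reconstruction for this page, and the rendered reference documentation remains authoritative. Writing `d = max(0, |fieldvalue - origin| - offset)` for the effective distance:

```latex
% Hedged reconstruction of the three decay formulas; d is the effective distance defined above.
S_{\text{gauss}}(d)  = \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right),
\qquad \sigma^{2} = -\frac{\text{scale}^{2}}{2\ln(\text{decay})}

S_{\text{exp}}(d)    = \exp\!\left(\lambda\, d\right),
\qquad \lambda = \frac{\ln(\text{decay})}{\text{scale}}

S_{\text{linear}}(d) = \max\!\left(\frac{s - d}{s},\; 0\right),
\qquad s = \frac{\text{scale}}{1 - \text{decay}}
```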


For single functions the three decay functions together with their parameters can be visualized like this (the field in this example called "age"):

![decay 2d](/images/decay_2d.png "")
![decay 2d](../images/decay_2d.png "")
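
As a usage sketch (not part of this change), a decay function on a numeric `age` field like the one plotted above is attached to a query through `function_score`; the origin, scale, offset, and decay values below are illustrative only:

```console
GET /my-index-000001/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "gauss": {
            "age": {
              "origin": 30,
              "scale": 10,
              "offset": 5,
              "decay": 0.5
            }
          }
        }
      ]
    }
  }
}
```

Swapping `gauss` for `exp` or `linear` changes only the shape of the multiplier, as the contour plots further down illustrate.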


### Multi-values fields [_multi_values_fields]
@@ -510,10 +510,10 @@ Next, we show how the computed score looks like for each of the three possible d

When choosing `gauss` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/normal-decay-keyword-gauss-1.png
:::{image} ../images/normal-decay-keyword-gauss-1.png
:::

:::{image} /images/normal-decay-keyword-gauss-2.png
:::{image} ../images/normal-decay-keyword-gauss-2.png
:::

Suppose your original search results matches three hotels :
@@ -529,20 +529,20 @@ Suppose your original search results matches three hotels :

When choosing `exp` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/exponential-decay-keyword-exp-1.png
:::{image} ../images/exponential-decay-keyword-exp-1.png
:::

:::{image} /images/exponential-decay-keyword-exp-2.png
:::{image} ../images/exponential-decay-keyword-exp-2.png
:::

### Linear decay, keyword `linear` [linear-decay]

When choosing `linear` as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

:::{image} /images/linear-decay-keyword-linear-1.png
:::{image} ../images/linear-decay-keyword-linear-1.png
:::

:::{image} /images/linear-decay-keyword-linear-2.png
:::{image} ../images/linear-decay-keyword-linear-2.png
:::

## Supported fields for decay functions [_supported_fields_for_decay_functions]