apm: Document sampling.tail.discard_on_write_failure config #1453
base: main
@@ -53,6 +53,10 @@ If a setting is not supported by {{ech}}, you will get an error message when you
Some settings that could break your cluster if set incorrectly are blocklisted. The following settings are generally safe in cloud environments. For detailed information about APM settings, check the [APM documentation](/solutions/observability/apm/configure-apm-server.md).
::::

### Version 9.1+ [ec_version_9_1]
lgtm, a nit on config description. Please hold off from merging until 9.1 release
@@ -85,6 +85,18 @@ Policies map trace events to a sample rate. Each policy must specify a sample ra
| APM Server binary | `sampling.tail.policies` |
| Fleet-managed | `Policies` |

### Discard On Write Failure [sampling-tail-discard-on-write-failure-ref]

Defines the indexing behavior when trace events fail to be written to storage (e.g. when the storage limit is reached). When set to `false`, traces will be indexed, significantly increasing the indexing load. When set to `true`, traces will be discarded, there will be data loss potentially resulting in broken traces.
Defines the indexing behavior when trace events fail to be written to storage (e.g. when the storage limit is reached). When set to `false`, traces will be indexed, significantly increasing the indexing load. When set to `true`, traces will be discarded, there will be data loss potentially resulting in broken traces.
Defines the indexing behavior when trace events fail to be written to storage (e.g. when the storage limit is reached). When set to `false`, traces will be indexed regardless of the configured sample rate in policies, significantly increasing the indexing load. When set to `true`, traces will be discarded, there will be data loss potentially resulting in broken traces. |
nit on description. Trying to make the implication clear. Feel free to change.
Thanks. I added a note to specify we bypass sampling
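For readers following the thread, the behavior described above can be sketched as a small decision function. This is a hypothetical Python model for illustration only, not APM Server code: the names `Outcome` and `handle_trace_event` are assumptions, and only the `discard_on_write_failure` flag and its two documented outcomes come from the description under review.

```python
from enum import Enum


class Outcome(Enum):
    """Possible fates of a trace event (hypothetical names)."""
    STORED = "stored"        # written to local storage, awaiting a sampling decision
    INDEXED = "indexed"      # indexed regardless of the policy sample rate
    DISCARDED = "discarded"  # dropped; the trace may end up broken


def handle_trace_event(write_succeeded: bool, discard_on_write_failure: bool) -> Outcome:
    """Model what happens to one trace event under tail-based sampling.

    Assumption: a failed write (e.g. storage limit reached) is the only
    case in which the discard_on_write_failure setting matters.
    """
    if write_succeeded:
        # Normal path: the event is buffered until a sampling decision is made.
        return Outcome.STORED
    if discard_on_write_failure:
        # true: the event is dropped, so data loss is possible.
        return Outcome.DISCARDED
    # false: sampling is bypassed and the event is indexed, raising indexing load.
    return Outcome.INDEXED
```

Under this model, the setting only changes the outcome on the write-failure path; successful writes behave the same either way.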
@@ -146,7 +146,7 @@ Due to [OpenTelemetry tail-based sampling limitations](/solutions/observability/

Tail-based sampling (TBS), by definition, requires storing events locally temporarily, such that they can be retrieved and forwarded when a sampling decision is made.

In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, sampling will be bypassed.
In an APM Server implementation, the events are stored temporarily on disk instead of in memory for better scalability. Therefore, it requires local disk storage proportional to the APM event ingestion rate and additional memory to facilitate disk reads and writes. If the [storage limit](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-storage_limit-ref) is insufficient, trace events will be indexed or discarded based on the [discard on write failure](/solutions/observability/apm/tail-based-sampling.md#sampling-tail-discard-on-write-failure-ref) configuration.
I found one other place where the storage limit and sampling bypass was mentioned. Updated to describe the new behavior
Document `sampling.tail.discard_on_write_failure` config.
I sourced the config explanation from here; please let me know if the description is incorrect or unclear in any way.
Updated pages can be found in the docs preview here:
Checklist
Related issues
Part of elastic/apm-server#15330