Implement Elasticsearch query tracing to a source in Kibana

This issue aims to address the main concern of https://github.com/elastic/kibana/issues/97934: `provide the ability to trace ES query back to a source in Kibana code that initiated the request`. Beyond that, we want to lay the foundation for end-to-end tracing across the whole Stack. To make this happen, Kibana will rely on the built-in capabilities of the APM RUM and Node.js APM agents and their integration with Elasticsearch.

### High-level picture
#### Kibana Frontend

The execution context should allow Kibana users to unambiguously identify the source of a query, whether it originates in the Kibana app in the browser, the Kibana server, or the `task manager`.
```typescript
interface KibanaExecutionContext {
  // Kibana entity type
  type: 'visualization' | 'actions' | 'alert' /* | .. */;
  // Kibana entity id
  id: string;
  // human-readable description, e.g. a vis title or action name
  description: string;
  // in the browser: URL of the current page; on the server: endpoint path; for a task: task SO URL
  url?: string;
}
```
The APM RUM agent doesn't support async context propagation in the browser, so Kibana will have to pass the context manually.

A plugin creates an `execution context` object with an API provided by Core. The returned value is opaque to the plugin.
```typescript
const executionContext: KibanaExecutionContext = createExecutionContext({ .. })
```
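
A minimal sketch of what such an API could look like, assuming the opaque value is a small container exposing `toString()` and `toJSON()`. The `ExecutionContextContainer` class and the URI-encoded JSON wire format are assumptions made for illustration, not the Core implementation.

```typescript
// Mirrors the interface described above; `type` is widened to string for brevity.
interface KibanaExecutionContext {
  type: string;
  id: string;
  description: string;
  url?: string;
}

// Hypothetical opaque container: plugins pass it around without reading its fields.
class ExecutionContextContainer {
  private readonly context: KibanaExecutionContext;

  constructor(context: KibanaExecutionContext) {
    // Copy so later mutations by the caller don't leak into the captured context.
    this.context = { ...context };
  }

  // Header-safe form (assumed format): JSON, URI-encoded so non-ASCII
  // descriptions survive in a header value.
  toString(): string {
    return encodeURIComponent(JSON.stringify(this.context));
  }

  // Plain-object form for embedding in a JSON request body.
  toJSON(): KibanaExecutionContext {
    return { ...this.context };
  }
}

function createExecutionContext(context: KibanaExecutionContext): ExecutionContextContainer {
  return new ExecutionContextContainer(context);
}

const exampleContext = createExecutionContext({
  type: 'visualization',
  id: '1234-5678',
  description: 'Error rate gauge',
});
```

Keeping the fields private means a plugin can only forward the value, which matches the "opaque to the plugin" requirement.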

The obtained `execution context` must be passed to the Kibana server manually through all the layers of abstraction in Kibana. Kibana sets it as a custom request header, or includes it in the request body, before issuing a request to the Kibana server:
```js
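// Option 1: pass the context as a custom request header.
await fetch('/api/something', {
  headers: {
    'kbn-context': executionContext.toString(),
  },
});
// Option 2: pass the context in the request body.
await fetch('/api/something', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  // fetch expects a serialized body, not a plain object
  body: JSON.stringify({
    context: executionContext.toJSON(),
  }),
});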
```
For the first implementation, we start with a `context` that captures a single context level: `visualizations`.
In the next iteration, we can add support for nested execution contexts, which can be used to compose execution context relationships across different apps:
`Application service context` --> `Dashboard context` --> `Visualization context`.
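
One way nested contexts could be modelled is with a parent link on each context. The sketch below is only an illustration: the `parent` field, `withParent`, and `describeChain` are hypothetical names and not part of any agreed API.

```typescript
interface NestedExecutionContext {
  type: string;
  id: string;
  description: string;
  // Hypothetical field: the context this one was created within.
  parent?: NestedExecutionContext;
}

// Returns a copy of `child` that remembers the context it was created within.
function withParent(
  parent: NestedExecutionContext,
  child: NestedExecutionContext
): NestedExecutionContext {
  return { ...child, parent };
}

// Renders the chain outermost-first, e.g. "application:... --> dashboard:... --> visualization:...".
function describeChain(context: NestedExecutionContext): string {
  const parts: string[] = [];
  let current: NestedExecutionContext | undefined = context;
  while (current) {
    parts.unshift(`${current.type}:${current.id}`);
    current = current.parent;
  }
  return parts.join(' --> ');
}

const appContext: NestedExecutionContext = {
  type: 'application',
  id: 'dashboards',
  description: 'Dashboards app',
};
const dashboardContext = withParent(appContext, {
  type: 'dashboard',
  id: 'd-1',
  description: 'Web traffic',
});
const visContext = withParent(dashboardContext, {
  type: 'visualization',
  id: '1234-5678',
  description: 'Error rate gauge',
});

const chain = describeChain(visContext);
// chain === 'application:dashboards --> dashboard:d-1 --> visualization:1234-5678'
```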


#### Server-side
**Depends on**: APM agents can be used without APM server https://github.com/elastic/apm-agent-nodejs/issues/2101
- The APM Node.js agent intercepts all the incoming requests and creates an APM transaction. 
- The APM Node.js agent instruments all the requests to the Elasticsearch server to pass the current transaction id via the `traceparent` header.
- The Elasticsearch team is working on adding support for tracing headers: https://github.com/elastic/elasticsearch/pull/74210
  We need their commitment to ship it in `v7.15`.
- This `traceparent` header will be used for log correlation across Kibana and the Elasticsearch server. To make this possible, Kibana should add `trace.id` to its log records.
  **TODO**: discuss with the Elasticsearch team in what form they are going to include it in the Elasticsearch logs. It will likely be present in ECS-JSON logs by default; whether it should appear in the text logs is open for discussion.
- Kibana intercepts all the incoming requests and retrieves the `execution context` from the `'kbn-context'` header. The context and `trace.id` are emitted to the Kibana logs. A minimal subset of the `execution context` data, in the form `kibana:type:name:id` (for example, `kibana:visualization:gauge:1234-5678`), is attached to the current APM transaction as the `kibanaContext` label.
- Kibana server plugins may create an `execution context` on the server side as well. Context passing works the same way as on the client side.
- Whenever Kibana sends a request to the Elasticsearch server, it adds the `kibanaContext` label to the `x-opaque-id` header. This allows Stack users to identify the source of a query in `slowlogs` without having to inspect Kibana logs.
  **TODO**: discuss with the Elasticsearch team whether `trace.id` can be included in the `slowlogs` as well.
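
The header handling in the list above can be sketched roughly as follows. This is an illustration under stated assumptions: the header is assumed to carry URI-encoded JSON, the `name` field is assumed to supply the `name` segment of the label, and the `;` separator used when an `x-opaque-id` is already present is invented for the example. None of the function names are existing Kibana APIs.

```typescript
// Hypothetical wire shape; `name` is assumed to hold e.g. "gauge".
interface IncomingExecutionContext {
  type: string;
  name: string;
  id: string;
}

// Reads the context from a `kbn-context` header value.
// Returns undefined when the header is absent or malformed.
function parseContextHeader(headerValue: string | undefined): IncomingExecutionContext | undefined {
  if (!headerValue) return undefined;
  try {
    const parsed = JSON.parse(decodeURIComponent(headerValue));
    if (
      parsed !== null &&
      typeof parsed === 'object' &&
      typeof parsed.type === 'string' &&
      typeof parsed.name === 'string' &&
      typeof parsed.id === 'string'
    ) {
      return { type: parsed.type, name: parsed.name, id: parsed.id };
    }
    return undefined;
  } catch {
    // Invalid percent-encoding or invalid JSON: treat as "no context".
    return undefined;
  }
}

// Builds the minimal `kibana:type:name:id` label.
function toKibanaContextLabel(context: IncomingExecutionContext): string {
  return ['kibana', context.type, context.name, context.id].join(':');
}

// Combines the label with an `x-opaque-id` the caller may already have set.
// The ';' separator is an assumption for this sketch.
function buildOpaqueId(label: string, incomingOpaqueId?: string): string {
  return incomingOpaqueId ? `${incomingOpaqueId};${label}` : label;
}

const headerValue = encodeURIComponent(
  JSON.stringify({ type: 'visualization', name: 'gauge', id: '1234-5678' })
);
const parsedContext = parseContextHeader(headerValue);
const label = parsedContext ? toKibanaContextLabel(parsedContext) : undefined;
// label === 'kibana:visualization:gauge:1234-5678'
```

Treating a malformed header as "no context" keeps tracing best-effort, so a bad header never fails the request itself.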

### Instrumentation
The list of instrumentation points should be discussed with every team separately. We are primarily interested in instrumenting plugins that may cause performance problems in Elasticsearch:

- Visualizations
  - [x] vis_type_metric
  - [x] vis_type_table
  - [x] vis_type_tagcloud
  - [x] vis_type_timelion
  - [x] vis_type_timeseries
  - [x] vis_type_vega
  - [x] vis_type_vislib
  - [x] vis_type_xy
  - [x] vis_type_pie
  - [ ] input_control_vis
- [x] Lens
- [x] Discover
- [ ] Kibana server request handlers
- Tasks
  - [x] Actions
  - [x] Alerts
  - [ ] Reporting
- [ ] Canvas
- [ ] Maps
- [ ] Observability
- [ ] APM
- [ ] Security solutions
- [ ] ML
- [ ] Logs
- [ ] Metrics
- [ ] Console

During the initial implementation, the Core team will instrument several plugins and implement integration tests as an example. Later, we will create separate issues for code owners to help us with this work.

### List of sub-tasks

#### Context propagation
- [x] Implement context management service on the client-side https://github.com/elastic/kibana/issues/102626
- [x] Implement manual context propagation for Kibana Entities: https://github.com/elastic/kibana/issues/102629
- [ ] Provide recommendations on debugging Kibana with data sent to Elasticsearch `slowlogs`

#### Log correlation
- [ ] Update the APM Node.js agent https://github.com/elastic/kibana/issues/102624
- [ ] Refactor the logging system to include `trace.id` in the logs for log correlation purposes. https://github.com/elastic/kibana/issues/102699
  - [ ] Align with the Elasticsearch team on the logging format
- [ ] Provide settings to run Kibana with APM agent enabled, APM agent disabled, APM agent working in the tracing mode (without sending data to APM server) https://github.com/elastic/kibana/issues/102704
- [ ] Measure the solution overhead and its influence on the Kibana performance https://github.com/elastic/kibana/issues/102706
- [ ] Update the APM RUM agent https://github.com/elastic/kibana/issues/102625
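
To illustrate the log correlation idea, here is a minimal sketch that extracts the trace id from a W3C `traceparent` value (`version-traceid-parentid-flags`) and attaches it to a log record as `trace.id`. The `LogRecord` shape and both function names are assumptions for the example; in practice the APM agent would supply the current trace id.

```typescript
// version (2 hex) - trace id (32 hex) - parent id (16 hex) - flags (2 hex)
const TRACEPARENT_PATTERN = /^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Returns the trace id, or undefined if the header value is not well-formed.
function extractTraceId(traceparent: string): string | undefined {
  const match = TRACEPARENT_PATTERN.exec(traceparent);
  if (!match) return undefined;
  const traceId = match[1];
  // An all-zero trace id is invalid in the W3C Trace Context format.
  if (!traceId || /^0+$/.test(traceId)) return undefined;
  return traceId;
}

// Hypothetical, simplified log record using a flat `trace.id` key.
interface LogRecord {
  '@timestamp': string;
  message: string;
  'trace.id'?: string;
}

// Returns a copy of the record with `trace.id` set when a valid trace id is available.
function withTraceId(record: LogRecord, traceparent?: string): LogRecord {
  const traceId = traceparent ? extractTraceId(traceparent) : undefined;
  return traceId ? { ...record, 'trace.id': traceId } : record;
}

const record = withTraceId(
  { '@timestamp': '2021-06-08T00:00:00.000Z', message: 'search request completed' },
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
// record['trace.id'] === '4bf92f3577b34da6a3ce929d0e0e4736'
```

With the same `trace.id` present in both Kibana and Elasticsearch log records, a single search on that value would return the related entries from both sides.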


Implement Elasticsearch query tracing to a source in Kibana #101587

mshustov opened on Jun 8, 2021
