Instrument Elasticsearch with APM #84369

Closed
@pugnascotia

Description

NOTE: this issue will evolve as we scope out this work.

"Why is Elasticsearch slow?" is a common question from users. We already have tools to investigate certain aspects of this question, for instance the search slowlog (good if the shard-level searches are slow) and the hot threads API (good if the slowness is ongoing), but there are many gaps too. For instance, how would we discover that a Kibana dashboard triggers unreasonably many searches if each of those searches completes fairly quickly? How would we discover that requests are spending unexpectedly long in queues? How do we see if the slow steps all involve a particular node? What if that node is on a remote cluster? It's hard to take a structured approach to performance questions with the tools we have today.

Distributed tracing is a great way to answer questions of this nature. Elastic has a distributed tracing product, APM, which sits on top of Elasticsearch, but today Elasticsearch itself is opaque to APM: we cannot trace the execution of a request through Elasticsearch. Let's fix that.

This work will build on an existing exploratory project that instrumented a number of "tasks" in Elasticsearch. More types of tasks will be instrumented, as well as requests / responses at the REST level.
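To make the idea concrete, here is a toy sketch of what span-based instrumentation gives us. It is not the APM agent's API or anything from the exploratory project: `Tracer`, `Span`, `searches_per_trace`, and the node and span names are all hypothetical, and the tracer is in-memory and single-threaded. It only illustrates how parent/child spans sharing a trace ID would let us answer a question like "how many searches did this one dashboard load trigger, and on which nodes?".

```python
from collections import Counter
from contextlib import contextmanager
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one end-to-end request
    span_id: int
    parent_id: Optional[int]  # None for the root span
    name: str
    node: str                 # which node did the work


class Tracer:
    """Hypothetical in-memory tracer; a real agent would report spans to an APM server."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []  # currently open spans (single-threaded assumption)
        self._ids = count(1)

    @contextmanager
    def span(self, name: str, node: str, trace_id: Optional[str] = None) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        if parent is not None:
            trace_id = parent.trace_id  # child spans inherit the trace ID
        elif trace_id is None:
            raise ValueError("a root span needs a trace_id")
        span = Span(trace_id, next(self._ids), parent.span_id if parent else None, name, node)
        self.spans.append(span)
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()


def searches_per_trace(spans: List[Span]) -> Counter:
    """Count 'search' spans per trace, regardless of how fast each one was."""
    return Counter(s.trace_id for s in spans if s.name == "search")


# One dashboard load that fans out into three quick searches.
tracer = Tracer()
with tracer.span("dashboard load", node="kibana", trace_id="trace-1"):
    for _ in range(3):
        with tracer.span("search", node="node-a"):
            with tracer.span("shard search", node="node-b"):
                pass

counts = searches_per_trace(tracer.spans)
```

Here `counts["trace-1"]` is 3 even though no individual search was slow, which is exactly the kind of thing the slowlog cannot show. Grouping the same spans by `node` instead would point at a node that appears in every slow step.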

Tasks

Out-of-scope

The focus of this work is instrumenting Elasticsearch for Elastic's own purposes. Making it available to users and licensing it for that purpose is not currently in scope.
