-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Sort ingest pipeline stats by time spent executing #88035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort ingest pipeline stats by time spent executing #88035
Conversation
This commit changes the /_nodes/stats API to return the pipelines in descending order by the count of how often they are used. If a pipeline ties another pipeline, they are sorted by ingest time. This ensures that the most used and most expensive pipelines show up first in the JSON response (to help tracking down which pipelines may be most prevalent). For example: ```json { "nodes" : { "yiv09_ETRKSv4q8kz0d59A" : { ... "ingest" : { "total" : {...}, "pipelines" : { "set-name" : { "count" : 4, // <-- sorted first by this "time_in_millis" : 3, // <-- then by this ... }, "set-face" : { "count" : 2, // <-- less than 4 "time_in_millis" : 8, ... } ```
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @dakrone, I've created a changelog YAML for you. |
@elasticmachine run elasticsearch-ci/bwc |
final IngestStats.Stats p1Stats = p1.stats; | ||
final int ingestCountCompare = Long.compare(p2Stats.ingestCount, p1Stats.ingestCount); | ||
if (ingestCountCompare == 0) { | ||
return Long.compare(p2Stats.ingestTimeInMillis, p1Stats.ingestTimeInMillis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether ingestTimeInMillis
should be the primary sort property?
Because I think now pipelines that are used often but are relatively fast show up before pipelines that are used less often but are slower. I think the latter is a more important signal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was on the fence about which one to be the primary and which one to be the fallback. I could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @jakelandis would have an opinion here too, since he's looked more into ingest performance than I have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer ingestTimeInMillis
to be the primary sort property in this case, because I think pipelines with higher time spent are more interesting to look at first irregardless of how often these pipelines were used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that works for me too, I'll swap the order here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
This commit changes the
/_nodes/stats
API to return the pipelines in descending order by the timespent executing. If a pipeline ties another pipeline, they are sorted by ingest count.
This ensures that the most used and most expensive pipelines show up first in the JSON response (to
help tracking down which pipelines may be most prevalent).
For example: