[ML] Add extra debug logging to enable end-to-end profiling of jobs #29857

Open
@elasticmachine

Original comment by @droberts195:

When an ML job is running, time can be spent in the following areas:

  • Searching Elasticsearch indices for input LINK REDACTED
  • Pre-processing this input in the ML Java code prior to sending it to the ML C++ process
  • (Possibly) categorization inside the C++ process
  • "Data gathering" in the anomaly detection part of the C++ process
  • End-of-bucket processing in the C++ process
  • Result processing in the ML Java code

@richcollier has found that it is extremely hard to pinpoint which of these processing phases is responsible for an ML job running slower than real-time at a customer.

We calculate and store the end-of-bucket processing time in the C++ anomaly detection code, but time spent in other areas is not easy to determine (other than by using a profiler in a development environment).

Such troubleshooting would be greatly helped by the following instrumentation:

  1. Debug messages at the beginning and end of every Elasticsearch search that the datafeed does
  2. Debug messages at the beginning and end of post_data processing
  3. Some sort of instrumentation of the categorization code in the C++ process, with debug logging to report periodically how long it is taking
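Items (1) and (2) could be sketched roughly as follows. This is a minimal illustration, not the actual ML plugin code: `PhaseTimer`, `timed`, the job ID and the phase names are hypothetical, and `java.util.logging` at `FINE` level stands in for whatever debug logger the real code uses.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: logs a debug message before and after a timed phase. */
final class PhaseTimer {
    private static final Logger LOGGER = Logger.getLogger(PhaseTimer.class.getName());

    private PhaseTimer() {}

    /**
     * Runs the given phase, logging at FINE (debug) level when it starts and
     * when it ends, and returns the phase's result unchanged.
     */
    static <T> T timed(String jobId, String phase, Supplier<T> body) {
        LOGGER.log(Level.FINE, "[{0}] {1} started", new Object[] {jobId, phase});
        long startNanos = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Logged even if the phase throws, so every "started" has a matching "finished".
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
            LOGGER.log(Level.FINE, "[{0}] {1} finished in {2} ms",
                    new Object[] {jobId, phase, Long.toString(elapsedMs)});
        }
    }
}
```

A datafeed search or a post_data call would then be wrapped as `PhaseTimer.timed(jobId, "datafeed search", () -> runSearch())`, where `runSearch` is a placeholder for the real call. Pairing the start and end messages by job ID and phase name makes it possible to reconstruct a per-phase timeline from the logs.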

Item (3) is the hardest here, as the categorization and data gathering are both done in sequence per input record. Using a millisecond timer to time the categorization part is probably not accurate enough, and using our current nanosecond timer on some platforms (Windows) is quite slow.
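One possible way to limit timer overhead, not something proposed in this issue, would be to time only every Nth record and extrapolate. The sketch below illustrates the arithmetic in Java for consistency with the rest of the thread, even though the categorization code itself is C++; `SampledPhaseStats` and all of its members are hypothetical names.

```java
/** Hypothetical accumulator: times only every Nth call and extrapolates the total. */
final class SampledPhaseStats {
    private final int sampleEvery;
    private long recordsSeen;
    private long samples;
    private long sampledNanos;

    SampledPhaseStats(int sampleEvery) {
        if (sampleEvery <= 0) {
            throw new IllegalArgumentException("sampleEvery must be positive");
        }
        this.sampleEvery = sampleEvery;
    }

    /** Runs the phase for one record; the high-resolution timer is read only for sampled records. */
    void run(Runnable phase) {
        boolean sample = (recordsSeen++ % sampleEvery) == 0;
        if (!sample) {
            phase.run();
            return;
        }
        long start = System.nanoTime();
        try {
            phase.run();
        } finally {
            sampledNanos += System.nanoTime() - start;
            samples++;
        }
    }

    long getRecordsSeen() {
        return recordsSeen;
    }

    long getSamples() {
        return samples;
    }

    /** Estimated total time: mean cost of the sampled records multiplied by all records seen. */
    long estimatedTotalNanos() {
        if (samples == 0) {
            return 0;
        }
        return (long) ((double) sampledNanos / samples * recordsSeen);
    }
}
```

The estimate is only as good as the assumption that sampled records are representative, so a periodic debug message should report the sample count alongside the estimated time.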

But even if only items (1) and (2) are added, it will improve our ability to troubleshoot certain performance problems at customer sites.
