[ML] Data frame GET _stats response is confusing  #43767

Closed
@sophiec20

Description

Found in 7.3.0-SNAPSHOT

Continuous data frame GET _stats returns the following. Hopefully we can make this response a little less confusing:

  • indexer_state - a value of started means it is idle whereas indexing means it is either searching or indexing. This is not precise, but is inherited from rollups so might be best to leave as is. Also having two states is confusing.
  • checkpoint - this is the last known completed (current) checkpoint. This could be confused with the currently underway checkpoint.
  • progress - this is the progress of the currently underway checkpoint. Calling it progress and leaving it at the top level gives the false impression that it describes the whole transform, whereas percent_complete indicates the progress of a single checkpoint. With batch data frames there is only ever 1 checkpoint, so the other values make sense. However, for continuous data frames total_docs and docs_remaining should ideally be reset. progress could be renamed to checkpoint_progress, or combined with the checkpointing info to keep related fields together.
  • current_position - this is the position of the cursor for the composite agg and will only be visible whilst the composite agg search is scrolling. This is context for the currently underway checkpoint. If a composite agg is not in progress, then this entire object is missing. A small nit, but its sporadic existence is weird.
  • checkpointing
    • we show both timestamp_millis and time_upper_bound_millis where the latter is timestamp_millis - sync.delay. Do we need both?
    • current refers to the current completed checkpoint whereas in_progress refers to the currently underway checkpoint. This is confusing in conjunction with progress. Perhaps we could just keep upper and lower bound?
    • in_progress sometimes does not exist.
    • Should progress and current_position and maybe indexer_state sit here?
  • Many of these stats refer to the next checkpoint (checkpoint: 101), but this is not clear from the response.
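
The relationships described in the list above can be checked against the sample response in this issue. The following is a minimal Python sketch, not the transform implementation: the function names are hypothetical, and the 60000 ms delay is only inferred from the difference between timestamp_millis and time_upper_bound_millis in the sample.

```python
def percent_complete(total_docs: int, docs_remaining: int) -> float:
    """Percentage of the underway checkpoint's documents already processed."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return 100.0 * (total_docs - docs_remaining) / total_docs


def time_upper_bound_millis(timestamp_millis: int, sync_delay_millis: int) -> int:
    """Upper bound as described in the issue: timestamp_millis - sync.delay."""
    return timestamp_millis - sync_delay_millis


# Values copied from the sample response in this issue.
pct = percent_complete(total_docs=1900883, docs_remaining=1722762)

# Assumption: sync.delay is 60000 ms, inferred from the sample timestamps.
current_bound = time_upper_bound_millis(1561740252497, 60_000)
in_progress_bound = time_upper_bound_millis(1561740629172, 60_000)

print(f"percent_complete ~ {pct:.4f}")
print(current_bound, in_progress_bound)
```

Because time_upper_bound_millis can be derived this way whenever sync.delay is known, the sketch illustrates why reporting both values may be redundant.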

To summarise the priority points,

  • progress.total_docs and progress.docs_remaining are incorrect for continuous data frames. progress is really the checkpoint_progress for the next checkpoint.
  • the terms current* and *progress are used in slightly confusing ways, which may cause problems when trying to operationally manage and/or troubleshoot.
{
  "count" : 1,
  "transforms" : [
    {
      "id" : "sycn1844",
      "state" : {
        "task_state" : "started",
        "indexer_state" : "indexing",
        "current_position" : {
          "hashtag" : "abcd1234"
        },
        "checkpoint" : 100,
        "progress" : {
          "total_docs" : 1900883,
          "docs_remaining" : 1722762,
          "percent_complete" : 9.370434687458408
        }
      },
      "stats" : {
        "pages_processed" : 559,
        "documents_processed" : 4207753,
        "documents_indexed" : 278783,
        "trigger_count" : 2,
        "index_time_in_ms" : 14467,
        "index_total" : 558,
        "index_failures" : 0,
        "search_time_in_ms" : 284161,
        "search_total" : 559,
        "search_failures" : 0
      },
      "checkpointing" : {
        "current" : {
          "timestamp" : "2019-06-28T16:44:12.497Z",
          "timestamp_millis" : 1561740252497,
          "time_upper_bound" : "2019-06-28T16:43:12.497Z",
          "time_upper_bound_millis" : 1561740192497
        },
        "in_progress" : {
          "timestamp" : "2019-06-28T16:50:29.172Z",
          "timestamp_millis" : 1561740629172,
          "time_upper_bound" : "2019-06-28T16:49:29.172Z",
          "time_upper_bound_millis" : 1561740569172
        },
        "operations_behind" : 27000
      }
    }
  ]
}
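
To illustrate the ambiguity, here is a hedged Python sketch of how a client might relabel one transforms entry so that the completed and underway checkpoints are explicit. The input field names come from the sample response above (7.3.0-SNAPSHOT); the function name and output keys are hypothetical, and treating the next checkpoint as checkpoint + 1 is an assumption based on the "checkpoint: 101" remark in the description.

```python
from typing import Any, Dict


def summarise_transform(transform: Dict[str, Any]) -> Dict[str, Any]:
    """Relabel one entry of the `transforms` array with less ambiguous names."""
    state = transform["state"]
    checkpointing = transform.get("checkpointing", {})
    completed = state["checkpoint"]
    progress = state.get("progress")  # describes the underway checkpoint only
    return {
        "id": transform["id"],
        # Per the issue: "started" means idle, "indexing" means searching or indexing.
        "idle": state["indexer_state"] == "started",
        "last_completed_checkpoint": completed,
        "next_checkpoint": completed + 1,  # assumption, see lead-in
        "next_checkpoint_percent_complete": (
            progress["percent_complete"] if progress else None
        ),
        # Both of these are sometimes missing from the response.
        "composite_agg_cursor": state.get("current_position"),
        "checkpoint_underway": "in_progress" in checkpointing,
    }


# Trimmed copy of the sample response entry from this issue.
sample = {
    "id": "sycn1844",
    "state": {
        "task_state": "started",
        "indexer_state": "indexing",
        "current_position": {"hashtag": "abcd1234"},
        "checkpoint": 100,
        "progress": {
            "total_docs": 1900883,
            "docs_remaining": 1722762,
            "percent_complete": 9.370434687458408,
        },
    },
    "checkpointing": {
        "current": {"timestamp_millis": 1561740252497},
        "in_progress": {"timestamp_millis": 1561740629172},
        "operations_behind": 27000,
    },
}

summary = summarise_transform(sample)
print(summary)
```

The use of .get() and the membership test reflects the point that current_position and in_progress are not always present, so a client cannot assume they exist.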