[Ingest Node Pipelines] More detailed _simulate response

**Describe the feature**:

In Kibana, a new UI was built to support management of ingest node pipelines (https://github.com/elastic/kibana/pull/62321). The UI also lets users simulate a pipeline while creating or editing it.

The simulate UI we envision will give users detailed information about the path each document took through the simulated pipeline, including:

1. An indication of all processors that ran against the document. This will enable a tree view of the set (or subset) of processors.
2. For each processor, whether it failed or was skipped because its conditional (`if`) did not pass. This will enable ✅ and ❌ indicators at each processor. Critically, we also need to know _why_ something went wrong.
3. For a specific processor, what was updated in the document (or a point-in-time view of the document after that processor ran). This would enable a visual diff at each step.
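
As a rough illustration of item 3, a consumer could compute the per-step diff from two consecutive `_source` snapshots. This is only a sketch: `diff_source` is a hypothetical helper, not part of any Elasticsearch or Kibana API, and it compares top-level keys only.

```python
from typing import Any, Dict


def diff_source(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two `_source` snapshots (top-level keys only) and report what changed."""
    added = {k: after[k] for k in after if k not in before}
    removed = {k: before[k] for k in before if k not in after}
    changed = {
        k: {"from": before[k], "to": after[k]}
        for k in before
        if k in after and before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Snapshots before and after a `set` processor that writes `field3`.
before = {"foo": "bar"}
after = {"foo": "bar", "field3": "_value3"}
step_diff = diff_source(before, after)
print(step_diff)
```

A diff like this only needs the document state after each processor, which is why item 3 is the easiest of the three to satisfy.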

On an ES `8.0.0` snapshot, this is an example request-response pair:

<details>
<summary>Request</summary>

```json
{
	"pipeline": {
		"description": "_description",
		"processors": [
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			},
			{
				"rename": {
					"if": "ctx.foo == 'bar'",
					"field": "foo1",
					"target_field": "fieldA",
					"tag": "field1_renamer",
					"on_failure": [
						{
							"set": {
								"field": "field4",
								"value": "THIS SHOULD BE THERE FROM FAILURE",
								"tag": "BLAH TEST"
							}
						}
					]
				}
			},
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			}
		]
	},
	"docs": [
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "bar"
			}
		},
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "123"
			}
		}
	]
}
```

</details>

<details>
<summary>Response</summary>

```json
{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "field1_renamer",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        }
      ]
    },
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        }
      ]
    }
  ]
}
```

</details>

Given the above, we have no way of mapping these results back to the submitted pipeline to achieve 1 and 2, and can only achieve a less detailed version of 3. (Where does `THIS SHOULD BE THERE FROM FAILURE` actually come from?)
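
To make the mapping problem concrete, here is a small sketch of what a consumer runs into when it tries to look up processors by `tag`. The helper `index_by_tag` is hypothetical, and for brevity it only looks at the top-level processors from the request above (the `on_failure` handler is omitted).

```python
from collections import defaultdict
from typing import Any, Dict, List


def index_by_tag(processors: List[Dict[str, Any]]) -> Dict[Any, List[int]]:
    """Map each tag to the positions of the top-level processors that carry it."""
    by_tag: Dict[Any, List[int]] = defaultdict(list)
    for position, processor in enumerate(processors):
        # Each processor is a single-key object: {processor_type: config}.
        (config,) = processor.values()
        by_tag[config.get("tag")].append(position)
    return dict(by_tag)


processors = [
    {"set": {"field": "field3", "value": "_value3", "tag": "THIS SHOULD BE THERE"}},
    {"rename": {"if": "ctx.foo == 'bar'", "field": "foo1",
                "target_field": "fieldA", "tag": "field1_renamer"}},
    {"set": {"field": "field3", "value": "_value3", "tag": "THIS SHOULD BE THERE"}},
]

candidates = index_by_tag(processors)
print(candidates)
```

The tag `THIS SHOULD BE THERE` resolves to two positions, so a result carrying that tag cannot be attributed to a single processor.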

One solution is for the response to be a structural mirror of the pipeline submitted to simulate. This would be the simplest way for consumers to map the result tree back to the submitted tree, since each path would map back to a specific processor.
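
A sketch of how a consumer might use a mirrored response, walking both trees in lockstep. Everything about the result shape here is an assumption made for illustration: the `status` field, its values, and the nested `on_failure` list are hypothetical and do not exist in the current `_simulate` response.

```python
from typing import Any, Dict, List, Tuple


def zip_trees(
    processors: List[Dict[str, Any]],
    results: List[Dict[str, Any]],
    depth: int = 0,
) -> List[Tuple[int, str, Any]]:
    """Pair each submitted processor with the result node at the same position."""
    pairs: List[Tuple[int, str, Any]] = []
    for processor, result in zip(processors, results):
        ((ptype, config),) = processor.items()
        pairs.append((depth, ptype, result.get("status")))
        # Recurse into failure handlers, which sit at the same place in both trees.
        pairs.extend(
            zip_trees(config.get("on_failure", []), result.get("on_failure", []), depth + 1)
        )
    return pairs


processors = [
    {"set": {"field": "field3", "value": "_value3"}},
    {"rename": {"field": "foo1", "target_field": "fieldA",
                "on_failure": [{"set": {"field": "field4", "value": "x"}}]}},
    {"set": {"field": "field3", "value": "_value3"}},
]

# Hypothetical mirrored result tree (assumed shape, not a real response).
results = [
    {"status": "success"},
    {"status": "failed", "on_failure": [{"status": "success"}]},
    {"status": "success"},
]

pairs = zip_trees(processors, results)
print(pairs)
```

With a mirrored shape, no lookup key is needed at all: position in the tree is the identity of the processor.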

Alternatively, a flat structure could still work if we use `tag` as a placeholder for a serialised path that points to the processor in the submitted pipeline (e.g., `0.on_failure.1`). However, `tag` is still exposed to users as a field they can enter values into, so we would be hijacking it for the call to `_simulate`. See this issue (https://github.com/elastic/elasticsearch/issues/56000) about the multiple concerns of `tag`.
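
For the flat alternative, a minimal sketch of how such serialised paths could be generated from the submitted pipeline before calling `_simulate`. The function `processor_paths` and the exact path format are assumptions for illustration; only the general `index.on_failure.index` style is taken from the example above.

```python
from typing import Any, Dict, Iterator, List, Tuple


def processor_paths(
    processors: List[Dict[str, Any]], prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (path, processor_type) for every processor, including failure handlers."""
    for i, processor in enumerate(processors):
        ((ptype, config),) = processor.items()
        path = f"{prefix}{i}"
        yield path, ptype
        # Failure handlers extend the parent's path with `.on_failure.<index>`.
        yield from processor_paths(config.get("on_failure", []), f"{path}.on_failure.")


processors = [
    {"set": {"field": "field3", "value": "_value3"}},
    {"rename": {"field": "foo1", "target_field": "fieldA",
                "on_failure": [{"set": {"field": "field4", "value": "x"}}]}},
    {"set": {"field": "field3", "value": "_value3"}},
]

paths = list(processor_paths(processors))
print(paths)
```

Each path is unique within the pipeline, so a flat list of results keyed this way could be mapped back unambiguously, at the cost of overwriting whatever the user typed into `tag`.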

Assistance here would be greatly appreciated.

CC @jakelandis @talevy @cjcenizal
