[Ingest Node Pipelines] More detailed _simulate response #56004

Closed
@jloleysens


Describe the feature:

In Kibana, a new UI was built to support management of ingest node pipelines (elastic/kibana#62321). The UI also gives users a way to simulate the pipeline while creating or editing it.

The simulate UI we are envisioning will give users detailed information about the path each document has taken through the simulated pipeline, including:

  1. An indication of all processors that ran against the document. This will enable a tree view of the set (or subset) of processors.
  2. Whether the document failed at each processor or did not pass its conditional (`if`). This will enable ✅ and ❌ indications at each processor. Critically, we would also need to know why something went wrong.
  3. What a specific processor updated in the document (or a point-in-time look at the document after that processor ran). Something like this would enable a visual diff at each step.
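Item 3 essentially means diffing consecutive point-in-time snapshots of a document. A minimal sketch, assuming each snapshot is the `_source` object recorded after one processor; `diff_sources` is a hypothetical helper that compares top-level fields only, and the sample values are taken from the first document in the example response in this issue.

```python
from typing import Any, Dict


def diff_sources(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: report what one processor added, removed, or changed.

    Only top-level fields are compared; nested objects are compared as whole values.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# _source of the first example document before and after the rename step.
step_1 = {"field3": "_value3", "foo": "bar"}
step_2 = {"field3": "_value3", "foo": "bar", "field4": "THIS SHOULD BE THERE FROM FAILURE"}
print(diff_sources(step_1, step_2))
```

A per-step diff like this only becomes meaningful once each snapshot can be tied to a specific submitted processor, which is the gap described in the rest of this issue.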

On an ES 8.0.0 snapshot, this is an example request/response pair:

Request

```json
{
  "pipeline": {
    "description": "_description",
    "processors": [
      {
        "set": {
          "field": "field3",
          "value": "_value3",
          "tag": "THIS SHOULD BE THERE"
        }
      },
      {
        "rename": {
          "if": "ctx.foo == 'bar'",
          "field": "foo1",
          "target_field": "fieldA",
          "tag": "field1_renamer",
          "on_failure": [
            {
              "set": {
                "field": "field4",
                "value": "THIS SHOULD BE THERE FROM FAILURE",
                "tag": "BLAH TEST"
              }
            }
          ]
        }
      },
      {
        "set": {
          "field": "field3",
          "value": "_value3",
          "tag": "THIS SHOULD BE THERE"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "123"
      }
    }
  ]
}
```
Response

```json
{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "field1_renamer",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        }
      ]
    },
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        }
      ]
    }
  ]
}
```
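To make the mapping problem concrete, here is a small sketch that reads the `processor_results` arrays (Elasticsearch returns these when `_simulate` is called with the `verbose` query parameter) and lists the tags recorded for each document. The data is a trimmed copy of the response above with only `tag` kept; `tags_per_doc` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

# Trimmed copy of the verbose _simulate response above: only "tag" is kept.
response: Dict[str, Any] = {
    "docs": [
        {"processor_results": [
            {"tag": "THIS SHOULD BE THERE"},
            {"tag": "field1_renamer"},
            {"tag": "THIS SHOULD BE THERE"},
        ]},
        {"processor_results": [
            {"tag": "THIS SHOULD BE THERE"},
            {"tag": "THIS SHOULD BE THERE"},
        ]},
    ]
}


def tags_per_doc(resp: Dict[str, Any]) -> List[List[Optional[str]]]:
    """Hypothetical helper: the tags of each document's results, in order."""
    return [
        [result.get("tag") for result in doc["processor_results"]]
        for doc in resp["docs"]
    ]


for index, tags in enumerate(tags_per_doc(response)):
    print(index, tags)
```

Both sequences are ambiguous: the shared tag hides which of the two `set` processors produced each entry, and for the second document the `rename` step (whose `if` was false) is simply absent rather than reported as skipped.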

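Given the above, we have no way of mapping these results back to the submitted pipeline, so we cannot achieve 1 or 2, and can only achieve a less detailed version of 3. For example, where does `THIS SHOULD BE THERE FROM FAILURE` actually come from? The response attributes `field4` to the `field1_renamer` step, with no indication that the rename failed or that the `on_failure` processor tagged `BLAH TEST` is what set it.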

One solution is for the response to structurally mirror the pipeline submitted to `_simulate`. This would make it simplest for consumers to map the result tree back to the submitted tree, since each path would correspond to a specific processor.
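A minimal sketch of how a consumer might walk such a mirrored result. The node shape (`type`, `status`, `on_failure`) and the status values are hypothetical proposals, not an existing Elasticsearch response format; the tree below encodes what happened to the first example document.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical mirrored result for the first example document: one node per
# submitted processor, with on_failure children nested exactly as submitted.
mirrored: List[Dict[str, Any]] = [
    {"type": "set", "status": "success"},
    {
        "type": "rename",
        "status": "error",
        "on_failure": [{"type": "set", "status": "success"}],
    },
    {"type": "set", "status": "success"},
]


def walk(nodes: List[Dict[str, Any]], prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (path, processor type, status) in submission order.

    Paths use a serialised form such as "1.on_failure.0".
    """
    for index, node in enumerate(nodes):
        path = f"{prefix}{index}"
        yield path, node["type"], node["status"]
        yield from walk(node.get("on_failure", []), f"{path}.on_failure.")


for path, kind, status in walk(mirrored):
    print(path, kind, status)
```

Because the result tree has the same shape as the request, the position alone identifies each processor, so no user-visible field has to be reused to carry that information.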

Alternatively, a flat structure could still work if we use `tag` as a placeholder for a serialised path pointing to the processor in the submitted pipeline (e.g., `0.on_failure.1`). However, `tag` will still be exposed to users as a field they can enter values into, so we would be hijacking it for the call to `_simulate`. See #56000 regarding the multiple concerns `tag` currently serves.
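A sketch of the tag-as-path idea, assuming the consumer rewrites a copy of the pipeline before calling `_simulate`. `tag_with_paths` is a hypothetical helper; it only walks processor-level `on_failure` lists, so processors that nest others differently (for example `foreach` with its `processor` field) are not handled.

```python
import copy
from typing import Any, Dict, List


def tag_with_paths(processors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a deep copy of `processors` whose tags are serialised paths.

    Each processor is a single-key object such as {"set": {...}}; nested
    processors are read from that config's "on_failure" list. User-entered
    tags are overwritten in the copy (the hijacking concern noted above).
    """
    tagged = copy.deepcopy(processors)

    def assign(nodes: List[Dict[str, Any]], prefix: str) -> None:
        for index, processor in enumerate(nodes):
            path = f"{prefix}{index}"
            (config,) = processor.values()  # the single processor-type key
            config["tag"] = path
            assign(config.get("on_failure", []), f"{path}.on_failure.")

    assign(tagged, "")
    return tagged


# The processors from the example request, trimmed to the relevant keys.
processors = [
    {"set": {"field": "field3", "value": "_value3", "tag": "THIS SHOULD BE THERE"}},
    {"rename": {
        "field": "foo1", "target_field": "fieldA", "tag": "field1_renamer",
        "on_failure": [{"set": {"field": "field4", "value": "x", "tag": "BLAH TEST"}}],
    }},
    {"set": {"field": "field3", "value": "_value3", "tag": "THIS SHOULD BE THERE"}},
]
print(tag_with_paths(processors))
```

Each `tag` in `processor_results` could then be split on `.` and looked up in the original pipeline; the `on_failure` processor would be addressable as `1.on_failure.0`.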

Assistance here would be greatly appreciated.

CC @jakelandis @talevy @cjcenizal

Labels

:Data Management/Ingest Node, Team:Data Management, Team:Deployment Management
