Description
Describe the feature:
In Kibana, a new UI was built to support management of ingest node pipelines (elastic/kibana#62321). The UI also gives users a way to simulate the pipeline while creating or editing it.
The simulate UI we are envisioning will give users detailed information about the path each document has traveled through the simulated pipeline, including the following:
- An indication of all processors that ran against the document. This will enable a tree view of the set (or subset) of processors.
- At each processor, whether it failed or did not pass the conditional (`if`). This will enable ✅ and ❌ indications at each processor. Critically, it would be important to know why something went wrong.
- For a specific processor, what was updated in the document (or a point-in-time look at the document post-processor). Something like this would enable a visual diff at each step.
On an ES 8.0.0 snapshot, this is an example request-response pair:
Request
{
"pipeline": {
"description": "_description",
"processors": [
{
"set": {
"field": "field3",
"value": "_value3",
"tag": "THIS SHOULD BE THERE"
}
},
{
"rename": {
"if": "ctx.foo == 'bar'",
"field": "foo1",
"target_field": "fieldA",
"tag": "field1_renamer",
"on_failure": [
{
"set": {
"field": "field4",
"value": "THIS SHOULD BE THERE FROM FAILURE",
"tag": "BLAH TEST"
}
}
]
}
},
{
"set": {
"field": "field3",
"value": "_value3",
"tag": "THIS SHOULD BE THERE"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"foo": "bar"
}
},
{
"_index": "index",
"_id": "id",
"_source": {
"foo": "123"
}
}
]
}
Response
{
"docs": [
{
"processor_results": [
{
"tag": "THIS SHOULD BE THERE",
"doc": {
"_index": "index",
"_id": "id",
"_source": {
"field3": "_value3",
"foo": "bar"
},
"_ingest": {
"pipeline": "_simulate_pipeline",
"timestamp": "2020-04-30T08:26:53.842161Z"
}
}
},
{
"tag": "field1_renamer",
"doc": {
"_index": "index",
"_id": "id",
"_source": {
"field3": "_value3",
"foo": "bar",
"field4": "THIS SHOULD BE THERE FROM FAILURE"
},
"_ingest": {
"pipeline": "_simulate_pipeline",
"timestamp": "2020-04-30T08:26:53.842161Z"
}
}
},
{
"tag": "THIS SHOULD BE THERE",
"doc": {
"_index": "index",
"_id": "id",
"_source": {
"field3": "_value3",
"foo": "bar",
"field4": "THIS SHOULD BE THERE FROM FAILURE"
},
"_ingest": {
"pipeline": "_simulate_pipeline",
"timestamp": "2020-04-30T08:26:53.842161Z"
}
}
}
]
},
{
"processor_results": [
{
"tag": "THIS SHOULD BE THERE",
"doc": {
"_index": "index",
"_id": "id",
"_source": {
"field3": "_value3",
"foo": "123"
},
"_ingest": {
"pipeline": "_simulate_pipeline",
"timestamp": "2020-04-30T08:26:53.842165Z"
}
}
},
{
"tag": "THIS SHOULD BE THERE",
"doc": {
"_index": "index",
"_id": "id",
"_source": {
"field3": "_value3",
"foo": "123"
},
"_ingest": {
"pipeline": "_simulate_pipeline",
"timestamp": "2020-04-30T08:26:53.842165Z"
}
}
}
]
}
]
}
Given the above, we do not have a way of mapping these results back to the submitted pipeline to achieve the first two points, only a less detailed version of the third. (Where does `THIS SHOULD BE THERE FROM FAILURE` actually come from?)
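To illustrate what the current response does allow, here is a minimal sketch of the per-step diff a consumer could compute today by comparing consecutive `_source` snapshots. Only `processor_results`, `tag`, `doc` and `_source` come from the response above; the type names and the `diffSource` and `diffSteps` helpers are hypothetical, and the comparison is deliberately shallow (top-level fields only).

```typescript
// Hypothetical helper types; only `tag`, `doc` and `_source` come from the
// simulate response shown above.
type Source = Record<string, unknown>;

interface ProcessorResult {
  tag?: string;
  doc: { _source: Source };
}

interface FieldDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Shallow comparison of top-level fields between two _source snapshots.
// Values are compared via JSON.stringify, so key order inside nested
// objects matters; good enough for a sketch.
function diffSource(before: Source, after: Source): FieldDiff {
  const added = Object.keys(after).filter((k) => !(k in before));
  const removed = Object.keys(before).filter((k) => !(k in after));
  const changed = Object.keys(after).filter(
    (k) => k in before && JSON.stringify(before[k]) !== JSON.stringify(after[k])
  );
  return { added, removed, changed };
}

// Diff each processor result against the previous snapshot, starting from
// the document that was submitted in `docs`.
function diffSteps(initial: Source, results: ProcessorResult[]): FieldDiff[] {
  let previous = initial;
  return results.map((result) => {
    const diff = diffSource(previous, result.doc._source);
    previous = result.doc._source;
    return diff;
  });
}

// First two results for the first document in the example response.
const steps = diffSteps({ foo: "bar" }, [
  { tag: "THIS SHOULD BE THERE", doc: { _source: { field3: "_value3", foo: "bar" } } },
  {
    tag: "field1_renamer",
    doc: {
      _source: { field3: "_value3", foo: "bar", field4: "THIS SHOULD BE THERE FROM FAILURE" },
    },
  },
]);
```

The second step reports `field4` as added, but nothing in the diff says which processor in the submitted tree produced it, which is the gap described above.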
One solution is for the response to be a structural mirror of the pipeline submitted to simulate. This would be the simplest option for consumers, who could map the result tree back to the submitted tree: each path would map back to a specific processor.
Alternatively, a flat structure could still work if we use `tag` as a placeholder for a serialised path that points to the processor in the submitted pipeline (e.g., `0.on_failure.1`). However, `tag` will still be exposed to users as a field they can enter values into (so we would be hijacking it for the call to `_simulate`). See this issue (#56000) about the multiple concerns of `tag`.
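The flat-structure idea could be sketched on the consumer side roughly as follows. This is an assumption-laden illustration, not an existing Kibana or Elasticsearch API: the `Processor` typing and the `tagWithPaths` and `resolvePath` helpers are hypothetical, and the path format simply follows the `0.on_failure.1` example above.

```typescript
// Hypothetical minimal typing: a processor is a single-key object such as
// { set: {...} } whose options may carry `tag` and `on_failure`.
interface ProcessorOptions {
  tag?: string;
  on_failure?: Processor[];
  [option: string]: unknown;
}
type Processor = Record<string, ProcessorOptions>;

// Return a copy of the processors in which every `tag` is replaced by the
// processor's serialised path. The input (with user tags) is left untouched.
function tagWithPaths(processors: Processor[], prefix = ""): Processor[] {
  return processors.map((processor, index) => {
    const path = `${prefix}${index}`;
    const tagged: Processor = {};
    for (const type of Object.keys(processor)) {
      const options = processor[type];
      if (options === undefined) continue;
      const nested = options.on_failure;
      tagged[type] = {
        ...options,
        tag: path,
        ...(nested ? { on_failure: tagWithPaths(nested, `${path}.on_failure.`) } : {}),
      };
    }
    return tagged;
  });
}

// Walk a serialised path such as "1.on_failure.0" back to the processor it
// points to in the original pipeline; returns undefined if nothing matches.
function resolvePath(processors: Processor[], path: string): Processor | undefined {
  let list: Processor[] = processors;
  let found: Processor | undefined;
  for (const segment of path.split(".")) {
    if (segment === "on_failure") {
      const type = found ? Object.keys(found)[0] : undefined;
      const options = found && type !== undefined ? found[type] : undefined;
      list = options?.on_failure ?? [];
    } else {
      found = list[Number(segment)];
    }
  }
  return found;
}

// First two processors of the example pipeline, user tags omitted.
const pipeline: Processor[] = [
  { set: { field: "field3", value: "_value3" } },
  {
    rename: {
      if: "ctx.foo == 'bar'",
      field: "foo1",
      target_field: "fieldA",
      on_failure: [{ set: { field: "field4", value: "THIS SHOULD BE THERE FROM FAILURE" } }],
    },
  },
];

const tagged = tagWithPaths(pipeline); // copy that would be sent to _simulate
const origin = resolvePath(pipeline, "1.on_failure.0"); // the nested `set`
```

In this sketch the user's own tags survive only in the original pipeline object; the copy sent to `_simulate` carries paths instead, which is exactly the hijacking concern raised above.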
Assistance here would be greatly appreciated.