[Feature Request] Provide way of defining configuration for the pipeline #15921

martin-gaievski · 2024-09-12T23:31:00Z

Is your feature request related to a problem? Please describe

If a feature or custom workflow requires a certain pipeline configuration, that configuration has to be done manually. These additional steps can lead to issues: steps can be missed, or the configuration can contain errors. Another issue comes from lack of awareness or knowledge: the customer may not know that additional pipeline configuration is needed.

Describe the solution you'd like

A template of the default pipeline configuration that the system creates itself, based on information the developer/engineer provides together with the code of the new feature.

Related component

Search

Describe alternatives you've considered

A simplified version of the solution may be a dependency between processors. If one processor depends on another processor, then that foundational (child) processor is added to the pipeline configuration (or just executed) by the system. Such a "depends on" relation can be part of the processor registration.

Example:
This can be an extension of the Factory class.

It already accepts the map of processor factories, final Map<String, Processor.Factory<SearchPhaseResultsProcessor>> processorFactories, but it can only return one instance of the Processor class. The Factory could instead return a collection of processors, depending on what it needs.
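A minimal sketch of a factory that returns a collection of processors. The Processor and MultiFactory interfaces here are hypothetical, simplified stand-ins rather than the real OpenSearch types, and the sketch ignores the distinction between phase results processors and response processors:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are the real OpenSearch interfaces.
final class MultiProcessorFactorySketch {

    /** Minimal stand-in for a pipeline processor. */
    interface Processor {
        String getType();
    }

    /** Assumed factory variant that may return several processors instead of one. */
    interface MultiFactory {
        List<Processor> create(Map<String, MultiFactory> processorFactories);
    }

    static final String NORMALIZATION = "normalization-processor";
    static final String EXPLAIN = "processor_explain_publisher";

    static Processor processor(String type) {
        return () -> type;
    }

    // A factory with no extra needs returns a single processor.
    static final MultiFactory EXPLAIN_FACTORY = factories -> List.of(processor(EXPLAIN));

    // A factory that also creates the processor it relies on, looked up in the factory map.
    static final MultiFactory NORMALIZATION_FACTORY = factories -> {
        List<Processor> result = new ArrayList<>();
        result.add(processor(NORMALIZATION));
        MultiFactory explain = factories.get(EXPLAIN);
        if (explain != null) {
            result.addAll(explain.create(factories));
        }
        return result;
    };

    /** Returns the types of all processors created by the factory registered under the given name. */
    static List<String> typesCreatedBy(String name) {
        Map<String, MultiFactory> factories =
            Map.of(NORMALIZATION, NORMALIZATION_FACTORY, EXPLAIN, EXPLAIN_FACTORY);
        List<String> types = new ArrayList<>();
        for (Processor p : factories.get(name).create(factories)) {
            types.add(p.getType());
        }
        return types;
    }

    public static void main(String[] args) {
        // prints [normalization-processor, processor_explain_publisher]
        System.out.println(typesCreatedBy(NORMALIZATION));
    }
}
```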

Additional context

An example of such a use would be a search flow with a search pipeline, where an existing SearchPhaseResultsProcessor requires another response processor to finalize its results.

Today we have to tell the user to configure a pipeline in a certain way, something like the following example:

PUT /_search/pipeline/nlp-search-pipeline

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}

It can be even more problematic if the user already has a pipeline with one processor.

This is applicable to ingest pipelines as well.


msfroh · 2024-09-18T16:41:51Z

[Search community meeting triage]:

@martin-gaievski -- Ingest pipelines have the pipeline processor that embeds a pipeline as a processor, allowing reuse. If that embedded pipeline contains a single processor, it's a convenient way of embedding a single pre-configured processor. We don't have that for search pipelines yet, but it was part of the original proposal.

Would that level of reuse address your needs? Or do we need more of a semi-configured template?

martin-gaievski · 2025-01-07T02:04:08Z

I was thinking of a generic pipeline template that can have blocks of related processors configured for certain use cases.
Please check my example below from the issue description: it configures the search pipeline with a phase results processor that does the score combination/normalization, and it automatically needs the response processor if we want to use the explain flag for that processor.

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}

There can be a simpler alternative: processor dependencies configured programmatically. To address the pipeline configuration from my example we can do the following:

1. Extend the Processor.Factory interface and add a method that returns dependencies for this processor. It can be a collection of Factory objects, where each factory returns an instance of a dependent processor. By default we return an empty collection or null.
2. Extend the PipelineWithMetrics.create method: read the dependent processors from the factory and, if there are any, add them to the pipeline.
3. Implement the new method from step 1 in the "parent" processor factory.

With such an implementation we can completely skip configuration of the dependent processor and configure only the main one. In my example only "normalization-processor" is configured, and the response processor that does explain is added programmatically:

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ]
}
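The three steps above could look roughly like the following sketch. The Factory interface, the getDependencies method, and createPipeline are hypothetical names standing in for Processor.Factory and PipelineWithMetrics.create; they are not the existing OpenSearch signatures. For brevity the pipeline is modeled as a single ordered list of processor types rather than separate phase_results_processors and response_processors lists:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: simplified stand-ins, not the real OpenSearch classes.
final class ProcessorDependencySketch {

    /** Stand-in for Processor.Factory with only the parts needed here. */
    interface Factory {
        String type();

        /** Step 1: factories of the processors this one depends on; empty by default. */
        default Collection<Factory> getDependencies() {
            return List.of();
        }
    }

    // A processor without dependencies keeps the default.
    static final Factory EXPLAIN_PUBLISHER = () -> "processor_explain_publisher";

    // Step 3: the "parent" factory declares its dependency.
    static final Factory NORMALIZATION = new Factory() {
        @Override
        public String type() {
            return "normalization-processor";
        }

        @Override
        public Collection<Factory> getDependencies() {
            return List.of(EXPLAIN_PUBLISHER);
        }
    };

    static final Map<String, Factory> REGISTRY = Map.of(
        NORMALIZATION.type(), NORMALIZATION,
        EXPLAIN_PUBLISHER.type(), EXPLAIN_PUBLISHER
    );

    /** Step 2: stand-in for pipeline creation that also adds dependent processors. */
    static List<String> createPipeline(List<String> configuredTypes, Map<String, Factory> registry) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String type : configuredTypes) {
            Factory factory = registry.get(type);
            if (factory == null) {
                throw new IllegalArgumentException("No factory registered for " + type);
            }
            addWithDependencies(factory, resolved);
        }
        return new ArrayList<>(resolved);
    }

    private static void addWithDependencies(Factory factory, Set<String> resolved) {
        // Skip processors that are already present; this also stops on dependency cycles.
        if (!resolved.add(factory.type())) {
            return;
        }
        for (Factory dependency : factory.getDependencies()) {
            addWithDependencies(dependency, resolved);
        }
    }

    public static void main(String[] args) {
        // prints [normalization-processor, processor_explain_publisher]
        System.out.println(createPipeline(List.of("normalization-processor"), REGISTRY));
    }
}
```

Because already-present processors are skipped, a pipeline that explicitly configures both processors resolves to the same two entries without duplicates.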

@msfroh let me know if such an approach makes sense; I can start a PR then.

martin-gaievski added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 12, 2024

github-actions bot added the Search Search query, autocomplete ...etc label Sep 12, 2024

github-project-automation bot added this to Search Project Board Sep 12, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Sep 12, 2024

martin-gaievski changed the title ~~[Feature Request] Provide way of defining required configuration for the pipeline~~ [Feature Request] Provide way of defining configuration for the pipeline Sep 12, 2024

This was referenced Sep 12, 2024

[FEATURE] Provide way of defining configuration for the pipeline opensearch-project/neural-search#904

Open

[RFC] Explainability for Hybrid query opensearch-project/neural-search#905

Open

getsaurabh02 removed the untriaged label Sep 18, 2024
