[Feature Request] Provide way of defining configuration for the pipeline #15921

Open
martin-gaievski opened this issue Sep 12, 2024 · 2 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc


@martin-gaievski
Member

Is your feature request related to a problem? Please describe

If a feature or custom workflow requires a specific pipeline configuration, that configuration has to be done manually. These additional steps are error-prone: a step can be missed, or the configuration may contain errors. Another issue is lack of awareness: the customer may not know that additional pipeline configuration is required.

Describe the solution you'd like

A template for the default pipeline configuration that is created by the system itself, based on information the developer/engineer provides together with the code of the new feature.

Related component

Search

Describe alternatives you've considered

A simplified version of the solution could be a dependency between processors. If one processor depends on another, the foundational/child processor is added to the pipeline configuration (or just executed) by the system. Such a "depends on" relation can be part of the processor registration.

Example:
This can be an extension of the Factory class.

It already accepts the map of processor factories (final Map<String, Processor.Factory<SearchPhaseResultsProcessor>> processorFactories), but it can return only one instance of the Processor class. The factory could instead return a collection of processors, depending on what it needs.
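A minimal sketch of that idea, using simplified stand-in types rather than the real OpenSearch interfaces. The names MultiFactory, createAll, and named are hypothetical and only illustrate a factory that returns several processors for one configured entry:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FactoryCollectionSketch {
    // Simplified stand-in for a pipeline processor; not the real OpenSearch interface.
    interface Processor {
        String getType();
    }

    // Hypothetical factory variant that may return several processors for one configured entry.
    interface MultiFactory {
        List<Processor> createAll(Map<String, Object> config);
    }

    // Helper that builds a processor reporting the given type name.
    static Processor named(String type) {
        return () -> type;
    }

    // The factory for the main processor also creates the processor it depends on.
    static final MultiFactory NORMALIZATION_FACTORY = config -> List.of(
        named("normalization-processor"),
        named("processor_explain_publisher")
    );

    // Collects the type names of everything the factory created.
    static List<String> createdTypes(MultiFactory factory) {
        List<String> types = new ArrayList<>();
        for (Processor p : factory.createAll(Map.of())) {
            types.add(p.getType());
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(createdTypes(NORMALIZATION_FACTORY));
    }
}
```

In a real implementation the returned processors would still have to be routed to the right pipeline section (phase results vs. response); the sketch ignores that.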

Additional context

An example of such use would be a search flow with a search pipeline in which an existing SearchPhaseResultsProcessor requires another response processor to finalize its results.

Today we have to tell the user to configure the pipeline in a certain way, something like the following example:

PUT /_search/pipeline/nlp-search-pipeline

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}

It can be even more problematic if the user already has a pipeline with one processor.
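To illustrate why the existing-pipeline case is harder: the required processor has to be merged into whatever the user already configured instead of replacing it. The following is only a sketch under the assumption that a pipeline can be modeled as a map from section name to a list of processor types; withRequired is a hypothetical helper, not an OpenSearch API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineMergeSketch {
    // Returns a copy of the existing pipeline with the required processor added
    // to the given section, unless that section already contains it.
    static Map<String, List<String>> withRequired(
            Map<String, List<String>> existing, String section, String processorType) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : existing.entrySet()) {
            // Copy each list so the caller's pipeline definition is left untouched.
            merged.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        List<String> processors = merged.computeIfAbsent(section, key -> new ArrayList<>());
        if (!processors.contains(processorType)) {
            processors.add(processorType);
        }
        return merged;
    }
}
```

Calling withRequired(existing, "response_processors", "processor_explain_publisher") on a pipeline that only has "normalization-processor" keeps that processor and adds the response processor next to it.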

This is applicable to ingest pipelines as well.

@martin-gaievski martin-gaievski added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 12, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Sep 12, 2024
@martin-gaievski martin-gaievski changed the title [Feature Request] Provide way of defining required configuration for the pipeline [Feature Request] Provide way of defining configuration for the pipeline Sep 12, 2024
@msfroh
Collaborator

msfroh commented Sep 18, 2024

[Search community meeting triage]:

@martin-gaievski -- Ingest pipelines have the pipeline processor that embeds a pipeline as a processor, allowing reuse. If that embedded pipeline contains a single processor, it's a convenient way of embedding a single pre-configured processor. We don't have that for search pipelines yet, but it was part of the original proposal.

Would that level of reuse address your needs? Or do we need more of a semi-configured template?

@martin-gaievski
Member Author

I was thinking of a generic pipeline template that can have blocks of related processors configured for certain use cases.
Please see my example from the issue description below. It configures the search pipeline with a phase results processor that does the score combination/normalization, and that processor automatically needs the response processor if we want to use the explain flag with it.

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}

A simpler alternative is to configure processor dependencies programmatically. To address the pipeline configuration from my example we can do the following:

  • extend the Processor.Factory interface with a method that returns the dependencies of this processor. It can be a collection of Factory objects, where each factory returns an instance of a dependent processor. By default we return an empty collection or null.
  • extend the PipelineWithMetrics.create method to read the dependent processors from the factory and, if there are any, add them to the pipeline.
  • implement the new method from the first step in the "parent" processor factory.
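The three steps above could be sketched roughly as follows. This uses simplified stand-in types, not the real Processor.Factory or PipelineWithMetrics classes; getDependencies, named, and createPipeline are hypothetical names, and the sketch returns a flat list of processor types instead of separate pipeline sections:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class DependencySketch {
    // Simplified stand-in for a pipeline processor.
    interface Processor {
        String getType();
    }

    // Simplified stand-in for Processor.Factory.
    interface Factory {
        Processor create(Map<String, Object> config);

        // Step 1 (hypothetical method): factories of dependent processors, empty by default.
        default Collection<Factory> getDependencies() {
            return Collections.emptyList();
        }
    }

    static Processor named(String type) {
        return () -> type;
    }

    static final Factory EXPLAIN_FACTORY = config -> named("processor_explain_publisher");

    // Step 3: the "parent" factory declares the processor it depends on.
    static final Factory NORMALIZATION_FACTORY = new Factory() {
        @Override
        public Processor create(Map<String, Object> config) {
            return named("normalization-processor");
        }

        @Override
        public Collection<Factory> getDependencies() {
            return List.of(EXPLAIN_FACTORY);
        }
    };

    // Step 2: while building the pipeline, add each dependency that is not present yet.
    static List<String> createPipeline(List<String> configuredTypes, Map<String, Factory> registry) {
        List<String> types = new ArrayList<>();
        for (String type : configuredTypes) {
            Factory factory = registry.get(type);
            if (factory == null) {
                throw new IllegalArgumentException("No factory registered for " + type);
            }
            types.add(factory.create(Map.of()).getType());
            for (Factory dependency : factory.getDependencies()) {
                String dependencyType = dependency.create(Map.of()).getType();
                if (!types.contains(dependencyType)) {
                    types.add(dependencyType);
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, Factory> registry = Map.of("normalization-processor", NORMALIZATION_FACTORY);
        System.out.println(createPipeline(List.of("normalization-processor"), registry));
    }
}
```

Returning an empty collection by default rather than null keeps the pipeline-building loop free of null checks.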

With such an implementation we can skip configuring the dependent processor entirely and configure only the main one. In my example only "normalization-processor" is configured, and the response processor that handles explain is added programmatically:

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ]
}

@msfroh let me know if such an approach makes sense; I can then start a PR.

Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants