
Handle inference parameters #580

Draft

Calychas wants to merge 2 commits into master from inference-params

Conversation

Calychas (Contributor)

Description

Closes #445

This PR enables passing inference params to the packaged model during prediction, as an additional input to the inference pipeline.

Development notes

This change is fully backwards compatible. One can opt in to inference params during prediction by supplying params_input_name in addition to input_name when creating the PipelineML. The inference pipeline then needs a secondary input named after params_input_name, into which the params dict passed to predict is saved.
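To make the intended usage concrete, here is a minimal sketch (the pipelines, model_uri and data are placeholders, and params_input_name is the argument proposed in this PR, not a released API):

# Sketch only: `params_input_name` is proposed in this PR; the pipelines,
# `model_uri` and `data` are placeholders.
import mlflow
from kedro_mlflow.pipeline import pipeline_ml_factory

pipeline_ml = pipeline_ml_factory(
    training=training_pipeline,
    inference=inference_pipeline,  # declares a free input named "inference_params"
    input_name="instances",
    params_input_name="inference_params",
)

# At prediction time the params dict is saved to the "inference_params"
# dataset before the inference pipeline runs (passing `params` to a
# pyfunc model requires mlflow >= 2.6):
loaded_model = mlflow.pyfunc.load_model(model_uri)
predictions = loaded_model.predict(data, params={"endpoint": "eu-west"})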

The motivation for this change was to quickly have a way to pass parameters at runtime that influence the model's logic (e.g. choosing a different endpoint to hit for data extraction). I thought it was worth sharing.

Although no unit tests have been added yet, the existing ones pass, which suggests the change is indeed backwards compatible. The proposed solution works in a real-life kedro project where the model is packaged and served.

I haven't yet thought through #445 (comment). Users can pass anything they please, which is saved as a separate dict to be used later in the pipeline. It is impossible to overwrite other catalog entries or parameters.

Opening as a draft PR. I'm happy to collaborate on this further, but I'm unsure whether I'll have the capacity to finish all the remaining tasks.

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
      • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
      • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
      • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@Galileo-Galilei (Owner)

Thank you very much for taking a stab at it. I really want to make this work, but the main reason I haven't done it yet is that this change:

  • breaks existing models (i.e. PipelineML models trained with the current version of kedro-mlflow), because it changes the signature of the serialized (pickled) python object by adding an argument to the load_context method. This means anyone who wants to upgrade kedro-mlflow needs to retrain their existing models. This has caused a lot of trouble in the past and should be done very carefully (see the sketch after this list).
  • requires dropping mlflow 1.X. I should do that at some point because of the many open CVEs, and it has become less of an issue since mlflow 2.X has been around for a while, but it is still something I should warn people about in advance.
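To illustrate the first point (this is not the PR's actual diff, just the failure mode):

# Illustrative only: a model pickled from the old class keeps the old
# method signature forever.
import mlflow.pyfunc

class KedroPipelineModel(mlflow.pyfunc.PythonModel):  # version used at training time
    def load_context(self, context):
        ...

# If the upgraded kedro-mlflow code calls the method with an extra
# argument, the unpickled object raises:
#     old_model.load_context(context, inference_params)
#     TypeError: load_context() takes 2 positional arguments but 3 were given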

I am considering creating a develop branch, to be released at the same time as kedro 0.20, which would drop all mlflow 1 support, but this is an extra maintenance burden and I have had a hard time maintaining the repo recently, so I have to find the best trade-off.

@Galileo-Galilei (Owner) commented Oct 1, 2024

Hi, I've looked at the implementation and I'd like to go even further to make it more generic. There are two different features:

  • supporting passing params to the predict method of custom models
  • supporting default values by propagating these params along pipeline_ml_factory.

For now, you only support a single param, which must be a kedro parameter.

I think we should support something much more flexible to enable passing objects like this:

# pseudo code, just to demonstrate the idea
from typing import Any, Optional

from kedro.framework.hooks.manager import _create_hook_manager


def predict(self, context, model_input, params: Optional[dict[str, Any]] = None):
    params = params or {}
    runner_params = params.get("runner_params", {})
    hook_params = params.get("hook_params", {})
    extra_params = params.get("extra_params", {})

    # pseudo: register the hooks requested at inference time
    hook_manager = _create_hook_manager(**hook_params)

    # overwrite free inputs of the inference pipeline with values
    # supplied at prediction time
    for key, value in extra_params.items():
        if key in self.pipeline.inputs():
            self.loaded_catalog.save(name=key, data=value)

    # let the caller choose the runner and its kwargs
    runner = runner_params.get("runner_class")
    run_output = runner.run(
        pipeline=self.pipeline,
        catalog=self.loaded_catalog,
        hook_manager=hook_manager,
        **runner_params.get("runner_kwargs", {}),
    )

This would help solve #587, #404, and your use case with the same implementation.
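For illustration, a call exercising the sketch above might look as follows (the key names are illustrative, not an agreed API; note that MLflow validates pyfunc params against a schema of scalar types, so in practice a runner would probably be passed by name and resolved inside predict):

# Hypothetical call shape for the pseudo code above; key names and
# values are illustrative only.
from kedro.runner import SequentialRunner

run_output = model.predict(
    context,
    model_input=data,
    params={
        "runner_params": {"runner_class": SequentialRunner(), "runner_kwargs": {}},
        "hook_params": {},
        "extra_params": {"my_free_input": 42},  # overwrites a free pipeline input
    },
)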

@Calychas (Contributor, Author) commented Oct 15, 2024

Hey @Galileo-Galilei! That's a really nice suggestion. I like the idea of multiple types of params solving different problems (hooks are a big one, which we would actually need for kedro-pandera).

#587 may already be solvable with the current implementation (see this), but I like the idea of specifying runner_params explicitly.

For now, you only support a single param, which must be a kedro parameter.

A Kedro dataset, to be exact, but yes, only a single one.

@Calychas force-pushed the inference-params branch 2 times, most recently from b2138c1 to 65f8ae1 on October 15, 2024 at 10:49.
@Galileo-Galilei (Owner)

If we agree to do this as described above, I'd like to "make it right" directly in this PR, to avoid introducing the extra params_input_name only to deprecate it immediately. I can help if necessary.

Good catch for #587, I did not remember the runner was configurable! But to be honest, storing it at training time feels very wrong; it makes much more sense to specify it at inference time (you don't want to retrain your model just to change the runner!).

Successfully merging this pull request may close these issues: Add inference parameters to KedroPipelineModel