
Handle inference parameters #580

Draft

Calychas wants to merge 2 commits into master from inference-params

Conversation

Calychas (Contributor)

Description

Closes #445

This PR enables passing inference params to the packaged model during prediction, as an additional input to the inference pipeline.

Development notes

This change is fully backwards compatible. One can opt in to inference params during prediction by supplying params_input_name in addition to input_name when creating the PipelineML. The inference pipeline then needs a secondary input named after params_input_name, into which the params dict passed to predict is saved.
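To make the intended usage concrete, here is a minimal sketch (the pipelines, model_uri and data are placeholders, and params_input_name is the argument proposed in this PR, not a released API):

# Sketch only: `params_input_name` is proposed in this PR; the pipelines,
# `model_uri` and `data` are placeholders.
import mlflow
from kedro_mlflow.pipeline import pipeline_ml_factory

pipeline_ml = pipeline_ml_factory(
    training=training_pipeline,
    inference=inference_pipeline,  # declares a free input named "inference_params"
    input_name="instances",
    params_input_name="inference_params",
)

# At prediction time the params dict is saved to the "inference_params"
# dataset before the inference pipeline runs (passing `params` to a
# pyfunc model requires mlflow >= 2.6):
loaded_model = mlflow.pyfunc.load_model(model_uri)
predictions = loaded_model.predict(data, params={"endpoint": "eu-west"})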

The motivation for this change was to quickly have a way to pass parameters at runtime that influence the model's logic (e.g. choosing a different endpoint to hit for data extraction). I thought it was worth sharing.

Although no unit tests have been added yet, the existing ones pass, which suggests the change is indeed backwards compatible. The proposed solution works in a real-life kedro project where the model is packaged and served.

I haven't yet thought through #445 (comment). Users can pass anything they please, which is saved as a separate dict to be used later in the pipeline. It is impossible to overwrite other catalog entries or parameters.

Opening as a draft PR. I'm happy to collaborate on this further, but I'm unsure whether I'll have the capacity to finish all the remaining tasks.

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
      • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
      • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
      • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@Galileo-Galilei (Owner)

Thank you very much for taking a stab at it. I really want to make this work, but the main reason I haven't done it yet is that this change:

  • breaks existing models (i.e. PipelineML models trained with the current version of kedro-mlflow), because it changes the signature of the serialized (pickled) python object by adding an argument to the load_context method. This means anyone who wants to upgrade kedro-mlflow needs to retrain their existing models. This has caused a lot of trouble in the past and should be done very carefully (see the sketch after this list).
  • requires dropping mlflow 1.X. I should do that at some point because of the many open CVEs, and it has become less of an issue since mlflow 2.X has been around for a while, but it is still something I should warn people about in advance.
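To illustrate the first point (this is not the PR's actual diff, just the failure mode):

# Illustrative only: a model pickled from the old class keeps the old
# method signature forever.
import mlflow.pyfunc

class KedroPipelineModel(mlflow.pyfunc.PythonModel):  # version used at training time
    def load_context(self, context):
        ...

# If the upgraded kedro-mlflow code calls the method with an extra
# argument, the unpickled object raises:
#     old_model.load_context(context, inference_params)
#     TypeError: load_context() takes 2 positional arguments but 3 were given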

I am considering creating a develop branch, to be released at the same time as kedro 0.20, which would drop all mlflow 1 support, but this is an extra maintenance burden and I have had a hard time maintaining the repo recently, so I have to find the best trade-off.

@Galileo-Galilei (Owner) commented Oct 1, 2024

Hi, I've looked at the implementation and I'd like to go even further to make it more generic. There are two different features:

  • supporting passing params to the predict method of custom models
  • supporting default values by propagating these params along pipeline_ml_factory.

For now, you only support a single param, which must be a kedro parameter.

I think we should support something much more flexible to enable passing objects like this:

# pseudo code, just to demonstrate the idea
from typing import Any, Optional

from kedro.framework.hooks.manager import _create_hook_manager


def predict(self, context, model_input, params: Optional[dict[str, Any]] = None):
    params = params or {}
    runner_params = params.get("runner_params", {})
    hook_params = params.get("hook_params", {})
    extra_params = params.get("extra_params", {})

    # pseudo: register the hooks requested at inference time
    hook_manager = _create_hook_manager(**hook_params)

    # overwrite free inputs of the inference pipeline with values
    # supplied at prediction time
    for key, value in extra_params.items():
        if key in self.pipeline.inputs():
            self.loaded_catalog.save(name=key, data=value)

    # let the caller choose the runner and its kwargs
    runner = runner_params.get("runner_class")
    run_output = runner.run(
        pipeline=self.pipeline,
        catalog=self.loaded_catalog,
        hook_manager=hook_manager,
        **runner_params.get("runner_kwargs", {}),
    )

This would help solve #587, #404, and your use case with the same implementation.
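For illustration, a call exercising the sketch above might look as follows (the key names are illustrative, not an agreed API; note that MLflow validates pyfunc params against a schema of scalar types, so in practice a runner would probably be passed by name and resolved inside predict):

# Hypothetical call shape for the pseudo code above; key names and
# values are illustrative only.
from kedro.runner import SequentialRunner

run_output = model.predict(
    context,
    model_input=data,
    params={
        "runner_params": {"runner_class": SequentialRunner(), "runner_kwargs": {}},
        "hook_params": {},
        "extra_params": {"my_free_input": 42},  # overwrites a free pipeline input
    },
)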

@Calychas (Contributor, Author) commented Oct 15, 2024

Hey @Galileo-Galilei! That's a really nice suggestion. I like the idea of multiple types of params solving different problems (hooks are a big one, which we would actually need for kedro-pandera).

#587 may already be solvable with the current implementation (see this), but I like the idea of specifying runner_params explicitly.

For now, you only support a single param, which must be a kedro parameter.

A Kedro dataset, to be exact, but yes, only a single one.

@Calychas force-pushed the inference-params branch 2 times, most recently from b2138c1 to 65f8ae1 on October 15, 2024 at 10:49.
@Galileo-Galilei (Owner)

If we agree to do this as described above, I'd like to "make it right" directly in this PR, to avoid introducing the extra params_input_name only to deprecate it immediately. I can help if necessary.

Good catch for #587, I did not remember the runner was configurable! But to be honest, storing it at training time feels very wrong; it makes much more sense to specify it at inference time (you don't want to retrain your model just to change the runner!).

Successfully merging this pull request may close these issues: Add inference parameters to KedroPipelineModel