
Enable asynchronous mode when serving inference pipeline #587

Closed
@cariveroco

Description

The `pipeline_ml_factory` allows isolating an inference pipeline to be run during model serving. The run loads and saves each node's I/O sequentially, and there could be performance gains if asynchronous mode could be enabled instead, as happens when the `kedro run --async` command is used (reference).
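
For reference, in Kedro the async behaviour belongs to the runner rather than the pipeline: the `--async` flag maps to the runner's `is_async` argument, which loads node inputs and saves outputs in separate threads. A minimal sketch (assuming a recent Kedro release; the `MemoryDataset` name and the runner's `run` signature vary between versions):

```python
from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner

def predict(features):
    # Placeholder for the real inference step
    return [f * 2 for f in features]

inference = pipeline([node(predict, inputs="features", outputs="predictions")])
catalog = DataCatalog({"features": MemoryDataset([1.0, 2.0, 3.0])})

# is_async=True makes the runner load inputs and save outputs in
# threads; this is what `kedro run --async` toggles from the CLI.
runner = SequentialRunner(is_async=True)
runner.run(inference, catalog)
```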

Context

We have an MLflow model built with `pipeline_ml_factory`, hosted on a platform that enforces an API response timeout. We have already optimized our code base, and we are hoping that processing time could still be significantly reduced if the many inputs and outputs of our inference pipeline's nodes could be loaded/saved asynchronously.

The platform serves the model similarly to `mlflow models serve`, in that only the MLflow model itself is accessed. Within the Docker container deployed by the hosting platform, our entrypoint script only has access to the MLflow model and cannot reach the Kedro project path, so we cannot load any configuration set in the project's /conf directory. Thus, we are hoping that asynchronous mode could somehow be "encoded" within the MLflow model itself.
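
To illustrate the constraint, the entrypoint essentially does no more than the following (the model URI and input data are purely illustrative), with no Kedro session or config loader involved:

```python
import mlflow.pyfunc
import pandas as pd

# Only the logged MLflow model is reachable inside the container;
# the Kedro project and its conf/ directory are not.
model = mlflow.pyfunc.load_model("models:/my_inference_model/Production")  # illustrative URI
predictions = model.predict(pd.DataFrame({"feature": [1.0, 2.0]}))
```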

Possible Implementation/Alternatives

Unfortunately, I have no suggestions on how this could be implemented, and I am actually unsure whether this feature is already available.
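
That said, one possible direction, if kedro-mlflow's `KedroPipelineModel` accepts a `runner` argument (an assumption to verify against the installed version), would be to bake the async runner into the model at logging time, reusing the `inference` pipeline and `catalog` from the sketch in the Description above:

```python
import mlflow
from kedro.runner import SequentialRunner
from kedro_mlflow.mlflow import KedroPipelineModel

# ASSUMPTION: the `runner` keyword may not exist in all kedro-mlflow
# versions; `inference` and `catalog` are the objects built earlier.
kedro_model = KedroPipelineModel(
    pipeline=inference,
    catalog=catalog,
    runner=SequentialRunner(is_async=True),
)
mlflow.pyfunc.log_model(artifact_path="model", python_model=kedro_model)
```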
