
Allow MLflow URI with scheme #839

Merged (2 commits, Nov 15, 2022)

Conversation

regen100 (Contributor):
MLflow can load models from many URI schemes. However, get_model_uri accepts only local file paths.

This PR enables MLflowRuntime to load models by URIs such as s3://... and file:....
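For example, with this change a model-settings.json could point directly at a remote artifact (the bucket name and path below are illustrative, not from the PR):

```json
{
  "name": "mlflow-model",
  "implementation": "mlserver_mlflow.MLflowRuntime",
  "parameters": {
    "uri": "s3://my-bucket/path/to/model"
  }
}
```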

adriangonz (Contributor) left a comment:
Hey @regen100,

Thanks a lot for taking a first stab at this one. This is certainly something that other MLflow users have asked about in the past.

In general, we try to shy away from downloading artefacts in MLServer, as we see it as more of an infrastructure concern (i.e. something that is better handled by orchestrators like Seldon Core or KServe). However, it's true that it could make sense to allow downloads from the MLflow model registry in certain cases.

Having said that, I've added a couple of questions and suggestions below. It would be great to hear your thoughts on those!

mlserver/utils.py (outdated):

```python
return ModelSettings(
    name="mlflow-model",
    implementation=MLflowRuntime,
    parameters=ModelParameters(uri="file:" + model_uri),
)
```
adriangonz (Contributor):

Rather than adding a new fixture, should we parametrise one of the tests to run with file:// or no scheme? That way we add less boilerplate code to maintain later.

regen100 (Contributor, Author):

I refactored the tests using pytest parametrization.
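The parametrised version might look roughly like this (a sketch of the shape only, not the actual test from the PR; the URIs and the `test_scheme` name are illustrative):

```python
import pytest
from urllib.parse import urlparse


# One test body, parametrised over a plain local path (no scheme)
# and the same path expressed as an explicit file:// URI.
@pytest.mark.parametrize(
    "uri, expected_scheme",
    [
        ("/tmp/model", ""),
        ("file:///tmp/model", "file"),
    ],
)
def test_scheme(uri, expected_scheme):
    assert urlparse(uri).scheme == expected_scheme
```
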

```python
model_uri = await get_model_uri(
    self._settings,
    allowed_schemes=mlflow.store.artifact.artifact_repository_registry._artifact_repository_registry._registry,
)
self._model = mlflow.pyfunc.load_model(model_uri)
```
adriangonz (Contributor):
I don't have much experience downloading things from dbfs:// / s3:// using MLflow, but would load_model() just handle everything else? How would we configure the DBFS / S3 host, credentials, etc.?

regen100 (Contributor, Author):

In the case of S3 URLs, MLflow retrieves the artifact through boto3, which reads credentials from environment variables such as AWS_ACCESS_KEY_ID or from the default credential store in ~/.aws.
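That lookup order can be sketched roughly like this (a simplified illustration of boto3's default credential chain; `resolve_aws_credentials` is a hypothetical helper, not boto3 code):

```python
import os


def resolve_aws_credentials(env=None):
    """Simplified sketch of boto3's default credential chain:
    environment variables are checked first, then the shared
    credentials file under ~/.aws. Not the real boto3 logic."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "env", "key": key}
    shared = os.path.expanduser("~/.aws/credentials")
    if os.path.exists(shared):
        return {"source": "shared-file", "path": shared}
    return None
```

Because the resolution happens inside boto3, MLServer itself needs no extra configuration surface for S3.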

regen100 (Contributor, Author):

> we try to shy away from downloading artefacts in MLServer, as we see it as more of an infrastructure concern

Your policy is reasonable. Downloading artifacts introduces complexity such as cache control.

adriangonz (Contributor) left a comment:

Changes look good! Nice one defaulting the scheme on the urlparse() method 👍

Once the tests are green, this should be good to go.
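The scheme-defaulting trick mentioned above can be illustrated with a small sketch (a hypothetical helper, not the exact MLServer code):

```python
from urllib.parse import urlparse


def get_scheme(uri: str, default: str = "file") -> str:
    # A plain local path such as "/mnt/models" parses with an empty
    # scheme, so it falls back to being treated as a "file" URI.
    return urlparse(uri).scheme or default
```

This keeps backwards compatibility: existing scheme-less local paths behave exactly as before, while s3:// or file:// URIs are routed by their explicit scheme.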

@adriangonz adriangonz merged commit 6ef2f1e into SeldonIO:master Nov 15, 2022