
feat(huggingface): Load local artefacts in HuggingFace runtime #1319

Merged

Conversation

Contributor

@vtaskow vtaskow commented Jul 28, 2023

Why

Currently, loading local model artefacts in the HuggingFace runtime is inconvenient. When users want to serve a custom model, or want to avoid downloading a model while MLServer is running, there should be a convenient way to point the runtime at a local folder of model artefacts.
It is technically possible to specify such a location today via parameters.extra.pretrained_model in the model-settings.json config. However, that field is intended for ready-made models from the HuggingFace hub.

What and How

This PR provides a new way for a user to serve a local HuggingFace model, without having to download one from the HuggingFace hub or being limited to the models available there. Essentially, if parameters.extra.pretrained_model is not specified in model-settings.json, then parameters.uri is used to locate and load a custom model that has already been downloaded by rclone. Furthermore, if parameters.uri is not set, it is automatically filled in by the HuggingFace runtime to point to MLServer's default model download location.
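The resolution order described above can be sketched as follows (a minimal illustration of the fallback logic; the function and constant names here are simplified stand-ins, not the actual MLServer implementation):

```python
from typing import Optional

# Illustrative default: where rclone places artefacts in Seldon Core.
DEFAULT_MODEL_URI = "/mnt/models"

def resolve_model_artefacts(pretrained_model: Optional[str],
                            uri: Optional[str]) -> str:
    """Pick the location the runtime should load the model from.

    1. parameters.extra.pretrained_model wins if set (hub name or local path).
    2. Otherwise fall back to parameters.uri (local artefacts from rclone).
    3. If neither is set, use the default model path.
    """
    if pretrained_model:
        return pretrained_model
    if uri:
        return uri
    return DEFAULT_MODEL_URI

print(resolve_model_artefacts("distilgpt2", "/mnt/models"))  # hub model wins
print(resolve_model_artefacts(None, "/mnt/models"))          # falls back to uri
```

In the real runtime, step 3 is handled by the model registry, which fills in parameters.uri before the runtime sees the settings.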

Testing in Seldon Core v1

Background

PREDICTIVE_UNIT_PARAMETERS is an environment variable containing a list of parameters describing a HuggingFace model. It is the predecessor of model-settings.json. In SCv1, if this variable is not empty, it is used instead of model-settings.json (even if one is supplied).
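For context, PREDICTIVE_UNIT_PARAMETERS in SCv1 holds a JSON list of name/value/type entries. A hedged sketch of flattening it into a plain dict (the exact schema and parsing may differ from SCv1's actual code):

```python
import json
import os

def parse_predictive_unit_parameters(raw: str) -> dict:
    """Flatten an SCv1-style list of {"name", "value", "type"} entries
    into a simple {name: value} mapping."""
    return {p["name"]: p["value"] for p in json.loads(raw)}

# Example value as it might be set on the model container:
os.environ["PREDICTIVE_UNIT_PARAMETERS"] = (
    '[{"name": "task", "value": "text-generation", "type": "STRING"}]'
)
params = parse_predictive_unit_parameters(os.environ["PREDICTIVE_UNIT_PARAMETERS"])
print(params)  # {'task': 'text-generation'}
```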

PREDICTIVE_UNIT_PARAMETERS is empty AND model-settings.json is missing

  • provided gs path to HF model artefacts folder not containing model-settings.json; no UI params set
    Result: raises a MissingHuggingFaceSettings exception because neither predictive params nor model-settings.json are set, even though MLSERVER_MODEL_URI is auto set to /mnt/models

PREDICTIVE_UNIT_PARAMETERS is not empty AND model-settings.json is missing

  • PREDICTIVE_UNIT_PARAMETERS contained: task = text-generation; MLSERVER_MODEL_URI is auto set to /mnt/models
    Result: able to initialise HF settings; params has task = text-generation; HF settings -> pretrained_model is not set but falls back to MLSERVER_MODEL_URI (this is the change made in this PR). Able to predict ✔️

  • PREDICTIVE_UNIT_PARAMETERS contained: task = text-generation, pretrained_model = /mnt/models (the same value MLSERVER_MODEL_URI is auto set to)
    Result: able to initialise HF settings; params has task = text-generation and pretrained_model = /mnt/models, which is used as the location of the custom model. Able to predict ✔️

PREDICTIVE_UNIT_PARAMETERS is empty and using model-settings.json

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings; pretrained model is not used because it's empty, and the runtime correctly falls back to MLSERVER_MODEL_URI, which points to /mnt/models where rclone has downloaded the model artefact folder. ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "pretrained_model": "/mnt/models",
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings, pretrained model IS used and correctly points to /mnt/models where rclone has downloaded the model artefact folder ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "uri": "/mnt/models",
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings, uri IS used and correctly points to /mnt/models where rclone has downloaded the model artefact folder ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

Testing in Seldon Core v2

model-settings.json is missing

Message: ModelFailed
Reason: Can't create model-settings from requirements [huggingface]
Result is expected ✔️

model-settings.json doesn't have parameters.extra

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {}
  }
}

mlserver_huggingface.errors.MissingHuggingFaceSettings: Missing HuggingFace Runtime settings.
Result is expected ✔️

model-settings.json doesn't have parameters.extra but has parameters.uri

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "uri": "/mnt/models",
    "extra": {}
  }
}

mlserver_huggingface.errors.MissingHuggingFaceSettings: Missing HuggingFace Runtime settings.
Result is expected ✔️

model-settings.json has only task field in parameters.extra

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: It correctly uses the downloaded model in /mnt/agent/models because, even though neither parameters.uri nor parameters.extra.pretrained_model is set, the runtime model registry fills in parameters.uri by default to point at the model-settings.json path. Able to predict correctly ✔️
This works because of the fallback to parameters.uri when parameters.extra.pretrained_model is not set.

model-settings.json has task and pretrained_model fields in parameters.extra but pretrained_model points to a HuggingFace hub model

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "pretrained_model": "distilgpt2",
      "task": "text-generation"
    }
  }
}

Result: The specified pretrained model is correctly downloaded from the HuggingFace hub. Able to predict correctly ✔️

Further testing with a model artefacts folder created via transformers' Pipeline.save_pretrained()

{
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}
  • Works for SCv1 with model-settings.json
  • Works for SCv1 without model-settings.json but with PREDICTIVE_UNIT_PARAMETERS
  • Works for SCv1 with both model-settings.json and PREDICTIVE_UNIT_PARAMETERS
  • Works for SCv2 with model-settings.json

Resolves #1076

@vtaskow vtaskow changed the title feat(huggingface): Use HF model params uri to specify a local path to the model artefacts feat(huggingface): Load local artefacts in HuggingFace runtime Jul 28, 2023
@vtaskow vtaskow self-assigned this Jul 28, 2023
@vtaskow vtaskow added the enhancement New feature or request label Jul 28, 2023
@vtaskow vtaskow marked this pull request as ready for review July 28, 2023 16:09
Contributor

@adriangonz adriangonz left a comment


Nice one! Changes look good from my side - this should be good to go once tests are green. 🚀

Contributor

@agrski agrski left a comment


A few suggestions to improve clarity and UX, but nothing major that I can see

Comment on lines 33 to 34
if not model:
if settings.parameters is not None:
Contributor


❓ What happens if neither the pretrained_model nor the parameters are specified? Previously, pretrained_model was mandatory and, presumably, the code would blow up if it wasn't there. Now we're saying both are optional, but I can't see clear error handling here to inform the user that something needs to be provided.

Contributor


As a UX consideration, might we also want to clarify the decision being made here? That is to say that, rather than using a default and only checking something else if not provided, we instead check if both are provided and explain that one overrides the other, or that the one hasn't been provided so we're using the other instead.

That's slightly more code, but shouldn't add appreciable time cost to loading models and should make it easier for users to understand why things are happening the way they are.

Contributor Author


In the context of the HuggingFace runtime, settings.parameters is never empty; otherwise the get_huggingface_settings function raises an exception. Also, settings.parameters.uri is always present (if it's not set in model-settings.json, it is set to the default model path by the registry load method).
Let me know if that makes sense to you, if not I am open to change 👍

Contributor


Thanks for clarifying. Makes sense that we need certain parameters and validate this first. The links are very helpful, thanks!

Contributor


One follow-up question: if settings.parameters is always provided, there should be no need to check if settings.parameters is not None, right?

Contributor Author


Yes, correct. I can remove it to avoid confusion 👍

Contributor Author


And to further support the claim that settings.parameters is always provided:
if no model-settings.json files are found on the current path, there is a fallback that creates model settings from the environment, in which case the model params are also initialised and therefore not empty.

@@ -1,6 +1,8 @@
from unittest.mock import patch, MagicMock
Contributor


❓ Is there a particular order defined for imports? I can't see grouping by standard library vs. third-party groupings, so not sure if there is an expectation around this

Contributor Author


Fixed!

Contributor


I meant for blocks of imports, rather than imports within a single line. It's okay if not, but wanted to clarify what I meant

Contributor Author

@vtaskow vtaskow Aug 16, 2023


Right, so we use a tool called black that formats the files and decides whether an import should be a single line or a multi-line block depending on the configured line length; if a line is too long, black splits it into a multi-line block. In this case, after running black locally, I can confirm that a single line is correct. The tests also check formatting with black, and it confirmed this 👍

Contributor Author


Also, black is configured in pyproject.toml in the root folder, under the [tool.black] section. In this case, the line length is not set, so the default of 88 characters is used.

Contributor

@adriangonz adriangonz left a comment


Nice one @vtaskow ! Thanks for making that change - looks great from my side.

@@ -55,6 +55,17 @@ MLSERVER_MODEL_HUGGINGFACE_OPTIMUM_MODEL=true
```
````

### Loading models
Contributor


👍 This is a clear, concise section - I like it!

],
)
@patch("mlserver_huggingface.common._get_pipeline_class")
def test_pipeline_was_initialised_when_pretrained_model_is_not_supplied(
Contributor


😕 I'm confused by these tests for two reasons. While the actual logic makes sense, the test names are verbose (what does "was initialised" mean in the context of the tests?) and the test logic is identical between the two tests bar the value for pretrained_model as far as I can see.

Is this not clearer as a single, tabulated test?

Contributor Author


Good point about the test logic; I've refactored it to be more concise. About the test name: by "pipeline is initialised" I mean that when the pipeline(name, model, ...) constructor is called, the correct model param is passed.

Contributor

@agrski agrski left a comment


Still a couple of questions about structure/format but, as before, functionally this looks simple and correct

@vtaskow vtaskow merged commit 50b51fe into SeldonIO:master Aug 18, 2023
26 checks passed