
feat(huggingface): Load local artefacts in HuggingFace runtime #1319

Merged

Conversation

Contributor

@vtaskow vtaskow commented Jul 28, 2023

Why

Currently, loading local model artefacts in the HuggingFace runtime is inconvenient. When users want to serve a custom model, or want to avoid downloading a model while MLServer is running, there should be a convenient way to point the runtime at a local folder of model artefacts.
It is technically possible to specify such a location today via parameters.extra.pretrained_model in the model-settings.json config. However, that field is intended for ready-made models from the HuggingFace hub.

What and How

This PR provides a new way for a user to serve a local HuggingFace model, without having to download one from the HuggingFace hub or being limited to the models available there. Essentially, if parameters.extra.pretrained_model is not specified in model-settings.json, then parameters.uri is used to locate and load a custom model that has already been downloaded by rclone. Furthermore, if parameters.uri is not set, it is automatically filled in by the HuggingFace runtime to point to MLServer's default model download location.
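The resolution order described above can be sketched as follows (a minimal illustration of the fallback logic; the function and constant names here are simplified stand-ins, not the actual MLServer implementation):

```python
from typing import Optional

# Illustrative default: where rclone places artefacts in Seldon Core.
DEFAULT_MODEL_URI = "/mnt/models"

def resolve_model_artefacts(pretrained_model: Optional[str],
                            uri: Optional[str]) -> str:
    """Pick the location the runtime should load the model from.

    1. parameters.extra.pretrained_model wins if set (hub name or local path).
    2. Otherwise fall back to parameters.uri (local artefacts from rclone).
    3. If neither is set, use the default model path.
    """
    if pretrained_model:
        return pretrained_model
    if uri:
        return uri
    return DEFAULT_MODEL_URI

print(resolve_model_artefacts("distilgpt2", "/mnt/models"))  # hub model wins
print(resolve_model_artefacts(None, "/mnt/models"))          # falls back to uri
```

In the real runtime, step 3 is handled by the model registry, which fills in parameters.uri before the runtime sees the settings.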

Testing in Seldon Core v1

Background

PREDICTIVE_UNIT_PARAMETERS is an environment variable containing a list of parameters describing a HuggingFace model. It is the predecessor of model-settings.json. In SCv1, if this variable is not empty, it is used instead of model-settings.json (even if one is supplied).
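For context, PREDICTIVE_UNIT_PARAMETERS in SCv1 holds a JSON list of name/value/type entries. A hedged sketch of flattening it into a plain dict (the exact schema and parsing may differ from SCv1's actual code):

```python
import json
import os

def parse_predictive_unit_parameters(raw: str) -> dict:
    """Flatten an SCv1-style list of {"name", "value", "type"} entries
    into a simple {name: value} mapping."""
    return {p["name"]: p["value"] for p in json.loads(raw)}

# Example value as it might be set on the model container:
os.environ["PREDICTIVE_UNIT_PARAMETERS"] = (
    '[{"name": "task", "value": "text-generation", "type": "STRING"}]'
)
params = parse_predictive_unit_parameters(os.environ["PREDICTIVE_UNIT_PARAMETERS"])
print(params)  # {'task': 'text-generation'}
```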

PREDICTIVE_UNIT_PARAMETERS is empty AND model-settings.json is missing

  • provided gs path to HF model artefacts folder not containing model-settings.json; no UI params set
    Result: raises a MissingHuggingFaceSettings exception because neither predictive params nor model-settings.json are set, even though MLSERVER_MODEL_URI is auto set to /mnt/models

PREDICTIVE_UNIT_PARAMETERS is not empty AND model-settings.json is missing

  • PREDICTIVE_UNIT_PARAMETERS contained: task = text-generation; MLSERVER_MODEL_URI is auto set to /mnt/models
    Result: able to initialise HF settings; params has task = text-generation; HF settings -> pretrained_model is not set but falls back to MLSERVER_MODEL_URI (this is the change made in this PR). Able to predict ✔️

  • PREDICTIVE_UNIT_PARAMETERS contained: task = text-generation, pretrained_model = /mnt/models (the same value MLSERVER_MODEL_URI is auto set to)
    Result: able to initialise HF settings; params has task = text-generation and pretrained_model = /mnt/models, which is used as the location of the custom model. Able to predict ✔️

PREDICTIVE_UNIT_PARAMETERS is empty and using model-settings.json

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings; pretrained model is not used because it's empty, and the runtime correctly falls back to MLSERVER_MODEL_URI, which points to /mnt/models where rclone has downloaded the model artefact folder. ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "pretrained_model": "/mnt/models",
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings, pretrained model IS used and correctly points to /mnt/models where rclone has downloaded the model artefact folder ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

  • MLSERVER_MODEL_URI is auto set to /mnt/models and model-settings.json contained:
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "uri": "/mnt/models",
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: able to initialise HF settings, uri IS used and correctly points to /mnt/models where rclone has downloaded the model artefact folder ✔️
Important to note: no models were downloaded on demand from the HuggingFace hub.

Testing in Seldon Core v2

model-settings.json is missing

Message: ModelFailed
Reason: Can't create model-settings from requirements [huggingface]
Result is expected ✔️

model-settings.json doesn't have parameters.extra

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {}
  }
}

mlserver_huggingface.errors.MissingHuggingFaceSettings: Missing HuggingFace Runtime settings.
Result is expected ✔️

model-settings.json doesn't have parameters.extra but has parameters.uri

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "uri": "/mnt/models",
    "extra": {}
  }
}

mlserver_huggingface.errors.MissingHuggingFaceSettings: Missing HuggingFace Runtime settings.
Result is expected ✔️

model-settings.json has only task field in parameters.extra

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}

Result: It correctly uses the downloaded model in /mnt/agent/models because, even though neither parameters.uri nor parameters.extra.pretrained_model is set, the runtime model registry fills in parameters.uri by default to point at the model-settings.json path. Able to predict correctly ✔️
This works because of the fallback to parameters.uri when parameters.extra.pretrained_model is not set.

model-settings.json has task and pretrained_model fields in parameters.extra but pretrained_model points to a HuggingFace hub model

{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "pretrained_model": "distilgpt2",
      "task": "text-generation"
    }
  }
}

Result: The specified pretrained model is correctly downloaded from the HuggingFace hub. Able to predict correctly ✔️

Further testing with a model artefacts folder created via transformers' Pipeline.save_pretrained()

{
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "text-generation"
    }
  }
}
  • Works for SCv1 with model-settings.json
  • Works for SCv1 without model-settings.json but with PREDICTIVE_UNIT_PARAMETERS
  • Works for SCv1 with both model-settings.json and PREDICTIVE_UNIT_PARAMETERS
  • Works for SCv2 with model-settings.json

Resolves #1076

@vtaskow vtaskow changed the title feat(huggingface): Use HF model params uri to specify a local path to the model artefacts feat(huggingface): Load local artefacts in HuggingFace runtime Jul 28, 2023
@vtaskow vtaskow self-assigned this Jul 28, 2023
@vtaskow vtaskow added the enhancement New feature or request label Jul 28, 2023
@vtaskow vtaskow marked this pull request as ready for review July 28, 2023 16:09
Contributor

@adriangonz adriangonz left a comment


Nice one! Changes look good from my side - this should be good to go once tests are green. 🚀

Contributor

@agrski agrski left a comment


A few suggestions to improve clarity and UX, but nothing major that I can see

Comment on lines 33 to 34
if not model:
if settings.parameters is not None:
Contributor


❓ What happens if neither the pretrained_model nor the parameters are specified? Previously, pretrained_model was mandatory and, presumably, the code would blow up if it wasn't there. Now we're saying both are optional, but I can't see clear error handling here to inform the user that something needs to be provided.

Contributor


As a UX consideration, might we also want to clarify the decision being made here? That is to say that, rather than using a default and only checking something else if not provided, we instead check if both are provided and explain that one overrides the other, or that the one hasn't been provided so we're using the other instead.

That's slightly more code, but shouldn't add appreciable time cost to loading models and should make it easier for users to understand why things are happening the way they are.

Contributor Author


In the context of the HuggingFace runtime, settings.parameters is never empty; otherwise the get_huggingface_settings function raises an exception. Also, settings.parameters.uri is always present (if it's not set in model-settings.json, it is set to the default model path by the registry load method).
Let me know if that makes sense to you, if not I am open to change 👍

Contributor


Thanks for clarifying. Makes sense that we need certain parameters and validate this first. The links are very helpful, thanks!

Contributor


One follow-up question: if settings.parameters is always provided, there should be no need to check if settings.parameters is not None, right?

Contributor Author


Yes, correct. I can remove it to avoid confusion 👍

Contributor Author


And to further support the claim that settings.parameters is always provided:
if no model-settings.json files are found on the current path, there is a fallback that creates model settings from the environment, in which case the model params are also initialised and therefore not empty.

@@ -1,6 +1,8 @@
from unittest.mock import patch, MagicMock
Contributor


❓ Is there a particular order defined for imports? I can't see grouping by standard library vs. third-party groupings, so not sure if there is an expectation around this

Contributor Author


Fixed!

Contributor


I meant for blocks of imports, rather than imports within a single line. It's okay if not, but wanted to clarify what I meant

Contributor Author

@vtaskow vtaskow Aug 16, 2023


Right, so we use a tool called black that formats the files and decides whether an import should be a single line or a multi-line block depending on the configured line length; if a line is too long, black splits it into a multi-line block. In this case, after running black locally, I can confirm that a single line is correct. The tests also check formatting with black, and it confirmed this 👍

Contributor Author


Also, black is configured in pyproject.toml in the root folder, under the [tool.black] section. In this case, the line length is not set, so the default of 88 characters is used.

Contributor

@adriangonz adriangonz left a comment


Nice one @vtaskow ! Thanks for making that change - looks great from my side.

@@ -55,6 +55,17 @@ MLSERVER_MODEL_HUGGINGFACE_OPTIMUM_MODEL=true
```
````

### Loading models
Contributor


👍 This is a clear, concise section - I like it!

],
)
@patch("mlserver_huggingface.common._get_pipeline_class")
def test_pipeline_was_initialised_when_pretrained_model_is_not_supplied(
Contributor


😕 I'm confused by these tests for two reasons. While the actual logic makes sense, the test names are verbose (what does "was initialised" mean in the context of the tests?) and the test logic is identical between the two tests bar the value for pretrained_model as far as I can see.

Is this not clearer as a single, tabulated test?

Contributor Author


Good point about the test logic; I've refactored it to be more concise. About the test name: by "pipeline is initialised" I mean that when the pipeline(name, model, ...) constructor is called, the correct model param is passed.

Contributor

@agrski agrski left a comment


Still a couple of questions about structure/format but, as before, functionally this looks simple and correct

@vtaskow vtaskow merged commit 50b51fe into SeldonIO:master Aug 18, 2023
26 checks passed