
Fix mlserver_huggingface settings device type #1486

Merged
merged 4 commits on Mar 11, 2024
Conversation

geodavic
Contributor

@geodavic geodavic commented Nov 15, 2023

If using a model-settings.json of the following form:

{
    "name": "my-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parameters": {
        "extra": {
            "task": "text-generation",
            "pretrained_model": "model/path",
            "model_kwargs": {
                "load_in_8bit": true
            }
        }
    }
}

an error occurs when spinning up the server:

The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.

because the load_in_8bit model kwarg forces the model to be loaded with Accelerate, while HuggingFaceSettings.device's default value of -1 is still passed to the pipeline. I tried passing "device": null in the model settings JSON to suppress the default, but this wasn't accepted because the schema types the field as int.

The solution is to widen the typing of HuggingFaceSettings.device to accept null (and strings, while we're at it). I have therefore updated the type hint to Optional[Union[int, str]], which closely matches the type hint used by the underlying transformers pipeline.
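The widened field can be sketched with a minimal Pydantic model. This is a hypothetical stand-in for illustration only; the real HuggingFaceSettings class lives in the mlserver_huggingface codebase and carries many more fields:

```python
from typing import Optional, Union

from pydantic import BaseModel


# Hypothetical stand-in for mlserver_huggingface's HuggingFaceSettings;
# only the `device` field is sketched here.
class HuggingFaceSettings(BaseModel):
    # Previously `device: int = -1`, which rejected null and string values.
    device: Optional[Union[int, str]] = None


# All of these now pass schema validation:
print(HuggingFaceSettings().device)              # None
print(HuggingFaceSettings(device=-1).device)     # -1
print(HuggingFaceSettings(device="cpu").device)  # cpu
```

With a None default, no device argument reaches the pipeline unless the user sets one, so Accelerate-managed models no longer conflict with it.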

I have tested the above model settings json with this change and the server starts as expected.

@geodavic geodavic changed the title Update mlserver_huggingface settings device type Fix mlserver_huggingface settings device type Nov 15, 2023
@nanbo-liu
Contributor

nanbo-liu commented Nov 15, 2023

@adriangonz, please take a look when you have a chance. It's a small change.

Contributor

@adriangonz adriangonz left a comment


Looks great @geodavic! 🚀

Would it make sense to add a small test to make sure this behaviour doesn't regress in the future? It could be as simple as loading a model with device set to None, -1, or "cpu", and asserting that the model loads correctly.

@geodavic
Contributor Author

@adriangonz Thanks! I added a test that passes None, -1 and cpu for the device to ensure they all load into cpu.
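The test described could look roughly like the sketch below. This is an assumption about its shape, not the actual test from the PR, and it uses a hypothetical stand-in settings model rather than loading a real HuggingFace model:

```python
from typing import Optional, Union

import pytest
from pydantic import BaseModel


class HuggingFaceSettings(BaseModel):
    """Hypothetical stand-in for the real mlserver_huggingface settings."""

    device: Optional[Union[int, str]] = None


@pytest.mark.parametrize("device", [None, -1, "cpu"])
def test_device_accepts_cpu_variants(device):
    # Each of these values should pass schema validation unchanged;
    # the real test additionally asserts the model loads onto the CPU.
    settings = HuggingFaceSettings(device=device)
    assert settings.device == device
```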

@geodavic
Contributor Author

@adriangonz :)

@geodavic
Contributor Author

@sakoush if you don't mind taking a look, much appreciated!

@jyono

jyono commented Jan 16, 2024

@adriangonz @sakoush

is there anything holding this back?

Member

@sakoush sakoush left a comment


LGTM. Many thanks for your contribution.

@sakoush sakoush merged commit e4ed13b into SeldonIO:master Mar 11, 2024
1 check passed