# Serving HuggingFace Transformer Models

Out of the box, MLServer supports the deployment and serving of HuggingFace Transformer models with the following features:

- Loading of Transformer model artifacts directly from the HuggingFace Hub.
- Support for optimized models via the Optimum library.

In this example, we will showcase some of these features using an example model.
First, we register a small helper magic that renders a templated cell and writes it out to a file:

```python
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    # Write the cell body to the file named on the magic line,
    # substituting any {placeholders} from the notebook's globals first.
    with open(line, "w") as f:
        f.write(cell.format(**globals()))
```
## Serving

Since we will be pulling a pretrained model straight from the HuggingFace Hub, there is no training or serialisation step on our side; we can jump straight to serving.
The first step will be to set up a `model-settings.json` that instructs MLServer to load our artifact using the HuggingFace Inference Runtime.
```python
%%writetemplate ./model-settings.json
{{
    "name": "gpt2-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parallel_workers": 0,
    "parameters": {{
        "extra": {{
            "task": "text-generation",
            "pretrained_model": "distilgpt2"
        }}
    }}
}}
```
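The doubled braces are needed because `writetemplate` passes the cell through Python's `str.format`; the file written to disk contains single braces. As a quick sanity check (assuming the cell above has been run), we can confirm the rendered file is valid JSON:

```python
import json

# Load the file the magic just wrote; str.format collapses the doubled
# braces, so this should parse cleanly as ordinary JSON.
with open("./model-settings.json") as f:
    settings = json.load(f)

print(settings["parameters"]["extra"]["pretrained_model"])  # distilgpt2
```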
Now that we have our config in place, we can start the server by running `mlserver start .`. This command has to be run either from the same directory where our config files live, or from a path that points to that folder.

```shell
mlserver start .
```

Since this command will start the server and block the terminal while it waits for requests, it will need to be run in the background or in a separate terminal.
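Before sending any inference requests, we can check that the server has finished loading the model. A minimal sketch, assuming MLServer's default HTTP port (8080) and the standard V2 readiness endpoint:

```python
import requests

# Poll the V2 readiness endpoint; it returns 200 once the server
# is up and the model has been loaded.
response = requests.get("http://localhost:8080/v2/health/ready")
print(response.status_code)
```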
### Send test inference request
```python
import requests

# Build a V2 inference request, passing the prompt as a BYTES tensor.
inference_request = {
    "inputs": [
        {
            "name": "huggingface",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["this is a test"],
        }
    ]
}

endpoint = "http://localhost:8080/v2/models/gpt2-model/infer"
response = requests.post(endpoint, json=inference_request)

response.json()
```

```
{'model_name': 'gpt2-model',
 'model_version': None,
 'id': 'f65f1b35-840e-4872-83b8-270271501b6c',
 'parameters': None,
 'outputs': [{'name': 'huggingface',
   'shape': [1],
   'datatype': 'BYTES',
   'parameters': None,
   'data': ['[[{"generated_text": "this is a test of this. Once it is finished, it will generate a clean and easy clean install page and the build file will be loaded in the terminal. I need to install the build files for this project.\\n\\nNow to have my"}]]']}]}
```
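Note that the HuggingFace runtime returns the pipeline's output serialised as a JSON string inside the BYTES tensor. A small sketch to decode the generated text out of the response above:

```python
import json

# The output tensor holds a JSON-encoded list of generation results,
# so we decode it before pulling out the text itself.
raw = response.json()["outputs"][0]["data"][0]
generations = json.loads(raw)
print(generations[0][0]["generated_text"])
```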
### Using Optimum Optimized Models

We can also leverage the Optimum library, which gives us access to quantized and optimized models.

We can download pretrained optimized models from the Hub, where available, by enabling the `optimum_model` flag:
```python
%%writetemplate ./model-settings.json
{{
    "name": "gpt2-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parallel_workers": 0,
    "parameters": {{
        "extra": {{
            "task": "text-generation",
            "pretrained_model": "distilgpt2",
            "optimum_model": true
        }}
    }}
}}
```
Once again, you can run the model using the MLServer CLI. As before, this command has to be run either from the same directory where our config files live, or from a path that points to that folder.

```shell
mlserver start .
```
### Send Test Request to Optimum Optimized Model
```python
import requests

# Same V2 inference request as before, now served by the
# Optimum-optimized model.
inference_request = {
    "inputs": [
        {
            "name": "huggingface",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["this is a test"],
        }
    ]
}

endpoint = "http://localhost:8080/v2/models/gpt2-model/infer"
response = requests.post(endpoint, json=inference_request)

response.json()
```

```
{'model_name': 'gpt2-model',
 'model_version': None,
 'id': 'fdce1497-6b14-4fc6-a414-d437ca7783c6',
 'parameters': None,
 'outputs': [{'name': 'huggingface',
   'shape': [1],
   'datatype': 'BYTES',
   'parameters': None,
   'data': ['[[{"generated_text": "this is a test case where I get a message telling me not to let the system know.\\nThis post was created using the \\"Help Coding Standard\\" tool. You can find information about the standard version of the Coding Standard here."}]]']}]}
```
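Since the point of the Optimum-backed configuration is performance, one rough way to compare the two setups is to time a handful of requests against each (a crude sketch only; a fair benchmark would need warm-up runs and many more samples):

```python
import time
import requests

def average_latency(endpoint, payload, n=5):
    # Average wall-clock latency over n sequential requests; crude,
    # but enough to spot large differences between configurations.
    start = time.perf_counter()
    for _ in range(n):
        requests.post(endpoint, json=payload).raise_for_status()
    return (time.perf_counter() - start) / n

print(f"avg latency: {average_latency(endpoint, inference_request):.3f}s")
```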