Commit
Huggingface Transformer Runtime with Optimum Integration (#573)
axsaucedo authored May 10, 2022
1 parent 47ae828 commit 0ed4a40
Showing 13 changed files with 915 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/examples/huggingface/.gitignore
@@ -0,0 +1 @@
mlruns
261 changes: 261 additions & 0 deletions docs/examples/huggingface/README.ipynb
@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8c519626",
"metadata": {},
"source": [
"# Serving HuggingFace Transformer Models\n",
"\n",
"Out of the box, MLServer supports the deployment and serving of HuggingFace Transformer models with the following features:\n",
"\n",
"- Loading of Transformer Model artifacts from Hub.\n",
"- Supports Optimized models using Optimum library\n",
"\n",
"In this example, we will showcase some of this features using an example model."
]
},
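{
"cell_type": "markdown",
"id": "a3e59c10",
"metadata": {},
"source": [
"Before starting, we need MLServer itself plus the HuggingFace runtime installed. A minimal sketch, assuming the runtime introduced by this commit is published under the `mlserver-huggingface` package name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d20f41",
"metadata": {},
"outputs": [],
"source": [
"# Install MLServer together with the HuggingFace runtime\n",
"# (package name assumed from this commit's runtime module)\n",
"%pip install mlserver mlserver-huggingface"
]
},
{
"cell_type": "markdown",
"id": "c91a8e22",
"metadata": {},
"source": [
"The next cell registers a small helper magic that writes a templated cell out to disk; we will use it below to generate our `model-settings.json` config files."
]
},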
{
"cell_type": "code",
"execution_count": 2,
"id": "655ea442",
"metadata": {},
"outputs": [],
"source": [
"from IPython.core.magic import register_line_cell_magic\n",
"\n",
"@register_line_cell_magic\n",
"def writetemplate(line, cell):\n",
" with open(line, 'w') as f:\n",
" f.write(cell.format(**globals()))"
]
},
{
"cell_type": "markdown",
"id": "f1fed35e",
"metadata": {},
"source": [
"## Serving\n",
"\n",
"Now that we have trained and serialised our model, we are ready to start serving it.\n",
"For that, the initial step will be to set up a `model-settings.json` that instructs MLServer to load our artifact using the HuggingFace Inference Runtime."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "6df62443",
"metadata": {},
"outputs": [],
"source": [
"%%writetemplate ./model-settings.json\n",
"{{\n",
" \"name\": \"gpt2-model\",\n",
" \"implementation\": \"mlserver_huggingface.HuggingFaceRuntime\",\n",
" \"parallel_workers\": 0,\n",
" \"parameters\": {{\n",
" \"extra\": {{\n",
" \"task\": \"text-generation\",\n",
" \"pretrained_model\": \"distilgpt2\"\n",
" }}\n",
" }}\n",
"}}"
]
},
{
"cell_type": "markdown",
"id": "c6a6e8b2",
"metadata": {},
"source": [
"Now that we have our config in-place, we can start the server by running `mlserver start .`. This needs to either be ran from the same directory where our config files are or pointing to the folder where they are.\n",
"\n",
"```shell\n",
"mlserver start .\n",
"```\n",
"\n",
"Since this command will start the server and block the terminal, waiting for requests, this will need to be ran in the background on a separate terminal."
]
},
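{
"cell_type": "markdown",
"id": "d4f06a33",
"metadata": {},
"source": [
"As a quick sanity check before sending any requests, we can poll the model readiness endpoint that MLServer exposes as part of the V2 inference protocol; it returns `200` once the model has finished loading:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e58b1c44",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# V2 inference protocol readiness probe for our model\n",
"requests.get(\"http://localhost:8080/v2/models/gpt2-model/ready\").status_code"
]
},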
{
"cell_type": "markdown",
"id": "b664c591",
"metadata": {},
"source": [
"### Send test inference request\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "759ad7df",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'model_name': 'gpt2-model',\n",
" 'model_version': None,\n",
" 'id': 'f65f1b35-840e-4872-83b8-270271501b6c',\n",
" 'parameters': None,\n",
" 'outputs': [{'name': 'huggingface',\n",
" 'shape': [1],\n",
" 'datatype': 'BYTES',\n",
" 'parameters': None,\n",
" 'data': ['[[{\"generated_text\": \"this is a test of this. Once it is finished, it will generate a clean and easy clean install page and the build file will be loaded in the terminal. I need to install the build files for this project.\\\\n\\\\nNow to have my\"}]]']}]}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import requests\n",
"\n",
"inference_request = {\n",
" \"inputs\": [\n",
" {\n",
" \"name\": \"huggingface\",\n",
" \"shape\": [1],\n",
" \"datatype\": \"BYTES\",\n",
" \"data\": [\"this is a test\"],\n",
" }\n",
" ]\n",
"}\n",
"\n",
"endpoint = \"http://localhost:8080/v2/models/gpt2-model/infer\"\n",
"response = requests.post(endpoint, json=inference_request)\n",
"\n",
"response.json()"
]
},
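{
"cell_type": "markdown",
"id": "f6c2d955",
"metadata": {},
"source": [
"The runtime returns the pipeline's output as a JSON-encoded string inside the output's `data` field. Based on the response shape shown above, we can decode it to recover the raw generated text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7d3e066",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# The single BYTES output holds a JSON-encoded list of pipeline results\n",
"raw = response.json()[\"outputs\"][0][\"data\"][0]\n",
"generations = json.loads(raw)\n",
"print(generations[0][0][\"generated_text\"])"
]
},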
{
"cell_type": "markdown",
"id": "0edbcf72",
"metadata": {},
"source": [
"### Using Optimum Optimized Models\n",
"\n",
"We can also leverage the Optimum library that allows us to access quantized and optimized models. \n",
"\n",
"We can download pretrained optimized models from the hub if available by enabling the `optimum_model` flag:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6d185281",
"metadata": {},
"outputs": [],
"source": [
"%%writetemplate ./model-settings.json\n",
"{{\n",
" \"name\": \"gpt2-model\",\n",
" \"implementation\": \"mlserver_huggingface.HuggingFaceRuntime\",\n",
" \"parallel_workers\": 0,\n",
" \"parameters\": {{\n",
" \"extra\": {{\n",
" \"task\": \"text-generation\",\n",
" \"pretrained_model\": \"distilgpt2\",\n",
" \"optimum_model\": true\n",
" }}\n",
" }}\n",
"}}"
]
},
{
"cell_type": "markdown",
"id": "90925ef0",
"metadata": {},
"source": [
"Once again, you are able to run the model using the MLServer CLI. As before this needs to either be ran from the same directory where our config files are or pointing to the folder where they are.\n",
"\n",
"```shell\n",
"mlserver start .\n",
"```"
]
},
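{
"cell_type": "markdown",
"id": "b8e4f177",
"metadata": {},
"source": [
"Once the server has restarted, we can confirm the model was reloaded by querying its metadata endpoint (also part of the V2 inference protocol):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9f50288",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# V2 model metadata: name, versions and input/output schema\n",
"requests.get(\"http://localhost:8080/v2/models/gpt2-model\").json()"
]
},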
{
"cell_type": "markdown",
"id": "6945db96",
"metadata": {},
"source": [
"### Send Test Request to Optimum Optimized Model"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "39d8b438",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'model_name': 'gpt2-model',\n",
" 'model_version': None,\n",
" 'id': 'fdce1497-6b14-4fc6-a414-d437ca7783c6',\n",
" 'parameters': None,\n",
" 'outputs': [{'name': 'huggingface',\n",
" 'shape': [1],\n",
" 'datatype': 'BYTES',\n",
" 'parameters': None,\n",
" 'data': ['[[{\"generated_text\": \"this is a test case where I get a message telling me not to let the system know.\\\\nThis post was created using the \\\\\"Help Coding Standard\\\\\" tool. You can find information about the standard version of the Coding Standard here.\"}]]']}]}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import requests\n",
"\n",
"inference_request = {\n",
" \"inputs\": [\n",
" {\n",
" \"name\": \"huggingface\",\n",
" \"shape\": [1],\n",
" \"datatype\": \"BYTES\",\n",
" \"data\": [\"this is a test\"],\n",
" }\n",
" ]\n",
"}\n",
"\n",
"endpoint = \"http://localhost:8080/v2/models/gpt2-model/infer\"\n",
"response = requests.post(endpoint, json=inference_request)\n",
"\n",
"response.json()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}