Goal: Make a text-generation server that is compatible with the OpenAI API Reference so it can plug in readily to applications that use that interface.
Install the requirements:
pip install -r requirements.txt
Set up the server:
python examples/openai-server/server.py --model zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-08-07 17:18:32 __main__ INFO args: Namespace(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', max_model_len=512, prompt_sequence_length=16, host='localhost', port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name=None)
2023-08-07 17:18:32 deepsparse.transformers WARNING The neuralmagic fork of transformers may not be installed. It can be installed via `pip install nm_transformers`
Using pad_token, but it is not set yet.
2023-08-07 17:18:34 deepsparse.transformers.engines.nl_decoder_engine INFO Overwriting in-place the input shapes of the transformer model at /home/mgoin/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-base/model.onnx
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0 COMMUNITY | (98238edf) (optimized) (system=avx512_vnni, binary=avx512)
2023-08-07 17:18:48 deepsparse.transformers.engines.nl_decoder_engine INFO Overwriting in-place the input shapes of the transformer model at /home/mgoin/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-base/model.onnx
INFO: Started server process [314509]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
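In the log above, about 16 seconds pass between the first and last timestamped messages while the model loads, so a script that launches the server may want to wait before sending requests. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_server` and its default timings are illustrative and not part of the example server, and a listening port is only a proxy for readiness.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the host is unreachable).
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the defaults shown in the log, `wait_for_server("localhost", 8000)` would return `True` once Uvicorn is accepting connections.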
Query the server for the available models using the Models API:
curl http://localhost:8000/v1/models
{"object":"list","data":[{"id":"zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none","object":"model","created":1691444523,"owned_by":"neuralmagic","root":"zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none","parent":null,"permission":[{"id":"modelperm-d0d9f0bb6a5c48458848e6b9a8cb8aca","object":"model_permission","created":1691444523,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
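The same query can be made from Python without extra dependencies. This is a rough sketch that assumes the response has the shape shown above (an object with a `data` list whose entries carry an `id`). The function names `extract_model_ids` and `list_model_ids` are made up for illustration.

```python
import json
import urllib.request
from typing import List


def extract_model_ids(payload: dict) -> List[str]:
    """Return the "id" of every entry in a Models API list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = "http://localhost:8000/v1") -> List[str]:
    """GET {base_url}/models and return the ids of the served models."""
    with urllib.request.urlopen(f"{base_url}/models") as response:
        return extract_model_ids(json.load(response))
```

With the server from this example running, `list_model_ids()` should return a one-element list holding the `zoo:` stub passed to `--model`.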
Then you can hit the Completions API with a curl command and see the output:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none",
"prompt": "def fib():",
"max_tokens": 30
}'
{"id":"cmpl-4d7c32ea65e14468bbe93c63d1687ba9","object":"text_completion","created":1693451394,"model":"zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none","choices":[{"index":0,"text":"\n a, b = 0, 1\n while True:\n yield a\n a, b = b, a + b","logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":2,"total_tokens":4,"completion_tokens":2}}
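For reference, the curl call above maps onto a plain HTTP POST with a JSON body. The sketch below builds that request with `urllib.request` and reads the first choice from the reply. The helper names `build_completion_request` and `complete` are hypothetical, and the response layout is assumed to match the output shown above.

```python
import json
import urllib.request


def build_completion_request(model, prompt, max_tokens=30, stream=False,
                             base_url="http://localhost:8000/v1"):
    """Build the POST request that the curl command above sends."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if stream:
        body["stream"] = True
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model, prompt, max_tokens=30):
    """Send a non-streaming completion request and return the generated text."""
    request = build_completion_request(model, prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The text of the first (and here only) choice, as in the output above.
    return payload["choices"][0]["text"]
```

Keeping request construction separate from sending makes the body easy to inspect without a running server.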
Streaming output is also available; enable it by setting "stream": true in the request:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none",
"prompt": "def fib():",
"max_tokens": 30,
"stream": true
}'
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "def fib():\n", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": " ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "a, ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "b ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "= ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "0, ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "1\n", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": " ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "while ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "True:\n", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": " ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "yield ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "a\n", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": " ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "a, ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "b ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "= ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "b, ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "a ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "+ ", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "b", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-14fcb54b0716430bb4f155ffd8882c8f", "object": "text_completion", "created": 1693451416, "model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "stop"}]}
data: [DONE]
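Each streamed line above is a `data:` field carrying one JSON chunk, and the stream ends with `data: [DONE]`. A client can rebuild the full text by concatenating the `text` of the first choice in each chunk. The sketch below assumes exactly that framing. The function name `collect_stream_text` is illustrative.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate choices[0]["text"] from 'data: ...' lines until '[DONE]'."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker, nothing after it is read
        chunk = json.loads(data)
        pieces.append(chunk["choices"][0]["text"])
    return "".join(pieces)
```

When reading from `urllib.request.urlopen`, the response can be passed as `collect_stream_text(line.decode("utf-8") for line in response)`. In the output above the first chunk echoes the prompt, so the joined text begins with `def fib():`.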
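The server can also be queried with the openai Python package. The example below uses the pre-1.0 interface of that package (openai.api_base, openai.Completion.create):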
import openai
# Modify OpenAI's API values to use the DeepSparse API server.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"
# List models API
models = openai.Model.list()
print("Models:", models)
model = models["data"][0]["id"]
# Completion API
stream = False
completion = openai.Completion.create(
    model=model, prompt="def fib():", stream=stream, max_tokens=30
)
print("Completion results:")
if stream:
    text = ""
    for c in completion:
        print(c)
        text += c["choices"][0]["text"]
    print(text)
else:
    print(completion)
Output:
Models: {
"object": "list",
"data": [
{
"id": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none",
"object": "model",
"created": 1693451467,
"owned_by": "neuralmagic",
"root": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none",
"parent": null,
"permission": [
{
"id": "modelperm-611e8298e6974b389e2da6e93b7b576b",
"object": "model_permission",
"created": 1693451467,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
]
}
Completion results:
{
"id": "cmpl-caca545954ad4c169b607e36f6a967e4",
"object": "text_completion",
"created": 1693451467,
"model": "zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none",
"choices": [
{
"index": 0,
"text": "\n a, b = 0, 1\n while True:\n yield a\n a, b = b, a + b",
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 2,
"total_tokens": 4,
"completion_tokens": 2
}
}