
Commit 509fffd

simple-inference (#47)
1 parent 882f190 commit 509fffd

File tree

6 files changed: +285 -0 lines changed


python/llamacpp/README.md

Lines changed: 91 additions & 0 deletions
# Python Llama_cpp Function (HTTP)

Welcome to your Llama-cpp Function, which integrates a basic client-side setup
of the [Llama-cpp library](https://github.com/abetlen/llama-cpp-python). The Function accepts JSON input, processes it
through a local LLM, and returns the generated response.

The Function itself uses the ASGI protocol.

## Deployment

> [!NOTE]
> We recommend using the host builder.

```bash
# Run the function locally
func run --builder=host

# Deploy to the cluster
func deploy --builder=host
```

## How to use the API

The Function accepts POST requests with JSON data. You can create a request like
this:
```bash
curl localhost:8080 -d '{"input":"The largest mountain in the world is"}'
```

GET requests return the string 'OK' for a quick health check.

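If you prefer calling the endpoint from Python, a minimal client sketch using `httpx` (already listed in `pyproject.toml`) could look like the following; the file name is illustrative and the address assumes the local default used by `func run`:

```python
# client_example.py -- illustrative only, not part of the template
import httpx

# Address assumes the Function is running locally via `func run --builder=host`
url = "http://localhost:8080"

response = httpx.post(url, json={"input": "The largest mountain in the world is"})
print(response.status_code)  # 200 on success
print(response.text)         # text generated by the local LLM
```
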
## Customization

- The Function uses the ASGI protocol and is compatible with the
  `handle(scope, receive, send)` signature.
- You can use a local model (e.g. one passed in via a base image -- see the
  Dockerfile example in the Dependencies section below) by switching the
  `Llama()` call in the `handle()` function for the commented-out code, as shown
  in the sketch after this list. You will need to provide a path to the model via
  the `model_path` argument instead of a `repo_id` and `filename`.
- As usual, the Function implements readiness and liveness checks as well as
  start and stop hooks, implemented via methods of the same names. These can be
  found at the bottom of the Function class, with more detailed information in
  the comments.

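For illustration, here is a minimal sketch of the local-model variant described above; the model path is hypothetical and should point at wherever your base image places the GGUF file:

```python
from llama_cpp import Llama

# Hypothetical path -- adjust to wherever the base image copies your model
llm = Llama(
    model_path="/path/to/model/in/container/granite-3b-code-base.Q4_K_M.gguf",
    n_ctx=1024,
)

output = llm("The largest mountain in the world is", max_tokens=32, echo=False)
print(output["choices"][0]["text"])
```
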
## Tests

Tests use the `pytest` framework with asyncio.

The Function tests can be found in the `tests` directory. It contains a simple
HTTP request test. This is where you can create your own tests for the desired
functionality.

```bash
# Install dependencies (if not done already)
pip install -e .

# Run the tests
pytest

# Run verbosely
pytest -v
```

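As an example of a test you might add, the sketch below POSTs an invalid JSON body through a fake ASGI `receive` callable and only checks that the Function responds. The test name and assertions are illustrative, and it assumes the error path in `handle()` replies without loading a model:

```python
import pytest
from function import new


@pytest.mark.asyncio
async def test_function_handle_invalid_json():
    f = new()
    sent = []

    # Fake ASGI receive: a single chunk whose body is not valid JSON
    async def receive():
        return {"type": "http.request", "body": b"not json", "more_body": False}

    # Fake ASGI send: record everything the Function sends back
    async def send(message):
        sent.append(message)

    await f.handle({"method": "POST"}, receive, send)

    # The Function should have answered with a response start and a body
    assert any(m.get("type") == "http.response.start" for m in sent)
    assert any(m.get("type") == "http.response.body" for m in sent)
```
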
## Dependencies

All dependencies can be found in the `pyproject.toml` file. Any additional
dependencies (e.g. a model when running locally) can also be provided via the
base image mentioned above. You can create a Dockerfile like so:

```Dockerfile
FROM python:3.13-slim
## RUN any bash commands for pip install etc.
COPY /path/to/model/on/host/machine /path/to/model/in/container
```

You build this image, for example using podman, and then pass it to the
Function when building via the `--base-image` flag:
```bash
# Build the base image
podman build -f Dockerfile -t my-base-image .

# Use the base image when building the Function image
func build --base-image=localhost/my-base-image --builder=host

# Or deploy immediately (builds internally)
func deploy --base-image=localhost/my-base-image --builder=host
```

This makes the model accessible to the Function.

For more, see [the complete documentation](https://github.com/knative/func/tree/main/docs).
python/llamacpp/function/__init__.py

Lines changed: 1 addition & 0 deletions

```python
from .func import new
```

python/llamacpp/function/func.py

Lines changed: 127 additions & 0 deletions
```python
# Function
import json
import logging

from llama_cpp import Llama


def new():
    """ New is the only method that must be implemented by a Function.
    The instance returned can be of any name.
    """
    return Function()


class Function:
    def __init__(self):
        """ The init method is an optional method where initialization can be
        performed. See the start method for a startup hook which includes
        configuration.
        """

    async def sender(self, send, obj):
        # echo the obj to the calling client
        await send({
            'type': 'http.response.start',
            'status': 200,
            'headers': [
                [b'content-type', b'text/plain'],
            ],
        })
        await send({
            'type': 'http.response.body',
            'body': obj.encode(),
        })

    async def handle(self, scope, receive, send):
        """
        Accepts data in the form of JSON with the key "input", which should
        contain the input string for the LLM:
        {
            "input": "this is passed to the LLM"
        }
        ex: curl localhost:8080 -d '{"input":"The largest mountain in the world is"}'
        """
        if scope["method"] == "GET":
            await self.sender(send, "OK")
            return

        prompt = ""

        # fetch all of the body from the request
        body = b''
        more_body = True
        while more_body:
            message = await receive()
            body += message.get('body', b'')
            more_body = message.get('more_body', False)

        # decode json
        try:
            data = json.loads(body.decode('utf-8'))
            prompt = data['input']
        except json.JSONDecodeError:
            await self.sender(send, "Invalid JSON")
            return
        except KeyError:
            await self.sender(send, "invalid key, expected 'input'")
            return

        if prompt == "":
            await self.sender(send, "OK")
            return

        # Pull the model from the Hugging Face Hub
        llm = Llama.from_pretrained(
            repo_id="ibm-granite/granite-3b-code-base-2k-GGUF",
            filename="granite-3b-code-base.Q4_K_M.gguf",
            n_ctx=1024,
        )

        ## Use a local model instead
        # llm = Llama(
        #     model_path="/granite-7b-lab-Q4_K_M.gguf/snapshots/sha256-6adeaad8c048b35ea54562c55e454cc32c63118a32c7b8152cf706b290611487/granite-7b-lab-Q4_K_M.gguf",
        #     n_ctx=1024,
        # )

        output = llm(
            prompt,
            max_tokens=32,
            ## Stop generating just before "Q:"; doesn't work well with small models.
            ## Some models are more tuned to the Q: ... A: ... "chat" format --
            ## you would literally type that in your input as: f'Q: {input}. A:'
            # stop=["Q:", "\n"],
            echo=False,
        )
        # logging.info("------------")
        # logging.info(output['choices'][0]['text'])
        await self.sender(send, output['choices'][0]['text'])

    def start(self, cfg):
        """ start is an optional method which is called when a new Function
        instance is started, such as when scaling up or during an update.
        Provided is a dictionary containing all environmental configuration.
        Args:
            cfg (Dict[str, str]): A dictionary containing environmental config.
                In most cases this will be a copy of os.environ, but it is
                best practice to use this cfg dict instead of os.environ.
        """
        logging.info("Function starting")

    def stop(self):
        """ stop is an optional method which is called when a function is
        stopped, such as when scaled down, updated, or manually canceled. Stop
        can block while performing function shutdown/cleanup operations. The
        process will eventually be killed if this method blocks beyond the
        platform's configured maximum shutdown timeout.
        """
        logging.info("Function stopping")

    def alive(self):
        """ alive is an optional method for performing a deep check on your
        Function's liveness. If removed, the system will assume the function
        is alive if the process is running. This is exposed by default at the
        path /health/liveness. The optional string return is a message.
        """
        return True, "Alive"

    def ready(self):
        """ ready is an optional method for performing a deep check on your
        Function's readiness. If removed, the system will assume the function
        is ready if the process is running. This is exposed by default at the
        path /health/readiness.
        """
        return True, "Ready"
```
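
To see how the body-reading loop in `handle()` consumes a chunked ASGI request, here is a small standalone sketch with a fake `receive` that yields the body in two parts; the chunk contents are made up for illustration and this is not part of the template:

```python
import asyncio
import json

# Fake ASGI receive: yields the request body in two chunks, as a server might
chunks = [
    {"type": "http.request", "body": b'{"input": "The largest ', "more_body": True},
    {"type": "http.request", "body": b'mountain in the world is"}', "more_body": False},
]


async def receive():
    return chunks.pop(0)


async def read_body():
    # Same pattern as handle(): keep reading until more_body is False
    body = b''
    more_body = True
    while more_body:
        message = await receive()
        body += message.get('body', b'')
        more_body = message.get('more_body', False)
    return json.loads(body.decode('utf-8'))


print(asyncio.run(read_body()))  # {'input': 'The largest mountain in the world is'}
```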

python/llamacpp/manifest.yaml

Lines changed: 2 additions & 0 deletions
```yaml
build:
  base-image: quay.io/dfridric/custom_llamacpp_base
```

python/llamacpp/pyproject.toml

Lines changed: 26 additions & 0 deletions
```toml
[project]
name = "function"
description = ""
version = "0.1.0"
requires-python = ">=3.9"
readme = "README.md"
license = "MIT"
dependencies = [
    "httpx",
    "pytest",
    "pytest-asyncio",
    "llama-cpp-python",
    "huggingface-hub"
]
authors = [
    { name="Your Name", email="you@example.com"},
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
asyncio_mode = "strict"
asyncio_default_fixture_loop_scope = "function"
```

python/llamacpp/tests/test_func.py

Lines changed: 38 additions & 0 deletions
```python
"""
An example set of unit tests which confirm that the main handler (the
callable function) returns 200 OK for a simple HTTP GET.
"""
import pytest
from function import new


@pytest.mark.asyncio
async def test_function_handle():
    f = new()  # Instantiate the Function to test

    sent_ok = False
    sent_headers = False
    sent_body = False

    # Mock send
    async def send(message):
        nonlocal sent_ok
        nonlocal sent_headers
        nonlocal sent_body

        if message.get('status') == 200:
            sent_ok = True

        if message.get('type') == 'http.response.start':
            sent_headers = True

        if message.get('type') == 'http.response.body':
            sent_body = True

    # Invoke the Function with a minimal GET scope; receive is unused for GET
    await f.handle({"method": "GET"}, None, send)

    # Assert send was called
    assert sent_ok, "Function did not send a 200 OK"
    assert sent_headers, "Function did not send headers"
    assert sent_body, "Function did not send a body"
```
