Description
Darwin Feedloops-Mac-Studio-2.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:31:00 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6020 arm64
command: python -m llama_cpp.server --model ./llava-v1.6-mistral-7b.Q8_0.gguf --port 9007 --host localhost --n_gpu_layers 33 --chat_format chatml --clip_model_path ./mmproj-mistral7b-f16.gguf
curl --location 'http://localhost:9007/v1/chat/completions' \
--header 'Authorization: Bearer 1n66q24dexb1cc8abc62b185dee0dd802pn92' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "hello"
        }
      ]
    }
  ],
  "max_tokens": 1000,
  "temperature": 0
}'
INFO: Started server process [71075]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:9007 (Press CTRL+C to quit)
llama_print_timings: load time = 1491.98 ms
llama_print_timings: sample time = 2.17 ms / 26 runs ( 0.08 ms per token, 12009.24 tokens per second)
llama_print_timings: prompt eval time = 1491.90 ms / 37 tokens ( 40.32 ms per token, 24.80 tokens per second)
llama_print_timings: eval time = 66226.55 ms / 25 runs ( 2649.06 ms per token, 0.38 tokens per second)
llama_print_timings: total time = 67791.77 ms / 62 tokens
INFO: ::1:55485 - "POST /v1/chat/completions HTTP/1.1" 200 OK
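For context, the throughput figures in the timing lines above check out arithmetically, and they show the anomaly clearly: decode (eval) runs roughly 65x slower than prompt eval, which is backwards for a model fully offloaded to the GPU. Numbers below are copied from the log:

```python
# Values taken from the llama_print_timings lines above
eval_ms, eval_runs = 66226.55, 25        # "eval time" line
prompt_ms, prompt_tokens = 1491.90, 37   # "prompt eval time" line

eval_tps = eval_runs / (eval_ms / 1000)          # tokens/second during generation
prompt_tps = prompt_tokens / (prompt_ms / 1000)  # tokens/second during prompt eval

print(f"eval: {eval_tps:.2f} t/s, prompt eval: {prompt_tps:.2f} t/s")
```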
The request succeeds (HTTP 200), but generation is extremely slow: eval time is ~2649 ms per token (0.38 tokens/second), while prompt eval runs at ~25 tokens/second, even with --n_gpu_layers 33. Can someone help? Thanks.