feat(model): Support llama.cpp server deploy #2263
Merged
Description
Support deploying models with the llama.cpp server. The llama.cpp server exposes an OpenAI-compatible `/v1/completions` API.
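For reference, a minimal call against a llama.cpp server's completions endpoint might look like the following sketch (host, port, and model name are illustrative and depend on how the server is started):

```bash
# Illustrative request; 8080 is the llama.cpp server's default port,
# adjust host/port/model to match your deployment
curl http://127.0.0.1:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-0.5b-instruct", "prompt": "Hello, my name is", "max_tokens": 32}'
```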
How Has This Been Tested?
Install dependencies
```bash
pip install -e ".[llama_cpp_server]"
```
If you want to accelerate the inference speed and you have a GPU, you can install the dependencies with GPU support enabled.
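A sketch of one way to do this, assuming the `llama_cpp_server` extra builds llama.cpp from source and honors the standard CMake flags (the exact flag name differs between llama.cpp versions):

```bash
# Rebuild the llama.cpp backend with CUDA enabled (assumption: the
# llama_cpp_server extra honors CMAKE_ARGS; older llama.cpp builds
# use -DLLAMA_CUBLAS=on instead of -DGGML_CUDA=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install -e ".[llama_cpp_server]"
```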
Download the model
Here, we use the qwen2.5-0.5b-instruct model as an example. You can download the model from Hugging Face:

```bash
wget "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf?download=true" -O /tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf
```
Modify configuration file
In the `.env` configuration file, modify the inference type of the model to start `llama.cpp` inference.
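A sketch of what the relevant entries might look like; the key names below are illustrative, so check the project's `.env.template` for the exact variables used by the llama.cpp server backend:

```bash
# Illustrative .env entries; exact key names come from .env.template
LLM_MODEL=qwen2.5-0.5b-instruct
LLM_MODEL_PATH=/tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf
MODEL_TYPE=llama_cpp_server
```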
Start the DB-GPT server
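With the model and configuration in place, the webserver can be started with the project's CLI (a sketch, assuming the `dbgpt` command is available after the editable install and using its default web port):

```bash
# Start the DB-GPT webserver (5670 is the default web port)
dbgpt start webserver --port 5670
```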
Snapshots:
Include snapshots for easier review.
Checklist: