
MAIRD

Maid for AI Research and Development

We use

  • vLLM to serve text generation models.
  • Infinity (infinity_emb) to serve text embedding and reranking models.
  • LiteLLM as a proxy in front of all of the above.

These servers are compatible with the OpenAI, Cohere, and JinaAI APIs.
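
For orientation, here is a minimal sketch of what a LiteLLM proxy config routing to both backends might look like. The service names, ports, and model names below are illustrative assumptions, not the repo's actual values; check the config files in this repo for the real ones.

model_list:
  - model_name: local-chat
    litellm_params:
      model: openai/your_model_name
      api_base: http://vllm:8000/v1   # assumed vLLM service name and port
      api_key: none
  - model_name: local-embedding
    litellm_params:
      model: openai/your_embedding_model
      api_base: http://infinity:7997  # assumed Infinity service name and port
      api_key: none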

Set up environment variables

Create a file vllm/.env at the repository root:

HUGGING_FACE_HUB_TOKEN=<your_api_key>
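
Compose can then load this file for the relevant service via env_file; a sketch, assuming the service is named vllm:

services:
  vllm:
    env_file:
      - vllm/.env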

Launch it

You need Docker, the NVIDIA runtime for Docker (NVIDIA Container Toolkit), and Docker Compose. Setup varies across operating systems, so please consult their respective installation guides.

Because downloading model weights happens only once and can take a very long time, pre-download them into the shared cache before bringing the stack up:

HF_HOME=${PWD}/.cache/huggingface huggingface-cli download your_model_name 
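
For example (the model names below are placeholders; use the ones referenced in your compose and config files):

HF_HOME=${PWD}/.cache/huggingface huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
HF_HOME=${PWD}/.cache/huggingface huggingface-cli download BAAI/bge-m3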

Read compose.yaml and the other config files before running:

docker compose up
docker compose down --rmi local
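
Once the stack is up, you can sanity-check the proxy; a sketch, assuming LiteLLM is exposed on its default port 4000:

curl http://localhost:4000/health/liveliness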

Try it

In your project folder, with a suitable Python environment, install the client libraries:

pip install openai==1.55.0 cohere==5.11.4

Read main.py before running it:

python main.py
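
A minimal sketch of what such a client script might contain, assuming the LiteLLM proxy is exposed on localhost:4000 and that models named local-chat, local-embedding, and local-rerank are registered in its config (all of these names are assumptions; match them to your actual proxy config):

from openai import OpenAI
import cohere

# Assumed proxy address and API key; adjust to match your compose setup.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="none")

# Chat completion against the vLLM-backed model.
resp = client.chat.completions.create(
    model="local-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

# Embeddings against the Infinity-backed model.
emb = client.embeddings.create(model="local-embedding", input=["Hello!"])
print(len(emb.data[0].embedding))

# Reranking through the Cohere-compatible endpoint of the proxy.
co = cohere.Client(api_key="none", base_url="http://localhost:4000")
ranked = co.rerank(
    model="local-rerank",
    query="What serves embeddings?",
    documents=["vLLM serves text models.", "Infinity serves embeddings."],
)
print(ranked.results[0].index)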
