# Maid for AI Research and Development
We use:

- vllm to serve text models.
- infinity emb to serve text embedding and rerank models.
- litellm as a proxy in front of all of the above.
These servers are compatible with the OpenAI, Cohere, and JinaAI APIs.
Create a file at the repository root, `vllm/.env`, containing:

```
HUGGING_FACE_HUB_TOKEN=<your_api_key>
```
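The `.env` file is plain `KEY=VALUE` lines, which docker compose loads into the container environment. As an illustrative sketch of that format (this parser is for demonstration only, not what docker compose actually uses):

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore comments and empty lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# The same shape as vllm/.env
example = "HUGGING_FACE_HUB_TOKEN=<your_api_key>"
print(parse_dotenv(example))
```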
You need Docker, the NVIDIA runtime for Docker (NVIDIA Container Toolkit), and Docker Compose. The setup varies across operating systems; please consult the official installation documentation for each.
Because downloading model weights happens only once and can take a very long time, pre-download them before starting the services:

```sh
HF_HOME=${PWD}/.cache/huggingface huggingface-cli download your_model_name
```
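Setting `HF_HOME` controls where the weights are cached: the hub cache lives under `$HF_HOME/hub`, with one `models--<org>--<name>` directory per repo. A small sketch of that layout (the model name below is a placeholder):

```python
import os

def hub_cache_dir(hf_home: str, repo_id: str) -> str:
    """Return the directory huggingface_hub uses to cache a model repo."""
    # Repo ids like "org/name" become "models--org--name" on disk.
    return os.path.join(hf_home, "hub", "models--" + repo_id.replace("/", "--"))

print(hub_cache_dir("./.cache/huggingface", "your_org/your_model_name"))
```

Mounting this directory into the containers (as the compose files do with the cache path above) is what lets later `docker compose up` runs skip the download.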
Read `compose.yaml` and the other config files before running:

```sh
docker compose up
```
To stop the services and clean up local images:

```sh
docker compose down --rmi local
```
In your project folder, with a proper Python environment activated:

```sh
pip install openai==1.55.0 cohere==5.11.4
```
Read `main.py` before running it:

```sh
python main.py
```
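Under the hood, those SDKs just send JSON over HTTP to the proxy. As a minimal sketch of the OpenAI-compatible chat request the litellm proxy expects (the base URL, API key, and model name below are assumptions; adjust them to your configuration), using only the standard library:

```python
import json
import urllib.request

# Assumed local litellm proxy endpoint and key; adjust to your setup.
BASE_URL = "http://localhost:4000/v1"
API_KEY = "sk-anything"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("your_model_name", "Hello!")
# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

The same endpoint shape works with the `openai` SDK by pointing its `base_url` at the proxy.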