Are you trying to get a simple Llama 2 Docker container to just work? Welp, you might have come to the right place, as I needed the same thing, couldn't find it after Googling, and so set this up for exactly that.
This repo provides a set of scripts for setting up and building a local Docker container that runs a FastAPI server for interacting with a local Llama 2 instance, along with an API endpoint for submitting queries to the LLM directly using default settings. Once set up, you can:
- Go to http://localhost on your local machine to use the provided Swagger UI to send `POST` requests to Llama 2
- Send `POST` requests to the http://localhost/submit endpoint directly to query Llama 2 without any UI needed
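For example, a direct query from the command line might look like the sketch below. The exact request body depends on the schema the FastAPI app defines, so the `prompt` field here is an assumption; check the Swagger UI at http://localhost for the real shape.

```sh
# Sketch of a direct query to the /submit endpoint.
# NOTE: the "prompt" field name is an assumption -- confirm the actual
# request schema in the Swagger UI at http://localhost before relying on it.
curl -X POST http://localhost/submit \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Docker in one sentence."}'
```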
- A machine with an NVIDIA GPU
  - Not trying to pick Team Green over Team Red here, but CUDA is required per Meta's documentation.
- Docker, Git, and Git LFS installed on your machine for the scripts to work
- Successfully requested access to Llama 2 from Meta using your email address
  - Meta will email you when you have been granted access
- A Hugging Face account with an email that matches the one you requested Llama 2 access with (so we can use `git` to download the model weights)
- Successfully requested access to Llama 2 on Hugging Face
  - Complete the steps in the *Access Llama 2 on Hugging Face* section on this page
  - Hugging Face will email you when you have been granted access
    - This process can take 1 to 2 days
- Set up Git over SSH for Hugging Face so you don't have to type in credentials.
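If you haven't done the SSH setup before, here's a rough sketch, assuming an ed25519 key and Hugging Face's standard SSH host (`git@hf.co`):

```sh
# Generate a new SSH key (skip if you already have one you want to reuse)
ssh-keygen -t ed25519 -C "you@example.com"

# Copy the public key and add it to your Hugging Face account under
# Settings -> SSH and GPG Keys (https://huggingface.co/settings/keys)
cat ~/.ssh/id_ed25519.pub

# Verify the connection; you should be greeted with your Hugging Face username
ssh -T git@hf.co
```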
This project supports the following model weights:
Want more? Submit a pull request with an update. This is GitHub; y'all know the drill.
- `git clone` this repo (see the example sequence after this list)
- Run `setup.sh <weight>`, with `<weight>` being the model weight you want to use. `Llama-2-7b-chat` is used if a weight is not provided. This script will:
  - Validate the model weight
  - Ensure `git` and `git lfs` are installed
  - Check out the Llama 2 Python library from GitHub
  - Check out the requested model weight
  - This only needs to be done once per model weight.
- Run `start.sh <weight>`, with `<weight>` being the model weight you want to use. `Llama-2-7b-chat` is used if a weight is not provided. This script will build and start the Docker container using Docker Compose.
- Go to http://localhost to use the Swagger UI to send a request to the `/submit` endpoint to submit a query, or send a `POST` request to the `/submit` endpoint directly.
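Put together, a typical first run might look something like this sketch (substitute this repo's actual URL and directory name; `Llama-2-7b-chat` is just the default weight):

```sh
# Clone this repo (substitute the real URL and directory name)
git clone <repo-url>
cd <repo-directory>

# One-time setup for the weight you want; downloads the model via git lfs
./setup.sh Llama-2-7b-chat

# Build and start the Docker container with docker compose
./start.sh Llama-2-7b-chat

# Then open http://localhost for the Swagger UI, or POST to /submit directly
```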
From here, feel free to start hacking on what I've got here and customizing things to your needs. I've tried to comment all of this to explain what's going on, but this is just a base to get you started.
Llama 2 resources are governed by the LLAMA 2 COMMUNITY LICENSE AGREEMENT.
The custom code in this repo is governed by the MIT License.

