-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move model download from the Docker image build step to the run script #19
Conversation
…odel url. Delete unused Dockerfiles.
@edgar971 that looks pretty good. if it gets merged, i can rebase onto that and add the GPU container things to that. |
Awesome! Thanks for working on this @edgar971. I'm testing this now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @edgar971! I left some suggestions.
Co-authored-by: Mayank Chhabra <mayankchhabra9@gmail.com>
@edgar971 @mayankchhabra how well does the model preform if it's available over network vs locally? I'm wondering if it's worth mounting a network share with lots of models, or if storage should always be local to the node where it's running. |
@todaywasawesome, I've not tried a setup like so I'm not sure. But as long as the model is locked into the RAM on start (currently enabled for 13B and 70B models in their docker-compose.yml files), then, in theory, only the initial loading time should be bandwidth-bound by the local network, and interference should have no effect in speed. |
@mayankchhabra when it runs a model it always loads it all into memory? In that case network storage should be fine. |
Yep! |
The REST API is fast. I think llama.cpp has a streaming option instead of just waiting for the full response. |
@todaywasawesome FWIW, if you're looking to do this, one should be able to modify the compose script to redefine the model as a network share. I was planning on exposing mine over NFS once this change is merged, as my compute nodes don't have a lot of local storage. |
Are you able to mount the EFS to ./models and then use that as the docker compose mount? |
@edgar971 For kubernetes, yeah we can just mount the external file share to that folder and done. |
Perfect, any changes for the PR @mayankchhabra? |
Awesome, just tested and everything looks good! Thanks again for your work on this @edgar971! |
Here's the PR that addresses issue #8 by removing the model download step from the Dockerfile for the API service. This PR updates the API service in the compose files to use a Docker Volume mounted at /models. Additionally, it enhances the run.sh script with a model download manager, which checks for model existence and downloads it if absent. As an optional improvement, this PR suggests exposing the model as an environment variable, reducing the need for multiple Dockerfiles. The run.sh model download manager can then utilize this environment variable to specify, optionally run, and launch the desired model.