Commit: Merge remote-tracking branch 'upstream/master'
Showing 30 changed files with 970 additions and 444 deletions.
New file (11 lines added):

```yaml
service:
  readiness_probe: /
  replicas: 2

resources:
  cloud: oci
  region: us-sanjose-1
  ports: 8080
  cpus: 2+

run: python -m http.server 8080
```
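The `run` command above simply serves files over HTTP on port 8080, and the readiness probe hits `/` on each replica. As a minimal local sketch of what that probe checks (port 8765 is an arbitrary choice here to avoid clashing with a real deployment), the same server can be started and polled like this:

```python
import subprocess
import sys
import time
import urllib.request

# Launch the same server the `run` command starts, on a throwaway port.
proc = subprocess.Popen(
    [sys.executable, "-m", "http.server", "8765"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
try:
    # Poll the readiness path "/" the way a load-balancer probe would,
    # retrying briefly while the server starts up.
    status = None
    for _ in range(50):
        try:
            with urllib.request.urlopen("http://127.0.0.1:8765/") as resp:
                status = resp.status
            break
        except OSError:
            time.sleep(0.1)
    print(status)  # 200 once the replica is ready to serve traffic
finally:
    proc.terminate()
    proc.wait()
```

A replica is only marked ready (and sent traffic) once the probe path returns a success status, which is why the probe path is kept as cheap as possible.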
New file (25 lines added):

```yaml
# service.yaml
service:
  readiness_probe: /v1/models
  replicas: 2

# Fields below describe each replica.
resources:
  cloud: oci
  region: us-sanjose-1
  ports: 8080
  accelerators: {A10:1}

setup: |
  conda create -n vllm python=3.12 -y
  conda activate vllm
  pip install vllm
  pip install vllm-flash-attn

run: |
  conda activate vllm
  python -u -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8080 \
    --model Qwen/Qwen2-7B-Instruct \
    --served-model-name Qwen2-7B-Instruct \
    --device=cuda --dtype auto --max-model-len=2048
```
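The vLLM entrypoint above exposes an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it once a replica is up. A hedged sketch of the request a client would send (the base URL is an assumption; substitute the actual endpoint of the deployed service, and note that `model` must match `--served-model-name` from the config):

```python
import json
import urllib.request

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
# "Qwen2-7B-Instruct" matches --served-model-name in the config above.
payload = {
    "model": "Qwen2-7B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}
body = json.dumps(payload).encode()

# The host/port here mirror --host/--port from the run command; replace
# with the real service endpoint for an actual deployment.
req = urllib.request.Request(
    "http://0.0.0.0:8080/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Sending the request needs a running replica, so it is left commented:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.get_method(), req.full_url)  # POST http://0.0.0.0:8080/v1/chat/completions
```

The readiness probe path `/v1/models` is itself part of this API: it lists the served model names, so it only succeeds once the model weights are loaded and the server can actually answer requests.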