
Zhwang/v0.6.1 h100 #1


Open · wants to merge 28 commits into main
Changes from 1 commit

Commits (28):
- 6640e80: Updates for release/0.5.0, and submodule TensorRT-LLM also refreshed … (juney-nvidia, Oct 15, 2023)
- 448e0b6: Fix batch manager arch (#10) (kaiyux, Oct 16, 2023)
- 9717e97: refresh release/0.5.0 to the latest revision, including the refresh o… (juney-nvidia, Oct 18, 2023)
- 35d8620: Update README (#14) (kaiyux, Oct 18, 2023)
- 060c62f: Update submodule (#15) (kaiyux, Oct 18, 2023)
- 5deb118: update submodule for aarch64 library (release/0.5.0) (#17) (Shixiaowei02, Oct 18, 2023)
- 448976d: Fix typo (#18) (kaiyux, Oct 18, 2023)
- 49048b4: Add aarch64 docker build doc (#21) (kaiyux, Oct 19, 2023)
- bdbb3f1: Update submodule (#22) (kaiyux, Oct 19, 2023)
- 626ae3c: Update TRT-LLM submodule (#26) (kaiyux, Oct 19, 2023)
- e514b4a: Update README.md (#28) (kaiyux, Oct 19, 2023)
- 47b609b: Update doc (#78) (krishung5, Nov 2, 2023)
- fca166c: Update TensorRT-LLM backend (#172) (kaiyux, Nov 30, 2023)
- 49b904d: Update submodule url (#175) (krishung5, Dec 1, 2023)
- d752821: Update submodule url (#178) (Shixiaowei02, Dec 1, 2023)
- 867306b: Update TensorRT-LLM backend (#187) (kaiyux, Dec 4, 2023)
- e59a626: commit (sfc-gh-zhwang, Dec 10, 2023)
- 7b31be7: commit (sfc-gh-zhwang, Dec 10, 2023)
- a24dbb3: commit (sfc-gh-zhwang, Dec 10, 2023)
- a2a6527: commit (sfc-gh-zhwang, Dec 10, 2023)
- 4a62782: commit (sfc-gh-zhwang, Dec 10, 2023)
- 2c19376: commit (sfc-gh-zhwang, Dec 10, 2023)
- 10c567f: commit (sfc-gh-zhwang, Dec 10, 2023)
- 83453a5: commit (sfc-gh-zhwang, Dec 10, 2023)
- 5e3516a: commit (sfc-gh-zhwang, Dec 10, 2023)
- 3a275ef: commit (sfc-gh-zhwang, Dec 10, 2023)
- 76670c9: commit (sfc-gh-zhwang, Dec 10, 2023)
- c82bf14: commit (sfc-gh-zhwang, Dec 10, 2023)
Update README.md (triton-inference-server#28)
* Update README.md

* Update TRT-LLM submodule
kaiyux authored Oct 19, 2023
commit e514b4af5ec87477b095d3ba6fe63cc7b797055f
README.md (13 additions, 10 deletions)
@@ -228,6 +228,13 @@ cd /tensorrtllm_backend
 python3 scripts/launch_triton_server.py --world_size=4 --model_repo=/tensorrtllm_backend/triton_model_repo
 ```
 
+When successfully deployed, the server produces logs similar to the following ones.
+```
+I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
+I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
+I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
+```
+
 ### Query the server with the Triton generate endpoint
 
 **This feature will be available with Triton 23.10 release soon**
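
Once logs like the ones added above appear, a quick way to confirm the server is reachable is to probe Triton's standard v2 HTTP endpoints. A minimal sketch, assuming default ports and a deployed model named `ensemble` (the model name is an assumption, not fixed by this change):

```bash
# Readiness probe against Triton's standard v2 HTTP API (port 8000 in the logs above);
# prints 200 when the server is ready to serve requests.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready

# Generate endpoint (available from Triton 23.10); "ensemble" is an assumed model name,
# and the input field names depend on the model repository's configuration.
curl -X POST localhost:8000/v2/models/ensemble/generate \
  -d '{"text_input": "What is machine learning?", "max_tokens": 20}'
```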
@@ -321,16 +328,17 @@ You can have a look at the client code to see how early stopping is achieved.
 #!/bin/bash
 #SBATCH -o logs/tensorrt_llm.out
 #SBATCH -e logs/tensorrt_llm.error
-#SBATCH -J gpu-comparch-ftp:mgmn
-#SBATCH -A gpu-comparch
-#SBATCH -p luna
+#SBATCH -J <REPLACE WITH YOUR JOB's NAME>
+#SBATCH -A <REPLACE WITH YOUR ACCOUNT's NAME>
+#SBATCH -p <REPLACE WITH YOUR PARTITION's NAME>
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=8
 #SBATCH --time=00:30:00
 
 sudo nvidia-smi -lgc 1410,1410
 
-srun --mpi=pmix --container-image triton_trt_llm \
+srun --mpi=pmix \
+    --container-image triton_trt_llm \
     --container-mounts /path/to/tensorrtllm_backend:/tensorrtllm_backend \
     --container-workdir /tensorrtllm_backend \
     --output logs/tensorrt_llm_%t.out \
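
For illustration, the placeholder SBATCH lines introduced above might be filled in as follows; the job, account, and partition names here are hypothetical and depend entirely on your cluster:

```bash
#SBATCH -J trtllm-triton   # hypothetical job name
#SBATCH -A my-account      # hypothetical account name
#SBATCH -p gpu             # hypothetical partition name
```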
@@ -351,12 +359,7 @@ ${TRITONSERVER} --model-repository=${MODEL_REPO} --disable-auto-complete-config
 sbatch tensorrt_llm_triton.sub
 ```
 
-When successfully deployed, the server produces logs similar to the following ones.
-```
-I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
-I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
-I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
-```
+You might have to contact your cluster's administrator to help you customize the above script.
 
 ### Kill the Triton server
 
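The body of the "Kill the Triton server" section is collapsed in this view. One common way to stop a server launched this way (an assumption, not necessarily the command this README uses) is:

```bash
# Kill all running tritonserver processes; assumes no other Triton jobs
# on this node that you want to keep alive.
pgrep tritonserver | xargs kill -9
```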
tensorrt_llm (submodule pointer): 1 addition, 1 deletion