Commit 1a38843

Update README.md
1 parent 0ed4429 commit 1a38843

README.md

Lines changed: 13 additions & 10 deletions
@@ -228,6 +228,13 @@ cd /tensorrtllm_backend
 python3 scripts/launch_triton_server.py --world_size=4 --model_repo=/tensorrtllm_backend/triton_model_repo
 ```
 
+When successfully deployed, the server produces logs similar to the following ones.
+```
+I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
+I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
+I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
+```
+
 ### Query the server with the Triton generate endpoint
 
 **This feature will be available with Triton 23.10 release soon**
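Not part of the commit itself, but a quick way to confirm that the three services in the log above are listening is to probe the default ports. This is only a sketch: it assumes the server runs on localhost with the default ports, and the `ensemble` model name is an assumption about what your model repository exposes.

```
# Readiness of the HTTP endpoint (port 8000 in the log above).
curl -sf localhost:8000/v2/health/ready && echo "server ready"

# First lines of the Prometheus metrics endpoint (port 8002 in the log above).
curl -s localhost:8002/metrics | head -n 5

# Once the generate endpoint ships (Triton 23.10), a request could look like this;
# "ensemble" is an assumed model name, adapt it to your model repository.
curl -s -X POST localhost:8000/v2/models/ensemble/generate \
     -d '{"text_input": "What is Triton Inference Server?", "max_tokens": 64}'
```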
@@ -321,16 +328,17 @@ You can have a look at the client code to see how early stopping is achieved.
 #!/bin/bash
 #SBATCH -o logs/tensorrt_llm.out
 #SBATCH -e logs/tensorrt_llm.error
-#SBATCH -J gpu-comparch-ftp:mgmn
-#SBATCH -A gpu-comparch
-#SBATCH -p luna
+#SBATCH -J <REPLACE WITH YOUR JOB's NAME>
+#SBATCH -A <REPLACE WITH YOUR ACCOUNT's NAME>
+#SBATCH -p <REPLACE WITH YOUR PARTITION's NAME>
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=8
 #SBATCH --time=00:30:00
 
 sudo nvidia-smi -lgc 1410,1410
 
-srun --mpi=pmix --container-image triton_trt_llm \
+srun --mpi=pmix \
+    --container-image triton_trt_llm \
     --container-mounts /path/to/tensorrtllm_backend:/tensorrtllm_backend \
     --container-workdir /tensorrtllm_backend \
     --output logs/tensorrt_llm_%t.out \
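The placeholders introduced above have to be filled with values from your own cluster. Outside the scope of this commit, but on a typical Slurm installation the following queries can help you find them; the exact output depends on how your site configured Slurm.

```
# Partitions you can submit to, with their availability and time limits.
sinfo -s

# Accounts your user is associated with (requires Slurm accounting to be enabled).
sacctmgr show associations user=$USER format=Account,Partition
```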
@@ -351,12 +359,7 @@ ${TRITONSERVER} --model-repository=${MODEL_REPO} --disable-auto-complete-config
 sbatch tensorrt_llm_triton.sub
 ```
 
-When successfully deployed, the server produces logs similar to the following ones.
-```
-I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
-I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
-I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
-```
+You might have to contact your cluster's administrator to help you customize the above script.
 
 ### Kill the Triton server
 
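A closing note, again not part of the diff: once the job is submitted with `sbatch`, you can watch it start and read the Triton logs written through the `-o`/`--output` options of the script above. The log file name below assumes you kept the `logs/tensorrt_llm_%t.out` pattern; `%t` expands to the task rank, so rank 0 here.

```
# Check that the job is pending or running.
squeue -u $USER

# Follow the Triton server log of task 0 once the job starts.
tail -f logs/tensorrt_llm_0.out

# One way to stop a Slurm-launched server: cancel the job (replace JOBID with the id from squeue).
scancel JOBID
```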