@@ -228,6 +228,13 @@ cd /tensorrtllm_backend
 python3 scripts/launch_triton_server.py --world_size=4 --model_repo=/tensorrtllm_backend/triton_model_repo
 ```
 
+When successfully deployed, the server produces logs similar to the following:
+```
+I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
+I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
+I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
+```
+
 ### Query the server with the Triton generate endpoint
 
 **This feature will be available with the Triton 23.10 release soon.**
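Once the endpoint is available, it can be exercised with plain `curl`. The sketch below only assembles the request; the model name `ensemble`, the endpoint path, and the field names `text_input`/`max_tokens` are assumptions about a typical TensorRT-LLM ensemble repository, so adapt them to your deployment.

```shell
# Hypothetical generate-endpoint request. The model name "ensemble"
# and the request fields are assumptions; adjust them to match the
# models in your own repository.
ENDPOINT="localhost:8000/v2/models/ensemble/generate"
REQUEST='{"text_input": "What is machine learning?", "max_tokens": 64}'

# With the server from the previous section running, send it with:
#   curl -s -X POST "${ENDPOINT}" -d "${REQUEST}"
echo "POST ${ENDPOINT}"
```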
@@ -321,16 +328,17 @@ You can have a look at the client code to see how early stopping is achieved.
 #!/bin/bash
 #SBATCH -o logs/tensorrt_llm.out
 #SBATCH -e logs/tensorrt_llm.error
-#SBATCH -J gpu-comparch-ftp:mgmn
-#SBATCH -A gpu-comparch
-#SBATCH -p luna
+#SBATCH -J <REPLACE WITH YOUR JOB's NAME>
+#SBATCH -A <REPLACE WITH YOUR ACCOUNT's NAME>
+#SBATCH -p <REPLACE WITH YOUR PARTITION's NAME>
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=8
 #SBATCH --time=00:30:00
 
 sudo nvidia-smi -lgc 1410,1410
 
-srun --mpi=pmix --container-image triton_trt_llm \
+srun --mpi=pmix \
+    --container-image triton_trt_llm \
     --container-mounts /path/to/tensorrtllm_backend:/tensorrtllm_backend \
     --container-workdir /tensorrtllm_backend \
     --output logs/tensorrt_llm_%t.out \
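After submission the server needs time to load the engines, so it helps to poll Triton's standard HTTP readiness endpoint before sending requests. A minimal sketch; the default HTTP port 8000 and a local deployment are assumptions:

```shell
# Poll Triton's standard readiness endpoint, which returns HTTP 200
# once all models are loaded. Host and port are assumptions for a
# local deployment; point it at your head node when running on Slurm.
wait_for_triton() {
    local url="http://${1:-localhost:8000}/v2/health/ready"
    local tries="${2:-30}"
    for _ in $(seq "${tries}"); do
        if curl -sf "${url}" > /dev/null; then
            echo "server ready"
            return 0
        fi
        sleep 2
    done
    echo "server not ready after ${tries} attempts" >&2
    return 1
}
# Usage (with a running server): wait_for_triton localhost:8000 60
```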
@@ -351,12 +359,7 @@ ${TRITONSERVER} --model-repository=${MODEL_REPO} --disable-auto-complete-config
 sbatch tensorrt_llm_triton.sub
 ```
 
-When successfully deployed, the server produces logs similar to the following ones.
-```
-I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
-I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
-I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
-```
+You might have to contact your cluster's administrator to help you customize the above script.
 
 ### Kill the Triton server
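For the single-node case, stopping the server usually comes down to terminating the `tritonserver` processes; under Slurm, cancelling the batch job is cleaner. A sketch (the process name and the Slurm job are assumptions about the deployments above):

```shell
# Single-node case: kill any running tritonserver processes.
# xargs -r does nothing when pgrep finds no match, so this is safe
# to run even when no server is up.
pgrep tritonserver | xargs -r kill -9

# Slurm case: cancel the batch job instead (job id from sbatch/squeue).
#   scancel <JOB_ID>
```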