update tutorial and doc structure
rohithkrn committed Feb 22, 2024
1 parent 51ac9af commit d5bf5d9
Showing 2 changed files with 102 additions and 148 deletions.
23 changes: 23 additions & 0 deletions serving/docs/lmi_new/deployment_guide/model-artifacts.md
@@ -34,6 +34,29 @@ model/

Please remember to set `option.trust_remote_code=true` or `OPTION_TRUST_REMOTE_CODE=true` if your model includes customized modeling and/or tokenizer `.py` files.

## TensorRT-LLM (TRT-LLM) LMI model format
TRT-LLM LMI can load models in a custom format that consists of compiled TRT-LLM engine files and Hugging Face model config files.
For model architectures that support JIT compilation, create these artifacts by following this [tutorial](https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/tutorials/trtllm_aot_tutorial.md). For model architectures that TRT-LLM LMI cannot JIT compile, follow this [tutorial](https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/tutorials/trtllm_manual_convert_tutorial.md) instead. During deployment, set `OPTION_MODEL_ID` to the path of the resulting artifacts; TRT-LLM LMI loads them faster than a raw Hugging Face model.
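
As a rough illustration of pointing a deployment at precompiled artifacts, the sketch below builds the environment variable and renders the equivalent `option.*` property line. It assumes the `OPTION_X` to `option.x` naming shown above for `trust_remote_code` applies to `model_id` as well; the S3 path and the helper name `env_to_property` are hypothetical.

```python
def env_to_property(name: str) -> str:
    """Map an OPTION_* environment variable name to an option.* property key.

    Assumption: the OPTION_TRUST_REMOTE_CODE / option.trust_remote_code
    pairing in this guide reflects a general naming convention.
    """
    prefix = "OPTION_"
    if not name.startswith(prefix):
        raise ValueError(f"not an OPTION_* variable: {name}")
    return "option." + name[len(prefix):].lower()


# Hypothetical location of the compiled TRT-LLM artifacts.
env = {
    "OPTION_MODEL_ID": "s3://my-bucket/trt_llm_model_repo/",
}

# Render the same settings as property lines.
properties = "\n".join(f"{env_to_property(k)}={v}" for k, v in env.items())
print(properties)
```

Either form expresses the same setting; which one you use depends on how the container is configured.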

The directory structure below is an example of TensorRT-LLM LMI model artifacts.

```
trt_llm_model_repo
└── tensorrt_llm
    ├── 1
    │   ├── trt_llm_model_float16_tp2_rank0.engine # trt-llm engine
    │   ├── trt_llm_model_float16_tp2_rank1.engine # trt-llm engine
    │   ├── config.json # trt-llm config file
    │   └── model.cache
    ├── config.pbtxt # trt-llm triton backend config
    ├── config.json # Below are HuggingFace model config files and may vary per model
    ├── pytorch_model.bin.index.json
    ├── requirements.txt
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    └── tokenizer.model
```
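
Before uploading artifacts, it can help to sanity-check a local directory against the layout above. The following is a minimal sketch, not part of LMI: the function name `check_trtllm_repo` is hypothetical, and the set of required files is an assumption taken from the example tree (the Hugging Face config files vary per model, so only `config.json` is checked).

```python
from pathlib import Path


def check_trtllm_repo(repo_root):
    """Return a list of problems found in a TRT-LLM LMI artifact directory."""
    model_dir = Path(repo_root) / "tensorrt_llm"
    if not model_dir.is_dir():
        return [f"missing directory: {model_dir}"]

    problems = []
    version_dir = model_dir / "1"
    # Expect at least one compiled engine file (one per tensor-parallel rank).
    if not version_dir.is_dir() or not any(version_dir.glob("*.engine")):
        problems.append("no .engine files in tensorrt_llm/1")

    # Files assumed to be required, based on the example tree only.
    for rel in ("1/config.json", "config.pbtxt", "config.json"):
        if not (model_dir / rel).is_file():
            problems.append(f"missing file: tensorrt_llm/{rel}")
    return problems
```

An empty list means the directory matches the assumed layout; any returned strings name what is missing.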


## Storing models in S3
