diff --git a/serving/docs/lmi/configurations_large_model_inference_containers.md b/serving/docs/lmi/configurations_large_model_inference_containers.md
index 45b7fd604..4e720b400 100644
--- a/serving/docs/lmi/configurations_large_model_inference_containers.md
+++ b/serving/docs/lmi/configurations_large_model_inference_containers.md
@@ -120,10 +120,15 @@ If you specify MPI engine in TensorRT LLM container, the following parameters wi
| option.batch_scheduler_policy | No | Scheduler policy of the TensorRT-LLM batch manager. | `max_utilization`, `guaranteed_no_evict`. Default value is `max_utilization` |
| option.kv_cache_free_gpu_mem_fraction | No | Fraction of free GPU memory allocated for the KV cache. The larger the value, the more GPU memory the model will try to claim. The more memory reserved, the larger the KV cache can be, which allows longer input+output sequences or larger batch sizes. | Float between 0 and 1. Default is `0.95` |
| option.max_num_sequences | No | Maximum number of input requests processed in the batch. If you don't set this, max_rolling_batch_size is used as the value. Generally you don't need to touch it unless you want the model compiled to a batch size that differs from the one set by the model server. | Integer greater than 0. Default value is the batch size set while building the TensorRT engine |
-| option.enable_trt_overlap | No | Parameter to overlap the execution of batches of requests. It may have a negative impact on performance when the number of requests is too small. During our experiment, we saw more negative impact to turn this on than off. | `true`, `false`. Default is `false` |
+| option.enable_trt_overlap | No | Parameter to overlap the execution of batches of requests. It may have a negative impact on performance when the number of requests is too small. In our experiments, enabling this hurt performance more often than it helped. | `true`, `false`. Default is `false` |
| option.enable_kv_cache_reuse | No | Let the LLM model remember the last used input KV cache and try to reuse it in the next run. An immediate benefit is blazing fast first token latency. This is typically helpful for document understanding and chat applications, which usually share the same input prefix. The TRT-LLM backend remembers the prefix tree of the input and reuses most of it for the next generation. However, this comes at the cost of extra GPU memory. | `true`, `false`. Default is `false` |
| option.baichuan_model_version | No | Parameter exclusively for Baichuan LLM models to specify the version of the model. Requires the HF Baichuan checkpoint path. For v1_13b, use either baichuan-inc/Baichuan-13B-Chat or baichuan-inc/Baichuan-13B-Base. For v2_13b, use either baichuan-inc/Baichuan2-13B-Chat or baichuan-inc/Baichuan2-13B-Base. More Baichuan models can be found on baichuan-inc. | `v1_7b`, `v1_13b`, `v2_7b`, `v2_13b`. Default is `v1_13b` |
-| option.chatglm_model_version | No | Parameter exclusive to ChatGLM models to specify the exact model type. Required for ChatGLM models. | `chatglm_6b`, `chatglm2_6b`, `chatglm2_6b_32k`, `chatglm3_6b`, `chatglm3_6b_base`, `chatglm3_6b_32k`, `glm_10b`. Default is `unspecified`, which will throw an error. |
+| option.chatglm_model_version | No | Parameter exclusive to ChatGLM models to specify the exact model type. Required for ChatGLM models. | `chatglm_6b`, `chatglm2_6b`, `chatglm2_6b_32k`, `chatglm3_6b`, `chatglm3_6b_base`, `chatglm3_6b_32k`, `glm_10b`. Default is `unspecified`, which will throw an error. |
+| option.multi_block_mode | No | Split a long KV sequence into multiple blocks (applied to generation MHA kernels). It is beneficial when `batch x num_heads` cannot fully utilize the GPU. This is **not** supported for the qwen model type. | `true`, `false`. Default is `false` |
+| option.use_fused_mlp | No | Enable horizontal fusion in GatedMLP, which reduces layer input traffic and potentially improves performance for large Llama models (e.g. llama-2-70b). This option is only supported for the Llama model type. | `true`, `false`. Default is `false` |
+| option.rotary_base | No | Rotary base parameter for RoPE embedding. This is supported for the llama, internlm and qwen model types. | `float` value. Default is `10000.0` |
+| option.rotary_dim | No | Rotary dimension parameter for RoPE embedding. This is supported only for the gptj model. | `int` value. Default is `64` |
+| option.rotary_scaling_type option.rotary_scaling_factor | No | Rotary scaling parameters. These two options should always be set together to prevent errors. These are supported for the llama, qwen and internlm models. | The value of `rotary_scaling_type` can be either `linear` or `dynamic`. The value of `rotary_scaling_factor` can be any value larger than 1.0. Default is `None`. |
| Advanced parameters: SmoothQuant |
| option.quantize | No | Currently only supports `smoothquant` for Llama, Mistral, InternLM and Baichuan models with just-in-time compilation mode. | `smoothquant` |
| option.smoothquant_alpha | No | SmoothQuant alpha parameter | Default value is `0.8` |
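
These options are supplied through a model's `serving.properties` file. The following is a minimal, illustrative sketch for a TensorRT-LLM deployment with the MPI engine that exercises the newly documented options; the model id and all values are assumptions for illustration, not tuned recommendations.

```properties
# Illustrative serving.properties sketch for the TensorRT-LLM (MPI engine) container.
# Model id and values are assumptions for illustration only, not tuned recommendations.
engine=MPI
option.model_id=meta-llama/Llama-2-70b-hf
option.tensor_parallel_degree=8

# KV cache and batch scheduling (existing options)
option.kv_cache_free_gpu_mem_fraction=0.95
option.batch_scheduler_policy=max_utilization

# Newly documented options
option.multi_block_mode=true
# use_fused_mlp is only supported for the Llama model type
option.use_fused_mlp=true
# RoPE settings: rotary_scaling_type and rotary_scaling_factor must be set together
option.rotary_base=10000.0
option.rotary_scaling_type=linear
option.rotary_scaling_factor=2.0
```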