
[UX] Changing some default parameters #1659

Merged: sindhuvahinis merged 2 commits into deepjavalibrary:master from def on Mar 22, 2024

Conversation

sindhuvahinis
Contributor

Description

Unifying TensorRT-LLM parameters:

  • TensorRT-LLM does not have a do_sample parameter; when top_p=0 and top_k=0, which are also its default values, it automatically performs greedy decoding. So the default values of top_p and top_k have to be different from those of the other optimization engines (see the sketch after the references below).
  • Retaining the same default numbers for do_sample as before, but clarified them in our schema doc.
  • In the future, this will be changed if TensorRT-LLM's Inference API, which does have a do_sample parameter (also in the Python API), starts supporting all the models.

Ref: https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/docs/source/gpt_runtime.md?plain=1#L385
NVIDIA/TensorRT-LLM#646 (comment)

In their C++ code, I could not find where they set their default parameters yet. https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/cpp/include/tensorrt_llm/runtime/samplingConfig.h#L66 only sets beam_width=1.
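As a rough sketch of the unification logic, assuming illustrative names and default constants (this is not the actual handler code):

```python
# Illustrative sketch only: shows why TensorRT-LLM defaults can differ from
# other engines, since it has no do_sample flag. Function and constant names
# are assumptions, not the real handler implementation.
TRTLLM_GREEDY_DEFAULTS = {"top_k": 0, "top_p": 0.0}   # top_k=0, top_p=0 -> greedy
OTHER_ENGINE_DEFAULTS = {"do_sample": False, "top_k": 50, "top_p": 1.0}

def build_trtllm_sampling_params(request_params: dict) -> dict:
    """Map unified request parameters onto TensorRT-LLM sampling settings."""
    params = dict(TRTLLM_GREEDY_DEFAULTS)
    if request_params.get("do_sample", False):
        # Only when sampling is requested do we forward non-zero top_k/top_p.
        params["top_k"] = request_params.get("top_k", 50)
        params["top_p"] = request_params.get("top_p", 1.0)
    return params
```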

vLLM temperature:

  • vLLM also does not have a do_sample parameter; it falls back to greedy decoding when temperature=0. Although we set temperature=0 by default in our handler code to enable greedy decoding, internally vLLM changes the temperature back to 1 (see the sketch below).

Code Ref: https://github.com/vllm-project/vllm/blob/v0.3.3/vllm/model_executor/sampling_metadata.py#L101
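A minimal sketch of the behavior described above, mirroring the logic in the referenced sampling_metadata.py (the epsilon value and function name here are assumptions, not the exact vLLM source):

```python
# Sketch of vLLM's handling of temperature=0 (greedy): the request still
# means "greedy", but the temperature is reset to 1.0 internally so the
# softmax never divides by zero.
_SAMPLING_EPS = 1e-5  # assumed threshold, for illustration

def effective_temperature(requested_temperature: float) -> float:
    if requested_temperature < _SAMPLING_EPS:
        # Greedy decoding takes the argmax of the logits, so the temperature
        # value no longer matters and 1.0 is a safe stand-in.
        return 1.0
    return requested_temperature
```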

@sindhuvahinis sindhuvahinis requested review from zachgk, frankfliu and a team as code owners March 22, 2024 17:35
@sindhuvahinis sindhuvahinis requested a review from siddvenk March 22, 2024 17:36
@siddvenk
Contributor

For vllm temperature - does that behavior of overriding temperature to 1 mean we can never do greedy sampling with vllm?

@sindhuvahinis
Contributor Author

sindhuvahinis commented Mar 22, 2024

For vllm temperature - does that behavior of overriding temperature to 1 mean we can never do greedy sampling with vllm?

No. When we pass it as zero, they do greedy decoding; they have to change it to 1.0 internally because the temperature is used in the softmax function and they want to avoid dividing by 0.
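To illustrate why the temperature cannot literally be 0, here is a toy numpy example (not vLLM code):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Temperature divides the logits, so temperature=0 would divide by zero.
    scaled = logits / temperature
    scaled -= scaled.max()          # shift for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(logits, 1.0))  # regular sampling distribution
# Greedy decoding takes argmax(logits) directly, so the engine can safely
# substitute temperature=1.0 instead of dividing by zero.
print(int(np.argmax(logits)))                 # -> 0
```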

@siddvenk
Contributor

This can be done in a follow-up, but I think it would be useful to show examples of requests for greedy, sampling, and beam search. We list the available parameters and the available decoding strategies, but it can be hard to figure out what I need to send to get certain behavior.

@sindhuvahinis
Contributor Author

This can be done in a follow-up, but I think it would be useful to show examples of requests for greedy, sampling, and beam search. We list the available parameters and the available decoding strategies, but it can be hard to figure out what I need to send to get certain behavior.

Yes, sounds good.
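As a purely hypothetical illustration of what such examples could look like (the inputs/parameters envelope and the beam-search key below are assumptions, not the documented request format):

```python
# Hypothetical request payloads for the three decoding strategies; the
# "inputs"/"parameters" envelope and "num_beams" key are assumptions.
greedy_request = {
    "inputs": "The quick brown fox",
    "parameters": {"do_sample": False, "max_new_tokens": 64},
}

sampling_request = {
    "inputs": "The quick brown fox",
    "parameters": {"do_sample": True, "temperature": 0.8, "top_k": 50, "top_p": 0.9},
}

beam_search_request = {
    "inputs": "The quick brown fox",
    "parameters": {"num_beams": 4, "max_new_tokens": 64},
}
```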

@sindhuvahinis sindhuvahinis merged commit bb91d07 into deepjavalibrary:master Mar 22, 2024
8 checks passed
@sindhuvahinis sindhuvahinis deleted the def branch April 4, 2024 17:06