
[UX] Changing some default parameters #1659

Merged: sindhuvahinis merged 2 commits into deepjavalibrary:master from def on Mar 22, 2024

Conversation

sindhuvahinis
Contributor

Description

Unifying TensorRT-LLM parameters:

  • TensorRT-LLM does not have a do_sample parameter; when top_p=0 and top_k=0, which are also its default values, it automatically performs greedy decoding. So the default values of top_p and top_k have to be different from those of the other optimization engines (see the sketch after the references below).
  • Retaining the same default numbers for do_sample as before, but clarified them in our schema doc.
  • In the future, this will be changed if TensorRT-LLM's Inference API, which does have a do_sample parameter (also in the Python API), starts supporting all the models.

Ref: https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/docs/source/gpt_runtime.md?plain=1#L385
NVIDIA/TensorRT-LLM#646 (comment)

In their C++ code, I could not find where they set their default parameters yet. https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/cpp/include/tensorrt_llm/runtime/samplingConfig.h#L66 only sets beam_width=1.
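As a rough sketch of the unification logic, assuming illustrative names and default constants (this is not the actual handler code):

```python
# Illustrative sketch only: shows why TensorRT-LLM defaults can differ from
# other engines, since it has no do_sample flag. Function and constant names
# are assumptions, not the real handler implementation.
TRTLLM_GREEDY_DEFAULTS = {"top_k": 0, "top_p": 0.0}   # top_k=0, top_p=0 -> greedy
OTHER_ENGINE_DEFAULTS = {"do_sample": False, "top_k": 50, "top_p": 1.0}

def build_trtllm_sampling_params(request_params: dict) -> dict:
    """Map unified request parameters onto TensorRT-LLM sampling settings."""
    params = dict(TRTLLM_GREEDY_DEFAULTS)
    if request_params.get("do_sample", False):
        # Only when sampling is requested do we forward non-zero top_k/top_p.
        params["top_k"] = request_params.get("top_k", 50)
        params["top_p"] = request_params.get("top_p", 1.0)
    return params
```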

vLLM temperature:

  • vLLM also does not have a do_sample parameter; it falls back to greedy decoding when temperature=0. Although we set temperature=0 by default in our handler code to enable greedy decoding, internally vLLM changes the temperature back to 1 (see the sketch below).

Code Ref: https://github.com/vllm-project/vllm/blob/v0.3.3/vllm/model_executor/sampling_metadata.py#L101
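A minimal sketch of the behavior described above, mirroring the logic in the referenced sampling_metadata.py (the epsilon value and function name here are assumptions, not the exact vLLM source):

```python
# Sketch of vLLM's handling of temperature=0 (greedy): the request still
# means "greedy", but the temperature is reset to 1.0 internally so the
# softmax never divides by zero.
_SAMPLING_EPS = 1e-5  # assumed threshold, for illustration

def effective_temperature(requested_temperature: float) -> float:
    if requested_temperature < _SAMPLING_EPS:
        # Greedy decoding takes the argmax of the logits, so the temperature
        # value no longer matters and 1.0 is a safe stand-in.
        return 1.0
    return requested_temperature
```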

@sindhuvahinis sindhuvahinis requested review from zachgk, frankfliu and a team as code owners March 22, 2024 17:35
@sindhuvahinis sindhuvahinis requested a review from siddvenk March 22, 2024 17:36
@siddvenk
Contributor

For vllm temperature - does that behavior of overriding temperature to 1 mean we can never do greedy sampling with vllm?

@sindhuvahinis
Contributor Author

sindhuvahinis commented Mar 22, 2024

For vllm temperature - does that behavior of overriding temperature to 1 mean we can never do greedy sampling with vllm?

No. When we pass it as zero, they do greedy decoding; they have to change it to 1.0 internally because the temperature is used in the softmax function and they want to avoid dividing by 0.
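To illustrate why the temperature cannot literally be 0, here is a toy numpy example (not vLLM code):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Temperature divides the logits, so temperature=0 would divide by zero.
    scaled = logits / temperature
    scaled -= scaled.max()          # shift for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(logits, 1.0))  # regular sampling distribution
# Greedy decoding takes argmax(logits) directly, so the engine can safely
# substitute temperature=1.0 instead of dividing by zero.
print(int(np.argmax(logits)))                 # -> 0
```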

@siddvenk
Contributor

This can be done in a follow-up, but I think it would be useful to show examples of requests for greedy, sampling, and beam search. We list the available parameters and the available decoding strategies, but it can be hard to figure out what I need to send to get certain behavior.

@sindhuvahinis
Contributor Author

This can be done in a follow-up, but I think it would be useful to show examples of requests for greedy, sampling, and beam search. We list the available parameters and the available decoding strategies, but it can be hard to figure out what I need to send to get certain behavior.

Yes, sounds good.
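As a purely hypothetical illustration of what such examples could look like (the inputs/parameters envelope and the beam-search key below are assumptions, not the documented request format):

```python
# Hypothetical request payloads for the three decoding strategies; the
# "inputs"/"parameters" envelope and "num_beams" key are assumptions.
greedy_request = {
    "inputs": "The quick brown fox",
    "parameters": {"do_sample": False, "max_new_tokens": 64},
}

sampling_request = {
    "inputs": "The quick brown fox",
    "parameters": {"do_sample": True, "temperature": 0.8, "top_k": 50, "top_p": 0.9},
}

beam_search_request = {
    "inputs": "The quick brown fox",
    "parameters": {"num_beams": 4, "max_new_tokens": 64},
}
```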

@sindhuvahinis sindhuvahinis merged commit bb91d07 into deepjavalibrary:master Mar 22, 2024
8 checks passed
@sindhuvahinis sindhuvahinis deleted the def branch April 4, 2024 17:06