Added echo function to OpenAI API server. #1504
Conversation
@zhuohan123 Requesting your review.
Thank you for your contribution! Please check the comments.
```diff
 def create_logprobs(
     token_ids: List[int],
     top_logprobs: Optional[PromptLogprobs] = None,
     num_output_top_logprobs: Optional[int] = None,
     initial_text_offset: int = 0,
 ) -> LogProbs:
     """Create OpenAI-style logprobs."""
     logprobs = LogProbs()
     last_token_len = 0
-    for token_id, id_logprob in zip(token_ids, id_logprobs):
+    if num_output_top_logprobs:
+        logprobs.top_logprobs = []
+    for i, token_id in enumerate(token_ids):
+        step_top_logprobs = top_logprobs[i]
+        if step_top_logprobs is not None:
+            token_logprob = step_top_logprobs[token_id]
+        else:
+            token_logprob = None
         token = tokenizer.convert_ids_to_tokens(token_id)
         logprobs.tokens.append(token)
-        logprobs.token_logprobs.append(id_logprob[token_id])
+        logprobs.token_logprobs.append(token_logprob)
         if len(logprobs.text_offset) == 0:
             logprobs.text_offset.append(initial_text_offset)
         else:
             logprobs.text_offset.append(logprobs.text_offset[-1] +
                                         last_token_len)
         last_token_len = len(token)

-        logprobs.top_logprobs.append({
-            tokenizer.convert_ids_to_tokens(i): p
-            for i, p in id_logprob.items()
-        })
+        if num_output_top_logprobs:
+            logprobs.top_logprobs.append({
+                tokenizer.convert_ids_to_tokens(i): p
+                for i, p in step_top_logprobs.items()
+                # Filter out additional logprobs for the chosen token
+                # This ensures the same number of top logprobs requested
+                if not (len(step_top_logprobs) > num_output_top_logprobs
+                        and i == token_id)
+            } if step_top_logprobs else None)
     return logprobs
```
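(For orientation, a hedged usage sketch of the updated helper. The per-step logprob shape — a list of `Optional[Dict[int, float]]` with `None` for the first prompt token — and the module-level `tokenizer` it relies on are assumptions inferred from the diff, not code from this PR.)

```python
# Hypothetical inputs: three token ids and their per-step top logprobs.
# The first entry is None because the first prompt token has no logprob.
token_ids = [32, 1049, 287]
top_logprobs = [
    None,
    {1049: -0.7, 318: -1.3},
    {287: -0.4, 290: -2.1},
]

logprobs = create_logprobs(token_ids,
                           top_logprobs=top_logprobs,
                           num_output_top_logprobs=2)
# logprobs.tokens          -> the decoded token strings for the three ids
# logprobs.token_logprobs  -> [None, -0.7, -0.4]
# logprobs.top_logprobs    -> [None, {token: logprob, ...}, {token: logprob, ...}]
```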
Why do we need to change this function? Originally, this function followed OpenAI's specification:

> Include the log probabilities on the `logprobs` most likely tokens, as well as the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to `logprobs+1` elements in the response.
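(For concreteness, a hedged illustration with made-up tokens and values of the two behaviors for `logprobs=2` at a step where the sampled token is not among the top 2:)

```python
# Documented behavior ("up to logprobs+1 elements"): the sampled token is
# always included, so this step carries 3 entries even though logprobs=2.
documented_entry = {" may": -0.61, " must": -2.44, " shall": -5.02}  # " shall" was sampled

# This PR's filtering: the extra entry for the sampled token is dropped,
# so every step carries exactly logprobs entries and the sampled token's
# logprob is reported only in token_logprobs.
filtered_entry = {" may": -0.61, " must": -2.44}
```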
This is actually different from the implementation. Run:

```python
completion = openai.Completion.create(
    model="text-davinci-003", prompt="A robot may not injure a human being", n=1, echo=True, best_of=1,
    logprobs=2, max_tokens=5, temperature=2)
```

Response:
```
/Users/yunmochen/anaconda3/envs/vllm/bin/python /Users/yunmochen/research/vllm/examples/openai_completion_client.py
Completion results:
{
"warning": "This model version is deprecated. Migrate before January 4, 2024 to avoid disruption of service. Learn more https://platform.openai.com/docs/deprecations",
"id": "cmpl-8Ft6vvaOBBcbTtdxF3P4pLX1NrP58",
"object": "text_completion",
"created": 1698797457,
"model": "text-davinci-003",
"choices": [
{
"text": "A robot may not injure a human being or muddy pave collectorCy",
"index": 0,
"logprobs": {
"tokens": [
"A",
" robot",
" may",
" not",
" injure",
" a",
" human",
" being",
" or",
" muddy",
" pave",
" collector",
"Cy"
],
"token_logprobs": [
null,
-10.330451,
-0.605383,
-3.694869,
-0.12196065,
-0.007913817,
-0.0015808053,
-0.0075414344,
-0.049919397,
-19.736057,
-11.607917,
-18.587305,
-14.40027
],
"top_logprobs": [
null,
{
"pi": -3.0112953,
".": -3.0572982
},
{
" may": -0.605383,
" must": -2.4434366
},
{
" be": -0.22049057,
" perform": -3.1058216
},
{
" injure": -0.12196065,
" harm": -2.4350166
},
{
" a": -0.007913817,
" or": -6.1911945
},
{
" human": -0.0015808053,
"\n": -7.691319
},
{
" being": -0.0075414344,
" or": -6.095392
},
{
" or": -0.049919397,
",": -3.4204276
},
{
",": -0.013517476,
" through": -4.427107
},
{
" through": -1.5732663,
" a": -1.6822541
},
{
"ments": -0.031058038,
" the": -5.4342465
},
{
",": -0.95207053,
" through": -1.8842442
}
],
"text_offset": [
0,
1,
7,
11,
15,
22,
24,
30,
36,
39,
45,
50,
60
]
},
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 5,
"total_tokens": 13
}
}
```
The API guarantees that the log probabilities of the chosen tokens are present in `token_logprobs`, but not in `top_logprobs`.
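(As a small hedged sketch against the response shape shown above — `completion` being the object returned by `openai.Completion.create` — a client can therefore always read the chosen-token log probabilities from `token_logprobs`:)

```python
choice = completion["choices"][0]
for token, logprob in zip(choice["logprobs"]["tokens"],
                          choice["logprobs"]["token_logprobs"]):
    # The chosen token's logprob is always available here; it may or may
    # not also appear in the corresponding top_logprobs entry.
    print(repr(token), logprob)
```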
Just as another data point, the llama-cpp-python OpenAI-spec server also includes the probabilities of the chosen tokens in `top_logprobs`.
It comes down to deciding whether we want to follow the service behavior implemented by OpenAI or the documented specification.
Let's follow OpenAI's API description and revert the changes here?
@zhuohan123 Except for the unresolved conversation, I've addressed the comments.
@zhuohan123 Besides the design decision on the top-logprob behavior, is there anything else you would like me to address?
Any updates?
I suppose we're still waiting for the team to do another round of review. @zhuohan123, do we have any concerns with merging the PR?
@zhuohan123 Would it be possible to get this PR re-reviewed/merged? We're heavily using this branch, but it would be much better if the code were in main. I haven't reviewed the code, but everything seems to work well (aside from the open question of whether the logprobs of the chosen tokens should be returned in `top_logprobs`).
Hi, thanks for your contributions again! I believe that after these changes the PR should be ready to merge. Could you remove all the other unnecessary style changes in this PR? Thanks!
@zhuohan123 I've addressed the comments. Please let me know if you find anything else to address.
Hi, thanks for fixing the issues! I tried running the following script to test:

```python
import openai

# Modify OpenAI's API key and API base to use vLLM's API server.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

# List models API
models = openai.Model.list()
print("Models:", models)
model = models["data"][0]["id"]

# Completion API
stream = False
completion = openai.Completion.create(
    model=model,
    prompt="A robot may not injure a human being",
    echo=True,
    n=2,
    stream=stream,
    logprobs=3)

print("Completion results:")
if stream:
    for c in completion:
        print(c)
else:
    print(completion)
```

And got the following error on the server side:
@wanmok Can you double-check on your side? I believe the bug is caused by the first token of the prompt not having a log probability, and thus its corresponding logprob entry is `None`.
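(For reference, a small hedged client-side check against the test script above — `completion` is the object it prints — of the expected behavior once the first prompt token's missing log probability is handled; both of its entries should be null, as in OpenAI's response earlier in the thread.)

```python
logprobs = completion["choices"][0]["logprobs"]
# The first echoed prompt token has no log probability, so both the
# token_logprobs and top_logprobs entries at position 0 should be None.
assert logprobs["token_logprobs"][0] is None
assert logprobs["top_logprobs"][0] is None
```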
I have fixed and tested it. Please let me know if you find any other issues.
LGTM! Thank you for your contribution!
Does this work? I'm getting "echo is not currently supported". @zhuohan123 @wanmok
Are you using the latest main branch? This PR hasn't been included in the latest release yet.
When will this be released? I work in a CUDA 11.8 environment, and I haven't figured out a way to compile the repo.
I'm having the same problem as @lifengjin. How do you get a pull request to compile? I'm running into an ABI compiler incompatibility error, @zhuohan123:

```
File "/root/miniconda3/envs/aphroditee/lib/python3.10/runpy.py", line 187, in _run_module_as_main
```
This feature implements the echo function for the OpenAI API server. It supersedes the previous PR #959 and builds on top of #1328. It also addresses issue #201.
For the OpenAI API server, the PR makes the following modifications:
- Supports `echo=True` together with `max_tokens=0`, which is a valid case in the OpenAI API. In this case, `SamplingParams.max_tokens` is set to 1 and an `echo_self` flag is enabled. The flag is then used during post-processing to ensure the additionally generated token is removed from the final output (see the sketch below).
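(A rough, hedged sketch of the described handling. The request object, its fields, and the helper name are illustrative assumptions; only `SamplingParams.max_tokens` and the `echo_self` flag come from the description above.)

```python
def handle_echo_only(request, sampling_params):
    """Sketch: support echo=True with max_tokens=0 as the OpenAI API does."""
    echo_self = False
    if request.echo and request.max_tokens == 0:
        # vLLM must generate at least one token, so request a single token
        # and remember to drop it from the final output.
        sampling_params.max_tokens = 1
        echo_self = True
    return echo_self

# During post-processing (sketch):
#   if echo_self:
#       strip the one additionally generated token so the response text
#       and logprobs contain only the echoed prompt.
```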